AutoCropper plugin

ziphnor

Retired Team Member · Premium Supporter
August 4, 2005
Copenhagen, Denmark
    One thing that has continually annoyed me in MediaPortal is the handling of 'fake' 16:9 material. I'm talking about the kind of TV channel that transmits a 16:9 movie as a 'letterbox', utilising only ~50% of the 576 PAL lines for the actual image. The Zoom mode in MP can show this correctly (by manual change) in SOME cases. However, in Denmark most channels on digital cable TV are transmitted with subtitles as part of the image, and when showing letterboxed images these subtitles are sometimes placed in the bottom-most black bar. As we keep the original voice track in Denmark, we want to avoid cropping the subtitles.

    What I'm doing right now is writing a bit of stand-alone C# code that can take a Bitmap and output a bounding box indicating the area that should be displayed. When done, it will crop out logos if placed in the corners of the upper black bar, but retain subtitles.

    That part I can figure out on my own; the next step is how to integrate this into MediaPortal. I would need some way to get access to a single frame of the current video (preferably before it is upscaled to the display resolution), and afterwards a way to set the appropriate zoom. The first part could be achieved in a very primitive manner by taking a screenshot, but I hope it can be done in a more clever way. As to the second part, I don't even know where in the MP code to look :)

    Could anyone give me some pointers, or perhaps even offer to help integrate this if I supply the code to detect the proper bounding box?
     

    ziphnor

    I managed to get something decent working (still taking a bitmap file and outputting another with the bounding box indicated); I posted some sample results in the thread mentioned in knutinh's post.

    I could really use some pointers on how to integrate this with MP!
     

    knutinh

    Portal Pro
    September 4, 2005
    "Not at all, it's an embarrassingly simple approach. First of all, I rely on the assumption that logos will be either top left or top right, and define an interval on the x-axis where I don't expect to find logos. In that interval I choose a number of random x values, which are my sample points. I then start scanning from the bottom up. For each line y, I calculate the average R, G, B values of the sample points, and then sum over the sample points' deviation from that average. This gives an indication of how different the image is at the sample points, so it will consider any non-uniform line as image content. That catches most things, but not uniform white, for example, so another check is added considering the max of any color component (R, G, B) seen by a sample point; if that is high enough, the line is also considered content (this would capture any kind of at least somewhat bright uniform background). The reason I don't just use the last method alone is that very dark images will have low color components, but still have a variation in dark colors, which yields a deviation from the average not seen in the black edges. Noise on analogue signals might confuse it, though (I use DVB-C), since it can add 'snow' to the black edges.

    In order to avoid considering VBI data (or whatever that white/black line is in some of my screenshots) as content, a line is only considered image content if the next line also fulfills this criterion.

    For the above I think I used 20 sample points, meaning I sample 20×576 pixels in total. I haven't optimized it yet; the annoying lookahead mentioned above causes some double work right now (easily fixed). I could also reduce the number of scanned lines by ~50% by scanning bottom-up and then top-down, instead of going right through the image. Furthermore, I have considered trying a binary-search-style scan, which I think would make it capable of running in real time. I haven't timed it as such yet, but will do."
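The quoted approach can be sketched roughly as follows — in Python for brevity (the actual plugin code is C#), with `frame` assumed to be a height × width grid of (R, G, B) tuples and all thresholds illustrative guesses rather than the real values:

```python
import random

NUM_SAMPLES = 20        # sample points per line, as in the post
DEV_THRESHOLD = 60      # total deviation that counts as "non-uniform"
BRIGHT_THRESHOLD = 80   # max component that counts as "bright content"

def make_sample_xs(width, logo_margin=0.2):
    """Pick fixed random x positions, avoiding the logo corners."""
    lo, hi = int(width * logo_margin), int(width * (1 - logo_margin))
    return [random.randrange(lo, hi) for _ in range(NUM_SAMPLES)]

def line_is_content(frame, y, xs):
    pixels = [frame[y][x] for x in xs]
    n = len(pixels)
    # Average R, G, B over the sampled points of this line.
    avg = [sum(p[c] for p in pixels) / n for c in range(3)]
    # Sum of per-channel deviations from that average: high for varied lines.
    deviation = sum(abs(p[c] - avg[c]) for p in pixels for c in range(3))
    # A bright uniform line has low deviation but a high max component.
    brightest = max(p[c] for p in pixels for c in range(3))
    return deviation > DEV_THRESHOLD or brightest > BRIGHT_THRESHOLD

def find_bottom_of_content(frame, xs):
    """Scan bottom-up; require two consecutive content lines (VBI guard)."""
    for y in range(len(frame) - 1, 0, -1):
        if line_is_content(frame, y, xs) and line_is_content(frame, y - 1, xs):
            return y
    return len(frame) - 1
```

The same test, scanning top-down, would find the upper edge; combining the two gives the vertical bounding box.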

    Are the sample points random per line, or are you re-using the same pattern for all lines?

    I see that just selecting random pixels reduces the number of instructions needed, but perhaps this situation is more cache-limited (a large video frame residing in memory), so one may process entire lines without much more CPU time?

    In what way does your basic algo differ from, say, doing a histogram of intensity (discarding color information) and demanding that all pixels must be below a threshold?

    regards
    Knut
     

    ziphnor

    knutinh said:
    Are the sample points random per line, or are you re-using the same pattern for all lines?

    It's chosen at the beginning and is fixed. I did consider making it random per line, but thought I'd try it without first. Actually, I think it might be better to just use equidistant points instead.

    I see that just selecting random pixels reduces the number of instructions needed, but perhaps this situation is more cache-limited (a large video frame residing in memory), so one may process entire lines without much more CPU time?

    I tried that first, when I tried to just process brightness, and it slowed things down a lot. But that may have been because I was calling image.GetBrightness(), while I am now relying on image.B/R/G.

    In what way does your basic algo differ from, say, doing a histogram of intensity (discarding color information) and demanding that all pixels must be below a threshold?

    Well, I suppose that, for example, a dark reddish and a dark bluish pixel might have the same intensity, which would land them in the same low-intensity histogram 'bucket', while the above method would count such a color difference a long way towards being image content. My intuition was just to get a measure of the overall variance of the line (color included), as even very dark images will vary quite a bit, while the bars have a more uniform dark.

    My previous attempts tried to spot jumps in the average brightness of lines, and they failed miserably.
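    The difference can be shown with two hypothetical pixel values (a Python sketch; the values are made up for illustration): an intensity histogram puts both dark pixels in the same bucket, while ziphnor's per-channel deviation sum sees them as very different.

```python
# Two dark pixels with the same intensity but different hue.
reddish = (60, 10, 10)
bluish = (10, 10, 60)

def intensity(p):
    # Average of R, G, B: discards color information, as a histogram would.
    return sum(p) / 3

def channel_deviation(a, b):
    # Sum of per-channel differences: keeps color information.
    return sum(abs(x - y) for x, y in zip(a, b))

print(intensity(reddish), intensity(bluish))   # both ~26.7: same bucket
print(channel_deviation(reddish, bluish))      # 100: clearly different
```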
     

    cheffe97

    Portal Pro
    August 10, 2004
    Berlin, Germany
    Hi ziphnor,
    very much appreciate what you're doing, since I've waited a looong time for this to happen.
    You should definitely contact frodo or somebody else from the dev team on IRC and discuss it there. I guess it's the faster way to get things integrated :)

    hth
    cheffe
     

    knutinh

    Perhaps frodo can give you a "hook" where you can work on YUV data? In that case, 8-bit intensity data would be available for "free". Also, a single Y frame should be 1/3 the size of a combined RGB frame, so perhaps cache issues etc. are better?

    It is also interesting to discuss the multi-frame behaviour. A very dark frame would, as you say, not be detected as any format. Detecting "uncertain" conditions and keeping the last known "good" format would probably be simple.

    But what if the user changes channel to a very dark one? Then no history is available. And sometimes formats change all the time, such as in commercials and shows that use 16:9 letterboxing as an "artistic" element. Should MP "lock on to" the format instantaneously, or should there be 500 ms of delay so as not to give the user a headache?

    If there is a delay, then perhaps you only need to process every X frames, decreasing the average load?

    -k
     

    ziphnor

    knutinh said:
    Perhaps frodo can give you a "hook" where you can work on YUV data? In that case, 8-bit intensity data would be available for "free". Also, a single Y frame should be 1/3 the size of a combined RGB frame, so perhaps cache issues etc. are better?

    That might be the case; it would be nice to process entire lines instead. If a binary-search-style approach can be used, I am sure it will be possible to take whole lines.
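    The binary-search-style scan mentioned above could look something like this (a Python sketch; it assumes the bar/content boundary is clean, i.e. every line above the boundary tests as content and every line below as bar, which dark or noisy frames can violate):

```python
def last_content_line(is_content, height):
    """Find the bottom edge of the image in O(log height) probes.

    is_content(y) is a predicate over line numbers (0 = top line),
    assumed True for all lines above some boundary and False below it.
    Returns the index of the last content line, or -1 if the whole
    frame looks like a bar.
    """
    if not is_content(0):
        return -1
    lo, hi = 0, height - 1
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if is_content(mid):
            lo = mid        # content extends at least this far down
        else:
            hi = mid - 1    # boundary is above mid
    return lo
```

For a 576-line frame this probes ~10 lines instead of scanning hundreds, which is why it could plausibly run in real time.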

    It is also interesting to discuss the multi-frame behaviour. A very dark frame would, as you say, not be detected as any format. Detecting "uncertain" conditions and keeping the last known "good" format would probably be simple.

    It's pretty simple to see when the approach fails, as it will almost always find a far-too-small bounding box. When that happens, I think, as you say, it's a good idea to just stick with the last detected format.

    But what if the user changes channel to a very dark one? Then no history is available. And sometimes formats change all the time, such as in commercials and shows that use 16:9 letterboxing as an "artistic" element. Should MP "lock on to" the format instantaneously, or should there be 500 ms of delay so as not to give the user a headache?

    I am thinking that it will start by showing the whole image until it strikes a frame that can be used. After that it would sample once in a while, and on discovering a larger box immediately change to it, but be more reluctant to decrease the bounding box (to avoid it jumping up and down due to subtitles). On the other hand, this might delay reducing the format again when coming out of commercials.

    I think that there definitely should be a mode where the user hits a key to autodetect the format each time it needs to change; that can be made very reliable. We can then provide a full-auto mode where we use the above ideas. I think it's important to do it this way, because the full-auto mode will probably be a bit experimental for a looong time, while hit-key-to-autodetect can be reliably implemented in the short term.

    After all, this will mostly be used for movies, so you just hit autodetect when the movie begins and leave it. This will crop some commercials, but who cares :)
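    The full-auto policy described above — keep the last format on failed detection, grow the box immediately, shrink only after repeated consistent readings — might be sketched like this (Python, with a hypothetical helper class and a made-up patience value, not actual MP code):

```python
class CropTracker:
    """Tracks the bottom edge of the crop box across frames."""

    def __init__(self, full_height, shrink_patience=5):
        self.box_bottom = full_height - 1   # start by showing the full image
        self.shrink_patience = shrink_patience
        self._shrink_candidate = None
        self._shrink_count = 0

    def update(self, detected_bottom):
        if detected_bottom is None:
            # Detection failed (e.g. a very dark frame): keep the last format.
            return self.box_bottom
        if detected_bottom > self.box_bottom:
            # Larger image area: switch immediately to avoid cropping content.
            self.box_bottom = detected_bottom
            self._shrink_count = 0
        elif detected_bottom < self.box_bottom:
            # Smaller area: only accept after repeated consistent detections,
            # so subtitles popping in and out don't make the box flicker.
            if detected_bottom == self._shrink_candidate:
                self._shrink_count += 1
            else:
                self._shrink_candidate = detected_bottom
                self._shrink_count = 1
            if self._shrink_count >= self.shrink_patience:
                self.box_bottom = detected_bottom
                self._shrink_count = 0
        return self.box_bottom
```

The asymmetry (grow fast, shrink slowly) is exactly the trade-off mentioned above: subtitles won't cause jumping, at the cost of a short delay when letterboxing returns after commercials.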
     

    knutinh

    Perhaps acquiring a "database" of letterboxed recordings from TV and running the algo on that would allow more rapid development than basing it on user reports?

    After all, a few thousand frames of short snips of all kinds of letterboxed material could be scanned automatically, and the algo could leave a log of the frames that didn't give good results.

    -k
     

    ziphnor

    knutinh said:
    Perhaps acquiring a "database" of letterboxed recordings from TV and running the algo on that would allow more rapid development than basing it on user reports?

    After all, a few thousand frames of short snips of all kinds of letterboxed material could be scanned automatically, and the algo could leave a log of the frames that didn't give good results.
    -k

    That's an excellent idea, but then there is just the matter of actually acquiring it. MP really should start a centralized collection of sample DVB transport streams; they would be useful for several types of development (i.e. this and subtitles, correct identification of streams, etc.).
     
