AutoCropper plugin

ziphnor

Retired Team Member
  • Premium Supporter
  • August 4, 2005
    755
    13
    Copenhagen
    Home Country
    Denmark
    jawbroken said:
    Math is nice and all, but a picture is worth a thousand words:

    But maybe a few words are still needed: what exactly are you showing, the suggested sample points?

    If so, why make it so complicated? What's wrong with just sampling the middle column of the screen (or near it), thereby avoiding logos in either corner?
     

    jawbroken

    Portal Pro
    August 13, 2005
    706
    0
    Home Country
    Afghanistan
    Because the middle of the frame could be dark, making it hard to determine picture content. This way you get sample points across the whole frame, but weighted towards the centre, which gets sampled more densely. It is not as complicated as it looks: just a simple lookup table and a multiplication you have to do anyway to determine sample points.

    The pictures show sample points. For example, if you pick a row in the first picture and scan across it, the points to sample are the points where you cross the blue line. I have given it a shot and it works very well for determining image content. Here is some code as an example:

    http://pastebin.team-mediaportal.com/10378

    So it works pretty much like your random sampling did, except that instead of random points you just multiply samplePoints[i] by the image width for all i and get a series of sample points that are well distributed for finding image content.
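For illustration, a small sketch of the centre-weighted idea. The warp function and all values here are my invention, not the actual table from the pastebin link; the point is only that a fixed table of normalised positions, multiplied by the image width, yields columns that are denser near the centre:

```python
def make_sample_points(n):
    """Build n normalised positions in [0, 1], clustered toward 0.5.

    A cubic warp of a uniform grid is used here purely as an example:
    its derivative vanishes at the centre, so points bunch up there."""
    pts = []
    for i in range(n):
        u = i / (n - 1)                    # uniform in [0, 1]
        t = 2.0 * u - 1.0                  # remap to [-1, 1]
        pts.append((1.0 + t ** 3) / 2.0)   # cubic warp: dense near 0.5
    return pts

def sample_columns(sample_points, width):
    """Multiply each normalised point by the image width, as described."""
    return [min(width - 1, int(p * width)) for p in sample_points]
```

With 9 points on a 720-pixel-wide frame, the spacing between neighbouring columns near the centre is a few pixels, while near the edges it is a couple of hundred, which matches the "weighted towards centre content" behaviour described above.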

    Perhaps you have a better solution for a good sampling pattern, I would be happy to hear of it.

    Thanks,
    Daniel
     

    knutinh

    Portal Pro
    September 4, 2005
    558
    2
    What is the cost of simply processing the entire frame? Working on Y only, that should be 576x720 8-bit numbers to crunch.


    If this is too expensive, what is the benefit of your method compared to a simple "cross": one scan at width/2 and one at height/2?

    -k
     

    ziphnor

    Retired Team Member
  • Premium Supporter
  • August 4, 2005
    755
    13
    Copenhagen
    Home Country
    Denmark
    jawbroken said:
    Because the middle of the frame could be dark and hard to determine picture content.

    That's certainly correct, but I would tend to prefer waiting for the next frame instead. Usually the middle of the image contains the 'main' content. My main worry with your approach is this: if the centre is too dark, and the logo sits just above the same spot as image content near the edge, won't you have trouble discovering the logo? That is, if the image content is included, you would include the logo as well. This assumes a top-down + bottom-up scan; see below for discussion of a pure bottom-up scan.

    Perhaps you have a better solution for a good sampling pattern, I would be happy to hear of it.
    What is the cost of doing a simple processing of the entire frame? If working on Y only that should be 576x720 8bit numbers to crunch.

    Currently I'm considering just sampling the whole bottom half of the image, and scanning the middle column at the top (keeping away from logos). Due to cache effects this might well be fast enough, and it is easy to test before trying more complicated techniques.

    Alternatively, I've also considered scanning the whole image bottom-up, trying to find the topmost edge of the image and stopping there (of course getting the bottommost edge in the same scan). That avoids running into logos as long as there is at least some space between the letterbox and the logo. In that scenario I would use either your sampling pattern or scan full lines.
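A hypothetical sketch of that bottom-up scan (the function name, the mean-luma row test and the threshold value are placeholders of mine, not code from the plugin). The key property is that the scan stops at the first dark row above the content, so a logo floating above that gap is never visited:

```python
def vertical_bounds(luma, threshold=16):
    """Scan rows bottom-up over a 2D list of 8-bit luma rows.

    A row counts as picture content if its mean luma exceeds
    `threshold`. Returns (top, bottom) row indices of the content,
    or (None, None) for an entirely black frame."""
    bottom = None
    top = None
    for y in range(len(luma) - 1, -1, -1):
        dark = sum(luma[y]) / len(luma[y]) <= threshold
        if bottom is None:
            if not dark:
                bottom = y        # first bright row seen from the bottom
        elif dark:
            top = y + 1           # first dark row above the content:
            break                 # stop, so a logo higher up is ignored
    if bottom is not None and top is None:
        top = 0                   # content reaches the top of the frame
    return (top, bottom)
```

For example, a frame with a bright logo in the very top row but a black gap beneath it still reports only the main picture's bounds, because the scan terminates at the gap.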

    I will simply try the simplest approach first and see what is efficient enough :)

    As an update on the DirectShow stuff: I have discovered that I apparently need a transform filter (as opposed to a transform-in-place filter) if I want to crop the video directly. Programming this is slightly more difficult and I'm having some problems with it. I think I'm going to work a bit more with the in-place filter to get to grips with working with YUY2-encoded frames.
     

    jawbroken

    Portal Pro
    August 13, 2005
    706
    0
    Home Country
    Afghanistan
    ziphnor said:
    Currently I'm considering just sampling the whole bottom half of the image, and scanning the middle column at the top (keeping away from logos). Due to cache effects this might well be fast enough, and it is easy to test before trying more complicated techniques.

    Alternatively, I've also considered scanning the whole image bottom-up, trying to find the topmost edge of the image and stopping there (of course getting the bottommost edge in the same scan). That avoids running into logos as long as there is at least some space between the letterbox and the logo. In that scenario I would use either your sampling pattern or scan full lines.

    The problem with this is that not all logos are at the top. In fact, in Australia, all logos are in the bottom right. With my sampling pattern, on all the frames I have looked at, you sample at most 2 pixels from the logo (and only if it is very wide; your logos seem much wider than ours), so it is easy to discard logo sections, even with very simple threshold/brightness testing.

    I have thought of another way to refine the guesses. In frames such as your 2film3 frame it is easy to discover the vertical bounds (how high the frame is) and the right bound, but the left bound is difficult because of low brightness. However, you can take advantage of the fact that frames are (in every case I have ever seen) pretty much centred horizontally (though not necessarily vertically). So if the distance from the left of the screen to the left bound is very different from the distance from the right of the screen to the right bound (or either is undecidable from the current frame), you can just take the smaller distance and use it for both sides. This gives much better frames in general.
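A possible sketch of that mirroring rule (Python; the function name, the `None`-for-undecidable convention and the tolerance value are placeholders I made up, not from the plugin):

```python
def refine_horizontal(left, right, width, tolerance=8):
    """Refine crop bounds using horizontal symmetry.

    `left`/`right` are inclusive pixel bounds, or None when the frame
    was too dark to decide. If one side is missing, or the two margins
    disagree by more than `tolerance` pixels, mirror the smaller
    margin to the other side; otherwise keep the measured bounds."""
    lm = left if left is not None else None              # left margin
    rm = (width - 1 - right) if right is not None else None  # right margin
    if lm is None and rm is None:
        return 0, width - 1          # nothing known: assume full width
    if lm is None or rm is None or abs(lm - rm) > tolerance:
        m = min(x for x in (lm, rm) if x is not None)
        return m, width - 1 - m      # mirror the smaller margin
    return left, right
```

So an undecidable left bound with a clear 20-pixel right margin yields a 20-pixel left margin too, exactly the "take the smaller distance and use it for both sides" rule.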

    I also have an easy way of picking up the tails on the 'g' and the like in subtitled frames, as they are easily missed at the moment. If you are interested I can elaborate.

    It is up to you how you go about it, though, as you are the one doing the hard stuff like working on the transform filter.
     

    ziphnor

    Retired Team Member
  • Premium Supporter
  • August 4, 2005
    755
    13
    Copenhagen
    Home Country
    Denmark
    Too bad about the lost posts in this thread :(

    I thought I'd give my current status:

    I changed to collecting histogram data for the YUV components. Once collected, the histograms allow me to easily compute the max/min, average and variance. It's a bit faster on SD material, and probably a lot faster on HD material (because the histograms are fixed size: 3 arrays of length 256).
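For illustration, deriving those statistics from a fixed-size histogram might look like this (a Python sketch for one 8-bit component; names and structure are mine, and the plugin would do this per Y, U and V):

```python
def component_stats(samples):
    """Build a 256-bin histogram of 8-bit samples, then derive
    min, max, mean and variance from the bins. After the single
    pass over the samples, every statistic costs only 256 steps,
    independent of frame size."""
    hist = [0] * 256
    for s in samples:
        hist[s] += 1
    n = sum(hist)
    mn = next(v for v in range(256) if hist[v])          # lowest used bin
    mx = next(v for v in range(255, -1, -1) if hist[v])  # highest used bin
    mean = sum(v * c for v, c in enumerate(hist)) / n
    var = sum(c * (v - mean) ** 2 for v, c in enumerate(hist)) / n
    return mn, mx, mean, var
```

This is why the approach scales well to HD: the per-frame sampling cost grows with resolution, but the statistics pass stays constant.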

    I managed to get things running with subtitles being moved into the frame automatically. It works very well, except it's a bit jittery because I haven't yet implemented a check preventing very small changes in the bounding box.

    My next move is to:

    1. Clean up the code, separating image analysis from the subtitle transform etc.
    2. Implement an intelligent bounding-box tracker so that the bounding box remains stable.
    3. Only sample every frame if subtitles are to be moved.
    4. Change to a more intelligent sample pattern as described above (though I'm still not convinced it's a good idea to sample in potential logo areas).
    5. Offer some means of setting parameters from MediaPortal etc.

    When I get to 4, I'll be back here to discuss the strategy, but it might be a while; I have a busy week ahead.
     

    jawbroken

    Portal Pro
    August 13, 2005
    706
    0
    Home Country
    Afghanistan
    Wow, sounds amazing. I am surprised moving the subtitles is up and running already (although I still have my reservations, as a lot of subtitles I see overlap the picture a little if they get large enough).

    I have been giving a lot of thought to the intelligent bounding-box tracker and still haven't figured out a great way to do it, I hope you are having more luck. If only subtitles weren't so variable.
     

    ziphnor

    Retired Team Member
  • Premium Supporter
  • August 4, 2005
    755
    13
    Copenhagen
    Home Country
    Denmark
    jawbroken said:
    Wow, sounds amazing. I am surprised moving the subtitles is up and running already (although I still have my reservations, as a lot of subtitles I see overlap the picture a little if they get large enough).

    It's not perfect, but I rely on a black gap between the subtitles and the actual image. If no such gap exists, I do not move the subtitles. This, combined with a sanity check on subtitle areas, seems to prevent most problems. The thing I fear most is one subtitle line in the black bar with a gap above it, and another line completely inside the image. In that case you might end up with two subtitle lines overlaying each other, and it's virtually impossible to tell the difference reliably.
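The gap requirement described above could be sketched roughly like this (Python; the row-mean test, threshold and names are my invention, not plugin code):

```python
def find_subtitle_start(luma, image_bottom, threshold=16):
    """Look below the detected picture bottom for a detached subtitle.

    Walk down from `image_bottom`, counting black rows; if at least
    one black row (the gap) is followed by a bright row, return the
    index of that bright row as the start of the subtitle block.
    Return None when there is no clean gap, in which case the
    subtitles are left where they are."""
    y = image_bottom + 1
    h = len(luma)
    gap = 0
    while y < h and sum(luma[y]) / len(luma[y]) <= threshold:
        gap += 1                  # still inside the black bar
        y += 1
    if gap == 0 or y >= h:
        return None               # no gap, or nothing bright below it
    return y
```

With the "no gap, no move" rule falling out naturally: a subtitle touching the picture produces `gap == 0` and is not relocated.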

    It sounds better than it is; my code is a mess ;)

    I have been giving a lot of thought to the intelligent bounding-box tracker and still haven't figured out a great way to do it, I hope you are having more luck. If only subtitles weren't so variable.

    I'm trying to keep separate tabs on subtitle lines and image lines. My current technique for subtitle lines is not very good. I was surprised that the U and V variance wasn't significantly lower in the subtitle lines. I had also hoped that the distinct-value count for Y would be low there (very few different shades). Right now it is 100% reliable at catching subtitle lines, but it sometimes also catches image content as subtitles, meaning you have to do a lot of sanity checks. It's a work in progress :) The image detection is much more robust, using variance and a few other parameters.

    Basically my plan is to track subtitle boxes and image boxes separately and react VERY slowly to reductions in subtitle size, while reacting somewhat faster when the actual image changes. In addition I will generally react much more slowly to shrinking boxes, and immediately to growing boxes.
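That asymmetric reaction could be sketched as follows (Python, tracking just the top bound for brevity; the class name, `patience` parameter and its value are illustrative assumptions, not the plugin's actual tracker):

```python
class BoundTracker:
    """Track the top bound of a box with asymmetric smoothing.

    A smaller top value means a taller picture. Growth (top moving up)
    is accepted immediately, so content is never cropped away; shrinkage
    (top moving down) is accepted only after `patience` consecutive
    frames agree, which keeps the bounding box stable."""

    def __init__(self, patience=30):
        self.top = None
        self.patience = patience
        self._shrink_count = 0

    def update(self, top):
        if self.top is None or top < self.top:
            self.top = top            # box grew: react immediately
            self._shrink_count = 0
        elif top > self.top:
            self._shrink_count += 1   # box looks smaller: wait it out
            if self._shrink_count >= self.patience:
                self.top = top
                self._shrink_count = 0
        else:
            self._shrink_count = 0    # agreement resets the counter
        return self.top
```

The same pattern would apply per edge and per box type, with a much larger `patience` for subtitle boxes than for image boxes, matching the plan above.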
     

    ziphnor

    Retired Team Member
  • Premium Supporter
  • August 4, 2005
    755
    13
    Copenhagen
    Home Country
    Denmark
    I spent the last weekend trying to get VMR9 to clip the video at my filter's request, but I can't get it to work :( I guess I'll probably just end up scanning and modifying the frame myself, and then have a callback in MP that receives the new bounding boxes. I'm not going to waste more time on that, but I won't have time to work on this before next weekend :)
     
