Canny based Segmentation

Posted: February 24, 2012 at 5:00 pm

Following on from my previous post on the temporal instability of mean-shift segmentation, I've been looking at the feasibility of using an edge detector to do segmentation. A simple test with Canny() showed that the edges are very stable over time. So I went ahead with the standard OpenCV approach: finding contours with findContours() and approximating them into polygons. At this point I realized a familiar problem had returned: the vast majority of segmented regions are not regions at all, but the empty space between regions. Following is an image of the Canny output (with some morphology operations to reduce noise):

From this image things look pretty good: the regions appear fairly well defined, except where motion blur softens edges, e.g. around the foreground figure. So I went ahead, converted these contours into abstracted polys using approxPolyDP(), and filled them:

Yes, you read that correctly, these are filled polys. Some experiments with flood fill confirm it: with complex natural images like these, there is little chance of getting closed contours. Most contours are simply isolated arcs separated by numerous gaps. Following is another image of the abstracted and filled polys, this time filtered so that only those with an area greater than 200 pixels are drawn:

At least we're seeing some of the fills, but again the fact that these polys are lines and not regions is clear. Perhaps the fill would be better if drawContours() were not intelligent about holes and intersections. Either way the result is very clear: breaking an arbitrary natural image into regions that are temporally stable is not trivial.

I only see a few fallback plans: (1) looking into other segmentation methods for natural images (likely these are current research, and therefore slow and tricky), (2) using some method to automate GrabCut or watershed segmentation (though there is no guarantee these will perform any better, or be temporally stable), or (3) giving up on the background entirely and using a background subtraction method. The biggest issue with (1) and (2) is that they are totally unknown to me and may be bleeding edge. I am familiar enough with (3) to know that with such a complex background there will still be segmentation problems, and background / foreground interactions that will result in some lack of temporal stability. Also, with (3) the foreground itself will still need to be segmented, so all these same problems will simply reappear. Not to mention the massive decrease in the image area fed into the system.

One of the few things that may work in the project's favour is that the perceptual system need not match a human perceptual system. In short, what it decides an object is may not match what we think an object is. It just has to be stable over time in terms of its features.