Clustering Test 5 (10,000 Frames)

Posted: April 4, 2013 at 11:39 am

The long-term clustering test was successful. This is the first time in this project that I’ve gotten clustering code to deal with the massive amount of data from real-world frames. These 10,000 frames represent about 3 hours ending near dusk. The following plots show the behaviour of the system over time; selected frames from the test come after.


The following fields are recorded for each frame processed (the x axis):

  1. memoryUsage is the amount of memory used by the application, in megabytes.
  2. numPercepsBG is the total number of background clusters in the system. It stabilizes once it reaches the maximum number of clusters (set to 1000 in this case).
  3. numPercepsFG is the total number of foreground clusters in the system. It does not stabilize because foreground percepts accumulate much more slowly, and never reach the 1000-cluster limit.
  4. numScratchPerceptsBG is the number of background regions segmented for this frame. It is quite stable from frame to frame; the general decrease is likely due to the shift toward dusk, where the lack of visual contrast leads to fewer distinct segmented regions.
  5. numScratchPerceptsFG is the number of foreground regions segmented for this frame. It lacks stability because many frames don’t contain foreground percepts.
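
As a rough illustration of what one row of this per-frame log could look like, here is a minimal Python sketch. The field names come from the list above; the `FrameStats` class, the `write_stats` helper, the CSV layout and the sample values are all assumptions made for the example, not the code used in this test.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class FrameStats:
    """One row of per-frame statistics (hypothetical container)."""
    frame: int                  # frame index (the x axis of the plots)
    memoryUsage: float          # application memory, in megabytes
    numPercepsBG: int           # background clusters currently in the system
    numPercepsFG: int           # foreground clusters currently in the system
    numScratchPerceptsBG: int   # background regions segmented for this frame
    numScratchPerceptsFG: int   # foreground regions segmented for this frame


def write_stats(rows, out):
    """Write FrameStats rows to `out` as CSV, with a header line."""
    names = [f.name for f in fields(FrameStats)]
    writer = csv.DictWriter(out, fieldnames=names)
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))


# Made-up values, for illustration only.
rows = [
    FrameStats(0, 100.0, 40, 0, 40, 0),
    FrameStats(1, 101.5, 55, 2, 38, 2),
]
buffer = io.StringIO()
write_stats(rows, buffer)
csv_text = buffer.getvalue()
```

A flat one-row-per-frame layout like this makes it easy to plot each column against the frame index.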


At the end of processing the system had 1020 background percepts and 320 foreground percepts. The upper limit on the number of clusters is exceeded because whole frames are processed at a time: even if the limit is exceeded, the remaining regions in the current frame are still segmented and stored.

The plots above show the distributions of the number of merges for foreground and background percepts. The number of merges correlates with the number of constituent percepts in that cluster. The maximum number of background merges was 5642, and the mean was 597.

Foreground percepts don’t occur constantly, tend to transform a lot between frames, and may only be present for a very small number of frames (due to the 1fps capture). The result is few merges compared to the background clusters. Still, the majority of the foreground percepts were merged once, and a very small number were merged twice. Percepts are kept in memory for possible future merging until the max cluster limit has been reached. Since the limit had not been reached for the foreground percepts in this test, there were unmerged (numMerges = 0) percepts still in memory.
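
To make the overshoot past the cluster limit concrete, here is a small Python sketch of a limit that is only tested once per frame. It is a toy model under stated assumptions, not the clustering code from this test: regions are reduced to a single number, 60 regions per frame is a made-up figure, and merging into the closest cluster once the limit is reached is my assumption for the example.

```python
from statistics import mean

MAX_CLUSTERS = 1000      # cluster limit used in this test
REGIONS_PER_FRAME = 60   # made-up figure, chosen only for illustration


def process_frame(clusters, regions, max_clusters=MAX_CLUSTERS):
    """Process one whole frame; the limit is tested once, before the frame."""
    limit_reached = len(clusters) >= max_clusters
    for value in regions:
        if limit_reached:
            # Assumption: once full, a region is merged into the closest cluster.
            closest = min(clusters, key=lambda c: abs(c["value"] - value))
            closest["numMerges"] += 1
        else:
            # Below the limit when the frame started: store every region,
            # even if that pushes the total past max_clusters.
            clusters.append({"value": value, "numMerges": 0})


clusters = []
for frame in range(20):
    start = frame * REGIONS_PER_FRAME
    # Values wrap around so that later frames revisit earlier regions.
    regions = [(start + i) % 1020 for i in range(REGIONS_PER_FRAME)]
    process_frame(clusters, regions)

merges = [c["numMerges"] for c in clusters]
total_clusters = len(clusters)   # ends above MAX_CLUSTERS
max_merges = max(merges)
mean_merges = mean(merges)
```

In this toy run the 17th frame starts with 960 clusters, which is under the limit, so all 60 of its regions are stored and the total ends up above 1000. The later frames only merge, and the max and mean of `numMerges` are the same kind of summary statistics quoted above.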

I had code in this test to dump all the percepts out so they could be looked at individually, to get a sense of their variation. Unfortunately a bug prevented them from being written to disk. This is resolved for the next test, which will not be fed frames approaching dusk; the cluster limit will be increased to 2000, and 20,000 frames will be processed.

Because the video is 1.2GB, a number of selected frames from the test are shown instead. The quality is quite low due to the video encoding. Large regions in the video appear and disappear; I expect this is a result of 1000 clusters being too few, and of the very simple method of rendering only those percepts that were merged in the current frame.

Note this section of the video is nearly the worst-case scenario, where the sun is facing the camera (we see the reflection of the camera in the window here). This causes the trees to cast shadows into the bulk of the visual area, so the lighting of foreground objects varies extremely between frames. Toward the end of the test most objects show a silhouette effect and thus contrast is greatly reduced. Eventually I’ll have to do a test of a full day-night cycle and see how many percepts are required and how the transitions to darkness look.

Considering the prominence of the white areas, I should consider rendering a background. The white works fairly well during the day, but at night, when percepts are solid black, it may not be appropriate. Perhaps the mean colour of the input frame, or the background model, could serve as a background. This makes sense as we extend patterns in our memory to fill in spaces where we don’t have sensory information.

[Selected frames: Grab1, Grab2, Grab3, Grab5, Grab6, Grab7, Grab8, Grab10, Grab11, Grab12, Grab14, Grab15, Grab16, Grab17]
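
The idea of using the mean colour of the input frame as a background, instead of white, could be sketched roughly as follows. This is a pure-Python toy under assumptions: frames are lists of rows of (r, g, b) tuples, percepts are reduced to a map from pixel position to colour, and `mean_colour` and `composite` are hypothetical helper names, not functions from the actual system.

```python
def mean_colour(frame):
    """Mean (r, g, b) of a frame given as rows of (r, g, b) tuples."""
    pixels = [px for row in frame for px in row]
    count = len(pixels)
    return tuple(round(sum(px[ch] for px in pixels) / count) for ch in range(3))


def composite(percept_pixels, width, height, background):
    """Render percept pixels over a solid background colour.

    percept_pixels maps (x, y) to (r, g, b); any pixel not covered by a
    percept gets `background` instead of white.
    """
    return [
        [percept_pixels.get((x, y), background) for x in range(width)]
        for y in range(height)
    ]


# Tiny 2x2 input frame with made-up values.
frame = [
    [(10, 20, 30), (30, 40, 50)],
    [(50, 60, 70), (70, 80, 90)],
]
background = mean_colour(frame)

# One solid black percept pixel, as might be seen at night.
rendered = composite({(0, 0): (0, 0, 0)}, width=2, height=2, background=background)
```

With a background that tracks the input frame, uncovered areas would darken along with the scene at dusk rather than staying white next to black percepts.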