Updates to Segmentation and Clustering

After implementing the predictor in the main DM program, I had the chance to run it and then dump percepts to give some form to the ML-generated output previously posted. The results were quite weak. It appears that there is simply too much information to be encapsulated by the small number (~2000) of percepts in the system. The first issue was the way that percepts tended to crawl in from the edges due to the centring of clusters. I resolved this by treating percepts on edges differently: they still merge, but remain anchored to the edges. Additionally, the percept’s averaging of constituent regions was weighted to emphasize the most recent percepts (something like 75-85% weighting of current stimulus). This made percepts appear much more stable (over time) than they actually were. In short, a very unstable cluster was represented by a highly specified image. The idea with this was that the presentation of the percepts would be recognition, and in perception the display would show a reconstruction of sensory data from clusters alone. The result was very little correlation between the reconstruction and the sensory information:


This is supposed to be all background but includes the remnants of a foreground object (truck) making the slow transition into the MOG background model. These results have left me deciding I need to rethink perception and the nature of these clusters. The new idea is to accept that these clusters will be ephemeral, period. There is just not enough temporal stability for the small number of clusters to be concrete / photo-realistic. The literature on the nature of dream images, as reported by lucid dreamers while dreaming, does indicate that they lack the crisp fidelity of images produced from external perception. In the perceptual case, the raw sensory image will serve as background while the ephemeral percepts will serve to modulate the appearance of the sensory data.
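To make the stability problem concrete, here is a minimal Python sketch of the recency-weighted averaging described above. It is not the code from the DM program: the function name `update_percept`, the flat list-of-pixels representation, and the 0.8 weight (an assumed midpoint of the "75-85%" range) are all placeholders for illustration.

```python
def update_percept(percept, region, current_weight=0.8):
    """Blend a newly merged region into a percept's running average.

    current_weight is the share given to the newest region. The 0.8
    default is an assumption, chosen from the middle of the 75-85% range.
    """
    if len(percept) != len(region):
        raise ValueError("percept and region must have the same length")
    return [
        current_weight * new + (1.0 - current_weight) * old
        for old, new in zip(percept, region)
    ]


# Hypothetical 3-pixel grayscale percept fed three very different regions.
percept = [0.0, 0.0, 0.0]
for region in ([255.0, 0.0, 0.0], [0.0, 255.0, 0.0], [0.0, 0.0, 255.0]):
    percept = update_percept(percept, region)

print(percept)  # dominated by the last region, earlier ones nearly gone
```

With a weight this high, the stored image is almost entirely the latest region, so even a cluster whose members change completely from frame to frame still dumps as a crisp, specific image.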

The result is still consistent with the model because the images are the result of both sensory information and constructive processes anchored in that sensory information. In addition, this unexpected appearance of foreground objects in the background has led me to drop the distinction. Now the method of segmenting the background model is applied to the live camera frame, with no foreground or background differentiation. The following images show the results. The first image shows all percepts in the system after processing 10,000 frames, with a black background. The second image is the result of percepts after 1,000 frames, which are reduced in opacity to 75% and overlaid on the most recently captured sensory image.
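The overlay in the second image amounts to a simple alpha blend. The sketch below is only an illustration in plain Python, not the actual rendering code: `composite`, the nested-list image format, and the boolean `mask` marking where a percept has pixels are all assumed for the example.

```python
def composite(percept_img, mask, sensory_img, opacity=0.75):
    """Alpha-blend percept pixels over the sensory frame.

    Where mask is True the percept is drawn at the given opacity;
    elsewhere the raw sensory pixel shows through unchanged.
    """
    out = []
    for p_row, m_row, s_row in zip(percept_img, mask, sensory_img):
        out.append([
            opacity * p + (1.0 - opacity) * s if m else s
            for p, m, s in zip(p_row, m_row, s_row)
        ])
    return out


# Hypothetical 2x2 grayscale frame with one percept pixel in the corner.
sensory = [[100.0, 100.0], [100.0, 100.0]]
percepts = [[200.0, 0.0], [0.0, 0.0]]
mask = [[True, False], [False, False]]

result = composite(percepts, mask, sensory)
print(result)  # [[175.0, 100.0], [100.0, 100.0]]
```

At 75% opacity a quarter of the sensory pixel still shows through each percept, which is what lets the percepts modulate the live image rather than replace it.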



One final issue is that the lack of foreground separation means there is a lot more instability in clusters. The MOG model acted to smooth out variation in individual frames, and without it there seems to be a lot more variation in the number of segmented clusters. This has greatly increased the processing time for clustering, as is apparent in the log of the 10,000 frame test above:


Though I admit the increase in clustering time does closely resemble the increase in memory usage… 7.5s per frame is not workable, but I hope to get a faster machine that could help…