Clustering Update

Posted: April 24, 2013 at 9:10 am

After a few more short tests, it is clear that the spikes are actually caused by the clustering algorithm. Additional testing showed that multiple percepts may have the same distance to their nearest cluster. The code that calculates the minimum distance assumed an upper limit of 100, since the initial plan was to use features normalized to 0-1. At some point I changed the distance function for foreground percepts to use only colour features, and since CIELuv distances are already approximately perceptually uniform, there is no need to normalize them. The result is that distances can exceed 100, so the calculated minimum is incorrect in some cases.

While debugging I also found that the number of new units could exceed the number of scratch units. I suspect this is due to the same scratch unit being merged with multiple clusters. I assumed the false minima were responsible, but after raising the upper distance limit to the maximum possible distance in Luv colourspace, the spikes in clustering time persist. So the same scratch percept is being merged into multiple clusters for some other, as yet unknown, reason.

The clustering method is BSAS (Basic Sequential Algorithmic Scheme), but only until the maximum number of clusters is reached, at which point we switch to our own method, similar to a SOM (Self-Organizing Map), in which a scratch unit (the input) is merged with the closest cluster (the Best Matching Unit). This switch may explain these problems. Because we have a fixed number of clusters and many inputs, the algorithm used once that limit has been reached is all the more crucial.

The following plot seems to indicate that the problem does indeed occur when multiple clusters are updated by a single percept. Note the spike in “extraNewUnitsFG”, which is the number of updated clusters (numNewUnitsFG) minus the number of scratch percepts (numScratchFG).
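To make the minimum-distance problem concrete, here is a minimal Python sketch. It is not the code from the system described here: the function names, the Euclidean distance and the sample values are all assumptions made for illustration. It contrasts a nearest-cluster search seeded with a fixed cap of 100 against one seeded with infinity, and then computes the same difference as extraNewUnitsFG for a merge step in which each scratch percept is assigned to exactly one cluster.

```python
import math


def nearest_cluster(percept, clusters):
    """Return (index, distance) of the cluster closest to percept.

    The running minimum starts at infinity, so no upper bound on the
    distance is assumed. On ties, the first cluster found wins.
    """
    best_index, best_distance = None, math.inf
    for index, centre in enumerate(clusters):
        distance = math.dist(percept, centre)  # Euclidean distance
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index, best_distance


def nearest_cluster_capped(percept, clusters, cap=100.0):
    """Same search, but seeded with a fixed cap (the assumption that failed)."""
    best_index, best_distance = None, cap
    for index, centre in enumerate(clusters):
        distance = math.dist(percept, centre)
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index, best_distance


def assign_scratch(scratch, clusters):
    """Map each scratch percept to exactly one cluster index."""
    assignments = {}
    for percept in scratch:
        index, _ = nearest_cluster(percept, clusters)
        assignments.setdefault(index, []).append(percept)
    return assignments


# Hypothetical Luv-like values, chosen so every distance exceeds 100.
clusters = [(150.0, 0.0, 0.0), (0.0, 200.0, 0.0)]
scratch = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]

print(nearest_cluster(scratch[0], clusters))
print(nearest_cluster_capped(scratch[0], clusters))

# Updated clusters minus scratch percepts, as in extraNewUnitsFG.
assignments = assign_scratch(scratch, clusters)
extra_new_units = len(assignments) - len(scratch)
print(extra_new_units)
```

With these sample values the capped search finds no cluster at all, because every distance is above 100, while the uncapped search picks the first cluster at distance 150. When each scratch percept is merged into exactly one cluster, the number of updated clusters can never exceed the number of scratch percepts, so the difference cannot be positive. A positive value, like the spike described above, would therefore point to a percept updating more than one cluster.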