Dual SOM Continued

The dual SOM idea is good, but it turns out that the data just makes the SOM difficult to optimize. I’m not sure I will be able to retrain the second SOM fast enough to produce a fresh one for each dream.

Here is the memory field and U-Matrix for the primary (75×75) pixel SOM:



Here is the corresponding secondary (30×30 unit) SOM (trained over 15,000 iterations using linear decreasing functions):



Notice that the organization of this second SOM (trained only on the images stored by the first SOM) gives an impression of the memory’s structure very similar to that of a SOM trained on the whole data-set using linearly decreasing functions.

So it appears this dual-SOM method gives results of static-data-set quality from a continuous feed of new data. The next question is how to train the second SOM over 15,000 iterations (or perhaps more) in under one minute. One approach is to store a concatenated histogram for each image stored by the first SOM, and train the second SOM directly on those hists. The hists could be stored as numpy buffers and dumped directly from python into ann_som.
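The buffer idea could be sketched roughly as follows. This is only a hypothetical illustration: the bin count, the `concat_hist` helper, the file name and the raw-float dump format are all assumptions, not the actual ann_som interface.

```python
import numpy as np

# Hypothetical sketch of the histogram-buffer idea. NUM_BINS, concat_hist
# and the raw file format are assumptions, not the original code.
NUM_BINS = 256       # bins per colour channel (assumed)
NUM_CHANNELS = 3

def concat_hist(image):
    """Concatenate per-channel histograms into one normalized vector."""
    hists = [np.histogram(image[..., c], bins=NUM_BINS, range=(0, 256))[0]
             for c in range(NUM_CHANNELS)]
    h = np.concatenate(hists).astype(np.float64)
    return h / h.sum()   # normalize so images of any size are comparable

# One row per image stored by the first SOM; the whole buffer is contiguous
# and can be dumped straight to disk for the second-stage trainer to read.
stored_images = [np.random.randint(0, 256, (64, 64, 3)) for _ in range(10)]
buffer = np.vstack([concat_hist(img) for img in stored_images])
buffer.astype(np.float32).tofile("hists.raw")   # assumed raw-float format
```

Keeping everything in one contiguous array avoids per-image Python overhead when handing the data over to the C-side trainer.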

Another question is the dreaming. Now there are two SOMs: one highly organized, the other more like a staging area with very rough organization. Do dreams propagate only through the second SOM, or through both? One SOM would provide a highly abstract free-association, while the other would be more concrete. Perhaps associations propagating through these two SOMs simultaneously could correspond to Gabora’s associative and analytical modes of thought.

Dual SOM

I’ve started exploring the idea of having two SOMs. The first SOM simply chooses which images should be stored, using a cyclic learning function. The second SOM uses linearly decreasing functions and is trained on the (finite) results of the first SOM in order to make a highly organized map. The second SOM is fed its input data in random order, not the original order received from the camera. The first experiment used two SOMs of the same size (30×30 units). The problem was that the massive number of iterations possible in the first SOM just doesn’t scale to the second SOM (of the same size, but needing to be optimized between dreams). The result is that the first SOM may have all its memory locations occupied while the second SOM is unable to occupy all of its locations. This is the case even when the second SOM is seeded with the first SOM’s codebooks.
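The second-stage retraining could be sketched as below. This is a minimal toy SOM, not the actual implementation; the class, the Gaussian neighbourhood, and all constants are assumptions. The key points it illustrates are the random presentation order and the linearly decreasing learning rate and radius.

```python
import numpy as np

# Minimal dual-SOM sketch (all names are illustrative, not the original code).
# SOM 2 is retrained from scratch on the finite set of vectors stored by
# SOM 1, fed in random order, with linearly decreasing functions.

class SOM:
    def __init__(self, side, dim, rng):
        self.side = side
        self.codebook = rng.random((side * side, dim))
        self.coords = np.array([(i // side, i % side)
                                for i in range(side * side)], dtype=float)

    def bmu(self, x):
        """Index of the best-matching unit for input x."""
        return np.argmin(((self.codebook - x) ** 2).sum(axis=1))

    def update(self, x, lr, radius):
        w = self.coords[self.bmu(x)]
        d2 = ((self.coords - w) ** 2).sum(axis=1)
        h = np.exp(-d2 / (2 * radius ** 2))          # Gaussian neighbourhood
        self.codebook += lr * h[:, None] * (x - self.codebook)

def train_second_som(stored, side=30, iters=15000, seed=0):
    """Retrain SOM 2 on the vectors stored by SOM 1, in random order,
    with linearly decreasing learning rate and neighbourhood radius."""
    rng = np.random.default_rng(seed)
    som = SOM(side, stored.shape[1], rng)
    for t in range(iters):
        frac = 1.0 - t / iters                       # linear decrease
        x = stored[rng.integers(len(stored))]        # random presentation
        som.update(x, lr=0.5 * frac, radius=max(1.0, side / 2 * frac))
    return som
```

Because the stored set is finite, the whole retraining is a fixed amount of work, which is what makes the between-dreams timing question tractable.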

Here is a U-Matrix of the hists of images stored in the first SOM:


The second SOM:


Here is an example of the second SOM with fewer units (15×15):


This shows the memory field for the above SOM:


I’m currently training a 75×75 unit SOM and will use that to feed smaller second stage SOMs.

Returning to the SOM

After capturing some test images from the motivated camera I’ve been working on the SOM structure. When the camera provides images that are already clustered (and typical linearly decreasing learning and neighbourhood functions are used), the quality of the SOM is very interesting: the cluster structure is complex enough that the resulting SOM is quite complex. Following is a representation of the memory field, its U-Matrix and the U-Matrix of the codebooks (neuron weights):


U-Matrix of images stored in the memory field:


U-Matrix of codebooks:


In comparison, here is the memory field resulting from the same learning settings, except that the images are fed in their original order and, due to the slow learning rate, the SOM is run for 30,000 iterations:


This clearly shows that the order in which images are presented is highly significant to the foldedness of the resulting SOM. This is problematic considering the basis of the camera motivation is making subtle variations on the camera’s position, based on the visual scene. I could explore using very large multipliers in the motivation, but that would lose some of the quality of the camera following features of the scene; it would appear to just be randomly jumping between points. Another approach could be to use a two-stage SOM. An initial SOM would simply store images (the number yet to be determined) as a first effort at organization. This would be highly folded, as seen above. The question is whether a second SOM, trained only on those images stored by the first SOM, would produce a more organized result. The first SOM would have to be trained on a cyclic function (to integrate new data), possibly a sawtooth function. The second would read the images in a random order and retrain between each dream. I wonder how quickly the second SOM could be trained. The size of both SOMs is also interesting: if the first SOM were larger (in terms of the number of units) and the second smaller, this could be an analog of longer- and shorter-term memory.
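The sawtooth idea mentioned above could look something like this. The period and rate bounds are placeholder values, not settings from the actual system; the point is only that the rate ramps back up every cycle instead of decaying to zero, so the first SOM can keep integrating new data indefinitely.

```python
# Sketch of a cyclic (sawtooth) learning rate for the first SOM.
# All constants are assumptions, chosen only for illustration.

def sawtooth_lr(t, period=1000, lr_max=0.5, lr_min=0.01):
    """Learning rate falling linearly from lr_max to lr_min over each
    period of iterations, then jumping back up (a sawtooth over time)."""
    phase = (t % period) / period        # 0.0 -> 1.0 within each cycle
    return lr_max - (lr_max - lr_min) * phase
```

A linearly decreasing function is the special case where the period equals the total number of iterations, which is why it only works for a fixed data-set.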

23,181 iterations of testing

After doing a full day’s testing of the camera motivation, I think things are close. In order to keep the camera from getting too lost in the small details, I’ve increased the multiplier (not the offset) for each step. The result is that areas of focus are quite large. Additionally, I’ve added a reset so that when the motivation takes the camera to the edge of the visual field, a random pan/tilt is generated. This allows the path of the camera to search over the whole space with much better coverage. Out of the 23,181 iterations the camera motivation was reset 2,394 times, representing only about 10% of the camera movements. Although the goal was to remove the random aspect of the camera, I’m quite happy with this direction. Perhaps another idea will come up in the future to remove this random requirement. For example, the camera could be reset to the centre of the least dense area, though this would require a much more complex statistical analysis of its motivational behaviour.
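The edge-reset rule is simple enough to sketch directly. The pan/tilt range limits and function name here are made-up placeholders; the logic is just the behaviour described above: apply the motivation step, and if it would leave the visual field, jump to a random position instead.

```python
import random

# Sketch of the edge-reset rule. PAN_RANGE and TILT_RANGE are
# placeholder limits, not the camera's actual range.
PAN_RANGE = (-90.0, 90.0)
TILT_RANGE = (-30.0, 30.0)

def step_with_reset(pan, tilt, dpan, dtilt, rng=random):
    """Apply a motivation step; on reaching the edge, reset randomly.
    Returns (pan, tilt, was_reset)."""
    new_pan, new_tilt = pan + dpan, tilt + dtilt
    inside = (PAN_RANGE[0] <= new_pan <= PAN_RANGE[1]
              and TILT_RANGE[0] <= new_tilt <= TILT_RANGE[1])
    if not inside:
        return (rng.uniform(*PAN_RANGE), rng.uniform(*TILT_RANGE), True)
    return (new_pan, new_tilt, False)
```

Counting how often the third return value is true gives the reset rate reported above (2,394 of 23,181 steps).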

Here is a plot of the motivation paths of the camera during this test. The random movements have been removed.


As usual, the movements start in red and end in green.

Here is a 2D histogram of the density of the data. Notice the coverage over the visual field.


Here is the histogram overlaid on the visual field. Areas that are white are not visited often; areas that are visible are visited often (higher density).


The next step is to capture images from each location and see how the SOM responds to the non-uniformity. It seems clear the visible areas above (houses, trees) will take up much of the SOM, while other areas (sky, grass) may not be represented.

A little less obsessed.

By increasing the multiplier and decreasing the offset I’ve got the motivation providing much better coverage, but it still appears to get stuck in certain areas. Following is a 2D histogram showing the pockets of density in certain areas:


Here is the same representation of data from the previous post for comparison:


An idea is to have the camera jump to a random position when it reaches the edges, and perhaps when it has spent too much time in one area, in order to provide better coverage. It is unclear how the unevenly distributed gaze will manifest itself in the SOM. The increased frequency in particular “areas of interest” (to the motivation) should result in clusters in the SOM corresponding to those areas.
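The "too much time in one area" trigger could be detected like this. The radius and dwell threshold are placeholder values, and the whole function is a hypothetical sketch of the idea rather than anything implemented.

```python
import math

# Sketch of a dwell-time escape test: the gaze is considered stuck when
# its recent positions all lie within a small radius of the current one.
# radius and max_dwell are placeholder values.

def should_escape(history, radius=5.0, max_dwell=50):
    """True when the last max_dwell positions (list of (x, y) tuples)
    all lie within `radius` of the current position."""
    if len(history) < max_dwell:
        return False
    x0, y0 = history[-1]
    return all(math.hypot(x - x0, y - y0) <= radius
               for x, y in history[-max_dwell:])
```

When this returns true, the camera would jump to a random position, just like the edge case.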

Obsessive Motivation

This approach to motivation is more subtle than the first approach. Rather than fixing camera positions in a grid, and using the histograms to choose which grid position to move to next, this method uses the difference between the middle histogram and the LRTB (left, right, top, bottom) histograms to create a vector for the next move. The more different the edges are, the larger the movement. The result has a rather obsessive quality. The camera’s gaze tends to obsess about the details of a small area, and eventually (after an indeterminate time) move onto another region to obsess about. Here is a plot of the camera’s movement. It starts in red, and ends up in green:


Notice the clusters where the camera explores the small details of one region. The obvious colour shift in the second cluster from the right indicates the camera spent much more time in that area than in the other clusters. Here is a detail of that area:


Upon closer inspection it seems clear that this cluster is actually two clusters, the second of which (in green) is much more dense. The camera spent 2,571 of the total 4,274 iterations in this cluster alone, representing approximately 60%. In the next run I’ll attempt to increase the likelihood of the gaze escaping these clusters by increasing the length of the vector.
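The middle-vs-LRTB vector described in this post could be sketched as below. The region geometry, the L1 histogram distance, and the multiplier are all assumptions; the essential behaviour is that the step length grows with how different the edges are from the middle.

```python
import numpy as np

# Sketch of the obsessive-motivation rule for a grayscale frame: compare
# the middle region's histogram against left/right/top/bottom histograms
# and build a pan/tilt vector from the differences. Region splits, the
# distance metric, and the multiplier are assumptions.

def region_hist(img, rows, cols, bins=32):
    """Normalized intensity histogram of a rectangular region."""
    patch = img[rows, cols]
    h = np.histogram(patch, bins=bins, range=(0, 256))[0]
    return h / patch.size

def motivation_vector(img, multiplier=10.0):
    h, w = img.shape[:2]
    mid    = region_hist(img, slice(h//3, 2*h//3), slice(w//3, 2*w//3))
    left   = region_hist(img, slice(None), slice(0, w//3))
    right  = region_hist(img, slice(None), slice(2*w//3, w))
    top    = region_hist(img, slice(0, h//3), slice(None))
    bottom = region_hist(img, slice(2*h//3, h), slice(None))
    diff = lambda a, b: np.abs(a - b).sum()       # L1 histogram distance
    # Move toward whichever side differs most from the middle; the more
    # different the edges, the larger the step.
    dx = multiplier * (diff(mid, right) - diff(mid, left))
    dy = multiplier * (diff(mid, bottom) - diff(mid, top))
    return dx, dy
```

Increasing `multiplier` is the knob referred to above for helping the gaze escape a cluster.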

This mock-up shows the path of the camera overlaid on the visual field. The gaze is clearly attracted to areas containing many edges, and tends to escape when the vector is aligned with edges in the frame:


The First Motivated Camera

I’ve made my first attempt to remove the random control of the camera’s gaze. This approach is based on an analysis of the histograms of the middle, top, right, left and bottom regions of the image. The x and y regions most different from the middle control the pan/tilt direction. The camera moves over a fixed grid, so that locations that have already been visited cannot be revisited. Even with this mechanism the gaze of the camera is highly looped and overlapped, and it also tends to get stuck in certain areas. The following plot shows 2,667 iterations:


The plot starts in the red area and ends up in the green area. The density of the camera’s fixation on certain areas is clear:


The upper-right corner has been visited extremely disproportionately. This is even more extreme when the range of the camera (the area in which the camera is able to look) is included in the plot:



The next step will be to give up on this grid-based approach and calculate a vector from the differences between the various histograms in order to point the camera in a new direction. Since this vector will contain some of the complexity of the image, I hope it will not be as likely to get stuck in a certain area.
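For completeness, the grid-based selection described in this post could be sketched as follows. The direction names, scoring interface, and tie-breaking are assumptions; the essential mechanism is choosing the highest-scoring neighbouring cell that has not yet been visited.

```python
# Sketch of the grid-based first attempt: the camera moves on a fixed
# grid, choosing the direction whose region histogram differs most from
# the middle, never revisiting a cell. Names and interfaces are assumed.

def next_cell(cell, direction_scores, visited, grid_w, grid_h):
    """Pick the highest-scoring unvisited neighbour of `cell`;
    direction_scores maps 'left'/'right'/'up'/'down' to histogram
    differences from the middle region."""
    moves = {'left': (-1, 0), 'right': (1, 0), 'up': (0, -1), 'down': (0, 1)}
    for d, _score in sorted(direction_scores.items(),
                            key=lambda kv: kv[1], reverse=True):
        dx, dy = moves[d]
        nxt = (cell[0] + dx, cell[1] + dy)
        if (0 <= nxt[0] < grid_w and 0 <= nxt[1] < grid_h
                and nxt not in visited):
            return nxt
    return None   # all neighbours visited: the camera is stuck
```

The `None` case is exactly the failure mode seen above: once the gaze has looped through a dense neighbourhood, every legal move is already visited.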