I have been able to run a dual SOM in a sufficient amount of time. The more memories are stored in the first SOM, the slower the training appears to be. This is not in terms of number of iterations but in terms of CPU time of accessing many more memory locations. Here is the U-matrix of the first SOM, once it has only been partially populated (3468 out of 5625 (75×75) units):
And the second SOM trained on those memories:
This second SOM was trained in 30s, over 15,000 iterations (2ms / iteration). When training this quickly the CPU usage is somewhat high, and does interfere with rendering. In order to see how feasible it really is I would need to integrate it into the DM system and see how it performs. One problem is that in the current DM system the pix_buffer is in the parent patch, and its the second patch that simply describes which memory location a particular input should be stored. In a dual-SOM the pix_buffer for the initial SOM will need to be in that second patch, and therefore would not be available to the first. A second pix_share could be used to send the data back to the parent patch, but its unclear if that would be fast enough. Following is the second SOM trained once the first SOM has associated all its units with images.
There are still lots of issues with making this approach work, but the quality of these second SOMs is so high that this may be worth it. Ideas to make it optimize faster:
- Add random code-books initialization to ann_som
- Try using a different numpy data type (rather than a python list) for faster iteration.
- Perhaps a C PD external that does the job of hist2numpy.py