I ran a test training the MLP with data filtered such that at least one cluster must change state between subsequent frames. This time, the result is a periodic dream that does not stabilize over 10,000 iterations:
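The filtering step can be sketched as follows. This is only an illustration, assuming each frame is a list of 0/1 cluster states; the function name `drop_unchanged_frames` is hypothetical and not taken from the project code.

```python
def drop_unchanged_frames(frames):
    """Keep only frames that differ from the previously kept frame.

    After filtering, every pair of subsequent frames differs in at
    least one cluster state.
    """
    kept = []
    for frame in frames:
        # Always keep the first frame; afterwards keep a frame only
        # if at least one element differs from the last kept frame.
        if not kept or frame != kept[-1]:
            kept.append(frame)
    return kept
```

Comparing against the last kept frame (rather than the last seen frame) guarantees that no two adjacent frames in the filtered set are identical.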
After meeting with Philippe, we decided that the static dreams could be due to the variation over time in the data-set being “seen” as noise in the data by the ANN, which is attempting to generalize over time. This would explain dreams being static, because they reflect the stability and ignore the noise.
The first step in examining this issue was to look at the stability of my data, which is actually highly stable over time. This is shown in the following plot, which shows the amount of change (in %) between subsequent frames in the foreground and background data-sets:
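The post does not say exactly how the amount of change was computed. One plausible reading, sketched here as an assumption, is the percentage of cluster states that flip between one frame and the next. The names `percent_changed` and `change_series` are hypothetical.

```python
def percent_changed(prev, curr):
    """Percentage of elements that differ between two state vectors."""
    if len(prev) != len(curr):
        raise ValueError("state vectors must have the same length")
    if not prev:
        return 0.0
    flips = sum(1 for a, b in zip(prev, curr) if a != b)
    return 100.0 * flips / len(prev)


def change_series(frames):
    """Percent change for each pair of subsequent frames."""
    return [percent_changed(a, b) for a, b in zip(frames, frames[1:])]
```

For example, `change_series([[0, 0, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0], [1, 0, 0, 0]])` gives `[25.0, 0.0, 50.0]`. A data-set that is highly stable over time would produce a series that stays close to zero.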
After training the MLP, I thought I should try running it in feedback mode to see if it would actually be predictive. It is not. There are some shifts in input patterns at the start of the dream, but it takes only 8 iterations to settle into reproducing a stable pattern. The following image shows the results of the first ~1000 iterations of the dream (truncated from 10,000 iterations). The left image is the raw output while the right is the thresholded version. Note the slight changes in output patterns early in the sequence (far left edge of both panels).
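The feedback mode can be sketched like this. It is a minimal illustration, not the project code: `predict` stands in for the trained network (any callable mapping a state vector to an output vector), and feeding the thresholded output (rather than the raw output) back into the input is an assumption.

```python
def dream(predict, start, iterations, threshold=0.0):
    """Run a predictor in feedback mode and return the thresholded outputs.

    predict: hypothetical callable, state vector -> raw output vector.
    Assumption: the thresholded output is what gets fed back.
    """
    state = list(start)
    history = []
    for _ in range(iterations):
        raw = predict(state)
        # Threshold to the -1/+1 range used for training.
        state = [1.0 if v > threshold else -1.0 for v in raw]
        history.append(state)
    return history


def settle_iteration(history):
    """Index of the first iteration from which the output stays
    identical to the final output (None for an empty history)."""
    if not history:
        return None
    i = len(history) - 1
    while i > 0 and history[i - 1] == history[-1]:
        i -= 1
    return i
```

A dream that settles early, as described above, would give a small `settle_iteration` value relative to the length of the run, while a periodic dream would keep changing until the end.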
I’ve put my leaking code problems aside for now to continue working on the project; the next phase is the ML stuff. I’m now using FANN because, while OpenNN was nicer, more complete and more active, it did not provide the functions for online / sequential learning needed for this project.
This is my second attempt to train an MLP with plausible data produced by the system. The input is a set of 41,887 state vectors (representing the presence of clusters at each moment in time) produced by a previous run of the segmentation and clustering system. Each element in the vector is a Boolean value corresponding to each perceptual cluster: 0 when the cluster is not present in the frame and 1 when it is. For training, 0 to 1 values are scaled to -1 to +1. The previous attempt appeared to work because the output resembled the input, but I realized after running prototype feedback (dreaming) code that the network was trained just to reproduce the input pattern, not the next input pattern.
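The difference between the two attempts comes down to how the training pairs are built. The following is a small sketch under the assumption that the state vectors are lists of 0/1 values in time order; `scale` and `next_state_pairs` are hypothetical names.

```python
def scale(frame):
    """Map Boolean 0/1 values to the -1/+1 range used for training."""
    return [2.0 * v - 1.0 for v in frame]


def next_state_pairs(frames):
    """Build (input, target) pairs where the target is the NEXT frame.

    Pairing each frame with itself would only teach the network to
    reproduce its input, which is the mistake described above.
    """
    scaled = [scale(f) for f in frames]
    return list(zip(scaled[:-1], scaled[1:]))
```

Note that N frames yield N - 1 training pairs, since the last frame has no successor.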
The MLP here is considered a canonical case to compare with future sequential learning. It contained three layers (1026 input, 103 hidden, 1026 output) and was presented the whole input set over 50 epochs. The network was presented a single state at each iteration, not a window of states over time. The code is a modified version of the FANN xor_example.cpp and uses the “RPROP” learning algorithm, where weights are initialized with random values between -1 and +1.
Further testing has shown that there is still a leak in the clustering code, but I can’t figure out where it is. It does seem that cv::Mat is somehow at the centre of this issue though. I had previously noticed that I could stop the memory increase just by commenting out the mergeImages() code (that averages images for two percepts). This week I realized that I can also stop the memory increase if I don’t normalize the feature vector before clustering. This really does not make sense, since I rewrote all the feature-related code to change from a vector of floats to a cv::Mat so I could use the OpenCV normalization functions. There is no code overlap between the previous version of normalize and this new version, and yet if I run normalization and clustering together, memory increases (red line), while if I run either independently (blue and green) there is no memory increase:
I can’t recall why I thought this memory increase was solved before. I may have not been running the normalization code in that test, i.e. I was testing clustering with only foreground objects, where normalization is not needed due to perceptual distances in CIELuv.
After confirming no leaks in the segmentation code I reran the clustering unit test. Even over 1000 frames, this clustering code showed a significant leak: (The black line is a linear model fitted to the data.)
The leak was caused by the way I was replacing percepUnit instances with newly merged percepUnit instances. The solution was to change the mergePerceps() function from returning the merged percepUnit to modifying a percepUnit in place using pass by reference. A 10,000 frame test overnight has shown that there is no longer a leak in the clustering code:
I did notice some strange percepUnits while debugging the memory leak. I need to confirm that the new background segmenter is producing reasonable results before moving on. After that, the next steps are to merge all these changes from unit tests into the trunk code, and then I can begin rewriting the way threading and rendering are done, at which point I will have caught up with where I was at the start of the summer.
A test over this weekend showed that there is no memory leak or CPU time spikes in any of the segmentation code (the y axis is rss in megabytes):
CPU time is constant over this test. Although there are spikes in memory usage, there is no leak: a linear model of the data (black line) shows a decrease in memory usage over the whole test. The next step is to write a unit test for the clustering code to confirm there is no leak there, and then we can move on to writing new threading/rendering code and adding ML to the system!
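The linear model used to judge whether memory is leaking can be sketched as an ordinary least-squares line through the (frame, rss) samples. This is only an illustration of the idea; the post does not say which tool produced the fits, and `fit_line` is a hypothetical name.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

A slope that is clearly positive over a long test suggests a leak, while spikes around a flat or falling line do not. With rss in megabytes and x in frames, the slope reads as megabytes gained per frame.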
In the last post I talked about a different segmentation approach rather than trying to figure out why the FloodFill() operation was using more and more time (eventually getting to over 200 seconds per frame). A quick look at the Creativity and Cognition (C&C) version did not show any functional differences compared to the unitTest code. Of course the C&C version was crashing after about 24 hours, which was likely not long enough to exhibit the problem. I took a few days to rewrite the backgroundSegmentation() function from scratch. In doing so I noticed a nice new function: meanShiftSegmentation(). The code is now much cleaner, but unfortunately it is not any faster than the old flood-fill version. This is partially because the segmentation is happening at the full 1080p resolution (not 1/4 the pixels as in the floodFill version). I should be able to get it under 1s with some optimization. The good news is that this segmenter works much better: there are no more “blind spots”, as the segmenter breaks up the whole image and joins small regions into larger ones rather than leaving unsegmented areas. Here is an example of the segmentation results (in improperly mapped HSV colourspace):
After almost two months, and seeking the help of quite a number of people who know a lot more about C++ than me, the problem I’ve been facing with the memory leak has been (at least partially) resolved, thanks to this post. It turns out that this whole time the way I was measuring RAM usage was incorrect (maxrss via rusage vs rss via /proc). Thus all memory plots for the whole life of this project were inaccurate. This does not affect the fact that the system did eventually crash after using too much memory, but at least it means I should be able to actually find the problem.
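The two measurements differ in kind: `ru_maxrss` from `getrusage` is the peak resident set size so far, so it can never go down, while the `VmRSS` field in `/proc/self/status` is the current resident set size. The sketch below shows both on Linux; it is an illustration in Python rather than the project's C++ code, and the function names are hypothetical.

```python
def parse_vm_rss_kb(status_text):
    """Extract the current RSS in kB from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None


def current_rss_kb(path="/proc/self/status"):
    """Current resident set size of this process (Linux only)."""
    with open(path) as f:
        return parse_vm_rss_kb(f.read())


def peak_rss_kb():
    """Peak resident set size; reported in kilobytes on Linux."""
    import resource  # Unix only
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
```

A plot of the peak value looks like a staircase that only ever climbs, even when memory is being released correctly, which is why it is a poor basis for judging whether there is a leak.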
I am back in town for almost a week and still struggling with jet-lag. The conference went well, and I will post my paper as soon as the proceedings are on the ACM website. The papers in the conference leaned quite a lot to the HCI side, so I felt like my work was quite a bit at odds with the community. Still it was a good opportunity to meet people and see Australia for the first time.
I have been hurriedly attempting to fix some of these memory leaks before exhibition at the Creativity and Cognition conference happening next week. In lieu of any significant development progress, this is a grab of what the perceptual output currently looks like: (note the white background has been replaced by the background model.)
In preparing for the exhibition of a prototype of the system for Creativity and Cognition, I’ve been running the system with a live camera input. The good news is that the system is looking aesthetically interesting (sorry no screen-grabs currently available). The bad news is that this testing seems to have uncovered a memory leak somewhere, as shown in the continued increase of memory usage even after the maximum number of clusters has been reached for FG and BG:
After quite some frustration I have managed to move the segmentation and clustering code into a separate thread. This thread runs as fast as possible, and the main rendering process checks in on the thread between frames to see if new data (percepts) are ready to be rendered. The main thread then makes a local copy of the percepts and then renders them. The main reason for this change is that the rendering rate was always expected to be much faster than the expensive segmentation rate, and keeping it separate keeps segmentation from blocking rendering. Following is a plot from a ~90,000 frame test of the threaded code.
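The hand-off between the segmentation thread and the renderer can be sketched as a single-slot mailbox guarded by a lock. This is a simplified illustration in Python, not the project's C++ code; the class name `PerceptMailbox` and its methods are hypothetical.

```python
import threading


class PerceptMailbox:
    """Single-slot hand-off between a worker thread and a renderer."""

    def __init__(self):
        self._lock = threading.Lock()
        self._percepts = None

    def publish(self, percepts):
        """Called by the worker: store a copy of the latest percepts."""
        with self._lock:
            self._percepts = list(percepts)

    def take(self):
        """Called by the renderer between frames.

        Returns the newest percepts, or None if nothing new is ready,
        and clears the slot so the same data is not rendered twice.
        """
        with self._lock:
            fresh, self._percepts = self._percepts, None
        return fresh
```

The lock is held only long enough to swap a reference, so the renderer is never blocked for the duration of a segmentation pass. If the worker publishes twice before the renderer checks in, the older result is simply replaced.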
Following are images that represent the learning of percepts over time. Each pixel represents the mean colour of a percept at a particular time. The identity of the percept is the Y axis, while the X axis is time. These were the results from the last ~90,000 frame test, with 2000 background and 1000 foreground clusters requested.
The top image shows the background percepts, while the bottom image shows the foreground percepts. The original images are 90,000 pixels wide, and are available here: background and foreground. We only begin tracking these percepts once the maximum number of clusters has been reached. The middle of each original image has been removed: the left side of each image shows the start of the clustering process, where all new patterns are clustered with the nearest cluster, and the right side shows the same clusters at the end of the test. Note how they diverge from the start to the end of the test. It is presumed that the apparent regularity on the Y axis in the background near the start of the test is due to the repetition of similar background percepts in subsequent frames. This also explains the regularity of colour blocks of foreground percepts at the start of the test. As this set of percepts is constantly adjusted to represent a continuous flow of sensory information, their regularity is reduced to increase their density.
Whether I end up using an MLP, RL, or some other ML method, the representation of the clusters must be stable. That is, the index that represents percept A should represent percept A in all frames, even as A changes due to ongoing clustering. The first version of the code made new clusters from one cluster + one scratch (newly segmented) unit, and then appended the modified cluster to the end of the list. This is why the previous plots of the state vector were so uniform. I realized it was a quick fix to change this: when we modify a cluster, it replaces the cluster used to construct it. So this is the data that the ML would have to learn (background only):
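The change can be illustrated with a plain list of clusters. In this sketch the first version is assumed to remove the original cluster before appending the merged one, which the post does not state explicitly; both function names are hypothetical.

```python
def merge_append(clusters, index, merged):
    """First version: the modified cluster moves to the end of the
    list, so the indices of later clusters shift.
    Assumption: the original cluster is removed from its old slot."""
    out = clusters[:index] + clusters[index + 1:]
    out.append(merged)
    return out


def merge_in_place(clusters, index, merged):
    """Fixed version: the merged cluster replaces the cluster used to
    construct it, so every index keeps its identity over time."""
    out = list(clusters)
    out[index] = merged
    return out
```

With `["A", "B", "C"]` and a merge into cluster 0, the first version yields `["B", "C", "A'"]` while the second yields `["A'", "B", "C"]`. Only the second keeps index 0 pointing at percept A, which is what a learner with a fixed input layout needs.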
It reminds me of the images of the Matrix…
In order to start working on prediction and learning for DM3, I’ve started dumping some state data to see what it looks like over time. The idea is that learning will only happen once the max clusters have been reached for foreground or background, and there would be a different instance of the learner for foreground and background. This is because both RL and MLP methods require a fixed number of dimensions. Once the max clusters have been reached, all new percepts are merged with the nearest cluster, and thus after that point we can represent any image in the system as a vector of booleans where each element corresponds to one of the clusters. In the following plot, the rows represent each moment in time, while the columns represent particular percepts. Black percepts do not appear at that moment in time, while grey percepts do.
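Building one row of such a plot can be sketched as follows, assuming the clustering step reports the indices of the clusters that were matched in a frame. The function name `state_vector` is hypothetical.

```python
def state_vector(present, num_clusters):
    """Boolean vector with a 1 for each cluster present in the frame.

    present: iterable of cluster indices matched in this frame.
    num_clusters: fixed maximum number of clusters.
    """
    vec = [0] * num_clusters
    for i in present:
        if not 0 <= i < num_clusters:
            raise IndexError("cluster index out of range")
        vec[i] = 1
    return vec
```

For example, `state_vector({0, 3}, 5)` gives `[1, 0, 0, 1, 0]`. Because the length is fixed by the maximum number of clusters, every frame maps onto the same input dimensions for the learner.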
This is the first test showing an entire day-night cycle (approximately 24 hours). As expected, the plots show that the number of percepts and the processing time (segmentation + clustering) drop to nearly nothing during the night:
I had a valuable two hour+ meeting with two fellow students specializing in AI (Omid Alemi: Multi-Agent Systems, Graeme McCaig: Deep Learning) to discuss how RL could be applied in this project, and to solidify my vague thoughts on the notion. Consensus was that the fit between the problem (predicting what percepts will appear at the next state considering the current state) and RL is not straightforward. This is because RL is aimed at allowing an agent to make a sequence of optimal actions that maximize reward, which is given when actions result in desirable states. Core to RL is the notion of delayed rewards, where a current action can inherit some of the reward given by a future attained state. Even if the action itself does not lead to reward, if it leads to another state and another action that leads to a reward, then it is still a valuable action for this state. As RL is oriented towards an agent’s behaviour in a complex environment, it includes an explicit way of trading off between exploitation (do what you know is best) and exploration (look around to find a better way than you currently think is best).
In our case for this project, the prediction of sensory information, rewards are actually determined by the error between the prediction and the actual sensory information. The greatest possible reward corresponds to zero error, and that is known immediately. Additionally, given that rewards are immediate and the greatest possible reward is a prediction identical to reality, the role of exploration is unclear. In a proper RL framework, the purpose of exploration is to find another causal path to greater reward, but if no such path exists, exploration serves no purpose. By the end of the meeting, a recording of which is available here (apologies for the poor quality), we had not reached consensus but had considered a few ways that RL could be applicable, and also the alternative of an MLP for the purpose of prediction. While I was convinced that the awkwardness of RL for this project meant that an MLP may be more appropriate, Graeme continued to consider the mapping between the system as currently conceptualized and RL, which we are currently discussing. Following is an attempt to list pros and cons for MLP vs RL for this project:
RL pros:
- Relation to psychological conditioning
- Model is implicit in policy
- Can be trained continuously
RL cons:
- Atypical Mapping (No environment, no delayed reward)
- Large state space
- Optimization during dreaming (pruning) does appear naturally. (Pruning of unlikely transitions?)
- Does the Markovian property hold for this context?
- Can the system imagine unseen patterns?
- Requires lots of training iterations
- Requires custom implementation (few choices for C++ libraries)
MLP pros:
- Designed to learn patterns of vectors from supervised error.
- Dreaming stimulation would be feeding output back into input.
- Pruning could remove weak links (optimization during sleep)
- Lots of existing C++ libraries
MLP cons:
- Possible over-fitting with continuous training?
- Unknown topology (tuning required)
- Can the system imagine unseen patterns?
Up to this point I’ve been considering ML methods in the context of the system learning to predict which percepts are likely to appear next. The predictive system is not so easy to excise from the previous conception of associative propagation. Three features of the system, as currently conceptualized, are not obviously mapped to RL:
- Continuity between perception / mind-wandering and dreaming (change in behaviour of a single system)
- Habituation was initially included to reduce over-activation. This is no longer an issue, but habituation does offer attention-like mechanisms and would allow a lack of novelty in stimulus to lead to mind-wandering.
- Pruning: During sleep shifts in activation that resemble shifts between REM and NREM sleep could allow weak links between neurons to be pruned, optimizing the network during sleep.
The most important is the continuity between waking, mind-wandering and dreaming. This is a key aspect of the theory proposed as part of this research. In the case of RL, waking perception would be the system learning what percepts are active and priming the percepts it expects to be next. Dreaming would be the exploitation of that learning. It seems that this would require a hard switch between learning and exploitation, which does not seem to indicate the continuity described in the theory.
As currently conceptualized, habituation reduces overall activation in the associative network, and this lack of activation initiates mind-wandering. When the background is highly static, the system would free-associate, but that free-association could be interrupted by non-habituated external stimulus. One proposal for the function of mind-wandering is to cause dishabituation. The theory is that when memory structures are over-activated, a chance for the brain to rest can improve performance. Mind-wandering is considered a fusion of waking and dreaming states, where percepts are activated by both external perception and by internal mechanisms simultaneously, as modulated by overall activation. In the case of RL, it’s unclear what habituation would be, or what a mix of exploration and exploitation would look like. It seems that it would require a feedback loop. During mind-wandering, the current state should be partly due to perception and partly due to its own prediction. Ideally the current state would be a sum of the current external state and the predicted state. The differences between mind-wandering, dreaming and perception would be modulations of the degree of impact of external stimuli on this feedback process.
It is unclear how RL learning would function considering feedback along these lines. While habituation is also unclear when it comes to the MLP, the feedback described above seems quite natural. There is even a class of ANNs that learn temporal sequences through recurrence, where the outputs are fed back as inputs. Perception would be the MLP learning to predict the next state, where its input would be entirely dependent on external sensory information. A dream would be the result of the output of the predictor being fed directly back into the input, which would cause a second output (and a second set of activations) and so on. This loop would clearly diverge away from the starting input and be a simulation of a sequence. It’s unclear to me whether the dreaming recurrent MLP would be able to construct new spatial arrangements of objects not seen in perception. Mind-wandering would be a sum of the MLP’s output and external sensory input. I’m not sure how this would affect learning, but it provides nice smooth continuity between states as the effects of external and predicted activations are modulated according to habituation and the circadian clock. In fact, if habituation only affects rendering and the modulation of state, then it could be a system independent of RL or MLP that simply affects how percepts are rendered and the weight of internal and external stimulus on the predictor.
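The continuity between perception, mind-wandering and dreaming can be sketched as a single weighted sum whose weight is modulated over time. The convex weighting below is an assumption chosen for illustration (the text only says "a sum"), and `blend` and `external_weight` are hypothetical names.

```python
def blend(external, predicted, external_weight):
    """Next input to the predictor as a weighted sum of two sources.

    external_weight = 1.0 -> pure perception (external input only)
    external_weight = 0.0 -> pure dreaming (prediction fed back)
    values in between    -> mind-wandering
    Assumption: the two weights sum to one.
    """
    if not 0.0 <= external_weight <= 1.0:
        raise ValueError("external_weight must be within [0, 1]")
    if len(external) != len(predicted):
        raise ValueError("vectors must have the same length")
    w = external_weight
    return [w * e + (1.0 - w) * p for e, p in zip(external, predicted)]
```

In this sketch, habituation and the circadian clock would only need to set `external_weight`, which keeps them independent of whichever predictor is used.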
After this discussion and my thoughts above, it seems to me that an MLP is the most appropriate ML method for the predictor in this system. I’m still open to RL as an option, but as of yet it seems unclear what RL would offer for the system that the MLP would not.
This is the first long-term test covering material over an approximately 12 hour period, shown in the following plot. In this case there are 2000 foreground clusters and 1000 background clusters. The mean rendering time is 2.6 seconds with a variance of 0.28.
The spiking clustering time has been solved! The change in the code was to mark both clusters and scratch units for removal after a merge, whereas previously only clusters would be marked. This test had a very small number of clusters (less than the number of regions in a single frame) so that any issues with spiking clustering time would appear earlier. The following plot shows quite stable time for clustering and rendering, so we’re ready for some longer-term tests:
After a few more short tests, it is clear that the spikes are actually caused by the clustering algorithm. Some additional testing showed that multiple percepts may have the same distance to their nearest cluster. The code to calculate the minimum distance assumed an upper limit of 100, since the initial plan was to use features normalized to 0-1. At some point I changed the distance function for foreground percepts to only use colour features, and since CIELuv distances are perceptually correct, there is no need to normalize them. The result is that distances could exceed 100, thus the calculation of the minimum is incorrect in some cases. While debugging I also found that the number of new units could exceed the number of scratch units. It is expected that this is due to the same scratch unit being merged with multiple clusters. I assumed that this was due to the false minimums, but after changing the upper distance limit to the max possible distance in Luv colourspace, the spikes in clustering time still persist. So the same scratch percept is merged into multiple clusters for some other, as yet unknown, reason. The clustering method is BSAS, but only until we have gotten to the max number of clusters, at which point we’re using our own method similar to the SOM where a scratch unit (input) is merged with the closest cluster (Best Matching Unit), which may explain these problems. As we have a fixed number of clusters, and many inputs, the clustering algorithm used after the fixed number of clusters has been reached is even more crucial. The following plot seems to indicate that indeed the problem is caused when multiple clusters are updated by a single percept. Note the spike in “extraNewUnitsFG”, which is the number of updated clusters (numNewUnitsFG) minus the number of scratch percepts (numScratchFG).
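The false-minimum problem can be reproduced in a few lines. This sketch assumes percepts are described only by their mean (L, u, v) colour and that distance is Euclidean; the `limit` parameter stands in for the hard-coded upper bound, and all names are hypothetical.

```python
import math


def luv_distance(a, b):
    """Euclidean distance between two (L, u, v) colour triples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_cluster(scratch, clusters, limit=math.inf):
    """Return (index, distance) of the closest cluster.

    limit is the starting value for the running minimum. Starting
    from infinity always finds a cluster; starting from a fixed cap
    silently fails when every distance exceeds the cap.
    """
    best_index, best_distance = None, limit
    for i, cluster in enumerate(clusters):
        d = luv_distance(scratch, cluster)
        # Strict comparison: on a tie the earlier cluster wins.
        if d < best_distance:
            best_index, best_distance = i, d
    return best_index, best_distance
```

With clusters at `(50, 100, 100)` and `(90, -80, 60)` and a scratch percept at `(10, 0, 0)`, both distances exceed 100, so a cap of 100 returns no match while the uncapped search picks the second cluster. As noted above, fixing the cap alone did not remove the spikes, so this only addresses one of the two problems.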
The increasing time to render each frame certainly slows down testing, and would cause future problems processing more than 20,000 frames. What I have done is rewrite the rendering code so that the segmentation uses RGB internally (rather than BGR), and put the draw function inside the percept class so that percepts for segmentation and rendering are stored in the same class. In the previous tests, rendering would take over 8 seconds per frame by 20,000 frames and apparently would continue to increase linearly. Rendering is certainly faster with the new implementation, but unfortunately the test did not get to the goal of 20,000 frames because it used too much memory to continue. Following shows the progress of this latest test…
Following are some thoughts on Reinforcement Learning (RL) in relation to the current conception of the project. Note that this assumes a basic introductory understanding of RL.
Following are a few plots of the performance of the last test (20,000 frames, 1500 clusters). The most important observation is that it is not at all the clustering that causes increasing processing times, but the actual rendering itself.
This second test went well, although the processing time for each frame shows what looks like it could be an exponential increase. I presume this is due to the comparison of each newly segmented percept to the set of all existing percepts:
The long-term clustering test was successful. This is the first time in this project that I’ve gotten clustering code to deal with the massive amount of data from real-world frames. These 10,000 frames represent about 3 hours ending near dusk. Following is a set of plots that show the behaviour of the system over time, images follow.
Those who are highly detail oriented may notice that although all the previous clustering tests are supposed to be 300 frames long, they actually are not. I spent a week finding out why the openFrameworks draw() method would not be called on some frames, and then get called again later, all at seemingly unpredictable times. Function by function I gutted my program until all that was left was an empty shell producing debugging output, and yet the problem persisted! At that point I was convinced this must have been a bug in openFrameworks and filed a GitHub issue. I thought I should test on another machine; I did so and the problem did not occur. Then I realized that draw() is not called when the window is minimized. My development machine is headless and I use it via an ssh X11 tunnel and vnc (to run with opengl). It turns out it was the screen-saver that was causing the problem: draw() would not be called when the screen-saver was engaged. So I’m back on track. Following is a 1000 frame test, as usual at very high resolution and with the original frames on the left. In this case I limit background percepts to 1000, so once 1000 percepts have been reached, new segmented regions are merged with the nearest cluster, no matter how far away it is. I also ran a 10,000 frame test overnight; I’ll look at that data and summarize in a future post.
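The capped clustering behaviour can be sketched as a sequential scheme: a new cluster is created while a region is too far from every existing cluster and the cap has not been reached, and otherwise the region is merged into the nearest cluster. Merging by running mean is an assumption made for this illustration, and the names are hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_stream(samples, threshold, max_clusters):
    """Sequential clustering with a hard cap on the cluster count.

    Returns (cluster means, index assigned to each sample).
    Assumption: a merge updates the cluster as a running mean.
    """
    if max_clusters < 1:
        raise ValueError("max_clusters must be at least 1")
    means, counts, assignments = [], [], []
    for s in samples:
        idx, d = None, math.inf
        for i, m in enumerate(means):
            di = distance(s, m)
            if di < d:
                idx, d = i, di
        if d > threshold and len(means) < max_clusters:
            # Too far from everything and room left: new cluster.
            means.append(list(s))
            counts.append(1)
            idx = len(means) - 1
        else:
            # Merge with the nearest cluster, however far away it is.
            counts[idx] += 1
            n = counts[idx]
            means[idx] = [m + (x - m) / n for m, x in zip(means[idx], s)]
        assignments.append(idx)
    return means, assignments
```

Once the cap is reached the threshold no longer matters, which is why distant regions can pull a cluster a long way from where it started.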
Following is a test of foreground clustering. In this case the only features being used are colour (mean L, u and v values), and the threshold of similarity is somewhat arbitrary. The first thing to notice is that the clusters are quite poor compared to background. This is because foreground objects change a lot in area, position, aspect, size, etc., which is why those features are not used to cluster them. Background objects are much more stable because they don’t move around much with the static camera. The frames in the test stream were captured once per second, so foreground objects close to the camera move a lot between frames. The resulting clusters then appear quite strained: the constituent patterns don’t reinforce but conflict, because there is such a high likelihood of drastic changes of the same moving object between frames. Combine that with the limited number of frames in which they are present, and we end up with a video like the following, where sometimes only two frames are merged. Even if the threshold allowed more merges, they would likely be even more strained. Again this video is extremely high-resolution, and the percepts presented don’t get cleared when they disappear from the input, so they accumulate in the image.
Following is a new video constructed from the same frames as the previous clustering test. The issue with the fine horizontal lines in masks was due to a bug in the segmentation code during some optimization changes I made while writing the paper for Creativity and Cognition. That has been fixed, and I have also switched to extracting pixels from the background model, rather than the current frame, so no foreground percepts are included in the background. There are still hard edges around some masks, which will eventually need to be dealt with.