Whether I end up using an MLP, RL, or some other ML method, the representation of the clusters must be stable. That is, the index that represents percept A should represent percept A in all frames, even as A changes due to ongoing clustering. The first version of the code built each new cluster from one existing cluster plus one scratch (newly segmented) unit, and then appended the modified cluster to the end of the list. This is why the previous plots of the state vector were so uniform. The fix turned out to be quick: when we modify a cluster, the result now replaces the cluster used to construct it. So this is the data that the ML would have to learn (background only):
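The in-place replacement described in this post can be sketched roughly as follows. This is a minimal illustration and not the project code: the `Percept` struct, its fields, and the running-average merge in `mergeInPlace` are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified percept: mean CIE Luv colour plus a merge count.
struct Percept {
    double L, u, v;
    int merges;  // number of scratch units already folded into this cluster
};

// Merge a scratch unit into the cluster stored at `index` and write the
// result back into the same slot, so the index keeps representing the same
// percept across frames. Appending the merged cluster to the end of the
// list instead would change which index stands for which percept.
void mergeInPlace(std::vector<Percept>& clusters, std::size_t index,
                  const Percept& scratch) {
    assert(index < clusters.size());
    Percept& c = clusters[index];
    // Samples already represented by the cluster (assumed running average).
    const double n = static_cast<double>(c.merges + 1);
    c.L = (c.L * n + scratch.L) / (n + 1.0);
    c.u = (c.u * n + scratch.u) / (n + 1.0);
    c.v = (c.v * n + scratch.v) / (n + 1.0);
    c.merges += 1;
}
```

The list never changes length or order here, which is what lets a fixed-size learner treat each column of the state vector as the same percept over time.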
In order to start working on prediction and learning for DM3, I’ve started dumping some state data to see what it looks like over time. The idea is that learning will only happen once the max clusters have been reached for foreground or background, and there would be a different instance of the learner for foreground and background. This is because both RL and MLP methods require a fixed number of dimensions. Once the max clusters have been reached, all new percepts are merged with the nearest cluster, and thus after that point we can represent any image in the system as a vector of booleans where each element corresponds to one of the clusters. In the following plot, the rows represent each moment in time, while the columns represent particular percepts. Black percepts do not appear at that moment in time, while grey percepts do.
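As a rough illustration of the boolean state vector described here, the sketch below builds one row of such a plot. The function name `makeStateVector` and the `activeIndices` input (the cluster index chosen for each percept in the current frame) are hypothetical names for this example only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Build the state for one frame: one boolean per cluster slot, true when at
// least one percept in the frame was merged into that cluster.
// `activeIndices` is a hypothetical input holding the cluster index chosen
// for each percept segmented from the current frame.
std::vector<bool> makeStateVector(std::size_t maxClusters,
                                  const std::vector<std::size_t>& activeIndices) {
    std::vector<bool> state(maxClusters, false);
    for (std::size_t idx : activeIndices) {
        assert(idx < maxClusters);
        state[idx] = true;  // repeated hits on the same cluster are harmless
    }
    return state;
}
```

Because the length is fixed at the maximum number of clusters, the same vector can be fed to any learner that expects a constant number of input dimensions.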
This is the first test showing an entire day-night cycle (approximately 24 hours). As expected, the plots show that the number of percepts and the processing time (segmentation + clustering) drop to nearly nothing during the night:
I had a valuable two hour+ meeting with two fellow students specializing in AI (Omid Alemi: Multi-Agent Systems, Graeme McCaig: Deep Learning) to discuss how RL could be applied in this project, and to solidify my vague thoughts on the notion. Consensus was that the fit between the problem (predicting what percepts will appear at the next state considering the current state) and RL is not straightforward. This is because RL is aimed at allowing an agent to make a sequence of optimal actions that maximize reward, which is given when actions result in desirable states. Core to RL is the notion of delayed rewards, where a current action can inherit some of the reward given by a future attained state. Even if the action itself does not lead to reward, if it leads to another state and another action that leads to a reward, then it is still a valuable action for this state. As RL is oriented towards an agent’s behaviour in a complex environment, it includes an explicit way of trading off between exploitation (do what you know is best) and exploration (look around to find a better way than you currently think is best).
In our case for this project, the prediction of sensory information, the reward is determined by the error between the prediction and the actual sensory information. The greatest possible reward corresponds to zero error, and that is known immediately. Additionally, given that rewards are immediate and the greatest possible reward comes from a prediction identical to reality, the role of exploration is unclear. In a proper RL framework, the purpose of exploration is to find another causal path to greater reward, but if no such path exists, exploration serves no purpose. By the end of the meeting, a recording of which is available here (apologies for the poor quality), we had not reached consensus but had considered a few ways that RL could be applicable, and also the alternative of an MLP for the purpose of prediction. While I was convinced that the awkwardness of RL for this project meant that an MLP may be more appropriate, Graeme continued to consider the mapping between the system as currently conceptualized and RL, which we are currently discussing. Following is an attempt to list pros and cons for MLP vs RL for this project:
RL pros:
Relation to psychological conditioning
Model is implicit in policy
Can be trained continuously
RL cons:
Atypical Mapping (No environment, no delayed reward)
Large state space
Optimization during dreaming (pruning) does appear naturally. (Pruning of unlikely transitions?)
Does the Markovian property hold for this context?
Can the system imagine unseen patterns?
Requires lots of training iterations
Requires custom implementation (few choices for C++ libraries)
MLP pros:
Designed to learn patterns of vectors from supervised error.
Dreaming stimulation would be feeding output back into input.
Pruning could remove weak links (optimization during sleep)
Lots of existing C++ libraries
MLP cons:
Possible over-fitting with continuous training?
Unknown topology (tuning required)
Can the system imagine unseen patterns?
Up to this point I’ve been considering ML methods in the context of the system learning to predict which percepts are likely to appear next. The predictive system is not so easy to excise from the previous conception of associative propagation. Three features of the system, as currently conceptualized, are not obviously mapped to RL:
Continuity between perception / mind-wandering and dreaming (change in behaviour of a single system)
Habituation was initially included to reduce over-activation. This is no longer an issue, but habituation does offer attention-like mechanisms and would allow a lack of novelty in stimulus to lead to mind-wandering.
Pruning: During sleep, shifts in activation that resemble shifts between REM and NREM sleep could allow weak links between neurons to be pruned, optimizing the network.
The most important is the continuity between waking, mind-wandering and dreaming. This is a key aspect of the theory proposed as part of this research. In the case of RL, waking perception would be the system learning what percepts are active and priming the percepts it expects to be next. Dreaming would be the exploitation of that learning. It seems that this would require a hard switch between learning and exploitation, which does not seem to indicate the continuity described in the theory.
As currently conceptualized, habituation reduces overall activation in the associative network, and this lack of activation initiates mind-wandering. When the background is highly static, the system would free-associate, but that free-association could be interrupted by non-habituated external stimulus. One proposal for the function of mind-wandering is that it causes dishabituation, the theory being that when memory structures are over-activated, a chance for the brain to rest can improve performance. Mind-wandering is considered a fusion of waking and dreaming states, where percepts are activated by both external perception and internal mechanisms simultaneously, as modulated by overall activation. In the case of RL, it’s unclear what habituation would be, or what a mix of exploration and exploitation would look like. It seems that it would require a feedback loop. During mind-wandering, the current state should be partly due to perception and partly due to the system’s own prediction. Ideally the current state would be a sum of the current external state and the predicted state. The differences between mind-wandering, dreaming and perception would be modulations of the degree of impact of external stimuli on this feedback process.
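One way to picture the modulated sum of external and predicted state is a per-cluster weighted blend. This is only a sketch under assumptions: the single `externalWeight` parameter, the convex combination, and the name `blendState` are mine, standing in for whatever habituation and the circadian clock would actually control.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Blend external (sensed) and internal (predicted) activation per cluster.
// externalWeight = 1 behaves like pure perception, 0 like dreaming, and
// values in between like mind-wandering. The single scalar weight is an
// assumption made for this sketch.
std::vector<double> blendState(const std::vector<double>& external,
                               const std::vector<double>& predicted,
                               double externalWeight) {
    assert(external.size() == predicted.size());
    assert(externalWeight >= 0.0 && externalWeight <= 1.0);
    std::vector<double> state(external.size());
    for (std::size_t i = 0; i < external.size(); ++i) {
        state[i] = externalWeight * external[i]
                 + (1.0 - externalWeight) * predicted[i];
    }
    return state;
}
```

Sliding the weight continuously between 0 and 1 gives a smooth transition between the three states rather than a hard switch.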
It is unclear how RL learning would function considering feedback along these lines. While habituation is also unclear when it comes to the MLP, the feedback described above seems quite natural. There is even a class of ANNs that learn temporal sequences through recurrence, where the outputs are fed back as inputs. Perception would be the MLP learning to predict the next state, where its input would be entirely dependent on external sensory information. A dream would be the result of the output of the predictor being fed directly back into the input, which would cause a second output (and a second set of activations) and so on. This loop would clearly diverge away from the starting input and be a simulation of a sequence. It’s unclear to me whether the dreaming recurrent MLP would be able to construct new spatial arrangements of objects not seen in perception. Mind-wandering would be a sum of the MLP’s output and external sensory input. I’m not sure how this would affect learning, but it provides nice smooth continuity between states as the effects of external and predicted activations are modulated according to habituation and the circadian clock. In fact, if habituation only affects rendering and the modulation of state, then it could be a system independent of RL or MLP that simply affects how percepts are rendered and the weight of internal and external stimulus on the predictor.
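The dreaming loop described above, where each output of the predictor becomes its next input, can be sketched as follows. The `Predictor` type is a hypothetical stand-in for a trained MLP (any callable mapping a state to a next state), and `dream` is a name invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

using State = std::vector<double>;
// Hypothetical stand-in for a trained predictor (e.g. an MLP forward pass).
using Predictor = std::function<State(const State&)>;

// Run the predictor closed-loop: starting from `seed`, feed every output
// straight back in as the next input and record the resulting sequence.
std::vector<State> dream(const Predictor& predict, State seed, int steps) {
    assert(steps >= 0);
    std::vector<State> sequence;
    State current = std::move(seed);
    for (int t = 0; t < steps; ++t) {
        current = predict(current);   // output of step t is input of step t+1
        sequence.push_back(current);
    }
    return sequence;
}
```

During perception the same predictor would instead be called once per frame on the externally derived state, so the only difference between the two modes in this sketch is where the input comes from.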
After this discussion and my thoughts above, it seems to me that an MLP is the most appropriate ML method for the predictor in this system. I’m still open to RL as an option, but as of yet it seems unclear what RL would offer for the system that the MLP would not.
This is the first long-term test covering material over an approximately 12 hour period, shown in the following plot. In this case there are 2000 foreground clusters and 1000 background clusters. The mean rendering time is 2.6 seconds with a variance of 0.28.
The spiking clustering time has been solved! The change in the code was to mark both clusters and scratch units for removal after a merge, whereas previously only clusters would be marked. This test had a very small number of clusters (fewer than the number of regions in a single frame) so that any issues with spiking clustering time would appear earlier. The following plot shows quite stable time for clustering and rendering, so we’re ready for some longer-term tests:
After a few more short tests, it is clear that the spikes are actually caused by the clustering algorithm. Some additional testing showed that multiple percepts may have the same distance to their nearest cluster. The code to calculate the minimum distance assumed an upper limit of 100, since the initial plan was to use features normalized to 0-1. At some point I changed the distance function for foreground percepts to only use colour features, and since CIELuv distances are approximately perceptually uniform, there is no need to normalize them. The result is that distances could exceed 100, so the calculation of the minimum is incorrect in some cases. While debugging I also found that the number of new units could exceed the number of scratch units. Presumably this is due to the same scratch unit being merged with multiple clusters. I assumed that this was due to the false minimums, but after changing the upper distance limit to the max possible distance in Luv colourspace, the spikes in clustering time still persist. So the same scratch percept is merged into multiple clusters for some other, as yet unknown, reason. The clustering method is BSAS, but only until we have reached the max number of clusters, at which point we’re using our own method similar to the SOM, where a scratch unit (input) is merged with the closest cluster (Best Matching Unit), which may explain these problems. As we have a fixed number of clusters, and many inputs, the clustering algorithm used after the fixed number of clusters has been reached is even more crucial. The following plot seems to indicate that the problem is indeed caused when multiple clusters are updated by a single percept. Note the spike in “extraNewUnitsFG”, which is the number of updated clusters (numNewUnitsFG) minus the number of scratch percepts (numScratchFG).
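The false-minimum issue with the hard-coded upper limit can be avoided by starting the search from positive infinity rather than any assumed maximum. The sketch below is an illustration under assumptions (the `Luv` struct and function names are hypothetical) and does not address the remaining, unexplained cause of the spikes.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical colour-only feature: mean CIE L, u and v of a percept.
struct Luv { double L, u, v; };

// Euclidean distance in Luv space; unnormalized, so it can exceed 100.
double luvDistance(const Luv& a, const Luv& b) {
    const double dL = a.L - b.L, du = a.u - b.u, dv = a.v - b.v;
    return std::sqrt(dL * dL + du * du + dv * dv);
}

// Index of the nearest cluster (the best matching unit). The running
// minimum starts at +infinity, so no real distance is ever rejected for
// being larger than an assumed upper limit. The strict '<' means ties go
// to the first cluster found, so exactly one index is returned.
std::size_t nearestCluster(const std::vector<Luv>& clusters, const Luv& scratch) {
    assert(!clusters.empty());
    std::size_t best = 0;
    double bestDist = std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < clusters.size(); ++i) {
        const double d = luvDistance(clusters[i], scratch);
        if (d < bestDist) {
            bestDist = d;
            best = i;
        }
    }
    return best;
}
```

Returning a single index per scratch unit also makes it easy to check that the number of updated clusters never exceeds the number of scratch units.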
The increasing time to render each frame certainly slows down testing, and would cause future problems processing more than 20,000 frames. What I have done is rewrite the rendering code so that the segmentation uses RGB internally (rather than BGR), and put the draw function inside the percept class so that percepts for segmentation and rendering are stored in the same class. In the previous tests, rendering would take over 8 seconds per frame by 20,000 frames and apparently would continue to increase linearly. Rendering is certainly faster with the new implementation, but unfortunately the test did not get to the goal of 20,000 frames because it used too much memory to continue. The following shows the progress of this latest test…
Following are a few plots of the performance of the last test (20,000 frames, 1500 clusters). The most important observation is that it is not at all the clustering that causes increasing processing times, but the actual rendering itself.
This second test went well, although the processing time for each frame shows what looks like it could be an exponential increase. I presume this is due to the comparison of each newly segmented percept to the set of all existing percepts:
The long-term clustering test was successful. This is the first time in this project that I’ve gotten clustering code to deal with the massive amount of data from real-world frames. These 10,000 frames represent about 3 hours ending near dusk. Following is a set of plots that show the behaviour of the system over time; images follow.
Those who are highly detail oriented may notice that although all the previous clustering tests were supposed to be 300 frames long, they actually are not. I spent a week finding out why the openframeworks draw() method would not be called on some frames, and then get called again later, all at seemingly unpredictable times. Function by function I gutted my program until all that was left was an empty shell producing debugging output, and yet the problem persisted! At that point I was convinced this must have been a bug in openframeworks and filed a GitHub issue. I thought I should test on another machine; I did so and the problem did not occur. Then I realized that draw() is not called when the window is minimized. My development machine is headless and I use it via an ssh X11 tunnel and vnc (to run with opengl). It turns out that the screen-saver was causing the problem: draw() would not be called when the screen-saver was engaged. So I’m back on track. Following is a 1000 frame test, as usual at very high resolution and with the original frames on the left. In this case I limit background percepts to 1000, so once 1000 percepts have been reached, newly segmented regions are merged with the nearest cluster, no matter how far away it is. I also ran a 10,000 frame test overnight; I’ll look at that data and summarize in a future post.
Following is a test of foreground clustering. In this case the only features being used are colour (mean L, u and v values), and the threshold of similarity is somewhat arbitrary. The first thing to notice is that the clusters are quite poor compared to background. This is because foreground objects change a lot in area, position, aspect, size, etc., which is why those features are not used to cluster them. Background objects are much more stable because they don’t move around much with the static camera. The frames in the test stream were captured once per second, so foreground objects close to the camera move a lot between frames. The resulting clusters then appear quite strained: the constituent patterns don’t reinforce but conflict, because there is such a high likelihood of drastic changes of the same moving object between frames. Combine that with the limited number of frames in which they are present, and we end up with a video like the following, where sometimes only two frames are merged. Even if the threshold allowed more merges, they would likely be even more strained. Again this video is extremely high-resolution, and the percepts presented don’t get cleared when they disappear from the input, so they accumulate in the image.
Following is a new video constructed from the same frames as the previous clustering test. The issue with the fine horizontal lines in masks was due to a bug in the segmentation code during some optimization changes I made while writing the paper for Creativity and Cognition. That has been fixed, and I have also switched to extracting pixels from the background model, rather than the current frame, so no foreground percepts are included in the background. There are still hard edges around some masks, which will eventually need to be dealt with.
The clustering code is working pretty well for background percepts. Following is a video that shows the raw frames on the left (in 720p) and the resulting clustered output on the right (also in 720p) through ~300 consecutive frames. Note the video is quite high resolution (2560×720) and best performance is likely attained by downloading (right click and “save video as”) and using a native video player. For each new frame all regions in the previous frame are compared and clustered: if they are sufficiently similar, then the corresponding regions in both frames are merged by averaging into a single percept.
I have integrated the existing segmentation code into openframeworks and also implemented a first version of the new clustering algorithm (based on BSAS). This clustering algorithm is quite simple and I’m currently using all features provided by segmentation — position (x, y), size (width, height, area), colour (mean of CIE L, u, v channels) — similarity being Euclidean distance in multiple dimensions. I have only tested thus far with background percepts, with the following results:
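For readers unfamiliar with BSAS (Basic Sequential Algorithmic Scheme), a single step of it with Euclidean distance over a feature vector might look like the sketch below. This is an illustration, not the project code: the `Cluster` struct, the running-mean merge, and the `threshold` and `maxClusters` parameters are assumptions chosen for the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Feature vector, e.g. x, y, width, height, area, mean L, mean u, mean v.
using Features = std::vector<double>;

// Hypothetical cluster: running mean of its members plus a member count.
struct Cluster { Features mean; int count; };

double euclidean(const Features& a, const Features& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) sum += (a[i] - b[i]) * (a[i] - b[i]);
    return std::sqrt(sum);
}

// Present one sample to BSAS and return the index of the cluster that
// absorbed it. A new cluster is created only when the nearest one is
// farther than `threshold` AND there is still room; otherwise the sample
// is merged into the nearest cluster, however far away it is.
std::size_t bsasStep(std::vector<Cluster>& clusters, const Features& x,
                     double threshold, std::size_t maxClusters) {
    assert(maxClusters > 0);
    std::size_t best = 0;
    double bestDist = std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < clusters.size(); ++i) {
        const double d = euclidean(clusters[i].mean, x);
        if (d < bestDist) { bestDist = d; best = i; }
    }
    if (bestDist > threshold && clusters.size() < maxClusters) {
        clusters.push_back(Cluster{x, 1});
        return clusters.size() - 1;
    }
    Cluster& c = clusters[best];
    for (std::size_t i = 0; i < x.size(); ++i) {
        c.mean[i] = (c.mean[i] * c.count + x[i]) / (c.count + 1);  // running mean
    }
    c.count += 1;
    return best;
}
```

Because the features here have very different ranges (pixel positions and areas versus colour values), an unweighted Euclidean distance will be dominated by the largest-valued features unless they are normalized first.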
I just finished reading a new Hobson paper (“Waking and dreaming consciousness: Neurobiological and functional considerations”), which is an update on Hobson’s AIM model integrating Friston’s “free energy formulation”. The key points are that we can consider waking perception as a learning process where the difference between what happens and what is expected drives more accurate predictions. REM sleep and waking are contiguous processes, where the lack of external stimulus during REM means there are no prediction errors, which triggers dream experiences. In my reading it appears that Hobson proposes that visual images in dreams are the result of the ocular movements themselves (REMs) predicting visual percepts. Hobson proposes that dreams are of functional use because they manifest an optimization process: “one can improve models by removing redundant parameters to optimize prior beliefs. In our context, this corresponds to pruning redundant synaptic connections.” In short, dreaming improves the quality of the predictive model of the world in the absence of sensory information, by pruning.
I met with Steven Barnes today to talk about the previous post and through discussion we clarified some of the points of the learning algorithm proposed earlier. Let’s go through an example of three subsequent frames (t=1,2,3) in a perceptual case. For simplicity we do not describe the clustering process here, though the algorithm may have unforeseen consequences in relation to clustering. The basic premise is that priming is predictive and therefore that future percepts are expected to be in the same context as current percepts.
The importance of simulation (prediction) in dreaming and mind-wandering literature should be integrated into the current conception of DM3. There are two aspects of continuing development: (1) The current propagation of activation for free-association is inherited from previous projects (MAM, DM1 and DM2) and not well situated in theory. (2) There is no feedback from the world that can be used as a reward that could be used to drive intrinsic motivation.