After quite some frustration I have managed to move the segmentation and clustering code into a separate thread. This thread runs as fast as possible, and the main rendering process checks in on the thread between frames to see if new data (percepts) are ready to be rendered. The main thread then makes a local copy of the percepts and renders them. The main reason for this change is that the rendering rate was always expected to be much faster than the expensive segmentation rate, and keeping them separate stops segmentation from blocking rendering. Following is a plot from a ~90,000 frame test of the threaded code.
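As a sketch of the handoff described above (the class, names, and the `Percept` struct are hypothetical stand-ins, not the actual DM3 code), the worker thread could publish its results under a mutex, and the render thread could copy them out between frames only when a fresh set is ready:

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical percept: just a mean colour here.
struct Percept { float r, g, b; };

class SegmentationWorker {
public:
    void start() {
        running_ = true;
        worker_ = std::thread([this] { loop(); });
    }
    void stop() {
        running_ = false;
        worker_.join();
    }
    // Called by the render thread between frames: if the worker has
    // finished a pass, copy the percepts out so rendering never waits
    // on segmentation.
    bool fetch(std::vector<Percept>& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!fresh_) return false;
        out = shared_;   // local copy; the renderer owns it from here
        fresh_ = false;
        return true;
    }
private:
    void loop() {
        while (running_) {
            // Stand-in for the expensive segmentation + clustering pass.
            std::vector<Percept> percepts(100, Percept{0.5f, 0.5f, 0.5f});
            std::lock_guard<std::mutex> lock(mutex_);
            shared_ = std::move(percepts);
            fresh_ = true;
        }
    }
    std::thread worker_;
    std::mutex mutex_;
    std::vector<Percept> shared_;
    bool fresh_ = false;
    std::atomic<bool> running_{false};
};
```

The copy inside `fetch` is the point: the lock is held only long enough to copy, so the expensive work never runs while the render thread holds the mutex.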
Following are images that represent the learning of percepts over time. Each pixel represents the mean colour of a percept at a particular time: the Y axis is the identity of the percept, while the X axis is time. These are the results from the last ~90,000 frame test, with 2000 background and 1000 foreground clusters requested.
The top image shows the background percepts, while the bottom image shows the foreground percepts. The original images are 90,000 pixels wide, and are available here: background and foreground. We only begin tracking these percepts once the max number of clusters has been reached. The middle of the original images has been removed: the left sides show the start of the clustering process, where each new pattern is merged with its nearest cluster, and the right sides show the same clusters at the end of the test. Note how they diverge from the start to the end of the test. The apparent regularity on the Y axis in the background near the start of the test is presumably due to the repetition of similar background percepts in subsequent frames; the same explains the regular colour blocks of foreground percepts at the start of the test. As this set of percepts is constantly adjusted to represent a continuous flow of sensory information, their regularity is reduced to increase their density.
Whether I end up using an MLP, RL, or some other ML method, the representation of the clusters must be stable. That is, the index that represents percept A should represent percept A in all frames, even as A changes due to ongoing clustering. The first version of the code made a new cluster from one existing cluster plus one scratch (newly segmented) unit, and then appended the modified cluster to the end of the list. This is why the previous plots of the state vector were so uniform. The fix turned out to be quick: when we modify a cluster, it now replaces the cluster used to construct it. So this is the data that the ML would have to learn (background only):
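A minimal sketch of that fix (the `Cluster` struct and the incremental mean are stand-ins for the actual clustering code): merging a scratch unit updates the nearest cluster in place, so index i keeps meaning percept i across frames.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical cluster; only a running mean and a count matter here.
struct Cluster {
    float mean;   // stand-in for the cluster's full feature vector
    int   count;  // number of units merged so far
};

// Merge a newly segmented (scratch) unit into the nearest cluster,
// updating it *in place* so its index stays stable across frames.
// (The first version appended the merged cluster to the end of the
// list instead, which shuffled indices every frame.)
void mergeInPlace(std::vector<Cluster>& clusters, std::size_t nearest, float unit) {
    Cluster& c = clusters[nearest];
    c.mean = (c.mean * c.count + unit) / (c.count + 1);  // incremental mean
    c.count += 1;
    // No push_back: clusters[nearest] still means "percept #nearest".
}
```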
It reminds me of the images of the Matrix…
In order to start working on prediction and learning for DM3, I’ve started dumping some state data to see what it looks like over time. The idea is that learning will only happen once the max clusters have been reached for foreground or background, and there would be a separate instance of the learner for each. This is because both RL and MLP methods require a fixed number of dimensions. Once the max clusters have been reached, all new percepts are merged with the nearest cluster, and thus after that point we can represent any image in the system as a vector of booleans where each element corresponds to one of the clusters. In the following plot, the rows represent moments in time, while the columns represent particular percepts. Black marks percepts that do not appear at that moment in time, while grey marks those that do.
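A sketch of that fixed-length boolean encoding (the function and parameter names are mine, not from the actual code):

```cpp
#include <cstddef>
#include <vector>

// Once the max number of clusters is reached, each frame can be encoded
// as a fixed-length boolean vector: element i is true iff at least one
// percept in the frame was merged with cluster i.
std::vector<bool> frameState(std::size_t maxClusters,
                             const std::vector<std::size_t>& matchedClusters) {
    std::vector<bool> state(maxClusters, false);
    for (std::size_t idx : matchedClusters)
        state[idx] = true;  // assumes idx < maxClusters
    return state;
}
```

With 2000 background clusters this gives every frame the same 2000-dimensional representation, which is exactly what a fixed-input learner needs.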
This is the first test showing an entire day-night cycle (approximately 24 hours). As expected, the plots show the number of percepts and the processing time (segmentation + clustering) drop to nearly nothing during the night:
I had a valuable two-hour-plus meeting with two fellow students specializing in AI (Omid Alemi: Multi-Agent Systems; Graeme McCaig: Deep Learning) to discuss how RL could be applied in this project, and to solidify my vague thoughts on the notion. The consensus was that the fit between the problem (predicting which percepts will appear in the next state given the current state) and RL is not straightforward. This is because RL is aimed at allowing an agent to make a sequence of optimal actions that maximize reward, which is given when actions result in desirable states. Core to RL is the notion of delayed reward, where a current action can inherit some of the reward given by a future attained state. Even if the action itself does not lead to reward, if it leads to another state and another action that does, then it is still a valuable action for this state. As RL is oriented towards an agent’s behaviour in a complex environment, it includes an explicit way of trading off between exploitation (do what you know is best) and exploration (look around for a better way than you currently think is best).
In our case, the prediction of sensory information, rewards correspond to the error between the prediction and the actual sensory information. The greatest possible reward is zero error, and it is known immediately. Given that rewards are immediate and the greatest possible reward is a prediction identical to reality, the role of exploration is unclear: in a proper RL framework, the purpose of exploration is to find another causal path to greater reward, but if no such path exists, exploration serves no purpose. By the end of the meeting (a recording of which is available here; apologies for the poor quality) we had not reached consensus, but had considered a few ways that RL could be applicable, as well as the alternative of an MLP for the purpose of prediction. While I was convinced that the awkwardness of RL for this project meant an MLP may be more appropriate, Graeme continued to consider the mapping between the system as currently conceptualized and RL, which we are still discussing. Following is an attempt to list pros and cons of RL and the MLP for this project:
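To make the awkwardness concrete, here is what "reward" reduces to under this mapping (a sketch using the boolean state vectors described earlier; the function name is mine):

```cpp
#include <cstddef>
#include <vector>

// Under the RL mapping discussed above, "reward" is simply negated
// prediction error: the best possible reward is 0 (a perfect
// prediction), and it is known immediately, with no delayed component.
float reward(const std::vector<bool>& predicted,
             const std::vector<bool>& actual) {
    int errors = 0;
    for (std::size_t i = 0; i < predicted.size(); ++i)
        if (predicted[i] != actual[i]) ++errors;
    return -static_cast<float>(errors);
}
```

Since this is exactly a supervised error signal, the "reward" framing adds nothing that a plain error-driven learner would lack.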
RL:
- Relation to psychological conditioning
- Model is implicit in policy
- Can be trained continuously
- Atypical Mapping (No environment, no delayed reward)
- Large state space
- Optimization during dreaming (pruning) may appear naturally (pruning of unlikely transitions?)
- Does the Markovian property hold for this context?
- Can the system imagine unseen patterns?
- Requires lots of training iterations
- Requires custom implementation (few choices for C++ libraries)
MLP:
- Designed to learn patterns of vectors from supervised error.
- Dreaming simulation would be feeding the output back into the input.
- Pruning could remove weak links (optimization during sleep)
- Lots of existing C++ libraries
- Possible over-fitting with continuous training?
- Unknown topology (tuning required)
- Can the system imagine unseen patterns?
Up to this point I’ve been considering ML methods in the context of the system learning to predict which percepts are likely to appear next. The predictive system is not so easy to excise from the previous conception of associative propagation. Three features of the system, as currently conceptualized, do not map obviously to RL:
- Continuity between perception / mind-wandering and dreaming (change in behaviour of a single system)
- Habituation was initially included to reduce over-activation. This is no longer an issue, but habituation does offer attention-like mechanisms and would allow a lack of novelty in stimulus to lead to mind-wandering.
- Pruning: shifts in activation that resemble the shifts between REM and NREM sleep could allow weak links between neurons to be pruned, optimizing the network during sleep.
The most important is the continuity between waking, mind-wandering and dreaming, a key aspect of the theory proposed as part of this research. In the case of RL, waking perception would be the system learning which percepts are active and priming the percepts it expects next, and dreaming would be the exploitation of that learning. This seems to require a hard switch between learning and exploitation, which does not reflect the continuity described in the theory.
As currently conceptualized, habituation reduces overall activation in the associative network, and this lack of activation initiates mind-wandering. When the background is highly static, the system would free-associate, but that free-association could be interrupted by non-habituated external stimulus. One proposal for the function of mind-wandering is to cause dishabituation: the theory is that when memory structures are over-activated, a chance for the brain to rest can improve performance. Mind-wandering is considered a fusion of waking and dreaming states, where percepts are activated simultaneously by external perception and by internal mechanisms, as modulated by overall activation. In the case of RL, it’s unclear what habituation would be, or what a mix of exploration and exploitation would look like. It seems that it would require a feedback loop: during mind-wandering, the current state should be partly due to perception and partly due to the system’s own prediction. Ideally the current state would be a sum of the current external state and the predicted state, and the differences between mind-wandering, dreaming and perception would be modulations of the degree of impact of external stimuli on this feedback process.
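That modulation can be sketched as a simple blend (the function name and the `alpha` parameter are my own, not from the system): alpha = 1 is pure perception, alpha = 0 is pure dreaming, and intermediate values are mind-wandering.

```cpp
#include <cstddef>
#include <vector>

// Blend external sensory activation with the predictor's own output.
// alpha would itself be modulated by habituation and the circadian
// clock (not modelled here).
std::vector<float> blendState(const std::vector<float>& external,
                              const std::vector<float>& predicted,
                              float alpha) {
    std::vector<float> state(external.size());
    for (std::size_t i = 0; i < state.size(); ++i)
        state[i] = alpha * external[i] + (1.0f - alpha) * predicted[i];
    return state;
}
```

Because alpha is continuous, the transitions between perception, mind-wandering and dreaming are smooth rather than a hard switch.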
It is unclear how RL learning would function with feedback along these lines. While habituation is also unclear in the MLP case, the feedback described above seems quite natural there. There is even a class of ANNs that learn temporal sequences through recurrence, where the outputs are fed back as inputs. Perception would be the MLP learning to predict the next state, with its input entirely dependent on external sensory information. A dream would be the result of the predictor’s output being fed directly back into its input, which would cause a second output (and a second set of activations), and so on. This loop would clearly diverge from the starting input and become a simulation of a sequence. It’s unclear to me whether the dreaming recurrent MLP would be able to construct new spatial arrangements of objects not seen in perception. Mind-wandering would be a sum of the MLP’s output and external sensory input. I’m not sure how this would affect learning, but it provides a smooth continuity between states, as the effects of external and predicted activations are modulated according to habituation and the circadian clock. In fact, if habituation only affects rendering and the modulation of state, then it could be a system independent of RL or MLP, simply affecting how percepts are rendered and the weight of internal vs. external stimulus on the predictor.
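The dreaming loop can be sketched as follows, with `predict` standing in for the trained MLP’s forward pass (all names here are hypothetical):

```cpp
#include <functional>
#include <vector>

// Dreaming as recurrence: the predictor's output becomes its next
// input, producing a simulated sequence that drifts away from the
// initial (perceived) state.
std::vector<std::vector<float>> dream(
    std::vector<float> state,
    const std::function<std::vector<float>(const std::vector<float>&)>& predict,
    int steps) {
    std::vector<std::vector<float>> sequence;
    for (int t = 0; t < steps; ++t) {
        state = predict(state);      // output fed directly back as input
        sequence.push_back(state);
    }
    return sequence;
}
```

Each iteration produces a new output and a new set of activations, so the sequence is generated entirely internally once the loop starts.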
After this discussion and my thoughts above, it seems to me that an MLP is the most appropriate ML method for the predictor in this system. I’m still open to RL as an option, but as yet it is unclear what RL would offer the system that an MLP would not.