This test only lasted a few hours, and the weather was really terrible (constant heavy rain); following are the results.
Following are a few images I captured of the Dreaming Machine while uninstalling it; they thus have quite limited variation. Night was artificially triggered by putting the lens cap on the camera. It was quite difficult to get an appropriate view for the camera, and this was the best possible from Zayed University (due to constraints on capturing images of women in the U.A.E.).
While I had attempted to get real-time transitions into the Dreaming Machine #3 software, I had a last-minute issue with an interaction between system updates and performance. Rendering each frame went from 0.033 to 0.2 seconds, which is too slow for a smooth transition between images generated in the thread. After spending a week trying to fix it, I gave up. So the version to be shown at ISEA will not have any transitions. The good news is that the shader-based renderer uses half the RAM, so I’ve increased percepts from 4000 to 6000 (more caused problems with the MLP). I also fixed crashes related to the IP camera being unreachable, which should solve the issues that occurred during ACM Creativity and Cognition. Following is a selection of images produced by the system during testing using a live camera feed of my living room:
On September 9th, 2014 I successfully defended my PhD Dissertation, entitled A Machine That Dreams: An Artistic Enquiry Leading to Integrative Theory and Computational Artwork.
Some more work in progress on Watching and Dreaming, this time at the correct aspect ratio:
Some work-in-progress using the Dreaming Machine #3 system to generate imagery learned from Kubrick’s 2001: A Space Odyssey, tentatively titled Watching and Dreaming (2001: A Space Odyssey):
After adding a little noise to the state of the inputs to the predictor, I noticed that the noise had no effect on the output. I thought there was an issue with my implementation of the feedback mechanism, so I rewrote it; the behaviour did not change. This must be due to the noise tolerance of the MLP. I ran a test last night where, every 50 iterations, I inserted a dense random vector (a random selection of which clusters are present or not) that was ORed with the previous network output before feeding back. The result is that the noise does clearly change the network’s behaviour, but only for 1-2 iterations before the network stabilizes again into a static / periodic / complex pattern.
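The injection step described above can be sketched in a few lines. This is a minimal illustration, not the actual DM3 code: the function name, vector length, and noise density are all hypothetical, and a stand-in boolean list replaces the real percept state.

```python
import random

def inject_boolean_noise(state, iteration, period=50, density=0.5, seed=None):
    """Every `period` iterations, OR a dense random boolean vector
    (each percept present with probability `density`) into the state.
    On other iterations the state passes through unchanged."""
    if iteration % period != 0:
        return list(state)
    rng = random.Random(seed)
    noise = [rng.random() < density for _ in state]
    return [s or n for s, n in zip(state, noise)]

# Stand-in for the fed-back network output: all percepts off.
state = [False] * 8
state = inject_boolean_noise(state, iteration=50, density=0.5, seed=1)
print(sum(state))  # number of percepts switched on by the noise
```

Because the noise is ORed in, it can only ever turn percepts on, never off, which is consistent with it acting as a brief perturbation that the network then damps out.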
The complex behaviour seems quite common, but the problem is that percepts that are present at one time step tend either to stay present (static), turn on and off (periodic), or exhibit some other apparently chaotic behaviour (complex), while percepts that are not present tend to stay absent. Thus a small set of percepts is activated in a complex pattern during feedback, but that does not seem to result in the activation of percepts that were not present earlier. In short, dream activation seems highly constrained by the latent perceptual activation that initiated it.
So the idea of injecting periodic Boolean noise seems a non-starter: to elicit even a small change in network behaviour, the inserted randomness would have to dominate the activation, and would thus contrast strongly with the behaviour of the network outside that scope. There seem to be a few options. Rather than injecting noise at the Boolean level, I could add a little constant floating-point noise to the values after discretization; this means adding a new method to the predictor class that adds noise to the values fed to the network. I’m currently trying another idea where I shift (and wrap) the state vector by one unit every 50 frames. Since the same vector is modified in place, it could cause a more lasting change, would certainly have the same density of percepts as feedback, and would involve the activation of percepts adjacent to those previously activated. The latter point is an issue because there is no meaningful relationship between neighbouring percepts in the vector. Shifting the vector seems to have had no impact on the network’s output; it appears to be treated as noise and ignored. It seems it is time to implement continuous noise in the predictor class.
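Both perturbations mentioned above are simple to express. The sketch below is illustrative only (function names, the noise scale, and the shift period are assumptions, and plain lists stand in for the real state vector): a circular shift of the Boolean state, and small continuous noise added to the values after discretization.

```python
import random

def shift_state(state, iteration, period=50, offset=1):
    """Circularly shift (wrap) the boolean state vector by `offset`
    every `period` frames; otherwise return it unchanged."""
    if iteration % period != 0:
        return list(state)
    return state[-offset:] + state[:-offset]

def add_float_noise(values, scale=0.05, rng=random):
    """Add small continuous noise to the (0/1) values fed to the
    network, after discretization."""
    return [v + rng.uniform(-scale, scale) for v in values]

s = [True, False, False, False]
print(shift_state(s, 50))  # the single active percept moves one slot right
```

The shift preserves the number of active percepts exactly, which is why it matches the density of feedback; the floating-point noise instead perturbs every unit slightly, which is the next thing to try.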
Following are some short videos that show some of the more interesting dreams generated by the system in the last test. They give a sense of the periodicity of some dreams and how dreams look with these very noisy percepts:
As refining the MLP beyond what it already does seems no easy task, I thought I would return to the previous problem: making sure there is enough temporal diversity in the percepts for the predictor to learn from longer temporal sequences. I’m running a proper test now, but following are a few frames selected from a botched earlier test. The percepts are drawn on a black background, and there is no visual difference between perception, mind-wandering or dreaming.
Due to the results from previous posts, I thought I would try another approach: train the network by feeding it not the state at a single moment in time, but a concatenation of multiple moments into one vector, so that the network has some history to learn from. Having implemented this, I did not find any improvement over the old method: (PHASE2 is the old method, PHASE3 is the new method)
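The concatenation scheme can be sketched as a sliding window over past states. This is a minimal stand-in, not the PHASE3 code: the window length and helper name are assumptions, and short toy vectors replace the real percept states.

```python
from collections import deque

def windowed_input(history, state, window=3):
    """Append the current state to `history` and, once `window` states
    have accumulated, return them concatenated into a single vector."""
    history.append(state)
    if len(history) < window:
        return None  # not enough history yet
    return [x for past in history for x in past]

hist = deque(maxlen=3)  # old states fall off automatically
for s in [[0, 1], [1, 0], [1, 1], [0, 0]]:
    vec = windowed_input(hist, s, window=3)
print(vec)  # the last three 2-unit states joined into one 6-unit input
```

The input dimensionality grows by a factor of `window`, so the MLP's input layer has to be resized accordingly; the target remains the single next state.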
If the MLP is able to learn a sequence, and demonstrates that learning by producing the correct pattern for a particular input, then feedback should result in the network replaying the sequence: from the network’s point of view there is no difference in how the state at t+1 is produced, no matter where that pattern comes from. So why does the network appear to learn the sequence in the previous post, while feedback does not result in replaying the learned sequence?
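The argument can be made concrete with a toy "perfect" predictor. Assuming (hypothetically) a model that has exactly memorized the mapping from each state to its successor, feeding its output back as the next input must replay the sequence, so a failure to replay under feedback implies the MLP's mapping is not the exact one its per-input accuracy suggests.

```python
# A perfect sequence-learner: a lookup table from state t to state t+1.
seq = ["A", "B", "C", "A"]
model = {seq[i]: seq[i + 1] for i in range(len(seq) - 1)}

# Feedback: feed each output back in as the next input.
state, replay = "A", []
for _ in range(5):
    state = model[state]
    replay.append(state)
print(replay)  # the learned cycle A -> B -> C -> A repeats indefinitely
```

An MLP that merely approximates the mapping can still score well on each training input while its small errors compound under feedback, collapsing toward a fixed point instead of the cycle.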
In order to deal with the problem of static dreams, Philippe asked me to create a synthetic data-set with particular temporal properties. The idea is that we can use it to get a sense of both the distribution of percepts over time and the resemblance / boundedness of dreaming and mind-wandering compared to perception. The data-set is 2000 frames and contains three objects: a blue circle that gets bigger, a red circle that gets smaller, and a green rectangle that moves from left to right. The background toggles between white and grey every 200 frames. Additionally, there is a single all-black frame at the end of the data-set to mark epochs. Following are the first and last frames of the synthetic data-set, not including the trailing black frame:
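The per-frame parameters of such a data-set can be written down directly. This is only a sketch of the generating rule, with hypothetical sizes and resolution (the actual values used in the real data-set may differ); rendering the shapes themselves is omitted.

```python
def frame_params(i, n_frames=2000, width=640):
    """Parameters for frame i of the synthetic set: the blue circle
    grows, the red circle shrinks, the green rectangle sweeps left to
    right, and the background toggles every 200 frames."""
    t = i / (n_frames - 1)  # 0.0 at the first frame, 1.0 at the last
    return {
        "blue_radius": 10 + t * 90,
        "red_radius": 100 - t * 90,
        "green_x": int(t * width),
        "background": "white" if (i // 200) % 2 == 0 else "grey",
    }

print(frame_params(0)["background"], frame_params(250)["background"])
```

A single all-black frame would then be appended after frame 1999 to mark the epoch boundary.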
The first test on the new machine went quite well. The system is now storing 4000 percepts, and seems to be performing quite well. Following is a dump of all percepts (stacked on top of each other) after ~90,000 frames:
Thanks to my supervisor I have a new faster shuttle to work with. The previous machine was almost 5 years old, and considering the heft of the computation involved in this project a faster machine goes a long way. The new machine is an Intel quad core with 32GB of RAM and a 3GB GeForce 780 graphics card (which was so big, it needed a little tweaking by the folks at CNS to fit, and involved removing the whole drive bay assembly, leaving SSD as the only storage option). Here is a comparison of performance between the two systems. (“micro” is the old shuttle, and “supermicro” the new one.)
This is the real time (in seconds) for the OpenCV thread to process each frame. Note the huge range in time between frames on micro, compared to the much more consistent and compact processing time on the new machine. The mean time per frame (during day frames) on micro was 0.58 and on supermicro it is 0.21: an almost three-fold increase in performance. Now we’ll see about performance when maxing out the number of percepts (up to ~4000 from 900) and filling those 32GB of RAM.
Some recent images generated by the system. This is a mix of perceptual and dreaming / mind-wandering processes. The perceptual images are generally brighter and more cohesive, in that they reflect a full frame with a lot of information. The dreaming / mind-wandering images are more fragmented. There are some exceptions; the woman walking by, for example, is actually perception. Note that perception can lead to impossible images, like the partially present car, the cyclist, and the man walking by who seems fused with a piece of a trunk. These are perceptual errors (visual illusions) that result from the constructive nature of perception. They are bizarre because of the very limited perceptual ability of the system.
I started dumping the most recent time (frame number) each percept was clustered, to get a sense of the range of time encapsulated by the percepts. It turns out that the range is very small: in the last test the difference between the min and max times was 29, which represents only about one minute of real time. So even if the predictor were making broad predictions, the percepts would not represent them! My intuition is that there are not enough percepts to represent the complexity of a real-world scene for more than a short period of time. Of course, since all new percepts are clustered, this makes sense; not doing so would mean blindness to the novel. Indeed, since percepts are weighted clusters, they hold more information than is represented by the time of the last clustering operation. Following is a composite of all the percepts after ~90,000 frames, which clearly appears to be fairly cohesive in time and lighting:
Now that all the system components have been implemented, I’ve finally had a chance to get a proper look at the system’s behaviour. Following are three images that show the display during perception, mind-wandering and dreaming:
While out of town I ran the new non-cropped code over the whole ~250,000 frame data-set. The results clearly show that keeping the percept images to a fixed size solves the memory leak problem. Unfortunately, the program crashed before writing the percepts to disk by attempting to load a non-existent frame past the end of the data-set. Thus, I have no idea what the 350 percepts generated by the system looked like by the end of processing.
I changed the percepUnit class so that percepts are stored at a fixed resolution (the full resolution of the input frames). This way OpenCV does not try to reallocate any memory when merging percepts, and indeed my leak is gone. That is the good news. The bad news is that because of the size of the input frames (1920×1080), and the fact that all percepts segmented from a single frame hold their own copy of the same data, memory usage is extreme. Previously I could probably have held 3000+ percepts in memory; now I can fit only 300. This makes sense since there are about 100 percepts per input frame. Following are the percepts after a 25,000 frame test that generated 200 percepts:
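The memory arithmetic behind that drop is worth spelling out. Assuming 3-byte BGR pixels and ignoring masks and per-object overhead (both assumptions, since the real `percepUnit` layout is not shown here):

```python
# Each percept now stores a full-resolution copy of the frame.
width, height, channels = 1920, 1080, 3
bytes_per_percept = width * height * channels
mb = bytes_per_percept / 2**20

print(round(mb, 1))                # MB per percept
print(round(300 * mb / 1024, 1))   # GB for 300 percepts
```

At roughly 6 MB per percept, 300 percepts already approach 2 GB, whereas small cropped percepts of varying size previously allowed ten times as many in the same memory.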
After the discussion in the previous post I took a look at the state data dumped by the program. While the waking and non-waking (dreaming and mind-wandering) states are clearly differentiable according to the quality of the state data (see this state plot), it seems they are not so easy to distinguish in terms of the number of activated percepts per frame. It turns out that the distribution of the number of activated percepts per frame is very close in the waking and non-waking cases. This indicates to me that the predictor is doing a good job learning, but that something is missing which manifests in the quality of dreaming and mind-wandering, which seem much more stable over time than waking. Following are histograms of the number of percepts active for each frame in 84,990 and 256,574 frame tests. In these tests exogenous activation was disabled so we can look directly at the output of the predictor.
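The histogram itself is just a tally of per-frame activation counts. A minimal sketch, with toy boolean state vectors standing in for the dumped state data and a hypothetical function name:

```python
from collections import Counter

def activation_histogram(states):
    """For each frame's boolean state vector, count the active percepts,
    then tally how many frames had each activation count."""
    return Counter(sum(frame) for frame in states)

waking = [[1, 1, 0, 0], [1, 0, 1, 0], [0, 1, 1, 1]]
dreaming = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1]]
print(activation_histogram(waking))
print(activation_histogram(dreaming))
```

Two such histograms can match closely, as observed, even when the underlying dynamics differ: which percepts are active, and how they change over time, is invisible to the count alone.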
I’m currently running a long test of the longest contiguous part of the data-set, but in previous tests (with much shorter training periods) the dreams have been shown to be quite static. Following is a sample frame from one of these runs. In this case the dream is the sum of perceptual activation and predictor feedback:
Now that the prediction feedback mechanism and arousal have been written, I’ve been able to do some early tests to see what the system’s behaviour is like. Right now I’ve only been running short tests of one day/night cycle. So the degree of learning from the predictor is quite low.
This test is implemented as the system is expected to work, where ongoing external stimulus adds noise to the predictor feedback loop. The dynamics are quite simple for now: the three system states (dreaming, mind-wandering and waking) are all discrete and mutually exclusive. Mind-wandering and dreaming are identical, with the current state of activation as the initial input to the predictor; while the system continues to mind-wander or dream, the next state is the predictor output combined with external stimulus activation. Mind-wandering is triggered by a lack of arousal (change over time) in external stimulus, and dreaming is triggered by the circadian clock. As hard thresholds are used to trigger mind-wandering and dreaming, there are some oscillations between states due to the noisy arousal and / or brightness. Following is the system state plotted over time; the subtle dynamics are difficult to see because of the large number of frames.
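The hard-threshold state logic can be sketched as follows. The threshold values and function name are hypothetical, chosen only to illustrate why noisy brightness or arousal hovering near a threshold produces oscillation between states:

```python
def next_state(brightness, arousal, dark_thresh=0.1, arousal_thresh=0.05):
    """Mutually exclusive states from hard thresholds: darkness (the
    circadian trigger) selects dreaming, low arousal selects
    mind-wandering, and everything else is waking."""
    if brightness < dark_thresh:
        return "dreaming"
    if arousal < arousal_thresh:
        return "mind-wandering"
    return "waking"

print(next_state(0.5, 0.2))   # bright scene, lots of change
print(next_state(0.5, 0.01))  # bright scene, almost static
print(next_state(0.05, 0.2))  # dark scene
```

With noisy inputs near either threshold, successive frames flip across the boundary, which is exactly the oscillation seen in the state plot; hysteresis or smoothing would be the usual remedy.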
In preparation for the dreaming and mind-wandering states, I was looking at measures of system dynamics that could be used to trigger mind-wandering. At first I used the number of clusters that changed state (present or not present in the current frame), to see if there was an increase in changes when there was lots of activity. There did seem to be some link, but this measure also produced lots of spurious activity, showing activity even when the images appeared static. Since this is quite abstract information, I went down a level and tracked the sum of the distances between clusters and newly segmented pixel regions. On examination, that seems an even worse indicator of changes in the input frames. So I’m going down another level and will simply sum the pixel values of the absolute difference between subsequent frames.
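The lowest-level measure is just frame differencing. A minimal sketch, with flat grayscale lists standing in for real frames and a normalization by pixel count added so the value is comparable across resolutions (that normalization is my assumption, not necessarily what the system does):

```python
def arousal(prev, cur):
    """Mean absolute pixel difference between subsequent frames:
    0 for identical frames, larger when more pixels change."""
    return sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)

static = [10, 10, 10, 10]
moved = [10, 200, 10, 10]
print(arousal(static, static))  # no change at all
print(arousal(static, moved))   # one pixel changed a lot
```

Unlike the cluster-level measures, this responds to nothing when the input is truly static, which is the property the mind-wandering trigger needs.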
Now that it looks like segmentation and clustering are working, I’m starting to implement the system dynamics that will generate images in mind-wandering and dreaming. As a first step in this process I wanted to run the longest contiguous set of frames I have. Despite the memory leak persisting, the system was able to process this number of frames. Following is the debug output of the run:
I realized that after so much work I have not been able to see how clusters behave over time. So I finally took a day to write a first crack at a rudimentary OpenGL renderer for DM3. Up to this point I was just using OpenCV functions to dump percepts and state data to disk, and then reconstructing images. In the final work, I’ll need to do this rendering anyhow, and it does seem to be working (i.e. working with the threaded application!). So I ran a 16,000 frame test where a perceptual frame was rendered for each input frame. The video runs at 30fps, but frames were captured at 1fps, so objects move really quite quickly.
The increasing jitter is due to the clustering process, where a finite number of clusters must reflect continuously shifting sense data, and to the temporal instability of segmented region edges. The weights for clustering are such that merged clusters are 25% new stimulus and 75% previous stimulus. Looking at this video it seems they should be even softer; I’ll next try 15% new and 85% existing. In this test the max number of clusters is 2000. The perceptual rendering is the unprocessed stimulus image in the background with the perceptual clusters drawn on top at 75% opacity. Due to the jitter this seems a bit too strong (too much emphasis on constructive perception), and could be reduced to 50%.
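The 25% / 75% merge is an exponential blend of new stimulus into the existing cluster. A minimal sketch with flat float lists standing in for cluster images (the function name is hypothetical; in the real system this happens per-pixel on `cv::Mat`s):

```python
def merge(cluster, stimulus, alpha=0.25):
    """Blend a newly segmented region into an existing cluster:
    alpha * new stimulus + (1 - alpha) * previous cluster."""
    return [alpha * s + (1 - alpha) * c for c, s in zip(cluster, stimulus)]

c = [0.0, 100.0]
c = merge(c, [100.0, 100.0], alpha=0.25)
print(c)  # first pixel moves a quarter of the way toward the stimulus
```

Dropping alpha to 0.15, as proposed, simply makes each new stimulus pull the cluster less, trading responsiveness for temporal smoothness.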
After some tweaks of the segmentation and clustering code, it seems we have something that won’t turn to mud after a few days. Consider the image from the previous post as reference when considering the following images:
This is the longest test in some time where the percepts are actually dumped to disk so we can take a look at them. Callgrind indicated that my inline weighting (just using the * operator on cv::Mats) was using 30% of the CPU time of the whole program; switching to the addWeighted() function, along with other optimization, got the 7s per-frame time down to ~3s, making longer tests more feasible on this machine. The bad news is that the trend toward more ephemeral clusters seems to continue, and after 100,000 frames all percepts are unreadable mud:
The idea for the fix is to switch from CIELuv to HSV, and to threshold the masks so they only calculate the mean colour for the small area that corresponds to the most recently clustered mask. Currently the raw mask is used and interpreted as binary, so it’s likely that most of the image is selected by the mask, increasing muddiness.
Over the weekend I ran a 30,000 frame test, thus far the longest test running the predictor and the integrated segmentation and clustering system. The temporal instability has led to many percepts ending up extremely ephemeral. Following is an image that shows all percepts after 30,000 frames, rendered on top of one another on a white background:
After implementing the predictor in the main DM program, I had the chance to run the system and then dump percepts to give some form to the ML-generated output previously posted. The results were quite weak. It appears that there is simply too much information to be encapsulated by the small number (~2000) of percepts in the system. The first issue was the way percepts had a tendency to crawl in from the edges due to the centring of clusters. I resolved this by treating percepts on edges differently, so they merge while remaining anchored to the edges. Additionally, the percepts’ averaging of constituent regions was weighted to emphasize the most recent stimulus (something like 75-85% weighting of current stimulus). This made percepts appear much more stable (over time) than they actually were. In short, a very unstable cluster was represented by a highly specific image. The idea was that the presentation of the percepts would be recognition, and in perception the display would show a reconstruction of sensory data from clusters alone. The result was very little correlation between the reconstruction and the sensory information:
After the previous tests I’ve gotten a better sense of the prediction problem. We realized there may not have been enough data in my previous tests to get sufficient training (corresponding to a few days of the full system processing). Additionally, I found a few issues with the segmentation code that could have changed the behaviour of clusters over time. I took the 6 days necessary to train on a new data-set. The full data-set is composed of only the daytime periods (including sunset and sunrise), and includes approximately 300,000 frames. In 6 days I processed approximately half the set, 150,000 frames. Note that this is actually significantly more data (for the same number of frames) compared to previous examples, since those included night frames. Following is the resulting error from the same MLP learning procedure as used previously, presenting each pattern only once without repeated epoch training, and reporting the error after each iteration: