Dreaming Machine: Perception - Synthesis
Technical Algorithm (including mapping! Go through it with Steven.)
Define the ROI via background subtraction, with a fixed aspect ratio (object-centric approach).
A non-object-centric approach would be simply to feed some abstraction of the context into the SOM, for example a histogram or some other statistical summary of the whole image. Is this important? The objects already contain a small amount of context from the cropping; is this sufficient? What does the science say about this combination of object + context, considering the new ventral/dorsal model?
Which stored image has the most similar edge detection & histogram?
(Should the position of the object (CoM?) also be fed into the SOM? See above, object-centrism vs. statistical approaches.)
- From dream theory (Tononi): the impossible does appear in dreams, so the position of these objects in dreams need not be related to their position as captured.
- Children do not have complex visual-imagination skills, resulting in their dreams being largely static; they do not transform. (But images are constructed, just not moving images; need more details.)
- What is the best distance measure for histogram + edge images?
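As a starting point for the distance-measure question, here is a sketch of three candidates, assuming the histograms are normalized 1-D arrays and the edge images are same-sized binary maps (the function names and choices are mine, not settled decisions):

```python
import numpy as np

def chi2_distance(h1, h2, eps=1e-10):
    # Chi-square distance between histograms; emphasizes small-bin differences.
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def bhattacharyya_distance(h1, h2):
    # 0 for identical normalized histograms, 1 for fully disjoint ones.
    return 1.0 - np.sum(np.sqrt(h1 * h2))

def edge_hamming(e1, e2):
    # Fraction of pixels where two binary edge maps disagree.
    return np.mean(e1.astype(bool) != e2.astype(bool))

h1 = np.array([0.5, 0.5, 0, 0, 0, 0, 0, 0])
h2 = np.array([0, 0, 0.5, 0.5, 0, 0, 0, 0])
print(chi2_distance(h1, h1))           # 0.0: identical histograms
print(bhattacharyya_distance(h1, h2))  # 1.0: no overlapping bins
```

A combined measure could be a weighted sum of a histogram distance and the edge Hamming distance; the weighting itself is an open question.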
The "context" (place) SOM for the dorsal stream could be trained simply on the origin and size of the crops, rather than on the images (or both?). A proper learning method may create a topological map of where objects are, which could be used in the attentional system (future camera direction?).
Can the machine only be "conscious" of objects, but intuitively aware of their positions? Does it make sense to "remember" the positions of all objects in memory?
After reading Goodale, it is clear that according to this theory the function of the dorsal stream is of little interest to this project. I wonder what aspect of this could be kept if I were to map my idea of the spatial/context processing from the dorsal stream onto the hippocampus?
- SOM alternative:
- If the distance between the new input and the BMU is greater than a threshold, create a new memory unit using the input's pattern; if the distance is smaller, update the BMU with the input. The clusterer could even be initialized with a single unit whose codebook is the first input; for each subsequent input, the threshold would decide whether that unit is updated or a new unit is created.
- Even when using a GSOM, the problem here is creating new images (new buffer slots?).
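The threshold-based alternative above could be sketched as follows (the vector representation, threshold, and learning rate are placeholder assumptions):

```python
import numpy as np

class ThresholdClusterer:
    """Grow-when-needed alternative to a SOM: start empty, create a new
    unit whenever the best-matching unit (BMU) is farther than `threshold`,
    otherwise nudge the BMU toward the input."""

    def __init__(self, threshold=1.0, lr=0.1):
        self.threshold = threshold
        self.lr = lr     # learning rate for BMU updates
        self.units = []  # codebook vectors

    def present(self, x):
        x = np.asarray(x, dtype=float)
        if not self.units:
            self.units.append(x.copy())  # first input becomes first codebook
            return 0
        dists = [np.linalg.norm(x - u) for u in self.units]
        bmu = int(np.argmin(dists))
        if dists[bmu] > self.threshold:
            self.units.append(x.copy())  # novel input: new memory unit
            return len(self.units) - 1
        self.units[bmu] += self.lr * (x - self.units[bmu])  # familiar: update BMU
        return bmu

clus = ThresholdClusterer(threshold=0.5)
print(clus.present([0.0, 0.0]))  # 0: first unit created
print(clus.present([0.1, 0.0]))  # 0: close to unit 0, updates it
print(clus.present([2.0, 2.0]))  # 1: far away, new unit created
```

This sidesteps the fixed-grid problem of a standard SOM, at the cost of losing the topological neighbourhood structure.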
- Can a person dream of a context that fuses elements of other contexts? Could the "background" context of the dream be a combination of contextual images, just as the foreground objects could be combinations of objects? (Analogous to the "hybrid" characters in dreams? For this system there are no people, so are objects and contexts its characters?)
- for each input image: (Retinal Stimulus)
- accumulate into background reference (contextual mapping)
- do background subtraction (Attention)
- get the coords of the bounding box around the object. (Attention)
- crop image (Attention)
- abstract into edge + hist (Perirhinal representation)
- feed into clusterer (SOM?) (Perirhinal Perception and memory)
- accumulate images corresponding to cluster units. (Perirhinal Perception and memory)
- crop background reference (same attentional act as object focus)
- record position + hist? (posterior hippocampal representation)
- feed into clusterer (SOM?) (posterior hippocampal perception and memory) Do I need a clusterer if I already know the exact position? Maybe just a topographical map?
- record one image & location for each attentional act (posterior hippocampal perception and memory)
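The per-image steps above can be sketched in NumPy alone, on a grayscale frame; in the actual pipeline OpenCV calls (accumulateWeighted, findContours) would replace the hand-rolled parts, and the adaptation rate and threshold here are placeholder assumptions:

```python
import numpy as np

ALPHA = 0.05  # background adaptation rate (placeholder value)

def process_frame(frame, background):
    """One pass of the steps above on a grayscale float frame."""
    # accumulate into background reference (contextual mapping)
    background += ALPHA * (frame - background)
    # background subtraction (attention)
    mask = np.abs(frame - background) > 30.0
    if not mask.any():
        return background, None
    # coords of the bounding box around the object (attention)
    ys, xs = np.nonzero(mask)
    y0, y1, x0, x1 = ys.min(), ys.max() + 1, xs.min(), xs.max() + 1
    # crop image, and crop the background reference (same attentional act)
    obj_crop = frame[y0:y1, x0:x1]
    ctx_crop = background[y0:y1, x0:x1]
    # abstract into hist (perirhinal representation); edges omitted for brevity
    hist, _ = np.histogram(obj_crop, bins=16, range=(0, 256))
    # record position for the place/context memory (posterior hippocampal)
    position = (int(x0), int(y0), int(x1 - x0), int(y1 - y0))
    return background, {"hist": hist, "object": obj_crop,
                        "context": ctx_crop, "position": position}

bg = np.zeros((64, 64))
frame = np.zeros((64, 64))
frame[20:30, 20:30] = 255.0  # a bright moving "object"
bg, percept = process_frame(frame, bg)
print(percept["position"])  # (20, 20, 10, 10)
```

The percept dict is what would be fed to the two clusterers: hist + edges to the perirhinal/ventral one, position + context crop to the hippocampal/dorsal one.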
Science Basis: (recheck this)
- The aim of this system is to collect sensory materials for the dreaming machine, but could also stand on its own as a meta-creative system capable of generating aesthetically interesting images.
- Structure of the visual system (Horton & Sincich, 2004)
- Retina -> Primary VC
- Spatial mapping w/ little lateral inhibition (Hubel and Wiesel, 1979, 2004)
- Cortical columns are tuned to contrasting edges of various lengths and orientations. (Hubel, Wiesel & Stryker, 1977; Martinez & Alonso, 2003)
- VC as the diffusion (rather than abstraction) of data. (no ref)
- Primary VC -> Secondary VC
- Secondary cortex is more specialized in how it responds to stimuli with specific features. (no ref, Pinel?)
- Ventral Secondary VC (TEO/TE?) -> Medial Temporal Lobe (MTL)
- Integration of perception and memory.
- Segregation of spatial and object-oriented visual processing.
- Application of current models in art.
- Mapping between elements of algorithm and visual system. (Added mapping to algo discussion. What is here that still needs to be said?)
- The system's sensory apparatus is a fixed monocular camera that does not have a foveal/periphery distinction.
- The image supplied by the camera is the stimulus. It is considered analogous to the signal sent from the retina.
- The retinal-geniculate-striate system is completely absent from the system. Images are perfectly recreated at the primary visual cortex.
In biological systems, saccades of the fovea are combined in a short-term buffer which stores an image combining details from multiple sensory impressions (signals) from the retina. In this system, every image presented is accumulated in a buffer. This buffer stores the background from which objects are extracted, and adapts slowly to absorb changes in the environment. This seems quite different from a biological visual buffer.
- Early VC processing is implemented in opencv using simple adaptive background subtraction.
This results in passing both the stimulus and the background reference onto the ventral stream.
This buffer is used in the selection of objects through background subtraction.
Background subtraction selects objects from the visual field. These objects tend to be moving, as static objects are ignored due to the adaptive background. The purpose of the background selection is to focus attention on particular objects. The input images are cropped around these objects. This process is analogous to a fixation of the fovea on an object.
Object representations are created and stored in an area corresponding to the perirhinal cortex.
As each image is cropped, the location of the object is associated with a corresponding crop of the adaptive background. This data is clustered, and corresponds to the dorsal stream, serving as a spatial map of memory.
Each cropped object is abstracted into an RGB histogram and an edge-detected image. This abstraction is used to cluster objects seen across various images/stimuli. This corresponds to the ventral stream of the visual cortex, where images are abstracted into classes of objects.
Objects and places are stored in separate memory systems (the ventral and dorsal streams, respectively).
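The histogram + edge abstraction of each crop might look like the following NumPy-only sketch; in the actual OpenCV pipeline, cv2.calcHist and cv2.Canny would be the natural calls, so treat these hand-rolled versions (and the bin count and gradient threshold) as stand-in assumptions:

```python
import numpy as np

def abstract_object(crop):
    """Reduce an RGB crop (H x W x 3, uint8) to a colour histogram and an
    edge map, the two ventral-stream abstractions described above."""
    # RGB histogram: one 16-bin histogram per channel, concatenated,
    # normalized so crops of different sizes are comparable.
    hist = np.concatenate([
        np.histogram(crop[..., c], bins=16, range=(0, 256))[0]
        for c in range(3)
    ]).astype(float)
    hist /= hist.sum()
    # Crude edge map: thresholded gradient magnitude of the grayscale image
    # (cv2.Canny would be the real choice).
    gray = crop.mean(axis=2)
    gy, gx = np.gradient(gray)
    edges = np.hypot(gx, gy) > 32.0
    return hist, edges

crop = np.zeros((32, 32, 3), dtype=np.uint8)
crop[8:24, 8:24] = (255, 0, 0)  # a red square on black
hist, edges = abstract_object(crop)
print(round(hist.sum(), 6))  # 1.0: histogram is normalized
print(edges.any())           # True: edges at the square's border
```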
- Why not more biologically inspired computer vision?
- The inherent parallelism of visual processing in the brain is not well suited to serial computational implementation.
- OpenCV makes traditional computer-vision algorithms accessible to artists, in particular via OpenCV hooks for Max/Flash/Pd/Python etc.
- This project is meant to be a component of a larger project, "Dreaming Machine", and therefore deeply biologically plausible techniques are inappropriate, both because of the contradiction between serial and parallel processing and due to the non-real-time nature of accurate biological simulation.
- The system uses mechanisms that are vaguely analogous to biological mechanisms. The edge-detection algorithm is the one included with OpenCV, and corresponds roughly to biological edge detection.
- The colour histogram is a fast statistical method to capture the gist of the colour values in an image, and may be implicated in quick scene classification in biological systems. (Rapid Biologically-Inspired Scene Classification Using Features Shared with Visual Attention.)
- Future Work
- Dreaming Machine will implement a model of dreaming in which memories of objects and spaces are combined into simple visual compositions, inspired by the detrimental effect on dreams of lesioning Brodmann's area 40.
- Circadian model of sleeping, with visual light entrainment.
- A pan/tilt camera would allow a greater variation of objects and contexts fed into memory.
- A continuous online learning system that provides topological organization of memories. (SNN? ART? RBF?)
The role of context in object recognition
Oliva & Torralba, 2007
- Background elements provide useful information in object recognition.
- Contextual Influences on object recognition
- Objects in a familiar background are detected more accurately and quickly than objects in inconsistent scenes.
- "Average images aligned on a single object can reveal additional regions beyond the boundaries of the object that have a meaningful structure." p520
- The effects of context
- Components of context meaningful in object recognition:
- semantics, spatial organization, and pose (orientation)
- Context most significant in "glance" (<200ms exposure) object recognition.
- Framework of contextual influences: refs 11, 18-20
- The presentation of a context scene primes the mind-brain to consider subsequent objects in that context. The recognition of a loaf of bread is faster when presented after an image of a kitchen counter, and slower when presented after a bass drum.
- This can also change the recognition of the background suggesting a "mutual" influence between contexts and objects.
- Implicit learning of contextual cues
- Human subjects have been shown to learn contextual relations between arbitrarily associated objects.
- The transfer of contextual cues is not impaired by stretching (ref 28) but significantly impaired by changes in viewpoint.
- factors affecting speed of recognition:
- co-occurrence (two objects in one scene) may happen at multiple levels:
- Kitchen predicts stove (macro)
- Nightstand predicts alarm clock (local)
- Associations can be definite or probabilistic
- Memory search vs Visual search
- Observers may act on an object in a consistent manner, or not. Q: What does this mean?
- Perception of sets and summary statistics
- If we consider object-object relations as context, then we assume the atoms of perception are objects.
- An alternative is the statistical properties (i.e. histograms?) refs 40-44
- mean size and variance of objects
- textural descriptors
- depth / perspective
- Could these be useful for DM? Does a combination of statistical and object-centric approaches make sense for this project?
- Ref 48 discusses computer vision based on the statistical approach.
- Global context: insights from computer vision
- Slide/scale a window around the image and determine if the target is in one of these windows. (ref 49)
- A major problem is compact representations of context suitable for computation.
- One method is the use of "aggregated statistics of low-level features" refs 11,18,50.
- These are "state-of-the-art" features of scene and context-based object recognition.
- scene recognition: refs 50, 52-56
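The sliding/scaling-window idea (ref 49) reduces to a generator like this (window size and step are arbitrary here, and the detector that would score each window for the target is omitted):

```python
import numpy as np

def sliding_windows(image, win, step):
    """Yield (x, y, window) crops covering the image; a detector would then
    score each window for the target (roughly ref 49's approach)."""
    h, w = image.shape[:2]
    for y in range(0, h - win + 1, step):
        for x in range(0, w - win + 1, step):
            yield x, y, image[y:y + win, x:x + win]

img = np.zeros((8, 8))
windows = list(sliding_windows(img, win=4, step=2))
print(len(windows))  # 9: a 3 x 3 grid of 4 x 4 windows
```

The "compact representation of context" problem is exactly what each window would be reduced to (e.g. the aggregated low-level statistics of refs 11, 18, 50) before scoring.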
- Contextual effects on eye movements
- The global scene directs the eye to target objects. Is this relevant when the movement is not purposeful (as in DM), or only once DM becomes purposeful?
- Saliency maps are good predictors of where a viewer will fixate.
Object vision and spatial vision: two cortical pathways
M. Mishkin, L.G. Ungerleider, and K.A. Macko
- There are two hierarchical branches in the visual system, the ventral (occipital-temporal) and the dorsal (occipital-parietal).
- The ventral is associated with "object vision" and the dorsal associated with "spatial vision"
- Two pathways
- Links from the ventral stream project into the ventral temporal lobe (limbic system), and the ventral frontal lobe, "may make possible the cognitive association of visual objects with other events such as emotions and motor acts." p1
- The dorsal stream is critical for the "visual location of objects" (ref 40)
- Links from the dorsal stream project into the dorsal parietal, and dorsal frontal lobes "may enable the cognitive construction of spatial maps, as well as the visual guidance of motor acts triggered by the ventral pathway."
- Ventral is modality specific, frontal dorsal is multimodal (polysensory).
- Object Vision
- Extraction of qualities of stimulus for identification, and assigning it meaning through interaction with limbic (emotion?) and frontal cortex.
- Qualities: Size, colour, texture and shape
- TE as the integration of these features leading to object recognition.
- Removal of this area in monkeys lead to "...impairment both in the retention of visual discrimination...and in the postoperative acquisition of new ones." p2
- The impairment is worse for memory than for perceptual ability, so this area is correlated with visual memory of objects.
- "...area TE contains the traces laid down by previous viewing of stimuli, and these serve as stored central representation against which incoming stimuli are constantly being compared."
- Spatial Vision
- Lesioning the dorsal stream in the parietal lobe produced impairment not of object recognition, but of the ability to locate the recognized object in the visual field.
- "...[T]he posterior parietal cortex seems to be concerned with the perception of the spatial relations among objects, and not their intrinsic qualities." p2
- Impairment of the landmark task was correlated with the size of the lesion area, not its location in the inferior parietal cortex. What does this mean for how spaces are organized?
- Foveal areas are more correlated with the ventral stream.
- Both foveal and peripheral areas are correlated with dorsal stream. Does this mean that just the object should be fed into the ventral, and object and context into the dorsal?
- Metabolic and anatomical mapping
- Objects in spatial locations
- How are these two streams integrated?
- limbic & frontal systems?
- Dorsal and Ventral Streams of the visual system:
- Objects vs Context
- Multiple channel abstraction -> Reduction of dimensionality -> Classification MRC
- Biological model of attention and basic abstraction:
- Significance of Context:
- Model Fitting (transforming images with same/similar features for registration) RANSAC:
- Octave Code: http://www.csse.uwa.edu.au/~pk/research/matlabfns/
- M.A. Fischler and R.C. Bolles. "Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography". Communications of the ACM, Vol. 24, No. 6, pp. 381-395, 1981
- Richard Hartley and Andrew Zisserman. "Multiple View Geometry in Computer Vision". pp 101-113. Cambridge University Press, 2001
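For reference, here is a minimal RANSAC loop in the spirit of Fischler & Bolles, fitting a 2-D line rather than the image-registration models above (the tolerance, iteration count, and seeded RNG are arbitrary choices for the sketch):

```python
import numpy as np

def ransac_line(points, iters=200, tol=0.5, rng=None):
    """Minimal RANSAC fit of a line y = m*x + b: repeatedly fit to a random
    minimal sample and keep the model with the most inliers."""
    if rng is None:
        rng = np.random.default_rng(0)
    best_model, best_inliers = None, 0
    for _ in range(iters):
        (x1, y1), (x2, y2) = points[rng.choice(len(points), 2, replace=False)]
        if x1 == x2:
            continue  # vertical sample; skip for this simple parameterization
        m = (y2 - y1) / (x2 - x1)
        b = y1 - m * x1
        residuals = np.abs(points[:, 1] - (m * points[:, 0] + b))
        inliers = int((residuals < tol).sum())
        if inliers > best_inliers:
            best_model, best_inliers = (m, b), inliers
    return best_model, best_inliers

# 20 points on y = 2x + 1, plus 5 gross outliers.
xs = np.arange(20.0)
inlier_pts = np.column_stack([xs, 2 * xs + 1])
outliers = np.array([[0, 30], [5, -10], [10, 40], [15, -5], [3, 25]], float)
(m, b), n = ransac_line(np.vstack([inlier_pts, outliers]))
print(round(m, 2), round(b, 2), n)  # 2.0 1.0 with 20 inliers
```

For registration the same loop would fit a homography from 4-point samples instead of a line from 2-point samples, as in the Hartley & Zisserman reference.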
- Inspired by Hippocampus
- Inspired by Perirhinal Cortex
M.A. Goodale & D.A. Westwood
- Dorsal Stream: Visual control of action
- Ventral Stream: Perception of the visual world
- Visual perception also for visually guided motor control
- Vision for perception (seeing to see) vs vision for action (seeing to move)
- D.F. has a lesion in her ventral stream. She cannot see objects consciously, but when asked to reach and grasp she changes the size of her grasp to match the unseen object, and is able to grasp it.
- Different visual processing for perception and action
- Both dorsal and ventral streams process both the features and locations of objects.
- In the ventral stream object recognition is dependent on seeing the same object in relative terms (in different contexts) as the same.
- As in your cropping method.
- In the dorsal stream objects need to be seen in absolute terms (situated in context?) in order for their actual positions to be known.
- What is "egocentric coding"?
- In an experiment the subject was told to grasp an object, the object was moved between saccades, and the subject reported conscious awareness that the object has not moved. During the reach the subject changed hand position to grasp the object in its new position.
- This appears to be old Leibniz stuff: the idea that consciousness takes more time to process than action/reflex. The core of the argument is that you can have action without perception and perception without action, so they must be independent parallel processes. If we take (slow) consciousness out of the picture, what is the difference between visual processing for action and for recognition?
- "...object-based perceptual machinery has to be initially engaged to parse the scene in which the action is embedded."
- If the ventral stream is required, then they are not parallel! Ref 34
- How can fast action visual processing depend on slow object vision?
- What is the activity in the dorsal stream during dreaming?
- Visual illusions: demonstrating a dissociation between perception and action
- Some visual illusions do not impact action vision.
- "...the visuomotor system computes absolute (i.e. Euclidean) object metrics, whereas the perceptual system utilizes scene-based (i.e. non-Euclidean) metrics."
- Hist (scene based?): ventral. edge (absolute position based?): dorsal?
- Visual illusions affect early rather than late vision.
- What does "refractory" mean in this context?
- Visual illusions: refining the perception–action hypothesis
- Other illusions do affect action vision.
- Illusions arising from early vision should affect both action vision and perceptual vision; illusions arising from later vision would affect only perceptual vision.
- There is evidence that the subject remembers the features of the object in the ventral stream, which are used by the dorsal stream in motor control. That is, dorsal processing may use ventral-stream memory.