Dreaming Machine #3: Perception - Synthesis (IAT888)
Rough Ideas/Notes
Define ROI (background subtraction) fixed aspect ratio (Object-centric approach)
A non-object-centric approach
would be simply to feed some abstraction of the context into the SOM
(for example a histogram of the whole image, or some other statistical
summary of the whole image). Is this important? The objects already contain a small amount of context from the cropping; is this sufficient? What does the science say about this combination of object + context, considering the new ventral/dorsal model?
Which stored image has the most similar edge detection & histogram?
(Should the position of the object (CoM?) also be fed into the SOM? See above: object-centrism vs. statistical approaches.)
- From dream theory
(Tononi): the impossible does appear in dreams, so the position of
these objects in dreams need not be related to their position as
captured.
- Children do not have complex visual imagination
skills, so their dreams are largely static and do not
transform. (Images are still constructed, just not moving images. Need more details.)
- What is the best distance measure for hist + edge images?
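One possible answer to the distance question, as a minimal sketch rather than a settled choice: combine a chi-square distance on the histograms with the fraction of disagreeing pixels in the edge maps. The function names, the equal weighting (`w_hist=0.5`) and the assumption that edge maps are binary and already resized to a common size are all hypothetical, not taken from these notes.

```python
def chi_square(h1, h2, eps=1e-12):
    """Symmetric chi-square distance between two non-empty histograms.

    Both histograms are normalised to sum to 1 first, so the result
    lies between 0 (identical) and roughly 1 (no overlap).
    """
    s1, s2 = sum(h1), sum(h2)
    p = [v / s1 for v in h1]
    q = [v / s2 for v in h2]
    return 0.5 * sum((a - b) ** 2 / (a + b + eps) for a, b in zip(p, q))


def edge_mismatch(e1, e2):
    """Fraction of pixels at which two same-sized binary edge maps disagree."""
    flat1 = [v for row in e1 for v in row]
    flat2 = [v for row in e2 for v in row]
    return sum(a != b for a, b in zip(flat1, flat2)) / len(flat1)


def combined_distance(hist_a, edges_a, hist_b, edges_b, w_hist=0.5):
    """Weighted sum of the two distances; w_hist is an assumed tuning knob."""
    return (w_hist * chi_square(hist_a, hist_b)
            + (1.0 - w_hist) * edge_mismatch(edges_a, edges_b))


def most_similar(query, stored):
    """Index of the stored (hist, edges) pair closest to the query pair."""
    q_hist, q_edges = query
    return min(range(len(stored)),
               key=lambda i: combined_distance(q_hist, q_edges, *stored[i]))
```

Because both terms are scaled to the same range, the weight can be read directly as how much colour matters relative to shape. Whether chi-square is actually better than, say, Euclidean distance for this material would need to be tested on real crops.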
The "context" (place) SOM for the dorsal
stream could be trained simply on the origin and size of the crops,
rather than the images (or both?). A proper learning method might then
create a topological map of where objects are, which could be used in
the attentional system (future camera direction?).
Can the machine only be
"conscious" of objects, but intuitively aware of their positions? Does
it make sense to "remember" the positions of all objects in memory?
After reading Goodale, it is clear that
according to this theory the function of the dorsal stream is of little
interest to this project. I wonder what aspect of this could be kept if
I were to map my idea of the spatial/context processing from the dorsal
stream to the hippocampus?
- SOM alternative:
- If the distance between the new input and the BMU is
greater than a threshold, then create a new memory unit, using the
pattern of the input. If the distance is smaller, then update the BMU
with the input. It could even be initialized with a single unit. Its
codebook would always be the first input. For the next input the
threshold would define if that unit is updated or if a new unit created.
- Even if using a GSOM, the problem here is creating new images (new buffer slots?)
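The thresholded SOM alternative described above can be sketched as follows. This is only an illustration under assumptions: the class name `ThresholdMemory`, the Euclidean distance, and the `learning_rate` used to move the BMU toward the input are hypothetical choices; the notes only specify the create-or-update rule. The `counts` list stands in for the per-unit image accumulation.

```python
import math


class ThresholdMemory:
    """Grows a codebook: far inputs create new units, near inputs update the BMU."""

    def __init__(self, threshold, learning_rate=0.1):
        self.threshold = threshold          # max distance still counted as "same" memory
        self.learning_rate = learning_rate  # assumed update step toward the input
        self.units = []                     # codebook vectors
        self.counts = []                    # number of inputs absorbed by each unit

    def _add_unit(self, x):
        self.units.append(list(x))
        self.counts.append(1)
        return len(self.units) - 1

    def present(self, x):
        """Present one input vector; return the index of the unit that now holds it."""
        if not self.units:
            # The first input becomes the codebook of the single initial unit.
            return self._add_unit(x)
        dists = [math.dist(x, u) for u in self.units]
        bmu = min(range(len(dists)), key=dists.__getitem__)
        if dists[bmu] > self.threshold:
            return self._add_unit(x)
        lr = self.learning_rate
        self.units[bmu] = [u + lr * (xi - u) for u, xi in zip(self.units[bmu], x)]
        self.counts[bmu] += 1
        return bmu


memory = ThresholdMemory(threshold=1.0, learning_rate=0.5)
for vector in ([0.0, 0.0], [0.2, 0.0], [5.0, 5.0]):
    memory.present(vector)
```

Unlike a SOM, this scheme has no neighbourhood function, so it gives no topological ordering of the units. The number of units is also unbounded, which is the buffer-slot problem noted above.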
- Can a person dream of a context that fuses elements of other contexts?
Could the "background" context of the dream be a combination of
contextual images? Just as the foreground objects could be combinations
of objects. (Analogous to the "hybrid" characters in dreams? For this
system there are no people, so are objects and contexts its characters?)
Piaget related ideas:
In this case all actions are meaningful,
likely based on the physical requirements of the child, affection,
food, etc. For the system to act meaningfully toward an object in its
context it needs some connection with that object, like it fulfilling
some need of the machine.
The bad background subtraction contours are
a misinterpretation of what is an object. How could accidentally
finding an object lead to a reinforcing pattern that would make it
easier for the system to find objects? This would likely require a
"diffusion" style visual system that would integrate many thousands of
representations from which to extract features (at different levels of
abstraction) of what a person, or car is.
- Can DM remember its imaginations? If so, through the same perceptual mechanism, or a different one?
Could the DM be sensitive to the movement of objects by using an MHI (motion history image) representation alongside the other representations?
Could causal relations (self-movement) be determined from these representations?
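As a rough illustration of what an MHI representation holds, here is a minimal sketch of the usual update rule: pixels that moved in the current frame are stamped with the current time, and stamps older than a fixed duration are cleared. The function name, the nested-list image format and the `duration` value are assumptions for the example; the motion mask is assumed to come from something like the background subtraction step.

```python
def update_mhi(mhi, motion_mask, timestamp, duration):
    """Update a motion history image in place and return it.

    mhi         -- 2D list of floats; 0.0 means "no recent motion"
    motion_mask -- 2D list of 0/1 flags for pixels that moved in this frame
    timestamp   -- current time (must be > 0 so that 0.0 can mean "empty")
    duration    -- how long a motion stamp is remembered
    """
    for y, row in enumerate(motion_mask):
        for x, moving in enumerate(row):
            if moving:
                mhi[y][x] = timestamp
            elif mhi[y][x] < timestamp - duration:
                mhi[y][x] = 0.0
    return mhi


# A one-row "image" in which a single pixel of motion travels left to right.
mhi = [[0.0, 0.0, 0.0]]
masks = [[[1, 0, 0]], [[0, 1, 0]], [[0, 0, 1]]]
for t, mask in enumerate(masks, start=1):
    update_mhi(mhi, mask, float(t), duration=2.0)
```

After the loop the stamps increase from left to right, so the direction of the gradient encodes the direction of movement. That gradient is the kind of signal from which movement relations might be read.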
- Since the spatial pathway holds positions
in space, what happens during a dream when those representations are
activated? Maybe the camera moves to look at those locations, as in REM
sleep.
- Try HSV or LAB or some other colour model for histogram abstraction (or haar-like features instead?)
What if the system chose for itself what
features of an object are important? It seems clear that the brain
extracts an unimaginable number of features from the world, at
different levels of abstraction, and that attention is the mechanism by
which these features are filtered. If perception is the world impacting
the agent, then is consciousness/attention the wilful manipulation of
those impacted representations?
Could a measure of the
specificity of each prototype (how well it refers to a single image) be
used to change them into differentiated signifiers? A simple measure
could just be the number of images in each accumulation, taken as its
level of specificity.
- Orientation: If you can get the angle of a
BBox in OpenCV, the angle can be normalized by rotating the image. This
should make all images of the same object upright, or at least give them
the same orientation. Can you rotate a whole image in OpenCV and then crop it
square?
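The rotate-then-crop idea can be sketched without any library, to show the geometry involved. This is an assumption-laden illustration: the orientation is estimated from image moments of a binary mask rather than from a bounding box, rotation uses nearest-neighbour sampling, and all function names are hypothetical. In OpenCV the corresponding calls would probably be `cv2.minAreaRect`, `cv2.getRotationMatrix2D` and `cv2.warpAffine`, but the angle convention of `minAreaRect` should be checked for the installed version.

```python
import math


def principal_angle(mask):
    """Angle (radians) of the major axis of the non-zero pixels, from image moments."""
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    n = len(pts)
    cx = sum(x for x, _ in pts) / n
    cy = sum(y for _, y in pts) / n
    mu20 = sum((x - cx) ** 2 for x, _ in pts)
    mu02 = sum((y - cy) ** 2 for _, y in pts)
    mu11 = sum((x - cx) * (y - cy) for x, y in pts)
    return 0.5 * math.atan2(2 * mu11, mu20 - mu02)


def rotate(img, angle, fill=0):
    """Rotate img by `angle` radians about its centre (nearest neighbour)."""
    h, w = len(img), len(img[0])
    cx, cy = (w - 1) / 2, (h - 1) / 2
    c, s = math.cos(angle), math.sin(angle)
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: look up the source pixel for each output pixel.
            dx, dy = x - cx, y - cy
            sx = round(cx + c * dx + s * dy)
            sy = round(cy - s * dx + c * dy)
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = img[sy][sx]
    return out


def center_crop_square(img, side):
    """Crop a side x side square from the middle of img."""
    h, w = len(img), len(img[0])
    top, left = (h - side) // 2, (w - side) // 2
    return [row[left:left + side] for row in img[top:top + side]]


def normalise_orientation(mask, side):
    """Rotate so the major axis lies along x, then crop square."""
    return center_crop_square(rotate(mask, -principal_angle(mask)), side)


# A vertical bar in a 5x5 image comes out as a horizontal bar in a 3x3 crop.
bar = [[0, 0, 1, 0, 0] for _ in range(5)]
upright = normalise_orientation(bar, 3)
```

Here the major axis is aligned with the x axis; an "upright" convention would only add a constant quarter turn. Note that a moment-based angle is ambiguous by 180 degrees, so an object and its upside-down version end up in the same orientation.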
Maybe visual prototypes are "perceptual"
schema in the object pathway, while Mandler's image-schema are the
corresponding prototypes in the spatial pathway? See On The Birth and Growth of Concepts
Poor visual acuity in infants: What
if the images taken by the camera start off out of focus (and/or
low-contrast, B+W)? What about a computer vision system that uses
this blurry, simplified input to bootstrap a vision system for
colour/sharp/high-contrast images? See The Cog project: building a humanoid robot
Start the system with the cameras out of focus and have them slowly increase focus during development?
Technical Algorithm: (including mapping!, Go through it with Steven)
- for each input image: (Retinal Stimulus)
- accumulate into background reference (contextual mapping)
- do background subtraction (Attention)
- get the coords of the bounding box around object. (Attention)
- crop image (Attention)
- abstract into edge + hist (Perirhinal representation)
- feed into clusterer (SOM?) (Perirhinal Perception and memory)
- accumulate images corresponding to cluster units. (Perirhinal Perception and memory)
- crop background reference (same attentional act as object focus)
- record position + hist? (posterior hippocampal representation)
- feed into clusterer (SOM?) (posterior hippocampal perception and memory) Do I need a clusterer if I already know the exact position? Maybe just a topographical map?
- record one image & location for each attentional act (posterior hippocampal perception and memory)
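The attentional steps of the list above (background subtraction, bounding box, crop of both the stimulus and the background reference) can be sketched as below. This is a simplified stand-in, not the OpenCV implementation: images are nested lists of grey values, all foreground pixels are assumed to belong to one object, and the function names and `threshold=30` are hypothetical.

```python
def subtract_background(frame, background, threshold=30):
    """Binary mask of pixels that differ from the background reference."""
    return [[1 if abs(f - b) > threshold else 0 for f, b in zip(f_row, b_row)]
            for f_row, b_row in zip(frame, background)]


def bounding_box(mask):
    """Return (x, y, w, h) around all foreground pixels, or None if there are none."""
    xs = [x for row in mask for x, v in enumerate(row) if v]
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not xs:
        return None
    x0, y0 = min(xs), min(ys)
    return x0, y0, max(xs) - x0 + 1, max(ys) - y0 + 1


def crop(img, box):
    """Cut the (x, y, w, h) box out of img."""
    x, y, w, h = box
    return [row[x:x + w] for row in img[y:y + h]]


def attend(frame, background, threshold=30):
    """One attentional act: object crop, matching context crop, and position."""
    box = bounding_box(subtract_background(frame, background, threshold))
    if box is None:
        return None
    return {
        "position": box,                     # to the spatial (place) memory
        "object": crop(frame, box),          # to the object memory
        "context": crop(background, box),    # same crop of the background reference
    }


background = [[10] * 5 for _ in range(4)]
frame = [row[:] for row in background]
for y in (1, 2):
    for x in (1, 2):
        frame[y][x] = 200
act = attend(frame, background)
```

The returned dictionary mirrors the split in the list: the object crop goes on to abstraction and clustering, while the position and context crop are recorded for the spatial side.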
Why?
- The aim of this system is to collect sensory
materials for the dreaming machine, but could also stand on its own as
a meta-creative system capable of generating aesthetically interesting
images.
Science Basis: (recheck this)
- Structure of the visual system (Horton & Sincich, 2004)
- Retina -> Primary VC
- Spatial mapping w/ little lateral inhibition (Hubel and Wiesel, 1979, 2004)
- Cortical columns are tuned to contrasting edges of
various lengths and orientations. (Hubel, Wiesel & Stryker, 1977;
Martinez & Alonso, 2003)
- VC as the diffusion (rather than abstraction) of data. (no ref)
- Primary VC -> Secondary VC
- Secondary cortex is more specialized as to how it responds to stimulus with specific features. (no ref, Pinel?)
- Ventral Secondary VC (TEO/ TE?) -> Medial Temporal Lobe (MTL)
- Contribution
- Integration of perception and memory.
- Segregation of spatial and object-oriented visual processing.
Separation of memory integration in perception with higher level memory integration?
- Application of current models in art.
- Mapping between elements of algorithm and visual system. (Added mapping to algo discussion. What is here that still needs to be said?)
- The system's sensory apparatus is a fixed monocular camera that does not have a foveal/periphery distinction.
- The image supplied by the camera is the stimulus. It is considered analogous to the signal sent from the retina.
- The retinal-geniculate-striate system is completely
absent from the system. Images are perfectly recreated at the primary
visual cortex.
In biological systems, saccades of the fovea
are combined in a short-term buffer, which stores an image combining
details of the multiple sensory impressions (signals) from the retina.
In this system every image presented is accumulated in a buffer. This
buffer stores the background from which objects are extracted, and it adapts slowly to absorb changes in the environment. This seems quite different from a biological visual buffer.
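A slowly adapting buffer of this kind is commonly implemented as a running average. The sketch below assumes that form; the function name and the rate `alpha=0.05` are illustrative, and the notes do not say which update the system actually uses. OpenCV's `cv2.accumulateWeighted` performs the same kind of update on arrays.

```python
def accumulate(background, frame, alpha=0.05):
    """Blend one frame into the background reference.

    Each pixel moves a fraction `alpha` of the way toward the new frame,
    so a small alpha means slow adaptation.
    """
    return [[(1.0 - alpha) * b + alpha * f for b, f in zip(b_row, f_row)]
            for b_row, f_row in zip(background, frame)]


# An object (value 200) appears on an empty background (value 0) and stays put.
background = [[0.0]]
history = []
for _ in range(100):
    background = accumulate(background, [[200.0]])
    history.append(background[0][0])
```

After one frame the background has barely changed, so the object still stands out under subtraction. After 100 frames it has almost fully merged into the background, which is why static objects end up being ignored.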
- Early VC processing is implemented in opencv using simple adaptive background subtraction.
This results in passing both the stimulus and the background reference onto the ventral stream.
This buffer is used in the selection of objects through background subtraction.
Background subtraction selects objects from
the visual field. These objects tend to be moving, as static objects
are ignored due to the adaptive background. The purpose of the
background subtraction is to focus attention on particular objects. The
input images are cropped around these objects. This process is analogous
to a fixation of the fovea on an object.
Object representations are created and stored in an area corresponding to the perirhinal cortex.
As each image is cropped, the location of
the object is associated with a corresponding crop of the adaptive
background. This data is clustered, and corresponds to the dorsal
stream, which in turn corresponds to a spatial map of memory.
Each cropped object is abstracted into an RGB
histogram and an edge-detection image. This abstraction is used to cluster
objects seen in various images/stimuli. This corresponds to the ventral
stream of the visual cortex, where images are abstracted into classes of
objects.
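The abstraction into histogram plus edges could look roughly like the sketch below. It is a stand-in under stated assumptions: the edge detector here is a simple gradient threshold, not the OpenCV detector the system uses, and the bin count, threshold and function names are hypothetical. Crops are assumed to have been resized to a common size so that every feature vector has the same length.

```python
def rgb_histogram(pixels, bins=4):
    """Concatenated R, G, B histograms of 8-bit pixels; each channel sums to 1."""
    flat = [p for row in pixels for p in row]
    hist = [0] * (3 * bins)
    for p in flat:
        for channel, value in enumerate(p):
            hist[channel * bins + value * bins // 256] += 1
    return [h / len(flat) for h in hist]


def to_gray(pixels):
    """Plain channel average; a weighted luminance would also work."""
    return [[sum(p) // 3 for p in row] for row in pixels]


def edge_map(gray, threshold=50):
    """Mark pixels whose right/down intensity difference exceeds the threshold."""
    h, w = len(gray), len(gray[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h - 1):
        for x in range(w - 1):
            gx = gray[y][x + 1] - gray[y][x]
            gy = gray[y + 1][x] - gray[y][x]
            if abs(gx) + abs(gy) > threshold:
                edges[y][x] = 1
    return edges


def abstract(pixels, bins=4, threshold=50):
    """Feature vector for the clusterer: histogram followed by flattened edges."""
    edges = edge_map(to_gray(pixels), threshold)
    return rgb_histogram(pixels, bins) + [v for row in edges for v in row]


# Left half black, right half red: one vertical edge between the halves.
crop_pixels = [[(0, 0, 0), (0, 0, 0), (255, 0, 0), (255, 0, 0)] for _ in range(2)]
features = abstract(crop_pixels, bins=2)
```

Concatenating the two parts lets a single clusterer see both colour and shape. The relative scale of the two parts then acts as an implicit weighting, which may need balancing.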
Objects and places are stored in separate memory systems. (in ventral and dorsal streams)
- Why not more biologically inspired computer vision?
- The inherent parallelism of visual processing in the brain is not well suited to computational implementation.
- OpenCV makes traditional computer vision algorithms
accessible to artists, in particular through OpenCV hooks for
max/flash/pd/python etc.
- This project is meant to be a component of a larger
project "Dreaming Machine" and therefore deeply biologically plausible
techniques are inappropriate both because of the contradiction between
serial and parallel processing, and due to the non-real-time aspect of
accurate biological simulation.
- The system uses mechanisms that are vaguely
analogous to biological mechanisms. The edge-detection algorithm is the
one included with opencv, and corresponds roughly to biological
edge-detection techniques.
- The colour histogram is a fast statistical method to
generate the gist colour values in an image and may be implicated in
quick scene classification in biological systems. (Rapid Biologically-Inspired Scene Classification Using Features Shared with Visual Attention.)
- Future Work
- processing of multiple objects within the input image.
- Dreaming machine will implement a model of dreaming
where memory of objects and spaces will be combined into simple visual
compositions, inspired by the detrimental effect on dreams of lesions to Brodmann's area 40.
- Circadian model of sleeping, with visual light entrainment.
- A pan/tilt camera would allow a greater variation of objects and contexts fed into memory.
- A continuous online learning system that provides topological organization of memories. (SNN? ART? RBF?)