As part of stepping back to see the big picture, I’ve taken a second look at deep machine learning systems and biologically inspired cognitive architectures (as suggested by my supervisor); the latter will be discussed in another post.
Deep machine learning systems attempt to resolve an issue with “shallow” learning, which has increasingly become the following process: Input → Feature Extraction → Machine Learning. An argument against this approach is that the “intelligence” shifts from the machine learning system to the human-centred, domain-specific art of feature extraction. Deep learning systems cut out the middleman, allowing Input → Machine Learning without the intermediary.
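The contrast between the two pipelines can be sketched concretely. This is a minimal toy illustration, not from the original post: the “hand-crafted features” here are trivial summary statistics standing in for the domain-specific art of feature engineering, and the image sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
images = rng.random((100, 30, 30))  # toy batch of greyscale inputs

# Shallow pipeline: Input -> Feature Extraction -> Machine Learning.
# A human-designed function compresses each image to a few numbers
# before any learning happens.
def extract_features(batch):
    return np.stack([batch.mean(axis=(1, 2)),   # brightness
                     batch.std(axis=(1, 2)),    # contrast
                     batch.max(axis=(1, 2))],   # peak intensity
                    axis=1)

shallow_input = extract_features(images)       # shape (100, 3)

# Deep pipeline: Input -> Machine Learning.
# Raw pixels go straight to the learner, which must discover
# its own features.
deep_input = images.reshape(len(images), -1)   # shape (100, 900)

print(shallow_input.shape, deep_input.shape)
```

The point of the sketch is the gap in what the learner receives: three curated numbers per image versus nine hundred raw ones.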
Unfortunately, it appears that these systems are highly limited when it comes to spatial resolution. Input patterns are often as small as 30×30 pixels (the hope is that DM3 will have a sensory resolution of 1920×1080). While the information gleaned from this low-resolution data is significant, these systems appear to sacrifice spatial resolution for depth (high-level abstraction).
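A rough back-of-the-envelope calculation suggests why resolution is costly: in a densely connected first layer, the weight count scales linearly with the number of input pixels. The hidden-layer width of 1000 below is a hypothetical figure chosen only for illustration.

```python
# Hypothetical hidden-layer width; not a figure from any real system.
hidden_units = 1000

weights = {}
for w, h in [(30, 30), (1920, 1080)]:
    pixels = w * h
    weights[(w, h)] = pixels * hidden_units
    print(f"{w}x{h}: {pixels:,} pixels -> "
          f"{weights[(w, h)]:,} first-layer weights")
```

Going from 30×30 to 1920×1080 multiplies the pixel count (and hence the first-layer weight count) by a factor of about 2300, which makes the appeal of low-resolution inputs clear.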
As this project is inherently situated in theories of cognitive science and neurology, the issue of feature extraction is largely moot. Visual feature extraction may be among the best-understood aspects of brain function. Additionally, the emphasis of the project is the organization of mental representations, not perceptual mechanisms themselves. That being said, making a strict separation between perception and cognition is problematic. A balance is required between shallowness (where cognitive science informs feature extraction) and depth (where the system learns correlations between features automatically and without supervision). In essence, the difference between the two approaches is the nature of the abstraction of sensory inputs. What other difference is there between a subset of pixels and a patch resulting from segmentation? The former expects much more from the machine learning system, while the latter can be tuned to use fewer computational resources, as the complexity of the input is greatly reduced. Of course, the feature extraction itself may be computationally expensive, as it currently is in the prototype perceptual system.
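The pixel-subset versus segmented-patch contrast can be made concrete in terms of input dimensionality. This is a toy sketch under assumed numbers: the 32×32 window, the region summary statistics, and the centroid values are all hypothetical stand-ins for whatever a real segmenter would produce.

```python
import numpy as np

rng = np.random.default_rng(1)
image = rng.random((1080, 1920))  # toy full-resolution frame

# "Subset of pixels": a raw 32x32 window handed to the learner as-is.
window = image[100:132, 200:232].ravel()       # 1024 dimensions

# "Patch resulting from segmentation": suppose a segmenter has grouped
# those same pixels into one region, summarised by a handful of numbers.
region = image[100:132, 200:232]
patch_features = np.array([region.mean(),      # average intensity
                           region.std(),       # crude texture measure
                           float(region.size), # area in pixels
                           116.0, 216.0])      # hypothetical centroid

print(window.size, patch_features.size)
```

The learner downstream of segmentation sees 5 numbers instead of 1024, which is where the computational savings come from; the cost has simply moved into the segmentation step, as the paragraph above notes.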
Another aspect of deep learning is an intrinsic dependence on time. This allows systems to ‘see’ the continuity of objects over time, even as their object-centric features change. An important question is whether high-level conceptual groupings of objects are reducible to object-centric features, even when time is included. Deep learning systems are highly complex even without considering time.
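One minimal way to picture continuity over time is that an object’s feature vector drifts only slightly between consecutive frames, so adjacent frames remain highly similar even as the features change. The sketch below assumes this drift model and uses cosine similarity as the continuity measure; both choices are illustrative, not drawn from any particular deep learning system.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy feature trajectory for one object over five frames: its
# object-centric features drift slightly from frame to frame.
base = rng.random(16)
frames = [base + 0.01 * t * rng.random(16) for t in range(5)]

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Consecutive frames stay highly similar, so the object can be
# tracked as "the same thing" despite its changing features.
similarities = [cosine(frames[t], frames[t + 1]) for t in range(4)]
print(all(s > 0.9 for s in similarities))
```

Under this drift model, every consecutive pair stays well above the 0.9 similarity threshold, which is the sense in which time lets a system bind changing feature vectors into one persistent object.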
An open question for this project is: What is the appropriate balance between a priori feature extraction and deep machine learning?