If the system’s action is a choice to merge two subsequent images into the same percept, then this could be a root action of the system: not an action in the world, but one that changes the perception of the world. For the system to compare these constructed percepts with the external stimulus, a different distance measure (or a different threshold on the same measure) would be needed, because percepts are already defined to approximate the external stimulus sufficiently.
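A minimal sketch of the merge decision described above, under stated assumptions: a patch is a flat list of pixel intensities in [0, 1], the distance is a mean absolute difference, and merging averages pixel values. The function names and both threshold values are hypothetical and chosen only for illustration; the actual system may use a different measure entirely.

```python
# Hypothetical values: neither threshold comes from the actual system.
MERGE_THRESHOLD = 0.05     # assumed: max distance at which two patches are merged
STIMULUS_THRESHOLD = 0.01  # assumed: a different threshold for percept vs. stimulus


def distance(a, b):
    """Mean absolute difference between two equally sized patches."""
    if len(a) != len(b):
        raise ValueError("patches must have the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def merge(a, b):
    """Merge two patches into one percept by averaging pixel values."""
    return [(x + y) / 2 for x, y in zip(a, b)]


def maybe_merge(a, b, threshold=MERGE_THRESHOLD):
    """The 'root action': return the merged percept, or None if too far apart."""
    return merge(a, b) if distance(a, b) <= threshold else None


def matches_stimulus(percept, stimulus, threshold=STIMULUS_THRESHOLD):
    """Compare a constructed percept with live stimulus under its own threshold."""
    return distance(percept, stimulus) <= threshold
```

With the merge threshold alone, a merged percept always looks acceptable, since it sits within that threshold of the patches it was built from. The second threshold is what makes the comparison with stimulus informative. Whether it should be stricter or looser than the merge threshold is an open choice; it is stricter here only to make the example visible.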
Learning, considered as the improvement of performance on a task, could manifest as the system improving the construction of these percepts. Greater performance would mean a closer match between external stimulus and perception. It seems to me there needs to be a constraint; otherwise the perfect percept is one that has not been merged at all. The number of percepts in memory could be that constraint, but we want as many percepts as possible without exceeding available memory. If we merge too much, we’ll have very few percepts in memory, each the product of very poor merges. If we merge too little, we’ll have too many percepts, although they may approximate the external stimulus very well.
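The trade-off above can be sketched as a constrained search: among candidate merge thresholds, keep the one with the lowest approximation error whose percept count still fits in memory. This is only an illustration under assumptions. The greedy merging rule, the running-mean update, the mean-absolute-difference distance, and the names `build_percepts`, `mean_error`, and `choose_threshold` are all hypothetical, not the system's actual method.

```python
def distance(a, b):
    """Mean absolute difference between two equally sized patches."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def build_percepts(patches, threshold):
    """Greedily merge each patch into the first percept within `threshold`.

    Returns (percepts, assignment), where assignment[i] is the index of the
    percept that represents patches[i].
    """
    percepts, counts, assignment = [], [], []
    for patch in patches:
        for i, percept in enumerate(percepts):
            if distance(patch, percept) <= threshold:
                n = counts[i]
                # Running mean keeps the percept at the average of its members.
                percepts[i] = [(p * n + x) / (n + 1) for p, x in zip(percept, patch)]
                counts[i] = n + 1
                assignment.append(i)
                break
        else:
            # No percept was close enough: store the patch as a new percept.
            percepts.append(list(patch))
            counts.append(1)
            assignment.append(len(percepts) - 1)
    return percepts, assignment


def mean_error(patches, percepts, assignment):
    """Average distance between each patch and the percept that represents it."""
    total = sum(distance(p, percepts[i]) for p, i in zip(patches, assignment))
    return total / len(patches)


def choose_threshold(patches, capacity, candidates):
    """Lowest-error threshold whose percept count fits in `capacity`, or None."""
    best = None
    for t in candidates:
        percepts, assignment = build_percepts(patches, t)
        if len(percepts) > capacity:
            continue  # too little merging: exceeds available memory
        err = mean_error(patches, percepts, assignment)
        if best is None or err < best[1]:
            best = (t, err, len(percepts))
    return best
```

For example, with `patches = [[0.0], [0.1], [0.5], [0.6]]` and candidates `[0.0, 0.2, 1.0]`, a capacity of 4 selects the threshold 0.0 (no merging, zero error), while a capacity of 2 forces the threshold 0.2 and accepts some error. Without the capacity constraint, the unmerged case always wins, which is the degenerate "perfect percept" noted above.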
Another idea is to reconstruct a whole perceptual image from percepts (like those shown here), and then compare that whole image (constructed of short-term percepts) to the live stimulus image. Due to segmentation and the filtering of patches by area, these two images can never be identical, so this could be a site of learning. In this case, it’s unclear how that distance measure would manifest in learning. The merging process would not likely fill in white areas (blind spots composed of many small percepts that have been removed), but the number of percepts could affect how much of those areas is filled (due to changes in patch borders).
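A rough sketch of that whole-image comparison, assuming a grayscale image stored as a list of rows and a percept stored as a set of pixel coordinates plus a single intensity. The dictionary keys `pixels` and `value`, the function names, and the choice to report error and coverage separately are all assumptions made for illustration.

```python
def reconstruct(percepts, height, width):
    """Paint each percept onto a blank canvas; None marks unfilled blind spots."""
    canvas = [[None] * width for _ in range(height)]
    for percept in percepts:
        for row, col in percept["pixels"]:
            canvas[row][col] = percept["value"]
    return canvas


def compare(canvas, stimulus):
    """Return (mean absolute error over filled pixels, fraction of pixels filled).

    Assumes a non-empty image. The error is None when nothing is filled.
    """
    total, filled, count = 0.0, 0, 0
    for canvas_row, stimulus_row in zip(canvas, stimulus):
        for c, s in zip(canvas_row, stimulus_row):
            count += 1
            if c is not None:
                total += abs(c - s)
                filled += 1
    error = total / filled if filled else None
    return error, filled / count
```

Keeping coverage separate from error reflects the point about blind spots: merging changes the error over filled pixels, while the unfilled areas only change when patch borders change.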
This is all related to homoeostasis because the system’s internal representation of the world (the set of percepts) may have differing degrees of conflict (tension) with external stimulus. It seems reasonable that a “task” of perception is matching the internal compressed (merged) representation with the external stimulus. Even if we are simply talking about comparing a whole reconstructed (imagined) image with external stimulus, it is only abstraction that is lacking. How additional layers of abstraction could manifest is unclear (and out of scope here); one example would be a second network that holds references not to percepts, but to patterns of activation of percepts.