The idea is to take a sequence (video or film), take its first n frames (e.g. n = 100), and train a self-organizing map (SOM) on those frames. That trained SOM becomes the first frame of the output sequence. Then take the n frames starting at frame 2, train again, and so on, sliding the window forward one frame at a time. An input of N frames therefore yields N − n + 1 output frames, i.e. n − 1 fewer than the input. Each output frame would show the structure of the most recent n frames: a new scene would start a new cluster that grows in the field, then shrinks as the scene ends and is displaced by the next scene.
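A minimal Python/NumPy sketch of the sliding-window scheme, assuming the frames have already been downscaled to small arrays. The names `train_som` and `sliding_som`, the rectangular grid, the linearly decaying learning rate, the Gaussian neighbourhood and the warm start (each window's map is initialised from the previous window's map) are all illustrative assumptions, not part of the original idea. The warm start is meant to keep clusters in roughly the same place from one output frame to the next, so a scene's cluster can visibly grow and shrink instead of jumping around the field.

```python
import numpy as np


def train_som(data, grid_h, grid_w, epochs=5, lr0=0.5, sigma0=None, init=None, seed=0):
    """Train a rectangular SOM on `data` (shape: samples x dim); return unit weights."""
    rng = np.random.default_rng(seed)
    n, dim = data.shape
    if sigma0 is None:
        sigma0 = max(grid_h, grid_w) / 2.0
    if init is None:
        # Initialise each unit with a randomly chosen input frame (a copy)
        weights = data[rng.integers(0, n, size=grid_h * grid_w)].astype(float)
    else:
        weights = np.array(init, dtype=float)  # warm start: copy of a previous map
    # (row, col) position of every unit on the map grid, in row-major order
    coords = np.array([(r, c) for r in range(grid_h) for c in range(grid_w)], dtype=float)
    total = epochs * n
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            x = data[i]
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighbourhood radius
            # Best-matching unit: the unit whose weights are closest to x
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
            # Gaussian neighbourhood around the BMU, measured on the grid
            d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))
            # Pull every unit toward x, weighted by its neighbourhood value
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights


def sliding_som(video, n, grid_h, grid_w, warm_start=True, **som_kwargs):
    """Train one SOM per window of n consecutive frames.

    video: array of shape (N, ...) holding N (downscaled) frames.
    Returns an array of shape (N - n + 1, grid_h * grid_w, frame_size).
    """
    N = video.shape[0]
    if not 1 <= n <= N:
        raise ValueError("window length n must be between 1 and the number of frames")
    flat = video.reshape(N, -1).astype(float)
    maps, prev = [], None
    for start in range(N - n + 1):
        window = flat[start:start + n]  # frames start .. start + n - 1
        weights = train_som(window, grid_h, grid_w,
                            init=prev if warm_start else None, **som_kwargs)
        maps.append(weights)
        prev = weights
    return np.stack(maps)


# Usage: 120 tiny 16x9 grayscale frames, window n = 100, 10x10 map
video = np.random.default_rng(1).random((120, 9, 16))
maps = sliding_som(video, n=100, grid_h=10, grid_w=10, epochs=2)
print(maps.shape)  # (21, 100, 144)
```

Each entry of the result is one output frame's map: 100 units, each holding a 16 × 9 prototype frame. Real footage would need much more care with preprocessing (colour, scale, normalisation) than this sketch assumes.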
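The number of frames in the window (n) would determine the number of clusters that can form in the field and the number of units in each output frame. HD or higher resolution would likely be needed, because each output frame is a mosaic of many (downscaled) frames from the original footage. For example, a 10 × 10 map whose units are 192 × 108 thumbnails already fills a 1920 × 1080 frame.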
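To make the resolution argument concrete, each trained map can be rendered as a mosaic in which every unit's weight vector is reshaped back into a small image tile and placed at its grid position. This is a minimal sketch assuming the unit weights are stored in row-major grid order as an array of shape (grid_h * grid_w, tile_h * tile_w) for grayscale, or (grid_h * grid_w, tile_h * tile_w * channels) for colour. The helpers `som_to_mosaic` and `mosaic_resolution` are hypothetical names.

```python
import numpy as np


def som_to_mosaic(weights, grid_h, grid_w, tile_shape):
    """Lay out SOM unit weights as a single image.

    weights: array (grid_h * grid_w, prod(tile_shape)), units in row-major grid order.
    tile_shape: (tile_h, tile_w) for grayscale or (tile_h, tile_w, channels) for colour.
    """
    tile_h, tile_w = tile_shape[:2]
    extra = tuple(tile_shape[2:])  # e.g. (3,) for RGB, () for grayscale
    tiles = np.asarray(weights).reshape((grid_h, grid_w, tile_h, tile_w) + extra)
    # Interleave grid rows with tile rows and grid columns with tile columns
    axes = (0, 2, 1, 3) + tuple(range(4, 4 + len(extra)))
    return tiles.transpose(axes).reshape((grid_h * tile_h, grid_w * tile_w) + extra)


def mosaic_resolution(grid_h, grid_w, tile_h, tile_w):
    """Return (width, height) in pixels of a grid_h x grid_w mosaic of tiles."""
    return grid_w * tile_w, grid_h * tile_h


print(mosaic_resolution(10, 10, 108, 192))  # (1920, 1080)
```

The output size grows linearly with both the map size and the tile size: a 20 × 20 map of the same 192 × 108 thumbnails would already need 3840 × 2160.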