Just did a test fit, and the changes from the first version look good; no modifications are needed, short of the holes in the middle, which never seem to line up and still need to be redrilled. They’re working a little better on this one, which is 16ga rather than the 14ga of the previous version. Now I just need the new button boards from Bobbi!
Update: I managed to fix the issue with the two boards’ GPIO pins behaving differently by using a button board that pulls up using the power and GND pins on the GPIO header. This means both boards have the same HW interface, which will make cloning much easier! I’m thinking a full 40-pin connector will be ideal because it will hold on better and means there are only two ways to connect it. Other than final assembly and the final wood covers, the hardware part of the Zombie Formalist is complete! There is still much to do on the software side of things, though. For the Art Machines 2 exhibition I changed to uploading every 3 hours rather than every 2, which means collecting the training data is much slower. At the time of writing, 1213 compositions have been uploaded to Twitter, and I need to get to 1734 to compare ML results with the previous integration test. The hope is that the current version will perform better because I’ve reduced the number of parameters by limiting the piece to only circles with 3 layers. I’m thinking about further restraint by removing radial lines and some kinds of offsets so all compositions will be variations on concentric circles.
The following gallery shows a selection from the frames processed to date on the left and their corresponding input frames on the right. I’ve selected them for how clearly foreground objects are emphasized in the relationship between the reorganized image and the input (e.g. the white dominance in the top pair, due to the white vehicles, and the yellow in the pair below it, due to the taxi).
I’ve committed to a final approach and parameters; see the video below. I think the current version balances flow, with hints of the current frame, against stability over time, with sufficient reorganization / abstraction. The emphasis on the current frame leads to some interesting colour shifts, as the colour of moving objects tends to be more grey (due to the effect of shadows and changes of light on the pavement). Note the entrance of the yellow car (bottom right corner) at ~32s in, and the corresponding increase of yellow in the middle of the frame. The yellow does not show up where the car can be seen because it’s reorganized, and there are more similar yellow colour values in the middle than where the car is. Large changes (in the sky, for example) change the whole palette because those changes are so dominant in the areas of movement, as shown just after the yellow fades as the car passes through the intersection. Looking at this video, I need to do a lot more tweaking to get the compression to look nice; video codecs tend not to be very good with subtle gradients like this!
On the GPU front, I have news. Daniel Frenzel ran my test code on his RTX 3070 and sent me the results. This card is 5 generations ahead of mine and 7 years newer. His setup got my 26s per frame down to… 20s per frame; about 23% faster. This is much lower than the 50% improvement I was hoping for, even from a 9- or 10-series GPU. This lack of improvement makes it clear that a 24h (or even a 12h) video at 540p is impossible without a huge investment. Having started processing the final version yesterday, I’ll be able to make a 7.38h long video, assuming I finish one week before showing. If I wait the three weeks for the new GPU and then run it for the same amount of remaining time, I could make an 8.7h long video. So basically a little more than an hour longer; not a very good investment for a $1k GPU. This also assumes it actually takes 3 weeks to get the GPU and that the 3060 Ti I could get is just as fast as Daniel’s 3070, neither of which is very likely.
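For the curious, here is the back-of-envelope arithmetic; the 80-day processing window is my inference from the 7.38h figure, not a number taken from my notes.

```python
# Rough figures from the GPU test above; the 80-day window is inferred
# from the 7.38h estimate rather than measured directly.
old_spf = 26.0   # seconds per frame on my GTX 780
new_spf = 20.0   # seconds per frame on the RTX 3070

speedup = (old_spf - new_spf) / old_spf * 100
print(f"speedup: {speedup:.0f}%")                     # about 23% faster

fps = 10
days_remaining = 80                                   # assumed window
frames = days_remaining * 24 * 3600 / old_spf
print(f"video length: {frames / fps / 3600:.1f} h")   # about 7.4 h
```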
Since the work depends on accumulating learning over time, I have to get it started, and since the idea is to bring the summer solstice back into the winter, that means committing to which frame number to start with. As I suspected even 12h was out of reach, I’ve started with the frame captured at 9am on the summer solstice, which is when the MPCAS turns on in the morning. Assuming no hiccups (power failures, hardware failures, etc.), that means the work will run until about 4:30pm on the winter solstice, just as the sun sets. This means that the sunrise and sunset from the summer solstice will never get processed. It would be nice to see those transitions; maybe I’ll return to this piece in the future and make the full 24h version. On the plus side, it does mean bringing some brightness and colour to the dark winter days. I’ll make another post with samples of the final version soon.
After some reflection, I’m just not happy with those hard edges of the current frame injected into the SOM, as much as they increase the sense of flow; at this resolution in particular. I’m currently looking at a version that multiplies the neighbourhood map by the current frame, which softens the edges; this seems to reduce the motion-blur effect and certainly looks much smoother at this resolution. There is not a lot of time left before I need to get cooking, so hopefully I’m only one more test away. I have not heard back about how much improvement I can expect from investing in a faster GPU. One of the deciding factors was that some frames have a lot of movement in areas where there tends to be little on average, like the sky. This results in very large changes where the current frame’s interruption is too strong.
Consider the cyan dominated frame around the 1:11 mark in the video above, and this nearby foreground mask with a large amount of movement in the sky:
Multiplying these masks by the neighbourhood map reduces the dominance of large changes in areas where there tends to be little movement. The results look encouraging! Here are some teaser frames: (note the stability over time!)
I need to spend a little more time with this, but I think the balance between the foreground objects’ presence and stability over time is quite good. I’ll do another sequence with a slight change in maximum learning rate to shift a little towards stability and foreground-object presence (i.e. 10% less reorganization in areas of movement). I’m also going to reset to the start of the day (9am) and see how that looks. It is too bad that, if I end up stuck with an 8-hour version of the work, sunrise and sunset would not be included.
There is a new GPU that should work in my shuttle available, an RTX 3060 Ti, but it will take three weeks to arrive and costs nearly $1k. I’m thinking it may be worthwhile if that means allowing the work to have the scale (but likely not resolution) of what I envisioned for this piece. The author of ANNetGPGPU kindly offered to do a performance test of my code using his RTX 3070 GPU, which would at least give me a sense of what kind of performance I may get if I upgrade.
The following video is 100 frames of the current work in progress. I was not happy with the motion-blur effect, so I went back to copying the moving objects into the frame and tweaked some other parameters. I think it’s going well, but I’m not happy with the hard edges of the moving objects, especially at 540p. The motion blur is not great either. Maybe the harder edges are OK for the larger objects, but less so for the spots in the trees. Part of the idea of including those moving objects was to anchor the abstraction in the current frame; now that I’m also increasing the effect of the reorganization process in the areas of movement, does that have a similar effect?
I did a quick run without inserting the foreground into the state of the network and I’m certainly much happier with the general aesthetic. There is even a sense of movement away from the camera along Broadway that is still readable. The following gallery shows the same sequence as above, except where foreground objects are not included.
I’m just not sure, from an audience perspective, that the movement of the scene would be readable without some subtle reference to foreground objects. The approach up to this point has been to seed the network state with the moving objects to affect the reorganization process. An alternative could be to present those moving objects visually without them affecting the reorganization process… but this would really miss out on some of the complexity of the disintegration of moving objects, for example in the following frames.
The obvious answer is just to take more time and allow the algorithm to obliterate the seed of the moving objects more, but that means taking more time when there is already a time crunch. Another idea is to use the current frame mask (that shows where the moving objects are) to augment the learning process. This may not do what I imagine though… After some reflection, I don’t think it would do anything more concrete. Right now I’m doing a run to get the aesthetic I want, training time be damned! (See below.)
I’ve heard back from a few of my GPU inquiries, and it does not look like upgrading my GPU is possible. I’m now looking at borrowing a faster GPU, or potentially buying a “gaming” PC from FreeGeek; the problem with the latter is that it’s certainly a lot more money for a whole new system (even an older one), and the GPUs are much more middle-of-the-road and actually have fewer CUDA cores than my current GPU. It’s unclear how this will affect performance, but it strikes me that more parallelism, rather than raw speed, is what I need to do the training faster.
The following video is 50 frames of the current work in progress. There is quite an improvement in stability over time, but I think it could still use a little more. I also find the foreground objects too prominent and have some ideas on how to decrease that dominance. The increase of stability over time is due to a smaller learning rate, which means subsequent frames affect the image less than the first frame, so the image is more stable. The side effect is that the foreground objects are also more stable, so there is a bit of a motion-blur effect where previous frames may stay visible because subsequent frames don’t modify them much. I’m thinking I should use the neighbourhood map (that determines the degree of reorganization) in inverse to determine the learning rate of neurons (it looks like the SOM library I’m using may support that). That way the pixels will change the most in the areas of most movement, which should both give me more control over the reorganization of foreground objects and maintain stability over time in areas where there is little movement.
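A sketch of the idea in Python/NumPy (my own illustration, not the SOM library’s API; the map is assumed normalized so 1 means most movement, and the base/floor rates are placeholders):

```python
import numpy as np

# Hypothetical sketch: derive a per-neuron learning rate from the
# movement/neighbourhood map so pixels in high-movement areas change
# the most while static areas stay stable.
def per_neuron_learning_rate(movement_map, base_rate=0.1, floor=0.01):
    """movement_map: 2D array in [0, 1], where 1 = most movement."""
    m = np.clip(movement_map, 0.0, 1.0)
    return floor + (base_rate - floor) * m  # static areas get the floor rate
```

Per-pixel rates like this would let the background keep its stability while foreground areas stay responsive.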
The results are quite comparable at 720p, see gallery below, but the process still takes about 40s per frame. Even if I change the length of the work to fit the normal hours of the MPCAS (9am to 10:30pm), that’s still 486,000 frames, and at 40s per frame, 225 days to process. I just did another sketch at 540p and that’s down to 22s per frame, which would mean the 9am to 10:30pm version would only be done a month late. So it seems I need to consider upgrading the GPU, and getting that to fit in this old machine may be troublesome.
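Checking those processing-time figures:

```python
# Frame counts and processing days for the MPCAS open hours (9am-10:30pm).
fps = 10
frames = 13.5 * 3600 * fps        # 486,000 frames
print(frames * 40 / 86400)        # 720p at 40 s/frame: 225.0 days
print(frames * 22 / 86400)        # 540p at 22 s/frame: 123.75 days
```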
I’ve requested a quote from my shuttle dealer and we’ll see where that goes. Even if it were twice as fast, that still means going no larger than the 720p version. I suppose I should refine parameters for both 720p and 540p versions. If I can upgrade the GPU, I may be able to do the 720p version for 24 hours. If I can’t upgrade, I’d end up doing the 540p version, and it’ll be as long as it ends up being with the time available. The following galleries are the 720p and 540p versions with no refinement.
After running a few explorations with different parameters, I’m contending with a significant issue: processing time. If I start next week, I can use about 7.5s a frame to get the piece done, but so far I can’t get processing below 90s per frame. I’ve confirmed it is actually the SOM training (reorganization) that is the slow part. Also, the results I have so far are really quite strong, and while I have to tune things to get some stability over time, I’m quite happy with these results as they are. I don’t have a lot of options to decrease processing time. I could:
lower the resolution; I’ve been working at full 1080p resolution up to this point, and dropping to 720p would mean about 55% fewer pixels to process. I’ll try this first.
get a faster GPU; the GPU is a GTX 780, so not at all new.
shorten the video; this is not ideal since I’m hoping Grunt will be able to get special permission to have the screen on for 24 hours for this special event; even sticking to the normal hours of the screen would only get me up to 13s per frame.
lower the frame-rate; since the emphasis of this project is the flow of movement with the reorganization of the machine learning, this would be the least favourable option.
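To put numbers on these options (the total time budget is back-computed from the 7.5s-per-frame figure, so treat it as an estimate):

```python
# Back-of-envelope for the options above; the total budget is
# back-computed from the 7.5 s/frame figure for the 24h version.
fps = 10
frames_24h = 24 * 3600 * fps        # 864,000 frames for a 24h piece
frames_open = 13.5 * 3600 * fps     # 486,000 frames for 9am-10:30pm
budget_s = frames_24h * 7.5         # total processing seconds available

print(budget_s / frames_24h)        # 7.5 s/frame for the full day
print(budget_s / frames_open)       # ~13.3 s/frame for screen hours only

# The resolution option: pixel counts at 1080p vs 720p
px_1080 = 1920 * 1080
px_720 = 1280 * 720
print(1 - px_720 / px_1080)         # ~0.556, i.e. ~55% fewer pixels
```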
The following gallery shows a few explorations to date. Each column is a different exploration and the rows are the sequence over time. The leftmost exploration feels the strongest to me and needs some fine-tuning for stability over time.
I got into the MPCAS control room today to check on the laptop. I saw a stream of DTS STS errors, which was a bad sign. I confirmed that the files were still being written to disk, so the process is still active, but unfortunately the camera had moved. I presume there was some kind of power interruption that both caused the camera to reset to its default position (see first post) and confused the ongoing ffmpeg network connection. I moved the camera back (luckily the camera remembered its pan/tilt/zoom position) and checked that new images were being captured, and they were. I did not get a close look at the scope of the issue (I’ll get a sense when I get the machine back in two months), but it’s likely I’ll have lots of missing data (days? weeks?) and missing variation. I hope it works well for the next two months at least, to get the fall and winter changes! This is not so much of a problem for most of what I have in mind for those images, but it is too bad I won’t be able to do some explorations now.
I also wanted to make a gallery to show the refinement / bug fixing using the sunrise sketches. The left column was the first attempt and the right column the most recent work. The middle ones strike me as the weakest, so I think things are going in the right direction.
In debugging the sequence work, I noticed yet another bug in the code. I was actually cubing and squaring the max degree of reorganization in the loop that sets the max sigma for each neuron. Since I was already squaring and scaling the matrix of values before that function, that was making the degree of reorganization very, very large. Now my max neighbourhood numbers actually make sense! After some explorations I ended up with a max neighbourhood size of 1000 and 5 iterations with a cubic increase of reorganization. The following gallery shows my explorations, with the most successful at the bottom. Compare these to those posted here. I’m now re-doing the frames spanning sunrise.
Indeed, increasing the number of training iterations does increase the abstraction (reorganization) of the elements copied from the current frame. Note the truck on the left of the frame in the following sequence. I also tried playing these back at 10fps, and indeed the changes in the background are too different from frame to frame. It looks more like a frame-by-frame animation, compared to the smoothness I’m aiming for. For the next stage I’ll try some methods of making the effect of subsequent training more subtle, and see if I can also speed up the processing rate.
I ended up doing something a little simpler than I had in mind to start; I’m using only the currently moving objects to ground the image in the currently processed frame, in order to emphasize flow and movement. I.e., referencing a recent post, I’m multiplying the current frame with the foreground mask and replacing pixels in the SOM using that image data. For example, see the following images.
The pixels that are not black in the right image are copied directly into the SOM state. This way we retain some of the previous training (from the seed image) but also ground the image in the current frame that is augmented by subsequent training (which only uses pixels from the current frame). The following sequence shows the current results.
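The replacement step can be sketched like this (a minimal Python/NumPy illustration of the idea, not the actual project code):

```python
import numpy as np

# Copy moving-object pixels from the current frame directly into the
# SOM state; non-masked pixels keep the trained (seed) state.
def inject_foreground(som_state, frame, fg_mask):
    """som_state, frame: HxWx3 arrays; fg_mask: HxW boolean array."""
    out = som_state.copy()
    out[fg_mask] = frame[fg_mask]   # moving pixels ground the image
    return out
```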
The current frame certainly does ground the image where the delivery truck retains structure. The over-all stability in the frame is a little low though, so this may look too flickery in video. Also, the truck may be too literal. This is because there are parts of the truck that are not changed by the training (not all pixels are changed for every frame). I also noticed that even with quite few training samples (from 250 down to 10) there is only a marginal difference in training time. Looking at the way the image changes frame to frame, I think this is due to the size of the neighbourhood (the max degree of reorganization). I’m going to try two things next. First, I’m running the same 10 frames again with more training iterations to see how that affects the degree to which the truck is abstracted and the difference in processing time. Second, another idea is to create a version of the neighbourhood map that has a much smaller max degree of reorganization. I can compute that once and use it for subsequent frames. Of course, I’m assuming that will make processing faster, which may not be the case.
Since I had gotten the code for the exponential increase of the degree of reorganization wrong, I reran the sampling of sunrise frames using the current parameters (now a learning rate of 1, 3 epochs and a max neighbourhood size of 15). These results are an improvement, and in this case only 3000 iterations of training are required.
Following from the previous post, I ran a broader set of explorations where the exponential increase of reorganization (in this case squared) is actually manifest. There still needs to be some fine-tuning, but these are going in the right direction. The images below are arranged from the most reorganization to the least, all using the same squared increase of reorganization. The image on the bottom is the most successful.
Following is the best previous result, without squared reorganization, for reference; clearly there is a significant decrease in abstraction in the areas where there is the most movement.
The squared version may be a little too literal (too little reorganization in the areas of the most movement) so I think I’m going to tweak things a little more. I also think I may want to include an offset so that the least amount of reorganization is not none, but a small amount of reorganization.
Looking at these images, I see what I had in mind in terms of the aesthetic and flow, and why the sequence results are disappointing; it’s because I’m imagining the high-movement areas of the image being reorganized in the context of the whole image. In other words, I’m imagining what the image would look like if I trained a SOM for every frame. The images above take about 45mins to calculate, so clearly that is not practical. I’m adding a few explorations with less training, but it’s very unlikely I’d be able to get processing fast enough to get the piece done by the winter solstice; if I can get processing down to 6s per frame, then I’d be generating for two months, which would leave me with one month of research and development before final production.
I’m thinking through how to create the impression of the whole image being retrained without actually having to do it. Right now the sequence code reorganizes the first frame with a lot of training and then the subsequent frames with only a small amount of training. In the most recent sequence post, the cubic increase of reorganization means that there is little reorganization in the areas of the most movement; since the first frame is used as a seed and then refined, those pixels appear frozen. Subsequent frames do change the image, but in a highly abstract way that does not result in any sense of the flow of time. I have some ideas for how to change this.
Since the sense of the flow of time only happens in the areas of the image where there is lots of movement, perhaps the current frame could change the state of the SOM only in those areas. I could multiply the current frame by the neighbourhood map and add that to the current state of the SOM. Even better may be to pixelwise AND the current frame mask with the neighbourhood map, multiply that by the current frame, and add it to the current state of the SOM.
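A sketch of that last variant (my own illustration in Python/NumPy; the clipping is my addition to keep values in range):

```python
import numpy as np

# Gate the neighbourhood map with the moving-object mask (pixelwise
# AND), weight the current frame by the result, and add it into the
# SOM state. The clip keeps values in [0, 1].
def soft_inject(som_state, frame, fg_mask, nbh_map):
    """Images are HxWx3 in [0,1]; fg_mask: HxW bool; nbh_map: HxW in [0,1]."""
    w = (nbh_map * fg_mask)[..., None]        # zero outside moving areas
    return np.clip(som_state + w * frame, 0.0, 1.0)
```

Because the weight falls off with the neighbourhood map, the injected frame would get soft edges rather than the hard cut-outs of direct replacement.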
The following images show a set of refinements tweaking params and decreasing the number of training iterations. The bottom image is feeling pretty close and includes an offset to the neighbourhood map so that the minimal amount of reorganization is a small amount (rather than none).
In working on having the current object mask (pixels that are moving in the current frame) change the neighbourhood map (the image that specifies the degree of reorganization), I realized some things were not working right; after a debug test I realized that I was not actually using an exponential increase of the degree of reorganization. This was due to a math error on my part. This explains why the effect of the neighbourhood map was lower than expected. The following images are the sequence using the same parameters, except where the cubic increase of neighbourhood size is actually manifest. As you can see, there is no reorganization happening in much of the roadway.
I’m now running a few of the previous (non-sequential) explorations again using the changed code to re-evaluate what the parameters should be.
The images above show 5 sequential frames from a quick test. The first frame is trained with more iterations and then subsequent frames are trained much less (250 training samples), using only the pixels that had changed since the previous frame. I was hoping this would make things more concrete and less abstract for the moving objects, but that is not the case at all. Actually, the colours that gain dominance are those emphasized in the new frames, but because the network is already trained, they don’t much affect the locations where those pixels were seen. So the question is what to do about this; the ML algorithm determines the composition, so the placement is always going to be emergent. One thing to try would be to combine the neighbourhood map (the image that specifies the degree of reorganization) with the mask that highlights the moving objects, so that those areas are less organized than other areas. The following image is a mock-up showing the previous neighbourhood map (including a sketch of the exponential increase of neighbourhood size) with black areas showing the moving objects in the current frame. One issue with this approach is that changing the degree of reorganization for every frame is potentially very slow. In this test, frames took 100s to generate; far too slow to be practical.
Following from the previous post, I’ve calculated sketch images using that same method (and params) trained on different images from the 24h set captured through sunrise. The top images are earlier in time (night) and the bottom images are later in time (daylight). In these images you get a sense of how divergent the output can be for similar frames (these images do show emergent properties). I’m going to work on tackling the issue of how these could look over time next.
This is the new enclosure design with the changes from the assembly of the first ZF. I’ve sent this off to the fabricator and hopefully will be able to assemble the second ZF soon. The button board electronics are giving me some issues, as I’ve been unable to get the two board revisions to behave the same, even when using exactly the same Jetson image! The NVidia forum has not been any help either. I’m hoping changing to pull-down buttons for the GPIO interface (from pull-up) will solve this issue with only being able to use a single pin on one of the two boards. I don’t think this is actually fixable, so unfortunately the two ZFs will need different software to reference the pins. This is not ideal. Who knows, maybe using pull-down buttons will (accidentally) let me use the same pins on both boards.
What is interesting about it is the interplay between the different degrees of reorganization that gives the colour regions ephemeral, smoke-like shapes. Since then I’ve had in mind different ways of manifesting degrees of reorganization (a continuity between sensation and imagination, realism and abstraction, etc.); at the start of the MPCAS project I considered using a depth camera to provide the degree of reorganization, where closer objects would be less reorganized than distant objects (a direct manifestation of what the horizon was a proxy for). Since the view from the MPCAS camera does not have a clear place for a horizon, it was unclear what should determine the differences in reorganization. Using the areas of movement is interesting because it’s consistent with the emphasis on flow I was interested in with a moving image. So I wrote some code to use the following image to determine the degree of reorganization and have been making a lot of explorations.
The roadways and trees have the most movement with some movement on the sidewalk as well. The street signs, utility poles, mountains and architecture are static. In my first explorations, below, I could not see much effect of the movement image. The degree of reorganization seemed quite consistent. There are some hints of effect though; note the details around the pink area amongst the trees on the right. This is an area where there is lots of movement (on Broadway) visible through the trees.
I thought perhaps the subtlety was due to the parameters and tried a few more variations, as follows. While the trace of the movement image is still visible in small details, the images are too evenly organized overall.
At this point I was starting to think there was a scaling issue and that the movement information was not being interpreted correctly. So I did a test using one of the foreground masks, such that the foreground objects in the frame are not reorganized but background objects are. Sure enough, the code is working properly; see image below. Some background objects are not reorganized because it was a quick test with few training iterations.
Since I was using code from “As our gaze peers off into the distance, imagination takes over reality…” I realized that I had used an exponential function where the degree of reorganization accelerates from the horizon to the top. So I went through another set of variations where the movement information is squared or cubed, see following images. The effect of the movement information is much clearer now and you can see traces of the architecture, street signs etc. in the degree of reorganization.
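The effect of the exponent is easy to see on normalized values:

```python
import numpy as np

# Squaring or cubing a normalized movement map suppresses mid and low
# values relative to the maxima, accentuating the contrast between
# high-movement areas (roadways) and nearly static ones (architecture).
movement = np.linspace(0.0, 1.0, 5)   # toy normalized movement values
print(movement ** 2)                  # mid-values pushed down
print(movement ** 3)                  # even stronger contrast
```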
I’m feeling like these are on the right track, but I was still imagining the road areas would be much less organized. The top right is actually under-trained (8,000 iterations), so the detail in the road area is due to those pixels not being changed, not due to the effect of the movement areas. It seems a relatively small number of iterations (50,000) works quite well, as does a quite large maximum degree of reorganization. The top left is way too reorganized, with both too many training iterations (300,000) and too much reorganization. The bottom two images are the most successful, and I also did a test with fewer iterations and a greater degree of maximal reorganization; see the following image. It looks like it has slightly too few training iterations, but is the strongest result so far.
I just started a test running these same params for a set of 35 images spanning from night time to early morning to get a sense of how they look in differing lighting conditions. After that I’ll see about changing the params slightly and then see about how to work with change over time. Each of these images takes about 45mins to calculate, so the whole 24h sequence at 10fps would take 70 years! The idea was to optimize since most of the frame is static over time; I’d train a base image that may take 45 mins but then only slowly refine that image frame by frame with an emphasis on the moving objects. The details of this process will have a huge effect on how the image evolves over time and also the aesthetic of each frame.
After writing some other code for the MPCAS (using the areas of the most movement to determine the degree of pixel reorganization), I realized I was using a version of the ML code (Self-Organizing Map) that has harder edges in the neighbourhood function. I went back to the code I used for the painting appropriation and recalculated the night + day average colour over time gradients from the last post. These should be smoother now. I’ve included the previous gradients (left) and the new ones (right). I also just realized that I had been using code that time-seeds its random values, so different runs with the same params and data would differ due to the order in which training samples are presented. For consistency, I’d rather these results be deterministic.
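In NumPy terms, the fix is just to use a fixed seed instead of time-seeding; a minimal illustration:

```python
import numpy as np

# With a fixed seed the order of training samples (and so the result)
# is identical across runs; time-seeding would change it every run.
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)
order_a = rng_a.permutation(10)
order_b = rng_b.permutation(10)
print(np.array_equal(order_a, order_b))   # True: same sample order
```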
Following from the last post, I’ve recalculated the gradients for each region using both day and night frames to increase their contrast and emphasize the day⇄night transition:
Wondering why the arch and roadway / sidewalk are flipped even though they have very similar colours? The algorithm creates emergent structures that result from complex interactions between the training data and the algorithm; even small changes in the parameters (such as the initial conditions or the number of training iterations) can make significant changes to the final image.
Using the changes in average colour over time I thought I’d explore using ML to reorganize the stripes according to colour similarity. I started with the average for all regions and ended up with the following result where the original average colours are on top and the reorganized version is at the bottom.
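For anyone curious, the reorganization can be sketched as a tiny 1D SOM (this is my own minimal Python version, not the code used for the piece; the parameters are arbitrary):

```python
import numpy as np

# Train a 1D strip of nodes on a set of colours; after training,
# similar colours end up at adjacent positions along the strip.
def som_sort_colours(colours, n_nodes=None, iters=2000, seed=1):
    """colours: Nx3 array in [0, 1]; returns trained node colours."""
    rng = np.random.default_rng(seed)
    n = n_nodes or len(colours)
    nodes = rng.random((n, 3))                       # random initial strip
    for t in range(iters):
        lr = 0.5 * (1 - t / iters)                   # decaying learning rate
        sigma = max(1.0, (n / 2) * (1 - t / iters))  # shrinking neighbourhood
        sample = colours[rng.integers(len(colours))]
        bmu = np.argmin(((nodes - sample) ** 2).sum(axis=1))
        dist = np.abs(np.arange(n) - bmu)            # 1D distance to winner
        h = np.exp(-(dist ** 2) / (2 * sigma ** 2))[:, None]
        nodes += lr * h * (sample - nodes)
    return np.clip(nodes, 0.0, 1.0)
```

Run on a strip of average colours, the trained nodes form the reorganized gradient.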
Using these parameters as a starting point, I repeated the process using the region averages separated by night and day. The left column shows the originals and the right column the reorganized versions where the top half are day and the bottom half are night.
These parameters don’t work well in all cases; for example, the warmer peachy tones to the left of the plants (day) image, second from the top in the left column, are lost in the corresponding reorganized version. This is due to the infrequency of those tones and the general lack of colour variation in the source image. A fix for this would be to use fewer training iterations. I think these results are certainly interesting and could use more tweaking for each region and time-window rather than using the same parameters for all the images. The gradients are quite strong and the whole dusk and dawn transition is the most interesting thing about them.
A live version of this would be quite lovely; perhaps the window of time could be one day or one week, and over time the generated gradient would change the distribution of colours and light. This would be most pronounced in the change of light and dark in the sequences where day and night are both included. I think I’ll do a quick exploration of that next. Even without being a live version, I should calculate such a gradient changing over time for a long-term time-lapse and create a video; the result would show the changes of light colour and the length of days. A live version would not be technically complex, and would be computationally simple and low-power, since the ML would happen over a long time-period; also, calculating the average colour of an image is very efficient.
Following from the previous post, I’ve calculated the average colours over time and for the whole sequence for day and night frames. This yields some particularly nice results, in particular the average over time for the sky at night (upper left in the gallery below). Rather than detecting a colour shift over time, I used the street lights in the frame as a reference: the first frame where the first street light turns on is considered the first frame of night. Since the gross colour change towards the warm sodium lights is what I wanted to separate, this approach makes sense. A particular street light turns on early and turns off late, and since I’m working with a fixed frame I should be able to create an automatic detector that reads this local change at a particular location. The following images show the same four regions (Sky, Plant-Life, Architecture and Sidewalk / Roadway, listed from top to bottom) with night (left column) and day (right column) versions of each.
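Such an automatic night detector could be as simple as thresholding the brightness of a small patch covering the reference street light. A minimal sketch, assuming frames arrive as numpy arrays; the patch location and threshold here are hypothetical placeholders, not measured values:

```python
import numpy as np

# Hypothetical patch location and brightness threshold; in practice these
# would be tuned to the pixel region covering the reference street light.
LIGHT_X, LIGHT_Y, PATCH = 512, 300, 8
NIGHT_THRESHOLD = 120.0

def is_night(frame):
    """Classify a frame as night if the reference street light is lit.

    `frame` is an (H, W, 3) uint8 array; the patch around the light is
    averaged and compared against a brightness threshold."""
    patch = frame[LIGHT_Y - PATCH:LIGHT_Y + PATCH,
                  LIGHT_X - PATCH:LIGHT_X + PATCH]
    return float(patch.mean()) > NIGHT_THRESHOLD
```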
The sodium colours really only dominate once the sun has set, so the night images still show significant dusk / dawn colours that may belong better in the day sets. It may be better to detect the average brightness (or blueness?) of the sky region as the night / day detector. Weather manifests interestingly in the night sky, where clouds reflect the yellow city light; note how much darker the first and last nights are compared to the middle nights, with their rain and cloud. On my list of visual explorations is using machine learning to reorder these averages over time according to similarity. I believe I have some code that does this somewhere, so it could be a quick implementation. The following gallery shows the corresponding average colours of the entire sequence separated by night and day. Note how much more significantly those warm sodium lights affect the built environment (Architecture and Roadway / Sidewalk). I think the next exploration will be a reorganization of these sequences of average colour.
I spent some time manually drawing boundaries around four main regions of the MPCAS frame: Sky, Plant-Life, Architecture, and Sidewalk / Roadway. I took out my old Wacom tablet (which I have probably not used in over 10 years) to do the job, and it was quite laborious! The extracted regions with a sample frame (left column) and their corresponding masks (right column) follow. The labour involved was worthwhile: since the frame is fixed, I can use these masks to filter things like foreground objects and make different collages for each region.
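With the masks in hand, the per-region average is straightforward. A minimal sketch, assuming a frame and a binary mask as numpy arrays of matching size:

```python
import numpy as np

def region_average(frame, mask):
    """Average colour of the pixels selected by a binary region mask.

    `frame` is (H, W, 3); `mask` is (H, W) with nonzero pixels marking
    the region (e.g. Sky or Plant-Life)."""
    selected = frame[mask > 0]    # (N, 3) pixels inside the region
    return selected.mean(axis=0)  # per-channel average colour
```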
Averages for Each Region Over Time
Averages of Whole Sequence
These results are promising! It’s interesting how similar the architecture and sidewalk / roadway average colours are, and how much they differ from the sky and plant-life averages. The night frames, with their very warm street lights, have a much greater effect on the architecture and roadway / sidewalk, which makes sense since those are the spaces lit by those lights. It would still be interesting to see these split up by night and day frames… I think I’ll try that next!
The next exploration in my list is the average colour of a whole sequence of frames. This follows a bit from this exploration, except the input images are reduced to a single average colour. Again, the X axis is time, but the Y axis is uniform:
We can see how there are 7 days and 7 nights in this set. Days end up being very grey and nights have an orange tinge due to the street lights. Compare the first 1920 frames (top) with the previous exploration (bottom):
These colours are quite muddy due both to the grey weather at the time of capture and to the dominance of the concrete in the intersection. Again, it seems these results would be more interesting if I were to mask the image by regions (trees, road, sky, etc.). The averages of each masked area would be more interesting, and I’m curious about the change over time. The last exploration is the simple average of the entire sequence, which is unsurprisingly a muddy colour. Another thing I should try is separating the night and day frames (and twilight?) and creating average colours of those, which I expect would be quite different.
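The strip image described above, where X is time and Y is uniform, can be sketched as follows (assuming frames are numpy arrays; the strip height is arbitrary):

```python
import numpy as np

def average_colour_strip(frames, height=100):
    """Collapse each frame to its single average colour and paint one
    solid one-pixel-wide stripe per frame: X is time, Y is uniform."""
    averages = [np.asarray(f, dtype=float).mean(axis=(0, 1)) for f in frames]
    strip = np.stack(averages).astype(np.uint8)        # (N, 3)
    return np.tile(strip[None, :, :], (height, 1, 1))  # (height, N, 3)
```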
Following from the previous post, I tweaked the code a little and used the 7 Day time-lapse image set, since that is closer to what I had in mind for the collage. This image set has only about 10,000 frames, which resulted in 91,000 fragments (compared to 800,000 for the previous collage). The colour is significantly different because the frames represent one week’s worth of diversity, rather than being dominated by the restrained palette caused by construction in the previously posted collage. The extraction of foreground objects is messier, as the background model takes longer to train when there is so much change between subsequent frames. I leaned into this and also tweaked the code so that the edges are not smoothed out as they were in the previously generated segments; see the image below.
I do like the edge quality, but there is still some fine texture that seems to be missing; I think this is due to the filtering of segments by area. I chose a somewhat arbitrary threshold to keep the number of segments per frame reasonable (5–50 per frame). Another trick is to sort the segments by area so that the larger ones are drawn behind the smaller ones; this leads to a much more complex collage (e.g. this work in process) where the smaller and thinner pieces tend to emphasize flow and segment orientation. All in all this is good progress for collages, and I think I’ll leave this exploration aside for now.
I adjusted the ‘features’ (the numbers that define how the machine ‘sees’ each foreground object) so that only the average colour and orientation are used to determine similarity. This has the potential to speed up the process for larger image-sets. The overall structure remains quite complex, considering that simpler features (especially average colour) can lead to collages that approach gradients, where there is too much organization. As part of this process I also revisited some code I wrote to visualize what a collage would look like without having to generate it. While rendering the 90,000 segments in this case (on a fast SSD) is not too slow, it was certainly slow for 800,000 segments, and I’m expecting many more for the long-term time-lapse collage. The image below is the fast visualization, where the colour and orientation of each segment are drawn using rectangles with one black edge. As you can see it’s not very organized, and it used far fewer training iterations than the previous collage.