X Axis as Time

I thought I would do a couple more explorations of the 7 days of frames following the previously posted average. The first recreates a single frame where each column is actually extracted from a different time. I tested this using the sample of different lighting conditions from this post, and ran another test using the first 1920 frames from the full 7-day set, below it. I think the latter results are quite uninteresting due to the amount of change from minute to minute (i.e. the change in traffic manifests as vertical lines in the image).
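For reference, the idea boils down to copying column x of the output image from frame x of the sequence, so time runs left to right. A minimal sketch in Python; the frame filenames and paths are placeholders for my actual capture:

    # Minimal sketch of the "x axis as time" idea: column x of the output
    # is taken from frame x of the sequence, so time runs left to right.
    # Assumes 1920 same-sized frames named frame_000000.jpg ... frame_001919.jpg
    # (hypothetical names and paths).
    import numpy as np
    from PIL import Image

    NUM_COLUMNS = 1920  # one source frame per output column

    first = np.array(Image.open("frames/frame_000000.jpg"))
    height = first.shape[0]
    out = np.zeros((height, NUM_COLUMNS, 3), dtype=np.uint8)

    for x in range(NUM_COLUMNS):
        frame = np.array(Image.open(f"frames/frame_{x:06d}.jpg"))
        out[:, x, :] = frame[:, x, :]  # copy column x from frame x

    Image.fromarray(out).save("x_axis_as_time.jpg")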

Infrastructure Setup for the MPCAS

I’ve visited the Public Art Control Room a few times now and set up a laptop and a 5TB disk array for image capture. After a few tests to make sure my two ideas are possible, I’m moving forward with the “final” captures. My two ideas are (1) a time-lapse with one capture every minute for ~7 days and (2) a 30fps capture for 24 hours. I was not sure the old laptop would be up to the job, but I was able to capture to disk in real time, though I did have to quit the window manager and run ffmpeg from the console to divert all available resources to the task. For the latter, I’ll be capturing on the summer solstice from midnight to midnight; the capture will actually run for 48 hours to avoid the awkward hours, and I’ll only keep frames captured during the solstice. As I’m saving individual JPEG frames, the 48-hour test takes close to 3TB of space. The laptop is now deleting the files from before and after midnight, which takes a very long time; thus, I was unable to determine whether I’ll have the space to keep both the backup 24-hour test and the 48-hour solstice capture. I’ll find that out on Saturday morning when I’m back in the room to start the solstice capture.
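As a rough sanity check on the disk budget, the frame count alone explains the size; the average JPEG size below is an assumption inferred from the ~3TB total, not a measured value:

    # Rough storage estimate for the 30fps JPEG capture; the ~600 KB
    # average frame size is an assumption, not a measured value.
    FPS = 30
    HOURS = 48
    AVG_FRAME_BYTES = 600 * 1024               # assumed average JPEG size

    frames = FPS * 60 * 60 * HOURS             # 5,184,000 frames
    total_bytes = frames * AVG_FRAME_BYTES
    print(f"{frames:,} frames ≈ {total_bytes / 1e12:.1f} TB")  # ≈ 3.2 TB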

The following images are a snapshot of the diversity of weather over the 7 days of the capture. One thing I changed from the previous test was turning off the camera’s “sharpening”, which seems to give everything a bit of a blurry look. I’m not sure how this will work for possible segmentation experiments, but it does seem to cut down on some of the JPEG artifacts. I should also explore increasing the compression quality for the 7-day capture, since it already takes up so much less space; the 24-hour capture will need to stay at this quality though.

Looking back on these images, they really do look blurry; could the camera be having a focus issue? Comparing this to the previous test, the image quality is clearly less sharp, and it seems “sharpening” is not actually a sharpening post-process but a smoothing one. Too bad, since I’m not likely to get this amazing diversity of weather over a 7-day period again any time soon. I’ll have to keep an eye on the weather and see if I get another opportunity. It could also be the aperture of the camera… Also, the exposure is automatic, but the sky very often gets totally blown out (e.g. frame 4212, middle left). The exposure of the street does look good though, so the camera is probably just exposing for the majority of the frame; exposing for the sky could make the street very dark, and if I have to choose between an exposed street and an exposed sky, I suppose the street makes more sense. I could also try the camera’s dynamic range mode. The sun certainly does shine directly into the camera at times (e.g. frame 9837, bottom left). The following video shows the entire time-lapse.

First look at the “Public Art Room”

Over the next six months or so I’ll be working on a new commission funded by the Grunt Gallery for the MPCAS (Mount Pleasant Community Art Screen)! During this residency period, I’ll be revisiting some of the methods I used in my work “As our gaze peers off into the distance, imagination takes over reality…”, commissioned by the City of Vancouver for the Platforms 2016: Coastal City program. For this new project I’ll be working with a video stream from a camera installed with the MPCAS, rather than the photographic panorama used in the 2016 work.

Last week I got my first chance to get into the server room for the MPCAS. This is the room with all the hardware that drives the screen and also the outdoor PTZ camera. The room is smaller than I expected, about the size of a closet, with a server rack on wheels. The space is so small that we have to prop the door open and push the rack slightly out of the room to open the access terminal! After some fiddling last week, I was able to get an old laptop onto the local network and access the camera feed! Unfortunately, I was unable to control the camera pan/tilt, resulting in a stream of images like this:

The camera is a BirdDog A200 outdoor PTZ IP camera. I have not looked at surveillance camera technology since working on the Dreaming Machine, and it seems a newer protocol, NDI (Network Device Interface), has become quite popular. While this is a royalty-free protocol, working with it does not seem that straightforward. I found that ffmpeg does support these streams “built-in”, but only in version 3.4 specifically; newer versions do not support NDI because NewTek (the company that created NDI) distributed ffmpeg without adhering to the GPL. Downloading the ffmpeg source and the NDI SDK allowed me to compile ffmpeg and access the video stream.
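For my own notes, the capture then boils down to something like the following sketch; the NDI source name and output path are placeholders, and the libndi_newtek input device name is my assumption about how the NDI-enabled 3.4 build exposes the stream (check ffmpeg -devices on the actual binary):

    # Sketch of grabbing one JPEG frame per minute from the camera's NDI
    # stream using the locally compiled, NDI-enabled ffmpeg 3.4 build.
    # The source name, output path and libndi_newtek device name are
    # assumptions, not verified against this exact camera.
    import subprocess

    cmd = [
        "./ffmpeg",                      # the locally compiled ffmpeg 3.4
        "-f", "libndi_newtek",           # NDI input device (from the NDI SDK build)
        "-i", "CAMERA (BirdDog A200)",   # placeholder NDI source name
        "-vf", "fps=1/60",               # keep one frame per minute
        "-q:v", "2",                     # high JPEG quality
        "capture/frame_%06d.jpg",        # numbered JPEG output
    ]
    subprocess.run(cmd, check=True)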

This morning Sebnem (the Grunt Gallery tech) and I got back into the room to check on the previous test. Unfortunately only a few hours of frames were saved, as a technician with the screen company (who installed the camera) did something to the camera that required a power cycle. Once we were on site and cycled the power to the camera, we were able to control it! I also noticed that ffmpeg does seem to recover and resume writing new frames after the camera becomes inaccessible, so that is good to know for the future. After a little exploration I settled on the following frame as a starting point:

Compare this to my description of what I imagined the camera would see from my initial proposal:

I envision a frame showing the city-scape to the East with dynamism of Kingsway below, Kingsgate Mall in the lower middle, the growing density of the area beyond in the middle upper and the mountains and sky above. The camera’s view will contain a rich diversity of forms, colours and textures that are both natural and artificial.

Kingsgate Mall is out of frame to the right behind the trees, but the dynamism of the busy intersection of Kingsway and Broadway below contrasts nicely with the trees, mountains and sky. I also saved a “preset” view without Kingsway below and with more sky, but I think this composition is more balanced. I ended up using as many manual settings as possible, but with this much contrast (between the shadows in the trees and the bright sky) the sky looks quite blown out, and some settings may need to be tweaked. There should be some rain coming at the end of this week, so I should get a better sense of the changing light through day, night and variable weather when I access the footage next week. I’m also quite curious about the image quality at night.

The laptop is capturing one frame to disk every minute; I’ll check in next week and get a sense of the variation in the material over time. I’ll see if any camera settings need to be tweaked, then start working with the new material and see where that leads, starting with code from “As our gaze peers off into the distance, imagination takes over reality…” and also “Through the haze of a machine’s mind we may glimpse our collective imaginations (Blade Runner)“.

Zombie Formalist ML – G Data Set

Following from the last couple of ML posts, I’ve been looking at the Integration G data-set. This set has 1734 uploaded compositions (only slightly fewer than the F data-set). Interestingly, without the filter-by-in-person-attention mechanism (face detection) determining whether a composition is “good” enough to be uploaded, the “good” and “bad” classes are more balanced, i.e. about half the compositions are liked or retweeted. I presume fewer likes happen during the North American over-night, as I’ve observed; hopefully the Hong Kong exhibition will increase the number of followers in the Eastern hemisphere. I should look at the distribution of engagements over night in North America.

If “bad” means no likes or retweets (RTs) and “good” means at least one, then there are ~1000 “good” and ~700 “bad” compositions. Since the classes are fairly balanced, I did an initial experiment without re-balancing; since the aim is to detect “good” compositions, it does not make sense to balance by throwing those compositions away. The results are OK, but the f1-scores are quite inconsistent between the “good” and “bad” classes. The average test accuracy was 57% with a peak of 60%. The f1-score for the best performing model from the best search was 72% for “good” but only 29% for “bad”. I suspect this is due to the unbalanced classes and the test split used in that search being lucky (having more “good”, or similar-to-“good”, compositions). The best performing model from the worst search attained a test f1-score of 62% for “good” and 43% for “bad”. (The only differences between these two searches are the initial weights and the random train/val/test splits.)
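For reference, the per-class numbers come from computing precision/recall/f1 for each label separately; a minimal sketch of that evaluation using scikit-learn, where y_test and y_pred stand in for the real test labels and model predictions:

    # Minimal sketch of the per-class evaluation; y_test and y_pred are
    # placeholders for the real test labels and model predictions.
    from sklearn.metrics import classification_report

    y_test = [1, 0, 1, 1, 0, 1, 0, 1]   # 1 = "good", 0 = "bad" (dummy data)
    y_pred = [1, 1, 1, 0, 0, 1, 1, 1]

    print(classification_report(y_test, y_pred, target_names=["bad", "good"]))
    # Reports precision, recall and f1-score for "bad" and "good" separately,
    # which is how the 72% / 29% imbalance above shows up.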

I tried a different threshold for “good”, where compositions with only 1 like or retweet were considered “bad” and those with more than 1 were “good”. Compositions with no likes or retweets were removed. This resulted in 1003 training samples, a significant reduction, with very balanced classes (503 “good” and 500 “bad”). While this resulted in slightly more balanced f1-scores, it also resulted in lower average accuracy and poorer f1-scores for the “good” class. The average accuracy on test sets was 55% (compared to 57%) with a peak test accuracy of 58% (compared to 60%). This seems consistent, but the details bear out a lack of improvement; for the best model in the best search, the f1-scores were 50% for “good” and 60% for “bad”, a significant reduction in f1-score for “good” compositions from the previous 72%. The best model from the worst search was very similar, with f1-scores of 58% for “good” and 52% for “bad”.
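The relabelling itself amounts to something like the following sketch; the CSV path and the “likes”/“retweets” column names are hypothetical stand-ins for my actual logs:

    # Sketch of the relabelling: drop compositions with zero engagement,
    # then 1 like/RT = "bad", more than 1 = "good". The CSV path and the
    # "likes" / "retweets" column names are hypothetical.
    import pandas as pd

    df = pd.read_csv("integration_g.csv")          # placeholder path
    df["engagement"] = df["likes"] + df["retweets"]

    df = df[df["engagement"] > 0].copy()           # drop compositions never liked/RTed
    df["label"] = (df["engagement"] > 1).map({True: "good", False: "bad"})

    print(df["label"].value_counts())              # ~503 "good" / ~500 "bad" here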

The conclusion is that there is no improvement from throwing away compositions with no likes or retweets on the assumption that they were seen by fewer people. So we’re about in the same place, hovering around 60% accuracy. The current integration test, H, includes a more constrained set of compositions (only circles with only 3 layers); this reduces the number of parameters and constrains the compositions significantly. My hope is that this results in better performance, but I’ll have to wait until it collects a similar number of samples. Before the ZF went into the crate, 342 compositions had been generated during that test, so there will be quite a wait to generate enough data to compare, especially considering the travel time to and back from HK. So I’m going to set the ML aside again for now.

Zombie Formalist Assembly!

The following photos show the assembly process for The Zombie Formalist! This version will be tested for a week before getting crated and shipped off to Hong Kong for Art Machines 2! It’s more restrained (it uses only Twitter engagement and generates only circles with three layers) in the hopes that the ML part works better with more constrained training data.

Final Product!

Metal Enclosure!

I got the metal enclosure back from the fabricator! There were a few issues: the camera mounting holes were not quite in the correct position, the Jetson board mounting holes were reversed, and I did not take into account the length of the power connectors, so the power supply does not fit as expected. My tech Bobbi had the tools, so we were able to make the modifications and I test-fit the components. The gauge of the metal was thicker than I (or the designer) was expecting, so the next unit will probably be a thinner gauge.

Also, the Zombie Formalist was accepted for the Art Machines 2 conference in Hong Kong in June 2021! Robert is currently working on the design of the wood frame and I’m working on getting the Jetson board to interface with Bobbi’s button interface.

Zombie Formalist ML – Revisit of Hand-labelled Data Set

After all the issues with ‘lucky’ results, I wanted to go back and confirm that my 70% best-case results were not themselves lucky! The good news is that those results are valid! I trained on the hand-labelled data using the same hyper-parameter search I’ve been using for the recent experiments, and the results are great: the mean test accuracy was 71%, and the average f1-scores on test sets ranged from 68% to 73%. So 70% is about the best I can aim for, and even that is unlikely with the Twitter data, since those labels collapse the multiple aesthetics of the Twitter (or in-person) audience. The current best performance on Twitter data is ~60% accuracy.

Zombie Formalist ML – Chance

After a few more experiments hoping to get a test accuracy close to the 63% I achieved recently, I realized I had not tried running that same experiment again with the same hyperparameters. The only differences would be the random seeds used for the initial weights and which samples end up in the test and validation sets. So I re-ran the recent test (using 3 as the Twitter score threshold), and the best performing model, in terms of validation accuracy, achieved a test accuracy of… 43%. Validation accuracy was quite consistent, previously 73% and now 71%. So the lesson is that I need to run each experiment multiple times with different test sets to know how well it is *actually* working, because apparently my best results are merely accidentally generalizable test sets or favourable initial weights.
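In practice that means repeating the whole train/evaluate cycle over several random splits and looking at the spread rather than a single number; a minimal sketch of the pattern, where the MLPClassifier and random data are just stand-ins for the real hyperparameter search and data-set:

    # Sketch of running the same experiment over several random splits to
    # see how much the test accuracy moves around. The MLPClassifier and
    # random data stand in for the real model search and data-set.
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    X = np.random.rand(1000, 40)            # placeholder features
    y = np.random.randint(0, 2, 1000)       # placeholder labels

    scores = []
    for seed in range(5):                   # different split + init each run
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.2, random_state=seed, stratify=y)
        model = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500,
                              random_state=seed)
        model.fit(X_train, y_train)
        scores.append(model.score(X_test, y_test))

    print(f"test accuracy: mean={np.mean(scores):.2f} std={np.std(scores):.2f}")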

The next step is to reset to using the F data-set, filtering for only the compositions uploaded to Twitter, and to see how much variation there is in test accuracy when using low Twitter score thresholds. It is certainly an issue that a composition may have no likes not because it’s unliked but because it was not seen by anyone on Twitter. Perhaps I should consider compositions liked by only one person “bad” and those with more than one like “good”; that way I’m only comparing compositions that have certainly been seen!

Final Enclosure Design!

This is the design going to the fabricator! It’s nice that things are finally moving after all the challenges of finding a new designer and fabricator during COVID. The main change to this design is that the fabricator requires a 3/8″ gap between all holes and bends, which means shifting things quite a bit. It also means changing the top, where the camera and buttons are mounted.

I also thought I would take this chance to double-check my calculations for the camera angle, and it’s good I did, because they were incorrect! I had interpreted the camera’s 70° angle of view as horizontal, but it is actually the diagonal, so I had to recalculate the field of view for the sensor to make sure the monitor does not block it. Short version: the vertical angle of view is 45°, not the 35° previously specified.
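For reference, converting a diagonal angle of view to a vertical one only needs the sensor aspect ratio; assuming a 4:3 sensor (an assumption on my part), a 70° diagonal works out to roughly 45° vertically, which matches the corrected figure:

    # Convert a diagonal angle of view to vertical, given the sensor aspect
    # ratio. The 4:3 aspect is an assumption; with it, a 70° diagonal gives
    # a vertical angle of view of roughly 45° (and ~58° horizontal).
    import math

    def vertical_aov(diagonal_deg, aspect_w=4, aspect_h=3):
        diag = math.hypot(aspect_w, aspect_h)
        half = math.atan(math.tan(math.radians(diagonal_deg) / 2) * aspect_h / diag)
        return math.degrees(2 * half)

    print(round(vertical_aov(70), 1))  # ≈ 45.6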

Zombie Formalist ML: High Thresholds for Twitter Engagement

Following from my previous post, I used the same approach to change the threshold for how much Twitter engagement is required for a composition to be “good”. The following table shows the results, where the “TWIT Threshold” is the minimum sum of likes and RTs required for a composition to be labelled “good”. Of course, increasing the threshold decreases the number of “good” samples significantly; there are 880 “good” samples at Threshold 1, 384 at Threshold 2, and 158 at Threshold 3. (This is comparable to the number of samples when using attention to determine labels.) The small number of samples at high thresholds is why I did not try thresholds higher than 3.

TWIT Threshold:   1      2      3
Test Accuracy:    53%    58%    63%

Interestingly, the results show the opposite pattern to the one observed when using attention to generate labels: here, test accuracy increases as the threshold increases. It seems Twitter engagement labels are actually much more accurate than those derived from the attention data. It makes sense that explicitly liking and RTing on Twitter is a better signal for “good”, even though it collapses many more people’s aesthetics. Indeed, some would argue there are global and objective aesthetics most of us agree on, but I’m less convinced.

I also ran a series of experiments using the amalgamated data-set (where the ZF code changed between subsequent test sessions) with the same Twitter thresholds (1399 “good” at Threshold 1, 560 at Threshold 2, and 228 at Threshold 3); these showed only a 2% difference in test accuracy, which peaked at 58%. Another experiment I was working on was a proxy for a single-style ZF that would, for example, generate only circles. This would reduce some of the feature-vector parameters and potentially increase accuracy, as it would be an “apples to apples” comparison for the audience. It also reduces the number of samples since, for example, “circles” make up only 1/3 of a whole data-set. Doing this for the F integration test resulted in a best accuracy of around 60% (where Threshold 3 has 129 “good” samples). I considered doing the same with the amalgamated training set, which contains 1438 circle samples that were uploaded to Twitter, compared to the 898 included in the most recent integration test; but looking back at the amalgamated data-set, it actually has about the same number of circle compositions with high Twitter scores as the F data-set, so there is no point in going back to it for more samples!
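The single-style proxy is really just a row filter plus dropping the now-constant shape features; a rough sketch, where the CSV path and column names are hypothetical:

    # Sketch of the single-style ("circles only") proxy: keep only circle
    # compositions and drop the shape-type features, which are now constant.
    # The CSV path and column names ("shape_type", "shape_type_*") are
    # hypothetical.
    import pandas as pd

    df = pd.read_csv("integration_f.csv")                  # placeholder path
    circles = df[df["shape_type"] == "circle"].copy()      # ~1/3 of the set
    circles = circles.drop(columns=[c for c in circles.columns
                                    if c.startswith("shape_type")])
    print(len(circles), "circle samples remain")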

Through all of this it seems clear that online learning of viewer aesthetics from scratch would take a very, very long time, and perhaps shipping the project with a starting model based on the Twitter data collected to date is the best approach. The Zombie Formalist has been on Twitter for about a year and over that time generated 15833 compositions, only slightly more than my initial hand-labelled training set of 15000 compositions, for which my best test accuracy was 70% (though I’ve done some feature engineering since then).

Zombie Formalist ML: High Thresholds for Attention

Looking at my data, I noticed there were quite a few weak compositions in the top 50 greatest-attention set for the still-collecting F integration test. Some of these were due to outlier levels of attention caused by a false-positive face detection in the bathroom; others seem to be either a change of heart or my partner’s aesthetic. Since there seemed to be some quite poor results, I wondered about raising the attention threshold used to generate labels, so that compositions are “good” only if they received a lot of attention. The result is that the higher the threshold, the fewer the samples and the poorer the generalization:

ATTN Threshold:      100    150    200
Test Set Accuracy:   56%    53%    45%

Next I’ll try the same thing with a few different thresholds for Twitter engagement (likes and retweets). I have lower expectations here because there is potentially much greater variance in the aesthetics preferred by the Twitter audience. At the same time, the Twitter audience is more explicit about their aesthetic since they need to interact with tweets.

Machine Learning of Parameter Groups and the Impossibility of Universal Aesthetic Prediction

Since I’ve been having trouble with generalizing classifier results (where the model achieves tolerable accuracy on training, and perhaps validation, data but performs poorly on test data), I thought I would throw more data at the problem; I combined all of the Twitter data collected to date (even though some of the code changed between various test runs) into a single data-set. This super-set contains 12861 generated compositions, 2651 of which were uploaded to Twitter. I labelled samples as “good” where their score was greater than 100 (at least one like or RT, plus enough in-person attention to be uploaded to Twitter). After filtering outliers (twice the system “saw” a face where there was no face, leading to very large and impossible attention values), this results in 1867 “good” compositions. When balancing the classes, the total set ends up with 3734 “good” and “bad” samples. Still not very big compared to my hand-labelled 15,000-sample pilot set, which contained 3971 “good” compositions. The amalgamated super-set was used for a number of experiments as follows.
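The outlier filtering and class balancing amount to something like the following sketch; the attention cut-off, CSV path and column names are hypothetical:

    # Sketch of the outlier filtering and class balancing: drop impossible
    # attention values, label by score, then downsample the larger class.
    # The attention cut-off, CSV path and column names are hypothetical.
    import pandas as pd

    df = pd.read_csv("amalgamated.csv")                     # placeholder path
    df = df[df["attention"] < 100000]                       # drop false-positive outliers
    df["label"] = (df["score"] > 100).map({True: "good", False: "bad"})

    n_good = (df["label"] == "good").sum()
    balanced = pd.concat([
        df[df["label"] == "good"],
        df[df["label"] == "bad"].sample(n=n_good, random_state=0),
    ])
    print(balanced["label"].value_counts())                 # equal "good" / "bad"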

Read more

Revisiting ML for Zombie Formalist

Since my last post on ML for the ZF, I’ve been running the system on Twitter and collecting data. The assumption is that the model’s lack of ability to generalize (to work accurately on the test set) is due to a lack of data. Since the classes are imbalanced (there are a lot of “bad” compositions compared to “good” ones), I end up throwing out a lot of generated data.

In the previous experiment I balanced classes only by removing samples that had very low attention. I considered these spurious interactions and thought they would just add noise. That data-set (E) had 568 good and 432 bad samples. The results of this most recent experiment follow.

Read more

Draft of Enclosure Design Ready for Quote Requests!

This most recent iteration of the case design is very close to finalized! There are still some tweaks, but I’m confident not too many changes will be needed. I’ve already sent this design off to a few local fabricators; only once the quotes come back will I have a good sense of where my budget lands and how many painting-appropriation prints I can make!

Test Prints on Canvas!

I got some test prints from my printer! The images above are #19 (top) and #4 (bottom). #4 looks pretty fantastic; the blacks are quite deep and the whites quite bright; visually comparing with my Endura Metallic prints, the blacks are a little lighter but the whites are quite close. I was a little concerned about the (relatively) low resolution of these works, both due to the source images and due to the slowness of processing. Looking at the digital file you can see a little banding due to the subtle gradients, but the prints look very seamless, and the texture of the canvas certainly contributes to the smoothness.

While #19 was quite popular in my Twitter poll, it seems to fall quite flat on canvas; I think the luminosity contrast is too low. Looking at the luminosity contrast of the other short-listed compositions, it looks like #22, #24, and perhaps #3 could also fall quite flat. If I choose not to print those, I would eliminate the more contemporary paintings, including the cubist and surrealist pieces. The remaining source paintings were made from 1517 to 1633, quite a narrow window. I’m unsure how to proceed, but I think I’ll need more test prints. I also did not include some of these in my video versions, so I’ll do some of that work next.
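If I want something more objective than eyeballing, a rough luminosity-contrast number (RMS contrast of the luminance channel) could help rank the short-list; a small sketch, with a placeholder filename:

    # Rough luminosity-contrast check: RMS contrast of the luminance
    # channel, for comparing the short-listed compositions numerically.
    # The filename is a placeholder.
    import numpy as np
    from PIL import Image

    lum = np.asarray(Image.open("composition_19.png").convert("L"), dtype=float) / 255.0
    rms_contrast = lum.std()           # higher = more luminosity contrast
    print(f"RMS contrast: {rms_contrast:.3f}")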

Enclosure Design!

This aspect of the project has been quite slow and I have not kept the blog up to date; my last post was when I finished my first sketchy drawing in December! The company I had originally gotten a quote from was no longer able to do the job, which included technical drawing, design, and fabrication in wood and metal. I approached quite a few companies, but no one was able to do all aspects of the job and/or wanted to take on the design task.

After desperate searching, my partner suggested I ask a friend of hers, and Robert Billard has taken on the design and technical drawing task! This is a real favour, since an architect is far overqualified for a small job like this. Thanks to him, this part of the project is finally moving and I should be able to get realistic quotes for the metal fabrication job! The following images show various renderings of the enclosure through a number of iterations; they are incomplete, but they do give a sense of progress from older (top) to newer (bottom).

Read more

Modifying Features for Extreme Offsets.

As each composition uses 5 layers, I wanted to create the illusion of less density without changing the number of parameters. To do this, I allow for offsets where a layer slides completely out of view, making it invisible. This allows for compositions of only the background colour, as well as simplified compositions where only a few layers are visible.

The problem with this, from an ML perspective, is that the parameters of layers that are not visible are still in the training data; this is because the training data represents the instructions for making the image, not the image itself, so it still holds the features of a layer even when that layer is invisible. I thought I would run another hyperparameter search where I zero out all the parameters for layers that are not visible. I reran an older experiment to test against, and the results are promising.
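The zeroing step itself is straightforward; a sketch of the idea, where the per-layer parameter layout and the visibility test are hypothetical stand-ins for the real feature vector:

    # Sketch of zeroing the features of invisible layers: if a layer's
    # offset pushes it fully out of view, blank out that layer's slice of
    # the feature vector. The per-layer parameter layout and the visibility
    # test are hypothetical.
    import numpy as np

    N_LAYERS = 5
    PARAMS_PER_LAYER = 8          # assumed number of parameters per layer
    OFFSET_INDEX = 0              # assumed position of the offset parameter

    def zero_invisible_layers(features, max_visible_offset=1.0):
        features = features.copy()
        for layer in range(N_LAYERS):
            start = layer * PARAMS_PER_LAYER
            offset = features[start + OFFSET_INDEX]
            if abs(offset) > max_visible_offset:       # layer slid out of view
                features[start:start + PARAMS_PER_LAYER] = 0.0
        return features

    sample = np.random.rand(N_LAYERS * PARAMS_PER_LAYER) * 2  # dummy feature vector
    print(zero_invisible_layers(sample))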

Read more