Zombie Formalist ML – Chance

Posted: February 4, 2021 at 2:54 pm

After a few more experiments hoping to get a test validation close to the 63% I achieved recently, I realized I had not tried re-running that same experiment with the same hyperparameters. The only differences would be the random seeds used for the initial weights and which samples end up in the test and validation sets. So I re-ran the recent test (using 3 as the Twitter score threshold) and the best performing model, in terms of validation accuracy, achieved test validation of… 43%. Validation accuracy was quite consistent: previously 73% and now 71%. So the lesson is that I need to run each experiment multiple times with different test sets to know how well it is *actually* working, because apparently my best results are merely accidentally generalizable test sets or favourable initial weights.
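A minimal sketch of that lesson in Python: rerun the same experiment over several random splits and seeds and report the mean and spread rather than a single number. `train_and_eval` here is a hypothetical stand-in for the real hyperparameter-search pipeline, not the actual code.

```python
import random
import statistics

def repeated_test_accuracy(data, train_and_eval, n_runs=5, test_frac=0.2):
    """Run the same experiment over several random splits and return
    (mean, stdev) of the test accuracies."""
    accuracies = []
    for seed in range(n_runs):
        rng = random.Random(seed)
        shuffled = data[:]
        rng.shuffle(shuffled)
        n_test = int(len(shuffled) * test_frac)
        test, train = shuffled[:n_test], shuffled[n_test:]
        # train_and_eval is a placeholder for the real pipeline; it
        # should return test accuracy for this split/seed.
        accuracies.append(train_and_eval(train, test, seed=seed))
    return statistics.mean(accuracies), statistics.stdev(accuracies)
```

Reporting mean ± stdev across runs would make it obvious whether a 63% result is a real improvement or a lucky split.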

The next step is to go back to the F data set, filter for only the compositions uploaded to Twitter, and see how much variation there is in the test validation when using low Twitter score thresholds. It is certainly an issue that a composition may have no likes not because it’s unliked but because it was not seen by anyone on Twitter. Perhaps I should consider compositions liked by only one person “bad” and those with more than one like “good”; that way I’m only comparing compositions that have certainly been seen!
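That proposed relabelling could be sketched like this; the `(id, score)` pair format is my own illustration, where the score is the sum of likes and RTs:

```python
def relabel(uploaded):
    """Keep only compositions with at least one like/RT, so every
    retained sample was certainly seen on Twitter; a score of exactly
    1 is labelled "bad", anything higher "good"."""
    labels = {}
    for comp_id, score in uploaded:
        if score >= 1:
            labels[comp_id] = "good" if score > 1 else "bad"
    return labels
```

Compositions with a score of 0 are simply dropped, since there is no way to tell whether they were disliked or just never seen.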

Final Enclosure Design!

Posted: January 20, 2021 at 5:02 pm

This is the design going to the fabricator! It’s nice that things are finally moving after all the challenges finding a new designer and fabricator during COVID. The main change to this design is that the fabricator requires a 3/8″ gap between all holes and bends, which means shifting things quite a bit. It also means changing the top where the camera and buttons are mounted.

I also thought I would take this chance to double check my calculations for the camera angle, and it’s good I did because they were incorrect! I had interpreted the camera angle as being 70° horizontal, but it was actually diagonal, so I had to recalculate the field of view for the sensor to make sure the monitor does not block it. Short version: the vertical angle of view was 45°, not the 35° previously specified.
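For reference, the corrected figure follows from treating 70° as a diagonal angle of view. Assuming a 4:3 sensor and a rectilinear lens (both assumptions on my part), the vertical angle works out to roughly 45.6°, consistent with the 45° above:

```python
import math

def vertical_fov(diag_fov_deg, aspect_w=4, aspect_h=3):
    """Vertical field of view given a diagonal FOV, assuming a
    rectilinear lens and the given sensor aspect ratio. Angles don't
    scale linearly, so the conversion goes through tangents."""
    diag = math.hypot(aspect_w, aspect_h)
    half_diag_tan = math.tan(math.radians(diag_fov_deg) / 2)
    return math.degrees(2 * math.atan(half_diag_tan * aspect_h / diag))

print(round(vertical_fov(70), 1))  # about 45.6 for a 4:3 sensor
```

A 16:9 sensor would give a noticeably smaller vertical angle, so the aspect-ratio assumption matters when checking monitor clearance.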

Zombie Formalist ML: High Thresholds for Twitter Engagement

Posted: January 12, 2021 at 12:27 pm

Following from my previous post, I used the same approach to change the thresholds for how much Twitter engagement is required for a composition to be “good”. The following table shows the result, where the “TWIT Threshold” is the sum of likes and RTs for each composition. Of course, increasing the threshold decreases the number of “good” samples significantly; there are 880 “good” samples at Threshold 1, 384 at Threshold 2, and 158 at Threshold 3. (This is comparable to the number of samples using attention to determine labels.) The small number of samples at higher thresholds is why I did not try thresholds higher than 3.

TWIT Threshold:    1      2      3
Test Validation:   53%    58%    63%
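The labelling behind this sweep is just a threshold on the summed engagement; a sketch, where the score list is made-up illustration data:

```python
def label_at_threshold(scores, threshold):
    """Label a composition "good" (1) when likes + RTs meet the
    threshold, "bad" (0) otherwise."""
    return [1 if s >= threshold else 0 for s in scores]

# Hypothetical engagement scores (likes + RTs) for six compositions.
scores = [0, 0, 1, 2, 3, 5]
for t in (1, 2, 3):
    # Raising the threshold shrinks the "good" class, as in the table.
    print(t, sum(label_at_threshold(scores, t)))
```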

Interestingly, the results show the opposite pattern to that observed when using attention to generate labels: here, test validation accuracy increases as the threshold increases. It seems Twitter engagement scores are actually much more accurate labels than those derived from the attention data. It makes sense that explicitly liking and RTing on Twitter is a better signal for “good”, even though it collapses many more people’s aesthetics. Indeed, some would argue there are global and objective aesthetics most of us agree on, but I’m less convinced.

I also ran a series of experiments using the amalgamated data-set (where the ZF code changed between subsequent test sessions) with the same Twitter thresholds (1399 “good” samples at Threshold 1, 560 at Threshold 2, and 228 at Threshold 3); these showed only a 2% difference in test accuracy, peaking at 58%.

Another experiment I was working on was a proxy for a single-style ZF that would generate only circles, for example. This reduces some of the feature vector params and could increase accuracy, as it makes an “apples to apples” comparison for the audience. It also reduces the number of samples since, for example, “circles” are only 1/3 of a whole data set. Doing this for the F integration test resulted in a best accuracy of around 60% (where Threshold 3 has 129 “good” samples). I considered doing the same with the amalgamated training set, which contains 1438 circle samples that were uploaded to Twitter, compared to the 898 included in the most recent integration test; but looking back at the amalgamated data-set, it actually has about the same number of circle compositions with high Twitter scores as the F data-set, so there’s no point going back to it for more samples!

Through all of this it seems clear that online learning of viewer aesthetics from scratch would take a very, very long time, and perhaps shipping the project with a starting model based on the Twitter data collected to date is the best approach. The Zombie Formalist has been on Twitter for about a year and over that time generated 15833 compositions, only slightly more than my initial hand-labelled training set of 15000 compositions, for which my best test accuracy was 70% (though I’ve done some feature engineering since then).

Zombie Formalist ML: High Thresholds for Attention

Posted: November 2, 2020 at 2:08 pm

Looking at my data I noticed that there were quite a few weak compositions in the top 50 greatest-attention set for the still-collecting F integration test. Some of these were due to outlier levels of attention caused by a false-positive face detection in the bathroom; others seem to be either a change of heart, or my partner’s aesthetic. Since there seemed to be some quite poor results, I wondered about changing the attentional threshold used to generate labels, marking compositions “good” only if they received a lot of attention. The result: the higher the threshold, the fewer the samples and the poorer the generalization:

ATTN Threshold:     100    150    200
Test Set Accuracy:  56%    53%    45%

Next I’ll try the same thing with a few different thresholds for Twitter engagement (likes and retweets). I have lower expectations here because there is potentially much greater variance in the aesthetics preferred by the Twitter audience. At the same time, the Twitter audience is more explicit about its aesthetic, since it needs to interact with tweets.

Machine Learning of Parameter Groups and the Impossibility of Universal Aesthetic Prediction

Posted: October 20, 2020 at 10:52 am

Since I’ve been having trouble with generalizing classifier results (where the model achieves tolerable accuracy on training, and perhaps validation, data but performs poorly on test data), I thought I would throw more data at the problem: I combined all of the Twitter data collected to date (even though some of the code changed between various test runs) into a single data-set. This super-set contains 12861 generated compositions, 2651 of which were uploaded to Twitter. I labelled samples as “good” where their score was greater than 100 (at least one like or RT, plus enough in-person attention to be uploaded to Twitter). After filtering outliers (twice the system “saw” a face where there was no face, leading to very large and impossible attention values), this results in 1867 “good” compositions. When balancing the classes, the total set ends up with 3734 “good” and “bad” samples. Still not very big compared to my hand-labelled 15,000 sample pilot set, which contained 3971 “good” compositions. The amalgamated super-set was used for a number of experiments, as follows.
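The labelling, outlier filtering, and class balancing steps described above could be sketched like this; the record fields and the attention cap are my assumptions for illustration, not the project’s actual schema:

```python
import random

def build_balanced_set(records, score_threshold=100,
                       attention_cap=10000, seed=42):
    """Filter outliers, label by score threshold, and balance classes
    by downsampling the majority class. Returns (good, bad) lists."""
    # Drop outliers: impossibly large attention values caused by
    # false-positive face detections.
    records = [r for r in records if r["attention"] < attention_cap]

    # Label "good" when the combined score exceeds the threshold.
    good = [r for r in records if r["score"] > score_threshold]
    bad = [r for r in records if r["score"] <= score_threshold]

    # Balance classes by random downsampling.
    rng = random.Random(seed)
    n = min(len(good), len(bad))
    return rng.sample(good, n), rng.sample(bad, n)
```

Downsampling the majority class (rather than reweighting) matches the 1867 “good” → 3734 total arithmetic above.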


Revisiting ML for Zombie Formalist

Posted: October 1, 2020 at 11:30 am

Since my past post on ML for the ZF, I’ve been running the system on Twitter and collecting data, the assumption being that the model’s lack of ability to generalize (work accurately on the test set) is due to a lack of data. Since classes are imbalanced (there are a lot of “bad” compositions compared to “good” ones), I end up throwing out a lot of generated data.

In the previous experiment I balanced classes only by removing samples that had very low attention. I considered these spurious interactions and thought they would just add noise. That data-set (E) had 568 good and 432 bad samples. The results of this most recent experiment follow.


Draft of Enclosure Design Ready for Quote Requests!

Posted: October 1, 2020 at 10:32 am

This most recent iteration of the case design is very close to finalized! There are still some tweaks, but I’m confident not too many changes will be needed. I’ve already sent this design off to a few local fabricators; once the quotes come back I’ll have a good sense of where my budget lands and how many painting appropriation prints I can make!

More test prints!

Posted: September 3, 2020 at 6:03 pm

Since I was on the fence about the two test prints I had previously done, I thought I should make smaller prints of all of the remaining short-listed paintings to do the final selection.

Test Prints on Canvas!

Posted: July 31, 2020 at 4:16 pm

I got some test prints from my printer! The images above are #19 (top) and #4 (bottom). #4 looks pretty fantastic; the blacks are quite deep and the whites quite bright. Visually comparing with my Endura Metallic prints, the blacks are a little lighter but the whites are quite close. I was a little concerned about the (relatively) low resolution of these works, both due to the source images and due to the slowness of processing. Looking at the digital file you can see a little banding in the subtle gradients, but the prints look very seamless, and the texture of the canvas certainly contributes to the smoothness.

While #19 was quite popular in my Twitter poll, it seems to fall quite flat on canvas; I think the luminosity contrast is too low. Looking at the luminosity contrast of the other short-listed compositions, it looks like #22, #24, and perhaps #3 could also fall quite flat. If I choose not to print those, I would eliminate the more contemporary paintings, including the cubist and surrealist pieces. The remaining source paintings were made between 1517 and 1633, so quite a narrow window. I’m unsure how to proceed, but I think I’ll need more test prints. I also did not include some of these in my video versions, so I’ll do some of that work next.

Enclosure Design!

Posted: July 28, 2020 at 1:00 pm

This aspect of the project has been quite slow and I have not been up to date on the blog; my last post was when I finished my first sketchy drawing in December! The company I had originally gotten a quote from was no longer able to do the job, which included technical drawing, design, and fabrication in wood and metal. I approached quite a few companies, but no one was able to do all aspects of the job and/or they did not want to take on the design task.

After desperate searching, my partner suggested I ask a friend of hers, and Robert Billard has taken on the design and technical drawing task! This is a real favour, since an architect is far overqualified for a small job like this. Thanks to him, this part of the project is finally moving and I should be able to get realistic quotes for the metal fabrication job! The images following show various renderings of the enclosure through a number of iterations; they are incomplete, but do give a sense of progress from older (top) to newer (bottom).


Painting Short List After Epoch Training!

Posted: July 6, 2020 at 2:37 pm

Since I revisited many of the paintings and used the epoch training method used for the videos, I’ve made a longer revised short list of paintings and here they are all together:

Painting #24 with Epoch Training

Posted: July 6, 2020 at 2:16 pm

This painting did not make the previous short list due to the patchy colour (bottom image); I thought I would go back to it with epoch training, and I’m quite happy with the results!

Revisiting Painting #22 with Epoch Training

Posted: June 29, 2020 at 4:15 pm

Modifying Features for Extreme Offsets.

Posted: June 29, 2020 at 4:12 pm

As each composition uses 5 layers, I wanted to create the illusion of less density without changing the number of parameters. To do this, I allow for offsets where a layer can slide completely out of view, making it invisible. This allows for compositions of only the background colour, as well as simplified compositions where only a few layers are visible.

The problem with this from an ML perspective is that the parameters of invisible layers are still in the training data; the training data represents the instructions for making the image, not the image itself, so it still holds a layer’s features even when that layer is not visible. I thought I would run another hyperparameter search where I zero out all the parameters for layers that are not visible. I reran an older experiment to test against, and the results are promising.
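A sketch of that zeroing step, with a made-up feature layout (5 layers of 8 parameters, an offset at a fixed index, and a visibility limit); none of this is the ZF’s actual encoding:

```python
PARAMS_PER_LAYER = 8   # hypothetical parameter count per layer
OFFSET_INDEX = 0       # hypothetical position of a layer's x-offset
OFFSET_LIMIT = 1.0     # offsets at/beyond this slide the layer out of view

def zero_invisible_layers(features):
    """Return a copy of the feature vector where every parameter of an
    off-screen layer is zeroed, so invisible layers contribute nothing
    to training."""
    out = list(features)
    for start in range(0, len(out), PARAMS_PER_LAYER):
        layer = out[start:start + PARAMS_PER_LAYER]
        if abs(layer[OFFSET_INDEX]) >= OFFSET_LIMIT:
            out[start:start + PARAMS_PER_LAYER] = [0.0] * PARAMS_PER_LAYER
    return out
```

This makes two compositions that render identically (regardless of the parameters of their hidden layers) have identical feature vectors.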


Revisiting Painting #19 with Epoch Training

Posted: June 24, 2020 at 6:48 pm

Revisiting Painting #9 with Epoch Training

Posted: June 22, 2020 at 9:25 pm

Classifying Using Only Twitter Data as Label with Incidental Filtering and Class Balancing.

Posted: June 19, 2020 at 11:50 am

For this experiment I used the Twitter data (likes and retweets) alone to generate labels, where ‘good’ compositions have at least 1 like or retweet. Relatively few compositions receive any likes or retweets (presumably due to upload timing and the Twitter algorithm). Due to this, I randomly sample the ‘bad’ compositions to balance the classes, leading to 197 ‘good’ and 197 ‘bad’ samples. The best model achieves an accuracy of 76.5% on the validation set and 56.6% on the test set, with f1-scores of 75% (bad) and 78% (good) for the validation set and 55% (bad) and 58% (good) for the test set. The following image shows the confusion matrix for the test set. The performance on the validation set is very good, but that does not generalize to the test set, likely because there is just too little data to work with.

I was just thinking about this separation of likes from attention and realized that since compositions with little attention don’t get uploaded to Twitter, they certainly have no likes; I should only be comparing compositions that have been uploaded to Twitter if I’m using the Twitter data without attention to generate labels. The set used in the experiment discussed here contains 320 uploaded compositions and 74 compositions that were not uploaded. I don’t think it makes sense to redo this experiment with only the uploaded compositions, because there are just too few samples to make any progress at this time.

In this data-set 755 compositions were uploaded and 197 received likes or retweets. For the data-collection in progress as of last night 172 compositions have been uploaded and 86 have received likes or retweets. So it’s going to be quite the wait until this test collects enough data to move the ML side of the project forward.

Classifying Using Only Attention as Label with Incidental Filtering

Posted: June 18, 2020 at 7:11 pm

The results from my second attempt, using attention alone to determine labels and filtering out samples with attention < 6, are in! This unbalanced data-set has much higher validation (74.2%) and test (66.5%) accuracies. The f1 scores achieved by the best model are much better also: for the validation set, 36% (bad) and 84% (good), and for the test set, 27% (bad) and 78% (good). As this data-set is quite unbalanced and the aim is to predict ‘good’ compositions, not ‘bad’ ones, I think these results are promising. I chose not to balance the classes this time because true positives are more important than true negatives, so throwing away ‘good’ samples does not make sense.

It is unclear whether this improvement is due to fewer bad samples, or whether the samples with attention < 6 are noise without aesthetic meaning. The test confusion matrix is below, and shows how rarely predictions of ‘bad’ compositions are made, as well as a higher number of ‘bad’ compositions predicted to be ‘good’.

Classifying Using Only Attention as Label.

Posted: June 18, 2020 at 4:39 pm

Following from my previous ML post, I ran an experiment doing hyperparameter search using only the attention data, ignoring the Twitter data for now. The results are surprisingly poor with the best model achieving no better than chance accuracy and f1 scores on the test set! For the validation set, the best model achieved an accuracy of 65%. The following image shows the confusion matrix for the test set:

The f1 scores show that this model is equally poor at predicting good and bad classes: The f1 score for the validation set was 67% for bad classes and 62% for good. In the test set the f1 scores are very poor at 55% for the bad class and 45% for the good class.

As I mentioned in the previous post, I think a lot of noise is added by incidental interactions, where someone walks by without actually attending to the composition. Watching behaviour around the system, I’ve determined that attention values below 6 are very likely to be incidental. I’m now running a second experiment using the same setup as this one, except with these low-attention samples removed. Of course this unbalances the data-set, in this case in favour of ‘good’ compositions (754) over ‘bad’ compositions (339). As there is so little data here, I’m not going to do more filtering of ‘good’ results to balance classes. After that I’ll repeat these experiments with the Twitter data and see where this leaves things.

Revisiting Painting #5 with Epoch Training

Posted: June 18, 2020 at 10:47 am

Revisiting Painting #4 with Epoch Training

Posted: June 14, 2020 at 9:08 am

Returning to Machine Learning with Twitter Data

Posted: June 10, 2020 at 3:31 pm

Now that I have the system running and uploading to Twitter, and have collected a pretty good amount of data, I’ve done some early ML work using this new data set! I spent a week looking at framing this as a regression task (predicting scores) vs a classification task (predicting “good” or “bad” classes). The regression was not working well at all, and I abandoned it; it was also impossible to compare its results with previous classification work. I’ve returned to framing this as a classification problem and run a few parameter searches.


Pausing the Zombie Formalist: Stripes Fixed!

Posted: June 4, 2020 at 10:43 am

The Zombie Formalist is taking a break from posting compositions to Twitter to create space for, amplify, and be in solidarity with Black and Indigenous people facing death, violence and harassment as facilitated by white colonial systems.

I took this pause in generation to tweak the code that generates stripes. Now the offsets don’t cut off the stripes, because the code uses the frequency to determine appropriate places to cut (troughs). The following image shows a random selection of images using the new code. This change replaced a lot of work-around code (blurring, padding, etc.) and opened up aesthetic variation that was not previously possible.
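Modelling a stripe layer as a sine wave, snapping a cut position to the nearest trough could look something like this; the sine model and parameter names are my illustration, not the actual ZF code:

```python
import math

def snap_offset_to_trough(offset, frequency, phase=0.0):
    """Snap `offset` to the nearest x where sin(2*pi*f*x + phase) is at
    a trough (-1), so a cut there never truncates a stripe mid-peak."""
    period = 1.0 / frequency
    # Troughs occur where 2*pi*f*x + phase = 3*pi/2 + 2*pi*k,
    # i.e. x = (3/4 - phase/(2*pi)) * period + k * period.
    first_trough = (0.75 - phase / (2 * math.pi)) * period
    k = round((offset - first_trough) / period)
    return first_trough + k * period
```

Because cuts land at minima of the wave, no blur or padding is needed to hide a truncated stripe edge.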

Revisiting Painting #3 with Epoch Training

Posted: May 18, 2020 at 7:45 am

All of these results are looking consistently better so I think I’m just going to post the new progress (on top) and the previous best result (below) for comparison from now on.

Revisiting Painting #2 with Epoch Training

Posted: May 13, 2020 at 10:28 am

I’m quite happy with the results of the epoch training on the previous results! My favourite latest selection is the large image below. Under it is the previous best result on the left, with another exploration using epoch training on the right. The top image is structurally equivalent to the previous results but without the artifacts and with greater smoothness, which has been the case for all the epoch-training explorations.

#1 Final Refinements

Posted: May 11, 2020 at 8:18 am

I’m torn between these two options. While the top is less organized (and thus more resembles the original), its structure is less central, and the shift of the bright area from the chest to the top of the head is quite nice.

Revisiting #1 Appropriation Using Epoch Training.

Posted: May 1, 2020 at 10:47 am

Following from the previous post, I ran a test with a different training procedure. Previously I had been doing the canonical SOM training, where the neighbourhood starts large and shrinks monotonically over time. For the videos, I want an increasing degree of reorganization over time, so I train over a number of epochs where the starting neighbourhood size for each epoch increases over time; within each epoch, the neighbourhood size still shrinks with each training sample. In this test (results pictured in the large image below, along with previous results underneath) I do multiple epochs where the maximum neighbourhood stays the same for every epoch.
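The two schedules could be sketched like this, assuming a generic SOM trainer that consumes one neighbourhood radius per training step; the linear decay and the per-epoch scaling rule are illustrative assumptions, not the actual implementation:

```python
def neighbourhood_schedule(n_epochs, steps_per_epoch, max_radius,
                           increasing=True):
    """Yield one neighbourhood radius per training step.

    Within each epoch the radius shrinks linearly; across epochs the
    starting radius either grows toward max_radius (the video-style
    schedule) or stays fixed at max_radius (this test's schedule)."""
    for epoch in range(n_epochs):
        if increasing:
            start = max_radius * (epoch + 1) / n_epochs
        else:
            start = max_radius
        for step in range(steps_per_epoch):
            yield start * (1.0 - step / steps_per_epoch)
```

With `increasing=True`, later epochs begin with larger neighbourhoods and so reorganize (abstract) the image more, which is the effect wanted for the videos.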


Stills vs Videos: Painting Appropriation

Posted: April 1, 2020 at 11:46 am

After doing a little work on the painting appropriation videos, I’m realizing that the very soft boundaries I’ve been after for the stills just happen in the videos, “for free”. The gallery below shows the video approach (right) next to the finalized print version (left). Note the lack of reorganization (areas of contrasting colour) in the still versions; e.g. the green and purple in the upper right quadrant of the top left, next to the bright blob in the centre.


Painting #1 Appropriation Video

Posted: March 31, 2020 at 2:04 pm

Painting #1 Appropriation Video Work in Progress

Posted: March 25, 2020 at 11:45 am

Now that the final selection of paintings has been made I’ve been able to start working on the video works. These are videos that show the deconstruction (abstraction) of paintings by the machine learning algorithm. Pixels are increasingly reorganized according to their similarity over time. The top gallery shows my finalized print (left) along with a few explorations at HD resolution that approximate it. These are “sketches” of the final frames of the video.

The image below shows the actual final frame of the video. As each frame is the result of an epoch with a different neighbourhood size (which determines the degree of abstraction / reorganization), from smallest (least abstract) to largest (most abstract), the final structure is more spatially similar to the original because there is no initial disruption due to large initial neighbourhood sizes.

I think I can get around this by training for more iterations, as the larger neighbourhoods will have a greater effect with more iterations. The question is whether I should continue with the same neighbourhood size (168) used to generate the sketches above, or continue the rate of increase from the first set of frames (2 to 168 in 2675 steps). The latter seems most consistent with the rest of the training process, so I should go with that. I just need to change the code to allow “resuming” a sequence by starting with a frame part way through. Luckily, I saved the weights of the network for each frame, so that is possible without losing precision.

A plus of this video approach is that the images are far smoother than the stills, which makes me wonder whether the ruled-out paintings would actually make strong videos.