Early Full-Resolution Explorations

Posted: August 18, 2019 at 1:05 pm

Starting from the lowest-resolution images of the 7 short-listed, I’ve been exploring using them at full resolution. Using the previous parameters from the intermediary resolutions, I was unable to get any strong results; see below. I’m wondering whether colour diversity tends to result in poorer images… The main aesthetic weakness is the hard edges that manifest, even though the neighbourhood function has Gaussian edges. This was not seen, at least not to the same degree, in the expanded intermediary resolution explorations. I’m currently computing a full-resolution version of #5 (intermediary, original) and hope it’s more successful.
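For reference, the Gaussian neighbourhood I’m referring to has roughly the following form (a sketch only; the GPU implementation’s exact parameterization may differ):

```python
import math

def neighbourhood_weight(dist, radius):
    """Gaussian fall-off: full influence at the winning unit,
    smoothly decaying with pixel distance on the map."""
    return math.exp(-(dist * dist) / (2.0 * radius * radius))
```

With a radius of 35px, a unit 35px from the winner still receives about 61% of the update, so the soft edges are there in principle; the hard edges must emerge from the training dynamics rather than the kernel itself.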

Fewer Histogram Features

Posted: August 18, 2019 at 9:12 am

After the lack of success in the previous experiment using the 768-element vector, I have the results of the 96-histogram-bin experiment. During the search, Talos reported a peak validation accuracy of 73.3%. The best model reported a validation accuracy of 66.4% and a training accuracy of 99.7%. Clearly the model is learning the training set well, but again not generalizing to the validation set. The following image shows the confusion matrix for the validation set. I note that there is no appreciable difference in validation accuracy between 1,000 and 10,000 epochs.

Expanded Intermediary Resolution Explorations

Posted: August 16, 2019 at 2:27 pm

The following images were computed overnight using the same params as in the previous post. The training time is significantly longer than estimated, due to the larger number of pixels (due to aspect ratio), so only three were generated at the time of writing. While these results are going in the right direction, they are still too similar to the original compositions (with the exception of 07, lower right) and need further abstraction (an increase of neighbourhood size). I emailed the author of the GPU-accelerated SOM I’m using to see if he can reproduce these spire effects. Since the number of iterations has such a significant effect, it seems I should be working image by image at full resolution. As inefficient as that may be, it seems like the next step; I’ll prioritize the lowest-resolution images for exploration’s sake!

Intermediary Resolution Explorations

Posted: August 15, 2019 at 4:47 pm

I’m thinking that it makes the most sense to move up in resolution and do some experimentation at each step until the desired resolution is reached. It will be clear from this post that the quality of the aesthetic changes significantly at various resolutions. In order to prevent the image from approaching a gradient with such a high number of training iterations (required to provide a good sampling of the underlying diversity of the original painting), I’ve been using very small neighbourhood sizes. The image below is my best choice; it’s trained over 0.5 epochs (half the pixels) with a neighbourhood of 35px. At HD resolution, this image takes 2.5 hours to compute. If you look carefully, you’ll see some dark ‘spires’ growing from the lower left that look to be the same as those I encountered during the development of “As our gaze peers off into the distance, imagination takes over reality…” (2016). I still have no explanation for them…

For comparison, I’ve included the original image and the low resolution sketch below. At the bottom of this post images show the other neighbourhood sizes I experimented with (left: 78px; right: 150px), and rejected due to their over-abstraction.


Decomposed Survey of Long-Listed Paintings

Posted: August 14, 2019 at 6:00 pm

I realized that, without down-scaling significantly, I would not be able to get a survey of images that at least sketches out how they may look. I’ve reduced the resolution of my working files from fitting in an HD frame down to 10% and calculated SOMs where the number of iterations matches the number of pixels. I’m quite happy with the quality of these results! Only a few seem quite weak to me, due to (a) a lack of diversity (which is exaggerated by the brutal down-sampling here) or (b) a lack of colour restraint. The images below are in the same order as the painting long-list post.

Painting Decomposition by SOM: Initial Work in Progress

Posted: August 14, 2019 at 5:00 pm

While Talos is searching for suitable models for the Zombie Formalist, I’ve started experimenting with revisiting the painting appropriation side of the project. For the initial exploration, I’m using da Vinci’s “Mona Lisa” (1517).

The following images are various explorations of abstracting the above image using the SOM to reorganize its constituent pixels. Through exploring these, I realized that one of the greatest influences on the quality of the result is the random sampling of pixels. The working image is 1080×1607 pixels, which means 1,735,560 training samples. In my tests using ~20,000 training iterations, only a small subset of the diversity of those pixels influences the resulting image. I realized the most successful results are those that happen to (randomly) select a large diversity of pixels to train the SOM. The same parameters can produce very different results:

I think the image on the left is more successful because it happened to select a few brighter pixels from the original. I can produce better results by down-scaling the image to increase the diversity of pixels selected by random sampling, but that is not ideal, since I’m limiting both the output resolution and the diversity of data used in training. It seems I should stick with a number of iterations that equals (at least) the number of training samples (the number of pixels in the original). Looking again at my old code, I had not realized I had fixed the neighbourhood function; in all the images below, the only variable that affects the output is the number of iterations.
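For the record, the core of the pixel-reorganization process is roughly the following (a minimal NumPy sketch, not the GPU implementation I’m actually using; the grid size, learning rate and radius decay are placeholder choices):

```python
import numpy as np

def som_reorganize(pixels, grid_h, grid_w, iterations, radius, lr=0.25, seed=0):
    """Train a small SOM on randomly sampled pixels.
    pixels: (N, 3) array of colour values scaled to [0, 1].
    Returns the (grid_h, grid_w, 3) map, i.e. the reorganized image."""
    rng = np.random.default_rng(seed)
    som = rng.random((grid_h, grid_w, 3))           # random initial map
    ys, xs = np.mgrid[0:grid_h, 0:grid_w]
    for t in range(iterations):
        p = pixels[rng.integers(len(pixels))]       # random training sample
        # best-matching unit: cell closest to the sample in colour space
        d = ((som - p) ** 2).sum(axis=2)
        by, bx = np.unravel_index(d.argmin(), d.shape)
        # Gaussian neighbourhood around the BMU, shrinking over time
        r = radius * (1.0 - t / iterations) + 1.0
        g = np.exp(-((ys - by) ** 2 + (xs - bx) ** 2) / (2.0 * r * r))
        som += lr * g[..., None] * (p - som)        # pull neighbourhood toward sample
    return som
```

Because each iteration draws one random pixel, a run with far fewer iterations than pixels only ever “sees” a small, luck-dependent subset of the image’s diversity, which is exactly the variability described above.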


3,000,000 Training Iterations using Close and Wide Collections.

Posted: August 14, 2019 at 3:34 pm

Combining Close and Wide TRIUMF Collections in a Single Composition.

Posted: August 14, 2019 at 11:29 am

The following images were generated by combining the segments from both collections of photographs (wide and close), 135,226 segments in total. The top image is under-trained, over only 50,000 iterations (meaning that ~2/3 of the segments were never presented to the network). The bottom image was trained over 150,000 iterations.
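The “~2/3 never presented” figure follows from random sampling with replacement: the expected fraction of segments never drawn over k iterations is (1 − 1/N)^k ≈ exp(−k/N). A quick check:

```python
import math

def fraction_unseen(n_samples, n_iterations):
    """Expected fraction of samples never presented when training
    draws are uniform with replacement."""
    return (1.0 - 1.0 / n_samples) ** n_iterations

# With 135,226 segments and 50,000 iterations, roughly 69% of the
# segments (about two thirds) are expected never to be presented.
```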

2 Million Training Iterations using Wide TRIUMF Collection

Posted: August 13, 2019 at 8:48 pm

Not seeing improvement with hist features.

Posted: August 13, 2019 at 2:09 pm

It took nearly 10 days for Talos to search possible models using the 768 item vector representing the colour histogram for each composition. The best validation accuracy listed by the search was 68.5% and the best model 66.2%. The best model achieved a training accuracy of 77.9%. 465 bad compositions were predicted to be bad, 294 bad compositions were predicted to be good, 232 good compositions were predicted to be bad and 568 good compositions were predicted to be good.
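As a sanity check, the overall validation accuracy can be recovered from those four counts (treating “bad” as the negative class and “good” as the positive class):

```python
def accuracy_from_confusion(tn, fp, fn, tp):
    """Overall accuracy from the four confusion-matrix cells."""
    return (tn + tp) / (tn + fp + fn + tp)

# Counts from this post: bad->bad 465, bad->good 294,
# good->bad 232, good->good 568.
# accuracy_from_confusion(465, 294, 232, 568) comes out around 0.66,
# in line with the reported 66.2% best-model validation accuracy.
```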

This is a very minor improvement over the variance features. The low training accuracy indicates there may not be enough epochs for such a high-dimensional vector. I’m now running a second experiment where the 768-bin histogram (256 bins per channel) is reduced to 96 bins (32 bins per channel). This is more comparable to the initial 57-element training vectors. If the problem is the size of the vector, this should allow for higher training accuracy and, I hope, better generalization in the next search.
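The reduction is just a matter of summing adjacent bins; a sketch of what I mean, assuming the histogram is stored as three concatenated 256-bin channels:

```python
import numpy as np

def reduce_histogram(hist768):
    """Collapse 768 bins (3 channels x 256) to 96 (3 x 32) by
    summing each run of 8 adjacent bins within a channel."""
    h = np.asarray(hist768, dtype=float).reshape(3, 32, 8)
    return h.sum(axis=2).reshape(96)
```

The total count is preserved, so any normalization applied to the 768-bin version carries over unchanged.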

TRIUMF Wide Collection

Posted: August 12, 2019 at 9:39 pm

Up to this point I’ve been working with half the photos I shot at TRIUMF, the close-up ones. Today I started working with the medium and wide shots that show larger-scale structures, architecture, etc. Whereas the close-ups yielded ~57,000 segments, the density of the wider images resulted in ~77,000. I think these images are the most successful yet, balancing abstraction and photo-realism as well as order and complexity. The composition ends up with larger areas of colour due to the larger areas of colour at the architectural scale. This was generated with 50,000 iterations, and I’m now training a 2,000,000-iteration version.

Sorting by Area and More Training Iterations.

Posted: August 12, 2019 at 9:19 am

The following image and details show the result of a smaller neighbourhood function (1/10 of SOM width) after 2,000,000 training iterations. I’ve also rendered the collage in descending order by area, such that the largest segments are rendered behind the smaller segments. This increases the sense of flow, but I don’t think the very small neighbourhood improves things. I still think the images are more successful when they are more chaotic, and I’m training a network with fewer iterations to see what the results look like. With the larger-area images in the background, the tension between abstraction and photo-realism is lost. The resulting density of textures is very interesting though.

Hue and Orientation Features

Posted: August 11, 2019 at 8:23 pm

Using a fitted ellipse for each segment, I’ve now included orientation features. This results in images such as the following that feel like they are really going in the right direction. The top one in particular cues magnetic fields, which is very apt. The bottom image uses a larger neighbourhood function, which leads to a smoother more organized macro-structure; I prefer the top image with more turbulence. I’m now training a version of the top with more iterations to see where that goes.

Revisiting The Robin Collection

Posted: August 11, 2019 at 11:14 am

After the early success using the hue histogram features on the TRIUMF collection, I thought I’d go back to the Robin collection. The results are certainly better than the initial BGR collage, but the muted natural tones and the organic quality of the segments leads to a composition that does not seem to balance order and disorder the way I would like; it’s a little too messy. I’ve included the full frame version with a few full resolution details. I’ve also posted a version at half resolution where the same-sized segments appear twice as large relative to frame.

Collages from TRIUMF Shoot

Posted: August 10, 2019 at 8:14 pm

I’ve made some progress on using the new TRIUMF photographs as material for new collages using the same set of segments. The image on the top is using simple BGR features, and the image on the bottom (and corresponding details) is using a 64 bin histogram of each segment’s hue channel as features. The BGR feature image was trained over 2,000,000 iterations while the hue histogram image was trained over 50,000 iterations; both images use a max neighbourhood size of 0.2. I’m going to also try exploring some orientation features. I’m now training a 500,000 iteration version.
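The hue-histogram feature is, in sketch form, just a normalized histogram over each segment’s hue channel (here assuming OpenCV’s 0–179 hue convention; the actual extraction code may differ):

```python
import numpy as np

def hue_histogram(hue_values, bins=64):
    """64-bin normalized hue histogram for one segment.
    hue_values: flat array of hue values in [0, 180)."""
    hist, _ = np.histogram(hue_values, bins=bins, range=(0, 180))
    return hist / max(hist.sum(), 1)  # normalize so segment size doesn't matter
```

Normalizing by the pixel count makes segments of different sizes directly comparable in the SOM’s feature space.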

Early Work-in-Progress on a New Collage Work

Posted: August 9, 2019 at 4:28 pm

I started working through some ideas for a new collage following from my previous works using cinematic material. Robin Gleason donated some photos of her material collections to start with. I think the main issues are that

  1. The diversity of tones in a photograph means there is much more detail than first appears, and when one re-sorts components by colour, we end up with something that often resembles a gradient.
  2. The quality of the edges of this organic source material means there is little meta-structure to appreciate, and the small size of the segments means their content becomes merely texture and loses all photographic realism.

It will be interesting to see whether the hard-edged apparatus photographs will allow the preservation of those hard edges. Also, I’ll be going from 22 photographs to over 100, so the size of segments can be increased (in theory). The following images show a full-resolution collage and a few details; the ~50,000 segments were organized by mean-colour similarity using an under-trained Self-Organizing Map (SOM). I also included a few other visualizations of some SOM results (not painted using the segments) that show the lack of interesting structure. I also plan to explore using features other than mean colour, which should allow for more complexity.



Posted: August 9, 2019 at 10:17 am

On Wednesday I had the opportunity to spend a couple hours amongst the TRIUMF beam lines to take photographs for the project. I’m just posting a few photos here of the scrap area behind the shop, where Sarah Dunsiger, Robin Gleason and Karen Kazmer were doing a material exploration of the scrap materials.


Variance features result in even lower validation accuracy.

Posted: August 4, 2019 at 1:31 pm

The quick variance features were easy to implement, but provided no improvement; they actually performed worse than the previous features. The parameter search resulted in a peak validation accuracy of 64.1%, while the best model achieved 66% accuracy on training data and 62.1% on validation data. The following image shows the confusion matrix for validation data. I’m next going to generate colour histograms for the 15000B compositions and see if that leads to any improvement.

Long list of paintings for appropriation.

Posted: August 3, 2019 at 1:49 pm

With all this focus on the Zombie Formalist I’ve been spending some of the ML search time researching for the painting history appropriation aspect of the project. I’ve narrowed down a long list of paintings based on popularity and their trajectory from Northern European Renaissance realism to modern problematizations of realism; I’ve selected works from the Renaissance, Cubism and Surrealism, as follows. Thumbnail images are included below the table.

The next step for this component of the project is to do some ML to reorganize the pixels and see what works best. The resolution of some of the sources is quite high, and quite low for others. It’s yet unclear how to consider the scale of the originals in the appropriation works, as some are very large. I’m also not sure how large I will be able to go with the self-reorganization process.

Leonardo da Vinci | Salvator Mundi | 1500
Michelangelo | The Creation of Adam, Sistine Chapel ceiling | 1512
Leonardo da Vinci | Mona Lisa | 1517
Caravaggio | The Conversion of Saint Paul | 1601
Rembrandt | The Anatomy Lesson of Dr Nicolaes Tulp | 1632
Rembrandt | The Storm on the Sea of Galilee | 1633
Rembrandt | The Night Watch | 1642
Georges Braque | Violin and Palette (Violon et palette, Dans l’atelier) | 1909
Albert Gleizes | Portrait de Jacques Nayral | 1911
Georges Braque | Nature Morte (The Pedestal Table) | 1911
Jean Metzinger | Le goûter (Tea Time) | 1911
Albert Gleizes | L’Homme au Balcon (Man on a Balcony) | 1912
Duchamp | Nude Descending a Staircase No. 2 | 1912
Fernand Léger | Les Fumeurs (The Smokers) | 1912
Georges Braque | Bottle and Fishes | 1912
Georges Braque | Man with a Guitar (Figure, L’homme à la guitare) | 1912
Juan Gris | Portrait of Pablo Picasso | 1912
Juan Gris | Nature morte à la nappe à carreaux (Still Life with Checked Tablecloth) | 1915
Juan Gris | Glass and Checkerboard | 1917
Yves Tanguy | Mama, Papa Is Wounded | 1927
Rene Magritte | The Human Condition | 1933
Yves Tanguy | Through birds through fire but not through glass | 1943
Rene Magritte | The Son of Man | 1964

No Improvement with New Features

Posted: July 31, 2019 at 5:22 pm

Running a scan of hyperparameters over 145 models resulted in no improvement on the 70% validation accuracy. (Well, actually, one model reported 74% validation accuracy during the search, but it was not saved as the “best model” by Talos.) Below are the confusion matrices for both training and validation sets. Based on these results, it’s time to change to other features. I’ll try calculating the variance of each feature across layers first, since that’s pretty easy to implement, and resort to colour histograms of images if that leads nowhere.


New Features with new Dataset.

Posted: July 16, 2019 at 6:03 pm

Following from my last post, I finished generating and labelling a new dataset. I’m now rerunning the previous experiment in Talos to see if this new dataset makes any difference. On an initial look at a few of the distributions of my features, good and bad compositions remain quite evenly distributed, but a second look shows there is some unevenness in the distribution of some features, such as offset:

More offsets near 0 were labelled to be bad. Those two spikes are also quite far from an even distribution.

The new dataset is also 15,000 items including 3921 “good”, 3872 “bad” and 7207 “neutral” labels. Following is a random sampling of good and bad compositions from the new dataset:


Karen Barad

Posted: July 2, 2019 at 6:35 pm

After months, I’ve finally finished reading the Karen Barad papers that were provided as part of their symposium at UBC. The following are my notes from the symposium, as well as my responses to the readings. These are lightly edited and clarified; where I was inspired to respond to the notes, I’ve included that in square brackets.

Troubling Time/s and Ecologies of Nothingness, Re-turning, Re-remembering, and Facing the Incalculable.


Deep Networks provide no increase of validation accuracy.

Posted: June 12, 2019 at 4:20 pm

After doing quite a bit more learning, I used Talos to search hyperparameters for deeper networks. I ran a few experiments, and in no case could I make any significant improvement in validation accuracy over the simple single-hidden-layer network. While there is some improvement from tuning various hyperparameters, all the tested network configurations resulted in validation accuracy ranging from 61% to 73% (60% to 100% on training data). The following plot shows the range of validation accuracy over the number of hidden layers. Note that the jump from 1 to 2 hidden layers does increase validation accuracy, but only by a mean of 0.3%.

The confusion matrix for the best model is about the same as it was for the single hidden layer model first trained in keras (without hyperparameter tuning!):

Over the last couple of weeks I have made no significant gains with Keras, so the problem is clearly my features. Everything I’m seeing seems to confirm my initial fears regarding the lack of separability in my initial t-SNE results. I have a few ideas on how to move forward:

  1. Rather than using the raw features for classification, do some initial stats on those features and use those stats for training. This only affects features that can be grouped, e.g. stats on the set of colours of all layers in a composition. Two ideas are the variance of such groups of features, or full histograms for each group.
  2. Since my features are normalized, they all have the same range. This means that regardless of their labels, all features will have the same stats, making #1 moot! So it looks like I should convert my code so that the features are the actual numbers used to generate compositions and not these 0-1 evenly distributed random numbers. This means generating and labelling a new data-set.
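Idea #1 in sketch form, along with why #2 kills it: features drawn uniformly from 0-1 all have the same expected variance (1/12 ≈ 0.083) regardless of label, so the stats carry no signal. (The helper below is hypothetical, for illustration only.)

```python
import numpy as np

def group_variance(features, group_size):
    """Idea #1: replace each group of related features (e.g. one
    colour value per layer) with the variance across the group."""
    f = np.asarray(features, dtype=float).reshape(-1, group_size)
    return f.var(axis=1)

# For uniformly distributed 0-1 features, every group's variance
# clusters around 1/12, no matter how the composition was labelled.
```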

Reproducing previous R results in Keras.

Posted: May 31, 2019 at 6:35 pm

After spending so much time in R trying to get a simple network to work, I’ve made the jump into Keras, sklearn, scipy, etc. in order to build deeper networks. The workflow is a lot more awkward, but I’ve managed to figure out the key metrics (categorical accuracy and confusion matrices) and reproduced the previous R results (note: I did not use the indexical features in the Keras model).

Comparable to the R model, a single 52 unit hidden layer network trained on 80% of the data over 300 epochs achieved an accuracy on the training data of 99% and an accuracy on the validation data of 72% (73% in the previous model). There were 105 ‘bad’ samples predicted to be ‘good’ (compared to the previous 101) and 113 ‘good’ samples predicted to be ‘bad’ (compared to the previous 109). The following images show the confusion matrices for training and validation data, respectively.

Training with indexical features.

Posted: May 23, 2019 at 10:43 am

I manually transformed a subset of features of my training data to be indexical. By indexical I mean that those features are categories and not continuous. They specify aspects of the composition that are constrained to a limited number of options, such as rendering style, layer frequency and offset. Previously, all features were real numbers and it was the renderer that thresholded them into categories. I thought that perhaps the constrained numbers may provide a stronger signal associated with “good” or “bad” classes. Unfortunately, that is not the case.

Using the same 10-fold optimization as previously (see image above), the accuracy on the validation set was only 73%. 109 good compositions were predicted to be bad, and 101 bad compositions were predicted to be good (see images below). The optimization results were very different, though: 52 hidden units (vs 32 previously) and a decay of 0.5 (vs 0.1 previously) performed best. This wide range seems to indicate the network is not learning very well at all. So the next step is to jump into deep networks and see if they can manage better learning.

“Good” compositions predicted to be “bad”.
“Bad” compositions predicted to be “good”.

Cross Validation and Hyperparameter Tuning in CARET.

Posted: May 19, 2019 at 4:13 pm

The above plot outlines the results of my 10-fold cross-validation for parameter tuning. The best model had 32 hidden units and a decay of 0.1. Predictions based on this ‘best’ model are still not great; the accuracy is 61% on the validation set. 30 “good” compositions were labelled “bad” and 31 “bad” compositions labelled “good”. The following images show the misclassified prediction results: good predicted to be bad (top) and bad predicted to be good (bottom).

Before I jump into training a deeper model, I have an idea for transforming my input vectors. Since I generate vectors whose features all range from 0-1 with the same resolution, I wonder if the network is having trouble learning distributions of parameters that are not really continuous; i.e., parameters like frequency, offsets and render styles are more like categorical variables. So the idea is to change those features so their resolution matches the number of possible categories.
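The transformation I have in mind, sketched (the category counts below are illustrative, not the renderer’s actual values):

```python
import numpy as np

def to_indexical(x, n_categories):
    """Snap a continuous 0-1 feature onto the n values the renderer
    actually distinguishes, e.g. a handful of render styles."""
    idx = np.minimum(np.floor(np.asarray(x) * n_categories), n_categories - 1)
    return idx / (n_categories - 1)
```

For example, with two categories, everything below 0.5 maps to 0.0 and everything above maps to 1.0, mirroring the thresholding the renderer already performs internally.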

Jumping into a deeper network would involve either continuing my workflow with another DNN library or dropping R and implementing in keras and running on my CUDA setup.

Training on Larger Data Set

Posted: May 17, 2019 at 12:18 pm

I ran a couple of experiments using a split of 80% training and 20% validation on the new larger (naturally) balanced data set. Unfortunately the results are not very good. At best I got an accuracy of ~70% where 125 “good” compositions were predicted to be “bad” and 111 bad compositions predicted to be “good”. My first attempt was even worse and I tried increasing the number of hidden units without much improvement. I’ll try cross-validation approaches next before jumping into some ideas for transforming the input data. I think this new data set size should be sufficient…

15,000 Sample Data Set

Posted: May 16, 2019 at 5:00 pm

The images above show a random sampling from the “good” (top) and “bad” (bottom) subsets from the newly labelled 15,000 sample data set. In labelling this set, I kept an eye on balance between “good” and “bad” and classes are quite even in this data set. This was accomplished by being more critical of compositions labelled “bad”. There are 1986 “good” and 1915 “bad” compositions in the new set. Approximately 13% of the data is now “good” and 13% bad, much higher than the previous 5% for the 5000 sample set.

It was interesting to look at a much broader collection of compositions, as I noticed that a few things I wanted to be possible are possible, albeit very rare. There were a few single-colour compositions (5ish?), and some quite soft compositions. I did not see any compositions with very broad gradients (collections of low-frequency stripes with low contrast and some transparency), which would be nice. I also noticed some aesthetic weaknesses in general and have planned some renderer changes to resolve these (including increasing the minimum circle size, increasing the number of subdivisions of offsets and tweaking the range of sine-wave frequencies).

Fewer Balanced Samples

Posted: May 4, 2019 at 10:38 am

Training with fewer balanced samples was no help at all. Accuracy of the model dropped from 96% (model trained on duplicated “good” samples) to 65%. 15 of the good samples were predicted to be bad and 23 bad samples predicted to be good. Since I’m much more confident about these bad samples (these are the worst of the “bad”) these results are terrible. There are only 500 training samples from 5000 generated compositions, which is not a lot considering these feature vectors are very abstract. Since deeper networks require more training data, it seems clear I just need to generate more training samples. If I generate another 10,000 compositions that would result in another ~1000 training samples, bringing the total up to ~1500 (750 good, 750 bad). I think that is the most I can conceivably label; based on previous experience it would be at least a week of labelling alone.

I’m just realizing that this predictor will mark all samples as good or bad, but I know that the vast majority of inputs are neither good nor bad, so it seems I should go back to three labels (good, bad and neutral). They would still be very unbalanced though. Or should I switch gears and use a modelling method that produces a confidence for each sample?

From Merely Bad, to Really Bad.

Posted: May 3, 2019 at 5:13 pm

So I went through my ~2500 previously labelled “bad” compositions and selected the worst of the worst. I ended up with about the same number of these really bad compositions (266) as there are good compositions (287). This should help a lot with my unbalanced-data problems. It does mean I generate a lot of compositions that are neither good nor bad (89% of generated data); I will probably need to generate another 5000 compositions just to get another ~550 that are good or bad. Going back through the bad compositions, I did think that some of them were not that bad; if I’m not sure about what’s bad, I certainly can’t expect a model to be! The following images show a random sampling of the previously bad (top), and the really bad (bottom).

I’ll use these new labels to train a classifier and see where that leads before I jump into cross-validation approaches.