#17 Is Quite Weak

Posted: September 24, 2019 at 6:24 pm

I can’t say I’m very happy with the results for painting #17. I suppose it’s just far too monochromatic. See explorations below.

Stacking Order by Orientation Sort.

Posted: September 24, 2019 at 6:17 pm

I thought I would try sorting the segments according to orientation, rather than area, but the results look about the same as the shuffled version.

Classification using final model

Posted: September 24, 2019 at 5:56 pm

The following two images show the classification done by the final model trained on all the data using the architecture params from hyperparameter search. I think these are slightly better than those from the previous post.

“Good” Compositions
“Bad” Compositions

Looking back through my experiments I thought I would take a crack at one more histogram feature experiment. I saw a peak validation accuracy (using the since-ruled-out problematic method) of 75% with a 24 bin colour histogram, so I thought it would be worth a revisit.

Meeting the Universe Halfway: Chapter 4 – Agential Realism

Posted: September 23, 2019 at 3:37 pm

I finally got to reading Karen Barad’s book (titled above) and thought I would post my notes here while I reflect on them. After reading I also realized that I had gotten Bohm and Bohr confused in my notes from the Karen Barad Seminar; this has now been corrected. In parallel with the collage production one idea is to reconsider my current Artist Statement and rewrite it to be consistent with Agential Realism. Next, I think I’m going to read Chapter 7 to focus on what is meant by “entanglements”. My notes on chapter 4 are as follows:


Refinement of 3,000,000 Training Iteration Version

Posted: September 23, 2019 at 12:30 pm

Since the previous post, I’ve focused on developing the 3,000,000 iteration version. I was not happy with the shuffled version, shown below to the right of the 3,000,000 iteration version. I prefer the balance of large photo-readable segments and small segments that emphasize flow in the left (previously posted) version.

Following this I generated a sorted version of this composition where larger segments are behind the smaller segments; this emphasizes greater flow, but at the expense of photo-readable segments being visible. I’ve included the sorted version and a few details below. I was just thinking that perhaps I could include a small subset of the large (or medium) segments in front of the small ones by manipulating their order in a more complex way; for example, randomly select a few segments from the large end and insert them on the small end?
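That reordering idea could be sketched as follows. This is hypothetical, not the actual collage code; it assumes segments are held in a list sorted large-to-small and drawn in order (so the end of the list renders in front):

```python
import random

def promote_large_segments(segments, count=3, rng=None):
    """Move `count` randomly chosen segments from the large end of a
    large-to-small sorted list to the small (front-most) end."""
    rng = rng or random.Random()
    segments = list(segments)      # don't mutate the caller's list
    pool = segments[:count * 2]    # sample from among the largest few
    chosen = rng.sample(pool, count)
    rest = [s for s in segments if s not in chosen]
    return rest + chosen           # chosen large segments now render last (in front)
```

The `count * 2` pool size is an arbitrary placeholder; it just keeps the promoted segments limited to the large end rather than anywhere in the stack.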

Explorations with 3,000,000 Training Iterations.

Posted: September 22, 2019 at 8:57 pm

After the previous explorations I thought I would focus on the 3,000,000 iteration collages and generated two more options. I still think the previous work is the strongest. I’m going to now generate unsorted, sorted and shuffled versions of that previous composition and decide which is most successful.

Fewer Iterations and Random Shuffling

Posted: September 21, 2019 at 10:13 am

Following from previous collages I thought I would try fewer iterations (100,000) and randomly shuffling the stacking order of percepts. I can’t say I’m happy with these results; the most recent iteration is still the strongest. I’ve included below a few of these explorations. I’m now calculating a couple of variations with 3,000,000 training iterations. I’m also going to focus on Barad and (re)framing my thinking about objects in relation to how I’ve been thinking about Machine Subjectivity. This will manifest in rewriting my artist statement, and I’ve also been playing with the idea of the artist statement as indeterminate, where the specific language is manifested as multiple permutations.

Splits and new classified compositions!

Posted: September 20, 2019 at 7:14 pm

One thing I realized in my previous experiments was that I did not change the train/validate/test split. So I ran a few experiments with different splits, 50/25/25 was my initial choice. I tried 80/10/10, 75/15/15 and 60/20/20. My results showed that 75/15/15 seemed to work the best and I wrote some code to classify new images using that trained model. The following are the results! I think the classification is actually working quite well; a couple compositions I consider “bad” made it in there, but looking at these two sets I’m quite happy with the results.
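The actual splitting was done in the training code, but as a minimal stand-in, a split like those above amounts to shuffling the sample indices and slicing them by fraction (the remainder after train and validation becomes the test set):

```python
import random

def split_indices(n, train=0.75, val=0.15, seed=42):
    """Shuffle sample indices and slice into train/validate/test subsets.
    Whatever remains after train and validation becomes the test set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(n * train)
    n_val = int(n * val)
    return (idx[:n_train],
            idx[n_train:n_train + n_val],
            idx[n_train + n_val:])
```

The fixed seed is just for reproducibility while comparing splits; removing it makes each run stochastic.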

“Good” Compositions
“Bad” Compositions

My next ML steps are:

  • finalize my architecture and train the final model
  • integrate the painting generator and face detection to run as a prototype that logs looking durations for each composition
  • run some experiments using this new dataset collected in the ‘wild’ and decide on thresholds for mapping from duration of looking to “good” and “bad” labels.
  • finally determine the best approach to running training code on the Jetson (embed keras? use ANNetGPGPU? FANN?) and implement it.
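For the third step, the mapping from looking duration to labels might look like the sketch below. The threshold values are placeholders to be tuned once data from the ‘wild’ is in hand, and the middle band is one possible way to exclude ambiguous samples:

```python
def label_from_looking(duration_s, good_threshold=3.0, bad_threshold=1.0):
    """Map a logged looking duration (in seconds) to a training label.
    Durations between the two thresholds are too ambiguous to use."""
    if duration_s >= good_threshold:
        return "good"
    if duration_s <= bad_threshold:
        return "bad"
    return None  # ambiguous; drop from the training set
```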

#22 Exploration and Refinement

Posted: September 20, 2019 at 10:32 am

#22 turned out quite well; I’ve included my favourite choice on top and two explorations below. Perhaps it could be a little smoother, but I think it’s strong enough to serve for the final selection.

Histogram Features Don’t Improve Classification Accuracy

Posted: September 17, 2019 at 4:16 pm

Rerunning the grid search using the 48 bin (16 bins per channel) colour histogram features provided no classification improvement. The search reported a peak validation accuracy of 74% and 83% for the training set. The best model achieved a classification accuracy of 84.6% for training, 70.6% for validation and 72.3% for testing. The confusion matrix for the test set is as follows:

  • 649 bad predicted to be bad.
  • 319 bad predicted to be good
  • 220 good predicted to be bad.
  • 761 good predicted to be good.

So it appears I’ve hit the wall and I’m out of ideas. I’ll stick with the initial (instructional) features and see if I can manage a 75% accuracy for an initial model. Looking back at my experiments, it looks like my validation accuracies have ranged from ~62% to ~75% and test from ~70% to ~74%.
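As a sanity check, the confusion matrix above reduces to the reported test accuracy (treating “bad” as the negative class):

```python
def accuracy_from_confusion(tn, fp, fn, tp):
    """Overall accuracy: correct predictions over all predictions."""
    return (tn + tp) / (tn + fp + fn + tp)

# The test-set confusion matrix reported above:
acc = accuracy_from_confusion(tn=649, fp=319, fn=220, tp=761)
print(round(acc * 100, 1))  # → 72.3, matching the reported test accuracy
```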

At least all this experimentation has meant that I have a pretty good idea that such a model will work on the Jetson and I will not even need a deep network. I may even be able to implement the network using one of the C++ libraries I’ve already been using like FANN or ANNetGPGPU.

No Significant Improvement Using Dropout Layers or Changing the Number of Hidden Units.

Posted: September 15, 2019 at 6:13 pm

After the realization that the ~80%+ results were in error, I’ve run a few more experiments using the initial features. Unfortunately there was no improvement on the ~70% results. I added dropout to the input and hidden layers (there was previously only dropout on the input layer) and changed the number of units in the hidden layer (rather than using the same number as inputs). I did not try adding a second layer because I have not seen a second hidden layer improve performance in any experiment; perhaps this is due to a lack of sufficient training samples for deep networks.
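The experiments themselves use Keras Dropout layers, but for reference, what a dropout layer does during training can be sketched in a few lines of numpy (this is the standard “inverted dropout” formulation, not the project’s actual code):

```python
import numpy as np

def dropout(x, rate, rng, training=True):
    """Inverted dropout: zero a `rate` fraction of activations during
    training and rescale the survivors, so inference needs no change."""
    if not training or rate == 0.0:
        return x
    keep = 1.0 - rate
    mask = rng.random(x.shape) < keep
    return x * mask / keep
```

Applying this to the input vector and to the hidden activations corresponds to placing Dropout layers after the input and after the hidden Dense layer in a Keras model.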

The parameter search found a validation accuracy of 73.4%, while the best model showed a validation accuracy of 73.9% and a test accuracy of 71.8%. The network was not over-fit with a training accuracy of 88.1%. The confusion matrix for the test set is as follows:

  • 658 bad predicted to be bad.
  • 291 bad predicted to be good
  • 258 good predicted to be bad.
  • 742 good predicted to be good.

I’m now running a slightly broader hyperparameter search using the 48 bin colour histogram, and if I still can’t get closer to 80% accuracy I’ll classify my third (small) data set and see how it looks. In thinking about this problem I did realize that there was always a tension in this project. If the network is always learning, its output will become increasingly narrow and never be able to ‘nudge’ the audience’s aesthetic into new territories; there is a need for the system to show the audience ‘risky’ designs to find new aesthetic possibilities. This is akin to getting trapped in local minima; there may be compositions the audience likes even more, but those can only be generated by taking a risk.

#15 Exploration and Refinement

Posted: September 15, 2019 at 5:33 pm

The top image shows my favourite result for #15, which I think is pretty successful; I was not sure how the abstraction of the original (cubist) source would work out. I think this shows sufficient dissolution of the original. Explorations are included in a gallery below.


~86% Test Accuracy Appears to be Spurious

Posted: September 13, 2019 at 5:09 pm

After running a few more experiments, it seems the reported ~86% test accuracy is spurious and related to a lucky random split of data that was probably highly overlapping with the training data split. The highest test and validation accuracies I’ve seen after evaluating models using the same split as training are merely ~74% and 71%, respectively.

I did a little more reading on dropout and realized I had not tried different numbers of hidden units in the hidden layer, so I’m running a new search with different input and hidden layer dropout rates, numbers of hidden units and some range of epochs and batch_size. If this does not significantly increase test and validation accuracy then I’ll go back to the colour histogram features, and if that does not work… I have no idea…
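The real search runs through Talos with its own parameter-dict format, but the combinatorics of a search like the one described can be sketched generically. The specific values below are placeholders, not the actual search space:

```python
from itertools import product

# Hypothetical search space in the spirit of the post:
space = {
    "input_dropout":  [0.0, 0.2, 0.4],
    "hidden_dropout": [0.0, 0.2, 0.4],
    "hidden_units":   [16, 32, 64],
    "epochs":         [1000, 5000],
    "batch_size":     [32, 64],
}

def expand(space):
    """Yield every combination of hyperparameters as a dict."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))
```

Even this modest space expands to 3 × 3 × 3 × 2 × 2 = 108 models, which is why each added parameter makes the search so much slower.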

#24 Exploration and Refinement

Posted: September 13, 2019 at 3:43 pm

I spent a little too much time on #24, but I quite like Yves Tanguy and I thought the muted colour palette here would be interesting. I can’t say I’m happy with the results. I suspect the lack of colour diversity is what causes these to require so many training iterations to obliterate the original. The top image is my favourite, and the gallery below shows the other explorations. I’m next moving on to #15.

#3 Refinement

Posted: September 7, 2019 at 9:39 am

I’ve found it quite difficult to get a version of #3 smooth and without remnants of the original. The image on the top here is closest, even though there is a very small detail in the original which is still visible. Images below were ruled out.

~86% Test Accuracy Using Initial Features?

Posted: September 5, 2019 at 3:58 pm

Following from the previous results using the new workflow, I went back to my initial features (the 52-vector of instructions used to generate compositions). The results have turned out to be amazing. The best model achieved accuracies of 85.5% (training), 85.6% (validation) and 85.9% (test). This is a significant increase from the previous best result of 79% (validation). These accuracies are means of accuracies reported over five runs with different splits of the data-set. Note, these splits are still 50/25/25 so that the sizes of the subsets are comparable with previous results. The ‘training’ accuracy is then not actually the accuracy on the data used to train the network, but the accuracy on a random subset of similar size as the training set. 616 bad compositions were predicted to be bad, 105 bad predicted to be good, 105 good predicted to be bad and 634 good predicted to be good. Again, these are averages over multiple predictions with different splits.

As I’m writing this I was thinking that my validation method is problematic. I set aside a test set (during training), to check generalizability beyond the training and validation sets. My validation code is a separate instance and has no access to that specific test split. I need to save that specific test set and then validate the best model based on it, not multiple random runs with random splits. This may be skewing my results, since my random splits use both training and validation samples. So what I need to do is save the split used during training and evaluation and run predictions on them. I’m working on those code changes now…
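A minimal way to implement those code changes is to persist the exact indices of each subset at training time and reload them for evaluation. This is a sketch of the idea, not the project’s actual code:

```python
import json

def save_split(path, train_idx, val_idx, test_idx):
    """Persist the exact sample indices used during training so the
    held-out test set can be re-evaluated later on the same split."""
    with open(path, "w") as f:
        json.dump({"train": list(train_idx),
                   "val": list(val_idx),
                   "test": list(test_idx)}, f)

def load_split(path):
    with open(path) as f:
        s = json.load(f)
    return s["train"], s["val"], s["test"]
```

With the indices saved, the separate validation script can score the best model on the true held-out test set instead of fresh random splits that leak training samples.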

#1 Refinement

Posted: September 5, 2019 at 10:20 am

I ran a few more iterations appropriating #1 and they are looking quite nice. I think the top image is the most successful, but I’m not convinced by the blueish band near the right edge. I’m happy with the degree of abstraction, where the structure breaks away from the figure form which is still visible in the lower image. I’m starting to realize my choice of neighbourhood size seems to be related to the size of faces in the source. Portraits of one person require larger neighbourhoods than group portraits. An interesting side exploration would be to use face detection to automatically determine neighbourhood size for paintings with faces (assuming face detection works well enough on painted faces). I think I’ll leave this one here for now and move along.

Revisiting Older Experiments

Posted: September 3, 2019 at 5:46 pm

After those recent strong results with the changed code, I’m revisiting older experiments to see if they were in fact showing promise; I’m figuring out whether it was the previous features, or the previous validation method, that led to that 70% accuracy ceiling.

The 24 colour histogram feature results do not improve upon the 24 hist + 31 non-colour parameter results. I did learn a few things in the process, including that the stochastic splits change the measured accuracy of the best selected model. From this point I’ll be reporting the mean accuracy and mean confusion matrix over 5 runs using different random splits of validation and test data. I also re-ran the evaluation code on the previous experiment with 24+31 features in case the good results were a fluke. Following are the results.

31 + 24 Features

Mean Accuracy:


Mean of Confusion Matrices

  • 375.0 bad predicted to be bad.
  • 106.4 bad predicted to be good.
  • 112.8 good predicted to be bad.
  • 381.8 good predicted to be good.

24 Hist Features

Mean Accuracy:


Mean of Confusion Matrices

  • 531.8 bad predicted to be bad.
  • 194.6 bad predicted to be good.
  • 155.2 good predicted to be bad.
  • 579.4 good predicted to be good.

So the results are that the 31 + 24 features have performed much better than 24 colour hist features alone. I’m rerunning the initial and variance feature experiments using the new validation method.

#1 and #3 Initial Sketches.

Posted: September 3, 2019 at 10:44 am

As I work my way up in resolution, I’ve generated an initial sketch of #1 and #3. #1 requires a larger neighbourhood to create more abstraction since the original is so well known. #3 also needs more iterations as some of the original painting (God’s face) is still visible. I also tried to do a run of one of the larger paintings, #4, but the process crashed; presumably due to a memory error.

#07 Refinements

Posted: August 30, 2019 at 11:14 am

I’m now setting this aside and moving onto the next images in the short list. The top image is the best result at this time.

#7 Explorations

Posted: August 29, 2019 at 11:17 am

While I’m not quite satisfied with these results, the top image shows what I think of as the most successful iteration; there is still a little of the initial conditions showing in the faces though, so I’m running another session with slightly more iterations. The gallery below shows all my explorations of #7 up to this point. I’m struggling a little with the tension between smoothness and somewhat uniform colour patches with their harder edges. For this source painting, the patches in the ground can cue camouflage patterns that I’m not keen about.

Entropy Revisited

Posted: August 26, 2019 at 5:51 pm

Before starting the recent collage explorations, I had been doing more reading and thinking about entropy; see notes following. I also got a chance to watch a lecture that Sarah Dunsiger sent on entropy and emergence; I’ve included my notes on that below as well.


  • Natural log of the number of states in a system multiplied by a constant
  • The log reduces very large numbers and does little for small numbers (e.g. ln(10e06) = ~16, ln(10e02) = ~7).
  • The constant (Boltzmann) is a very small number (~1e-23)
  • So entropy is a small representation of really large numbers of possible states.
  • All possible 640×480 images in 8bit have 5e12 possible states and an ‘entropy’ of 4.04e-22. (Does it make any sense to think of entropy of an image?? An image is not dynamical, entropy is about dynamics, not structure.)
  • Second Law of Thermodynamics:
    • Entropy of closed systems never decreases (the number of possible states only increases until equilibrium, maximum entropy)
    • Entropy in open systems may decrease if the environment entropy increases (the number of possible states may decrease if the number of states in the environment increases)
  • is entropy about the propagation of energy? Does a system with more energy have more states? If it has more states, it loses that energy to the environment (increasing the number of its states in the environment).
  • is there some analogy in ML? Could the energy be the state of excitement of the initial conditions? The rate of learning?
  • More entropy means more complexity because more information is needed to represent the potential states of a system.
  • This seems more about the constraints of the system than the specific energy states.
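The arithmetic in these notes can be checked directly. This is just the Boltzmann relation S = k·ln(W) applied to the numbers above:

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

def boltzmann_entropy(n_states):
    """S = k_B * ln(W): a tiny number standing in for a huge state count."""
    return K_B * math.log(n_states)

print(boltzmann_entropy(5e12))          # ≈ 4.04e-22, as in the image note above
print(math.log(10e6), math.log(10e2))   # ≈ 16.1 and 6.9, as in the log note
```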

Sarah Papers

  • order can be introduced from entropy alone
  • order from disorder?
  • the whole often resembles the part (chiral particles make chiral structure)

Entropy and Emergence (Video Lecture)

  • entropy as a measure of what you don’t know about the state of a system
  • fewer states means more certainty due to fewer possibilities.
  • a high-entropy system is random / has many states and no constraint.
  • entropy as the minimum number of binary questions one must ask to fully determine the system.
  • random needs every question whereas a pattern can be compressed
  • Key take-away: entropy does not indicate disorder because a system may have more ordered states than disordered states.
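The “minimum number of binary questions” framing from the lecture is Shannon entropy in bits, which can be illustrated with a few lines (the example distributions are mine, not from the lecture):

```python
import math

def shannon_entropy_bits(probs):
    """Entropy in bits: the minimum average number of yes/no questions
    needed to pin down the state of the system."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Eight equally likely states need exactly three binary questions:
print(shannon_entropy_bits([1/8] * 8))  # → 3.0
# A highly constrained (patterned) system needs far fewer:
print(shannon_entropy_bits([0.97, 0.01, 0.01, 0.01]))
```

This matches the note that a random system “needs every question” while a patterned one can be compressed.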

#23 and #8 Revisited

Posted: August 26, 2019 at 5:33 pm

As I mentioned in the previous post, I wanted to revisit the previously ruled out paintings. I used smaller learning rates to see if that salvaged them. I can’t say I’m happy with the results; although they are smoother, they are still lacking.

Further Narrowing Down for #5.

Posted: August 25, 2019 at 3:33 pm

After doing a few more runs with tweaked parameters I’m not sure I’m doing much better, so I’m going to leave #05 here and re-run the two lower resolution paintings that were previously ruled out (#23 and #18). The first image is the most successful, but is very similar to those in the top row of the gallery. The bottom row includes the least successful, though I still think there is something to the larger neighbourhood in the lower right image.

Narrowing Down Explorations of #5 With Smaller Learning Rates

Posted: August 24, 2019 at 12:15 pm

After the insight in the previous post, I’ve explored a few variations using learning-rates smaller than 1.0. The following images are my favourites. They balance abstraction and emergent structure quite well, but are not quite there. The image on the left is insufficiently abstract where remnants of the mast in the original are still present. The wave-like structures in the lower left are very interesting and suggest quite a bit of depth and also cue the waves in the original. The image on the right shows quite good abstraction, but lacks some of that complexity in the waves, due to the larger neighbourhood (sigma = 200px).

The following images show the rest of the explorations, including highly over-abstracted versions that approach gradients. I’ve also included an attempt with a relatively high learning rate of 0.5, the highest of these explorations where the rest are 0.25 or 0.1. In that image (upper left of bottom gallery) the wave section in the lower left is very interesting, although approaches the appearance of spires; I’m not sure about the harder edges and mottled patches. That composition also shows a degree of under-organization at the smaller scale, e.g. splashes of red in the area above the bright spot.

Spires and Full Resolution Explorations.

Posted: August 21, 2019 at 12:56 pm

The images above show a few attempts to reproduce the aesthetic of the mid-resolution exploration of #5 at full resolution. As the ‘spires’ clearly overwhelm the image, I wrote to the author of ANNetGPGPU. The conclusion is that the interaction of high learning rates and small neighbourhood functions leads to cases where the next BMU is very likely to be close to the previous BMU. The result is a trail of BMUs that progress across the SOM. It is unclear why they always progress at the same angle. I’m now running a test with a learning rate of 0.75 (rather than 1.0 as used previously) and I’ll continue to change learning rates and see how that looks! I may want to also revisit my previously ruled out paintings with this new insight. Now that I know these spires are an emergent result of the SOM, it’s something I should explicitly explore in the future!
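The mechanism behind the spires can be seen in the standard SOM update rule. This is a generic numpy sketch of one training step, not the ANNetGPGPU implementation:

```python
import numpy as np

def som_step(weights, x, lr, sigma):
    """One SOM training step: find the best-matching unit (BMU) on a
    2-D grid, then pull nearby units toward the sample with a Gaussian
    neighbourhood. With a high lr and small sigma, the units right
    around the last BMU end up closest to the next similar sample,
    which is what produces a trail of BMUs across the map."""
    h, w, d = weights.shape
    dists = np.linalg.norm(weights.reshape(-1, d) - x, axis=1)
    bmu = np.unravel_index(dists.argmin(), (h, w))
    rows, cols = np.indices((h, w))
    grid_d2 = (rows - bmu[0]) ** 2 + (cols - bmu[1]) ** 2
    theta = np.exp(-grid_d2 / (2 * sigma ** 2))       # Gaussian neighbourhood
    weights += lr * theta[..., None] * (x - weights)  # pull toward the sample
    return bmu
```

Lowering `lr` (e.g. the 0.75 run mentioned above) shrinks the pull toward each sample, so a single update can no longer dominate the local neighbourhood.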

Finally Cracked the 70% Validation Accuracy Wall!

Posted: August 21, 2019 at 12:46 pm

I changed my Talos code to explicitly include a best model selection call, running Predict(), and added a call to do 10-fold cross validation of models, running Evaluate() before saving the search session. It is not quite clear to me whether these two actions change the criteria by which models are selected for deployment, but in my first use of these calls my performance has jumped 10%.

I also split my data differently; data is split into 50/25/25% for training, validation and testing. The validation set is used in Talos Scan() and the testing set is used in Evaluate(). The features of this last session were using 31 features from the initial dataset (instructions to generate compositions, excluding colour data) and 25 colour histogram features. I was also wondering if the number of dimensions of my features meant I was not going to get anywhere with as few samples as I have.

The best model reported an accuracy of 78.4% on the training set, 80% on the validation set and 77% on the testing set. This indicates a huge improvement and makes me wonder if Talos was just selecting a very poor ‘best model’ previously. One caveat is that the log Talos generates that shows performance during training shows very different results; in the log, the greatest accuracy was reported as 56.8% on the validation set and 100% on the training set, highly divergent from the prediction accuracy made by the best model. I should also note that I removed the fixed RNG seeds for splits and data shuffling, so the search is stochastic and may be getting a broader picture since it’s not limited by reproducibility. The best model using the validation set predicted 304 bad compositions to be bad, 70 bad to be good, 74 good to be bad and 284 good to be good.

If I can reproduce this performance, I’ll then generate a new set of random compositions and see how the best model classifies them.

Early Full-Resolution Explorations

Posted: August 18, 2019 at 1:05 pm

Starting from the lowest resolution images of the 7 short-listed, I’ve been exploring using them at full resolution. Using the previous parameters for the intermediary resolutions, I was unable to get any strong results, see below. I’m wondering if colour diversity tends to result in images that are poorer… The main aesthetic weakness is the hard edges that manifest, even though the neighbourhood function has Gaussian edges. This was not seen, at least to the same degree, in the expanded intermediary resolution explorations. I’m currently computing a full resolution version of #5 (intermediary, original) and hope it’s more successful.

Fewer Histogram Features

Posted: August 18, 2019 at 9:12 am

After the lack of success in the previous experiment using the 768 element vector, I have the results of the 96 histogram bin experiment. During the search, Talos reported a peak validation accuracy of 73.3%. The best model reported a validation accuracy of 66.4% and a training accuracy of 99.7%. Clearly the model is learning the training set well, but again not generalizing to the validation set. The following image shows the confusion matrix for the validation set. I note that there is no appreciable difference in validation accuracy between 1,000 and 10,000 epochs.

Expanded Intermediary Resolution Explorations

Posted: August 16, 2019 at 2:27 pm

The following images were computed overnight using the same params as in the previous post. The training time is significantly longer than estimated, due to the larger number of pixels (due to aspect ratio), so only three were generated at the time of writing. While these results are going in the right direction, they are still too similar to the original compositions (with the exception of 07, lower right) and need further abstraction (an increase of neighbourhood size). I emailed the author of the GPU accelerated SOM I’m using to see if he can reproduce these spire effects. Since the number of iterations has such a significant effect, it seems I should be working image by image at full resolution. As inefficient as that may be, it seems like the next step; I’ll prioritize the lowest resolution images for exploration’s sake!