Deep Networks provide no increase in validation accuracy.

Posted: June 12, 2019 at 4:20 pm

After doing quite a bit more learning, I used talos to search hyperparameters for deeper networks. I ran a few experiments and in no case could I make any significant improvement to validation accuracy over the simple single hidden layer network. While tuning various hyperparameters helps somewhat, all the tested network configurations resulted in validation accuracy ranging from 61% to 73% (60% to 100% for training data). The following plot shows the range of validation accuracy over the number of hidden layers. Note the jump from 1 to 2 hidden layers does increase validation accuracy, but only by a mean increase of 0.3%.
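For the record, a depth search like this can be sketched with sklearn's MLPClassifier standing in for my talos/keras setup. The data below is random noise, purely to show the shape of the search; the unit counts and depths are illustrative, not my actual configurations:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import GridSearchCV

# Toy stand-in data: small random feature vectors, binary good/bad labels.
rng = np.random.default_rng(0)
X = rng.random((120, 20))
y = rng.integers(0, 2, 120)

# Search over depth: 1 to 3 hidden layers of 16 units each.
param_grid = {"hidden_layer_sizes": [(16,) * d for d in range(1, 4)]}
search = GridSearchCV(
    MLPClassifier(max_iter=200, random_state=0),
    param_grid, cv=3, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 2))
```

With noise labels the "best" depth is meaningless, of course; on real data `search.cv_results_` holds the per-configuration accuracy ranges that a plot like the one above summarizes.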

The confusion matrix for the best model is about the same as it was for the single hidden layer model first trained in keras (without hyperparameter tuning!):

Through the last couple of weeks I have made no significant gains with keras, so the problem is clearly my features. Everything I’m seeing seems to indicate that my initial fears regarding the lack of separability in my initial t-sne results were warranted. So I have a few ideas on how to move forward:

  1. Rather than using the raw features for classification, do some initial stats on those features and use those stats for training. This only affects features that can be grouped, e.g. stats on the set of colours of all layers in a composition. Two ideas are variance of such groups of features, or full histograms for each group of features.
  2. Since my features are normalized, they all have the same range. This means that regardless of their labels, all features will have the same stats, making #1 moot! So it looks like I should convert my code so that the features are the actual numbers used to generate compositions and not these 0-1 evenly distributed random numbers. This means generating and labelling a new data-set.
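A sketch of idea #1, assuming a hypothetical feature layout where contiguous index ranges form groups (e.g. all layer colours, all layer frequencies); the function name and layout are mine, not the project's actual code:

```python
import numpy as np

def group_stats(features, groups, bins=8):
    """Summarize grouped features (e.g. the colours of all layers)
    by their variance plus a fixed-bin histogram."""
    stats = []
    for idx in groups:
        vals = features[list(idx)]
        hist, _ = np.histogram(vals, bins=bins, range=(0.0, 1.0))
        stats.append(np.concatenate([[vals.var()], hist / len(vals)]))
    return np.concatenate(stats)

# Hypothetical layout: features 0-9 are layer colours, 10-19 are frequencies.
vec = np.random.default_rng(1).random(20)
summary = group_stats(vec, [range(0, 10), range(10, 20)])
print(summary.shape)  # 2 groups x (1 variance + 8 histogram bins) = (18,)
```

As #2 points out, on uniformly distributed normalized features these stats are roughly identical for every sample, which is exactly why the features need to be the actual generative numbers first.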

Reproducing previous R results in Keras.

Posted: May 31, 2019 at 6:35 pm

After spending so much time in R trying to get a simple network to work, I’ve made the jump into Keras, sklearn, scipy, etc. in order to build deeper networks. The workflow is a lot more awkward, but I’ve managed to figure out the key metrics (categorical accuracy and confusion matrices) and reproduce the previous results from R (note, I did not use the indexical features in the keras model).

Comparable to the R model, a single 52 unit hidden layer network trained on 80% of the data over 300 epochs achieved an accuracy on the training data of 99% and an accuracy on the validation data of 72% (73% in the previous model). There were 105 ‘bad’ samples predicted to be ‘good’ (compared to the previous 101) and 113 ‘good’ samples predicted to be ‘bad’ (compared to the previous 109). The following images show the confusion matrices for training and validation data, respectively.
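The metrics themselves are straightforward; here is a minimal sketch using sklearn's confusion_matrix on hypothetical one-hot model outputs (in the real workflow, y_prob would come from keras's model.predict):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical outputs for a two-class (bad=0, good=1) problem.
y_true = np.array([0, 0, 1, 1, 1, 0])
y_prob = np.array([[0.9, 0.1], [0.2, 0.8], [0.3, 0.7],
                   [0.6, 0.4], [0.1, 0.9], [0.8, 0.2]])
y_pred = y_prob.argmax(axis=1)

acc = (y_pred == y_true).mean()          # categorical accuracy
cm = confusion_matrix(y_true, y_pred)    # rows: true label, cols: predicted
print(acc)
print(cm)  # cm[0, 1] counts 'bad' samples predicted 'good'
```

The 105 and 113 misclassification counts quoted above are the off-diagonal entries of exactly such a matrix for the validation set.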

Training with indexical features.

Posted: May 23, 2019 at 10:43 am

I manually transformed a subset of features of my training data to be indexical. By indexical I mean that those features are categories and not continuous. They specify aspects of the composition that are constrained to a limited number of options, such as rendering style, layer frequency and offset. Previously, all features were real numbers and it was the renderer that thresholded them into categories. I thought that perhaps the constrained numbers may provide a stronger signal associated with “good” or “bad” classes. Unfortunately, that is not the case.

Using the same 10-fold optimization as previous (see image above), the accuracy on the validation set was only 73%. 109 good compositions were predicted to be bad, and 101 bad compositions were predicted to be good (see images below). The optimization results were very different though, where 52 (vs 32 previously) hidden units and a decay of 0.5 (vs 0.1 previously) performed best. This wide range seems to indicate the network is not learning very well at all. So the next step is to jump into deep networks and see if they can manage better learning.

“Good” compositions predicted to be “bad”.
“Bad” compositions predicted to be “good”.

Cross Validation and Hyperparameter Tuning in CARET.

Posted: May 19, 2019 at 4:13 pm

The above plot outlines the results of my 10-fold cross-validation for parameter tuning. The best model had 32 hidden units and a decay of 0.1. Predictions based on this ‘best’ model are still not great; the accuracy is 61% on the validation set. 30 “good” compositions were labelled “bad” and 31 “bad” compositions were labelled “good”. The following images show the misclassified prediction results, good predicted to be bad (top) and bad predicted to be good (bottom).
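CARET's tuning loop can be approximated in Python. This sketch runs 10-fold stratified CV for a single candidate configuration using sklearn's MLPClassifier on toy random data, with alpha standing in (only approximately) for nnet's weight decay; caret additionally loops this over a grid of size/decay values:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.random((100, 10))
y = rng.integers(0, 2, 100)

# One candidate configuration: 32 hidden units, L2 penalty 0.1.
clf = MLPClassifier(hidden_layer_sizes=(32,), alpha=0.1, max_iter=300,
                    random_state=0)
scores = cross_val_score(clf, X, y, cv=StratifiedKFold(n_splits=10))
print(round(scores.mean(), 2))
```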

Before I jump into training a deeper model, I have an idea for transforming my input vectors. Since I generate vectors that all range from 0-1 and have the same resolution, I wonder if the network is having trouble learning distributions of parameters that are not really continuous; i.e. parameters like frequency, offsets and render styles are more like categorical variables. So an idea is to change those features so their resolution matches the number of possible categories.
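A minimal sketch of that transformation, assuming a hypothetical feature with five categories (e.g. the five X-offset positions); the function name is mine:

```python
import numpy as np

def snap_to_levels(x, n_levels):
    """Quantize a 0-1 feature to n_levels evenly spaced values,
    mirroring how the renderer thresholds it into categories."""
    idx = np.minimum((x * n_levels).astype(int), n_levels - 1)
    return idx / (n_levels - 1)

x = np.array([0.0, 0.12, 0.34, 0.51, 0.99])
print(snap_to_levels(x, 5))  # values land on {0, 0.25, 0.5, 0.75, 1}
```

The feature still lives in the 0-1 range the network expects, but its resolution now matches the renderer's actual category count.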

Jumping into a deeper network would involve either continuing my workflow with another DNN library or dropping R and implementing in keras and running on my CUDA setup.

Training on Larger Data Set

Posted: May 17, 2019 at 12:18 pm

I ran a couple of experiments using a split of 80% training and 20% validation on the new larger (naturally) balanced data set. Unfortunately the results are not very good. At best I got an accuracy of ~70%, where 125 “good” compositions were predicted to be “bad” and 111 “bad” compositions predicted to be “good”. My first attempt was even worse and I tried increasing the number of hidden units without much improvement. I’ll try cross-validation approaches next before jumping into some ideas for transforming the input data. I think this new data set size should be sufficient…

15,000 Sample Data Set

Posted: May 16, 2019 at 5:00 pm

The images above show a random sampling from the “good” (top) and “bad” (bottom) subsets from the newly labelled 15,000 sample data set. In labelling this set, I kept an eye on balance between “good” and “bad” and classes are quite even in this data set. This was accomplished by being more critical of compositions labelled “bad”. There are 1986 “good” and 1915 “bad” compositions in the new set. Approximately 13% of the data is now “good” and 13% bad, much higher than the previous 5% for the 5000 sample set.

It was interesting to look at a much broader collection of compositions as I noticed a few things that I wanted to be possible are possible, albeit very rare. There were a few single colour compositions (5ish?), and some quite soft compositions. I did not see any compositions with very broad gradients (collections of low frequency stripes with low contrast and some transparency), which would be nice. I also noticed some aesthetic weaknesses in general and have planned some renderer changes to resolve these (including increasing the minimum circle size, increasing the number of subdivisions of offsets and tweaking the range of sine-wave frequencies).

Fewer Balanced Samples

Posted: May 4, 2019 at 10:38 am

Training with fewer balanced samples was no help at all. Accuracy of the model dropped from 96% (model trained on duplicated “good” samples) to 65%. 15 of the good samples were predicted to be bad and 23 bad samples predicted to be good. Since I’m much more confident about these bad samples (these are the worst of the “bad”) these results are terrible. There are only 500 training samples from 5000 generated compositions, which is not a lot considering these feature vectors are very abstract. Since deeper networks require more training data, it seems clear I just need to generate more training samples. If I generate another 10,000 compositions that would result in another ~1000 training samples, bringing the total up to ~1500 (750 good, 750 bad). I think that is the most I can conceivably label; based on previous experience it would be at least a week of labelling alone.

I’m just realizing that this predictor will mark all samples as good or bad, but I know that the vast majority of inputs are neither good nor bad, so it seems I should go back to three labels (good, bad and neutral). They would still be very unbalanced though. Or should I switch gears and use a modelling method that produces a confidence for each sample?
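The confidence idea doesn't necessarily require switching modelling methods: most classifiers already expose per-sample class probabilities, and low-confidence samples are natural candidates for the "neither" middle ground. A sketch with sklearn's MLPClassifier on toy data:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.random((60, 10))
y = rng.integers(0, 2, 60)

clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
clf.fit(X, y)

# Per-sample class probabilities; samples whose max probability is
# near 0.5 are the ones the model finds neither good nor bad.
proba = clf.predict_proba(X[:5])
confidence = proba.max(axis=1)
print(confidence)
```

Thresholding `confidence` would give the three-way good/neutral/bad split without labelling a third class.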

From Merely Bad, to Really Bad.

Posted: May 3, 2019 at 5:13 pm

So I went through my ~2500 previously labelled “bad” compositions and selected the worst of the worst. I ended up with about the same number of these really bad compositions (266) as there are good compositions (287). This should help a lot with my unbalanced data problems. It does mean I generate a lot of compositions that are neither good nor bad (89% of generated data); I will probably need to generate another 5000 compositions just to get another ~550 that are good or bad. After going back through the bad compositions, I did think that some of them were not that bad, so if I’m not sure about what’s bad, I certainly can’t expect a model to be! The following images show a random sampling of the previously bad (top), and the really bad (bottom).

I’ll use these new labels to train a classifier and see where that leads before I jump into cross-validation approaches.

Classification of Larger Dataset

Posted: May 2, 2019 at 5:40 pm

Following Sofian’s advice, I attempted the classification of the larger data-set (5000 rather than 1000 samples). The results are not much better. Model accuracy was 87% and 24 bad compositions were predicted to be good, and 53 good compositions were predicted to be bad. The following image shows the 24 bad compositions that were predicted to be good:

Looking at them, they are not the worst and could perhaps be considered mediocre. I also tried increasing the number of good samples to balance the two classes; I increased the number of good samples in the expanded data-set from 287 to 2583 (where there are 2737 bad samples) by simply copying them. Training on this “extra good” training set resulted in a classifier with an accuracy of 96% (a significant improvement). This model predicted that 43 bad compositions were good and, interestingly, no good compositions were predicted to be bad. The following image shows 42 of the 43 bad compositions that were predicted to be good:
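The duplication approach is plain oversampling of the minority class; a minimal sketch (the helper name is mine, and real use would duplicate only within the training split to avoid leaking copies into validation):

```python
import numpy as np

def oversample_minority(X, y, rng=None):
    """Duplicate minority-class rows (sampling with replacement)
    until both classes have equal counts."""
    if rng is None:
        rng = np.random.default_rng(0)
    labels, counts = np.unique(y, return_counts=True)
    minority = labels[counts.argmin()]
    need = counts.max() - counts.min()
    extra = rng.choice(np.flatnonzero(y == minority), size=need, replace=True)
    return np.vstack([X, X[extra]]), np.concatenate([y, y[extra]])

X = np.arange(10).reshape(5, 2).astype(float)
y = np.array([0, 0, 0, 0, 1])        # 4 'bad' vs 1 'good'
Xb, yb = oversample_minority(X, y)
print(np.bincount(yb))               # balanced class counts
```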

I’m doing a little reading on cross-validation (I’ve been using a fixed 80% training and 20% validation split up to this point) and will see where that leads. Looking at these results, they are not terrible, just not good. Perhaps I should try constraining my labelling such that the two classes are balanced. This would mean only labelling the really bad compositions as bad; that would be a lot less labour than increasing the training data size again, so maybe I’ll try that first.

Expanded Training Set

Posted: April 26, 2019 at 6:04 pm

Thanks to the suggestion of Sofian Audry, I’ve added an additional 4000 compositions to the initial set of 1000. It took me the entire week to label these new compositions. The complete 5000 composition set breaks down with these labels: good (287); neutral (1976); bad (2737). Similarly to the initial training set, there are about 5% good compositions, but the number of bad compositions grew from ~38% to ~55%. Following are random samples from the good and bad sets, respectively.

The next steps are to train using this data-set; if that does not work then investigate some methods to re-balance the training-data, since “good” samples are very rare.

RMS Distance to Reference

Posted: April 10, 2019 at 5:29 pm

I selected one composition (#569) from the training set as the reference and computed its distance (RMS) from all other samples in the training set (without neutral samples). The result is not unexpectedly that the good compositions are spread throughout the bad compositions (see below).
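The RMS distance computation itself is simple; a sketch with numpy on toy vectors (the 74-item length is borrowed from my dense-circles estimate, and the sorted order is what the image grid below is laid out by):

```python
import numpy as np

def rms_distance(reference, others):
    """Root-mean-square distance from one feature vector to many."""
    return np.sqrt(np.mean((others - reference) ** 2, axis=1))

rng = np.random.default_rng(0)
vectors = rng.random((100, 74))   # e.g. 74-item composition vectors
ref = vectors[0]                  # the reference composition
d = rms_distance(ref, vectors)
order = np.argsort(d)             # nearest-to-farthest, for plotting
print(d[0], order[0])             # the reference is distance 0 from itself
```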

Also there seems to be no visual relationship between compositions with shorter RMS distances; #878 is not more similar to #569 than #981 is. This is confirmed by plotting the images themselves according to their RMS distance to the reference (in the upper left corner, filling rows first):

So it seems using these instructions as feature vectors may be a no go. The benefit of using these vectors was that a composition could be evaluated by the classifier without actually getting rendered. I’ll next try using colour histogram features and see if my results are any better.

Initial (MLP) Classification Attempts.

Posted: April 10, 2019 at 1:35 pm

Using the labelled data set, I was unable to get a (simple MLP) classifier to perform with accuracy better than 50%; it seems my fears, based on the t-sne visualization previously posted, were warranted. There is the underlying question regarding whether I should even treat the instructions to make compositions (my vectors) as features of the compositions. To look at this a different way, I thought I should generate histograms for each composition and see how t-sne and simple classifiers perform on those features.

I was thinking that perhaps the rarity of “good” compositions in the training set was a problem. Splitting the data-set into 80% training and 20% validation (using sampling that keeps the distribution of the three labels similar in both sets) leads to a training set with 41 “good” compositions, and a validation set with 10 (~12% in both cases).
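That label-preserving split is stratified sampling; a sketch with sklearn on toy data shaped roughly like this data-set (the class proportions here are illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((1000, 10))
# Imbalanced labels: ~12% 'good' (2), the rest 'bad' (0) and 'neutral' (1).
y = rng.choice([0, 1, 2], size=1000, p=[0.44, 0.44, 0.12])

# stratify=y keeps the label proportions similar in both splits.
X_tr, X_va, y_tr, y_va = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)
print((y_tr == 2).mean(), (y_va == 2).mean())  # both near the overall rate
```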

There are also a lot of “neutral” samples that are neither good nor bad, and that is certainly not helping with what (at least initially) is a binary classification problem. So I did a test removing all the neutral samples and the classifier accuracy jumped from 50% to 82%, which is obviously significant. Unfortunately (because of the rarity of good compositions?) this translates into 6 “bad” compositions predicted to be “good” and 0 “good” compositions predicted to be “good”. The following images were labelled “bad” and predicted to be “good”:

I have a few other things to investigate, including arranging images according to their distance to a reference (an arbitrary composition) to see (a) if the distance corresponds to some sense of visual similarity, and (b) how good and bad compositions are distributed (are good compositions more distant from bad ones?). I suspect the latter will mirror the t-sne results, but it’s worth looking at whether distances in vector space match any sense of visual similarity. Another investigation will be to generate colour histograms for each composition and see how those features look according to t-sne and the classifier.

t-sne on labelled compositions.

Posted: April 4, 2019 at 5:59 pm

The plot above shows the t-sne results for the vectors that represent each composition. It’s very clear that bad, good, and neutral compositions are evenly distributed and conflated, with no discernible separability. I spent a little time trying to figure out what the implication is, but I seem to only find information on linear separability and classification. My concern is that a classifier trained on these vectors will not be able to discriminate between good and bad compositions. If this is the case, I would need a different representation of each composition, and it’s unclear what an appropriate representation would be.
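For reference, the projection itself is only a couple of lines with sklearn (toy vectors here; note that perplexity has to stay well below the sample count):

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
vectors = rng.random((60, 74))                 # composition vectors
labels = rng.choice(["good", "bad", "neutral"], size=60)

# Project to 2-D; the scatter plot colours each point by its label.
embedding = TSNE(n_components=2, perplexity=10,
                 random_state=0).fit_transform(vectors)
print(embedding.shape)
```

On uniform random vectors like these the labels are of course conflated by construction, which is exactly what the real plot worryingly resembles.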

Initial Training Set

Posted: April 3, 2019 at 4:53 pm

In preparation for the machine learning aspect of this project I’ve generated 1000 images (and the vectors that represent them) and labelled them as good, bad or neither good nor bad. There were 51 good, 376 bad and 513 neutral images in the training set.

The labelling is based on my intuitive compositional sense and is a stand-in for viewer interaction (via preferential looking and social media likes). The idea is to get a sense of this data set and train a few classifiers to see if they can discriminate good from bad compositions.

Good Compositions

The next step is to plot the corresponding vectors using t-sne in R and see how my labels are distributed in that vector space.

Face Detection & Non-defective Display!

Posted: March 12, 2019 at 10:29 am

Last week I managed to get facial tracking code working on the Jetson. It’s using the old CUDA-based Haar-feature method, but seems to be working more or less fast and well enough. Though I did notice that (a) it’s a little noisy (i.e. the detection of a face sometimes oscillates over time) and (b) the plant behind me (as seen above the mouse in the image above) was occasionally recognized as a face. This is good enough for this stage and hopefully I won’t need to train my own classifier to improve things. Later this week I’ll integrate the ‘painting’ rendering code and see how the experience feels in terms of change only when no one is looking.

The third EIZO display arrived last week and it does not have any noticeable “pressure marks”. I suppose the first two were just bad luck and I hope I have better luck when I order the second unit.

Bulk Compositions Generated on Jetson

Posted: February 23, 2019 at 12:13 pm

I spent the last couple of days reorganizing the visual exploration code into separate files. Now there is a class that holds all the rendering code and also a class for each layer of the composition. This makes the main program very clean and simple and a good stage for more ongoing development. The image above is a random generation of compositions on the Jetson. No editing here, literally the random output of the system. Most compositions are not very interesting; this is a good way for me to tune the system in terms of the breadth of generated diversity.

I wanted to get a little more work done using the square screen before I ship it back next week, due to the dark spots it arrived with. This is the second monitor I’ve received from them with this problem and I sure hope the third time is the charm. I’m going to record my unboxing in case there is a problem I can show them if it arrives with damage.

OFX 0.10.1 Working on TX2

Posted: February 18, 2019 at 10:55 am

I followed (the English translation of) this blog post to get openframeworks to build on the Jetson. The aesthetic exploration code written on the shuttle seems to run just as snappy on the Jetson! The only initial issue is that the frame-rate does not seem fixed to vblank, so there is some tearing on rendering. Following are my own notes on the process of getting ofx working on the Jetson; see the original post for details:

sudo ./

change line 79: armv7l to aarch64

comment out lines 41-44 and 69-71

copy the precompiled libs over the arm included libs (see the Japanese blog post for the download link):

cd ~/Downloads/OF10.0lib
mv libkiss.a libs/kiss/lib/linuxarmv7l/
mv libtess2.a libs/tess2/lib/linuxarmv7l/

cd ~/src/of_v0.10.1_linuxarmv7l_release/scripts/linux
./ -j4

Solved Full-Resolution Artifacts!

Posted: February 15, 2019 at 11:19 am

It turns out the problem was that the 1920×1920 signal generated by the Jetson by default was 60hz. While this is a valid resolution according to the specs, I think it requires dual-link DVI and it’s unclear how HDMI affects this. Anyhow, I realized that there is also a 30hz 1920×1920 signal in the EDID, and using xrandr to select that resolution resolved the 60hz artifacts. Now that I think about it, my post to the nvidia developer forum does make sense, since I was initially running on DVI and changed to HDMI to rule out the cable. Turns out when I switched from DVI to HDMI, the GeForce card automatically switched from 60hz to 30hz and I did not notice.

I did have some issues making my changes stick on boot, as documented in the post linked above. After installing xubuntu, I can only assume its xrandr-based display settings allowed my preferred 30hz resolution to stick on boot.

NVIDIA Jetson TX2 Arrived!

Posted: February 13, 2019 at 10:51 am

The machine-learning embedded platform (NVIDIA Jetson) arrived last week! This is the board I chose for the Zombie Formalist so I could get decent GPU accelerated facial recognition with hopefully low power use and noise. The board is less hackable than I was expecting (e.g. switches are surface mounted!) so I may need to get a different board for the final work. So I installed Jetpack 3.3 and hooked up the square display to find a problem… (more…)

Paul Mogensen

Posted: February 13, 2019 at 10:09 am

“no title (Earth Red)”, Paul Mogensen, 1969

“No Title”, Paul Mogensen, 1973

I’ve finally finished all three volumes of Claudine Humblet’s The New American Abstraction (1950–1970)! This will conclude the bulk of my art-historical research for the Zombie Formalist, though I expect to look back at these artists as I continue to refine the visual aesthetic of the work. (more…)

Agnes Martin

Posted: January 31, 2019 at 2:51 pm

“Summer”, Agnes Martin, 1964

“Untitled #2”, Agnes Martin, 1992

It has been a while since I got back to my research on colour field painters. Martin is one of the very few women in the field who gained prominence and provides a good precedent for the grid and a systematic (but not rigidly so) compositional process.

Like other painters in the field, Martin aims for a “pursuit of the essential” (Claudine Humblet, The New American Abstraction (1950–1970)). Martin’s lack of rigidity in the system is manifest in slight variations in her composition, for example the position of the dots in the “Summer” above. This is described as a “constant vibration” (Ibid.) and is “…far removed from the ‘impersonality’ once hoped for from geometric form.” (Ibid.) Martin’s emphasis on perception, where the viewer  completes the work, is consistent with other painters focused on perception: “The observer makes the painting.” (Martin quoted by Claudine Humblet, The New American Abstraction (1950–1970)) This again connects very well with the Zombie Formalist that is an empty mechanism with no intention whose random actions are given value and meaning through the attention of the viewer.

Dense Circles

Posted: December 29, 2018 at 6:10 pm

These are generated the same as the previous circles, except the number of layers is increased from 3 to 10. I focused on circles, but the stripes and chevrons at this density looked very interesting too! With 10 layers that would be at least a 74 item vector that describes the composition! Maybe 5-7 would be sufficient (39-53 item vectors). I think this is enough time with the visual explorations and it’s time to get the Jetson board and see what it can do; I’m also well overdue to start working on face detection and social media integration! I hope the monitor issues work out. I’m even more convinced that square is the way to go.


Posted: December 29, 2018 at 5:39 pm


Posted: December 29, 2018 at 5:27 pm

The Circles Return!

Posted: December 24, 2018 at 6:53 pm

Thanks to the OpenFrameworks forum, some code was provided to convert textures from rectangular to polar coordinates. This allowed me to get circles working again! I also tweaked the code quite a bit in regards to the frequencies of sinewaves. In this version sinewaves (and offsets) are randomly selected, but limited to a particular granularity; the result is that the frequencies are a lot more constrained, and I’ve also lowered the max frequency, leading to broader bands. I’m quite happy with these results! I have not looked at how these changes affect the stripe and chevron rendering modes, but I’ll take a look at that soon. Following are a couple of full-resolution selections from above.
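A rough numpy sketch of the rectangular-to-polar idea (my own nearest-neighbour approximation, not the forum's OpenFrameworks code): each output pixel looks up a source pixel by its radius (row) and angle (column), so horizontal stripes become concentric rings.

```python
import numpy as np

def to_polar(texture):
    """Resample a rectangular texture into polar coordinates:
    source rows map to radius, source columns map to angle."""
    h, w = texture.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    r = np.hypot(ys - cy, xs - cx) / np.hypot(cy, cx)             # 0 centre, 1 corner
    theta = (np.arctan2(ys - cy, xs - cx) + np.pi) / (2 * np.pi)  # 0..1
    src_y = np.clip((r * (h - 1)).astype(int), 0, h - 1)
    src_x = np.clip((theta * (w - 1)).astype(int), 0, w - 1)
    return texture[src_y, src_x]

# A texture of horizontal stripes becomes concentric rings.
stripes = np.tile(((np.arange(64) // 8) % 2)[:, None], (1, 64)).astype(float)
rings = to_polar(stripes)
print(rings[31, 31], rings[0, 0])  # centre samples row 0, corner samples row 63
```

In the real renderer this happens per-fragment in a shader with interpolated texture lookups rather than this integer snapping.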

H and V Stripes with Offset

Posted: November 23, 2018 at 6:26 pm

I tweaked the code a little more and put the layer by layer offsets back in. I have to say I am happier with these less dense results with offsets and where the frequency is constrained more. The selections below show stripes and chevrons, respectively. Note that the skew method (rather than rotation, as mentioned in the last post) means some chevrons may end up as vertical stripes. The offsets are constrained to the same structure as the X translation for chevron and circle compositions (Left, OneThird, Centre, TwoThirds, Right positions).

Horizontal and Vertical Stripes

Posted: November 23, 2018 at 11:37 am

I had in mind an exploration using horizontal and vertical stripes. These end up being very grid-like, but part of that is because the stripes themselves are not offset (where more background is visible), so the stripes fill the whole composition and are quite dense. Somewhat interesting, but even with only two layers they tend to be very dense. Maybe there should be a constraint so that the frequencies of the layers are not similar… Also I’m not sure I want the chevrons skewed so much as rotated (so that the vertical stripes stay perpendicular to the horizontal stripes in chevrons). I’ll put the offset code back in and will post some of those results later today. Following are images showing the chevron and stripe render modes. I was unable to easily fix the circle rendering due to my misunderstanding of texture coordinates, which previously worked due to 1px tall images used to generate stripes of a single orientation.

Refined Sinewave Stripes with X Offset

Posted: November 22, 2018 at 10:59 am

I refined the code and added a random X offset from a fixed set of intervals. The code was also tweaked a little, but I think I would still like to see a greater range of frequencies. (more…)

Sinewave Stripes

Posted: November 15, 2018 at 6:53 pm

Above is a selection of results from a sine-wave based stripe generator. This allows a few parameters to describe a wide variety of densities. Also, the gaps between stripes, their thickness, and the softness of their edges are parameterized; contrast and threshold parameters allow the sinewaves to become stripes of various widths with edges of various softness. All of these images are generated with 5 layers (one wave per layer). In these results the blur shader is disabled; any softness is due to the contrast parameter. I’ve included a few strong results below at full resolution… (more…)
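The contrast/threshold idea can be sketched in numpy as a 1-D cross-section of one layer; the sigmoid here is my stand-in for whatever curve the shader actually applies:

```python
import numpy as np

def sine_stripes(width, freq, phase=0.0, contrast=8.0, threshold=0.5):
    """One 1-D stripe layer: a sinewave pushed toward a square wave.
    contrast controls edge softness; threshold shifts stripe width."""
    x = np.linspace(0.0, 1.0, width)
    wave = 0.5 + 0.5 * np.sin(2 * np.pi * freq * x + phase)
    # Sigmoid centred on the threshold: high contrast -> hard-edged stripes,
    # low contrast -> soft gradients; threshold != 0.5 -> unequal widths.
    return 1.0 / (1.0 + np.exp(-contrast * (wave - threshold)))

layer = sine_stripes(512, freq=6)
print(round(layer.min(), 3), round(layer.max(), 3))
```

With `contrast` near zero the layer stays a soft gradient, which is exactly the "broad gradient" look noted as missing in the 15,000-sample review.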

Tweaks to transparency, blurred edges and density.

Posted: November 9, 2018 at 3:10 pm

The above image shows the results of some code tweaking. I added random transparency to layers (inspired by Paul Reed) and decreased the range of blur. The result is a little more variety of colour and a greater likelihood of hard edges.