Reproducing previous R results in Keras.

Posted: May 31, 2019 at 6:35 pm

After spending so much time in R trying to get a simple network to work, I’ve made the jump to Keras, sklearn, SciPy, etc. in order to build deeper networks. The workflow is a lot more awkward, but I’ve managed to figure out the key metrics (categorical accuracy and confusion matrices) and reproduced the previous results from R (note: I did not use the indexical features in the Keras model).

Comparable to the R model, a single 52-unit hidden layer network trained on 80% of the data over 300 epochs achieved 99% accuracy on the training data and 72% on the validation data (73% in the previous model). There were 105 ‘bad’ samples predicted to be ‘good’ (compared to the previous 101) and 113 ‘good’ samples predicted to be ‘bad’ (compared to the previous 109). The following images show the confusion matrices for the training and validation data, respectively.
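The Keras side of this workflow looks roughly like the following sketch. The feature dimensionality, the synthetic data, and the short epoch count are placeholders (the actual model trained for 300 epochs on the real feature vectors):

```python
import numpy as np
from sklearn.metrics import confusion_matrix
from tensorflow import keras

# Synthetic stand-in for the composition feature vectors
# (the 64-feature dimensionality is an assumption for this sketch).
rng = np.random.default_rng(0)
X = rng.random((500, 64)).astype("float32")
y = (X[:, 0] > 0.5).astype("int32")            # binary "good"/"bad" labels
y_onehot = keras.utils.to_categorical(y, 2)

# Single 52-unit hidden layer, mirroring the shallow R model.
model = keras.Sequential([
    keras.Input(shape=(64,)),
    keras.layers.Dense(52, activation="sigmoid"),
    keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["categorical_accuracy"])
model.fit(X, y_onehot, epochs=10, validation_split=0.2, verbose=0)

# Confusion matrix: rows are true classes, columns are predicted classes.
pred = model.predict(X, verbose=0).argmax(axis=1)
print(confusion_matrix(y, pred))
```

The `categorical_accuracy` metric and the sklearn confusion matrix together cover both of the key metrics mentioned above.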

Training with indexical features.

Posted: May 23, 2019 at 10:43 am

I manually transformed a subset of the features of my training data to be indexical. By indexical I mean that those features are categorical rather than continuous: they specify aspects of the composition that are constrained to a limited number of options, such as rendering style, layer frequency and offset. Previously, all features were real numbers and it was the renderer that thresholded them into categories. I thought that these constrained numbers might provide a stronger signal associated with the “good” and “bad” classes. Unfortunately, that is not the case.
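The transformation itself can be sketched as snapping each continuous feature to a fixed number of evenly spaced levels. The level count (four here) is an assumption; the actual counts depend on how many options the renderer exposes per feature:

```python
import numpy as np

def make_indexical(features, n_levels):
    """Snap continuous [0, 1] features to n_levels evenly spaced values,
    mirroring the thresholding the renderer would otherwise apply."""
    idx = np.minimum((features * n_levels).astype(int), n_levels - 1)
    return idx / (n_levels - 1)  # still in [0, 1], but only n_levels values

x = np.array([0.0, 0.12, 0.5, 0.99, 1.0])
print(make_indexical(x, 4))  # values drawn from {0, 1/3, 2/3, 1}
```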

Using the same 10-fold optimization as previously (see image above), the accuracy on the validation set was only 73%. 109 good compositions were predicted to be bad, and 101 bad compositions were predicted to be good (see images below). The optimization results were very different, though: 52 hidden units (vs. 32 previously) and a decay of 0.5 (vs. 0.1 previously) performed best. This wide range seems to indicate the network is not learning very well at all. So the next step is to jump into deep networks and see if they can manage better learning.

“Good” compositions predicted to be “bad”.
“Bad” compositions predicted to be “good”.

Cross Validation and Hyperparameter Tuning in CARET.

Posted: May 19, 2019 at 4:13 pm

The above plot outlines the results of my 10-fold cross-validation for parameter tuning. The best model had 32 hidden units and a decay of 0.1. Predictions based on this ‘best’ model are still not great; the accuracy is 61% on the validation set. 30 “good” compositions were labelled as “bad” and 31 “bad” compositions as “good”. The following images show the misclassified prediction results: good predicted to be bad (top) and bad predicted to be good (bottom).
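The tuning itself was done with caret in R, but the same grid search over hidden units and decay can be sketched with sklearn, where caret’s `decay` roughly maps to the L2 penalty `alpha`. The grid values and data below are illustrative, not the actual experiment:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Toy data standing in for the composition feature vectors.
rng = np.random.default_rng(1)
X = rng.random((300, 20))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

# Analogue of caret's nnet grid: 'size' -> hidden_layer_sizes,
# 'decay' -> alpha (L2 penalty). Grid values are placeholders.
grid = {"hidden_layer_sizes": [(8,), (16,), (32,)],
        "alpha": [0.001, 0.01, 0.1]}
search = GridSearchCV(MLPClassifier(max_iter=500), grid, cv=10)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

`GridSearchCV` with `cv=10` reproduces the 10-fold structure: each parameter pair is scored as the mean validation accuracy across the ten folds.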

Before I jump into training a deeper model, I have an idea for transforming my input vectors. Since I generate vectors that all range from 0-1 and have the same resolution, I wonder if the network is having trouble learning distributions of parameters that are not really continuous; i.e. parameters like frequency, offsets and render styles are more like categorical variables. So one idea is to change those features so their resolution matches the number of possible categories.

Jumping into a deeper network would involve either continuing my workflow with another DNN library in R, or dropping R and implementing in Keras running on my CUDA setup.

Training on Larger Data Set

Posted: May 17, 2019 at 12:18 pm

I ran a couple of experiments using a split of 80% training and 20% validation on the new, larger, (naturally) balanced data set. Unfortunately the results are not very good. At best I got an accuracy of ~70%, where 125 “good” compositions were predicted to be “bad” and 111 “bad” compositions predicted to be “good”. My first attempt was even worse, and I tried increasing the number of hidden units without much improvement. I’ll try cross-validation approaches next before jumping into some ideas for transforming the input data. I think this new data set size should be sufficient…
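The fixed 80/20 split can be sketched with a stratified `train_test_split`, which keeps the good/bad ratio the same in both halves (the data below is synthetic):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labelled data standing in for the real feature vectors.
rng = np.random.default_rng(2)
X = rng.random((1000, 10))
y = rng.integers(0, 2, size=1000)

# Stratified 80/20 split preserves the class ratio in both halves.
X_tr, X_val, y_tr, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)
print(len(X_tr), len(X_val))  # 800 200
```

Stratification matters here: with a plain random split, an unlucky draw could leave one half with noticeably fewer “good” samples.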

15,000 Sample Data Set

Posted: May 16, 2019 at 5:00 pm

The images above show a random sampling from the “good” (top) and “bad” (bottom) subsets from the newly labelled 15,000 sample data set. In labelling this set, I kept an eye on balance between “good” and “bad” and classes are quite even in this data set. This was accomplished by being more critical of compositions labelled “bad”. There are 1986 “good” and 1915 “bad” compositions in the new set. Approximately 13% of the data is now “good” and 13% bad, much higher than the previous 5% for the 5000 sample set.

It was interesting to look at a much broader collection of compositions, as I noticed a few things that I wanted to be possible are possible, albeit very rare. There were a few single-colour compositions (5ish?), and some quite soft compositions. I did not see any compositions with very broad gradients (collections of low frequency stripes with low contrast and some transparency), which would be nice. I also noticed some aesthetic weaknesses in general and have planned some renderer changes to resolve these (including increasing the minimum circle size, increasing the number of subdivisions of offsets, and tweaking the range of sine-wave frequencies).

Consumption and the Machine: Appropriation in the Age of AI

Link to Article
Local PDF Copy

[B. D. R. Bogart. Consumption and the machine: Appropriation in the age of AI. Full Bleed, 03: Machines, 2019.]

Fewer Balanced Samples

Posted: May 4, 2019 at 10:38 am

Training with fewer balanced samples was no help at all. Accuracy of the model dropped from 96% (model trained on duplicated “good” samples) to 65%. 15 of the good samples were predicted to be bad and 23 bad samples predicted to be good. Since I’m much more confident about these bad samples (these are the worst of the “bad”) these results are terrible. There are only 500 training samples from 5000 generated compositions, which is not a lot considering these feature vectors are very abstract. Since deeper networks require more training data, it seems clear I just need to generate more training samples. If I generate another 10,000 compositions that would result in another ~1000 training samples, bringing the total up to ~1500 (750 good, 750 bad). I think that is the most I can conceivably label; based on previous experience it would be at least a week of labelling alone.

I’m just realizing that this predictor will mark all samples as good or bad, but I know that the vast majority of inputs are neither good nor bad, so it seems I should go back to three labels (good, bad and neutral). They would still be very unbalanced though. Or should I switch gears and use a modelling method that produces a confidence for each sample?
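The confidence idea could look roughly like this: any classifier that outputs class probabilities gives a per-sample confidence, and a threshold band around 0.5 recovers a “neutral” class without labelling a third category by hand. The classifier, thresholds, and data below are placeholders, not the actual model:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data; any classifier with predict_proba would work the same way.
rng = np.random.default_rng(3)
X = rng.random((200, 5))
y = (X[:, 0] > 0.5).astype(int)

clf = LogisticRegression().fit(X, y)
p_good = clf.predict_proba(X)[:, 1]

# A confidence band around 0.5 acts as the "neutral" class;
# the 0.3/0.7 thresholds are arbitrary placeholders.
label = np.where(p_good > 0.7, "good",
                 np.where(p_good < 0.3, "bad", "neutral"))
print(dict(zip(*np.unique(label, return_counts=True))))
```

The same trick works with a softmax output in Keras, since `predict` already returns per-class probabilities rather than hard labels.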

From Merely Bad, to Really Bad.

Posted: May 3, 2019 at 5:13 pm

So I went through my ~2500 previously labelled “bad” compositions and selected the worst of the worst. I ended up with about the same number of these really bad compositions (266) as there are good compositions (287). This should help a lot with my unbalanced data problems. It does mean I generate a lot of compositions that are neither good nor bad (89% of generated data). I will probably need to generate another 5000 compositions just to get another ~550 that are good or bad. After going back through the bad compositions, I did think that some of them were not that bad; if I’m not sure about what’s bad, I certainly can’t expect a model to be! The following images show a random sampling of the previously bad (top), and the really bad (bottom).

I’ll use these new labels to train a classifier and see where that leads before I jump into cross-validation approaches.

Classification of Larger Dataset

Posted: May 2, 2019 at 5:40 pm

Following Sofian’s advice, I attempted the classification of the larger data set (5000 rather than 1000 samples). The results are not much better. Model accuracy was 87%; 24 bad compositions were predicted to be good, and 53 good compositions were predicted to be bad. The following image shows the 24 bad compositions that were predicted to be good:

Looking at them, they are not the worst and could perhaps be considered mediocre. I also tried increasing the number of good samples to balance the two classes; I increased the number of good samples in the expanded data set from 287 to 2583 (where there are 2737 bad samples) by simply copying them. Training on this “extra good” training set resulted in a classifier with an accuracy of 96% (a significant improvement). This model predicted that 43 bad compositions were good and, interestingly, no good compositions were predicted to be bad. The following image shows 42 of the 43 bad compositions that were predicted to be good:
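The “copying good samples” balancing step can be sketched as duplicating minority-class rows until the classes match. This variant equalizes the counts exactly (the real run stopped slightly short, at 2583 vs. 2737), and the data below is a toy stand-in:

```python
import numpy as np

def oversample_minority(X, y, rng):
    """Duplicate minority-class rows (with replacement) until the two
    classes are the same size."""
    counts = np.bincount(y)
    minority = counts.argmin()
    need = counts.max() - counts.min()
    idx = rng.choice(np.flatnonzero(y == minority), size=need, replace=True)
    return np.vstack([X, X[idx]]), np.concatenate([y, y[idx]])

rng = np.random.default_rng(4)
X = rng.random((100, 3))
y = np.array([0] * 90 + [1] * 10)       # 90 "bad", 10 "good"
Xb, yb = oversample_minority(X, y, rng)
print(np.bincount(yb))  # [90 90]
```

One caveat with plain duplication: the same copied rows can land in both the training and validation split, which inflates validation accuracy, so oversampling only the training half after splitting is safer.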

I’m doing a little reading on cross-validation (I’ve been using a fixed 80% training and 20% validation split up to this point) and will see where that leads. Looking at these results, they are not terrible, just not good. Perhaps I should try constraining my labelling such that the two classes are balanced. This would mean only labelling the really bad compositions as bad; this would be a lot less labour than increasing the training data size again, so maybe I’ll try that first.