Initial (MLP) Classification Attempts.

Posted: April 10, 2019 at 1:35 pm

Using the labelled data set, I was unable to get a simple MLP classifier to perform with accuracy better than 50%; it seems my fears, based on the t-SNE visualization previously posted, were warranted. There is an underlying question of whether I should even treat the instructions used to make compositions (my vectors) as features of the compositions. To look at this a different way, I thought I should generate histograms for each composition and see how t-SNE and simple classifiers perform on those features.
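As a rough sketch of what histogram features could look like: the function below assumes each composition is available as a list of (r, g, b) pixel tuples with 8-bit channels. The name `colour_histogram` and the default of 8 bins per channel are hypothetical choices for illustration, not something settled in this post.

```python
def colour_histogram(pixels, bins_per_channel=8):
    """Return a normalized, concatenated per-channel histogram.

    pixels: iterable of (r, g, b) tuples with values in 0..255 (assumed).
    The result has 3 * bins_per_channel entries; each channel's bins sum to 1.
    """
    bin_width = 256 / bins_per_channel
    counts = [[0] * bins_per_channel for _ in range(3)]
    n = 0
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            counts[channel][int(value // bin_width)] += 1
        n += 1
    if n == 0:
        raise ValueError("no pixels given")
    # Flatten to one feature vector: all red bins, then green, then blue.
    return [c / n for channel_counts in counts for c in channel_counts]
```

Normalizing by pixel count keeps the features comparable across images of different sizes, which matters if the same vectors are later fed to t-SNE or a classifier.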

I was thinking that perhaps the rarity of “good” compositions in the training set was a problem. Splitting the data set into 80% training and 20% validation (using sampling that keeps the distribution of the three labels similar in both sets) leads to a training set with 41 “good” compositions and a validation set with 10 (~12% in both cases).
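The kind of split described above can be sketched in plain Python as follows. This is a minimal illustration, not necessarily the code used here; `stratified_split`, its parameters, and the fixed seed are hypothetical names and choices.

```python
import random
from collections import defaultdict

def stratified_split(labels, validation_fraction=0.2, seed=0):
    """Return (train_indices, validation_indices), splitting each label separately."""
    # Group sample indices by their label.
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    rng = random.Random(seed)
    train, validation = [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        # Take the same fraction of every label for validation.
        n_val = round(len(indices) * validation_fraction)
        validation.extend(indices[:n_val])
        train.extend(indices[n_val:])
    return sorted(train), sorted(validation)
```

With 51 “good” samples in total (the 41 + 10 above), a 20% validation fraction gives exactly the 41/10 division reported. If scikit-learn is available, `train_test_split(..., stratify=labels)` is the usual library route to the same result.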

There are also a lot of “neutral” samples that are neither good nor bad, and that is certainly not helping with what (at least initially) is a binary classification problem. So I did a test removing all the neutral samples, and the classifier accuracy jumped from 50% to 82%, which looks like a big improvement. Unfortunately (because of the rarity of good compositions?) this translates into 6 “bad” compositions predicted to be “good” and 0 “good” compositions predicted to be “good”; in other words, the classifier did not correctly identify a single good composition. The following images were labelled “bad” and predicted to be “good”:
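Going back to the 82% figure, a small sketch shows why accuracy alone is misleading here. The counts of 0 and 6 come from the test above; the other two counts are not reported in this post and are assumptions, chosen only so that the accuracy lands near 82%.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall for the positive ("good") class.

    Returns 0.0 for a ratio whose denominator is zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

def accuracy(tp, tn, fp, fn):
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

# From the post: 0 "good" predicted "good", 6 "bad" predicted "good".
tp, fp = 0, 6
# Assumptions, not reported in the post: 10 "good" validation samples
# (all missed) and a made-up count of correctly rejected "bad" samples.
fn, tn = 10, 73

precision, recall = precision_recall(tp, fp, fn)
overall = accuracy(tp, tn, fp, fn)
```

Whatever the unreported counts are, zero true positives means precision and recall for “good” are both zero, and a classifier that always answers “bad” would be right on 6 more samples than this one.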

I have a few other things to investigate, including arranging images according to their distance from a reference (an arbitrary composition) to see (a) whether the distance corresponds to some sense of visual similarity, and (b) how good and bad compositions are distributed (are good compositions more distant from bad ones?). I suspect the latter will mirror the t-SNE results, but it’s worth looking at whether distances in vector space match any sense of visual similarity. Another investigation will be to generate colour histograms for each composition and see how those features look according to t-SNE and the classifier.
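The distance-to-reference ordering could be sketched like this, assuming Euclidean distance (the post does not fix a metric) and composition vectors stored as equal-length sequences of numbers. `rank_by_distance` is a hypothetical helper name.

```python
import math

def rank_by_distance(vectors, reference_index):
    """Return indices of all vectors sorted by Euclidean distance to the reference.

    The reference itself comes first, at distance zero.
    """
    reference = vectors[reference_index]
    return sorted(
        range(len(vectors)),
        key=lambda i: math.dist(vectors[i], reference),
    )
```

Laying out the images in the returned order, and reading off `[labels[i] for i in order]` alongside them, would cover both questions: whether neighbours look alike, and whether good and bad compositions separate along the distance axis.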