Following Sofian’s advice, I attempted classification of the larger data-set (5000 rather than 1000 samples). The results are not much better: model accuracy was 87%, with 24 bad compositions predicted to be good and 53 good compositions predicted to be bad. The following image shows the 24 bad compositions that were predicted to be good:
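For bookkeeping, both error counts above are just the off-diagonal cells of a confusion matrix. A minimal sketch with scikit-learn, using toy stand-in arrays and assuming my own label convention of 0 = bad, 1 = good (not necessarily what the real pipeline uses):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy stand-ins for the real labels and predictions; 0 = bad, 1 = good.
y_true = np.array([0, 0, 0, 1, 1, 0, 1, 1])
y_pred = np.array([0, 1, 0, 1, 0, 0, 1, 1])

cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
bad_as_good = cm[0, 1]   # bad compositions predicted to be good
good_as_bad = cm[1, 0]   # good compositions predicted to be bad
accuracy = np.trace(cm) / cm.sum()
print(bad_as_good, good_as_bad, accuracy)
```

With the real `y_true`/`y_pred` arrays this would report the 24 and 53 counts directly, alongside the overall accuracy.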
Looking at them, they are not the worst and could perhaps be considered mediocre. I also tried balancing the two classes by increasing the number of good samples: I grew the good samples in the expanded data-set from 287 to 2583 (against 2737 bad samples) by simply copying them. Training on this “extra good” training set resulted in a classifier with an accuracy of 96% (a significant improvement). This model predicted that 43 bad compositions were good and, interestingly, no good compositions were predicted to be bad. The following image shows 42 of the 43 bad compositions that were predicted to be good:
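The copy-based balancing step is just oversampling the minority class with replacement. A minimal sketch of that idea with `sklearn.utils.resample` on toy data (the real features and counts are whatever the classifier actually uses; 0 = bad, 1 = good is my assumed convention):

```python
import numpy as np
from sklearn.utils import resample

# Toy imbalanced set: 90 bad (0) vs 10 good (1) samples with 4 features each.
X = np.random.default_rng(1).normal(size=(100, 4))
y = np.array([0] * 90 + [1] * 10)

X_good, X_bad = X[y == 1], X[y == 0]

# Oversample the good class by drawing with replacement until it matches
# the bad class in size, i.e. duplicating existing good samples.
X_good_up = resample(X_good, replace=True, n_samples=len(X_bad),
                     random_state=0)

X_bal = np.vstack([X_bad, X_good_up])
y_bal = np.array([0] * len(X_bad) + [1] * len(X_good_up))
print(np.bincount(y_bal))  # → [90 90]
```

One caveat worth remembering: the duplicates must be created *after* the train/validation split, otherwise copies of the same sample can land on both sides and inflate the validation accuracy.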
I’m doing a little reading on cross-validation (I’ve been using a fixed 80% training and 20% validation split up to this point) to see where that leads. Looking at these results, they are not terrible, just not good. Perhaps I should try constraining my labelling such that the two classes are balanced; this would mean only labelling the really bad compositions as bad. That would be a lot less labour than increasing the training-data size again, so maybe I’ll try that first.
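For what it’s worth, replacing the fixed 80/20 split with k-fold cross-validation looks fairly mechanical in scikit-learn. A sketch under stated assumptions (synthetic stand-in data and a logistic-regression placeholder, since the real features and model are not shown here); stratified folds seem worth using so each fold keeps the same good/bad ratio:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the composition data, with a similar imbalance
# (roughly 90% of samples in one class).
X, y = make_classification(n_samples=500, n_features=10,
                           weights=[0.9, 0.1], random_state=0)

# Stratified k-fold keeps the class ratio the same in every fold,
# which matters with classes this imbalanced.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
print(scores.mean(), scores.std())
```

Reporting the mean and spread over five folds would also make it easier to tell whether the jump from 87% to 96% is real or partly an artefact of one particular split.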