From Merely Bad, to Really Bad.

Posted: May 3, 2019 at 5:13 pm

I went through my ~2500 compositions previously labelled “bad” and selected the worst of the worst. I ended up with about as many of these really bad compositions (266) as there are good compositions (287), which should help a lot with my unbalanced data problem. It also means that most of what I generate is neither good nor bad (89% of generated data), so I will probably need to generate another 5000 compositions just to get another ~550 that are good or bad. Going back through the bad compositions, I did think that some of them were not that bad; if I’m not sure about what’s bad, I certainly can’t expect a model to be! The following images show a random sampling of the previously bad (top) and the really bad (bottom).

I’ll use these new labels to train a classifier and see where that leads before I jump into cross-validation approaches.
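
As a quick sanity check on the counts above, here is a minimal sketch of the arithmetic behind the class balance and the "another 5000" estimate. The variable and function names are my own for illustration, and it assumes the 89% "neither" rate holds for future batches, which may not be the case.

```python
# Counts and rate taken from the post; names are hypothetical.
good = 287            # compositions labelled good
really_bad = 266      # worst-of-the-worst subset of the bad label
neither_percent = 89  # share of generated data that is neither good nor bad


def compositions_needed(target_usable, usable_percent):
    """Estimate how many compositions to generate to get target_usable
    good-or-really-bad ones, assuming the usable rate stays the same."""
    return round(target_usable * 100 / usable_percent)


usable = good + really_bad              # labelled examples usable for training
usable_percent = 100 - neither_percent  # share that is good or really bad
bad_share = really_bad / usable         # balance between the two classes
needed = compositions_needed(550, usable_percent)

print(f"usable examples: {usable}")
print(f"really-bad share of usable: {bad_share:.2f}")
print(f"compositions to generate for ~550 more: {needed}")
```

With the two classes this close in size, the remaining cost is mostly the low yield: roughly nine out of ten generated compositions end up unused.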