Expanded Training Set

Posted: April 26, 2019 at 6:04 pm

Thanks to a suggestion from Sofian Audry, I’ve added 4000 compositions to the initial set of 1000. It took me the entire week to label the new compositions. The complete set of 5000 compositions breaks down by label as follows: good (287); neutral (1976); bad (2737). As in the initial training set, only about 5% of the compositions are good (287 of 5000 is 5.7%), but the share of bad compositions grew from ~38% to ~55%. Following are random samples from the good and bad sets, respectively.
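For reference, the class shares quoted above can be checked with a few lines of Python. This is only a sketch of the arithmetic; the label counts come from this post, while the variable names are my own.

```python
# Label counts for the complete 5000-composition set (from the post).
counts = {"good": 287, "neutral": 1976, "bad": 2737}

total = sum(counts.values())

# Share of each label as a percentage of the whole set.
shares = {label: 100 * n / total for label, n in counts.items()}

for label, pct in shares.items():
    print(f"{label}: {n_pct:.1f}%" if False else f"{label}: {pct:.1f}%")
```

The "good" class comes out at a little under 6%, which is what makes the set so imbalanced.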

The next step is to train on this data set. If that does not work, I will investigate methods to re-balance the training data, since “good” samples are very rare.
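I have not settled on a re-balancing method yet. As a rough sketch of two common options, and not necessarily what I will end up using, the snippet below computes inverse-frequency class weights (the usual "balanced" heuristic, total / (number of classes × class count)) and randomly oversamples the rarer classes up to the size of the largest one. The function name `oversample` and its arguments are hypothetical, and only the Python standard library is used.

```python
import random

# Label counts for the complete set (from the post).
counts = {"good": 287, "neutral": 1976, "bad": 2737}
total = sum(counts.values())

# Option 1: inverse-frequency class weights ("balanced" heuristic).
# Rare classes get a larger weight in the training loss.
class_weights = {
    label: total / (len(counts) * n) for label, n in counts.items()
}


def oversample(samples, labels, seed=0):
    """Option 2 (hypothetical helper): random oversampling.

    Duplicates randomly chosen samples from the smaller classes until
    every class has as many samples as the largest one. Returns a
    shuffled list of (sample, label) pairs.
    """
    rng = random.Random(seed)

    # Group samples by their label.
    by_label = {}
    for sample, label in zip(samples, labels):
        by_label.setdefault(label, []).append(sample)

    target = max(len(group) for group in by_label.values())

    balanced = []
    for label, group in by_label.items():
        # Draw extra samples with replacement to reach the target size.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        balanced.extend((sample, label) for sample in group + extra)

    rng.shuffle(balanced)
    return balanced
```

Class weights leave the data untouched and only change how much each error counts, while oversampling changes what the model sees in each epoch. Oversampling should only be applied to the training split, otherwise duplicated samples leak into validation.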