Following from my previous ML post, I ran an experiment doing hyperparameter search using only the attention data, ignoring the Twitter data for now. The results are surprisingly poor with the best model achieving no better than chance accuracy and f1 scores on the test set! For the validation set, the best model achieved an accuracy of 65%. The following image shows the confusion matrix for the test set:
The f1 scores show that this model is equally poor at predicting good and bad classes: The f1 score for the validation set was 67% for bad classes and 62% for good. In the test set the f1 scores are very poor at 55% for the bad class and 45% for the good class.
As I mentioned in the previous post, I think a lot of noise is added with incidental interactions where someone walks by without actually attending to the composition. Watching behaviour around it, I’ve determined that attention values below 6 are very likely to be incidental. I’m now running a second experiment using the same setup as this one except where these low attention samples are removed. Of course this unbalances the data-set, in this case in favour of the ‘good’ compositions (754) compared to ‘bad’ compositions (339). As there is so little data here I’m not going to do more filtering of ‘good’ results to balance classes. After that I’ll repeat these results with the Twitter data and see where this leaves things.