I uploaded 110 “good” compositions to Twitter. “Good” was defined by thresholding the attention (the number of frames in which faces are detected) at > 50 for each composition generated in the last (A-HOG) integrated test. The maximum number of likes was 6 and the maximum number of retweets was 2; the mean was 0.62 likes and 0.17 retweets per composition. The following plot shows the likes (red), retweets (green) and their sum (blue) on the y axis for each composition (x axis). The peaks in the sum indicate one very successful composition (6 likes + 2 retweets) and 5 quite successful compositions. These compositions are included in the gallery below.
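For reference, the summary numbers and the blue "sum" curve can be computed along these lines. This is only a minimal sketch: the function name `summarize` and the like and retweet counts are made up for illustration and are not the real data from the upload.

```python
from statistics import mean


def summarize(likes, retweets):
    """Return max/mean engagement and the per-composition sum (likes + retweets)."""
    totals = [l + r for l, r in zip(likes, retweets)]
    return {
        "max_likes": max(likes),
        "max_retweets": max(retweets),
        "mean_likes": mean(likes),
        "mean_retweets": mean(retweets),
        "totals": totals,
    }


# Hypothetical counts for five compositions (NOT the real Twitter data).
likes = [0, 6, 1, 0, 2]
retweets = [0, 2, 0, 0, 1]

summary = summarize(likes, retweets)
# Index of the composition with the highest combined engagement.
best = max(range(len(summary["totals"])), key=summary["totals"].__getitem__)
print(summary, best)
```

Peaks in `totals` correspond to the peaks in the blue curve of the plot.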
I’m now uploading 100 random samples of the compositions with < 50 attention. I’m not expecting much difference between the two classes, since the labels are not specified but inferred: during the test I did find myself noticing a good composition in passing, but I had already looked away for too long and it was replaced by a new composition. In other words, false positives (staring at an unpleasant composition) are unlikely, but false negatives (only looking at a pleasant composition in passing) are very likely.
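The split into the two classes and the random draw from the low-attention class could look roughly like the following. It is a sketch under assumptions: the composition IDs, the attention values, the helper names `split_by_attention` and `sample_rest`, and the fixed seed are all hypothetical. Only the threshold of 50 and the sample size of 100 come from the text above.

```python
import random


def split_by_attention(attention, threshold=50):
    """Split composition IDs into 'good' (> threshold) and 'rest' (< threshold).

    As written in the text, a composition with attention exactly equal to
    the threshold falls into neither class.
    """
    good = [cid for cid, a in attention.items() if a > threshold]
    rest = [cid for cid, a in attention.items() if a < threshold]
    return good, rest


def sample_rest(rest, k=100, seed=0):
    """Draw up to k compositions without replacement from the low-attention class."""
    rng = random.Random(seed)  # fixed seed only so the draw is repeatable
    return rng.sample(rest, min(k, len(rest)))


# Hypothetical attention values (frames with a detected face) per composition.
attention = {f"comp_{i:03d}": i % 60 for i in range(600)}

good, rest = split_by_attention(attention)
to_upload = sample_rest(rest)
print(len(good), len(rest), len(to_upload))
```

Sampling without replacement keeps the same composition from being uploaded twice.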