I’m now setting this aside and moving on to the next images in the short list. The top image is the best result at this time.
While I’m not quite satisfied with these results, the top image shows what I think of as the most successful iteration; there is still a little of the initial conditions showing in the faces though, so I’m running another session with slightly more iterations. The gallery below shows all my explorations of #7 up to this point. I’m struggling a little with the tension between smoothness and somewhat uniform colour patches with their harder edges. For this source painting, the patches in the ground can cue camouflage patterns that I’m not keen on.
Before starting the recent collage explorations, I had been doing more reading and thinking about entropy; see the notes that follow. I also got a chance to watch a lecture on entropy and emergence that Sarah Dunsiger sent; I’ve included my notes on that below as well.
- Natural log of the number of states in a system multiplied by a constant
- The log reduces very large numbers and does little for small numbers (e.g. ln(10e06) = ~16, ln(10e02) = ~7).
- The constant (Boltzmann) is a very small number (~1e-23)
- So entropy is a small representation of really large numbers of possible states.
- All possible 640×480 images in 8-bit have 5e12 possible states and an ‘entropy’ of 4.04e-22. (Does it make any sense to think of the entropy of an image? An image is not dynamical; entropy is about dynamics, not structure.)
- Second Law of Thermodynamics:
- Entropy of closed systems never decreases (the number of possible states only increases until equilibrium, maximum entropy)
- Entropy in open systems may decrease if the environment entropy increases (the number of possible states may decrease if the number of states in the environment increases)
- Is entropy about the propagation of energy? Does a system with more energy have more states? If it has more states, it loses that energy to the environment (increasing the number of its states in the environment).
- Is there some analogy in ML? Could the energy be the state of excitement of the initial conditions? The rate of learning?
- More entropy means more complexity because more information is needed to represent the potential states of a system.
- This seems more about the constraints of the system than the specific energy states.
- order can be introduced from entropy alone
- order from disorder?
- the whole often resembles the part (chiral particles make chiral structure)
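The arithmetic in these notes can be checked with a short sketch. It assumes the Boltzmann form S = k·ln(W) and three 8-bit colour channels; the function and variable names are hypothetical. The 5e12 figure corresponds to pixels × colours per pixel; counting every distinct image instead gives colours raised to the number of pixels, which is far larger, so the sketch works with ln(W) directly to avoid overflow.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K

def boltzmann_entropy(ln_states: float) -> float:
    """S = k_B * ln(W); takes ln(W) directly so huge W never has to be stored."""
    return K_B * ln_states

# The log compresses large numbers far more than small ones.
ln_big = math.log(10e6)    # about 16
ln_small = math.log(10e2)  # about 7

pixels = 640 * 480
colours = 256 ** 3  # assumption: 8 bits per channel, three channels

# Figure used in the notes: pixels * colours, roughly 5e12 states.
w_notes = pixels * colours
s_notes = boltzmann_entropy(math.log(w_notes))

# Alternative count: every distinct image, W = colours ** pixels,
# so ln(W) = pixels * ln(colours).
s_images = boltzmann_entropy(pixels * math.log(colours))

print(f"{w_notes:.2e} states -> S = {s_notes:.3e}")
print(f"all distinct images  -> S = {s_images:.3e}")
```

Either way the result stays tiny because of the constant, which fits the point above that entropy is a small representation of a very large number of states.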
Entropy and Emergence (Video Lecture)
- entropy as a measure of what you don’t know about the state of a system
- fewer states means more certainty due to fewer possibilities.
- a high-entropy system is random / has many states and no constraint.
- entropy as the minimum number of binary questions one must ask to fully determine the system.
- random needs every question whereas a pattern can be compressed
- Key take-away: entropy does not indicate disorder because a system may have more ordered states than disordered states.
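The "binary questions" reading of entropy can be illustrated with Shannon entropy in bits, which gives a lower bound on the average number of yes/no questions needed. This is a minimal sketch, not something taken from the lecture; the function name and the toy data are assumptions.

```python
import math
from collections import Counter

def shannon_entropy_bits(symbols):
    """Shannon entropy of the empirical distribution of `symbols`, in bits."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Eight equally likely states: no constraint, so every question is needed.
uniform = list(range(8))

# One state dominates: the pattern is compressible, so fewer questions on average.
constrained = [0] * 7 + [1]

h_uniform = shannon_entropy_bits(uniform)
h_constrained = shannon_entropy_bits(constrained)
print(h_uniform, h_constrained)
```

The uniform case needs three questions (2³ = 8 states), while the constrained case needs well under one on average.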
As I mentioned in the previous post, I wanted to revisit the previously ruled out paintings. I used smaller learning rates to see if that salvaged them. I can’t say I’m happy with the results; although they are smoother, they are still lacking.
After doing a few more runs with tweaked parameters I’m not sure I’m doing much better so I’m going to leave #05 here and re-run the two lower resolution paintings that were previously ruled out (#23 and #18). The first image is the most successful, but is very similar to those in the top row of the gallery. The bottom row includes the least successful, though I still think there is something to the larger neighbourhood in the lower right image.
After the insight in the previous post, I’ve explored a few variations using learning-rates smaller than 1.0. The following images are my favourites. They balance abstraction and emergent structure quite well, but are not quite there. The image on the left is insufficiently abstract where remnants of the mast in the original are still present. The wave-like structures in the lower left are very interesting and suggest quite a bit of depth and also cue the waves in the original. The image on the right shows quite good abstraction, but lacks some of that complexity in the waves, due to the larger neighbourhood (sigma = 200px).
The following images show the rest of the explorations, including highly over-abstracted versions that approach gradients. I’ve also included an attempt with a relatively high learning rate of 0.5, the highest of these explorations where the rest are 0.25 or 0.1. In that image (upper left of bottom gallery) the wave section in the lower left is very interesting, although it approaches the appearance of spires; I’m not sure about the harder edges and mottled patches. That composition also shows a degree of under-organization at the smaller scale, e.g. splashes of red in the area above the bright spot.
The images above show a few attempts to reproduce the aesthetic of the mid-resolution exploration of #5 at full resolution. As the ‘spires’ clearly overwhelm the image I wrote to the author of ANNetGPGPU. The conclusion is that the interaction of high learning rates and small neighbourhood functions leads to cases where the next BMU is very likely to be close to the previous BMU. The result is a trail of BMUs that progress across the SOM. It is unclear why they always progress at the same angle. I’m now running a test with a learning rate of 0.75 (rather than 1.0 as used previously) and I’ll continue to change learning rates and see how that looks! I may also want to revisit my previously ruled out paintings with this new insight. Now that I know these spires are an emergent result of the SOM, it’s something I should explicitly explore in the future!
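To make the roles of the learning rate and the neighbourhood concrete, here is a minimal sketch of a single textbook SOM training step in plain Python. It is not ANNetGPGPU's implementation; the function names, the tiny 8×8 grid and the Gaussian neighbourhood over grid distance are all assumptions chosen for illustration.

```python
import math
import random

def best_matching_unit(weights, sample):
    """Return (row, col) of the unit whose weight vector is closest to `sample`."""
    best, best_dist = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((wi - si) ** 2 for wi, si in zip(w, sample))
            if d < best_dist:
                best, best_dist = (r, c), d
    return best

def som_step(weights, sample, learning_rate, sigma):
    """One training step: pull units near the BMU toward the sample (in place)."""
    br, bc = best_matching_unit(weights, sample)
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            grid_dist_sq = (r - br) ** 2 + (c - bc) ** 2
            # Gaussian neighbourhood: 1.0 at the BMU, falling off with grid distance.
            influence = math.exp(-grid_dist_sq / (2.0 * sigma ** 2))
            for i in range(len(w)):
                w[i] += learning_rate * influence * (sample[i] - w[i])
    return br, bc

random.seed(0)
grid = [[[random.random() for _ in range(3)] for _ in range(8)] for _ in range(8)]
pixel = [1.0, 0.0, 0.0]
bmu = som_step(grid, pixel, learning_rate=1.0, sigma=1.0)
print(bmu, grid[bmu[0]][bmu[1]])
```

With a learning rate of 1.0 the BMU is overwritten by the sample outright, and with a small sigma only its immediate neighbours move much; lowering the learning rate makes each step a partial move instead.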
I changed my Talos code to explicitly include a best model selection call, running Predict(), and added a call to do 10-fold cross validation of models, running Evaluate() before saving the search session. It is not quite clear to me whether these two actions change the criteria by which models are selected for deployment, but in my first use of these calls my performance has jumped 10%.
I also split my data differently; data is split into 50/25/25% for training, validation and testing. The validation set is used in Talos Scan() and the testing set is used in Evaluate(). The features of this last session were using 31 features from the initial dataset (instructions to generate compositions, excluding colour data) and 25 colour histogram features. I was also wondering if the number of dimensions of my features meant I was not going to get anywhere with as few samples as I have.
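A 50/25/25 split like the one described can be sketched in a few lines. This is an illustration only, with a hypothetical helper name; it is not the project's code and makes no Talos calls. Passing `seed=None` gives the unseeded, stochastic behaviour, while a fixed seed makes the split reproducible.

```python
import random

def split_50_25_25(samples, seed=None):
    """Shuffle and split into training (50%), validation (25%) and testing (25%)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n // 2
    n_val = n // 4
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder, so no sample is dropped
    return train, val, test

train, val, test = split_50_25_25(range(1000), seed=1)
print(len(train), len(val), len(test))
```

Here the validation portion would be the one handed to the search and the testing portion the one held back for the final evaluation.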
The best model reported an accuracy of 78.4% on the training set, 80% on the validation set and 77% on the testing set. This indicates a huge improvement and makes me wonder if Talos was just selecting a very poor ‘best model’ previously. One caveat is that the log Talos generates that shows performance during training shows very different results; in the log, the greatest accuracy was reported as 56.8% on the validation set and 100% on the training set, highly divergent from the prediction accuracy made by the best model. I should also note that I removed the fixed RNG seeds for splits and data shuffling, so the search is stochastic and may be getting a broader picture since it’s not limited by reproducibility. The best model using the validation set predicted 304 bad compositions to be bad, 70 bad to be good, 74 good to be bad and 284 good to be good.
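The confusion counts above can be turned back into summary metrics as a sanity check. The sketch below uses only the four counts quoted in this post; the function name and the choice to treat "good" as the positive class are assumptions.

```python
def confusion_metrics(bad_as_bad, bad_as_good, good_as_bad, good_as_good):
    """Accuracy, plus precision and recall for the 'good' class."""
    total = bad_as_bad + bad_as_good + good_as_bad + good_as_good
    accuracy = (bad_as_bad + good_as_good) / total
    precision_good = good_as_good / (good_as_good + bad_as_good)
    recall_good = good_as_good / (good_as_good + good_as_bad)
    return accuracy, precision_good, recall_good

acc, prec, rec = confusion_metrics(304, 70, 74, 284)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
```

The accuracy works out to 588 correct out of 732, which agrees with the reported 80% on the validation set.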
If I can reproduce this performance, I’ll then generate a new set of random compositions and see how the best model classifies them.
Starting from the lowest resolution images of the 7 short-listed, I’ve been exploring using them at full resolution. Using the previous parameters for the intermediary resolutions, I was unable to get any strong results, see below. I’m wondering if colour diversity tends to result in images that are poorer… The main aesthetic weakness is the hard edges that manifest, even though the neighbourhood function has Gaussian edges. This was not seen, at least to the same degree, in the expanded intermediary resolution explorations. I’m currently computing a full resolution version of #5 (intermediary, original) and hope it’s more successful.
After the lack of success in the previous experiment using the 768 element vector, I have the results of the 96 histogram bin experiment. During the search, Talos reported a peak validation accuracy of 73.3%. The best model reported a validation accuracy of 66.4% and a training accuracy of 99.7%. Clearly the model is learning the training set well, but again not generalizing to the validation set. The following image shows the confusion matrix for the validation set. I note that there is no appreciable difference in validation accuracy between 1000 and 10,000 epochs.
The following images were computed overnight using the same params as in the previous post. The training time is significantly longer than estimated, due to the larger number of pixels (due to aspect ratio), so only three were generated at the time of writing. While these results are going in the right direction, they are still too similar to the original compositions (with the exception of 07, lower right) and need further abstraction (increase of neighbourhood size). I emailed the author of the GPU accelerated SOM I’m using to see if he can reproduce these spire effects. Since the number of iterations has such a significant effect, it seems I should be working image by image at full resolution. As inefficient as that may be, it seems like the next step; I’ll prioritize the lowest resolution images for exploration’s sake!
I’m thinking that it makes the most sense to move up in resolution and do some experimentation at each resolution until the desired resolution is reached. It will be clear from this post that the quality of the aesthetic changes significantly at various resolutions. In order to prevent the image from approaching a gradient with such a high number of training iterations (required to provide a good sampling of the underlying diversity of the original painting), I’ve been using very small neighbourhood sizes. The image below is my best choice and it’s trained over 0.5 epochs (half the pixels) and a neighbourhood of 35px. At HD resolution, this image takes 2.5 hours to compute. If you look carefully, you’ll see some dark ‘spires’ growing from the lower left that look to be the same as those I encountered during the development of “As our gaze peers off into the distance, imagination takes over reality…” (2016). I still have no explanation of them…
For comparison, I’ve included the original image and the low resolution sketch below. At the bottom of this post images show the other neighbourhood sizes I experimented with (left: 78px; right: 150px), and rejected due to their over-abstraction.
I realized that I would not be able to get a survey of images that at least sketch out how they may look without down-scaling significantly. I’ve reduced the resolution of my working files from fitting in an HD frame down to 10% and calculated SOMs where the number of iterations matches the number of pixels. I’m quite happy with the quality of these results! Only a few seem quite weak to me, due to (a) the lack of diversity (which is exaggerated by the brutal down-sampling here) or (b) a lack of colour restraint. The images below are in the same order as the painting long-list post.
While Talos is searching for suitable models for the Zombie Formalist, I’ve started experimenting with revisiting the painting appropriation side of the project. For the initial exploration, I’m using da Vinci’s “Mona Lisa” (1517).
The following images are various explorations of abstracting the above image using the SOM to reorganize constituent pixels. Through exploring these I realized that one of the greatest influences on the quality of the result is the random sampling of pixels. The working image is 1080×1607 pixels, which means 1,735,560 training samples. In my tests using ~20,000 training iterations, only a small subset of the diversity of those pixels influences the resulting image. In these tests, I realized the most successful results are those that happen to select (randomly) a large diversity of pixels to train the SOM. The same parameters can produce very different results:
I think the image on the left is more successful because it happened to select a few brighter pixels in the original. I can produce better results by down-scaling the image to increase the diversity of pixels selected by random sampling, but that is not ideal since I’m limiting both the output resolution and the diversity of data used in training. It seems I should stick with a number of iterations that equals (at least) the number of training samples (the number of pixels in the original). Looking again at my old code, I did not realize I had fixed the neighbourhood function; in all the images below, the only variable that affects the output is the number of iterations.
The following images were generated by combining the segments from both collections of photographs (wide and close). There are a total of 135,226 segments inclusive of both collections. The top image is under-trained over only 50,000 iterations (meaning that ~2/3 of the segments were not presented to the network). The bottom image was trained over 150,000 iterations.
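How little of the image ~20,000 random samples can touch is easy to estimate. The sketch below assumes pixels are drawn uniformly with replacement, which may not match the sampler actually used; the function name is hypothetical.

```python
def expected_unique_fraction(n_items, n_draws):
    """Expected fraction of items seen at least once in uniform draws with replacement."""
    return 1.0 - (1.0 - 1.0 / n_items) ** n_draws

n_pixels = 1080 * 1607  # 1,735,560 training samples

frac_20k = expected_unique_fraction(n_pixels, 20_000)
frac_full = expected_unique_fraction(n_pixels, n_pixels)

print(f"20,000 iterations: {frac_20k:.2%} of pixels seen")
print(f"{n_pixels:,} iterations: {frac_full:.2%} of pixels seen")
```

Under that assumption, 20,000 iterations reach only a little over 1% of the pixels, and even one iteration per pixel reaches roughly 63% of distinct pixels because some are drawn more than once.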
It took nearly 10 days for Talos to search possible models using the 768 item vector representing the colour histogram for each composition. The best validation accuracy listed by the search was 68.5% and the best model 66.2%. The best model achieved a training accuracy of 77.9%. 465 bad compositions were predicted to be bad, 294 bad compositions were predicted to be good, 232 good compositions were predicted to be bad and 568 good compositions were predicted to be good.
This is a very minor improvement from the variance features. The low training accuracy indicates there may not be enough epochs for such a large dimensional vector. I’m now running a second experiment where the 768 bin (256 bins per channel) histogram is reduced to 96 bins (32 bins per channel). This is more comparable to the initial 57 element training vectors. If the problem is the size of the vector, this should allow for higher training accuracy and, I hope, also better generalization in the next search.
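One way to get from 768 to 96 bins is to sum each run of 8 adjacent bins within a channel, which for 8-bit data gives the same counts as recomputing a 32-bin histogram. The sketch below assumes the vector is laid out channel by channel (256 bins for one channel, then the next); the function name and that layout are assumptions, not details from the project code.

```python
def reduce_histogram(hist, channels=3, bins_in=256, bins_out=32):
    """Merge adjacent bins per channel: 3 x 256 bins -> 3 x 32 bins by default."""
    if len(hist) != channels * bins_in:
        raise ValueError("histogram length does not match channels * bins_in")
    if bins_in % bins_out != 0:
        raise ValueError("bins_in must be a multiple of bins_out")
    group = bins_in // bins_out  # 8 adjacent input bins per output bin
    reduced = []
    for ch in range(channels):
        start = ch * bins_in
        for b in range(bins_out):
            lo = start + b * group
            reduced.append(sum(hist[lo:lo + group]))
    return reduced

full = [1] * 768
small = reduce_histogram(full)
print(len(small), small[:4])
```

Summing preserves the total count per channel, so the reduced vector describes the same pixels at a coarser colour resolution.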
Up to this point I’ve been working with half the photos I shot at TRIUMF, the close-up ones. Today I started working with the medium and wide shots that show larger scale structures, architecture, etc. Rather than ~57,000 segments, the density of the wider images resulted in ~77,000 segments. I think these images are the most successful yet, balancing abstraction and photo-realism as well as order and complexity. The composition ends up with larger areas of colour due to the larger areas of colour at the architectural scale. This was generated with 50,000 iterations and I’m now training a 2,000,000 iteration version.
The following image and details show the result of a smaller neighbourhood function (1/10 of SOM width) after 2,000,000 training iterations. I’ve also rendered the collage in descending order by area such that the largest segments are rendered behind the smaller segments. This increases the sense of flow, but I don’t think the very small neighbourhood improves things. I still think the images are more successful when they are more chaotic and I’m training a network on fewer iterations to see what the results look like. With the larger area images in the background, the tension between abstraction and photo-realism is lost. The resulting density of textures is very interesting though.
Using a fitted ellipse for each segment, I’ve now included orientation features. This results in images such as the following that feel like they are really going in the right direction. The top one in particular cues magnetic fields, which is very apt. The bottom image uses a larger neighbourhood function, which leads to a smoother more organized macro-structure; I prefer the top image with more turbulence. I’m now training a version of the top with more iterations to see where that goes.
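As an illustration of what an orientation feature can look like, the sketch below estimates the major-axis angle of a segment from its second-order central moments. This is a swapped-in technique that approximates the orientation of an equivalent ellipse; it is not necessarily how the ellipses were fitted here, and the function name and point-list input are assumptions.

```python
import math

def segment_orientation(points):
    """Major-axis angle (radians, image coordinates) from second central moments."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    mu20 = sum((x - cx) ** 2 for x, _ in points) / n
    mu02 = sum((y - cy) ** 2 for _, y in points) / n
    mu11 = sum((x - cx) * (y - cy) for x, y in points) / n
    return 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)

horizontal = [(x, 0) for x in range(10)]
diagonal = [(i, i) for i in range(10)]

print(segment_orientation(horizontal))  # along the x axis
print(segment_orientation(diagonal))    # 45 degrees, in radians
```

Because an orientation of θ and θ + π describe the same axis, one option when using it as a SOM feature is to encode it as the pair (cos 2θ, sin 2θ) so that nearly identical orientations stay close together.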
After the early success using the hue histogram features on the TRIUMF collection, I thought I’d go back to the Robin collection. The results are certainly better than the initial BGR collage, but the muted natural tones and the organic quality of the segments lead to a composition that does not seem to balance order and disorder the way I would like; it’s a little too messy. I’ve included the full frame version with a few full resolution details. I’ve also posted a version at half resolution where the same-sized segments appear twice as large relative to the frame.
I’ve made some progress on using the new TRIUMF photographs as material for new collages using the same set of segments. The image on the top is using simple BGR features, and the image on the bottom (and corresponding details) is using a 64 bin histogram of each segment’s hue channel as features. The BGR feature image was trained over 2,000,000 iterations while the hue histogram image was trained over 50,000 iterations; both images use a max neighbourhood size of 0.2. I’m going to also try exploring some orientation features. I’m now training a 500,000 iteration version.
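A 64 bin hue histogram for one segment can be sketched with the standard library alone. The code assumes the segment is available as a list of 8-bit (B, G, R) tuples and that the histogram is normalized to sum to 1; both are assumptions, as is the function name.

```python
import colorsys

def hue_histogram(bgr_pixels, bins=64):
    """Normalized histogram of hue for a list of 8-bit (B, G, R) pixels."""
    hist = [0] * bins
    for b, g, r in bgr_pixels:
        h, _, _ = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        idx = min(int(h * bins), bins - 1)  # h is in [0, 1); clamp defensively
        hist[idx] += 1
    total = len(bgr_pixels)
    return [count / total for count in hist] if total else hist

# Two pure red pixels, one pure green, one pure blue (all in BGR order).
segment = [(0, 0, 255), (0, 0, 255), (0, 255, 0), (255, 0, 0)]
features = hue_histogram(segment)
print([(i, v) for i, v in enumerate(features) if v > 0])
```

Unlike a mean BGR colour, this keeps the distribution of hues within a segment, so two segments with the same average colour but different mixtures end up with different feature vectors.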
I started working through some ideas for a new collage following from my previous works using cinematic material. Robin Gleason donated some photos of her material collections to start with. I think the main issues are that
- The diversity of tones in a photograph means there is much more detail than appears, and when one re-sorts components by colour, we end up with something that often resembles a gradient.
- The quality of the edges from this organic source material means there is little meta-structure to appreciate, and the size of segments means their content becomes merely texture and loses all photographic realism.
It will be interesting to see whether the hard-edge apparatus photographs will allow the preservation of those hard edges. Also I’ll be going from 22 photographs to over 100, so the size of segments can be increased (in theory). The following images show a full-resolution collage and a few details; the ~50,000 segments were organized by mean colour similarity using an under-trained Self-Organizing Map (SOM). I also included a few other visualizations of some SOM results (not painted using the segments) that show the lack of interesting structure. I also plan to explore using features other than mean colour, which should allow for more complexity.
On Wednesday I had the opportunity to spend a couple hours amongst the TRIUMF beam lines to take photographs for the project. I’m just posting a few photos here of the scrap area behind the shop, where Sarah Dunsiger, Robin Gleason and Karen Kazmer were doing a material exploration of the scrap materials.
I also captured a few of the chaotic offices, which were selected by my tour-guide Stuart, for their remarkable (dis)organization. Apparently it’s something that does come up on tours with the general public!
The quick variance features were easy to implement, but provided no improvement and performed worse than the previous features. The parameter search resulted in a peak validation accuracy of 64.1% while the best model achieved 66% accuracy on training data and 62.1% on validation data. The following image shows the confusion matrix for validation data. I’m next going to generate colour histograms for the 15000B compositions and see if that leads to any improvement.
With all this focus on the Zombie Formalist I’ve been spending some of the ML search time researching for the painting history appropriation aspect of the project. I’ve narrowed down a long list of paintings based on popularity and their trajectory from Northern European Renaissance realism to modern problematizations of realism; I’ve selected works from the Renaissance, Cubism and Surrealism, as follows. Thumbnail images are included below the table.
The next step for this component of the project is to do some ML to reorganize the pixels and see what works best. The resolution of some of the sources is quite high and quite low for others. It’s as yet unclear how to consider the scale of the originals in the appropriation works, as some are very large. I’m also not sure how large I will be able to go with the self-reorganization process.
|Artist|Title|Year|
|---|---|---|
|Leonardo da Vinci|Mona Lisa|1517|
|Leonardo da Vinci|Salvator Mundi|1500|
|Michelangelo|The Creation of Adam, Sistine Chapel ceiling|1512|
|Caravaggio|The Conversion of Saint Paul|1601|
|Rembrandt|The Storm on the Sea of Galilee|1633|
|Rembrandt|The Anatomy Lesson of Dr Nicolaes Tulp|1632|
|Rembrandt|The Night Watch|1642|
|Juan Gris|Nature morte à la nappe à carreaux (Still Life with Checked Tablecloth)|1915|
|Juan Gris|Portrait of Pablo Picasso|1912|
|Duchamp|Nude Descending a Staircase No. 2|1912|
|Jean Metzinger|Le goûter (Tea Time)|1911|
|Georges Braque|Violin and Palette (Violon et palette, Dans l’atelier)|1909|
|Georges Braque|Nature Morte (The Pedestal Table)|1911|
|Georges Braque|Man with a Guitar (Figure, L’homme à la guitare)|1912|
|Georges Braque|Bottle and Fishes|1912|
|Fernand Léger|Les Fumeurs (The Smokers)|1912|
|Albert Gleizes|Portrait de Jacques Nayral|1911|
|Albert Gleizes|L’Homme au Balcon (Man on a Balcony)|1912|
|Rene Magritte|The Son of Man|1964|
|Rene Magritte|The Human Condition|1933|
|Yves Tanguy|Mama, Papa Is Wounded|1927|
|Yves Tanguy|Through birds through fire but not through glass|1943|