<?xml version="1.0" encoding="utf-8"?>
	<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<title>Justin Pinkney</title>
	<subtitle>Justin Pinkney&#39;s home on the web</subtitle>
	<link href="https://justinpinkney.com/feed/feed.xml" rel="self"/>
	<link href="https://justinpinkney.com/"/>
	<updated>2024-08-17T00:00:00Z</updated>
	<id>https://justinpinkney.com/</id>
	<author>
		<name>Justin Pinkney</name>
		<email></email>
	</author>
	
	<entry>
		<title>Trailer Faces HQ Dataset</title>
		<link href="https://justinpinkney.com/blog/2024/trailer-faces/"/>
		<updated>2024-08-17T00:00:00Z</updated>
		<id>https://justinpinkney.com/blog/2024/trailer-faces/</id>
		<content type="html">&lt;h4 id=&quot;a-dataset-of-187-thousand-high-resolution-face-images-from-movie-trailers-download-it-from-huggingface&quot; tabindex=&quot;-1&quot;&gt;A dataset of 187 thousand high resolution face images from movie trailers! Download it from &lt;a href=&quot;https://huggingface.co/datasets/justinpinkney/trailer-faces-hq&quot;&gt;huggingface&lt;/a&gt;. &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2024/trailer-faces/&quot;&gt;#&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2024/trailer-faces/tfhq-3panel.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/yz-3HQ29l--200.webp 200w, https://justinpinkney.com/img/yz-3HQ29l--320.webp 320w, https://justinpinkney.com/img/yz-3HQ29l--500.webp 500w, https://justinpinkney.com/img/yz-3HQ29l--800.webp 800w, https://justinpinkney.com/img/yz-3HQ29l--1024.webp 1024w, https://justinpinkney.com/img/yz-3HQ29l--1600.webp 1600w, https://justinpinkney.com/img/yz-3HQ29l--3072.webp 3072w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/yz-3HQ29l--200.jpeg 200w, https://justinpinkney.com/img/yz-3HQ29l--320.jpeg 320w, https://justinpinkney.com/img/yz-3HQ29l--500.jpeg 500w, https://justinpinkney.com/img/yz-3HQ29l--800.jpeg 800w, https://justinpinkney.com/img/yz-3HQ29l--1024.jpeg 1024w, https://justinpinkney.com/img/yz-3HQ29l--1600.jpeg 1600w, https://justinpinkney.com/img/yz-3HQ29l--3072.jpeg 3072w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;examples from the tfhq dataset&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/yz-3HQ29l--200.jpeg&quot; width=&quot;3072&quot; height=&quot;1024&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Before the advent of giant web-scale image datasets, &lt;a href=&quot;https://github.com/NVlabs/ffhq-dataset&quot;&gt;FFHQ&lt;/a&gt; was considered a big dataset of face images, and many, many GANs were trained on it. One of the many issues with FFHQ (side note: don’t get me wrong, there are lots of others, but this is the one I was focused on at the time) is the lack of diversity in facial expressions: being scraped from Flickr, it tends to be fairly bland smiles all round. At the time I was working on a project about extracting latent directions corresponding to different emotions, and the lack of emotion in FFHQ is very noticeable, both in the original dataset and in the ability of a StyleGAN trained on it to represent non-smiling/neutral expressions.&lt;/p&gt;
&lt;h2 id=&quot;finding-emotions&quot; tabindex=&quot;-1&quot;&gt;Finding emotions &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2024/trailer-faces/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;After this I was on the hunt for a large, diverse, high-resolution dataset of faces with varied emotions; unfortunately none of the existing ones seemed to fit the bill.&lt;/p&gt;
&lt;p&gt;I guess that meant I had to collect my own. Thinking of places with high-res close-ups of faces in a great variety of emotional states, movies seemed an obvious possibility.&lt;/p&gt;
&lt;p&gt;At the time the Apple trailers website was very easy to scrape: it was literally just a list of links to .mov files, at very high bit rate and resolution, generally showing much less compression than the same trailer on YouTube. High bit rate matters because it’s not always obvious just how bad a still from even a 4K trailer looks with typical web levels of compression. Using trailers also meant a good variety of faces, compared to TV shows, for example, which have the same few faces over and over again.&lt;/p&gt;
&lt;h2 id=&quot;tfhq-dataset&quot; tabindex=&quot;-1&quot;&gt;TFHQ dataset &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2024/trailer-faces/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;In the end I collected around 186 thousand high-resolution face images for the new dataset; if you’re interested, some of the details are listed below:&lt;span class=&quot;sidenote&quot;&gt;
&lt;label class=&quot;sidenote-label&quot; for=&quot;side-note-1&quot;&gt;&lt;sup&gt;[1]&lt;/sup&gt;&lt;/label&gt;
&lt;input class=&quot;sidenote-checkbox&quot; type=&quot;checkbox&quot; id=&quot;side-note-1&quot;&gt;
&lt;span class=&quot;sidenote-content&quot;&gt;1. These bits were expanded from a paper that never went anywhere.&lt;/span&gt;
&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;I downloaded all movie trailers and featurettes listed on the Apple Movie Trailers website as of August 2022. That resulted in 15,379 trailers at Full HD (1080p) resolution, amounting to approximately 2 TB/507 hours of video.&lt;/li&gt;
&lt;li&gt;To detect faces I used the pre-trained YOLOv5-face large model [32], rejecting any detection with a bounding box less than 256 px in height or a confidence less than 0.5.&lt;/li&gt;
&lt;li&gt;One clear challenge was deduplication: film sequences obviously contain many similar frames, and these are often motion blurred due to the low frame rate of movies. That meant I would get detections of the same face over the course of a shot, and I needed a way to keep only the sharpest frame and discard the rest. First I computed an image similarity metric using a pre-trained CLIP ViT-B/32 [33] to find sequences of detections which were very similar. Then I measured the variance of the Laplacian over the images as an approximate relative sharpness metric (a minimal sketch of this check appears after the list). Then, for each set of similar frames, I discarded all but the sharpest image. The similarity threshold was tuned by hand, treating frames with no significant motion or significant change in expression as similar; e.g. consecutive frames of a person talking without much head motion should be considered similar.&lt;/li&gt;
&lt;li&gt;Finally I did the usual FFHQ alignment on the face crops [21]. A lot of frames needed padding, because the aspect ratio of movies tends to crop the top of the head; for this I used the usual reflection and blur padding from FFHQ.&lt;/li&gt;
&lt;li&gt;Many of the final images had undesirable properties such as occlusion, non-photographic faces, or overlaid text. To keep only the best quality images and remove any false detections from the face detector, I performed further quality filtering by training a classifier on several hundred subjectively determined “good”/“bad” example images, and used this to exclude predicted “bad” images, leaving a total of 186,553 images in the dataset.&lt;/li&gt;
&lt;/ul&gt;
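&lt;p&gt;For reference, here’s a minimal sketch of that sharpness check (assuming OpenCV, and a hypothetical list &lt;code&gt;similar_frames&lt;/code&gt; holding the file paths for one CLIP-similar group):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import cv2

def sharpness(path):
    # variance of the Laplacian: higher means sharper; used to pick
    # the single best frame out of each group of near-duplicate detections
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    return cv2.Laplacian(gray, cv2.CV_64F).var()

best = max(similar_frames, key=sharpness)
&lt;/code&gt;&lt;/pre&gt;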
&lt;h2 id=&quot;face-identification&quot; tabindex=&quot;-1&quot;&gt;Face identification &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2024/trailer-faces/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;In order to train my &lt;a href=&quot;https://justinpinkney.com/blog/2024/face-mixer-diffusion&quot;&gt;face identity conditioned diffusion model&lt;/a&gt; I also wanted to extract face identity information for the dataset and associate different images of the same person with each other, to build a retrieval-based training dataset. I extracted a face embedding vector for every image using a pretrained model from &lt;a href=&quot;https://github.com/TreB1eN/InsightFace_Pytorch&quot;&gt;InsightFace_Pytorch&lt;/a&gt; (specifically the IR-SE50 model), and then created a .json file which, for every file, lists all the other image files with a face-embedding cosine similarity greater than 0.5 (side note: I also excluded any matches with a CLIP similarity greater than 0.9, to avoid near-duplicate images). This gives a dataset where, for every image, you can easily access the other images of the same identity in different poses or settings.&lt;/p&gt;
&lt;p&gt;This produces a .json file with entries which look like:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;{
  &amp;quot;00000000.jpg&amp;quot;: [],
  &amp;quot;00000001.jpg&amp;quot;: [&amp;quot;00024932.jpg&amp;quot;, &amp;quot;00036845.jpg&amp;quot;, ...],
  ... etc
}
&lt;/code&gt;&lt;/pre&gt;
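&lt;p&gt;Building that file is simple enough; here’s a minimal sketch (assuming &lt;code&gt;embeds&lt;/code&gt; is an L2-normalised (N, D) array of face embeddings and &lt;code&gt;names&lt;/code&gt; the corresponding filenames; the CLIP near-duplicate filter is omitted):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import json
import numpy as np

sim = embeds @ embeds.T  # cosine similarity, since rows are unit norm

matches = {}
for i, name in enumerate(names):
    hits = np.where(sim[i] &gt; 0.5)[0]  # the identity threshold used above
    matches[name] = [names[j] for j in hits if j != i]

with open(&#39;matches.json&#39;, &#39;w&#39;) as f:
    json.dump(matches, f)
&lt;/code&gt;&lt;/pre&gt;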
&lt;p&gt;Empty lists indicate that a given image has no face ID matches. For those with matches, the face identifier gives pretty satisfactory results:&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2024/trailer-faces/ret-example.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/UMovulZKb9-200.webp 200w, https://justinpinkney.com/img/UMovulZKb9-320.webp 320w, https://justinpinkney.com/img/UMovulZKb9-500.webp 500w, https://justinpinkney.com/img/UMovulZKb9-800.webp 800w, https://justinpinkney.com/img/UMovulZKb9-1024.webp 1024w, https://justinpinkney.com/img/UMovulZKb9-1600.webp 1600w, https://justinpinkney.com/img/UMovulZKb9-2731.webp 2731w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/UMovulZKb9-200.jpeg 200w, https://justinpinkney.com/img/UMovulZKb9-320.jpeg 320w, https://justinpinkney.com/img/UMovulZKb9-500.jpeg 500w, https://justinpinkney.com/img/UMovulZKb9-800.jpeg 800w, https://justinpinkney.com/img/UMovulZKb9-1024.jpeg 1024w, https://justinpinkney.com/img/UMovulZKb9-1600.jpeg 1600w, https://justinpinkney.com/img/UMovulZKb9-2731.jpeg 2731w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;Example of a query image and the matching faces retreived&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/UMovulZKb9-200.jpeg&quot; width=&quot;2731&quot; height=&quot;1024&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&quot;usage&quot; tabindex=&quot;-1&quot;&gt;Usage &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2024/trailer-faces/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The dataset is released under the &lt;a href=&quot;https://creativecommons.org/licenses/by-sa/4.0/&quot;&gt;Creative Commons BY-SA&lt;/a&gt;&lt;img src=&quot;https://i.creativecommons.org/l/by-sa/4.0/80x15.png&quot; alt=&quot;https://creativecommons.org/licenses/by-sa/4.0/&quot;&gt; license. If you use it for something cool please &lt;a href=&quot;https://x.com/Buntworthy&quot;&gt;let me know&lt;/a&gt;! And if you want to cite the dataset, here is a bibtex entry:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@misc{pinkney2023tfhq,
      author = {Pinkney, Justin N. M.},
      title = {Trailer Faces HQ dataset},
      year={2023},
      howpublished= {&#92;url{https://www.justinpinkney.com/blog/2024/trailer-faces}}
}
&lt;/code&gt;&lt;/pre&gt;
</content>
	</entry>
	
	<entry>
		<title>Experiments with Swapping Autoencoder</title>
		<link href="https://justinpinkney.com/blog/2024/swapping-autoencoder/"/>
		<updated>2024-05-05T00:00:00Z</updated>
		<id>https://justinpinkney.com/blog/2024/swapping-autoencoder/</id>
		<content type="html">&lt;p&gt;Back in the GAN days most of my time was spent playing with famous models like &lt;a href=&quot;https://justinpinkney.com/tags/stylegan/&quot;&gt;StyleGAN&lt;/a&gt; and &lt;a href=&quot;https://justinpinkney.com/tags/pix2pix/&quot;&gt;pix2pix&lt;/a&gt;. But another of my favourite models, which was much less well known, was &lt;a href=&quot;https://taesung.me/SwappingAutoencoder/&quot;&gt;Swapping Autoencoder&lt;/a&gt;, a lovely model from Adobe research for which all the code and checkpoints were released&lt;span class=&quot;sidenote&quot;&gt;
&lt;label class=&quot;sidenote-label&quot; for=&quot;side-note-1&quot;&gt;&lt;sup&gt;[1]&lt;/sup&gt;&lt;/label&gt;
&lt;input class=&quot;sidenote-checkbox&quot; type=&quot;checkbox&quot; id=&quot;side-note-1&quot;&gt;
&lt;span class=&quot;sidenote-content&quot;&gt;1. It was also the method (if not the exact model) used for the landscape mixer feature in Photoshop&lt;/span&gt;
&lt;/span&gt;. This is a little post for me to give the model a little of the love it deserves and to record some of the things I made with it.&lt;/p&gt;
&lt;p&gt;The basic principle is to generate images while separating control of the large-scale structure from the small-scale texture/style details, and to learn this ability without any access to specially created data.&lt;/p&gt;
&lt;p&gt;I&#39;m not going to describe how it&#39;s trained (the &lt;a href=&quot;https://arxiv.org/abs/2007.00653&quot;&gt;paper&lt;/a&gt; is fairly accessible if you&#39;re interested), but in the end the model encodes an image into two parts: a style code and a structure code. Style is defined by the idea that multiple small crops should plausibly look like they came from the same image. You can then generate new images by providing a style and a structure code, and the cool bit is that you can arbitrarily swap these between different images to get new, unexpected things!&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2024/swapping-autoencoder/swapping-ae-schematic.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/jlhAbzJUGR-200.webp 200w, https://justinpinkney.com/img/jlhAbzJUGR-320.webp 320w, https://justinpinkney.com/img/jlhAbzJUGR-500.webp 500w, https://justinpinkney.com/img/jlhAbzJUGR-800.webp 800w, https://justinpinkney.com/img/jlhAbzJUGR-1024.webp 1024w, https://justinpinkney.com/img/jlhAbzJUGR-1416.webp 1416w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/jlhAbzJUGR-200.jpeg 200w, https://justinpinkney.com/img/jlhAbzJUGR-320.jpeg 320w, https://justinpinkney.com/img/jlhAbzJUGR-500.jpeg 500w, https://justinpinkney.com/img/jlhAbzJUGR-800.jpeg 800w, https://justinpinkney.com/img/jlhAbzJUGR-1024.jpeg 1024w, https://justinpinkney.com/img/jlhAbzJUGR-1416.jpeg 1416w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;Schematic of the Swapping Autoencoder architecture&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/jlhAbzJUGR-200.jpeg&quot; width=&quot;1416&quot; height=&quot;951&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;
Schematic of the training scheme of Swapping Autoencoder from: &lt;a href=&quot;https://arxiv.org/abs/2007.00653&quot;&gt;&quot;Swapping Autoencoder for Deep Image Manipulation&quot;&lt;/a&gt;
&lt;/div&gt;
&lt;p&gt;The intended way to use the model was to take the content representation from one image and the style from another to do a sort of &amp;quot;style transfer&amp;quot;, although each model was limited to a single domain like landscapes, churches, etc., so it was more like a texture/appearance transfer. But there were a lot more interesting things you could do with these broken-down latent codes. I spent a lot of time experimenting and made a tremendous number of images and videos I found fascinating; here are a bunch of them:&lt;/p&gt;
&lt;h2 id=&quot;spatially-blending-landscapes&quot; tabindex=&quot;-1&quot;&gt;Spatially blending landscapes &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2024/swapping-autoencoder/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;First off, it&#39;s fun to take two images and encode them into style and content, squash the two content feature blocks together side by side (with maybe some interpolation where they meet), pick one of the styles (or combine them), then generate a new image. You get pleasantly blended, impossible landscape images this way.&lt;/p&gt;
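&lt;p&gt;In code the idea is roughly this (a minimal sketch: the &lt;code&gt;encode&lt;/code&gt;/&lt;code&gt;decode&lt;/code&gt; calls and tensor shapes are assumptions for illustration, not the exact released API):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import torch

# structure codes are spatial feature maps (1, C, H, W); styles are vectors (1, D)
s1, g1 = encode(img1)  # hypothetical encoder call
s2, g2 = encode(img2)

# concatenate the two structure maps side by side,
# cross-fading a few columns where they meet
overlap = 8
alpha = torch.linspace(0, 1, overlap).view(1, 1, 1, overlap)
seam = s1[..., -overlap:] * (1 - alpha) + s2[..., :overlap] * alpha
structure = torch.cat([s1[..., :-overlap], seam, s2[..., overlap:]], dim=-1)

out = decode(structure, 0.5 * (g1 + g2))  # average the two styles
&lt;/code&gt;&lt;/pre&gt;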
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2024/swapping-autoencoder/landscape_blend_1.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/PT_ag0vnOh-200.webp 200w, https://justinpinkney.com/img/PT_ag0vnOh-320.webp 320w, https://justinpinkney.com/img/PT_ag0vnOh-500.webp 500w, https://justinpinkney.com/img/PT_ag0vnOh-800.webp 800w, https://justinpinkney.com/img/PT_ag0vnOh-1024.webp 1024w, https://justinpinkney.com/img/PT_ag0vnOh-1600.webp 1600w, https://justinpinkney.com/img/PT_ag0vnOh-3776.webp 3776w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/PT_ag0vnOh-200.jpeg 200w, https://justinpinkney.com/img/PT_ag0vnOh-320.jpeg 320w, https://justinpinkney.com/img/PT_ag0vnOh-500.jpeg 500w, https://justinpinkney.com/img/PT_ag0vnOh-800.jpeg 800w, https://justinpinkney.com/img/PT_ag0vnOh-1024.jpeg 1024w, https://justinpinkney.com/img/PT_ag0vnOh-1600.jpeg 1600w, https://justinpinkney.com/img/PT_ag0vnOh-3776.jpeg 3776w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;Blended landscape made with swapping autoencoder&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/PT_ag0vnOh-200.jpeg&quot; width=&quot;3776&quot; height=&quot;2048&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2024/swapping-autoencoder/landscape_blend_2.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/n7Gxf1HM2d-200.webp 200w, https://justinpinkney.com/img/n7Gxf1HM2d-320.webp 320w, https://justinpinkney.com/img/n7Gxf1HM2d-500.webp 500w, https://justinpinkney.com/img/n7Gxf1HM2d-800.webp 800w, https://justinpinkney.com/img/n7Gxf1HM2d-1024.webp 1024w, https://justinpinkney.com/img/n7Gxf1HM2d-1600.webp 1600w, https://justinpinkney.com/img/n7Gxf1HM2d-2560.webp 2560w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/n7Gxf1HM2d-200.jpeg 200w, https://justinpinkney.com/img/n7Gxf1HM2d-320.jpeg 320w, https://justinpinkney.com/img/n7Gxf1HM2d-500.jpeg 500w, https://justinpinkney.com/img/n7Gxf1HM2d-800.jpeg 800w, https://justinpinkney.com/img/n7Gxf1HM2d-1024.jpeg 1024w, https://justinpinkney.com/img/n7Gxf1HM2d-1600.jpeg 1600w, https://justinpinkney.com/img/n7Gxf1HM2d-2560.jpeg 2560w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;Blended landscape made with swapping autoencoder&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/n7Gxf1HM2d-200.jpeg&quot; width=&quot;2560&quot; height=&quot;2048&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&quot;pca-latent-editing&quot; tabindex=&quot;-1&quot;&gt;PCA latent editing &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2024/swapping-autoencoder/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Another classic technique from the StyleGAN era is to find latent directions in whatever latent space you are working in. GANSpace introduced the idea of encoding a bunch of images and then doing PCA to find promising directions. The directions I found in the Swapping Autoencoder latent space are a bit odd, but with more effort you could probably find more semantically meaningful ones. (But who needs semantic meaning when you can have weird ones?)&lt;/p&gt;
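&lt;p&gt;The GANSpace-style recipe is short; a sketch (where &lt;code&gt;latents&lt;/code&gt; is an assumed (N, D) array of style codes collected by encoding a pile of images, and &lt;code&gt;style&lt;/code&gt; is one code to edit):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import numpy as np
from sklearn.decomposition import PCA

pca = PCA(n_components=16)
pca.fit(latents)

direction = pca.components_[0]            # a promising latent direction
sigma = np.sqrt(pca.explained_variance_[0])

edited = style + 3.0 * sigma * direction  # shift a style code along it
# then decode the edited code with the generator to see what it does
&lt;/code&gt;&lt;/pre&gt;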
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2024/swapping-autoencoder/landscape_blend_5.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/UHuNrBNrd0-200.webp 200w, https://justinpinkney.com/img/UHuNrBNrd0-320.webp 320w, https://justinpinkney.com/img/UHuNrBNrd0-500.webp 500w, https://justinpinkney.com/img/UHuNrBNrd0-800.webp 800w, https://justinpinkney.com/img/UHuNrBNrd0-944.webp 944w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/UHuNrBNrd0-200.jpeg 200w, https://justinpinkney.com/img/UHuNrBNrd0-320.jpeg 320w, https://justinpinkney.com/img/UHuNrBNrd0-500.jpeg 500w, https://justinpinkney.com/img/UHuNrBNrd0-800.jpeg 800w, https://justinpinkney.com/img/UHuNrBNrd0-944.jpeg 944w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;Blended landscape with latent editing&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/UHuNrBNrd0-200.jpeg&quot; width=&quot;944&quot; height=&quot;512&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;
I can&#39;t actually remember if this is a latent direction in the style or content space, but either way it&#39;s probably roughly the &quot;more clouds&quot; direction.
&lt;/div&gt;
&lt;h2 id=&quot;clip-guidance&quot; tabindex=&quot;-1&quot;&gt;CLIP guidance &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2024/swapping-autoencoder/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Around this time was the CLIP guidance craze. I thought I&#39;d have a go at applying the typical CLIP loss to optimise the Swapping Autoencoder style latent based on a text prompt. As you can imagine, this works pretty well for image editing.&lt;/p&gt;
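&lt;p&gt;A minimal sketch of the optimisation loop (assuming OpenAI&#39;s &lt;code&gt;clip&lt;/code&gt; package and the same hypothetical &lt;code&gt;decode&lt;/code&gt; call as above; CLIP&#39;s pixel normalisation is omitted for brevity):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import clip
import torch
import torch.nn.functional as F

device = &#39;cuda&#39;
model_clip, _ = clip.load(&#39;ViT-B/32&#39;, device=device)
text = model_clip.encode_text(clip.tokenize([&#39;lush green trees&#39;]).to(device))

style = style0.clone().requires_grad_(True)  # style code from the encoder
opt = torch.optim.Adam([style], lr=0.02)

for _ in range(200):
    img = decode(structure, style)                 # generate an image
    emb = model_clip.encode_image(F.interpolate(img, size=224))
    loss = -F.cosine_similarity(emb, text).mean()  # match the prompt
    opt.zero_grad()
    loss.backward()
    opt.step()
&lt;/code&gt;&lt;/pre&gt;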
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2024/swapping-autoencoder/sa_clip_guidance_1.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/95d8tIMHPn-200.webp 200w, https://justinpinkney.com/img/95d8tIMHPn-320.webp 320w, https://justinpinkney.com/img/95d8tIMHPn-500.webp 500w, https://justinpinkney.com/img/95d8tIMHPn-800.webp 800w, https://justinpinkney.com/img/95d8tIMHPn-1024.webp 1024w, https://justinpinkney.com/img/95d8tIMHPn-1600.webp 1600w, https://justinpinkney.com/img/95d8tIMHPn-1824.webp 1824w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/95d8tIMHPn-200.jpeg 200w, https://justinpinkney.com/img/95d8tIMHPn-320.jpeg 320w, https://justinpinkney.com/img/95d8tIMHPn-500.jpeg 500w, https://justinpinkney.com/img/95d8tIMHPn-800.jpeg 800w, https://justinpinkney.com/img/95d8tIMHPn-1024.jpeg 1024w, https://justinpinkney.com/img/95d8tIMHPn-1600.jpeg 1600w, https://justinpinkney.com/img/95d8tIMHPn-1824.jpeg 1824w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;CLIP guided editing of a swapping autoencoder style latent&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/95d8tIMHPn-200.jpeg&quot; width=&quot;1824&quot; height=&quot;512&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;
Original image on the left. Optimisation of the Swapping AE style vector with a text prompt of something like &quot;lush green trees&quot; and resulting image on the right.
&lt;/div&gt;
&lt;p&gt;I also tried optimising the content latent with CLIP. This turns out weird, but quite interesting: much less recognisable stuff, but a lot of interesting textures going on.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2024/swapping-autoencoder/sa_clip_guidance_2.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/duj8rubLkL-200.webp 200w, https://justinpinkney.com/img/duj8rubLkL-320.webp 320w, https://justinpinkney.com/img/duj8rubLkL-500.webp 500w, https://justinpinkney.com/img/duj8rubLkL-800.webp 800w, https://justinpinkney.com/img/duj8rubLkL-848.webp 848w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/duj8rubLkL-200.jpeg 200w, https://justinpinkney.com/img/duj8rubLkL-320.jpeg 320w, https://justinpinkney.com/img/duj8rubLkL-500.jpeg 500w, https://justinpinkney.com/img/duj8rubLkL-800.jpeg 800w, https://justinpinkney.com/img/duj8rubLkL-848.jpeg 848w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/duj8rubLkL-200.jpeg&quot; width=&quot;848&quot; height=&quot;448&quot;&gt;&lt;/picture&gt;&lt;/a&gt;
&lt;a href=&quot;https://justinpinkney.com/blog/2024/swapping-autoencoder/sa_clip_guidance.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/U3hVQJTrP3-200.webp 200w, https://justinpinkney.com/img/U3hVQJTrP3-320.webp 320w, https://justinpinkney.com/img/U3hVQJTrP3-500.webp 500w, https://justinpinkney.com/img/U3hVQJTrP3-672.webp 672w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/U3hVQJTrP3-200.jpeg 200w, https://justinpinkney.com/img/U3hVQJTrP3-320.jpeg 320w, https://justinpinkney.com/img/U3hVQJTrP3-500.jpeg 500w, https://justinpinkney.com/img/U3hVQJTrP3-672.jpeg 672w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/U3hVQJTrP3-200.jpeg&quot; width=&quot;672&quot; height=&quot;512&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&quot;unsynchronised-style-content-video-generation&quot; tabindex=&quot;-1&quot;&gt;Unsynchronised style/content video generation &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2024/swapping-autoencoder/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The autoencoding ability of Swapping AE is sufficiently good that you can auto-encode a video and it still looks fairly good (apart from the texture sticking we are all familiar with from non-StyleGAN3 GANs). I don&#39;t know why you would want to do that. But if you offset the style and content encodings in time, or average them, or otherwise mess with them relative to each other, you can create some interesting video transition effects:&lt;/p&gt;
&lt;p align=&quot;center&quot;&gt;
&lt;video controls=&quot;&quot; src=&quot;https://assets.justinpinkney.com/blog/swapping-autoencoder/landscape_swap_1.mp4&quot; loop=&quot;false&quot; preload=&quot;auto&quot;&gt;&lt;/video&gt;
&lt;/p&gt;
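&lt;p&gt;The offsetting itself is trivial; something like this (assuming per-frame &lt;code&gt;structures&lt;/code&gt; and &lt;code&gt;styles&lt;/code&gt; lists precomputed from the source video):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;lag = 30  # the style trails the structure by 30 frames
frames = [decode(structures[t], styles[max(t - lag, 0)])
          for t in range(len(structures))]
&lt;/code&gt;&lt;/pre&gt;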
&lt;h2 id=&quot;recursive-content-transformation-videos&quot; tabindex=&quot;-1&quot;&gt;Recursive content transformation videos &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2024/swapping-autoencoder/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p align=&quot;center&quot;&gt;
&lt;video controls=&quot;&quot; src=&quot;https://assets.justinpinkney.com/blog/swapping-autoencoder/recursion_1.mp4&quot; loop=&quot;false&quot; preload=&quot;auto&quot;&gt;&lt;/video&gt;
&lt;/p&gt;
&lt;p&gt;Another fun effect I stumbled across was making videos by recursively encoding and then generating an image over and over, then stitching all the generated images together to make a video, like the one above.&lt;/p&gt;
&lt;p&gt;It gets more interesting if you apply a transformation to the content encoding between each frame. Something like a simple zoom in/out or a rotation gives all sorts of weird and wonderful patterns emerging &lt;span class=&quot;sidenote&quot;&gt;
&lt;label class=&quot;sidenote-label&quot; for=&quot;side-note-2&quot;&gt;&lt;sup&gt;[2]&lt;/sup&gt;&lt;/label&gt;
&lt;input class=&quot;sidenote-checkbox&quot; type=&quot;checkbox&quot; id=&quot;side-note-2&quot;&gt;
&lt;span class=&quot;sidenote-content&quot;&gt;2. I found that applying a zoom-out transform was more effective than zoom-in for preserving details and not simply degenerating into noise. Then I reversed the frames to recover the zooming-in effect. Watch to the end of the second video to see what the starting image was.&lt;/span&gt;
&lt;/span&gt;.&lt;/p&gt;
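&lt;p&gt;A sketch of the recursion (again with the hypothetical &lt;code&gt;encode&lt;/code&gt;/&lt;code&gt;decode&lt;/code&gt; calls; the zoom here is just a resize-and-pad on the structure map):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import torch.nn.functional as F

def zoom_out(feat, factor=0.95):
    # shrink the structure map, then reflect-pad back to the original size
    h, w = feat.shape[-2:]
    small = F.interpolate(feat, scale_factor=factor, mode=&#39;bilinear&#39;)
    ph, pw = h - small.shape[-2], w - small.shape[-1]
    return F.pad(small, (pw // 2, pw - pw // 2, ph // 2, ph - ph // 2), mode=&#39;reflect&#39;)

frames = []
structure, style = encode(img)
for _ in range(300):
    img = decode(zoom_out(structure), style)
    structure, style = encode(img)
    frames.append(img)
frames.reverse()  # zoom out during generation, play back reversed as a zoom in
&lt;/code&gt;&lt;/pre&gt;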
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2024/swapping-autoencoder/video_transform_schematic.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/1Mwd_Av6CT-200.webp 200w, https://justinpinkney.com/img/1Mwd_Av6CT-320.webp 320w, https://justinpinkney.com/img/1Mwd_Av6CT-500.webp 500w, https://justinpinkney.com/img/1Mwd_Av6CT-800.webp 800w, https://justinpinkney.com/img/1Mwd_Av6CT-1024.webp 1024w, https://justinpinkney.com/img/1Mwd_Av6CT-1600.webp 1600w, https://justinpinkney.com/img/1Mwd_Av6CT-2280.webp 2280w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/1Mwd_Av6CT-200.jpeg 200w, https://justinpinkney.com/img/1Mwd_Av6CT-320.jpeg 320w, https://justinpinkney.com/img/1Mwd_Av6CT-500.jpeg 500w, https://justinpinkney.com/img/1Mwd_Av6CT-800.jpeg 800w, https://justinpinkney.com/img/1Mwd_Av6CT-1024.jpeg 1024w, https://justinpinkney.com/img/1Mwd_Av6CT-1600.jpeg 1600w, https://justinpinkney.com/img/1Mwd_Av6CT-2280.jpeg 2280w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/1Mwd_Av6CT-200.jpeg&quot; width=&quot;2280&quot; height=&quot;846&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;
A little schematic of how each video frame is generated with transforms applied to the content encoding&lt;span class=&quot;sidenote&quot;&gt;
		&lt;label class=&quot;sidenote-label&quot; for=&quot;side-note-3&quot;&gt;&lt;sup&gt;[3]&lt;/sup&gt;&lt;/label&gt;
		&lt;input class=&quot;sidenote-checkbox&quot; type=&quot;checkbox&quot; id=&quot;side-note-3&quot;&gt;
		&lt;span class=&quot;sidenote-content&quot;&gt;3. After drawing this schematic in Procreate I realised I had made it way too low resolution, but &lt;a href=&quot;https://github.com/philz1337x/clarity-upscaler&quot;&gt;Clarity&lt;/a&gt; turns out to be a decent upscaler, especially if you use &lt;a href=&quot;https://justinpinkney.com/blog/2024/swapping-autoencoder/diagram-flowers.jpg&quot;&gt;the &#39;wrong&#39; settings&lt;/a&gt;.
&lt;br&gt; &lt;br&gt;
&lt;a href=&quot;https://justinpinkney.com/blog/2024/swapping-autoencoder/diagram-flowers.jpg&quot;&gt;&lt;img src=&quot;https://justinpinkney.com/blog/2024/swapping-autoencoder/diagram-flowers.jpg&quot;&gt;&lt;/a&gt;&lt;/span&gt;
	&lt;/span&gt;.
&lt;/div&gt;
&lt;p align=&quot;center&quot;&gt;
&lt;video controls=&quot;&quot; src=&quot;https://assets.justinpinkney.com/blog/swapping-autoencoder/swapping-video-02.mp4&quot; loop=&quot;false&quot; preload=&quot;auto&quot;&gt;&lt;/video&gt;
&lt;/p&gt;
&lt;p align=&quot;center&quot;&gt;
&lt;video controls=&quot;&quot; src=&quot;https://assets.justinpinkney.com/blog/swapping-autoencoder/swapping-video-08.mp4&quot; loop=&quot;false&quot; preload=&quot;auto&quot;&gt;&lt;/video&gt;
&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2024/swapping-autoencoder/churches.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/4Lj7EYJMbh-200.webp 200w, https://justinpinkney.com/img/4Lj7EYJMbh-320.webp 320w, https://justinpinkney.com/img/4Lj7EYJMbh-500.webp 500w, https://justinpinkney.com/img/4Lj7EYJMbh-800.webp 800w, https://justinpinkney.com/img/4Lj7EYJMbh-944.webp 944w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/4Lj7EYJMbh-200.jpeg 200w, https://justinpinkney.com/img/4Lj7EYJMbh-320.jpeg 320w, https://justinpinkney.com/img/4Lj7EYJMbh-500.jpeg 500w, https://justinpinkney.com/img/4Lj7EYJMbh-800.jpeg 800w, https://justinpinkney.com/img/4Lj7EYJMbh-944.jpeg 944w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/4Lj7EYJMbh-200.jpeg&quot; width=&quot;944&quot; height=&quot;512&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&quot;train-rides&quot; tabindex=&quot;-1&quot;&gt;Train rides &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2024/swapping-autoencoder/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;I also used Swapping Autoencoder for one of my earlier attempts to make &lt;a href=&quot;https://justinpinkney.com/blog/2023/latent-train/#swapping-autoencoder&quot;&gt;synthetic train rides&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://youtu.be/ITrWUeHqwu4?si=ggp1KoFoimbCmlvt&quot;&gt;https://youtu.be/ITrWUeHqwu4?si=ggp1KoFoimbCmlvt&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&quot;the-end&quot; tabindex=&quot;-1&quot;&gt;The end &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2024/swapping-autoencoder/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Well, that&#39;s about it: an interesting model with plenty of scope to play with. As with most GANs it&#39;s domain limited, not general purpose. Maybe in the brave new world of diffusion we could train something that can do anything. But it&#39;s not obvious how to train something like this with diffusion: the style training learns to match a distribution indirectly, with the loss defined by the discriminator enforcing that crops should match the data distribution of other crops. With the direct, incremental loss of diffusion, how to do this seems less obvious. Though there are other diffusion-based methods which try to separate style and content. (I&#39;ll add a list sometime...)&lt;/p&gt;
</content>
	</entry>
	
	<entry>
		<title>Image Mixer Diffusion</title>
		<link href="https://justinpinkney.com/blog/2024/image-mixer-diffusion/"/>
		<updated>2024-04-19T00:00:00Z</updated>
		<id>https://justinpinkney.com/blog/2024/image-mixer-diffusion/</id>
		<content type="html">&lt;p&gt;&lt;strong&gt;Just want to play with the model? Try it on &lt;a href=&quot;https://cloud.lambdalabs.com/demos/lambda/image-mixer-demo&quot;&gt;Lambda labs&lt;/a&gt; or &lt;a href=&quot;https://huggingface.co/spaces/lambdalabs/image-mixer-demo&quot;&gt;Huggingface&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2024/image-mixer-diffusion/im_mixer_banner.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/3G6bXDiZGX-200.webp 200w, https://justinpinkney.com/img/3G6bXDiZGX-320.webp 320w, https://justinpinkney.com/img/3G6bXDiZGX-500.webp 500w, https://justinpinkney.com/img/3G6bXDiZGX-800.webp 800w, https://justinpinkney.com/img/3G6bXDiZGX-1024.webp 1024w, https://justinpinkney.com/img/3G6bXDiZGX-1600.webp 1600w, https://justinpinkney.com/img/3G6bXDiZGX-4212.webp 4212w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/3G6bXDiZGX-200.jpeg 200w, https://justinpinkney.com/img/3G6bXDiZGX-320.jpeg 320w, https://justinpinkney.com/img/3G6bXDiZGX-500.jpeg 500w, https://justinpinkney.com/img/3G6bXDiZGX-800.jpeg 800w, https://justinpinkney.com/img/3G6bXDiZGX-1024.jpeg 1024w, https://justinpinkney.com/img/3G6bXDiZGX-1600.jpeg 1600w, https://justinpinkney.com/img/3G6bXDiZGX-4212.jpeg 4212w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/3G6bXDiZGX-200.jpeg&quot; width=&quot;4212&quot; height=&quot;1917&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;I&#39;m doing more catching up with write-ups of past models I trained; this time it&#39;s the turn of &amp;quot;Image Mixer&amp;quot;. At the time I did at least share &lt;a href=&quot;https://huggingface.co/lambdalabs/image-mixer&quot;&gt;the checkpoint&lt;/a&gt; and a &lt;a href=&quot;https://huggingface.co/spaces/lambdalabs/image-mixer-demo&quot;&gt;demo&lt;/a&gt;, but here&#39;s a more detailed write-up of the process of training the model itself.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Even before I worked there I was very impressed with Midjourney&#39;s blend feature, where you can combine two (or more) images together&lt;span class=&quot;sidenote&quot;&gt;
&lt;label class=&quot;sidenote-label&quot; for=&quot;side-note-1&quot;&gt;&lt;sup&gt;[1]&lt;/sup&gt;&lt;/label&gt;
&lt;input class=&quot;sidenote-checkbox&quot; type=&quot;checkbox&quot; id=&quot;side-note-1&quot;&gt;
&lt;span class=&quot;sidenote-content&quot;&gt;1. I&#39;d love to see someone make a version of &lt;a href=&quot;https://neal.fun/infinite-craft/&quot;&gt;Infinite Craft&lt;/a&gt; but blending images instead of words.&lt;/span&gt;
&lt;/span&gt;. This way of prompting always resonated with me as it gave me vibes of interpolating in StyleGAN&#39;s latent space.&lt;/p&gt;
&lt;p&gt;In this type of image blending everything gets mixed together in a big soup, roughly 50/50, so the style, content, and objects all end up mixed together between the two images &lt;span class=&quot;sidenote&quot;&gt;
&lt;label class=&quot;sidenote-label&quot; for=&quot;side-note-2&quot;&gt;&lt;sup&gt;[2]&lt;/sup&gt;&lt;/label&gt;
&lt;input class=&quot;sidenote-checkbox&quot; type=&quot;checkbox&quot; id=&quot;side-note-2&quot;&gt;
&lt;span class=&quot;sidenote-content&quot;&gt;2. Now we have more fine-grained ways of blending images in Midjourney, Style and Character references let you take specific aspects of source images and they are great!&lt;/span&gt;
&lt;/span&gt;. This gives very cool effects, but I wanted to explore a more &amp;quot;compositional&amp;quot; method for image blending, i.e. taking the elements from each image and placing them together in a new image. The resulting model was Image Mixer. I don&#39;t think it really succeeded at that, but it&#39;s what I was going for, and something fun came out anyway.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2024/image-mixer-diffusion/im_mixer-1.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/GTKfVAl64Z-200.webp 200w, https://justinpinkney.com/img/GTKfVAl64Z-320.webp 320w, https://justinpinkney.com/img/GTKfVAl64Z-500.webp 500w, https://justinpinkney.com/img/GTKfVAl64Z-800.webp 800w, https://justinpinkney.com/img/GTKfVAl64Z-1024.webp 1024w, https://justinpinkney.com/img/GTKfVAl64Z-1600.webp 1600w, https://justinpinkney.com/img/GTKfVAl64Z-2725.webp 2725w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/GTKfVAl64Z-200.jpeg 200w, https://justinpinkney.com/img/GTKfVAl64Z-320.jpeg 320w, https://justinpinkney.com/img/GTKfVAl64Z-500.jpeg 500w, https://justinpinkney.com/img/GTKfVAl64Z-800.jpeg 800w, https://justinpinkney.com/img/GTKfVAl64Z-1024.jpeg 1024w, https://justinpinkney.com/img/GTKfVAl64Z-1600.jpeg 1600w, https://justinpinkney.com/img/GTKfVAl64Z-2725.jpeg 2725w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/GTKfVAl64Z-200.jpeg&quot; width=&quot;2725&quot; height=&quot;1067&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;To make the model I fine-tuned my &lt;a href=&quot;https://justinpinkney.com/blog/2023/stable-diffusion-image-variations/&quot;&gt;image variations model&lt;/a&gt;. Instead of using the CLIP embedding of the whole image to condition the diffusion model, I took several small crops of a single image and used all those image embeddings as conditions (concatenating them into a sequence of tokens). The hope was that the model would learn that it has to use the appearance and objects visible in those crops, but would figure out how to compose them together into a reasonable image.&lt;/p&gt;
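&lt;p&gt;Roughly, the conditioning looks like this (a sketch, assuming OpenAI&#39;s &lt;code&gt;clip&lt;/code&gt; package; the crop count, crop size, and encoder choice here are illustrative, not the exact training config):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import torch
import clip
from torchvision import transforms

model_clip, preprocess = clip.load(&#39;ViT-L/14&#39;)
cropper = transforms.RandomCrop(224)

def crop_conditions(pil_img, n_crops=4):
    # embed several random crops; the stacked embeddings replace the
    # text tokens fed to the UNet cross-attention layers
    embeds = [model_clip.encode_image(preprocess(cropper(pil_img)).unsqueeze(0))
              for _ in range(n_crops)]
    return torch.stack(embeds, dim=1)  # (1, n_crops, dim)
&lt;/code&gt;&lt;/pre&gt;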
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2024/image-mixer-diffusion/im_mixer-6.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/XFJdz5Sh5r-200.webp 200w, https://justinpinkney.com/img/XFJdz5Sh5r-320.webp 320w, https://justinpinkney.com/img/XFJdz5Sh5r-500.webp 500w, https://justinpinkney.com/img/XFJdz5Sh5r-800.webp 800w, https://justinpinkney.com/img/XFJdz5Sh5r-1024.webp 1024w, https://justinpinkney.com/img/XFJdz5Sh5r-1600.webp 1600w, https://justinpinkney.com/img/XFJdz5Sh5r-2730.webp 2730w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/XFJdz5Sh5r-200.jpeg 200w, https://justinpinkney.com/img/XFJdz5Sh5r-320.jpeg 320w, https://justinpinkney.com/img/XFJdz5Sh5r-500.jpeg 500w, https://justinpinkney.com/img/XFJdz5Sh5r-800.jpeg 800w, https://justinpinkney.com/img/XFJdz5Sh5r-1024.jpeg 1024w, https://justinpinkney.com/img/XFJdz5Sh5r-1600.jpeg 1600w, https://justinpinkney.com/img/XFJdz5Sh5r-2730.jpeg 2730w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/XFJdz5Sh5r-200.jpeg&quot; width=&quot;2730&quot; height=&quot;1070&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Sometimes it works a bit, but it doesn&#39;t quite get there in general. I increased the training resolution to 640x640 to give me space to extract crops which didn&#39;t contain too much of the original image. The model does manage to mash different concepts together, although the image quality can sometimes be a bit off, and it still tends to blend the concepts together. The other nice effect is that you can add text conditions, as the model is trained on the shared image-text latent space of CLIP &lt;span class=&quot;sidenote&quot;&gt;
&lt;label class=&quot;sidenote-label&quot; for=&quot;side-note-3&quot;&gt;&lt;sup&gt;[3]&lt;/sup&gt;&lt;/label&gt;
&lt;input class=&quot;sidenote-checkbox&quot; type=&quot;checkbox&quot; id=&quot;side-note-3&quot;&gt;
&lt;span class=&quot;sidenote-content&quot;&gt;3. Of course you can do most of these things using the base Image Variations model too, and mix images either by &lt;a href=&quot;https://justinpinkney.com/blog/2022/image-variation-experiments/&quot;&gt;averaging or concatenating embeddings&lt;/a&gt;&lt;/span&gt;
&lt;/span&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2024/image-mixer-diffusion/im_mixer-4.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/jZ3j9UEKOY-200.webp 200w, https://justinpinkney.com/img/jZ3j9UEKOY-320.webp 320w, https://justinpinkney.com/img/jZ3j9UEKOY-500.webp 500w, https://justinpinkney.com/img/jZ3j9UEKOY-800.webp 800w, https://justinpinkney.com/img/jZ3j9UEKOY-1024.webp 1024w, https://justinpinkney.com/img/jZ3j9UEKOY-1600.webp 1600w, https://justinpinkney.com/img/jZ3j9UEKOY-2742.webp 2742w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/jZ3j9UEKOY-200.jpeg 200w, https://justinpinkney.com/img/jZ3j9UEKOY-320.jpeg 320w, https://justinpinkney.com/img/jZ3j9UEKOY-500.jpeg 500w, https://justinpinkney.com/img/jZ3j9UEKOY-800.jpeg 800w, https://justinpinkney.com/img/jZ3j9UEKOY-1024.jpeg 1024w, https://justinpinkney.com/img/jZ3j9UEKOY-1600.jpeg 1600w, https://justinpinkney.com/img/jZ3j9UEKOY-2742.jpeg 2742w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/jZ3j9UEKOY-200.jpeg&quot; width=&quot;2742&quot; height=&quot;1083&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I also tried training a version which had one global clip embed of the full image and a set of smaller crops. I was hoping that would allow a more style transfer sort of approach with the global embed controlling the content and the local ones more able to affect the details of texture and style. It didn&#39;t work out that way unfortunately.&lt;/p&gt;
&lt;p&gt;Still, it&#39;s a fun model to play with, even if it&#39;s a bit outdated by modern image generation standards. Take it for a spin &lt;a href=&quot;https://cloud.lambdalabs.com/demos/lambda/image-mixer-demo&quot;&gt;here&lt;/a&gt; or &lt;a href=&quot;https://huggingface.co/spaces/lambdalabs/image-mixer-demo&quot;&gt;here&lt;/a&gt;, or download &lt;a href=&quot;https://huggingface.co/lambdalabs/image-mixer&quot;&gt;the checkpoint&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Some more examples:&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2024/image-mixer-diffusion/im_mixer-0.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/wRKk4YmlkF-200.webp 200w, https://justinpinkney.com/img/wRKk4YmlkF-320.webp 320w, https://justinpinkney.com/img/wRKk4YmlkF-500.webp 500w, https://justinpinkney.com/img/wRKk4YmlkF-800.webp 800w, https://justinpinkney.com/img/wRKk4YmlkF-1024.webp 1024w, https://justinpinkney.com/img/wRKk4YmlkF-1600.webp 1600w, https://justinpinkney.com/img/wRKk4YmlkF-2754.webp 2754w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/wRKk4YmlkF-200.jpeg 200w, https://justinpinkney.com/img/wRKk4YmlkF-320.jpeg 320w, https://justinpinkney.com/img/wRKk4YmlkF-500.jpeg 500w, https://justinpinkney.com/img/wRKk4YmlkF-800.jpeg 800w, https://justinpinkney.com/img/wRKk4YmlkF-1024.jpeg 1024w, https://justinpinkney.com/img/wRKk4YmlkF-1600.jpeg 1600w, https://justinpinkney.com/img/wRKk4YmlkF-2754.jpeg 2754w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/wRKk4YmlkF-200.jpeg&quot; width=&quot;2754&quot; height=&quot;1069&quot;&gt;&lt;/picture&gt;&lt;/a&gt;
&lt;a href=&quot;https://justinpinkney.com/blog/2024/image-mixer-diffusion/im_mixer-2.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/u2MxPFm26L-200.webp 200w, https://justinpinkney.com/img/u2MxPFm26L-320.webp 320w, https://justinpinkney.com/img/u2MxPFm26L-500.webp 500w, https://justinpinkney.com/img/u2MxPFm26L-800.webp 800w, https://justinpinkney.com/img/u2MxPFm26L-1024.webp 1024w, https://justinpinkney.com/img/u2MxPFm26L-1600.webp 1600w, https://justinpinkney.com/img/u2MxPFm26L-2730.webp 2730w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/u2MxPFm26L-200.jpeg 200w, https://justinpinkney.com/img/u2MxPFm26L-320.jpeg 320w, https://justinpinkney.com/img/u2MxPFm26L-500.jpeg 500w, https://justinpinkney.com/img/u2MxPFm26L-800.jpeg 800w, https://justinpinkney.com/img/u2MxPFm26L-1024.jpeg 1024w, https://justinpinkney.com/img/u2MxPFm26L-1600.jpeg 1600w, https://justinpinkney.com/img/u2MxPFm26L-2730.jpeg 2730w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/u2MxPFm26L-200.jpeg&quot; width=&quot;2730&quot; height=&quot;1083&quot;&gt;&lt;/picture&gt;&lt;/a&gt;
&lt;a href=&quot;https://justinpinkney.com/blog/2024/image-mixer-diffusion/im_mixer-3.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/lbMz8jjInT-200.webp 200w, https://justinpinkney.com/img/lbMz8jjInT-320.webp 320w, https://justinpinkney.com/img/lbMz8jjInT-500.webp 500w, https://justinpinkney.com/img/lbMz8jjInT-800.webp 800w, https://justinpinkney.com/img/lbMz8jjInT-1024.webp 1024w, https://justinpinkney.com/img/lbMz8jjInT-1600.webp 1600w, https://justinpinkney.com/img/lbMz8jjInT-2752.webp 2752w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/lbMz8jjInT-200.jpeg 200w, https://justinpinkney.com/img/lbMz8jjInT-320.jpeg 320w, https://justinpinkney.com/img/lbMz8jjInT-500.jpeg 500w, https://justinpinkney.com/img/lbMz8jjInT-800.jpeg 800w, https://justinpinkney.com/img/lbMz8jjInT-1024.jpeg 1024w, https://justinpinkney.com/img/lbMz8jjInT-1600.jpeg 1600w, https://justinpinkney.com/img/lbMz8jjInT-2752.jpeg 2752w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/lbMz8jjInT-200.jpeg&quot; width=&quot;2752&quot; height=&quot;1069&quot;&gt;&lt;/picture&gt;&lt;/a&gt;
&lt;a href=&quot;https://justinpinkney.com/blog/2024/image-mixer-diffusion/im_mixer-5.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/Fs4ayU2ayA-200.webp 200w, https://justinpinkney.com/img/Fs4ayU2ayA-320.webp 320w, https://justinpinkney.com/img/Fs4ayU2ayA-500.webp 500w, https://justinpinkney.com/img/Fs4ayU2ayA-800.webp 800w, https://justinpinkney.com/img/Fs4ayU2ayA-1024.webp 1024w, https://justinpinkney.com/img/Fs4ayU2ayA-1600.webp 1600w, https://justinpinkney.com/img/Fs4ayU2ayA-2741.webp 2741w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/Fs4ayU2ayA-200.jpeg 200w, https://justinpinkney.com/img/Fs4ayU2ayA-320.jpeg 320w, https://justinpinkney.com/img/Fs4ayU2ayA-500.jpeg 500w, https://justinpinkney.com/img/Fs4ayU2ayA-800.jpeg 800w, https://justinpinkney.com/img/Fs4ayU2ayA-1024.jpeg 1024w, https://justinpinkney.com/img/Fs4ayU2ayA-1600.jpeg 1600w, https://justinpinkney.com/img/Fs4ayU2ayA-2741.jpeg 2741w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/Fs4ayU2ayA-200.jpeg&quot; width=&quot;2741&quot; height=&quot;1139&quot;&gt;&lt;/picture&gt;&lt;/a&gt;
&lt;a href=&quot;https://justinpinkney.com/blog/2024/image-mixer-diffusion/im_mixer-7.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/_2WOeZ2d8S-200.webp 200w, https://justinpinkney.com/img/_2WOeZ2d8S-320.webp 320w, https://justinpinkney.com/img/_2WOeZ2d8S-500.webp 500w, https://justinpinkney.com/img/_2WOeZ2d8S-800.webp 800w, https://justinpinkney.com/img/_2WOeZ2d8S-1024.webp 1024w, https://justinpinkney.com/img/_2WOeZ2d8S-1600.webp 1600w, https://justinpinkney.com/img/_2WOeZ2d8S-2731.webp 2731w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/_2WOeZ2d8S-200.jpeg 200w, https://justinpinkney.com/img/_2WOeZ2d8S-320.jpeg 320w, https://justinpinkney.com/img/_2WOeZ2d8S-500.jpeg 500w, https://justinpinkney.com/img/_2WOeZ2d8S-800.jpeg 800w, https://justinpinkney.com/img/_2WOeZ2d8S-1024.jpeg 1024w, https://justinpinkney.com/img/_2WOeZ2d8S-1600.jpeg 1600w, https://justinpinkney.com/img/_2WOeZ2d8S-2731.jpeg 2731w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/_2WOeZ2d8S-200.jpeg&quot; width=&quot;2731&quot; height=&quot;1053&quot;&gt;&lt;/picture&gt;&lt;/a&gt;
&lt;a href=&quot;https://justinpinkney.com/blog/2024/image-mixer-diffusion/im_mixer-8.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/m1KnvihaTN-200.webp 200w, https://justinpinkney.com/img/m1KnvihaTN-320.webp 320w, https://justinpinkney.com/img/m1KnvihaTN-500.webp 500w, https://justinpinkney.com/img/m1KnvihaTN-800.webp 800w, https://justinpinkney.com/img/m1KnvihaTN-1024.webp 1024w, https://justinpinkney.com/img/m1KnvihaTN-1600.webp 1600w, https://justinpinkney.com/img/m1KnvihaTN-2762.webp 2762w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/m1KnvihaTN-200.jpeg 200w, https://justinpinkney.com/img/m1KnvihaTN-320.jpeg 320w, https://justinpinkney.com/img/m1KnvihaTN-500.jpeg 500w, https://justinpinkney.com/img/m1KnvihaTN-800.jpeg 800w, https://justinpinkney.com/img/m1KnvihaTN-1024.jpeg 1024w, https://justinpinkney.com/img/m1KnvihaTN-1600.jpeg 1600w, https://justinpinkney.com/img/m1KnvihaTN-2762.jpeg 2762w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/m1KnvihaTN-200.jpeg&quot; width=&quot;2762&quot; height=&quot;1251&quot;&gt;&lt;/picture&gt;&lt;/a&gt;
&lt;a href=&quot;https://justinpinkney.com/blog/2024/image-mixer-diffusion/im_mixer-9.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/oJDLC40kEu-200.webp 200w, https://justinpinkney.com/img/oJDLC40kEu-320.webp 320w, https://justinpinkney.com/img/oJDLC40kEu-500.webp 500w, https://justinpinkney.com/img/oJDLC40kEu-800.webp 800w, https://justinpinkney.com/img/oJDLC40kEu-1024.webp 1024w, https://justinpinkney.com/img/oJDLC40kEu-1600.webp 1600w, https://justinpinkney.com/img/oJDLC40kEu-2699.webp 2699w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/oJDLC40kEu-200.jpeg 200w, https://justinpinkney.com/img/oJDLC40kEu-320.jpeg 320w, https://justinpinkney.com/img/oJDLC40kEu-500.jpeg 500w, https://justinpinkney.com/img/oJDLC40kEu-800.jpeg 800w, https://justinpinkney.com/img/oJDLC40kEu-1024.jpeg 1024w, https://justinpinkney.com/img/oJDLC40kEu-1600.jpeg 1600w, https://justinpinkney.com/img/oJDLC40kEu-2699.jpeg 2699w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/oJDLC40kEu-200.jpeg&quot; width=&quot;2699&quot; height=&quot;1051&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
</content>
	</entry>
	
	<entry>
		<title>Face Mixer Diffusion</title>
		<link href="https://justinpinkney.com/blog/2024/face-mixer-diffusion/"/>
		<updated>2024-01-19T00:00:00Z</updated>
		<id>https://justinpinkney.com/blog/2024/face-mixer-diffusion/</id>
		<content type="html">&lt;p&gt;I&#39;m still trying to catch up on writing up some of the &amp;quot;old&amp;quot; diffusion experiments I did and models I made during 2022. This is another quick write up of a model designed to generate new images of a person&#39;s face given a single image of that face with no further fine tuning. There are now better models available for this purpose, in particular take a look at &lt;a href=&quot;https://github.com/tencent-ailab/IP-Adapter/blob/main/ip_adapter_sdxl_plus-face_demo.ipynb&quot;&gt;IPAdapter face&lt;/a&gt; and &lt;a href=&quot;https://arxiv.org/abs/2312.04461&quot;&gt;PhotoMaker&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2024/face-mixer-diffusion/face-id2.jpeg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/fBgktjFPoG-200.webp 200w, https://justinpinkney.com/img/fBgktjFPoG-320.webp 320w, https://justinpinkney.com/img/fBgktjFPoG-500.webp 500w, https://justinpinkney.com/img/fBgktjFPoG-800.webp 800w, https://justinpinkney.com/img/fBgktjFPoG-1024.webp 1024w, https://justinpinkney.com/img/fBgktjFPoG-1600.webp 1600w, https://justinpinkney.com/img/fBgktjFPoG-1845.webp 1845w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/fBgktjFPoG-200.jpeg 200w, https://justinpinkney.com/img/fBgktjFPoG-320.jpeg 320w, https://justinpinkney.com/img/fBgktjFPoG-500.jpeg 500w, https://justinpinkney.com/img/fBgktjFPoG-800.jpeg 800w, https://justinpinkney.com/img/fBgktjFPoG-1024.jpeg 1024w, https://justinpinkney.com/img/fBgktjFPoG-1600.jpeg 1600w, https://justinpinkney.com/img/fBgktjFPoG-1845.jpeg 1845w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;todo&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/fBgktjFPoG-200.jpeg&quot; width=&quot;1845&quot; height=&quot;1119&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;
Mixing faces: taking identity from the left images and &#39;other stuff&#39; from the top images, make the fake faces in the middle.
&lt;/div&gt;
&lt;h2 id=&quot;feed-forward-dreambooth&quot; tabindex=&quot;-1&quot;&gt;Feed forward Dreambooth? &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2024/face-mixer-diffusion/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Shortly after Stable Diffusion was released, &lt;a href=&quot;https://arxiv.org/abs/2208.12242&quot;&gt;Dreambooth&lt;/a&gt; showed how to fine tune such models for particular styles, subjects, or people. &lt;a href=&quot;https://justinpinkney.com/blog/2020/making-toonify&quot;&gt;I&#39;ve known for a while&lt;/a&gt; how much people are obsessed with their own faces, and for a while Dreambooth led to a blossoming of short-lived apps and websites for generating images with your own face in them (who knows, maybe they still exist; people seem to endlessly love their own face).&lt;/p&gt;
&lt;p&gt;Dreambooth is somewhat cumbersome as a technique in that it involves multiple example images and training the diffusion model itself. I had fully been expecting someone to come up with a model which could do the same task but without the need for subject-specific training, but it seems that&#39;s been a somewhat challenging problem; the first hints of models which can do this (although still not with the level of likeness that Dreambooth could achieve) are only just coming out now.&lt;/p&gt;
&lt;p&gt;Of course I had a go myself. The aim was to make a model which you could give a single image of a face and have it make new images of that person. I took an approach that skipped the text prompt altogether: fine-tuning Stable Diffusion to take an image of a person, crop out the face region, and use a pretrained model to turn that into a vector to use as the condition for generating a new image.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2024/face-mixer-diffusion/schematic-face-orig.jpeg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/gIAgRX8Ac8-200.webp 200w, https://justinpinkney.com/img/gIAgRX8Ac8-320.webp 320w, https://justinpinkney.com/img/gIAgRX8Ac8-500.webp 500w, https://justinpinkney.com/img/gIAgRX8Ac8-800.webp 800w, https://justinpinkney.com/img/gIAgRX8Ac8-1024.webp 1024w, https://justinpinkney.com/img/gIAgRX8Ac8-1600.webp 1600w, https://justinpinkney.com/img/gIAgRX8Ac8-2147.webp 2147w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/gIAgRX8Ac8-200.jpeg 200w, https://justinpinkney.com/img/gIAgRX8Ac8-320.jpeg 320w, https://justinpinkney.com/img/gIAgRX8Ac8-500.jpeg 500w, https://justinpinkney.com/img/gIAgRX8Ac8-800.jpeg 800w, https://justinpinkney.com/img/gIAgRX8Ac8-1024.jpeg 1024w, https://justinpinkney.com/img/gIAgRX8Ac8-1600.jpeg 1600w, https://justinpinkney.com/img/gIAgRX8Ac8-2147.jpeg 2147w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;schematic of face conditioned diffusion&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/gIAgRX8Ac8-200.jpeg&quot; width=&quot;2147&quot; height=&quot;824&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;
Schematic of the training setup for face conditioned diffusion. This version just extracts a single embedding for the face region using CLIP; in some variations I tried pretrained face recognition networks.
&lt;/div&gt;
&lt;p&gt;I tried a few variations of the concept, not all of which I can honestly remember, but all were trained on datasets of aligned faces, where I could easily extract a central crop which contained only the face while the rest of the image contained all the &amp;quot;other stuff&amp;quot; (hair, background, accessories, hats, etc.). For all the models I took the central crop and then ran it through an image encoder, either CLIP or a pretrained face recognition network; both would give a vector representation of the face. This can then be passed to the cross attention layers instead of the text embeddings as in the original model. After training like this for a while you can pass in a new image of a face, crop it and extract the appropriate embedding, then use this to generate a new image of (hopefully) the same face.&lt;/p&gt;
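&lt;p&gt;As a rough sketch of what that conditioning step looks like (the crop and the CLIP model here are illustrative stand-ins rather than my exact training code, which lives in the configs linked at the end of this post):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Hypothetical sketch: embed the face crop of an aligned image with
# CLIP and use it as a single cross-attention &quot;token&quot; in place of
# the usual text embeddings.
import torch
from transformers import CLIPImageProcessor, CLIPVisionModelWithProjection

processor = CLIPImageProcessor.from_pretrained(&quot;openai/clip-vit-large-patch14&quot;)
encoder = CLIPVisionModelWithProjection.from_pretrained(&quot;openai/clip-vit-large-patch14&quot;)

def face_condition(image):
    w, h = image.size
    face = image.crop((w // 4, h // 4, 3 * w // 4, 3 * h // 4))  # central face crop
    inputs = processor(images=face, return_tensors=&quot;pt&quot;)
    with torch.no_grad():
        embed = encoder(**inputs).image_embeds  # shape (1, 768)
    # Shape it like a (very short) text conditioning sequence for the UNet.
    return embed.unsqueeze(1)  # shape (1, 1, 768)
&lt;/code&gt;&lt;/pre&gt;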
&lt;h2 id=&quot;results-making-faces&quot; tabindex=&quot;-1&quot;&gt;Results - Making faces &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2024/face-mixer-diffusion/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;My first model was trained on &lt;a href=&quot;https://github.com/NVlabs/ffhq-dataset&quot;&gt;FFHQ&lt;/a&gt; using an open source face recognition network as the face ID embedding. After training the samples looked promising when I took random faces from FFHQ and asked the network to create new images of the same person.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2024/face-mixer-diffusion/montage-1-train.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/bLbEEZffpe-200.webp 200w, https://justinpinkney.com/img/bLbEEZffpe-320.webp 320w, https://justinpinkney.com/img/bLbEEZffpe-500.webp 500w, https://justinpinkney.com/img/bLbEEZffpe-800.webp 800w, https://justinpinkney.com/img/bLbEEZffpe-1024.webp 1024w, https://justinpinkney.com/img/bLbEEZffpe-1600.webp 1600w, https://justinpinkney.com/img/bLbEEZffpe-2560.webp 2560w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/bLbEEZffpe-200.jpeg 200w, https://justinpinkney.com/img/bLbEEZffpe-320.jpeg 320w, https://justinpinkney.com/img/bLbEEZffpe-500.jpeg 500w, https://justinpinkney.com/img/bLbEEZffpe-800.jpeg 800w, https://justinpinkney.com/img/bLbEEZffpe-1024.jpeg 1024w, https://justinpinkney.com/img/bLbEEZffpe-1600.jpeg 1600w, https://justinpinkney.com/img/bLbEEZffpe-2560.jpeg 2560w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;grid of face images generated by a diffusion model&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/bLbEEZffpe-200.jpeg&quot; width=&quot;2560&quot; height=&quot;3255&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;
Left column are all real faces from the FFHQ dataset, and the remaining columns are samples generated from the face ID conditioned diffusion model given the face ID from the images on the left. These do (sort of) look like new images of the person on the left.
&lt;/div&gt;
&lt;p&gt;Unfortunately, when I tested it on validation images (faces that the network hadn&#39;t seen during training), things weren&#39;t so good. The model generated a set of consistent faces, but not of the person in the input image. I guess the network had overfit on the original training dataset: FFHQ only has around 70k images, which is tiny by modern standards. I had tried to apply some image augmentation to the faces before passing them to the face recognition network, but it seems like this wasn&#39;t enough. It gets some features right, there is a clear focus of the ID network on the shapes of noses, but these samples don&#39;t really resemble the original people at all.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2024/face-mixer-diffusion/montage-1-val.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/XZp10STSCL-200.webp 200w, https://justinpinkney.com/img/XZp10STSCL-320.webp 320w, https://justinpinkney.com/img/XZp10STSCL-500.webp 500w, https://justinpinkney.com/img/XZp10STSCL-800.webp 800w, https://justinpinkney.com/img/XZp10STSCL-1024.webp 1024w, https://justinpinkney.com/img/XZp10STSCL-1600.webp 1600w, https://justinpinkney.com/img/XZp10STSCL-2560.webp 2560w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/XZp10STSCL-200.jpeg 200w, https://justinpinkney.com/img/XZp10STSCL-320.jpeg 320w, https://justinpinkney.com/img/XZp10STSCL-500.jpeg 500w, https://justinpinkney.com/img/XZp10STSCL-800.jpeg 800w, https://justinpinkney.com/img/XZp10STSCL-1024.jpeg 1024w, https://justinpinkney.com/img/XZp10STSCL-1600.jpeg 1600w, https://justinpinkney.com/img/XZp10STSCL-2560.jpeg 2560w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;grid of face images generated by a diffusion model&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/XZp10STSCL-200.jpeg&quot; width=&quot;2560&quot; height=&quot;3767&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;
Generated samples using faces from the CelebA-HQ dataset, which haven&#39;t been seen by the model during training. There are some elements of similarity (if you squint) but the model has clearly failed to generalise.
&lt;/div&gt;
&lt;h2 id=&quot;back-to-clip&quot; tabindex=&quot;-1&quot;&gt;Back to CLIP &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2024/face-mixer-diffusion/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As an alternative face encoding I went back to trusty old CLIP image embeddings, which are clearly very &lt;a href=&quot;https://justinpinkney.com/blog/2023/stable-diffusion-image-variations/&quot;&gt;well proved out&lt;/a&gt; when it comes to conditioning a diffusion model. I followed the same procedure as above, but used the LAION CLIP-L model to extract the embedding vector corresponding to the face region. (I might also have resumed from my existing image variations model, but I can&#39;t remember.)&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2024/face-mixer-diffusion/montage-2-train.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/GfYScwjCEy-200.webp 200w, https://justinpinkney.com/img/GfYScwjCEy-320.webp 320w, https://justinpinkney.com/img/GfYScwjCEy-500.webp 500w, https://justinpinkney.com/img/GfYScwjCEy-800.webp 800w, https://justinpinkney.com/img/GfYScwjCEy-1024.webp 1024w, https://justinpinkney.com/img/GfYScwjCEy-1600.webp 1600w, https://justinpinkney.com/img/GfYScwjCEy-2560.webp 2560w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/GfYScwjCEy-200.jpeg 200w, https://justinpinkney.com/img/GfYScwjCEy-320.jpeg 320w, https://justinpinkney.com/img/GfYScwjCEy-500.jpeg 500w, https://justinpinkney.com/img/GfYScwjCEy-800.jpeg 800w, https://justinpinkney.com/img/GfYScwjCEy-1024.jpeg 1024w, https://justinpinkney.com/img/GfYScwjCEy-1600.jpeg 1600w, https://justinpinkney.com/img/GfYScwjCEy-2560.jpeg 2560w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;grid of face images generated by a diffusion model&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/GfYScwjCEy-200.jpeg&quot; width=&quot;2560&quot; height=&quot;3255&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;
Generated faces from a diffusion model conditioned on the CLIP image embedding for faces from the FFHQ dataset (left).
&lt;/div&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2024/face-mixer-diffusion/montage-2-val.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/6efUntQ6b3-200.webp 200w, https://justinpinkney.com/img/6efUntQ6b3-320.webp 320w, https://justinpinkney.com/img/6efUntQ6b3-500.webp 500w, https://justinpinkney.com/img/6efUntQ6b3-800.webp 800w, https://justinpinkney.com/img/6efUntQ6b3-1024.webp 1024w, https://justinpinkney.com/img/6efUntQ6b3-1600.webp 1600w, https://justinpinkney.com/img/6efUntQ6b3-2560.webp 2560w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/6efUntQ6b3-200.jpeg 200w, https://justinpinkney.com/img/6efUntQ6b3-320.jpeg 320w, https://justinpinkney.com/img/6efUntQ6b3-500.jpeg 500w, https://justinpinkney.com/img/6efUntQ6b3-800.jpeg 800w, https://justinpinkney.com/img/6efUntQ6b3-1024.jpeg 1024w, https://justinpinkney.com/img/6efUntQ6b3-1600.jpeg 1600w, https://justinpinkney.com/img/6efUntQ6b3-2560.jpeg 2560w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;grid of face images generated by a diffusion model&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/6efUntQ6b3-200.jpeg&quot; width=&quot;2560&quot; height=&quot;3767&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;
Generated faces from a diffusion model conditioned on the CLIP image embedding for faces from the CelebA-HQ dataset (left).
&lt;/div&gt;
&lt;h2 id=&quot;trailer-faces-hq-and-retrieval-training&quot; tabindex=&quot;-1&quot;&gt;Trailer Faces HQ and retrieval training &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2024/face-mixer-diffusion/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Around the same time I made a large dataset along the lines of FFHQ, but with data scraped from movie trailers, called &lt;a href=&quot;https://huggingface.co/datasets/justinpinkney/trailer-faces-hq&quot;&gt;Trailer Faces HQ (TFHQ)&lt;/a&gt; (I&#39;ll blog about that sometime too). As well as being a lot bigger than FFHQ, it also had a lot of repeated faces, both from different shots of the same trailer and from the same actor in different movies. This meant that if I could find the corresponding sets of images I could train my model with retrieval; in other words, instead of just taking a crop of the face and trying to augment it with some image transforms, I could use a different image of the same person.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2024/face-mixer-diffusion/retreival.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/z9ij5PMA4p-200.webp 200w, https://justinpinkney.com/img/z9ij5PMA4p-320.webp 320w, https://justinpinkney.com/img/z9ij5PMA4p-500.webp 500w, https://justinpinkney.com/img/z9ij5PMA4p-800.webp 800w, https://justinpinkney.com/img/z9ij5PMA4p-1024.webp 1024w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/z9ij5PMA4p-200.jpeg 200w, https://justinpinkney.com/img/z9ij5PMA4p-320.jpeg 320w, https://justinpinkney.com/img/z9ij5PMA4p-500.jpeg 500w, https://justinpinkney.com/img/z9ij5PMA4p-800.jpeg 800w, https://justinpinkney.com/img/z9ij5PMA4p-1024.jpeg 1024w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;todo&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/z9ij5PMA4p-200.jpeg&quot; width=&quot;1024&quot; height=&quot;384&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;
An example input image on the left, and retrieved images of the same actor on the right. I used a pretrained face recognition network to find similar faces, and also used CLIP to avoid including images which are too similar (e.g. from adjacent frames in a video).
&lt;/div&gt;
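&lt;p&gt;The pair selection is roughly the following (a minimal sketch; the thresholds and the embedding arrays are stand-ins, not my actual values):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Hypothetical sketch of retrieval pair selection: keep a pair if the
# face recognition embeddings say &quot;same person&quot; but the CLIP
# embeddings say &quot;not a near-duplicate frame&quot;.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def pick_pairs(face_embeds, clip_embeds, id_thresh=0.6, dup_thresh=0.95):
    pairs = []
    n = len(face_embeds)
    for i in range(n):
        for j in range(i + 1, n):
            same_id = cosine(face_embeds[i], face_embeds[j]) &gt; id_thresh
            near_dup = cosine(clip_embeds[i], clip_embeds[j]) &gt; dup_thresh
            if same_id and not near_dup:
                pairs.append((i, j))  # condition on one, reconstruct the other
    return pairs
&lt;/code&gt;&lt;/pre&gt;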
&lt;h2 id=&quot;controlling-face-and-background&quot; tabindex=&quot;-1&quot;&gt;Controlling face and background &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2024/face-mixer-diffusion/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Making new images of a person is nice, but at this point I&#39;ve stripped out all the text control, so the face embedding is the only way to control this model&lt;span class=&quot;sidenote&quot;&gt;
&lt;label class=&quot;sidenote-label&quot; for=&quot;side-note-1&quot;&gt;&lt;sup&gt;[1]&lt;/sup&gt;&lt;/label&gt;
&lt;input class=&quot;sidenote-checkbox&quot; type=&quot;checkbox&quot; id=&quot;side-note-1&quot;&gt;
&lt;span class=&quot;sidenote-content&quot;&gt;1. I wish I&#39;d done some more experiments like embedding interpolation, SDEdit type image-to-image, or multi-cfg on the different embeddings, but I didn&#39;t at the time, and now I&#39;ve lost the checkpoint!&lt;/span&gt;
&lt;/span&gt;.&lt;/p&gt;
&lt;p&gt;To try and add back some control I trained a new version with an extra embedding corresponding to the background of the image (i.e. all of the image with the face region masked out in grey). Apart from that everything is the same, and once trained, if you give it a single image and extract a face and a background embedding, then you can again generate new images of people&#39;s faces:&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2024/face-mixer-diffusion/schematic-face.jpeg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/x7expgfVGe-200.webp 200w, https://justinpinkney.com/img/x7expgfVGe-320.webp 320w, https://justinpinkney.com/img/x7expgfVGe-500.webp 500w, https://justinpinkney.com/img/x7expgfVGe-800.webp 800w, https://justinpinkney.com/img/x7expgfVGe-1024.webp 1024w, https://justinpinkney.com/img/x7expgfVGe-1600.webp 1600w, https://justinpinkney.com/img/x7expgfVGe-1851.webp 1851w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/x7expgfVGe-200.jpeg 200w, https://justinpinkney.com/img/x7expgfVGe-320.jpeg 320w, https://justinpinkney.com/img/x7expgfVGe-500.jpeg 500w, https://justinpinkney.com/img/x7expgfVGe-800.jpeg 800w, https://justinpinkney.com/img/x7expgfVGe-1024.jpeg 1024w, https://justinpinkney.com/img/x7expgfVGe-1600.jpeg 1600w, https://justinpinkney.com/img/x7expgfVGe-1851.jpeg 1851w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;todo&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/x7expgfVGe-200.jpeg&quot; width=&quot;1851&quot; height=&quot;819&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;
Schematic of the face conditioned diffusion model. The face region and the whole image with the face masked out are both passed through the CLIP image encoder to produce two embeddings for conditioning the diffusion model.
&lt;/div&gt;
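&lt;p&gt;In sketch form the conditioning just becomes two tokens instead of one (&lt;code&gt;crop_face&lt;/code&gt; and &lt;code&gt;mask_face_region&lt;/code&gt; are made-up helper names, and &lt;code&gt;face_condition&lt;/code&gt; is the CLIP embedding helper sketched earlier):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Hypothetical sketch: condition on two CLIP embeddings, one for the
# face crop and one for the image with the face region painted grey.
import torch

def mixed_condition(face_image, background_image):
    face_embed = face_condition(crop_face(face_image))             # identity
    bg_embed = face_condition(mask_face_region(background_image))  # everything else
    # Two context tokens for cross-attention; at inference time the
    # two input images can come from different people.
    return torch.cat([face_embed, bg_embed], dim=1)  # shape (1, 2, 768)
&lt;/code&gt;&lt;/pre&gt;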
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2024/face-mixer-diffusion/face-id1.jpeg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/fuGXdwh40g-200.webp 200w, https://justinpinkney.com/img/fuGXdwh40g-320.webp 320w, https://justinpinkney.com/img/fuGXdwh40g-500.webp 500w, https://justinpinkney.com/img/fuGXdwh40g-800.webp 800w, https://justinpinkney.com/img/fuGXdwh40g-1024.webp 1024w, https://justinpinkney.com/img/fuGXdwh40g-1600.webp 1600w, https://justinpinkney.com/img/fuGXdwh40g-1788.webp 1788w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/fuGXdwh40g-200.jpeg 200w, https://justinpinkney.com/img/fuGXdwh40g-320.jpeg 320w, https://justinpinkney.com/img/fuGXdwh40g-500.jpeg 500w, https://justinpinkney.com/img/fuGXdwh40g-800.jpeg 800w, https://justinpinkney.com/img/fuGXdwh40g-1024.jpeg 1024w, https://justinpinkney.com/img/fuGXdwh40g-1600.jpeg 1600w, https://justinpinkney.com/img/fuGXdwh40g-1788.jpeg 1788w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;todo&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/fuGXdwh40g-200.jpeg&quot; width=&quot;1788&quot; height=&quot;692&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;
Real images on the far left, the rest are generated. When you use the same image for both the face and the background you get more pictures of the same person in a similar setting.
&lt;/div&gt;
&lt;p&gt;It gets more interesting when you mix the face embedding from one image with the background embedding from another. In an ideal world you should get an image of the person in image 1, with the hair/lighting/hat/background etc. of image 2.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2024/face-mixer-diffusion/face-id3.jpeg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/DO1hdsxif2-200.webp 200w, https://justinpinkney.com/img/DO1hdsxif2-320.webp 320w, https://justinpinkney.com/img/DO1hdsxif2-500.webp 500w, https://justinpinkney.com/img/DO1hdsxif2-800.webp 800w, https://justinpinkney.com/img/DO1hdsxif2-1024.webp 1024w, https://justinpinkney.com/img/DO1hdsxif2-1600.webp 1600w, https://justinpinkney.com/img/DO1hdsxif2-1845.webp 1845w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/DO1hdsxif2-200.jpeg 200w, https://justinpinkney.com/img/DO1hdsxif2-320.jpeg 320w, https://justinpinkney.com/img/DO1hdsxif2-500.jpeg 500w, https://justinpinkney.com/img/DO1hdsxif2-800.jpeg 800w, https://justinpinkney.com/img/DO1hdsxif2-1024.jpeg 1024w, https://justinpinkney.com/img/DO1hdsxif2-1600.jpeg 1600w, https://justinpinkney.com/img/DO1hdsxif2-1845.jpeg 1845w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;todo&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/DO1hdsxif2-200.jpeg&quot; width=&quot;1845&quot; height=&quot;1119&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;
Mixing faces, face embeddings taken from the images on the left, background embeddings from the images on top, generated mixed faces in the middle.
&lt;/div&gt;
&lt;p&gt;And it sort of worked! Many of the generated images do appear to be the original person, and you do pick up various features from the &amp;quot;background&amp;quot; image. Some things you might not expect to change do, like gender, while other things stay fixed, like expression, which is reasonable given the simple way I divided up the image for the different embeddings.&lt;/p&gt;
&lt;h2 id=&quot;where-did-i-put-that-checkpoint-again&quot; tabindex=&quot;-1&quot;&gt;Where did I put that checkpoint again? &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2024/face-mixer-diffusion/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;This was all quite a long time ago, back at the end of 2022, but I&#39;ve only just got round to writing it all up. I was going to share the models too, but unfortunately I seem to have misplaced them. I really have no idea where the saved checkpoints are now; I have so many hard drives and folders of checkpoints, but none of them seem to be the right ones. Oh well.&lt;/p&gt;
&lt;p&gt;If you really want to you could train the things yourself. All the config files are in &lt;a href=&quot;https://github.com/justinpinkney/stable-diffusion/tree/main/configs/stable-diffusion&quot;&gt;my GitHub repo&lt;/a&gt;, and CelebA, FFHQ, and &lt;a href=&quot;https://huggingface.co/datasets/justinpinkney/trailer-faces-hq&quot;&gt;TFHQ&lt;/a&gt; are all things you can download.&lt;/p&gt;
&lt;p&gt;I also talked a bit about this model, along with a bunch of others, in the &lt;a href=&quot;https://youtu.be/mpMGwQa7J1w?&quot;&gt;Hugging Face Diffusers talk&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I liked the old StyleGAN Face mixing vibe I got with this model:&lt;/p&gt;
&lt;p&gt;https://twitter.com/Buntworthy/status/1592845516127481857&lt;/p&gt;
&lt;h2 id=&quot;some-related-work&quot; tabindex=&quot;-1&quot;&gt;Some related work &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2024/face-mixer-diffusion/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;After I &lt;a href=&quot;https://twitter.com/Buntworthy/status/1749179500389109782&quot;&gt;tweeted about this&lt;/a&gt; blog post a couple of people pointed out some similar work. &lt;a href=&quot;https://boleizhou.github.io/&quot;&gt;Bolei Zhou&lt;/a&gt; mentioned the great &lt;a href=&quot;https://genforce.github.io/idinvert/&quot;&gt;IDinvert&lt;/a&gt; from back in the good old days when GANs ruled the image generation roost, and &lt;a href=&quot;https://twitter.com/hagsaeng_bag&quot;&gt;Yong-Hyun Park&lt;/a&gt; pointed me towards &lt;a href=&quot;https://openaccess.thecvf.com/content/WACV2024/html/Jeong_Training-Free_Content_Injection_Using_H-Space_in_Diffusion_Models_WACV_2024_paper.html&quot;&gt;this interesting recent paper&lt;/a&gt; on content injection using the &amp;quot;h-space&amp;quot; of UNet-based diffusion models.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2024/face-mixer-diffusion/idinvert.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/yCwgP8YYrQ-200.webp 200w, https://justinpinkney.com/img/yCwgP8YYrQ-320.webp 320w, https://justinpinkney.com/img/yCwgP8YYrQ-500.webp 500w, https://justinpinkney.com/img/yCwgP8YYrQ-800.webp 800w, https://justinpinkney.com/img/yCwgP8YYrQ-1024.webp 1024w, https://justinpinkney.com/img/yCwgP8YYrQ-1256.webp 1256w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/yCwgP8YYrQ-200.jpeg 200w, https://justinpinkney.com/img/yCwgP8YYrQ-320.jpeg 320w, https://justinpinkney.com/img/yCwgP8YYrQ-500.jpeg 500w, https://justinpinkney.com/img/yCwgP8YYrQ-800.jpeg 800w, https://justinpinkney.com/img/yCwgP8YYrQ-1024.jpeg 1024w, https://justinpinkney.com/img/yCwgP8YYrQ-1256.jpeg 1256w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;figure from idinvert paper showing GAN based face mixing&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/yCwgP8YYrQ-200.jpeg&quot; width=&quot;1256&quot; height=&quot;742&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;
Figure from the paper &quot;In-Domain GAN Inversion for Real Image Editing&quot;, amazingly similar to my examples above!
&lt;/div&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2024/face-mixer-diffusion/h-space.jpeg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/YtUYmsQuap-200.webp 200w, https://justinpinkney.com/img/YtUYmsQuap-320.webp 320w, https://justinpinkney.com/img/YtUYmsQuap-500.webp 500w, https://justinpinkney.com/img/YtUYmsQuap-800.webp 800w, https://justinpinkney.com/img/YtUYmsQuap-1024.webp 1024w, https://justinpinkney.com/img/YtUYmsQuap-1162.webp 1162w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/YtUYmsQuap-200.jpeg 200w, https://justinpinkney.com/img/YtUYmsQuap-320.jpeg 320w, https://justinpinkney.com/img/YtUYmsQuap-500.jpeg 500w, https://justinpinkney.com/img/YtUYmsQuap-800.jpeg 800w, https://justinpinkney.com/img/YtUYmsQuap-1024.jpeg 1024w, https://justinpinkney.com/img/YtUYmsQuap-1162.jpeg 1162w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;figure from idinvert paper showing h-space based face mixing&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/YtUYmsQuap-200.jpeg&quot; width=&quot;1162&quot; height=&quot;492&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;
Figure from the paper &quot;Training-free Content Injection using h-space in Diffusion Models&quot;. The approach involves some interesting injection of activations into the bottleneck of a pre-trained diffusion model UNet.
&lt;/div&gt;</content>
	</entry>
	
	<entry>
		<title>Würstchen v2 Pokémon</title>
		<link href="https://justinpinkney.com/blog/2024/w2-pokemon/"/>
		<updated>2024-01-07T00:00:00Z</updated>
		<id>https://justinpinkney.com/blog/2024/w2-pokemon/</id>
		<content type="html">&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2024/w2-pokemon/pokemon-004750.jpeg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/OocIwyFVsr-200.webp 200w, https://justinpinkney.com/img/OocIwyFVsr-320.webp 320w, https://justinpinkney.com/img/OocIwyFVsr-500.webp 500w, https://justinpinkney.com/img/OocIwyFVsr-800.webp 800w, https://justinpinkney.com/img/OocIwyFVsr-1024.webp 1024w, https://justinpinkney.com/img/OocIwyFVsr-1600.webp 1600w, https://justinpinkney.com/img/OocIwyFVsr-4096.webp 4096w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/OocIwyFVsr-200.jpeg 200w, https://justinpinkney.com/img/OocIwyFVsr-320.jpeg 320w, https://justinpinkney.com/img/OocIwyFVsr-500.jpeg 500w, https://justinpinkney.com/img/OocIwyFVsr-800.jpeg 800w, https://justinpinkney.com/img/OocIwyFVsr-1024.jpeg 1024w, https://justinpinkney.com/img/OocIwyFVsr-1600.jpeg 1600w, https://justinpinkney.com/img/OocIwyFVsr-4096.jpeg 4096w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/OocIwyFVsr-200.jpeg&quot; width=&quot;4096&quot; height=&quot;2048&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;
Outputs of the Pokemon fine-tuned Würstchen v2 model for the prompts: &quot;Hello Kitty&quot;, &quot;Old man looking at the moon&quot;, &quot;Donald Trump&quot;, and &quot;Yoda&quot;. Top row: EMA weights; bottom row: normal weights.
&lt;/div&gt;
&lt;p&gt;The second half of 2023 saw a ton of image generation models come out (including our own Midjourney v6 😊), some open source, some not, and most interesting in their own way.&lt;/p&gt;
&lt;p&gt;One particularly neat little model was Würstchen v2. Würstchen v2 makes the interesting choice of learning a &lt;a href=&quot;https://arxiv.org/abs/2112.10752&quot;&gt;Latent Diffusion Model&lt;/a&gt; which is conditioned not on text embeddings like Stable Diffusion, but on spatial image features extracted by an EfficientNet model; these act like a very compressed latent representation of the original image. Finally, a second diffusion model is trained to generate these spatial image features conditioned on text embeddings. Essentially this is training a Stable Diffusion like text-to-image model, but using a super compressed (42x) latent space rather than the typical 8x compression for Latent Diffusion models.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2024/w2-pokemon/w2-arch.png&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/Omj5_xPWbO-200.webp 200w, https://justinpinkney.com/img/Omj5_xPWbO-320.webp 320w, https://justinpinkney.com/img/Omj5_xPWbO-500.webp 500w, https://justinpinkney.com/img/Omj5_xPWbO-800.webp 800w, https://justinpinkney.com/img/Omj5_xPWbO-1024.webp 1024w, https://justinpinkney.com/img/Omj5_xPWbO-1436.webp 1436w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/png&quot; srcset=&quot;https://justinpinkney.com/img/Omj5_xPWbO-200.png 200w, https://justinpinkney.com/img/Omj5_xPWbO-320.png 320w, https://justinpinkney.com/img/Omj5_xPWbO-500.png 500w, https://justinpinkney.com/img/Omj5_xPWbO-800.png 800w, https://justinpinkney.com/img/Omj5_xPWbO-1024.png 1024w, https://justinpinkney.com/img/Omj5_xPWbO-1436.png 1436w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/Omj5_xPWbO-200.png&quot; width=&quot;1436&quot; height=&quot;566&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;
Architecture of the Würstchen family of models, figure from &quot;Wuerstchen: An Efficient Architecture for Large-Scale Text-to-Image Diffusion Models&quot;
&lt;/div&gt;
&lt;p&gt;The full pipeline ends up being 3 models: Stage C, a text to very-compressed-latent diffusion model; Stage B, a very-compressed to less-compressed latent diffusion model; and Stage A, the typical latent decoder. This has the advantage that Stages A and B can remain fixed for fine tuning: only Stage C needs to be trained, and given the small spatial dimensions it operates on, this can be done quite efficiently.&lt;/p&gt;
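&lt;p&gt;For reference, running the stock (not fine-tuned) model via the Diffusers integration looks roughly like this (treat the model id and arguments as a sketch from memory rather than gospel):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Rough sketch of text-to-image with the pretrained Würstchen v2
# weights via Diffusers; fine-tuning swaps in your own Stage C.
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    &quot;warp-ai/wuerstchen&quot;, torch_dtype=torch.float16
).to(&quot;cuda&quot;)

image = pipe(prompt=&quot;Yoda&quot;, height=1024, width=1024).images[0]
image.save(&quot;yoda.png&quot;)
&lt;/code&gt;&lt;/pre&gt;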
&lt;p&gt;To try it out I tested fine tuning with my dataset of captioned&lt;span class=&quot;sidenote&quot;&gt;
&lt;label class=&quot;sidenote-label&quot; for=&quot;side-note-1&quot;&gt;&lt;sup&gt;[1]&lt;/sup&gt;&lt;/label&gt;
&lt;input class=&quot;sidenote-checkbox&quot; type=&quot;checkbox&quot; id=&quot;side-note-1&quot;&gt;
&lt;span class=&quot;sidenote-content&quot;&gt;1. My original dataset was captioned using BLIP, which is a quite out of date captioning model by today&#39;s standards. If you&#39;re after more detailed captions &lt;a href=&quot;https://sayak.dev/&quot;&gt;Sayak Paul&lt;/a&gt; kindly shared &lt;a href=&quot;https://huggingface.co/datasets/diffusers/pokemon-gpt4-captions&quot;&gt;a version captioned with GPT4&lt;/a&gt;&lt;/span&gt;
&lt;/span&gt; Pokemon images (the same one I used to make &lt;a href=&quot;https://justinpinkney.com/blog/2022/pokemon-generator&quot;&gt;text-to-pokemon&lt;/a&gt;). After a few tweaks to the &lt;a href=&quot;https://github.com/justinpinkney/Wuerstchen/blob/main/train_stage_C.py&quot;&gt;training script&lt;/a&gt; which didn&#39;t seem to be set up for the v2 model at the time (I originally did this around September 2023) &lt;span class=&quot;sidenote&quot;&gt;
&lt;label class=&quot;sidenote-label&quot; for=&quot;side-note-2&quot;&gt;&lt;sup&gt;[2]&lt;/sup&gt;&lt;/label&gt;
&lt;input class=&quot;sidenote-checkbox&quot; type=&quot;checkbox&quot; id=&quot;side-note-2&quot;&gt;
&lt;span class=&quot;sidenote-content&quot;&gt;2. Since then it looks like there is a &lt;a href=&quot;https://huggingface.co/docs/diffusers/training/wuerstchen&quot;&gt;training script&lt;/a&gt; for the Diffusers version of the model. I haven&#39;t used it myself but it&#39;s probably an easier place to get started yourself. The example even uses the same dataset but I didn&#39;t find any example outputs.&lt;/span&gt;
&lt;/span&gt;, I could run a quick fine-tune and start making Pokemon out of text again. After about 5000 training steps the model behaved in a pretty similar way to my original Stable Diffusion version. You can put in names unrelated to Pokemon and get a &amp;quot;Pokemon-ified&amp;quot; version out.&lt;/p&gt;
&lt;p&gt;Here are some more example outputs of &amp;quot;Mario&amp;quot;, &amp;quot;Girl with a Pearl Earring&amp;quot;, &amp;quot;Boris Johnson&amp;quot;, and &amp;quot;Ramen&amp;quot;. You can compare some of these to the example outputs in my previous &lt;a href=&quot;https://justinpinkney.com/blog/2022/pokemon-generator&quot;&gt;Stable Diffusion based model&lt;/a&gt;, which overall I think worked marginally better.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2024/w2-pokemon/mario.jpeg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/rUso6-68Ym-200.webp 200w, https://justinpinkney.com/img/rUso6-68Ym-320.webp 320w, https://justinpinkney.com/img/rUso6-68Ym-500.webp 500w, https://justinpinkney.com/img/rUso6-68Ym-800.webp 800w, https://justinpinkney.com/img/rUso6-68Ym-1024.webp 1024w, https://justinpinkney.com/img/rUso6-68Ym-1600.webp 1600w, https://justinpinkney.com/img/rUso6-68Ym-1770.webp 1770w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/rUso6-68Ym-200.jpeg 200w, https://justinpinkney.com/img/rUso6-68Ym-320.jpeg 320w, https://justinpinkney.com/img/rUso6-68Ym-500.jpeg 500w, https://justinpinkney.com/img/rUso6-68Ym-800.jpeg 800w, https://justinpinkney.com/img/rUso6-68Ym-1024.jpeg 1024w, https://justinpinkney.com/img/rUso6-68Ym-1600.jpeg 1600w, https://justinpinkney.com/img/rUso6-68Ym-1770.jpeg 1770w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/rUso6-68Ym-200.jpeg&quot; width=&quot;1770&quot; height=&quot;888&quot;&gt;&lt;/picture&gt;&lt;/a&gt;
&lt;a href=&quot;https://justinpinkney.com/blog/2024/w2-pokemon/pearl.jpeg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/LHnerqkzna-200.webp 200w, https://justinpinkney.com/img/LHnerqkzna-320.webp 320w, https://justinpinkney.com/img/LHnerqkzna-500.webp 500w, https://justinpinkney.com/img/LHnerqkzna-800.webp 800w, https://justinpinkney.com/img/LHnerqkzna-1024.webp 1024w, https://justinpinkney.com/img/LHnerqkzna-1600.webp 1600w, https://justinpinkney.com/img/LHnerqkzna-1762.webp 1762w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/LHnerqkzna-200.jpeg 200w, https://justinpinkney.com/img/LHnerqkzna-320.jpeg 320w, https://justinpinkney.com/img/LHnerqkzna-500.jpeg 500w, https://justinpinkney.com/img/LHnerqkzna-800.jpeg 800w, https://justinpinkney.com/img/LHnerqkzna-1024.jpeg 1024w, https://justinpinkney.com/img/LHnerqkzna-1600.jpeg 1600w, https://justinpinkney.com/img/LHnerqkzna-1762.jpeg 1762w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/LHnerqkzna-200.jpeg&quot; width=&quot;1762&quot; height=&quot;886&quot;&gt;&lt;/picture&gt;&lt;/a&gt;
&lt;a href=&quot;https://justinpinkney.com/blog/2024/w2-pokemon/boris.jpeg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/3asOd2jwPl-200.webp 200w, https://justinpinkney.com/img/3asOd2jwPl-320.webp 320w, https://justinpinkney.com/img/3asOd2jwPl-500.webp 500w, https://justinpinkney.com/img/3asOd2jwPl-800.webp 800w, https://justinpinkney.com/img/3asOd2jwPl-1024.webp 1024w, https://justinpinkney.com/img/3asOd2jwPl-1600.webp 1600w, https://justinpinkney.com/img/3asOd2jwPl-1784.webp 1784w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/3asOd2jwPl-200.jpeg 200w, https://justinpinkney.com/img/3asOd2jwPl-320.jpeg 320w, https://justinpinkney.com/img/3asOd2jwPl-500.jpeg 500w, https://justinpinkney.com/img/3asOd2jwPl-800.jpeg 800w, https://justinpinkney.com/img/3asOd2jwPl-1024.jpeg 1024w, https://justinpinkney.com/img/3asOd2jwPl-1600.jpeg 1600w, https://justinpinkney.com/img/3asOd2jwPl-1784.jpeg 1784w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/3asOd2jwPl-200.jpeg&quot; width=&quot;1784&quot; height=&quot;888&quot;&gt;&lt;/picture&gt;&lt;/a&gt;
&lt;a href=&quot;https://justinpinkney.com/blog/2024/w2-pokemon/ramen.jpeg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/PVrEcUA95A-200.webp 200w, https://justinpinkney.com/img/PVrEcUA95A-320.webp 320w, https://justinpinkney.com/img/PVrEcUA95A-500.webp 500w, https://justinpinkney.com/img/PVrEcUA95A-800.webp 800w, https://justinpinkney.com/img/PVrEcUA95A-1024.webp 1024w, https://justinpinkney.com/img/PVrEcUA95A-1600.webp 1600w, https://justinpinkney.com/img/PVrEcUA95A-1776.webp 1776w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/PVrEcUA95A-200.jpeg 200w, https://justinpinkney.com/img/PVrEcUA95A-320.jpeg 320w, https://justinpinkney.com/img/PVrEcUA95A-500.jpeg 500w, https://justinpinkney.com/img/PVrEcUA95A-800.jpeg 800w, https://justinpinkney.com/img/PVrEcUA95A-1024.jpeg 1024w, https://justinpinkney.com/img/PVrEcUA95A-1600.jpeg 1600w, https://justinpinkney.com/img/PVrEcUA95A-1776.jpeg 1776w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/PVrEcUA95A-200.jpeg&quot; width=&quot;1776&quot; height=&quot;872&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
</content>
	</entry>
	
	<entry>
		<title>Side notes and a Gallery</title>
		<link href="https://justinpinkney.com/blog/2023/tending-the-site/"/>
		<updated>2023-12-11T00:00:00Z</updated>
		<id>https://justinpinkney.com/blog/2023/tending-the-site/</id>
		<content type="html">&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2023/tending-the-site/crt_gardener.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/oc7VBP9cGU-200.webp 200w, https://justinpinkney.com/img/oc7VBP9cGU-320.webp 320w, https://justinpinkney.com/img/oc7VBP9cGU-500.webp 500w, https://justinpinkney.com/img/oc7VBP9cGU-800.webp 800w, https://justinpinkney.com/img/oc7VBP9cGU-1024.webp 1024w, https://justinpinkney.com/img/oc7VBP9cGU-1456.webp 1456w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/oc7VBP9cGU-200.jpeg 200w, https://justinpinkney.com/img/oc7VBP9cGU-320.jpeg 320w, https://justinpinkney.com/img/oc7VBP9cGU-500.jpeg 500w, https://justinpinkney.com/img/oc7VBP9cGU-800.jpeg 800w, https://justinpinkney.com/img/oc7VBP9cGU-1024.jpeg 1024w, https://justinpinkney.com/img/oc7VBP9cGU-1456.jpeg 1456w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;A man tending his garden of crt&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/oc7VBP9cGU-200.jpeg&quot; width=&quot;1456&quot; height=&quot;816&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&quot;sidenotes&quot; tabindex=&quot;-1&quot;&gt;Sidenotes &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2023/tending-the-site/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;I just added side notes to my blog, thanks to my move to &lt;a href=&quot;https://www.11ty.dev/&quot;&gt;11ty&lt;/a&gt; (from the bloated mess that was Gatsby) and the wonderful sites and articles people have shared on the small web (as well as the power of &lt;code&gt;view source&lt;/code&gt;) it&#39;s been easy and a pleasure!&lt;/p&gt;
&lt;p&gt;I always want to add margin/side/foot notes to my posts; it&#39;s probably a sign of a bad, excessively detail-oriented writing style. Because scrolling up and down is bad, as is hovering, side notes are probably the best solution; the main tricky bit is figuring out what to do on mobile.&lt;/p&gt;
&lt;p&gt;Here are some nice sites I looked at for inspiration.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://every-layout.dev/layouts/sidebar/&quot;&gt;every layout was useful for making a sidebar&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://gwern.net/sidenote#tufte-css&quot;&gt;useful review of methods&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://danilafe.com/blog/sidenotes/&quot;&gt;a nice implementation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://omar.website/posts/against-recognition/&quot;&gt;a nice example of side notes&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In the end I went for a tap to hide/show the side notes on mobile/narrow screens. And a little visual wiggle on hover on desktop to make clear what the side note number corresponds to.&lt;/p&gt;
&lt;p align=&quot;center&quot;&gt;
&lt;video controls=&quot;&quot; src=&quot;https://justinpinkney.com/blog/2023/tending-the-site/sidenote.mp4&quot; loop=&quot;true&quot;&gt;&lt;/video&gt;
&lt;/p&gt;
&lt;h2 id=&quot;gallery-page&quot; tabindex=&quot;-1&quot;&gt;Gallery page &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2023/tending-the-site/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;I also added a little Gallery page of all the images on my site; given this is a pretty image-focussed blog it&#39;s nice to have them all in one place.&lt;/p&gt;
&lt;p&gt;A &lt;a href=&quot;https://github.com/11ty/eleventy/issues/440&quot;&gt;helpful comment&lt;/a&gt; in a GitHub issue pointed me in the right direction to find all the relevant images in my site and then iterate over them to display in a single page (which you can &lt;a href=&quot;https://justinpinkney.com/gallery&quot;&gt;see here&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;I still have a few outstanding things I&#39;d like to improve:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Better layout of the images (hopefully still sticking to plain old css and html if possible)&lt;/li&gt;
&lt;li&gt;&lt;s&gt;Links back from the images to the blog post where they appear (this is a bit tricky as sometimes I organise the images in subdirectories)&lt;/s&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;UPDATE:&lt;/strong&gt; With a little help from GitHub Copilot it was actually pretty easy to add a link back to the original blog posts from the images. I find them by looking up the file tree for an &lt;code&gt;index.md&lt;/code&gt; file that indicates a post, and from that make a templated page I can link to.&lt;/p&gt;
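&lt;p&gt;The lookup logic is simple (the site itself is built with 11ty in JavaScript, so this is just the idea sketched in Python):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Sketch of the gallery lookup: walk up from an image file until we
# find the index.md that marks the blog post the image belongs to.
from pathlib import Path

def find_post_dir(image_path):
    for parent in Path(image_path).parents:
        if (parent / &quot;index.md&quot;).exists():
            return parent  # post directory to build the link from
    return None  # image doesn&#39;t belong to a post
&lt;/code&gt;&lt;/pre&gt;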
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2023/tending-the-site/gallery-item.jpeg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/77tjlOTK0Y-200.webp 200w, https://justinpinkney.com/img/77tjlOTK0Y-320.webp 320w, https://justinpinkney.com/img/77tjlOTK0Y-500.webp 500w, https://justinpinkney.com/img/77tjlOTK0Y-800.webp 800w, https://justinpinkney.com/img/77tjlOTK0Y-978.webp 978w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/77tjlOTK0Y-200.jpeg 200w, https://justinpinkney.com/img/77tjlOTK0Y-320.jpeg 320w, https://justinpinkney.com/img/77tjlOTK0Y-500.jpeg 500w, https://justinpinkney.com/img/77tjlOTK0Y-800.jpeg 800w, https://justinpinkney.com/img/77tjlOTK0Y-978.jpeg 978w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;A screenshot of the gallery item page&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/77tjlOTK0Y-200.jpeg&quot; width=&quot;978&quot; height=&quot;1280&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Now the CSS is a bit messy and funky in various cases, but that&#39;s the fun of having a website of your own to play with!&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2023/tending-the-site/gallery.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/9iZEJYCWZE-200.webp 200w, https://justinpinkney.com/img/9iZEJYCWZE-320.webp 320w, https://justinpinkney.com/img/9iZEJYCWZE-500.webp 500w, https://justinpinkney.com/img/9iZEJYCWZE-591.webp 591w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/9iZEJYCWZE-200.jpeg 200w, https://justinpinkney.com/img/9iZEJYCWZE-320.jpeg 320w, https://justinpinkney.com/img/9iZEJYCWZE-500.jpeg 500w, https://justinpinkney.com/img/9iZEJYCWZE-591.jpeg 591w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;gallery page&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/9iZEJYCWZE-200.jpeg&quot; width=&quot;591&quot; height=&quot;1280&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
</content>
	</entry>
	
	<entry>
		<title>Searching for Generative Train Journeys</title>
		<link href="https://justinpinkney.com/blog/2023/latent-train/"/>
		<updated>2023-11-05T00:00:00Z</updated>
		<id>https://justinpinkney.com/blog/2023/latent-train/</id>
<content type="html">&lt;p&gt;I&#39;ve long had an obsession with trying to create a purely generated train journey with machine learning tools. Cruising through the latent space, inside the internal world of a model, is super appealing, and this post documents my years-long efforts to do so using various generative models along the way.&lt;/p&gt;
&lt;div id=&quot;punchline&quot;&gt;
In the end the rise of diffusion models made what I was trying to do possible, and if you want to see the end result, the thing I&#39;d been striving for all these years, here&#39;s the final video:
&lt;/div&gt;
&lt;p&gt;https://youtu.be/_mt8Sbm-7CE?si=VheKzouy54rZeSXN&lt;/p&gt;
&lt;h2 id=&quot;the-stylegan-era&quot; tabindex=&quot;-1&quot;&gt;The StyleGAN era &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2023/latent-train/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As many of the &lt;a href=&quot;https://justinpinkney.com/tags/stylegan/&quot;&gt;posts on this blog&lt;/a&gt; show I spent a lot of time playing with StyleGAN, and some of the very first StyleGAN models I trained myself were trying to generate non-existent train views. It turns out there are quite a lot of high quality YouTube videos of train rides taken from the cab to use as training data. Being the original StyleGAN, and some of the first GAN models I&#39;d ever trained, the results weren&#39;t great, but they were really exciting to me. Unfortunately they couldn&#39;t take me anywhere, but the shifting landscapes and scenery in interpolation videos were still mesmerising.&lt;span class=&quot;sidenote&quot;&gt;
&lt;label class=&quot;sidenote-label&quot; for=&quot;side-note-0&quot;&gt;&lt;sup&gt;[0]&lt;/sup&gt;&lt;/label&gt;
&lt;input class=&quot;sidenote-checkbox&quot; type=&quot;checkbox&quot; id=&quot;side-note-0&quot;&gt;
&lt;span class=&quot;sidenote-content&quot;&gt;0. At the time I tried to fake the forward motion effect with some simple depth estimation and warping, it didn&#39;t look terrible but was pretty limited. &lt;video autoplay=&quot;&quot; loop=&quot;&quot; src=&quot;https://justinpinkney.com/blog/2023/latent-train/gan-warp.mp4&quot;&gt;&lt;/video&gt;&lt;/span&gt;
&lt;/span&gt;.&lt;/p&gt;
&lt;p align=&quot;center&quot;&gt;
&lt;video controls=&quot;&quot; src=&quot;https://justinpinkney.com/blog/2023/latent-train/stylegan-train-interp-crop.mp4&quot;&gt;&lt;/video&gt;
&lt;/p&gt;
&lt;h2 id=&quot;trying-to-move-forward-pixel2style2pixel&quot; tabindex=&quot;-1&quot;&gt;Trying to move forward (pixel2style2pixel) &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2023/latent-train/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Once I had a trained StyleGAN model I wanted to try travelling with it. There are some obvious approaches that I didn&#39;t try, like finding a latent direction for &amp;quot;moving forward&amp;quot;; the method I took was instead inspired by the fairly convincing appearance of simply zooming into the image with some depth-aware warping. If I could take an image generated by the model, warp it slightly to look like we&#39;re moving forward, then re-encode that image into the StyleGAN latent space, I could recover the &amp;quot;next&amp;quot; image in the imagined journey, and keep repeating the process to create a video. To embed images into the latent space I trained a pixel2style2pixel&lt;span class=&quot;sidenote&quot;&gt;
&lt;label class=&quot;sidenote-label&quot; for=&quot;side-note-1&quot;&gt;&lt;sup&gt;[1]&lt;/sup&gt;&lt;/label&gt;
&lt;input class=&quot;sidenote-checkbox&quot; type=&quot;checkbox&quot; id=&quot;side-note-1&quot;&gt;
&lt;span class=&quot;sidenote-content&quot;&gt;1. E. Richardson et al., &lt;a href=&quot;https://arxiv.org/abs/2008.00951&quot;&gt;Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation&lt;/a&gt;, Jun. 2021.&lt;/span&gt;
&lt;/span&gt; model on my train StyleGAN, then I generated an image, warped it, encoded the warped version in the StyleGAN latent space and generated the new image. Unfortunately my train could never move forward with this method!&lt;/p&gt;
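&lt;p&gt;The loop, in sketch form (all the helper names are stand-ins for the StyleGAN generator, the depth-based warp, and the pixel2style2pixel encoder):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Hypothetical sketch of the generate-warp-re-encode loop.
frames = []
latent = sample_random_latent()         # starting point in latent space
for step in range(num_frames):
    frame = stylegan_generate(latent)   # current imagined view
    frames.append(frame)
    warped = depth_warp_forward(frame)  # zoom/warp to fake a step forwards
    latent = psp_encode(warped)         # project back into the latent space
&lt;/code&gt;&lt;/pre&gt;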
&lt;p align=&quot;center&quot;&gt;
&lt;video controls=&quot;&quot; src=&quot;https://justinpinkney.com/blog/2023/latent-train/gan-no-move.mp4&quot;&gt;&lt;/video&gt;
&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;
Why won&#39;t my imaginary train move forward!?
&lt;/div&gt;
&lt;p&gt;Now that I have a bit more experience and intuition about how StyleGAN works, it seems pretty clear that the injected noise maps are what&#39;s controlling the positions of small details like the poles and wires visible in the frame. As these noise maps are fixed during my video sequence, nothing really seems to change or move forward.&lt;/p&gt;
&lt;h2 id=&quot;swapping-autoencoder&quot; tabindex=&quot;-1&quot;&gt;Swapping autoencoder &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2023/latent-train/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;After that failure I gave up on StyleGAN&lt;span class=&quot;sidenote&quot;&gt;
&lt;label class=&quot;sidenote-label&quot; for=&quot;side-note-2&quot;&gt;&lt;sup&gt;[2]&lt;/sup&gt;&lt;/label&gt;
&lt;input class=&quot;sidenote-checkbox&quot; type=&quot;checkbox&quot; id=&quot;side-note-2&quot;&gt;
&lt;span class=&quot;sidenote-content&quot;&gt;2. I did try one further StyleGAN based method, which was generating infinite scrolling videos using Aligning Latent and Image Spaces to Connect the Unconnectable (&lt;a href=&quot;https://github.com/universome/alis&quot;&gt;ALIS&lt;/a&gt;); unfortunately, without any parallax the videos don&#39;t give the effect I wanted. &lt;video controls=&quot;&quot; loop=&quot;&quot; src=&quot;https://justinpinkney.com/blog/2023/latent-train/alis.mp4&quot;&gt;&lt;/video&gt;&lt;/span&gt;
&lt;/span&gt;, and instead of trying to generate a fully synthetic video, realised I could use a different model to fake the effect with a real video as a starting point.&lt;/p&gt;
&lt;p&gt;A highly underrated model of the GAN era was &lt;a href=&quot;https://taesung.me/SwappingAutoencoder/&quot;&gt;Swapping Autoencoder&lt;/a&gt;; it was trained to disentangle style from content, so you could take two images, take the content/layout from one, and apply the style of the other. Like most GANs it worked best on a limited domain, and the released landscapes model was amazing&lt;span class=&quot;sidenote&quot;&gt;
&lt;label class=&quot;sidenote-label&quot; for=&quot;side-note-3&quot;&gt;&lt;sup&gt;[3]&lt;/sup&gt;&lt;/label&gt;
&lt;input class=&quot;sidenote-checkbox&quot; type=&quot;checkbox&quot; id=&quot;side-note-3&quot;&gt;
&lt;span class=&quot;sidenote-content&quot;&gt;3. T. Park et al., &lt;a href=&quot;https://arxiv.org/abs/2007.00653&quot;&gt;Swapping Autoencoder for Deep Image Manipulation&lt;/a&gt;, 2020.&lt;/span&gt;
&lt;/span&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2023/latent-train/swapping.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/REzWGzKTxK-200.webp 200w, https://justinpinkney.com/img/REzWGzKTxK-320.webp 320w, https://justinpinkney.com/img/REzWGzKTxK-500.webp 500w, https://justinpinkney.com/img/REzWGzKTxK-800.webp 800w, https://justinpinkney.com/img/REzWGzKTxK-1024.webp 1024w, https://justinpinkney.com/img/REzWGzKTxK-1600.webp 1600w, https://justinpinkney.com/img/REzWGzKTxK-2550.webp 2550w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/REzWGzKTxK-200.jpeg 200w, https://justinpinkney.com/img/REzWGzKTxK-320.jpeg 320w, https://justinpinkney.com/img/REzWGzKTxK-500.jpeg 500w, https://justinpinkney.com/img/REzWGzKTxK-800.jpeg 800w, https://justinpinkney.com/img/REzWGzKTxK-1024.jpeg 1024w, https://justinpinkney.com/img/REzWGzKTxK-1600.jpeg 1600w, https://justinpinkney.com/img/REzWGzKTxK-2550.jpeg 2550w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;Example images from the swapping autoencoder paper&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/REzWGzKTxK-200.jpeg&quot; width=&quot;2550&quot; height=&quot;1295&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;
Examples of Swapping Autoencoder outputs from the paper: &quot;Swapping Autoencoder for Deep Image Manipulation&quot;.
&lt;/div&gt;
&lt;p&gt;I realised I could fake the effect I wanted by taking a real life video of the view from a train ride and applying the style of random other landscape photos, melting between them as the train travelled. It didn&#39;t quite achieve my goal of travelling through fully imagined landscapes, but the effect was great.&lt;/p&gt;
&lt;p&gt;https://youtu.be/ITrWUeHqwu4?si=ggp1KoFoimbCmlvt&lt;/p&gt;
&lt;h2 id=&quot;next-frame-prediction-with-stable-diffusion&quot; tabindex=&quot;-1&quot;&gt;Next frame prediction with Stable Diffusion &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2023/latent-train/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Once Stable Diffusion came out and I had &lt;a href=&quot;https://justinpinkney.com/blog/2022/pokemon-generator/&quot;&gt;played with fine-tuning&lt;/a&gt; &lt;a href=&quot;https://justinpinkney.com/blog/2023/stable-diffusion-image-variations/&quot;&gt;it a little&lt;/a&gt;, I came back to my usual quest. This time I took some side-facing train video of my own and tried training a next frame prediction model. If you&#39;re not familiar with Next Frame Prediction (NFP), it&#39;s essentially training a network to take in one image and predict the image that would be the next frame in a video. You can see a good tutorial on how to train your own by &lt;a href=&quot;https://youtu.be/Gry1J3JhTP0?si=P-OHVjz1gLQTs-0z&quot;&gt;Derrick Schultz&lt;/a&gt;. Using NFP models to make train videos is very old (by machine learning standards), as seen in this amazing (and very inspirational) &lt;a href=&quot;https://magenta.tensorflow.org/nfp_p2p&quot;&gt;work by Damien Henry&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2023/latent-train/schematic-nfp.jpeg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/RyZ7gEQVkR-200.webp 200w, https://justinpinkney.com/img/RyZ7gEQVkR-320.webp 320w, https://justinpinkney.com/img/RyZ7gEQVkR-500.webp 500w, https://justinpinkney.com/img/RyZ7gEQVkR-800.webp 800w, https://justinpinkney.com/img/RyZ7gEQVkR-1024.webp 1024w, https://justinpinkney.com/img/RyZ7gEQVkR-1600.webp 1600w, https://justinpinkney.com/img/RyZ7gEQVkR-2363.webp 2363w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/RyZ7gEQVkR-200.jpeg 200w, https://justinpinkney.com/img/RyZ7gEQVkR-320.jpeg 320w, https://justinpinkney.com/img/RyZ7gEQVkR-500.jpeg 500w, https://justinpinkney.com/img/RyZ7gEQVkR-800.jpeg 800w, https://justinpinkney.com/img/RyZ7gEQVkR-1024.jpeg 1024w, https://justinpinkney.com/img/RyZ7gEQVkR-1600.jpeg 1600w, https://justinpinkney.com/img/RyZ7gEQVkR-2363.jpeg 2363w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;A schematic of the NFP version of Stable Diffusion&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/RyZ7gEQVkR-200.jpeg&quot; width=&quot;2363&quot; height=&quot;901&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;
A schematic of the NFP version of Stable Diffusion.
&lt;/div&gt;
&lt;p&gt;After a while of training to predict the next frame from my tiny train dataset, the results were almost too good. Recursively predicting the next frame to make a video actually turns out so realistic it&#39;s kind of boring.&lt;/p&gt;
&lt;p&gt;https://youtu.be/qGqPJSCuxPw?si=UwFWlDmcMCKQa0vu&lt;/p&gt;
&lt;br&gt;
&lt;div class=&quot;caption&quot;&gt;
This video is made by giving the NFP Stable Diffusion model a single starting frame, then continuously getting it to predict a new frame and feeding that frame back in to create a video.
&lt;/div&gt;
&lt;p&gt;To make it more interesting I needed to add some more levers of control. To do this I repurposed &lt;a href=&quot;https://justinpinkney.com/blog/2023/stable-diffusion-image-variations/&quot;&gt;my image variations version of stable diffusion&lt;/a&gt;, and again fine-tuned it on the next frame prediction task. But this time, instead of only receiving the previous frame as input, it also gets the CLIP image embedding of that frame. At inference time this lets you change the image embedding to anything you want, which controls much of the style of the image, whereas the previous frame is most important for the global structure. This gives dream-like videos of travelling sideways, with shifting styles, colours and content depending on the CLIP embeddings used at each frame.&lt;/p&gt;
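&lt;p&gt;As a sketch (with hypothetical helper names, not the actual inference script), the generation loop looks roughly like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Hypothetical sketch of the recursive generation loop; &#39;nfp_model&#39; stands in
# for the fine-tuned Stable Diffusion model and &#39;clip_embed&#39; for a function
# returning a CLIP image embedding.
def generate_video(nfp_model, clip_embed, first_frame, style_images, n_frames=240):
    frames = [first_frame]
    for i in range(n_frames - 1):
        prev = frames[-1]  # the previous frame sets the global structure
        # The image embedding sets the style and can be swapped at any frame.
        style = clip_embed(style_images[i % len(style_images)])
        frames.append(nfp_model(prev, conditioning=style))
    return frames
&lt;/code&gt;&lt;/pre&gt;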
&lt;p&gt;https://youtu.be/8VLWoO232w8?si=PAEqlOkvCVhuhgVp&lt;/p&gt;
&lt;h2 id=&quot;the-final-cut&quot; tabindex=&quot;-1&quot;&gt;The final cut &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2023/latent-train/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Putting all the pieces together, I generated long sequences of frames where I would change the CLIP embeddings to those of various images and artworks. I sifted through for the sections that worked best together and combined them with a soundtrack assembled from various free audio samples of train noises.&lt;/p&gt;
&lt;p&gt;You can jump to the final video &lt;a href=&quot;https://justinpinkney.com/blog/2023/latent-train/&quot;&gt;here&lt;/a&gt;, otherwise there are also some nice still frames that come out of this technique:&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2023/latent-train/couple.jpeg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/gQ6DWy8bxq-200.webp 200w, https://justinpinkney.com/img/gQ6DWy8bxq-320.webp 320w, https://justinpinkney.com/img/gQ6DWy8bxq-500.webp 500w, https://justinpinkney.com/img/gQ6DWy8bxq-800.webp 800w, https://justinpinkney.com/img/gQ6DWy8bxq-1024.webp 1024w, https://justinpinkney.com/img/gQ6DWy8bxq-1280.webp 1280w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/gQ6DWy8bxq-200.jpeg 200w, https://justinpinkney.com/img/gQ6DWy8bxq-320.jpeg 320w, https://justinpinkney.com/img/gQ6DWy8bxq-500.jpeg 500w, https://justinpinkney.com/img/gQ6DWy8bxq-800.jpeg 800w, https://justinpinkney.com/img/gQ6DWy8bxq-1024.jpeg 1024w, https://justinpinkney.com/img/gQ6DWy8bxq-1280.jpeg 1280w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/gQ6DWy8bxq-200.jpeg&quot; width=&quot;1280&quot; height=&quot;640&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2023/latent-train/stills/000431-up.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/8KedZIfoAV-200.webp 200w, https://justinpinkney.com/img/8KedZIfoAV-320.webp 320w, https://justinpinkney.com/img/8KedZIfoAV-500.webp 500w, https://justinpinkney.com/img/8KedZIfoAV-800.webp 800w, https://justinpinkney.com/img/8KedZIfoAV-1024.webp 1024w, https://justinpinkney.com/img/8KedZIfoAV-1600.webp 1600w, https://justinpinkney.com/img/8KedZIfoAV-3072.webp 3072w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/8KedZIfoAV-200.jpeg 200w, https://justinpinkney.com/img/8KedZIfoAV-320.jpeg 320w, https://justinpinkney.com/img/8KedZIfoAV-500.jpeg 500w, https://justinpinkney.com/img/8KedZIfoAV-800.jpeg 800w, https://justinpinkney.com/img/8KedZIfoAV-1024.jpeg 1024w, https://justinpinkney.com/img/8KedZIfoAV-1600.jpeg 1600w, https://justinpinkney.com/img/8KedZIfoAV-3072.jpeg 3072w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/8KedZIfoAV-200.jpeg&quot; width=&quot;3072&quot; height=&quot;1536&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2023/latent-train/stills/city-up-photo2-resize.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/VPUnUIjzPX-200.webp 200w, https://justinpinkney.com/img/VPUnUIjzPX-320.webp 320w, https://justinpinkney.com/img/VPUnUIjzPX-500.webp 500w, https://justinpinkney.com/img/VPUnUIjzPX-800.webp 800w, https://justinpinkney.com/img/VPUnUIjzPX-1024.webp 1024w, https://justinpinkney.com/img/VPUnUIjzPX-1600.webp 1600w, https://justinpinkney.com/img/VPUnUIjzPX-2496.webp 2496w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/VPUnUIjzPX-200.jpeg 200w, https://justinpinkney.com/img/VPUnUIjzPX-320.jpeg 320w, https://justinpinkney.com/img/VPUnUIjzPX-500.jpeg 500w, https://justinpinkney.com/img/VPUnUIjzPX-800.jpeg 800w, https://justinpinkney.com/img/VPUnUIjzPX-1024.jpeg 1024w, https://justinpinkney.com/img/VPUnUIjzPX-1600.jpeg 1600w, https://justinpinkney.com/img/VPUnUIjzPX-2496.jpeg 2496w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/VPUnUIjzPX-200.jpeg&quot; width=&quot;2496&quot; height=&quot;1331&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2023/latent-train/stills/view-up.webp&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/dF8LX3rnJg-200.webp&quot; width=&quot;1280&quot; height=&quot;720&quot; srcset=&quot;https://justinpinkney.com/img/dF8LX3rnJg-200.webp 200w, https://justinpinkney.com/img/dF8LX3rnJg-320.webp 320w, https://justinpinkney.com/img/dF8LX3rnJg-500.webp 500w, https://justinpinkney.com/img/dF8LX3rnJg-800.webp 800w, https://justinpinkney.com/img/dF8LX3rnJg-1024.webp 1024w, https://justinpinkney.com/img/dF8LX3rnJg-1280.webp 1280w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;/a&gt;&lt;/p&gt;
</content>
	</entry>
	
	<entry>
		<title>Stable Diffusion Image Variations</title>
		<link href="https://justinpinkney.com/blog/2023/stable-diffusion-image-variations/"/>
		<updated>2023-08-27T00:00:00Z</updated>
		<id>https://justinpinkney.com/blog/2023/stable-diffusion-image-variations/</id>
		<content type="html">&lt;script type=&quot;module&quot;&gt;
import PhotoSwipeLightbox from &#39;/photoswipe/photoswipe-lightbox.esm.js&#39;;
const lightbox = new PhotoSwipeLightbox({
  gallery: &#39;#gallery--getting-started&#39;,
  children: &#39;a&#39;,
  pswpModule: () =&gt; import(&#39;/photoswipe/photoswipe.esm.js&#39;)
});
lightbox.init();
&lt;/script&gt;
&lt;link rel=&quot;stylesheet&quot; href=&quot;https://justinpinkney.com/photoswipe/photoswipe.css&quot;&gt;
&lt;a href=&quot;https://justinpinkney.com/blog/2023/stable-diffusion-image-variations/im-vars-banner.jpeg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/iB5K39aYpv-200.webp 200w, https://justinpinkney.com/img/iB5K39aYpv-320.webp 320w, https://justinpinkney.com/img/iB5K39aYpv-500.webp 500w, https://justinpinkney.com/img/iB5K39aYpv-800.webp 800w, https://justinpinkney.com/img/iB5K39aYpv-1024.webp 1024w, https://justinpinkney.com/img/iB5K39aYpv-1600.webp 1600w, https://justinpinkney.com/img/iB5K39aYpv-1742.webp 1742w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/iB5K39aYpv-200.jpeg 200w, https://justinpinkney.com/img/iB5K39aYpv-320.jpeg 320w, https://justinpinkney.com/img/iB5K39aYpv-500.jpeg 500w, https://justinpinkney.com/img/iB5K39aYpv-800.jpeg 800w, https://justinpinkney.com/img/iB5K39aYpv-1024.jpeg 1024w, https://justinpinkney.com/img/iB5K39aYpv-1600.jpeg 1600w, https://justinpinkney.com/img/iB5K39aYpv-1742.jpeg 1742w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;Girl with a pearl earring and image variations&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/iB5K39aYpv-200.jpeg&quot; width=&quot;1742&quot; height=&quot;1006&quot;&gt;&lt;/picture&gt;&lt;/a&gt;
&lt;p&gt;This post is time-travelling a little. For some reason I never blogged about the image variations Stable Diffusion model I trained, so this is a bit of a recap on the how and why of the image variation model, and a place to collect some links and experiments using it. The model itself is a little old now and there are other, similar models which might be better, like &lt;a href=&quot;https://github.com/kakaobrain/karlo&quot;&gt;Karlo v1.0&lt;/a&gt; and &lt;a href=&quot;https://github.com/ai-forever/Kandinsky-2&quot;&gt;Kandinsky&lt;/a&gt; (probably not the &lt;a href=&quot;https://github.com/Stability-AI/stablediffusion/blob/main/doc/UNCLIP.MD&quot;&gt;Stable Unclip&lt;/a&gt; though&lt;span class=&quot;sidenote&quot;&gt;
&lt;label class=&quot;sidenote-label&quot; for=&quot;side-note-0&quot;&gt;&lt;sup&gt;[0]&lt;/sup&gt;&lt;/label&gt;
&lt;input class=&quot;sidenote-checkbox&quot; type=&quot;checkbox&quot; id=&quot;side-note-0&quot;&gt;
&lt;span class=&quot;sidenote-content&quot;&gt;0. For some reason Stable Unclip is actually terrible quality. This is an example of the output when using my typical Ghibli house image for testing (hint: it&#39;s really bad) &lt;img src=&quot;https://pbs.twimg.com/media/FstN1GJWIAEVprZ?format=jpg&amp;name=medium&quot;&gt;&lt;/span&gt;
&lt;/span&gt;.). But in line with &lt;a href=&quot;https://justinpinkney.com/blog/2023/the-other-web&quot;&gt;my drive to try and own my content&lt;/a&gt; and make sure it exists beyond walled gardens like Twitter, here&#39;s a post mostly cobbled together from other sources.&lt;/p&gt;
&lt;p&gt;As soon as the original Stable Diffusion was released I wanted to see how I could tweak the model beyond &lt;a href=&quot;https://justinpinkney.com/blog/2022/pokemon-generator&quot;&gt;regular fine tuning&lt;/a&gt;. The image variations shown in the original DALL-E 2 paper were always really compelling, and this model presented the first chance to actually reproduce those (without lots of training resources). The key part would be to somehow pass images encoded as CLIP embeddings to the model rather than text embeddings.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2023/stable-diffusion-image-variations/schematic-sd-small.png&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/ViFpVPgjGs-200.webp 200w, https://justinpinkney.com/img/ViFpVPgjGs-320.webp 320w, https://justinpinkney.com/img/ViFpVPgjGs-500.webp 500w, https://justinpinkney.com/img/ViFpVPgjGs-800.webp 800w, https://justinpinkney.com/img/ViFpVPgjGs-1024.webp 1024w, https://justinpinkney.com/img/ViFpVPgjGs-1598.webp 1598w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/png&quot; srcset=&quot;https://justinpinkney.com/img/ViFpVPgjGs-200.png 200w, https://justinpinkney.com/img/ViFpVPgjGs-320.png 320w, https://justinpinkney.com/img/ViFpVPgjGs-500.png 500w, https://justinpinkney.com/img/ViFpVPgjGs-800.png 800w, https://justinpinkney.com/img/ViFpVPgjGs-1024.png 1024w, https://justinpinkney.com/img/ViFpVPgjGs-1598.png 1598w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;schematic of stable diffusion&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/ViFpVPgjGs-200.png&quot; width=&quot;1598&quot; height=&quot;622&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;
Schematic of how the normal Stable Diffusion uses a set of token embeddings from the CLIP text encoder to condition the diffusion model.
&lt;/div&gt;
&lt;p&gt;Originally I misunderstood the architecture of Stable Diffusion: I assumed it took the final CLIP shared-latent-space text embedding, and thought I might just be able to swap this out for an image embedding. But actually Stable Diffusion takes the full sequence of token embeddings, so I couldn&#39;t simply swap them out&lt;span class=&quot;sidenote&quot;&gt;
&lt;label class=&quot;sidenote-label&quot; for=&quot;side-note-1&quot;&gt;&lt;sup&gt;[1]&lt;/sup&gt;&lt;/label&gt;
&lt;input class=&quot;sidenote-checkbox&quot; type=&quot;checkbox&quot; id=&quot;side-note-1&quot;&gt;
&lt;span class=&quot;sidenote-content&quot;&gt;1. I did do some experiments trying to generate these word embeddings from image embeddings but they never panned out&lt;/span&gt;
&lt;/span&gt;.&lt;/p&gt;
&lt;p&gt;So I decided to swap out the conditioning altogether and fine-tune the model to accept projected image embeddings from CLIP. Instead of using the text encoder to make a set of (batch_size, 77, 768) dimension CLIP token embeddings, I used the image encoder (plus final projection) to produce a (batch_size, 1, 768) size image embedding. Because this is just a shorter sequence, it&#39;s easy to plumb into the existing cross-attention layers.&lt;/p&gt;
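&lt;p&gt;As a rough illustration (not the exact training code), producing that conditioning tensor with the transformers library might look like the following, assuming the standard OpenAI CLIP ViT-L/14 checkpoint that Stable Diffusion v1 uses:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# A minimal sketch: turn an image into a (batch_size, 1, 768) conditioning
# sequence using the CLIP image encoder plus its final projection.
import torch
from PIL import Image
from transformers import CLIPImageProcessor, CLIPVisionModelWithProjection

processor = CLIPImageProcessor.from_pretrained(&#39;openai/clip-vit-large-patch14&#39;)
encoder = CLIPVisionModelWithProjection.from_pretrained(&#39;openai/clip-vit-large-patch14&#39;)

image = Image.open(&#39;input.jpg&#39;)
pixels = processor(images=image, return_tensors=&#39;pt&#39;).pixel_values

with torch.no_grad():
    image_embeds = encoder(pixels).image_embeds  # (batch_size, 768)

conditioning = image_embeds.unsqueeze(1)  # (batch_size, 1, 768)
# This short sequence replaces the (batch_size, 77, 768) text conditioning
# normally fed to the cross-attention layers.
&lt;/code&gt;&lt;/pre&gt;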
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2023/stable-diffusion-image-variations/schematic-imvar.png&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/uPaS9A9hvv-200.webp 200w, https://justinpinkney.com/img/uPaS9A9hvv-320.webp 320w, https://justinpinkney.com/img/uPaS9A9hvv-500.webp 500w, https://justinpinkney.com/img/uPaS9A9hvv-800.webp 800w, https://justinpinkney.com/img/uPaS9A9hvv-1024.webp 1024w, https://justinpinkney.com/img/uPaS9A9hvv-1600.webp 1600w, https://justinpinkney.com/img/uPaS9A9hvv-2072.webp 2072w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/png&quot; srcset=&quot;https://justinpinkney.com/img/uPaS9A9hvv-200.png 200w, https://justinpinkney.com/img/uPaS9A9hvv-320.png 320w, https://justinpinkney.com/img/uPaS9A9hvv-500.png 500w, https://justinpinkney.com/img/uPaS9A9hvv-800.png 800w, https://justinpinkney.com/img/uPaS9A9hvv-1024.png 1024w, https://justinpinkney.com/img/uPaS9A9hvv-1600.png 1600w, https://justinpinkney.com/img/uPaS9A9hvv-2072.png 2072w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;schematic of stable diffusion image variations&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/uPaS9A9hvv-200.png&quot; width=&quot;2072&quot; height=&quot;862&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;
Schematic of Image Variation model showing the replacement of the CLIP text encoder with the CLIP image encoder, conditioning on a single image embedding.
&lt;/div&gt;
&lt;p&gt;With some simple tweaking of the original training repo I could fine-tune the model to accept the new conditioning. There are more details on the training in the &lt;a href=&quot;https://huggingface.co/lambdalabs/sd-image-variations-diffusers&quot;&gt;model card&lt;/a&gt;, but here&#39;s a quick summary (a rough code sketch of the two stages follows the list)&lt;span class=&quot;sidenote&quot;&gt;
&lt;label class=&quot;sidenote-label&quot; for=&quot;side-note-2&quot;&gt;&lt;sup&gt;[2]&lt;/sup&gt;&lt;/label&gt;
&lt;input class=&quot;sidenote-checkbox&quot; type=&quot;checkbox&quot; id=&quot;side-note-2&quot;&gt;
&lt;span class=&quot;sidenote-content&quot;&gt;2. This is actually the training procedure of the v2 model which was trained for longer and more carefully, giving better results&lt;/span&gt;
&lt;/span&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Fine-tuned from the Stable Diffusion v1-4 checkpoint&lt;/li&gt;
&lt;li&gt;Trained on LAION improved aesthetics 6plus.&lt;/li&gt;
&lt;li&gt;Trained on 8 x A100-40GB GPUs&lt;/li&gt;
&lt;li&gt;Stage 1 - Fine tune only CrossAttention layer weights
&lt;ul&gt;
&lt;li&gt;Steps: 46,000&lt;/li&gt;
&lt;li&gt;Batch: batch size=4, GPUs=8, Gradient Accumulations=4. Total batch size=128&lt;/li&gt;
&lt;li&gt;Learning rate: warmup to 1e-5 for 10,000 steps and then kept constant, AdamW optimiser&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Stage 2 - Fine tune the whole Unet
&lt;ul&gt;
&lt;li&gt;Steps: 50,000&lt;/li&gt;
&lt;li&gt;Batch: batch size=4, GPUs=8, Gradient Accumulations=5. Total batch size=160&lt;/li&gt;
&lt;li&gt;Learning rate: warmup to 1e-5 for 5,000 steps and then kept constant, AdamW optimiser&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
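&lt;p&gt;The actual training used the original Stable Diffusion training repo, so take this diffusers-style version of the stage-wise freezing as an approximation; in diffusers UNets the cross-attention layers are the ones named &lt;code&gt;attn2&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Approximate sketch of the two-stage fine-tune with a diffusers-style UNet.
from diffusers import UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained(
    &#39;CompVis/stable-diffusion-v1-4&#39;, subfolder=&#39;unet&#39;)

# Stage 1: train only the cross-attention weights (named &#39;attn2&#39; in diffusers).
for name, param in unet.named_parameters():
    param.requires_grad = &#39;attn2&#39; in name

# ...train for 46,000 steps at total batch size 128...

# Stage 2: unfreeze everything and fine-tune the whole UNet.
for param in unet.parameters():
    param.requires_grad = True

# ...train for 50,000 steps at total batch size 160...
&lt;/code&gt;&lt;/pre&gt;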
&lt;h2 id=&quot;results&quot; tabindex=&quot;-1&quot;&gt;Results &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2023/stable-diffusion-image-variations/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The very first results I saw, after letting it run for a decent amount of time, looked like this:&lt;/p&gt;
&lt;div class=&quot;pswp-gallery&quot; id=&quot;gallery--getting-started&quot;&gt;
    &lt;div class=&quot;flex-container&quot;&gt;
        &lt;a href=&quot;https://justinpinkney.com/blog/2023/stable-diffusion-image-variations/first-training/training_1.jpg&quot; data-pswp-width=&quot;2060&quot; data-pswp-height=&quot;1036&quot; target=&quot;_blank&quot;&gt;
            &lt;img src=&quot;https://justinpinkney.com/blog/2023/stable-diffusion-image-variations/first-training/training_1.jpg&quot; alt=&quot;&quot;&gt;
        &lt;/a&gt;
        &lt;a href=&quot;https://justinpinkney.com/blog/2023/stable-diffusion-image-variations/first-training/training_2.jpg&quot; data-pswp-width=&quot;2060&quot; data-pswp-height=&quot;1036&quot; target=&quot;_blank&quot;&gt;
            &lt;img src=&quot;https://justinpinkney.com/blog/2023/stable-diffusion-image-variations/first-training/training_2.jpg&quot; alt=&quot;&quot;&gt;
        &lt;/a&gt;
    &lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;caption&quot;&gt;
The top row shows the conditioning images, and the bottom row shows generations from the model. They look &quot;similar&quot;, which means it was working!
&lt;/div&gt;
&lt;p&gt;The first batch of results I shared on Twitter showed the classic image variations for some famous images (see if you can guess which ones):&lt;/p&gt;
&lt;div class=&quot;flex-container&quot;&gt;
&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/6l3fbER4VV-200.webp 200w, https://justinpinkney.com/img/6l3fbER4VV-320.webp 320w, https://justinpinkney.com/img/6l3fbER4VV-500.webp 500w, https://justinpinkney.com/img/6l3fbER4VV-800.webp 800w, https://justinpinkney.com/img/6l3fbER4VV-1024.webp 1024w&quot; sizes=&quot;(max-width:640px) 25vw, 320px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/6l3fbER4VV-200.jpeg 200w, https://justinpinkney.com/img/6l3fbER4VV-320.jpeg 320w, https://justinpinkney.com/img/6l3fbER4VV-500.jpeg 500w, https://justinpinkney.com/img/6l3fbER4VV-800.jpeg 800w, https://justinpinkney.com/img/6l3fbER4VV-1024.jpeg 1024w&quot; sizes=&quot;(max-width:640px) 25vw, 320px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/6l3fbER4VV-200.jpeg&quot; width=&quot;1024&quot; height=&quot;1024&quot;&gt;&lt;/picture&gt;
&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/ZcvDGnuB_w-200.webp 200w, https://justinpinkney.com/img/ZcvDGnuB_w-320.webp 320w, https://justinpinkney.com/img/ZcvDGnuB_w-500.webp 500w, https://justinpinkney.com/img/ZcvDGnuB_w-800.webp 800w, https://justinpinkney.com/img/ZcvDGnuB_w-1024.webp 1024w&quot; sizes=&quot;(max-width:640px) 25vw, 320px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/ZcvDGnuB_w-200.jpeg 200w, https://justinpinkney.com/img/ZcvDGnuB_w-320.jpeg 320w, https://justinpinkney.com/img/ZcvDGnuB_w-500.jpeg 500w, https://justinpinkney.com/img/ZcvDGnuB_w-800.jpeg 800w, https://justinpinkney.com/img/ZcvDGnuB_w-1024.jpeg 1024w&quot; sizes=&quot;(max-width:640px) 25vw, 320px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/ZcvDGnuB_w-200.jpeg&quot; width=&quot;1024&quot; height=&quot;1024&quot;&gt;&lt;/picture&gt;
&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/dGyj6Gi8X1-200.webp 200w, https://justinpinkney.com/img/dGyj6Gi8X1-320.webp 320w, https://justinpinkney.com/img/dGyj6Gi8X1-500.webp 500w, https://justinpinkney.com/img/dGyj6Gi8X1-800.webp 800w, https://justinpinkney.com/img/dGyj6Gi8X1-1024.webp 1024w&quot; sizes=&quot;(max-width:640px) 25vw, 320px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/dGyj6Gi8X1-200.jpeg 200w, https://justinpinkney.com/img/dGyj6Gi8X1-320.jpeg 320w, https://justinpinkney.com/img/dGyj6Gi8X1-500.jpeg 500w, https://justinpinkney.com/img/dGyj6Gi8X1-800.jpeg 800w, https://justinpinkney.com/img/dGyj6Gi8X1-1024.jpeg 1024w&quot; sizes=&quot;(max-width:640px) 25vw, 320px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/dGyj6Gi8X1-200.jpeg&quot; width=&quot;1024&quot; height=&quot;1024&quot;&gt;&lt;/picture&gt;
&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/g-Izg0Vw2V-200.webp 200w, https://justinpinkney.com/img/g-Izg0Vw2V-320.webp 320w, https://justinpinkney.com/img/g-Izg0Vw2V-500.webp 500w, https://justinpinkney.com/img/g-Izg0Vw2V-800.webp 800w, https://justinpinkney.com/img/g-Izg0Vw2V-1024.webp 1024w&quot; sizes=&quot;(max-width:640px) 25vw, 320px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/g-Izg0Vw2V-200.jpeg 200w, https://justinpinkney.com/img/g-Izg0Vw2V-320.jpeg 320w, https://justinpinkney.com/img/g-Izg0Vw2V-500.jpeg 500w, https://justinpinkney.com/img/g-Izg0Vw2V-800.jpeg 800w, https://justinpinkney.com/img/g-Izg0Vw2V-1024.jpeg 1024w&quot; sizes=&quot;(max-width:640px) 25vw, 320px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/g-Izg0Vw2V-200.jpeg&quot; width=&quot;1024&quot; height=&quot;1024&quot;&gt;&lt;/picture&gt;
&lt;/div&gt;
&lt;h2 id=&quot;version-2&quot; tabindex=&quot;-1&quot;&gt;Version 2 &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2023/stable-diffusion-image-variations/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2023/stable-diffusion-image-variations/v2.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/z80ZILhdGz-200.webp 200w, https://justinpinkney.com/img/z80ZILhdGz-320.webp 320w, https://justinpinkney.com/img/z80ZILhdGz-500.webp 500w, https://justinpinkney.com/img/z80ZILhdGz-800.webp 800w, https://justinpinkney.com/img/z80ZILhdGz-1024.webp 1024w, https://justinpinkney.com/img/z80ZILhdGz-1600.webp 1600w, https://justinpinkney.com/img/z80ZILhdGz-4096.webp 4096w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/z80ZILhdGz-200.jpeg 200w, https://justinpinkney.com/img/z80ZILhdGz-320.jpeg 320w, https://justinpinkney.com/img/z80ZILhdGz-500.jpeg 500w, https://justinpinkney.com/img/z80ZILhdGz-800.jpeg 800w, https://justinpinkney.com/img/z80ZILhdGz-1024.jpeg 1024w, https://justinpinkney.com/img/z80ZILhdGz-1600.jpeg 1600w, https://justinpinkney.com/img/z80ZILhdGz-4096.jpeg 4096w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;image variation examples&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/z80ZILhdGz-200.jpeg&quot; width=&quot;4096&quot; height=&quot;1758&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;After retraining the model a little more carefully&lt;span class=&quot;sidenote&quot;&gt;
&lt;label class=&quot;sidenote-label&quot; for=&quot;side-note-3&quot;&gt;&lt;sup&gt;[3]&lt;/sup&gt;&lt;/label&gt;
&lt;input class=&quot;sidenote-checkbox&quot; type=&quot;checkbox&quot; id=&quot;side-note-3&quot;&gt;
&lt;span class=&quot;sidenote-content&quot;&gt;3. For v2 I fine-tuned only the cross-attention weights initially, to try and help the model better adapt to the new conditioning without degrading its performance, then followed that up with a full fine-tune at a large batch size. Maybe if I were doing it now I might try using LoRA.&lt;/span&gt;
&lt;/span&gt;, for longer, and with a bigger batch size, I released a V2 model which gave substantially better results.&lt;/p&gt;
&lt;p&gt;Here are some side-by-side examples showing the improved fidelity and coherence of the v2 model&#39;s images.&lt;/p&gt;
&lt;div class=&quot;pswp-gallery&quot; id=&quot;gallery--getting-started&quot;&gt;
    &lt;div class=&quot;flex-container&quot;&gt;
        &lt;a href=&quot;https://justinpinkney.com/blog/2023/stable-diffusion-image-variations/compare-v2.jpg&quot; data-pswp-width=&quot;1024&quot; data-pswp-height=&quot;1546&quot; target=&quot;_blank&quot;&gt;
            &lt;img src=&quot;https://justinpinkney.com/blog/2023/stable-diffusion-image-variations/compare-v2.jpg&quot; alt=&quot;&quot;&gt;
        &lt;/a&gt;
        &lt;a href=&quot;https://justinpinkney.com/blog/2023/stable-diffusion-image-variations/compare-v1.jpg&quot; data-pswp-width=&quot;1024&quot; data-pswp-height=&quot;1546&quot; target=&quot;_blank&quot;&gt;
            &lt;img src=&quot;https://justinpinkney.com/blog/2023/stable-diffusion-image-variations/compare-v1.jpg&quot; alt=&quot;&quot;&gt;
        &lt;/a&gt;
    &lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;caption&quot;&gt;
Comparison of v2 (left) against v1 (right) conditioned on various images, one per row&lt;span class=&quot;sidenote&quot;&gt;
		&lt;label class=&quot;sidenote-label&quot; for=&quot;side-note-4&quot;&gt;&lt;sup&gt;[4]&lt;/sup&gt;&lt;/label&gt;
		&lt;input class=&quot;sidenote-checkbox&quot; type=&quot;checkbox&quot; id=&quot;side-note-4&quot;&gt;
		&lt;span class=&quot;sidenote-content&quot;&gt;4. Original images: &lt;img src=&quot;https://justinpinkney.com/blog/2023/stable-diffusion-image-variations/compare-orig.jpg&quot;&gt;&lt;/span&gt;
	&lt;/span&gt;.
&lt;/div&gt;
&lt;h3 id=&quot;quirks-a-k-a-the-problem-is-always-image-resizing&quot; tabindex=&quot;-1&quot;&gt;Quirks (a.k.a. the problem is always image resizing) &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2023/stable-diffusion-image-variations/&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;One thing that really baffled me for a long time was that in the huggingface diffusers port I would get much worse quality images, which were always slightly blurred. After scouring the code for what might be the problem, it turned out to be the &lt;strong&gt;same problem it always is in image processing&lt;/strong&gt; code: the resizing method. Seriously, if you have an image processing issue, check your image resizing first!&lt;/p&gt;
&lt;p&gt;Turns out I accidentally trained the model with &lt;code&gt;antialias=False&lt;/code&gt; during resize. So when the huggingface diffusers pipeline applied the &quot;proper&quot; behaviour, i.e. using anti-aliasing, the results were terrible. It seems the model is &lt;em&gt;very&lt;/em&gt; sensitive to the resize method used.&lt;/p&gt;
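&lt;p&gt;Here&#39;s a small sketch of the mismatch, assuming torchvision-style preprocessing at the 224 pixel input size CLIP expects:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# The model was trained on inputs resized with antialias=False, so resizing
# with antialias=True at inference time noticeably degrades the outputs.
import torch
from torchvision.transforms.functional import resize, InterpolationMode

image = torch.rand(3, 1024, 1024)  # stand-in for a real input image

# What the model was (accidentally) trained with:
trained_like = resize(image, [224, 224],
                      interpolation=InterpolationMode.BICUBIC, antialias=False)

# The proper behaviour the diffusers pipeline applied:
proper = resize(image, [224, 224],
                interpolation=InterpolationMode.BICUBIC, antialias=True)

print((trained_like - proper).abs().max())  # the difference is not negligible
&lt;/code&gt;&lt;/pre&gt;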
&lt;div class=&quot;pswp-gallery&quot; id=&quot;gallery--getting-started&quot;&gt;
        &lt;a href=&quot;https://justinpinkney.com/blog/2023/stable-diffusion-image-variations/v2-goodresize.jpg&quot; data-pswp-width=&quot;2048&quot; data-pswp-height=&quot;512&quot; target=&quot;_blank&quot;&gt;
            &lt;img src=&quot;https://justinpinkney.com/blog/2023/stable-diffusion-image-variations/v2-goodresize.jpg&quot; alt=&quot;&quot;&gt;
        &lt;/a&gt;
        &lt;a href=&quot;https://justinpinkney.com/blog/2023/stable-diffusion-image-variations/v2-badresize.jpg&quot; data-pswp-width=&quot;2048&quot; data-pswp-height=&quot;512&quot; target=&quot;_blank&quot;&gt;
            &lt;img src=&quot;https://justinpinkney.com/blog/2023/stable-diffusion-image-variations/v2-badresize.jpg&quot; alt=&quot;&quot;&gt;
        &lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;caption&quot;&gt;
Running the model with the resize method it was trained on (first) vs a slightly different one (second) makes a dramatic difference to the results&lt;span class=&quot;sidenote&quot;&gt;
		&lt;label class=&quot;sidenote-label&quot; for=&quot;side-note-5&quot;&gt;&lt;sup&gt;[5]&lt;/sup&gt;&lt;/label&gt;
		&lt;input class=&quot;sidenote-checkbox&quot; type=&quot;checkbox&quot; id=&quot;side-note-5&quot;&gt;
		&lt;span class=&quot;sidenote-content&quot;&gt;5. The input image for this is my typical Ghibli House test: &lt;img src=&quot;https://justinpinkney.com/blog/2022/image-variation-experiments/output_6_0.jpg&quot;&gt;&lt;/span&gt;
	&lt;/span&gt;.
&lt;/div&gt;
&lt;p&gt;The other thing I noticed is that my nice example at the top of this post (Girl with a Pearl Earring by Vermeer) no longer produced pleasant variations as before, but was now badly overfit. It&#39;s actually a sign of how badly undertrained the first model was, as that particular image occurs many, many times in the training data (see &lt;a href=&quot;https://rom1504.github.io/clip-retrieval/?back=https%3A%2F%2Fknn.laion.ai&amp;amp;index=laion5B-H-14&amp;amp;useMclip=false&amp;amp;imageUrl=http%3A%2F%2Fmail.100besteverything.com%2F100beimages%2Fartists%2F2124_johannesvermeer1.jpg&quot;&gt;these search results&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2023/stable-diffusion-image-variations/overfit.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/Xuc3QC_4PZ-200.webp 200w, https://justinpinkney.com/img/Xuc3QC_4PZ-320.webp 320w, https://justinpinkney.com/img/Xuc3QC_4PZ-500.webp 500w, https://justinpinkney.com/img/Xuc3QC_4PZ-800.webp 800w, https://justinpinkney.com/img/Xuc3QC_4PZ-1024.webp 1024w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/Xuc3QC_4PZ-200.jpeg 200w, https://justinpinkney.com/img/Xuc3QC_4PZ-320.jpeg 320w, https://justinpinkney.com/img/Xuc3QC_4PZ-500.jpeg 500w, https://justinpinkney.com/img/Xuc3QC_4PZ-800.jpeg 800w, https://justinpinkney.com/img/Xuc3QC_4PZ-1024.jpeg 1024w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;Overfit girl with pearl earring generations&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/Xuc3QC_4PZ-200.jpeg&quot; width=&quot;1024&quot; height=&quot;256&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&quot;follow-ups&quot; tabindex=&quot;-1&quot;&gt;Follow ups &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2023/stable-diffusion-image-variations/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Hopefully the model has been somewhat useful for people. It also showed how adaptable Stable Diffusion is to changing the conditioning, something I&#39;ve done a bunch of further experiments on. Some of these I&#39;ve already blogged about: doing &lt;a href=&quot;https://justinpinkney.com/blog/2022/clip-latent-space&quot;&gt;CLIP latent space editing&lt;/a&gt; and &lt;a href=&quot;https://justinpinkney.com/blog/2022/image-variation-experiments&quot;&gt;various other experiments&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;It was a &lt;a href=&quot;https://huggingface.co/spaces/lambdalabs/stable-diffusion-image-variations&quot;&gt;very popular Huggingface Space&lt;/a&gt;, and if you want to try the model yourself, it&#39;s probably the easiest way.&lt;/p&gt;
&lt;p&gt;It also made an appearance in the paper &lt;em&gt;Versatile Diffusion: Text, Images and Variations All in One Diffusion Model&lt;/em&gt; (&lt;a href=&quot;https://arxiv.org/abs/2211.08332&quot;&gt;arxiv&lt;/a&gt;). Unfortunately they had done the work before I&#39;d released the v2 model.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2023/stable-diffusion-image-variations/paper.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/9S3MfjhXQo-200.webp 200w, https://justinpinkney.com/img/9S3MfjhXQo-320.webp 320w, https://justinpinkney.com/img/9S3MfjhXQo-500.webp 500w, https://justinpinkney.com/img/9S3MfjhXQo-800.webp 800w, https://justinpinkney.com/img/9S3MfjhXQo-1024.webp 1024w, https://justinpinkney.com/img/9S3MfjhXQo-1600.webp 1600w, https://justinpinkney.com/img/9S3MfjhXQo-2666.webp 2666w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/9S3MfjhXQo-200.jpeg 200w, https://justinpinkney.com/img/9S3MfjhXQo-320.jpeg 320w, https://justinpinkney.com/img/9S3MfjhXQo-500.jpeg 500w, https://justinpinkney.com/img/9S3MfjhXQo-800.jpeg 800w, https://justinpinkney.com/img/9S3MfjhXQo-1024.jpeg 1024w, https://justinpinkney.com/img/9S3MfjhXQo-1600.jpeg 1600w, https://justinpinkney.com/img/9S3MfjhXQo-2666.jpeg 2666w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;Examples of image variation in academic paper&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/9S3MfjhXQo-200.jpeg&quot; width=&quot;2666&quot; height=&quot;1226&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;And the concept was also the basis for my &lt;a href=&quot;https://huggingface.co/lambdalabs/image-mixer&quot;&gt;Image Mixer model&lt;/a&gt; (which I&#39;ll write up in more detail in future).&lt;/p&gt;
&lt;p&gt;It&#39;s also possible to use this model as a full-on text-to-image generator, by using an existing prior that maps CLIP text embeddings to image embeddings. I did a &lt;a href=&quot;https://github.com/justinpinkney/stable-diffusion/blob/main/examples/prior_2_sd.ipynb&quot;&gt;little experiment with the LAION prior&lt;/a&gt; to show this was possible, but since then the &lt;a href=&quot;https://github.com/kakaobrain/karlo&quot;&gt;Karlo v1.0&lt;/a&gt; and &lt;a href=&quot;https://github.com/ai-forever/Kandinsky-2&quot;&gt;Kandinsky&lt;/a&gt; priors have come out, which are probably much better.&lt;/p&gt;
&lt;p&gt;Finally, if you want to watch a whole talk by me on this and related topics, I gave one at the Hugging Face Diffusers event:&lt;/p&gt;
&lt;p&gt;https://youtu.be/mpMGwQa7J1w?si=ilfxsLlQlquIZmB9&lt;/p&gt;
</content>
	</entry>
	
	<entry>
		<title>The Other Web</title>
		<link href="https://justinpinkney.com/blog/2023/the-other-web/"/>
		<updated>2023-08-22T00:00:00Z</updated>
		<id>https://justinpinkney.com/blog/2023/the-other-web/</id>
		<content type="html">&lt;h1 id=&quot;the-other-web&quot; tabindex=&quot;-1&quot;&gt;The other web &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2023/the-other-web/&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2023/the-other-web/other-web-2.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/t69Suy8ac--200.webp 200w, https://justinpinkney.com/img/t69Suy8ac--320.webp 320w, https://justinpinkney.com/img/t69Suy8ac--500.webp 500w, https://justinpinkney.com/img/t69Suy8ac--800.webp 800w, https://justinpinkney.com/img/t69Suy8ac--1024.webp 1024w, https://justinpinkney.com/img/t69Suy8ac--1486.webp 1486w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/t69Suy8ac--200.jpeg 200w, https://justinpinkney.com/img/t69Suy8ac--320.jpeg 320w, https://justinpinkney.com/img/t69Suy8ac--500.jpeg 500w, https://justinpinkney.com/img/t69Suy8ac--800.jpeg 800w, https://justinpinkney.com/img/t69Suy8ac--1024.jpeg 1024w, https://justinpinkney.com/img/t69Suy8ac--1486.jpeg 1486w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;watercolour illustration of houses connected by wires&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/t69Suy8ac--200.jpeg&quot; width=&quot;1486&quot; height=&quot;592&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The recently accelerated &lt;a href=&quot;https://pluralistic.net/2023/01/21/potemkin-ai/#hey-guys&quot;&gt;enshittification&lt;/a&gt; of Twitter made me realise that too much of what I do and make is trapped in that platform. I feel this especially when I start trying to explain projects or experiments I&#39;ve worked on to people by sharing tweets; why isn&#39;t it on my blog?&lt;/p&gt;
&lt;p&gt;It&#39;s probably not an overstatement to say that Twitter has changed the course of my career: it&#39;s how &lt;a href=&quot;https://www.justinpinkney.com/blog/2020/making-toonify/&quot;&gt;Toonify went viral&lt;/a&gt;, and my jobs at Lambda and Midjourney both started with Twitter DMs.&lt;/p&gt;
&lt;p&gt;But with recent changes to Twitter making me think a bit more about why and what I&#39;m sharing on the web, I increasingly realise the internet doesn&#39;t just have to be big platforms, social networks, and the first page of a Google search. There is another web out there.&lt;/p&gt;
&lt;h2 id=&quot;the-other-web-1&quot; tabindex=&quot;-1&quot;&gt;The other web &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2023/the-other-web/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;I don&#39;t know what the right term for it is, but I&#39;ve heard various ones:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the slow/personal/handmade/outer/small/indie web&lt;/li&gt;
&lt;li&gt;digital homesteading/gardening (well maybe &amp;quot;digital gardens&amp;quot; has been taken by the personal productivity/tools-for-thought crowd)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;There&#39;s a lot I like about the ethos of tending a little personal place on the web and filling it with things that interest you. Hopefully a few people stumble across it and have that same great feeling of discovery that I do when I come across a lovely home made website full of curiosities and personality.&lt;/p&gt;
&lt;p&gt;So I&#39;m trying to share more of what I do here rather than Twitter, and preserve things that were only throw away threads as blog posts (starting &lt;a href=&quot;https://www.justinpinkney.com/blog/2022/clip-latent-space/&quot;&gt;here&lt;/a&gt;), and make my website my home on the web.&lt;/p&gt;
&lt;h2 id=&quot;links&quot; tabindex=&quot;-1&quot;&gt;Links &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2023/the-other-web/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;We need links, more links, to good things, not the usual garbage listicles, clickbait and ad vehicles. The authentic, real, handmade web is hard to find unless we all link to it. So here are some of the interesting places/articles I found delving into and around this topic:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://mmm.page&quot;&gt;mmm.page&lt;/a&gt; and &lt;a href=&quot;https://kinopio.club&quot;&gt;Kinopio&lt;/a&gt; and the communities around them are two of the places that have a wonderful ethos around what web software (beyond pure webpages) can be and helped me find lots more interesting threads&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://html.energy/podcast.html&quot;&gt;html.energy&lt;/a&gt; has a fantastic podcast with interviews with many interesting people&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://thecreativeindependent.com/essays/laurel-schwulst-my-website-is-a-shifting-house-next-to-a-river-of-knowledge-what-could-yours-be/&quot;&gt;My website is a shifting house next to a river of knowledge. What could yours be?&lt;/a&gt; Poses great questions about what an analogy for a website might be; &amp;quot;Website as a puddle&amp;quot; anyone? (from the great &lt;a href=&quot;https://thecreativeindependent.com/&quot;&gt;Creative Independent&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://handmade-web.net/library.html&quot;&gt;Handmade Web reading list&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://neustadt.fr/essays/the-small-web/&quot;&gt;An essay on the Small web&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://32bit.cafe/&quot;&gt;32-bit cafe&lt;/a&gt;: &amp;quot;a community of like-minded website hobbyists and professionals helping  to make the personal web fruitful and bountiful again, full of  self-expression and removing the capitalistic drive out of it&amp;quot;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://tinysubversions.com/twitter-archive/&quot;&gt;Darius Kazemi&#39;s Twitter archive&lt;/a&gt; An example of, and tool for breaking your archive out of Twitter&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://flower.codes/2022/05/08/rediscovering-discovery.html&quot;&gt;Rediscovering Discovery on the Web&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://blogroll.org/&quot;&gt;A well curated old fashioned blogroll&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Some great examples of public collated rss feeds from blog rolls:
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://tomcritchlow.com/feeds/&quot;&gt;https://tomcritchlow.com/feeds/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://blogworm.eu/&quot;&gt;https://blogworm.eu/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.alexmolas.com/blogroll&quot;&gt;https://www.alexmolas.com/blogroll&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;An article on the &lt;a href=&quot;https://smallweb.page/why&quot;&gt;&amp;quot;small web&amp;quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Pages with many great links:
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://a-website-is-a-room.net/&quot;&gt;https://a-website-is-a-room.net/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://broadsheet.chasem.co/&quot;&gt;https://broadsheet.chasem.co/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://readsomethingwonderful.com/&quot;&gt;https://readsomethingwonderful.com/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
</content>
	</entry>
	
	<entry>
		<title>Algorithmic Film Making</title>
		<link href="https://justinpinkney.com/blog/2023/algorithmic-film/"/>
		<updated>2023-08-13T00:00:00Z</updated>
		<id>https://justinpinkney.com/blog/2023/algorithmic-film/</id>
		<content type="html">&lt;p&gt;https://youtu.be/rsxUcmHNH9s&lt;/p&gt;
&lt;p&gt;I recently signed up to &lt;a href=&quot;https://artificial-images.com/&quot;&gt;Derrick Schultz&lt;/a&gt;&#39;s online &lt;a href=&quot;https://www.bustbright.com/product/algorithmic-filmmaking-async-class-august-7-thru-september-11-2023-/334&quot;&gt;algorithmic film making class&lt;/a&gt;. I&#39;m excited to be taking one of Derrick&#39;s classes; I&#39;ve long been a big fan of both his work and his teaching (not just because &lt;a href=&quot;https://www.youtube.com/@artificialimages&quot;&gt;his YouTube channel&lt;/a&gt; has long been my default response when people ask me &amp;quot;How do I StyleGAN?&amp;quot;).&lt;/p&gt;
&lt;p&gt;I&#39;m also keen to have something I can focus on as an outlet for some ML-based art and maybe some open source models and tools. The slow death of Twitter and starting at Midjourney have meant I&#39;ve not had much motivation or reason to play in the open source creative ML space for a little while now. I also have a 2 TB dataset of movie trailers sitting on my hard drive that I want to knead into something interesting.&lt;/p&gt;
&lt;p&gt;I don&#39;t have any particular ideas for what my final project for the course will be yet, but I&#39;ve always found the idea of visual collage and montaging videos particularly interesting (as well as the extreme extrapolation of that process: slit scanning).&lt;/p&gt;
&lt;p&gt;This is an example of the sort of cut-up montage video that I&#39;ve been thinking about:&lt;/p&gt;
&lt;p&gt;https://twitter.com/Buntworthy/status/1459838418339471364&lt;/p&gt;
&lt;p&gt;This originally sprouted from some earlier experiments in collaging images. Here&#39;s a link to a thread of some of my experiments and examples from others that convey the sort of aesthetic I like. (One day I&#39;ll break them all out from the prison of Twitter into my blog).&lt;/p&gt;
&lt;p&gt;https://twitter.com/Buntworthy/status/1443314544252686338&lt;/p&gt;
&lt;p&gt;Why this? I&#39;m not sure; something about how it relates to what I do every day: machine learning is all about datasets and seeing the patterns in those. That&#39;s what those montages feel like, trying (but not really being able) to convey a sense of the whole as much as any part. (I also spend a lot of my days looking at great big grids of beautiful images, so my eyes end up tuned to these things.)&lt;/p&gt;
&lt;p&gt;The logical extreme of cutting up images or video and recombining them is also visible in some of the slit scanning and &lt;a href=&quot;https://justinpinkney.com/blog/2020/colour-sorter/&quot;&gt;colour sorting experiments&lt;/a&gt; I&#39;ve done before.&lt;/p&gt;
&lt;p&gt;https://youtu.be/zgxHXQaObFg&lt;/p&gt;
&lt;p&gt;Given that I used to do everything &lt;a href=&quot;https://justinpinkney.com/tags/stylegan/&quot;&gt;related to StyleGAN&lt;/a&gt;, here&#39;s a StyleGAN-related example, slit scanning through interpolation spaces:&lt;/p&gt;
&lt;p&gt;https://twitter.com/Buntworthy/status/1451105403857739776&lt;/p&gt;
&lt;p&gt;(There was also an interactive version of this I never published:)&lt;/p&gt;
&lt;p&gt;https://twitter.com/Buntworthy/status/1222945471003602944&lt;/p&gt;
&lt;h2 id=&quot;week-1-randomising-clips&quot; tabindex=&quot;-1&quot;&gt;Week 1 - Randomising clips &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2023/algorithmic-film/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The first week&#39;s exercise is to chop something up into shots and randomise the order, just to get a feel for working with video datasets and to see what interesting things turn up. I didn&#39;t want to face battling with my big dataset just yet, so I cut up The Tale of Princess Kaguya into shots and arranged them in a grid. The grid view fits my aesthetic sensibilities and also means that I can actually watch the entire resulting clip in a reasonable amount of time.&lt;/p&gt;
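&lt;p&gt;A rough version of the chop-shuffle-grid assembly might look like this (assuming the shot boundaries have already been detected somehow; moviepy is one easy way to do the cutting and tiling):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Sketch of the shuffled-grid assembly, assuming a list of (start, end) shot
# boundaries in seconds has already been computed for the film.
import random
from moviepy.editor import VideoFileClip, clips_array

video = VideoFileClip(&#39;princess_kaguya.mp4&#39;)
shots = [(0.0, 3.2), (3.2, 7.5), (7.5, 11.0)]  # placeholder boundaries

clips = [video.subclip(start, end) for start, end in shots]
random.shuffle(clips)

# Tile the shuffled shots two per row, dropping any leftover clip.
rows = [clips[i:i + 2] for i in range(0, len(clips) - 1, 2)]
clips_array(rows).write_videofile(&#39;kaguya_grid.mp4&#39;)
&lt;/code&gt;&lt;/pre&gt;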
&lt;p&gt;I made two versions, one with a totally random order, and the other where the clips are sorted by order of occurrence, but each new clip is placed in the grid as soon as any previous one finishes, so the whole thing ends up going slowly out of sync as time progresses, which I think is a nice effect. The sorted version is &lt;a href=&quot;https://justinpinkney.com/blog/2023/algorithmic-film/&quot;&gt;at the top of this page&lt;/a&gt; and the randomised one below:&lt;/p&gt;
&lt;p&gt;https://youtu.be/Lyg16nGbUtM&lt;/p&gt;
</content>
	</entry>
	
	<entry>
		<title>Flowers Deconstructed</title>
		<link href="https://justinpinkney.com/blog/2023/flowers-deconstructed/"/>
		<updated>2023-07-10T00:00:00Z</updated>
		<id>https://justinpinkney.com/blog/2023/flowers-deconstructed/</id>
		<content type="html">&lt;p&gt;Classic paintings of flowers cut and re-arranged.&lt;/p&gt;
&lt;p&gt;A long time ago I made these digitally cut-out and collaged images of flowers. They were all created using public domain flower paintings, &lt;a href=&quot;https://arxiv.org/abs/2005.09007&quot;&gt;U2Net&lt;/a&gt; for segmentation and ImageMagick for collaging.&lt;/p&gt;
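&lt;p&gt;Something similar can be sketched in Python, using rembg (which wraps U2Net) in place of the original U2Net plus ImageMagick pipeline:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Sketch of the cut-and-rearrange idea: U2Net (via rembg) cuts the flowers out
# of a painting, then the cutout is pasted back at shifted positions.
import random
from PIL import Image
from rembg import remove

painting = Image.open(&#39;flower_painting.jpg&#39;).convert(&#39;RGBA&#39;)
cutout = remove(painting)  # RGBA image with the background removed

canvas = Image.new(&#39;RGBA&#39;, (painting.width + 200, painting.height + 200), &#39;white&#39;)
for _ in range(5):
    # Paste at random offsets, using the cutout alpha channel as the mask.
    canvas.alpha_composite(cutout, dest=(random.randint(0, 200), random.randint(0, 200)))

canvas.convert(&#39;RGB&#39;).save(&#39;flowers_deconstructed.jpg&#39;)
&lt;/code&gt;&lt;/pre&gt;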
&lt;br&gt;
&lt;br&gt;
&lt;p&gt;&lt;/p&gt;&lt;div class=&quot;gallery&quot;&gt;
&lt;div class=&quot;gallery-item&quot;&gt;
&lt;a href=&quot;https://justinpinkney.com/blog/2023/flowers-deconstructed/flowers-deconstructed-00.jpeg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/FaM_miDa7E-200.webp 200w, https://justinpinkney.com/img/FaM_miDa7E-320.webp 320w, https://justinpinkney.com/img/FaM_miDa7E-500.webp 500w, https://justinpinkney.com/img/FaM_miDa7E-800.webp 800w, https://justinpinkney.com/img/FaM_miDa7E-1024.webp 1024w, https://justinpinkney.com/img/FaM_miDa7E-1276.webp 1276w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/FaM_miDa7E-200.jpeg 200w, https://justinpinkney.com/img/FaM_miDa7E-320.jpeg 320w, https://justinpinkney.com/img/FaM_miDa7E-500.jpeg 500w, https://justinpinkney.com/img/FaM_miDa7E-800.jpeg 800w, https://justinpinkney.com/img/FaM_miDa7E-1024.jpeg 1024w, https://justinpinkney.com/img/FaM_miDa7E-1276.jpeg 1276w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/FaM_miDa7E-200.jpeg&quot; width=&quot;1276&quot; height=&quot;1920&quot;&gt;&lt;/picture&gt;&lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;gallery-item&quot;&gt;
&lt;a href=&quot;https://justinpinkney.com/blog/2023/flowers-deconstructed/flowers-deconstructed-01.jpeg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/94OOk9t5Xk-200.webp 200w, https://justinpinkney.com/img/94OOk9t5Xk-320.webp 320w, https://justinpinkney.com/img/94OOk9t5Xk-500.webp 500w, https://justinpinkney.com/img/94OOk9t5Xk-800.webp 800w, https://justinpinkney.com/img/94OOk9t5Xk-1024.webp 1024w, https://justinpinkney.com/img/94OOk9t5Xk-1379.webp 1379w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/94OOk9t5Xk-200.jpeg 200w, https://justinpinkney.com/img/94OOk9t5Xk-320.jpeg 320w, https://justinpinkney.com/img/94OOk9t5Xk-500.jpeg 500w, https://justinpinkney.com/img/94OOk9t5Xk-800.jpeg 800w, https://justinpinkney.com/img/94OOk9t5Xk-1024.jpeg 1024w, https://justinpinkney.com/img/94OOk9t5Xk-1379.jpeg 1379w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/94OOk9t5Xk-200.jpeg&quot; width=&quot;1379&quot; height=&quot;1920&quot;&gt;&lt;/picture&gt;&lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;gallery-item&quot;&gt;
&lt;a href=&quot;https://justinpinkney.com/blog/2023/flowers-deconstructed/flowers-deconstructed-02.jpeg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/Yl8RsM7t8o-200.webp 200w, https://justinpinkney.com/img/Yl8RsM7t8o-320.webp 320w, https://justinpinkney.com/img/Yl8RsM7t8o-500.webp 500w, https://justinpinkney.com/img/Yl8RsM7t8o-800.webp 800w, https://justinpinkney.com/img/Yl8RsM7t8o-1024.webp 1024w, https://justinpinkney.com/img/Yl8RsM7t8o-1600.webp 1600w, https://justinpinkney.com/img/Yl8RsM7t8o-1626.webp 1626w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/Yl8RsM7t8o-200.jpeg 200w, https://justinpinkney.com/img/Yl8RsM7t8o-320.jpeg 320w, https://justinpinkney.com/img/Yl8RsM7t8o-500.jpeg 500w, https://justinpinkney.com/img/Yl8RsM7t8o-800.jpeg 800w, https://justinpinkney.com/img/Yl8RsM7t8o-1024.jpeg 1024w, https://justinpinkney.com/img/Yl8RsM7t8o-1600.jpeg 1600w, https://justinpinkney.com/img/Yl8RsM7t8o-1626.jpeg 1626w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/Yl8RsM7t8o-200.jpeg&quot; width=&quot;1626&quot; height=&quot;1920&quot;&gt;&lt;/picture&gt;&lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;gallery-item&quot;&gt;
&lt;a href=&quot;https://justinpinkney.com/blog/2023/flowers-deconstructed/flowers-deconstructed-09.jpeg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/y6wWbyd0kr-200.webp 200w, https://justinpinkney.com/img/y6wWbyd0kr-320.webp 320w, https://justinpinkney.com/img/y6wWbyd0kr-500.webp 500w, https://justinpinkney.com/img/y6wWbyd0kr-800.webp 800w, https://justinpinkney.com/img/y6wWbyd0kr-1024.webp 1024w, https://justinpinkney.com/img/y6wWbyd0kr-1600.webp 1600w, https://justinpinkney.com/img/y6wWbyd0kr-1920.webp 1920w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/y6wWbyd0kr-200.jpeg 200w, https://justinpinkney.com/img/y6wWbyd0kr-320.jpeg 320w, https://justinpinkney.com/img/y6wWbyd0kr-500.jpeg 500w, https://justinpinkney.com/img/y6wWbyd0kr-800.jpeg 800w, https://justinpinkney.com/img/y6wWbyd0kr-1024.jpeg 1024w, https://justinpinkney.com/img/y6wWbyd0kr-1600.jpeg 1600w, https://justinpinkney.com/img/y6wWbyd0kr-1920.jpeg 1920w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/y6wWbyd0kr-200.jpeg&quot; width=&quot;1920&quot; height=&quot;359&quot;&gt;&lt;/picture&gt;&lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;gallery-item&quot;&gt;
&lt;a href=&quot;https://justinpinkney.com/blog/2023/flowers-deconstructed/flowers-deconstructed-05.jpeg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/pKeobaLMPa-200.webp 200w, https://justinpinkney.com/img/pKeobaLMPa-320.webp 320w, https://justinpinkney.com/img/pKeobaLMPa-500.webp 500w, https://justinpinkney.com/img/pKeobaLMPa-800.webp 800w, https://justinpinkney.com/img/pKeobaLMPa-1024.webp 1024w, https://justinpinkney.com/img/pKeobaLMPa-1513.webp 1513w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/pKeobaLMPa-200.jpeg 200w, https://justinpinkney.com/img/pKeobaLMPa-320.jpeg 320w, https://justinpinkney.com/img/pKeobaLMPa-500.jpeg 500w, https://justinpinkney.com/img/pKeobaLMPa-800.jpeg 800w, https://justinpinkney.com/img/pKeobaLMPa-1024.jpeg 1024w, https://justinpinkney.com/img/pKeobaLMPa-1513.jpeg 1513w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/pKeobaLMPa-200.jpeg&quot; width=&quot;1513&quot; height=&quot;1920&quot;&gt;&lt;/picture&gt;&lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;gallery-item&quot;&gt;
&lt;a href=&quot;https://justinpinkney.com/blog/2023/flowers-deconstructed/flowers-deconstructed-06.jpeg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/H29EjfFbSs-200.webp 200w, https://justinpinkney.com/img/H29EjfFbSs-320.webp 320w, https://justinpinkney.com/img/H29EjfFbSs-500.webp 500w, https://justinpinkney.com/img/H29EjfFbSs-800.webp 800w, https://justinpinkney.com/img/H29EjfFbSs-1024.webp 1024w, https://justinpinkney.com/img/H29EjfFbSs-1492.webp 1492w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/H29EjfFbSs-200.jpeg 200w, https://justinpinkney.com/img/H29EjfFbSs-320.jpeg 320w, https://justinpinkney.com/img/H29EjfFbSs-500.jpeg 500w, https://justinpinkney.com/img/H29EjfFbSs-800.jpeg 800w, https://justinpinkney.com/img/H29EjfFbSs-1024.jpeg 1024w, https://justinpinkney.com/img/H29EjfFbSs-1492.jpeg 1492w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/H29EjfFbSs-200.jpeg&quot; width=&quot;1492&quot; height=&quot;1920&quot;&gt;&lt;/picture&gt;&lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;gallery-item&quot;&gt;
&lt;a href=&quot;https://justinpinkney.com/blog/2023/flowers-deconstructed/flowers-deconstructed-07.jpeg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/IVIFqkmfTw-200.webp 200w, https://justinpinkney.com/img/IVIFqkmfTw-320.webp 320w, https://justinpinkney.com/img/IVIFqkmfTw-500.webp 500w, https://justinpinkney.com/img/IVIFqkmfTw-800.webp 800w, https://justinpinkney.com/img/IVIFqkmfTw-1024.webp 1024w, https://justinpinkney.com/img/IVIFqkmfTw-1600.webp 1600w, https://justinpinkney.com/img/IVIFqkmfTw-1920.webp 1920w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/IVIFqkmfTw-200.jpeg 200w, https://justinpinkney.com/img/IVIFqkmfTw-320.jpeg 320w, https://justinpinkney.com/img/IVIFqkmfTw-500.jpeg 500w, https://justinpinkney.com/img/IVIFqkmfTw-800.jpeg 800w, https://justinpinkney.com/img/IVIFqkmfTw-1024.jpeg 1024w, https://justinpinkney.com/img/IVIFqkmfTw-1600.jpeg 1600w, https://justinpinkney.com/img/IVIFqkmfTw-1920.jpeg 1920w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/IVIFqkmfTw-200.jpeg&quot; width=&quot;1920&quot; height=&quot;1250&quot;&gt;&lt;/picture&gt;&lt;/a&gt;
&lt;/div&gt;
&lt;div class=&quot;gallery-item&quot;&gt;
&lt;a href=&quot;https://justinpinkney.com/blog/2023/flowers-deconstructed/flowers-deconstructed-08.jpeg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/YB8Bp9ar29-200.webp 200w, https://justinpinkney.com/img/YB8Bp9ar29-320.webp 320w, https://justinpinkney.com/img/YB8Bp9ar29-500.webp 500w, https://justinpinkney.com/img/YB8Bp9ar29-800.webp 800w, https://justinpinkney.com/img/YB8Bp9ar29-1024.webp 1024w, https://justinpinkney.com/img/YB8Bp9ar29-1600.webp 1600w, https://justinpinkney.com/img/YB8Bp9ar29-1920.webp 1920w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/YB8Bp9ar29-200.jpeg 200w, https://justinpinkney.com/img/YB8Bp9ar29-320.jpeg 320w, https://justinpinkney.com/img/YB8Bp9ar29-500.jpeg 500w, https://justinpinkney.com/img/YB8Bp9ar29-800.jpeg 800w, https://justinpinkney.com/img/YB8Bp9ar29-1024.jpeg 1024w, https://justinpinkney.com/img/YB8Bp9ar29-1600.jpeg 1600w, https://justinpinkney.com/img/YB8Bp9ar29-1920.jpeg 1920w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/YB8Bp9ar29-200.jpeg&quot; width=&quot;1920&quot; height=&quot;1292&quot;&gt;&lt;/picture&gt;&lt;/a&gt;
&lt;/div&gt;
&lt;/div&gt;</content>
	</entry>
	
	<entry>
		<title>Experiments in Image Variation</title>
		<link href="https://justinpinkney.com/blog/2022/image-variation-experiments/"/>
		<updated>2022-12-17T00:00:00Z</updated>
		<id>https://justinpinkney.com/blog/2022/image-variation-experiments/</id>
		<content type="html">&lt;p&gt;I&#39;ve been doing a bunch of quick experiments for my improved CLIP Image conditioned version of Stable Diffusion v1. We can actually do lots of fun stuff with this model including transforming this Ghibli house into a lighthouse!&lt;/p&gt;
&lt;p align=&quot;center&quot;&gt;
&lt;video controls=&quot;&quot; autoplay=&quot;true&quot; src=&quot;https://justinpinkney.com/blog/2022/image-variation-experiments/lighthouse_bf.mp4&quot; loop=&quot;true&quot; style=&quot;max-width:512px&quot;&gt;&lt;/video&gt;
&lt;/p&gt;
&lt;p&gt;First off, you can find the actual &lt;a href=&quot;https://huggingface.co/lambdalabs/sd-image-variations-diffusers&quot;&gt;model on the Hugging Face hub&lt;/a&gt;. Throughout the post we&#39;ll be using this crop from a Ghibli frame as our test image.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2022/image-variation-experiments/output_6_0.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/Er07EbsBsn-200.webp 200w, https://justinpinkney.com/img/Er07EbsBsn-320.webp 320w, https://justinpinkney.com/img/Er07EbsBsn-500.webp 500w, https://justinpinkney.com/img/Er07EbsBsn-512.webp 512w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/Er07EbsBsn-200.jpeg 200w, https://justinpinkney.com/img/Er07EbsBsn-320.jpeg 320w, https://justinpinkney.com/img/Er07EbsBsn-500.jpeg 500w, https://justinpinkney.com/img/Er07EbsBsn-512.jpeg 512w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/Er07EbsBsn-200.jpeg&quot; width=&quot;512&quot; height=&quot;512&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This is what the standard variations model produces when you feed this image in:&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2022/image-variation-experiments/montage.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/McbSmGDpOL-200.webp 200w, https://justinpinkney.com/img/McbSmGDpOL-320.webp 320w, https://justinpinkney.com/img/McbSmGDpOL-500.webp 500w, https://justinpinkney.com/img/McbSmGDpOL-800.webp 800w, https://justinpinkney.com/img/McbSmGDpOL-1024.webp 1024w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/McbSmGDpOL-200.jpeg 200w, https://justinpinkney.com/img/McbSmGDpOL-320.jpeg 320w, https://justinpinkney.com/img/McbSmGDpOL-500.jpeg 500w, https://justinpinkney.com/img/McbSmGDpOL-800.jpeg 800w, https://justinpinkney.com/img/McbSmGDpOL-1024.jpeg 1024w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/McbSmGDpOL-200.jpeg&quot; width=&quot;1024&quot; height=&quot;1024&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
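&lt;p&gt;If you want to reproduce a grid like the one above, here&#39;s a minimal sketch using the diffusers library (assuming a version recent enough to include the image variations pipeline; &lt;code&gt;image.jpg&lt;/code&gt; is a placeholder for the test crop):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from PIL import Image
from diffusers import StableDiffusionImageVariationPipeline

# load the image variations model from the hub
pipe = StableDiffusionImageVariationPipeline.from_pretrained(
    'lambdalabs/sd-image-variations-diffusers')
pipe = pipe.to('cuda')

init_image = Image.open('image.jpg').convert('RGB')  # placeholder path

# guidance_scale=4 matches the grid above; try 0, 1, 2, 4, 8 for the sweep below
out = pipe(init_image, guidance_scale=4, num_inference_steps=50,
           num_images_per_prompt=4)
for i, im in enumerate(out.images):
    im.save(f'variation_{i}.jpg')
&lt;/code&gt;&lt;/pre&gt;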
&lt;p&gt;That was at classifier-free guidance scale 4; we can play with different levels of guidance (0, 1, 2, 4, 8):&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2022/image-variation-experiments/output_10_10.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/gqJ1aPcFzy-200.webp 200w, https://justinpinkney.com/img/gqJ1aPcFzy-320.webp 320w, https://justinpinkney.com/img/gqJ1aPcFzy-500.webp 500w, https://justinpinkney.com/img/gqJ1aPcFzy-800.webp 800w, https://justinpinkney.com/img/gqJ1aPcFzy-1024.webp 1024w, https://justinpinkney.com/img/gqJ1aPcFzy-1600.webp 1600w, https://justinpinkney.com/img/gqJ1aPcFzy-2560.webp 2560w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/gqJ1aPcFzy-200.jpeg 200w, https://justinpinkney.com/img/gqJ1aPcFzy-320.jpeg 320w, https://justinpinkney.com/img/gqJ1aPcFzy-500.jpeg 500w, https://justinpinkney.com/img/gqJ1aPcFzy-800.jpeg 800w, https://justinpinkney.com/img/gqJ1aPcFzy-1024.jpeg 1024w, https://justinpinkney.com/img/gqJ1aPcFzy-1600.jpeg 1600w, https://justinpinkney.com/img/gqJ1aPcFzy-2560.jpeg 2560w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/gqJ1aPcFzy-200.jpeg&quot; width=&quot;2560&quot; height=&quot;512&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The image is turned into a CLIP image embedding, which is the condition vector. One thing I noticed is that, because I set the unconditional embedding to zeros, scaling the condition vector has the effect of increasing its &amp;quot;strength&amp;quot;; this effect is different to the usual classifier-free guidance. So we can play with the length of this vector to control the &amp;quot;strength&amp;quot; of the conditioning, multiplying it by 0.25, 0.5, 1, 1.5, 2:&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2022/image-variation-experiments/output_12_2.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/5z5F7cuj1M-200.webp 200w, https://justinpinkney.com/img/5z5F7cuj1M-320.webp 320w, https://justinpinkney.com/img/5z5F7cuj1M-500.webp 500w, https://justinpinkney.com/img/5z5F7cuj1M-800.webp 800w, https://justinpinkney.com/img/5z5F7cuj1M-1024.webp 1024w, https://justinpinkney.com/img/5z5F7cuj1M-1600.webp 1600w, https://justinpinkney.com/img/5z5F7cuj1M-2560.webp 2560w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/5z5F7cuj1M-200.jpeg 200w, https://justinpinkney.com/img/5z5F7cuj1M-320.jpeg 320w, https://justinpinkney.com/img/5z5F7cuj1M-500.jpeg 500w, https://justinpinkney.com/img/5z5F7cuj1M-800.jpeg 800w, https://justinpinkney.com/img/5z5F7cuj1M-1024.jpeg 1024w, https://justinpinkney.com/img/5z5F7cuj1M-1600.jpeg 1600w, https://justinpinkney.com/img/5z5F7cuj1M-2560.jpeg 2560w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/5z5F7cuj1M-200.jpeg&quot; width=&quot;2560&quot; height=&quot;512&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
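&lt;p&gt;The embedding comes straight out of CLIP and can be scaled before it&#39;s handed to the diffusion model. Note the stock pipeline computes the embedding internally, so actually sampling from a modified vector means running your own denoising loop (or patching the pipeline); the sketch below just shows the embedding arithmetic, assuming the ViT-L/14 CLIP encoder the model was trained with:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import torch
from PIL import Image
from transformers import CLIPImageProcessor, CLIPVisionModelWithProjection

encoder = CLIPVisionModelWithProjection.from_pretrained('openai/clip-vit-large-patch14')
processor = CLIPImageProcessor.from_pretrained('openai/clip-vit-large-patch14')

pixels = processor(images=Image.open('image.jpg'), return_tensors='pt').pixel_values
emb = encoder(pixels).image_embeds  # shape (1, 768)

uncond = torch.zeros_like(emb)  # the unconditional embedding is all zeros
scaled = [emb * s for s in (0.25, 0.5, 1.0, 1.5, 2.0)]

# mixing two embeddings (as in the next section) is just a lerp:
# mixed = 0.75 * emb_a + 0.25 * emb_b
&lt;/code&gt;&lt;/pre&gt;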
&lt;p&gt;We can also mix different embeddings by averaging them together. Let&#39;s mix between the embedding for our Ghibli image and a Matisse painting. 0%, 25%, 50%, 75%, and 100% Matisse:&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2022/image-variation-experiments/output_14_2.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/2LVjSs_VSE-200.webp 200w, https://justinpinkney.com/img/2LVjSs_VSE-320.webp 320w, https://justinpinkney.com/img/2LVjSs_VSE-500.webp 500w, https://justinpinkney.com/img/2LVjSs_VSE-800.webp 800w, https://justinpinkney.com/img/2LVjSs_VSE-1024.webp 1024w, https://justinpinkney.com/img/2LVjSs_VSE-1600.webp 1600w, https://justinpinkney.com/img/2LVjSs_VSE-2560.webp 2560w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/2LVjSs_VSE-200.jpeg 200w, https://justinpinkney.com/img/2LVjSs_VSE-320.jpeg 320w, https://justinpinkney.com/img/2LVjSs_VSE-500.jpeg 500w, https://justinpinkney.com/img/2LVjSs_VSE-800.jpeg 800w, https://justinpinkney.com/img/2LVjSs_VSE-1024.jpeg 1024w, https://justinpinkney.com/img/2LVjSs_VSE-1600.jpeg 1600w, https://justinpinkney.com/img/2LVjSs_VSE-2560.jpeg 2560w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/2LVjSs_VSE-200.jpeg&quot; width=&quot;2560&quot; height=&quot;512&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Another way to mix is to concatenate the embeddings; yes, we can have more than one if we want to (see the sketch after the image below). Here we do the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;ghibli, ghibli&lt;/li&gt;
&lt;li&gt;ghibli, uncond&lt;/li&gt;
&lt;li&gt;ghibli, matisse&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2022/image-variation-experiments/output_16_2.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/YFOkREeRpZ-200.webp 200w, https://justinpinkney.com/img/YFOkREeRpZ-320.webp 320w, https://justinpinkney.com/img/YFOkREeRpZ-500.webp 500w, https://justinpinkney.com/img/YFOkREeRpZ-800.webp 800w, https://justinpinkney.com/img/YFOkREeRpZ-1024.webp 1024w, https://justinpinkney.com/img/YFOkREeRpZ-1536.webp 1536w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/YFOkREeRpZ-200.jpeg 200w, https://justinpinkney.com/img/YFOkREeRpZ-320.jpeg 320w, https://justinpinkney.com/img/YFOkREeRpZ-500.jpeg 500w, https://justinpinkney.com/img/YFOkREeRpZ-800.jpeg 800w, https://justinpinkney.com/img/YFOkREeRpZ-1024.jpeg 1024w, https://justinpinkney.com/img/YFOkREeRpZ-1536.jpeg 1536w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/YFOkREeRpZ-200.jpeg&quot; width=&quot;1536&quot; height=&quot;512&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
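&lt;p&gt;Since each CLIP embedding is fed to the unet as a single cross-attention token, concatenating along the sequence dimension is all that&#39;s needed. A minimal sketch with stand-in tensors (in practice these would be the CLIP embeddings computed as above, reshaped to one token each):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import torch

# stand-ins for CLIP embeddings; each is one cross-attention token
emb_ghibli = torch.randn(1, 1, 768)
emb_matisse = torch.randn(1, 1, 768)
uncond = torch.zeros(1, 1, 768)

cond_gg = torch.cat([emb_ghibli, emb_ghibli], dim=1)   # ghibli, ghibli
cond_gu = torch.cat([emb_ghibli, uncond], dim=1)       # ghibli, uncond
cond_gm = torch.cat([emb_ghibli, emb_matisse], dim=1)  # ghibli, matisse
&lt;/code&gt;&lt;/pre&gt;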
&lt;p&gt;We can also use text embeddings to mix with our image embedding; for example, mixing with the embedding for the word &amp;quot;flowers&amp;quot; adds flowers. (The variations model actually works fine with text embeddings, btw.) At the far right it&#39;s only using the text &amp;quot;flowers&amp;quot; (we re-invented a text-to-image model!). 0%, 25%, 50%, 75%, and 100% &amp;quot;flowers&amp;quot;:&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2022/image-variation-experiments/output_18_2.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/xZ-AOEK-l4-200.webp 200w, https://justinpinkney.com/img/xZ-AOEK-l4-320.webp 320w, https://justinpinkney.com/img/xZ-AOEK-l4-500.webp 500w, https://justinpinkney.com/img/xZ-AOEK-l4-800.webp 800w, https://justinpinkney.com/img/xZ-AOEK-l4-1024.webp 1024w, https://justinpinkney.com/img/xZ-AOEK-l4-1600.webp 1600w, https://justinpinkney.com/img/xZ-AOEK-l4-2560.webp 2560w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/xZ-AOEK-l4-200.jpeg 200w, https://justinpinkney.com/img/xZ-AOEK-l4-320.jpeg 320w, https://justinpinkney.com/img/xZ-AOEK-l4-500.jpeg 500w, https://justinpinkney.com/img/xZ-AOEK-l4-800.jpeg 800w, https://justinpinkney.com/img/xZ-AOEK-l4-1024.jpeg 1024w, https://justinpinkney.com/img/xZ-AOEK-l4-1600.jpeg 1600w, https://justinpinkney.com/img/xZ-AOEK-l4-2560.jpeg 2560w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/xZ-AOEK-l4-200.jpeg&quot; width=&quot;2560&quot; height=&quot;512&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
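&lt;p&gt;Getting a text embedding in the same space is just the other half of CLIP. A sketch, again assuming the ViT-L/14 checkpoint:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import torch
from transformers import CLIPTokenizer, CLIPTextModelWithProjection

tokenizer = CLIPTokenizer.from_pretrained('openai/clip-vit-large-patch14')
text_encoder = CLIPTextModelWithProjection.from_pretrained('openai/clip-vit-large-patch14')

tokens = tokenizer(['flowers'], padding=True, return_tensors='pt')
emb_text = text_encoder(**tokens).text_embeds  # (1, 768), same space as the image embedding

emb_image = torch.randn(1, 768)  # stand-in for the Ghibli image embedding from earlier
mixes = [(1 - a) * emb_image + a * emb_text for a in (0.0, 0.25, 0.5, 0.75, 1.0)]
&lt;/code&gt;&lt;/pre&gt;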
&lt;p&gt;We can do the same trick of concatenating the embeddings instead of averaging, though the results are a bit funny:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;ghibli, flowers&lt;/li&gt;
&lt;li&gt;ghibli, flowers, flowers&lt;/li&gt;
&lt;li&gt;ghibli, ghibli, flowers&lt;/li&gt;
&lt;li&gt;ghibli, ghibli, flowers, uncond, uncond&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2022/image-variation-experiments/output_20_8.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/5l4OPfAAX0-200.webp 200w, https://justinpinkney.com/img/5l4OPfAAX0-320.webp 320w, https://justinpinkney.com/img/5l4OPfAAX0-500.webp 500w, https://justinpinkney.com/img/5l4OPfAAX0-800.webp 800w, https://justinpinkney.com/img/5l4OPfAAX0-1024.webp 1024w, https://justinpinkney.com/img/5l4OPfAAX0-1600.webp 1600w, https://justinpinkney.com/img/5l4OPfAAX0-2048.webp 2048w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/5l4OPfAAX0-200.jpeg 200w, https://justinpinkney.com/img/5l4OPfAAX0-320.jpeg 320w, https://justinpinkney.com/img/5l4OPfAAX0-500.jpeg 500w, https://justinpinkney.com/img/5l4OPfAAX0-800.jpeg 800w, https://justinpinkney.com/img/5l4OPfAAX0-1024.jpeg 1024w, https://justinpinkney.com/img/5l4OPfAAX0-1600.jpeg 1600w, https://justinpinkney.com/img/5l4OPfAAX0-2048.jpeg 2048w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/5l4OPfAAX0-200.jpeg&quot; width=&quot;2048&quot; height=&quot;512&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;More natural is perhaps to use a direction in text embedding space to edit our Ghibli conditioning. We can take the direction &amp;quot;still water&amp;quot; -&amp;gt; &amp;quot;wild flowers&amp;quot; and add it at different scales to our original conditioning. Here it&#39;s adding 0%, 10%, 25%, 50% and 100% of the edit vector &amp;quot;still water&amp;quot; -&amp;gt; &amp;quot;wild flowers&amp;quot;:&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2022/image-variation-experiments/output_22_2.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/HR2XpUWtPs-200.webp 200w, https://justinpinkney.com/img/HR2XpUWtPs-320.webp 320w, https://justinpinkney.com/img/HR2XpUWtPs-500.webp 500w, https://justinpinkney.com/img/HR2XpUWtPs-800.webp 800w, https://justinpinkney.com/img/HR2XpUWtPs-1024.webp 1024w, https://justinpinkney.com/img/HR2XpUWtPs-1600.webp 1600w, https://justinpinkney.com/img/HR2XpUWtPs-2560.webp 2560w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/HR2XpUWtPs-200.jpeg 200w, https://justinpinkney.com/img/HR2XpUWtPs-320.jpeg 320w, https://justinpinkney.com/img/HR2XpUWtPs-500.jpeg 500w, https://justinpinkney.com/img/HR2XpUWtPs-800.jpeg 800w, https://justinpinkney.com/img/HR2XpUWtPs-1024.jpeg 1024w, https://justinpinkney.com/img/HR2XpUWtPs-1600.jpeg 1600w, https://justinpinkney.com/img/HR2XpUWtPs-2560.jpeg 2560w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/HR2XpUWtPs-200.jpeg&quot; width=&quot;2560&quot; height=&quot;512&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
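&lt;p&gt;The edit vector is just the difference of two text embeddings, computed as in the previous sketch (stand-in tensors mark where those would go):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import torch

emb_image = torch.randn(1, 768)         # CLIP image embedding of the Ghibli crop
emb_still_water = torch.randn(1, 768)   # CLIP text embedding of 'still water'
emb_wild_flowers = torch.randn(1, 768)  # CLIP text embedding of 'wild flowers'

direction = emb_wild_flowers - emb_still_water
edits = [emb_image + s * direction for s in (0.0, 0.1, 0.25, 0.5, 1.0)]
&lt;/code&gt;&lt;/pre&gt;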
&lt;p&gt;Changing tack, we can experiment with adding noise to our condition vector, with scales 0, 0.1, 0.2, 0.5:&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2022/image-variation-experiments/output_24_2.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/4JLpW0qd58-200.webp 200w, https://justinpinkney.com/img/4JLpW0qd58-320.webp 320w, https://justinpinkney.com/img/4JLpW0qd58-500.webp 500w, https://justinpinkney.com/img/4JLpW0qd58-800.webp 800w, https://justinpinkney.com/img/4JLpW0qd58-1024.webp 1024w, https://justinpinkney.com/img/4JLpW0qd58-1600.webp 1600w, https://justinpinkney.com/img/4JLpW0qd58-2048.webp 2048w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/4JLpW0qd58-200.jpeg 200w, https://justinpinkney.com/img/4JLpW0qd58-320.jpeg 320w, https://justinpinkney.com/img/4JLpW0qd58-500.jpeg 500w, https://justinpinkney.com/img/4JLpW0qd58-800.jpeg 800w, https://justinpinkney.com/img/4JLpW0qd58-1024.jpeg 1024w, https://justinpinkney.com/img/4JLpW0qd58-1600.jpeg 1600w, https://justinpinkney.com/img/4JLpW0qd58-2048.jpeg 2048w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/4JLpW0qd58-200.jpeg&quot; width=&quot;2048&quot; height=&quot;512&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Or use multiplicative noise instead, with scales 0, 0.2, 0.5, 0.75:&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2022/image-variation-experiments/output_26_2.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/2DjHUSInD4-200.webp 200w, https://justinpinkney.com/img/2DjHUSInD4-320.webp 320w, https://justinpinkney.com/img/2DjHUSInD4-500.webp 500w, https://justinpinkney.com/img/2DjHUSInD4-800.webp 800w, https://justinpinkney.com/img/2DjHUSInD4-1024.webp 1024w, https://justinpinkney.com/img/2DjHUSInD4-1600.webp 1600w, https://justinpinkney.com/img/2DjHUSInD4-2048.webp 2048w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/2DjHUSInD4-200.jpeg 200w, https://justinpinkney.com/img/2DjHUSInD4-320.jpeg 320w, https://justinpinkney.com/img/2DjHUSInD4-500.jpeg 500w, https://justinpinkney.com/img/2DjHUSInD4-800.jpeg 800w, https://justinpinkney.com/img/2DjHUSInD4-1024.jpeg 1024w, https://justinpinkney.com/img/2DjHUSInD4-1600.jpeg 1600w, https://justinpinkney.com/img/2DjHUSInD4-2048.jpeg 2048w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/2DjHUSInD4-200.jpeg&quot; width=&quot;2048&quot; height=&quot;512&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
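&lt;p&gt;Concretely, something like this (a minimal sketch; the multiplicative version is one reasonable formulation):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import torch

emb = torch.randn(1, 768)  # stand-in for the CLIP image embedding

additive = [emb + s * torch.randn_like(emb) for s in (0.0, 0.1, 0.2, 0.5)]
multiplicative = [emb * (1 + s * torch.randn_like(emb)) for s in (0.0, 0.2, 0.5, 0.75)]
&lt;/code&gt;&lt;/pre&gt;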
&lt;p&gt;If we take random crops when computing the embeddings we mostly get zoomed-in images; a bit boring.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2022/image-variation-experiments/output_28_2.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/rBiz7-lEWW-200.webp 200w, https://justinpinkney.com/img/rBiz7-lEWW-320.webp 320w, https://justinpinkney.com/img/rBiz7-lEWW-500.webp 500w, https://justinpinkney.com/img/rBiz7-lEWW-800.webp 800w, https://justinpinkney.com/img/rBiz7-lEWW-1024.webp 1024w, https://justinpinkney.com/img/rBiz7-lEWW-1600.webp 1600w, https://justinpinkney.com/img/rBiz7-lEWW-2560.webp 2560w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/rBiz7-lEWW-200.jpeg 200w, https://justinpinkney.com/img/rBiz7-lEWW-320.jpeg 320w, https://justinpinkney.com/img/rBiz7-lEWW-500.jpeg 500w, https://justinpinkney.com/img/rBiz7-lEWW-800.jpeg 800w, https://justinpinkney.com/img/rBiz7-lEWW-1024.jpeg 1024w, https://justinpinkney.com/img/rBiz7-lEWW-1600.jpeg 1600w, https://justinpinkney.com/img/rBiz7-lEWW-2560.jpeg 2560w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/rBiz7-lEWW-200.jpeg&quot; width=&quot;2560&quot; height=&quot;512&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&quot;image-inversion&quot; tabindex=&quot;-1&quot;&gt;Image inversion &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2022/image-variation-experiments/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Now we play with DDIM inversion, followed by editing, as shown in the DALLE-2 paper. We invert our image using standard DDIM inversion as &lt;a href=&quot;https://github.com/pesser/stable-diffusion/blob/main/ldm/models/diffusion/ddim.py#L231&quot;&gt;in the original stable diffusion repo&lt;/a&gt;; we need to use lots of timesteps. Then we check we can decode it back to pretty much the same image (if we use cfg it comes out more saturated). To get good inversion results we use ddim_steps = 500, start_step = 1, cfg scale = 3 (careful with the cfg scale: not too high!).&lt;/p&gt;
&lt;p&gt;For decoding you don&#39;t need to use so many timesteps, so it won&#39;t take so long; decode with 50 timesteps and you get the original picture back.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2022/image-variation-experiments/output_33_2.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/YOBg9JfLdN-200.webp 200w, https://justinpinkney.com/img/YOBg9JfLdN-320.webp 320w, https://justinpinkney.com/img/YOBg9JfLdN-500.webp 500w, https://justinpinkney.com/img/YOBg9JfLdN-512.webp 512w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/YOBg9JfLdN-200.jpeg 200w, https://justinpinkney.com/img/YOBg9JfLdN-320.jpeg 320w, https://justinpinkney.com/img/YOBg9JfLdN-500.jpeg 500w, https://justinpinkney.com/img/YOBg9JfLdN-512.jpeg 512w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/YOBg9JfLdN-200.jpeg&quot; width=&quot;512&quot; height=&quot;512&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
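&lt;p&gt;The core of the inversion is the deterministic DDIM update run in reverse: predict the noise at the current step, estimate the clean latent, then step towards &lt;em&gt;more&lt;/em&gt; noise rather than less. A sketch of a single step (the full loop calls the unet for eps at each of the 500 steps; the alphas are cumulative alpha-bar values from the noise schedule):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import torch

def ddim_inversion_step(x_t, eps, alpha_t, alpha_next):
    # alpha_t, alpha_next: alpha-bar at the current and next (noisier) timestep
    x0_pred = (x_t - (1 - alpha_t).sqrt() * eps) / alpha_t.sqrt()
    return alpha_next.sqrt() * x0_pred + (1 - alpha_next).sqrt() * eps
&lt;/code&gt;&lt;/pre&gt;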
&lt;p&gt;Now we use our same text diff as before to replace the water with flowers! 10%, 20%, 50%, 100% and 150% edit:&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2022/image-variation-experiments/output_35_2.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/9NvTbtdi6h-200.webp 200w, https://justinpinkney.com/img/9NvTbtdi6h-320.webp 320w, https://justinpinkney.com/img/9NvTbtdi6h-500.webp 500w, https://justinpinkney.com/img/9NvTbtdi6h-800.webp 800w, https://justinpinkney.com/img/9NvTbtdi6h-1024.webp 1024w, https://justinpinkney.com/img/9NvTbtdi6h-1600.webp 1600w, https://justinpinkney.com/img/9NvTbtdi6h-2560.webp 2560w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/9NvTbtdi6h-200.jpeg 200w, https://justinpinkney.com/img/9NvTbtdi6h-320.jpeg 320w, https://justinpinkney.com/img/9NvTbtdi6h-500.jpeg 500w, https://justinpinkney.com/img/9NvTbtdi6h-800.jpeg 800w, https://justinpinkney.com/img/9NvTbtdi6h-1024.jpeg 1024w, https://justinpinkney.com/img/9NvTbtdi6h-1600.jpeg 1600w, https://justinpinkney.com/img/9NvTbtdi6h-2560.jpeg 2560w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/9NvTbtdi6h-200.jpeg&quot; width=&quot;2560&quot; height=&quot;512&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Or how about turning the house into a lighthouse? 10%, 20%, 50%, 100% and 150% edit:&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2022/image-variation-experiments/output_37_2.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/0Pr-N5KkH1-200.webp 200w, https://justinpinkney.com/img/0Pr-N5KkH1-320.webp 320w, https://justinpinkney.com/img/0Pr-N5KkH1-500.webp 500w, https://justinpinkney.com/img/0Pr-N5KkH1-800.webp 800w, https://justinpinkney.com/img/0Pr-N5KkH1-1024.webp 1024w, https://justinpinkney.com/img/0Pr-N5KkH1-1600.webp 1600w, https://justinpinkney.com/img/0Pr-N5KkH1-2560.webp 2560w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/0Pr-N5KkH1-200.jpeg 200w, https://justinpinkney.com/img/0Pr-N5KkH1-320.jpeg 320w, https://justinpinkney.com/img/0Pr-N5KkH1-500.jpeg 500w, https://justinpinkney.com/img/0Pr-N5KkH1-800.jpeg 800w, https://justinpinkney.com/img/0Pr-N5KkH1-1024.jpeg 1024w, https://justinpinkney.com/img/0Pr-N5KkH1-1600.jpeg 1600w, https://justinpinkney.com/img/0Pr-N5KkH1-2560.jpeg 2560w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/0Pr-N5KkH1-200.jpeg&quot; width=&quot;2560&quot; height=&quot;512&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;We can also use ddim_eta to make variations of our original, but ones that closely match in composition. Here each row is for ddim_eta 0.3, 0.6, and finally 1:&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2022/image-variation-experiments/output_39_2.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/1bj5hFx9Ti-200.webp 200w, https://justinpinkney.com/img/1bj5hFx9Ti-320.webp 320w, https://justinpinkney.com/img/1bj5hFx9Ti-500.webp 500w, https://justinpinkney.com/img/1bj5hFx9Ti-800.webp 800w, https://justinpinkney.com/img/1bj5hFx9Ti-1024.webp 1024w, https://justinpinkney.com/img/1bj5hFx9Ti-1600.webp 1600w, https://justinpinkney.com/img/1bj5hFx9Ti-2048.webp 2048w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/1bj5hFx9Ti-200.jpeg 200w, https://justinpinkney.com/img/1bj5hFx9Ti-320.jpeg 320w, https://justinpinkney.com/img/1bj5hFx9Ti-500.jpeg 500w, https://justinpinkney.com/img/1bj5hFx9Ti-800.jpeg 800w, https://justinpinkney.com/img/1bj5hFx9Ti-1024.jpeg 1024w, https://justinpinkney.com/img/1bj5hFx9Ti-1600.jpeg 1600w, https://justinpinkney.com/img/1bj5hFx9Ti-2048.jpeg 2048w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/1bj5hFx9Ti-200.jpeg&quot; width=&quot;2048&quot; height=&quot;512&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2022/image-variation-experiments/output_39_5.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/XbhaQVasXX-200.webp 200w, https://justinpinkney.com/img/XbhaQVasXX-320.webp 320w, https://justinpinkney.com/img/XbhaQVasXX-500.webp 500w, https://justinpinkney.com/img/XbhaQVasXX-800.webp 800w, https://justinpinkney.com/img/XbhaQVasXX-1024.webp 1024w, https://justinpinkney.com/img/XbhaQVasXX-1600.webp 1600w, https://justinpinkney.com/img/XbhaQVasXX-2048.webp 2048w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/XbhaQVasXX-200.jpeg 200w, https://justinpinkney.com/img/XbhaQVasXX-320.jpeg 320w, https://justinpinkney.com/img/XbhaQVasXX-500.jpeg 500w, https://justinpinkney.com/img/XbhaQVasXX-800.jpeg 800w, https://justinpinkney.com/img/XbhaQVasXX-1024.jpeg 1024w, https://justinpinkney.com/img/XbhaQVasXX-1600.jpeg 1600w, https://justinpinkney.com/img/XbhaQVasXX-2048.jpeg 2048w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/XbhaQVasXX-200.jpeg&quot; width=&quot;2048&quot; height=&quot;512&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2022/image-variation-experiments/output_39_8.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/cF0YYN6ZqJ-200.webp 200w, https://justinpinkney.com/img/cF0YYN6ZqJ-320.webp 320w, https://justinpinkney.com/img/cF0YYN6ZqJ-500.webp 500w, https://justinpinkney.com/img/cF0YYN6ZqJ-800.webp 800w, https://justinpinkney.com/img/cF0YYN6ZqJ-1024.webp 1024w, https://justinpinkney.com/img/cF0YYN6ZqJ-1600.webp 1600w, https://justinpinkney.com/img/cF0YYN6ZqJ-2048.webp 2048w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/cF0YYN6ZqJ-200.jpeg 200w, https://justinpinkney.com/img/cF0YYN6ZqJ-320.jpeg 320w, https://justinpinkney.com/img/cF0YYN6ZqJ-500.jpeg 500w, https://justinpinkney.com/img/cF0YYN6ZqJ-800.jpeg 800w, https://justinpinkney.com/img/cF0YYN6ZqJ-1024.jpeg 1024w, https://justinpinkney.com/img/cF0YYN6ZqJ-1600.jpeg 1600w, https://justinpinkney.com/img/cF0YYN6ZqJ-2048.jpeg 2048w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/cF0YYN6ZqJ-200.jpeg&quot; width=&quot;2048&quot; height=&quot;512&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
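&lt;p&gt;In diffusers terms this roughly corresponds to decoding from the inverted latents with a DDIM scheduler and a non-zero eta. A rough sketch (&lt;code&gt;inverted_latents&lt;/code&gt; stands in for the output of the inversion loop, and &lt;code&gt;image.jpg&lt;/code&gt; is a placeholder):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import torch
from PIL import Image
from diffusers import DDIMScheduler, StableDiffusionImageVariationPipeline

pipe = StableDiffusionImageVariationPipeline.from_pretrained(
    'lambdalabs/sd-image-variations-diffusers').to('cuda')
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)

init_image = Image.open('image.jpg').convert('RGB')
inverted_latents = torch.randn(1, 4, 64, 64, device='cuda')  # stand-in for the inversion output

for eta in (0.3, 0.6, 1.0):
    out = pipe(init_image, latents=inverted_latents, eta=eta,
               num_inference_steps=50, guidance_scale=3)
&lt;/code&gt;&lt;/pre&gt;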
&lt;p&gt;We can make more of these, steadily ramping up the value of eta, and make a variations video:&lt;/p&gt;
&lt;p align=&quot;center&quot;&gt;
&lt;video controls=&quot;&quot; autoplay=&quot;true&quot; src=&quot;https://justinpinkney.com/blog/2022/image-variation-experiments/ghibli-eta.mp4&quot; loop=&quot;true&quot; style=&quot;max-width:512px&quot;&gt;&lt;/video&gt;
&lt;/p&gt;
&lt;p&gt;Last of all, we can combine subtle variation (ddim_eta=0.4) with text diff editing: &amp;quot;ghibli matte painting&amp;quot; -&amp;gt; &amp;quot;dslr leica photo&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2022/image-variation-experiments/output_41_2.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/NYS9l68qOq-200.webp 200w, https://justinpinkney.com/img/NYS9l68qOq-320.webp 320w, https://justinpinkney.com/img/NYS9l68qOq-500.webp 500w, https://justinpinkney.com/img/NYS9l68qOq-800.webp 800w, https://justinpinkney.com/img/NYS9l68qOq-1024.webp 1024w, https://justinpinkney.com/img/NYS9l68qOq-1600.webp 1600w, https://justinpinkney.com/img/NYS9l68qOq-2048.webp 2048w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/NYS9l68qOq-200.jpeg 200w, https://justinpinkney.com/img/NYS9l68qOq-320.jpeg 320w, https://justinpinkney.com/img/NYS9l68qOq-500.jpeg 500w, https://justinpinkney.com/img/NYS9l68qOq-800.jpeg 800w, https://justinpinkney.com/img/NYS9l68qOq-1024.jpeg 1024w, https://justinpinkney.com/img/NYS9l68qOq-1600.jpeg 1600w, https://justinpinkney.com/img/NYS9l68qOq-2048.jpeg 2048w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/NYS9l68qOq-200.jpeg&quot; width=&quot;2048&quot; height=&quot;512&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
</content>
	</entry>
	
	<entry>
		<title>Latent editing with image variations</title>
		<link href="https://justinpinkney.com/blog/2022/clip-latent-space/"/>
		<updated>2022-10-31T00:00:00Z</updated>
		<id>https://justinpinkney.com/blog/2022/clip-latent-space/</id>
		<content type="html">&lt;p&gt;Playing with the variations model a bit more. You can take the CLIP image embedding of an image then do latent editing with a direction in CLIP text space.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;original&lt;/li&gt;
&lt;li&gt;variation&lt;/li&gt;
&lt;li&gt;&amp;quot;colour oil painting&amp;quot; - &amp;quot;bw photo&amp;quot;&lt;/li&gt;
&lt;li&gt;&amp;quot;modern dslr photo&amp;quot; - &amp;quot;old bw photo&amp;quot;&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&quot;flex-container&quot;&gt;
&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/ETpAT7M5Hz-200.webp 200w, https://justinpinkney.com/img/ETpAT7M5Hz-320.webp 320w, https://justinpinkney.com/img/ETpAT7M5Hz-500.webp 500w, https://justinpinkney.com/img/ETpAT7M5Hz-512.webp 512w&quot; sizes=&quot;(max-width:640px) 25vw, 320px&quot;&gt;&lt;source type=&quot;image/png&quot; srcset=&quot;https://justinpinkney.com/img/ETpAT7M5Hz-200.png 200w, https://justinpinkney.com/img/ETpAT7M5Hz-320.png 320w, https://justinpinkney.com/img/ETpAT7M5Hz-500.png 500w, https://justinpinkney.com/img/ETpAT7M5Hz-512.png 512w&quot; sizes=&quot;(max-width:640px) 25vw, 320px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/ETpAT7M5Hz-200.png&quot; width=&quot;512&quot; height=&quot;512&quot;&gt;&lt;/picture&gt;
&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/I_elzE6iTh-200.webp 200w, https://justinpinkney.com/img/I_elzE6iTh-320.webp 320w, https://justinpinkney.com/img/I_elzE6iTh-500.webp 500w, https://justinpinkney.com/img/I_elzE6iTh-512.webp 512w&quot; sizes=&quot;(max-width:640px) 25vw, 320px&quot;&gt;&lt;source type=&quot;image/png&quot; srcset=&quot;https://justinpinkney.com/img/I_elzE6iTh-200.png 200w, https://justinpinkney.com/img/I_elzE6iTh-320.png 320w, https://justinpinkney.com/img/I_elzE6iTh-500.png 500w, https://justinpinkney.com/img/I_elzE6iTh-512.png 512w&quot; sizes=&quot;(max-width:640px) 25vw, 320px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/I_elzE6iTh-200.png&quot; width=&quot;512&quot; height=&quot;512&quot;&gt;&lt;/picture&gt;
&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/232J3M4SWI-200.webp 200w, https://justinpinkney.com/img/232J3M4SWI-320.webp 320w, https://justinpinkney.com/img/232J3M4SWI-500.webp 500w, https://justinpinkney.com/img/232J3M4SWI-512.webp 512w&quot; sizes=&quot;(max-width:640px) 25vw, 320px&quot;&gt;&lt;source type=&quot;image/png&quot; srcset=&quot;https://justinpinkney.com/img/232J3M4SWI-200.png 200w, https://justinpinkney.com/img/232J3M4SWI-320.png 320w, https://justinpinkney.com/img/232J3M4SWI-500.png 500w, https://justinpinkney.com/img/232J3M4SWI-512.png 512w&quot; sizes=&quot;(max-width:640px) 25vw, 320px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/232J3M4SWI-200.png&quot; width=&quot;512&quot; height=&quot;512&quot;&gt;&lt;/picture&gt;
&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/7JPkS_6JpI-200.webp 200w, https://justinpinkney.com/img/7JPkS_6JpI-320.webp 320w, https://justinpinkney.com/img/7JPkS_6JpI-500.webp 500w, https://justinpinkney.com/img/7JPkS_6JpI-512.webp 512w&quot; sizes=&quot;(max-width:640px) 25vw, 320px&quot;&gt;&lt;source type=&quot;image/png&quot; srcset=&quot;https://justinpinkney.com/img/7JPkS_6JpI-200.png 200w, https://justinpinkney.com/img/7JPkS_6JpI-320.png 320w, https://justinpinkney.com/img/7JPkS_6JpI-500.png 500w, https://justinpinkney.com/img/7JPkS_6JpI-512.png 512w&quot; sizes=&quot;(max-width:640px) 25vw, 320px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/7JPkS_6JpI-200.png&quot; width=&quot;512&quot; height=&quot;512&quot;&gt;&lt;/picture&gt;
&lt;/div&gt;
&lt;ol&gt;
&lt;li&gt;original&lt;/li&gt;
&lt;li&gt;&amp;quot;old woman&amp;quot; - &amp;quot;young woman&amp;quot;&lt;/li&gt;
&lt;li&gt;&amp;quot;modern dslr photo&amp;quot; - &amp;quot;oil painting&amp;quot;&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&quot;flex-container&quot;&gt;
&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/CRU78vfE4n-200.webp 200w, https://justinpinkney.com/img/CRU78vfE4n-320.webp 320w, https://justinpinkney.com/img/CRU78vfE4n-500.webp 500w, https://justinpinkney.com/img/CRU78vfE4n-512.webp 512w&quot; sizes=&quot;(max-width:640px) 25vw, 320px&quot;&gt;&lt;source type=&quot;image/png&quot; srcset=&quot;https://justinpinkney.com/img/CRU78vfE4n-200.png 200w, https://justinpinkney.com/img/CRU78vfE4n-320.png 320w, https://justinpinkney.com/img/CRU78vfE4n-500.png 500w, https://justinpinkney.com/img/CRU78vfE4n-512.png 512w&quot; sizes=&quot;(max-width:640px) 25vw, 320px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/CRU78vfE4n-200.png&quot; width=&quot;512&quot; height=&quot;512&quot;&gt;&lt;/picture&gt;
&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/9zFusd_lSW-200.webp 200w, https://justinpinkney.com/img/9zFusd_lSW-320.webp 320w, https://justinpinkney.com/img/9zFusd_lSW-500.webp 500w, https://justinpinkney.com/img/9zFusd_lSW-512.webp 512w&quot; sizes=&quot;(max-width:640px) 25vw, 320px&quot;&gt;&lt;source type=&quot;image/png&quot; srcset=&quot;https://justinpinkney.com/img/9zFusd_lSW-200.png 200w, https://justinpinkney.com/img/9zFusd_lSW-320.png 320w, https://justinpinkney.com/img/9zFusd_lSW-500.png 500w, https://justinpinkney.com/img/9zFusd_lSW-512.png 512w&quot; sizes=&quot;(max-width:640px) 25vw, 320px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/9zFusd_lSW-200.png&quot; width=&quot;512&quot; height=&quot;512&quot;&gt;&lt;/picture&gt;
&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/g6o0tT_OlH-200.webp 200w, https://justinpinkney.com/img/g6o0tT_OlH-320.webp 320w, https://justinpinkney.com/img/g6o0tT_OlH-500.webp 500w, https://justinpinkney.com/img/g6o0tT_OlH-512.webp 512w&quot; sizes=&quot;(max-width:640px) 25vw, 320px&quot;&gt;&lt;source type=&quot;image/png&quot; srcset=&quot;https://justinpinkney.com/img/g6o0tT_OlH-200.png 200w, https://justinpinkney.com/img/g6o0tT_OlH-320.png 320w, https://justinpinkney.com/img/g6o0tT_OlH-500.png 500w, https://justinpinkney.com/img/g6o0tT_OlH-512.png 512w&quot; sizes=&quot;(max-width:640px) 25vw, 320px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/g6o0tT_OlH-200.png&quot; width=&quot;512&quot; height=&quot;512&quot;&gt;&lt;/picture&gt;
&lt;/div&gt;
&lt;ol&gt;
&lt;li&gt;original&lt;/li&gt;
&lt;li&gt;&amp;quot;detailed painting&amp;quot; - &amp;quot;child&#39;s drawing&amp;quot;&lt;/li&gt;
&lt;li&gt;&amp;quot;archviz render&amp;quot; - &amp;quot;child&#39;s drawing&amp;quot;&lt;/li&gt;
&lt;li&gt;&amp;quot;castle&amp;quot; - &amp;quot;house&amp;quot;&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&quot;flex-container&quot;&gt;
&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/GnX3q82US1-200.webp 200w, https://justinpinkney.com/img/GnX3q82US1-320.webp 320w, https://justinpinkney.com/img/GnX3q82US1-500.webp 500w, https://justinpinkney.com/img/GnX3q82US1-512.webp 512w&quot; sizes=&quot;(max-width:640px) 25vw, 320px&quot;&gt;&lt;source type=&quot;image/png&quot; srcset=&quot;https://justinpinkney.com/img/GnX3q82US1-200.png 200w, https://justinpinkney.com/img/GnX3q82US1-320.png 320w, https://justinpinkney.com/img/GnX3q82US1-500.png 500w, https://justinpinkney.com/img/GnX3q82US1-512.png 512w&quot; sizes=&quot;(max-width:640px) 25vw, 320px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/GnX3q82US1-200.png&quot; width=&quot;512&quot; height=&quot;512&quot;&gt;&lt;/picture&gt;
&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/pQshxVR9KZ-200.webp 200w, https://justinpinkney.com/img/pQshxVR9KZ-320.webp 320w, https://justinpinkney.com/img/pQshxVR9KZ-500.webp 500w, https://justinpinkney.com/img/pQshxVR9KZ-512.webp 512w&quot; sizes=&quot;(max-width:640px) 25vw, 320px&quot;&gt;&lt;source type=&quot;image/png&quot; srcset=&quot;https://justinpinkney.com/img/pQshxVR9KZ-200.png 200w, https://justinpinkney.com/img/pQshxVR9KZ-320.png 320w, https://justinpinkney.com/img/pQshxVR9KZ-500.png 500w, https://justinpinkney.com/img/pQshxVR9KZ-512.png 512w&quot; sizes=&quot;(max-width:640px) 25vw, 320px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/pQshxVR9KZ-200.png&quot; width=&quot;512&quot; height=&quot;512&quot;&gt;&lt;/picture&gt;
&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/Aok66fRi3I-200.webp 200w, https://justinpinkney.com/img/Aok66fRi3I-320.webp 320w, https://justinpinkney.com/img/Aok66fRi3I-500.webp 500w, https://justinpinkney.com/img/Aok66fRi3I-512.webp 512w&quot; sizes=&quot;(max-width:640px) 25vw, 320px&quot;&gt;&lt;source type=&quot;image/png&quot; srcset=&quot;https://justinpinkney.com/img/Aok66fRi3I-200.png 200w, https://justinpinkney.com/img/Aok66fRi3I-320.png 320w, https://justinpinkney.com/img/Aok66fRi3I-500.png 500w, https://justinpinkney.com/img/Aok66fRi3I-512.png 512w&quot; sizes=&quot;(max-width:640px) 25vw, 320px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/Aok66fRi3I-200.png&quot; width=&quot;512&quot; height=&quot;512&quot;&gt;&lt;/picture&gt;
&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/cOi1xa6Xtj-200.webp 200w, https://justinpinkney.com/img/cOi1xa6Xtj-320.webp 320w, https://justinpinkney.com/img/cOi1xa6Xtj-500.webp 500w, https://justinpinkney.com/img/cOi1xa6Xtj-512.webp 512w&quot; sizes=&quot;(max-width:640px) 25vw, 320px&quot;&gt;&lt;source type=&quot;image/png&quot; srcset=&quot;https://justinpinkney.com/img/cOi1xa6Xtj-200.png 200w, https://justinpinkney.com/img/cOi1xa6Xtj-320.png 320w, https://justinpinkney.com/img/cOi1xa6Xtj-500.png 500w, https://justinpinkney.com/img/cOi1xa6Xtj-512.png 512w&quot; sizes=&quot;(max-width:640px) 25vw, 320px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/cOi1xa6Xtj-200.png&quot; width=&quot;512&quot; height=&quot;512&quot;&gt;&lt;/picture&gt;
&lt;/div&gt;
&lt;p&gt;&lt;em&gt;converted from a twitter thread:
&lt;a href=&quot;https://twitter.com/Buntworthy/status/1587181111003815936&quot;&gt;Mon Oct 31 20:33:59 2022&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
</content>
	</entry>
	
	<entry>
		<title>Text to Pokemon Generator</title>
		<link href="https://justinpinkney.com/blog/2022/pokemon-generator/"/>
		<updated>2022-09-08T00:00:00Z</updated>
		<id>https://justinpinkney.com/blog/2022/pokemon-generator/</id>
		<content type="html">&lt;p&gt;If you want to just try out the model look here:
&lt;a href=&quot;https://replicate.com/lambdal/text-to-pokemon&quot;&gt;&lt;img src=&quot;https://img.shields.io/badge/%F0%9F%9A%80-Open%20in%20Replicate-%23fff891&quot; alt=&quot;Open in Replicate&quot;&gt;&lt;/a&gt;
&lt;a href=&quot;https://colab.research.google.com/github/LambdaLabsML/lambda-diffusers/blob/main/notebooks/pokemon_demo.ipynb&quot;&gt;&lt;img src=&quot;https://colab.research.google.com/assets/colab-badge.svg&quot; alt=&quot;Open In Colab&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;All of this work was done as part of my role at &lt;a href=&quot;https://lambdalabs.com/&quot;&gt;Lambda Labs&lt;/a&gt;, and all the real details of how the model was made, and how you can make one yourself, are in this post on the &lt;a href=&quot;https://github.com/LambdaLabsML/examples/tree/main/stable-diffusion-finetuning&quot;&gt;Lambda Examples repo&lt;/a&gt;. This blog post just has a few extra details and notes on my experience.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2022/pokemon-generator/pokemontage.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/Dtocos28k0-200.webp 200w, https://justinpinkney.com/img/Dtocos28k0-320.webp 320w, https://justinpinkney.com/img/Dtocos28k0-500.webp 500w, https://justinpinkney.com/img/Dtocos28k0-768.webp 768w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/Dtocos28k0-200.jpeg 200w, https://justinpinkney.com/img/Dtocos28k0-320.jpeg 320w, https://justinpinkney.com/img/Dtocos28k0-500.jpeg 500w, https://justinpinkney.com/img/Dtocos28k0-768.jpeg 768w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/Dtocos28k0-200.jpeg&quot; width=&quot;768&quot; height=&quot;512&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Girl with a pearl earring, Cute Obama creature, Donald Trump, Boris Johnson, Totoro, Hello Kitty&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;There&#39;s been much excitement about Stable Diffusion&#39;s recent release, as you might expect. Sure, it&#39;s great at making some interesting images, but there are a lot more possibilities than just putting in words and getting out pictures. To me it feels like a similar moment to shortly after StyleGAN was released: a powerful, high quality model, publicly accessible, and simple enough for people to play with, adapt, and fine tune for their own purposes.&lt;/p&gt;
&lt;p&gt;I&#39;ve been doing some experiments with the model since it came out. One was changing the input conditioning to &lt;a href=&quot;https://twitter.com/Buntworthy/status/1566744186153484288&quot;&gt;enable &amp;quot;image variations&amp;quot; with Stable Diffusion&lt;/a&gt;, but this post is just a little summary of some more straightforward initial experiments in simple fine-tuning of the model to turn everyone into a Pokemon!&lt;/p&gt;
&lt;h2 id=&quot;catastrophic-forgetting&quot; tabindex=&quot;-1&quot;&gt;Catastrophic forgetting? &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2022/pokemon-generator/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;One of the amazing things about this model is that it manages to remember some of the &amp;quot;general knowledge&amp;quot; of the original Stable Diffusion. It doesn&#39;t seem obvious that it should do so after training for a few thousand steps on such a limited dataset, but there&#39;s a trick involved here. When fine tuning on the Pokemon the model actually starts to overfit quite quickly, and if you sample from it in a naive way it just produces Pokemon-ish gibberish for novel prompts (it has catastrophically forgotten the original data it was trained on). But like many modern networks, Stable Diffusion keeps an exponential moving average (EMA) version of the model during training, which is usually used for inference as it gives better quality. So if we use the EMA weights we&#39;re actually using an average of the original model and the fine-tuned one. This turns out to be essential in order to turn all those famous people into Pokemon. You can even tune this effect by directly averaging the weights of the new model with those of the original, to control the amount of Pokemonification (see the sketch below).&lt;/p&gt;
&lt;p&gt;https://www.twitter.com/Buntworthy/status/1567804278949007360&lt;/p&gt;
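&lt;p&gt;A minimal sketch of that kind of weight averaging, assuming two checkpoints in the original CompVis format (the file names here are placeholders):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import torch

# placeholder paths: the original SD weights and the fine-tuned Pokemon weights
sd_base = torch.load('sd-v1-4.ckpt', map_location='cpu')['state_dict']
sd_pokemon = torch.load('pokemon-finetune.ckpt', map_location='cpu')['state_dict']

alpha = 0.5  # 0 = original model, 1 = full Pokemonification
merged = {k: (1 - alpha) * sd_base[k] + alpha * sd_pokemon[k] for k in sd_base}

torch.save({'state_dict': merged}, 'pokemon-blend.ckpt')
&lt;/code&gt;&lt;/pre&gt;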
&lt;p&gt;It turns out that this is basically the same mechanism that enabled &lt;a href=&quot;https://justinpinkney.com/blog/2022/toonify-yourself&quot;&gt;toonify yourself&lt;/a&gt;: fine tuning and then averaging models means you can end up with an effective mix of the original content with the style you fine-tuned on!&lt;/p&gt;
&lt;h2 id=&quot;more-experiments&quot; tabindex=&quot;-1&quot;&gt;More experiments &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2022/pokemon-generator/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;It does seem a bit excessive to fine tune the whole unet as I did in the example above. There are probably lots of different strategies you could adopt, trying to freeze different parts of the model, or even something like &lt;a href=&quot;https://justinpinkney.com/blog/2020/stylegan-network-blending/&quot;&gt;layer swapping&lt;/a&gt;. One little experiment I tried along these lines was to fine tune only the attention layers in the unet (see the sketch after the comparison image below). This version helps to preserve the original abilities of the model, but reduces the quality of the Pokemon produced. It&#39;s neat that the model still remembers how to produce non-Pokemon images, but that isn&#39;t quite what we want for this application, where the model producing nothing but Pokemon is exactly the point!&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2022/pokemon-generator/compare.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/VxoAeYWiz7-200.webp 200w, https://justinpinkney.com/img/VxoAeYWiz7-320.webp 320w, https://justinpinkney.com/img/VxoAeYWiz7-500.webp 500w, https://justinpinkney.com/img/VxoAeYWiz7-800.webp 800w, https://justinpinkney.com/img/VxoAeYWiz7-1024.webp 1024w, https://justinpinkney.com/img/VxoAeYWiz7-1064.webp 1064w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/VxoAeYWiz7-200.jpeg 200w, https://justinpinkney.com/img/VxoAeYWiz7-320.jpeg 320w, https://justinpinkney.com/img/VxoAeYWiz7-500.jpeg 500w, https://justinpinkney.com/img/VxoAeYWiz7-800.jpeg 800w, https://justinpinkney.com/img/VxoAeYWiz7-1024.jpeg 1024w, https://justinpinkney.com/img/VxoAeYWiz7-1064.jpeg 1064w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/VxoAeYWiz7-200.jpeg&quot; width=&quot;1064&quot; height=&quot;256&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Left is the fully fine-tuned model. Right is attention layers only. The right model can clearly generate a more &amp;quot;normal&amp;quot; yoda, but is less good at making Pokemon.&lt;/em&gt;&lt;/p&gt;
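&lt;p&gt;A sketch of the attention-only fine tuning setup, assuming the diffusers unet where the attention parameter names contain &amp;quot;attn&amp;quot; (a naming heuristic, so worth double checking against your own checkpoint):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from diffusers import UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained(
    'CompVis/stable-diffusion-v1-4', subfolder='unet')

# freeze everything except the attention blocks
for name, param in unet.named_parameters():
    param.requires_grad = 'attn' in name

trainable = sum(p.numel() for p in unet.parameters() if p.requires_grad)
total = sum(p.numel() for p in unet.parameters())
print(f'training {trainable / 1e6:.0f}M of {total / 1e6:.0f}M parameters')
&lt;/code&gt;&lt;/pre&gt;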
&lt;p&gt;An obvious next step is to try and compare the above fine-tuning with a method like &lt;a href=&quot;https://dreambooth.github.io/&quot;&gt;DreamBooth&lt;/a&gt;. If anyone gets round to trying this before I do, please let me know how it goes!&lt;/p&gt;
&lt;h2 id=&quot;vivillon-is-the-rick-astley-of-pokemon&quot; tabindex=&quot;-1&quot;&gt;Vivillon is the Rick Astley of Pokemon &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2022/pokemon-generator/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Some people pointed out that the model has a tendency to randomly produce a certain Pokemon, Vivillon, for seemingly random prompts. Even when it doesn&#39;t make Vivillon exactly, it still has a strong preference for making circular, radially patterned Pokemon when the prompt is a bit more abstract.&lt;/p&gt;
&lt;p&gt;https://www.twitter.com/JanelleCShane/status/1575855505922088960&lt;/p&gt;
&lt;p&gt;Someone pointed out that Vivillon actually has several variants, each of which appears as a different Pokemon; this means it shows up a bunch of times in the original dataset, and the model clearly ended up overfitting to it. I might make a version where I remove all but one Vivillon, but right now I quite enjoy that Vivillon occasionally appears when you least expect it, like a Pokemon Rick Roll.&lt;/p&gt;
&lt;p&gt;https://www.twitter.com/MrCheeze_/status/1575857534874705920&lt;/p&gt;
&lt;h2 id=&quot;coverage&quot; tabindex=&quot;-1&quot;&gt;Coverage &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2022/pokemon-generator/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;I tweeted about the model when I released it, but it seems like it took a little while before it really took off. In particular this tweet seemed to really reach the right audience:&lt;/p&gt;
&lt;p&gt;https://www.twitter.com/JDune5/status/1574143254366388232&lt;/p&gt;
&lt;p&gt;Since then it&#39;s been featured here:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://www.washingtonpost.com/video-games/2022/09/29/pokemon-ai-generator-github/&quot;&gt;The Washington Post - Create surreal Pokémon lookalikes of Jeff Bezos, The Rock and more with AI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.theverge.com/2022/9/26/23372457/pokemon-ai-generator-stable-diffusion-model&quot;&gt;The Verge - Turn anyone into a pokémon with this AI art model&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.creativebloq.com/news/text-to-pokemon-ai-art-generator&quot;&gt;Creative Bloq - This spot-on AI Pokémon generator has me hooked&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.thegamer.com/pokemon-ai-generator/&quot;&gt;The Gamer - This AI Generates A Pokemon Based On Your Name&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://techcrunch.com/2022/09/28/make-your-very-own-ai-generated-pokemon-like-creature/&quot;&gt;TechCrunch - Make your very own AI-generated Pokémon-like creature&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://nerdist.com/article/turn-anything-into-pokemon-with-this-ai-program/&quot;&gt;Nerdist - Turn anything into Pokémon with this new AI program&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://screenrant.com/pokemon-ai-generator-custom-characters-scary/&quot;&gt;Screen Rant - AI Generator Turns You Into A Pokémon (But You&#39;re Not Going To Like It)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.yahoo.com/lifestyle/turn-anything-pok-mon-ai-210003616.html&quot;&gt;Yahoo - Turn Anything into Pokémon with This New AI Program&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;and plenty of other places.&lt;/p&gt;
</content>
	</entry>
	
	<entry>
		<title>Style Space Face Editing</title>
		<link href="https://justinpinkney.com/blog/2021/face-edits/"/>
		<updated>2021-04-08T00:00:00Z</updated>
		<id>https://justinpinkney.com/blog/2021/face-edits/</id>
		<content type="html">&lt;p&gt;&lt;img src=&quot;https://github.com/justinpinkney/pixel2style2pixel/raw/master/images/face-edit-runway.gif&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://app.runwayml.com/models/justinpinkney/Style-space-face-editing&quot;&gt;&lt;img src=&quot;https://open-app.runwayml.com/gh-badge.svg&quot; alt=&quot;Open in RunwayML Badge&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Repository: https://github.com/justinpinkney/pixel2style2pixel/&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;A little while back I ported a couple of recent machine learning models to Runway: &amp;quot;&lt;a href=&quot;https://arxiv.org/abs/2008.00951&quot;&gt;Encoding in Style&lt;/a&gt;&amp;quot; (aka Pixel2Style2Pixel or PSP)[^psp] and &amp;quot;&lt;a href=&quot;https://arxiv.org/abs/2011.12799&quot;&gt;Style Space Analysis&lt;/a&gt;&amp;quot;[^style-space]. They work well together: PSP encodes an existing image into the latent space of a generative model, and Style Space Analysis edits that image within the latent space of StyleGAN. Here&#39;s a post on some of the technical details of how it works.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2021/face-edits/images/goatee.jpeg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/SR_dbr2M7O-200.webp 200w, https://justinpinkney.com/img/SR_dbr2M7O-320.webp 320w, https://justinpinkney.com/img/SR_dbr2M7O-500.webp 500w, https://justinpinkney.com/img/SR_dbr2M7O-630.webp 630w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/SR_dbr2M7O-200.jpeg 200w, https://justinpinkney.com/img/SR_dbr2M7O-320.jpeg 320w, https://justinpinkney.com/img/SR_dbr2M7O-500.jpeg 500w, https://justinpinkney.com/img/SR_dbr2M7O-630.jpeg 630w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/SR_dbr2M7O-200.jpeg&quot; width=&quot;630&quot; height=&quot;1200&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3 id=&quot;pixel2style2pixel&quot; tabindex=&quot;-1&quot;&gt;Pixel2Style2Pixel &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2021/face-edits/&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;PSP demonstrates a highly effective method for encoding real images in the latent space of a pre-trained generative model. The &lt;a href=&quot;https://github.com/eladrich/pixel2style2pixel&quot;&gt;repository&lt;/a&gt; provides a pre-trained encoder for finding the latent vector corresponding to an image you supply. There is also some cool follow-up work called Encoder4Editing which might be even better for this application.&lt;/p&gt;
&lt;p&gt;They even use the good old &lt;a href=&quot;https://justinpinkney.com/blog/2021/toonify-yourself&quot;&gt;Toonify model&lt;/a&gt; in their research as part of demonstrating the flexibility and generality of their face encoding approach.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2021/face-edits/images/psp.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/fVZyXopagz-200.webp 200w, https://justinpinkney.com/img/fVZyXopagz-320.webp 320w, https://justinpinkney.com/img/fVZyXopagz-500.webp 500w, https://justinpinkney.com/img/fVZyXopagz-800.webp 800w, https://justinpinkney.com/img/fVZyXopagz-840.webp 840w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/fVZyXopagz-200.jpeg 200w, https://justinpinkney.com/img/fVZyXopagz-320.jpeg 320w, https://justinpinkney.com/img/fVZyXopagz-500.jpeg 500w, https://justinpinkney.com/img/fVZyXopagz-800.jpeg 800w, https://justinpinkney.com/img/fVZyXopagz-840.jpeg 840w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/fVZyXopagz-200.jpeg&quot; width=&quot;840&quot; height=&quot;501&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3 id=&quot;style-space-analysis&quot; tabindex=&quot;-1&quot;&gt;Style Space Analysis &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2021/face-edits/&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;StyleSpace Analysis is a paper with a nice idea which seems to not have gained much attention. It shows that as well as the typical Z and W latent spaces of StyleGAN, there is another latent space: StyleSpace, which provides extremely localisable edits to images. &lt;strike&gt;Unfortunately there is no code accompanying Style Space Analysis, but&lt;/strike&gt; There is now &lt;a href=&quot;https://github.com/betterze/StyleSpace&quot;&gt;code for StyleSpace analysis&lt;/a&gt;! The channels used to edit various attributes are provided in the paper and delving into the PyTorch version of StyleGAN 2 using forward hooks to modify the appropriate values is pretty straightforward.&lt;/p&gt;
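&lt;p&gt;As a rough illustration, here&#39;s a minimal sketch of that forward-hook trick. The module path, channel index, and strength below are hypothetical placeholders; the real layer/channel pairs for each attribute are the ones listed in the paper.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import torch

def make_style_hook(channel, strength):
    # Returning a value from a forward hook replaces the module output,
    # so we can shift a single style channel as the image is generated.
    def hook(module, inputs, output):
        output = output.clone()
        output[:, channel] += strength
        return output
    return hook

# Hypothetical example: hook the affine layer producing the style vector
# for one synthesis block, and nudge one of its channels.
handle = generator.synthesis.b64.conv1.affine.register_forward_hook(
    make_style_hook(channel=6, strength=10.0))

with torch.no_grad():
    edited = generator.synthesis(w_latent)  # generate with the edit applied

handle.remove()  # restore the original behaviour
&lt;/code&gt;&lt;/pre&gt;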
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2021/face-edits/images/ss_analysis.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/cEbd6uK0EG-200.webp 200w, https://justinpinkney.com/img/cEbd6uK0EG-320.webp 320w, https://justinpinkney.com/img/cEbd6uK0EG-500.webp 500w, https://justinpinkney.com/img/cEbd6uK0EG-800.webp 800w, https://justinpinkney.com/img/cEbd6uK0EG-1024.webp 1024w, https://justinpinkney.com/img/cEbd6uK0EG-1264.webp 1264w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/cEbd6uK0EG-200.jpeg 200w, https://justinpinkney.com/img/cEbd6uK0EG-320.jpeg 320w, https://justinpinkney.com/img/cEbd6uK0EG-500.jpeg 500w, https://justinpinkney.com/img/cEbd6uK0EG-800.jpeg 800w, https://justinpinkney.com/img/cEbd6uK0EG-1024.jpeg 1024w, https://justinpinkney.com/img/cEbd6uK0EG-1264.jpeg 1264w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/cEbd6uK0EG-200.jpeg&quot; width=&quot;1264&quot; height=&quot;417&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3 id=&quot;other-code&quot; tabindex=&quot;-1&quot;&gt;Other code &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2021/face-edits/&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;I also use some existing code for face alignment from the &lt;a href=&quot;https://github.com/NVlabs/ffhq-dataset&quot;&gt;FFHQ-dataset&lt;/a&gt; repository to ensure faces are aligned as expected by the StyleGAN model. I combine this with a pre-trained face detector from dlib, and add some extra code to &amp;quot;undo&amp;quot; the alignment to replace the edited face back in the original image. This involves some fairly straightforward code using OpenCV to estimate the inverse transform and apply this to align the face back into the reference frame of the original image.&lt;/p&gt;
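&lt;p&gt;For the curious, the &amp;quot;undo&amp;quot; step boils down to something like the sketch below, where &lt;code&gt;src_pts&lt;/code&gt; and &lt;code&gt;dst_pts&lt;/code&gt; (matching landmark points in the original image and the aligned crop), &lt;code&gt;original&lt;/code&gt;, and &lt;code&gt;edited_face&lt;/code&gt; are placeholder variables.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import cv2
import numpy as np

# Estimate the alignment transform from the landmarks, then invert it.
M, _ = cv2.estimateAffinePartial2D(src_pts, dst_pts)
M_inv = cv2.invertAffineTransform(M)

# Warp the edited face back into the reference frame of the original
# image, then paste it in using a warped mask.
h, w = original.shape[:2]
restored = cv2.warpAffine(edited_face, M_inv, (w, h))
mask = cv2.warpAffine(np.ones(edited_face.shape[:2], np.float32), M_inv, (w, h))
composite = (original * (1 - mask[..., None]) + restored * mask[..., None]).astype(np.uint8)
&lt;/code&gt;&lt;/pre&gt;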
&lt;h2 id=&quot;let-s-edit&quot; tabindex=&quot;-1&quot;&gt;Let&#39;s edit! &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2021/face-edits/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Now with these various bits of model code put together and stuck into the nice interactive RunwayML interface, it&#39;s easy to do some face editing. PSP provides the means of embedding a set of real world faces in the latent space of StyleGAN2, and StyleSpace analysis lets us modify some interesting attributes. By placing the edited face back in the original image we can create some fairly realistically edited images.&lt;/p&gt;
&lt;p&gt;Here are some examples of facial hair, expression, and make-up editing.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2021/face-edits/images/dl.jpeg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/sGwMcSCqeR-200.webp 200w, https://justinpinkney.com/img/sGwMcSCqeR-320.webp 320w, https://justinpinkney.com/img/sGwMcSCqeR-500.webp 500w, https://justinpinkney.com/img/sGwMcSCqeR-800.webp 800w, https://justinpinkney.com/img/sGwMcSCqeR-1024.webp 1024w, https://justinpinkney.com/img/sGwMcSCqeR-1600.webp 1600w, https://justinpinkney.com/img/sGwMcSCqeR-2200.webp 2200w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/sGwMcSCqeR-200.jpeg 200w, https://justinpinkney.com/img/sGwMcSCqeR-320.jpeg 320w, https://justinpinkney.com/img/sGwMcSCqeR-500.jpeg 500w, https://justinpinkney.com/img/sGwMcSCqeR-800.jpeg 800w, https://justinpinkney.com/img/sGwMcSCqeR-1024.jpeg 1024w, https://justinpinkney.com/img/sGwMcSCqeR-1600.jpeg 1600w, https://justinpinkney.com/img/sGwMcSCqeR-2200.jpeg 2200w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/sGwMcSCqeR-200.jpeg&quot; width=&quot;2200&quot; height=&quot;1026&quot;&gt;&lt;/picture&gt;&lt;/a&gt;
&lt;a href=&quot;https://justinpinkney.com/blog/2021/face-edits/images/satc.jpeg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/6vY7hAAlBA-200.webp 200w, https://justinpinkney.com/img/6vY7hAAlBA-320.webp 320w, https://justinpinkney.com/img/6vY7hAAlBA-500.webp 500w, https://justinpinkney.com/img/6vY7hAAlBA-800.webp 800w, https://justinpinkney.com/img/6vY7hAAlBA-1024.webp 1024w, https://justinpinkney.com/img/6vY7hAAlBA-1278.webp 1278w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/6vY7hAAlBA-200.jpeg 200w, https://justinpinkney.com/img/6vY7hAAlBA-320.jpeg 320w, https://justinpinkney.com/img/6vY7hAAlBA-500.jpeg 500w, https://justinpinkney.com/img/6vY7hAAlBA-800.jpeg 800w, https://justinpinkney.com/img/6vY7hAAlBA-1024.jpeg 1024w, https://justinpinkney.com/img/6vY7hAAlBA-1278.jpeg 1278w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/6vY7hAAlBA-200.jpeg&quot; width=&quot;1278&quot; height=&quot;720&quot;&gt;&lt;/picture&gt;&lt;/a&gt;
&lt;a href=&quot;https://justinpinkney.com/blog/2021/face-edits/images/shaving.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/r0D2RWo3mg-200.webp 200w, https://justinpinkney.com/img/r0D2RWo3mg-320.webp 320w, https://justinpinkney.com/img/r0D2RWo3mg-500.webp 500w, https://justinpinkney.com/img/r0D2RWo3mg-800.webp 800w, https://justinpinkney.com/img/r0D2RWo3mg-1024.webp 1024w, https://justinpinkney.com/img/r0D2RWo3mg-1600.webp 1600w, https://justinpinkney.com/img/r0D2RWo3mg-2730.webp 2730w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/r0D2RWo3mg-200.jpeg 200w, https://justinpinkney.com/img/r0D2RWo3mg-320.jpeg 320w, https://justinpinkney.com/img/r0D2RWo3mg-500.jpeg 500w, https://justinpinkney.com/img/r0D2RWo3mg-800.jpeg 800w, https://justinpinkney.com/img/r0D2RWo3mg-1024.jpeg 1024w, https://justinpinkney.com/img/r0D2RWo3mg-1600.jpeg 1600w, https://justinpinkney.com/img/r0D2RWo3mg-2730.jpeg 2730w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/r0D2RWo3mg-200.jpeg&quot; width=&quot;2730&quot; height=&quot;1024&quot;&gt;&lt;/picture&gt;&lt;/a&gt;
&lt;a href=&quot;https://justinpinkney.com/blog/2021/face-edits/images/makeup.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/G3YJ03CS-r-200.webp 200w, https://justinpinkney.com/img/G3YJ03CS-r-320.webp 320w, https://justinpinkney.com/img/G3YJ03CS-r-500.webp 500w, https://justinpinkney.com/img/G3YJ03CS-r-800.webp 800w, https://justinpinkney.com/img/G3YJ03CS-r-960.webp 960w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/G3YJ03CS-r-200.jpeg 200w, https://justinpinkney.com/img/G3YJ03CS-r-320.jpeg 320w, https://justinpinkney.com/img/G3YJ03CS-r-500.jpeg 500w, https://justinpinkney.com/img/G3YJ03CS-r-800.jpeg 800w, https://justinpinkney.com/img/G3YJ03CS-r-960.jpeg 960w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/G3YJ03CS-r-200.jpeg&quot; width=&quot;960&quot; height=&quot;1445&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The encoder is also flexible enough that it can work with some non-photographic images too.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2021/face-edits/images/gothic.jpeg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/esnBn-lr6K-200.webp 200w, https://justinpinkney.com/img/esnBn-lr6K-320.webp 320w, https://justinpinkney.com/img/esnBn-lr6K-500.webp 500w, https://justinpinkney.com/img/esnBn-lr6K-800.webp 800w, https://justinpinkney.com/img/esnBn-lr6K-1024.webp 1024w, https://justinpinkney.com/img/esnBn-lr6K-1600.webp 1600w, https://justinpinkney.com/img/esnBn-lr6K-1790.webp 1790w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/esnBn-lr6K-200.jpeg 200w, https://justinpinkney.com/img/esnBn-lr6K-320.jpeg 320w, https://justinpinkney.com/img/esnBn-lr6K-500.jpeg 500w, https://justinpinkney.com/img/esnBn-lr6K-800.jpeg 800w, https://justinpinkney.com/img/esnBn-lr6K-1024.jpeg 1024w, https://justinpinkney.com/img/esnBn-lr6K-1600.jpeg 1600w, https://justinpinkney.com/img/esnBn-lr6K-1790.jpeg 1790w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/esnBn-lr6K-200.jpeg&quot; width=&quot;1790&quot; height=&quot;2160&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;As the output of the model is constrained by the pre-trained generative model, it can also take unrealistic faces and translate them into realistic ones. It can then be used for blending composite &amp;quot;identikit&amp;quot; type faces, or even translating crude overlays (such as drawn glasses or cartoon moustaches) into realistic images.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2021/face-edits/images/combo.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/PVK6qMDutl-200.webp 200w, https://justinpinkney.com/img/PVK6qMDutl-320.webp 320w, https://justinpinkney.com/img/PVK6qMDutl-500.webp 500w, https://justinpinkney.com/img/PVK6qMDutl-800.webp 800w, https://justinpinkney.com/img/PVK6qMDutl-1024.webp 1024w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/PVK6qMDutl-200.jpeg 200w, https://justinpinkney.com/img/PVK6qMDutl-320.jpeg 320w, https://justinpinkney.com/img/PVK6qMDutl-500.jpeg 500w, https://justinpinkney.com/img/PVK6qMDutl-800.jpeg 800w, https://justinpinkney.com/img/PVK6qMDutl-1024.jpeg 1024w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/PVK6qMDutl-200.jpeg&quot; width=&quot;1024&quot; height=&quot;512&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This is actually the same technique I used to mock up a face editor app integrated with GIMP:&lt;/p&gt;
&lt;p&gt;https://vimeo.com/536038892&lt;/p&gt;
&lt;p&gt;There&#39;s also no particular reason why this should be limited to faces. I&#39;d love to see more of these techniques applied to editing landscape images, but this would require a high quality landscape model trained using StyleGAN2...&lt;/p&gt;
&lt;h2 id=&quot;could-try-harder&quot; tabindex=&quot;-1&quot;&gt;Could try harder &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2021/face-edits/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;This is pretty quick and dirty, but effective. There are some pretty obvious things that could make it nicer:&lt;/p&gt;
&lt;h3 id=&quot;blending-the-composite-image&quot; tabindex=&quot;-1&quot;&gt;Blending the composite image &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2021/face-edits/&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;After compositing the face back into the original, the borders are generally fairly obvious. A straightforward improvement would be to perform Poisson blending of the modified and original images. A more sophisticated method could involve segmenting the person, or the attribute region of interest, from the modified image to remove the background before compositing back into the original.&lt;/p&gt;
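&lt;p&gt;OpenCV actually ships Poisson blending as &lt;code&gt;seamlessClone&lt;/code&gt;, so a first attempt could be as simple as the sketch below, assuming &lt;code&gt;original&lt;/code&gt; and &lt;code&gt;edited&lt;/code&gt; are same-sized uint8 images.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import cv2
import numpy as np

# Blend the edited image into the original in the gradient domain.
mask = 255 * np.ones(edited.shape[:2], np.uint8)
center = (original.shape[1] // 2, original.shape[0] // 2)
blended = cv2.seamlessClone(edited, original, mask, center, cv2.NORMAL_CLONE)
&lt;/code&gt;&lt;/pre&gt;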
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2021/face-edits/images/lana.jpeg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/dVccrLZAZv-200.webp 200w, https://justinpinkney.com/img/dVccrLZAZv-320.webp 320w, https://justinpinkney.com/img/dVccrLZAZv-500.webp 500w, https://justinpinkney.com/img/dVccrLZAZv-800.webp 800w, https://justinpinkney.com/img/dVccrLZAZv-1024.webp 1024w, https://justinpinkney.com/img/dVccrLZAZv-1600.webp 1600w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/dVccrLZAZv-200.jpeg 200w, https://justinpinkney.com/img/dVccrLZAZv-320.jpeg 320w, https://justinpinkney.com/img/dVccrLZAZv-500.jpeg 500w, https://justinpinkney.com/img/dVccrLZAZv-800.jpeg 800w, https://justinpinkney.com/img/dVccrLZAZv-1024.jpeg 1024w, https://justinpinkney.com/img/dVccrLZAZv-1600.jpeg 1600w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/dVccrLZAZv-200.jpeg&quot; width=&quot;1600&quot; height=&quot;800&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3 id=&quot;poor-embedding&quot; tabindex=&quot;-1&quot;&gt;Poor embedding &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2021/face-edits/&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;PSP hates bald people. It believes they should all have comb-overs; check out the image of Vin Diesel (encoded with no edits) below. A bit of extra optimisation of the encoded latents against a perceptual similarity measure of the original image (e.g. LPIPS) might re-find the baldness.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2021/face-edits/images/bald.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/WnvA7HZJMn-200.webp 200w, https://justinpinkney.com/img/WnvA7HZJMn-320.webp 320w, https://justinpinkney.com/img/WnvA7HZJMn-500.webp 500w, https://justinpinkney.com/img/WnvA7HZJMn-800.webp 800w, https://justinpinkney.com/img/WnvA7HZJMn-1024.webp 1024w, https://justinpinkney.com/img/WnvA7HZJMn-1600.webp 1600w, https://justinpinkney.com/img/WnvA7HZJMn-2130.webp 2130w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/WnvA7HZJMn-200.jpeg 200w, https://justinpinkney.com/img/WnvA7HZJMn-320.jpeg 320w, https://justinpinkney.com/img/WnvA7HZJMn-500.jpeg 500w, https://justinpinkney.com/img/WnvA7HZJMn-800.jpeg 800w, https://justinpinkney.com/img/WnvA7HZJMn-1024.jpeg 1024w, https://justinpinkney.com/img/WnvA7HZJMn-1600.jpeg 1600w, https://justinpinkney.com/img/WnvA7HZJMn-2130.jpeg 2130w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/WnvA7HZJMn-200.jpeg&quot; width=&quot;2130&quot; height=&quot;1600&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
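&lt;p&gt;That optimisation might look something like the sketch below; &lt;code&gt;generator&lt;/code&gt;, &lt;code&gt;initial_latent&lt;/code&gt;, and &lt;code&gt;target&lt;/code&gt; are placeholders, and the &lt;a href=&quot;https://github.com/richzhang/PerceptualSimilarity&quot;&gt;lpips package&lt;/a&gt; provides the perceptual loss.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import torch
import lpips

loss_fn = lpips.LPIPS(net=&#39;vgg&#39;)
latent = initial_latent.clone().requires_grad_(True)
opt = torch.optim.Adam([latent], lr=0.01)

for _ in range(100):
    opt.zero_grad()
    img = generator.synthesis(latent)   # hypothetical generator API
    loss = loss_fn(img, target).mean()  # perceptual distance to the photo
    loss.backward()
    opt.step()
&lt;/code&gt;&lt;/pre&gt;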
&lt;p&gt;In other cases I&#39;m not sure any further fine tuning would help. I&#39;m pretty sure the FFHQ model doesn&#39;t know anything about tongues.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2021/face-edits/images/tongue.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/frWticCUa0-200.webp 200w, https://justinpinkney.com/img/frWticCUa0-320.webp 320w, https://justinpinkney.com/img/frWticCUa0-500.webp 500w, https://justinpinkney.com/img/frWticCUa0-800.webp 800w, https://justinpinkney.com/img/frWticCUa0-1024.webp 1024w, https://justinpinkney.com/img/frWticCUa0-1600.webp 1600w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/frWticCUa0-200.jpeg 200w, https://justinpinkney.com/img/frWticCUa0-320.jpeg 320w, https://justinpinkney.com/img/frWticCUa0-500.jpeg 500w, https://justinpinkney.com/img/frWticCUa0-800.jpeg 800w, https://justinpinkney.com/img/frWticCUa0-1024.jpeg 1024w, https://justinpinkney.com/img/frWticCUa0-1600.jpeg 1600w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/frWticCUa0-200.jpeg&quot; width=&quot;1600&quot; height=&quot;895&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3 id=&quot;batching&quot; tabindex=&quot;-1&quot;&gt;Batching &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2021/face-edits/&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The current code just processes each face in the image individually. It would be more efficient to batch these up. However, I don&#39;t know the configuration of the remote GPU machine Runway uses, so I don&#39;t know what batch size I could use without risking an out-of-memory error.&lt;/p&gt;
&lt;h2 id=&quot;references&quot; tabindex=&quot;-1&quot;&gt;References &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2021/face-edits/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;[^psp]: Richardson, Elad, Yuval Alaluf, Or Patashnik, Yotam Nitzan, Yaniv Azar, Stav Shapiro, and Daniel Cohen-Or. 2020. “Encoding in Style: A StyleGAN Encoder for Image-to-Image Translation.” arXiv [cs.CV]. arXiv. http://arxiv.org/abs/2008.00951.&lt;/p&gt;
&lt;p&gt;[^style-space]: Wu, Zongze, Dani Lischinski, and Eli Shechtman. 2020. “StyleSpace Analysis: Disentangled Controls for StyleGAN Image Generation.” arXiv [cs.CV]. arXiv. http://arxiv.org/abs/2011.12799.&lt;/p&gt;
</content>
	</entry>
	
	<entry>
		<title>Ukiyo-e faces dataset</title>
		<link href="https://justinpinkney.com/blog/2020/ukiyoe-dataset/"/>
		<updated>2020-10-29T00:00:00Z</updated>
		<id>https://justinpinkney.com/blog/2020/ukiyoe-dataset/</id>
		<content type="html">&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://drive.google.com/file/d/1zEgVLrKVp8oCZuX0NENcAeh-kdaKJzNG/view?usp=sharing&quot;&gt;Download the dataset: V2&lt;/a&gt;&lt;/strong&gt; &lt;img src=&quot;https://i.creativecommons.org/l/by-sa/4.0/80x15.png&quot; alt=&quot;https://creativecommons.org/licenses/by-sa/4.0/&quot;&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/ukiyoe-dataset/ukiyoe-dataset.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/YctQT8Mawa-200.webp 200w, https://justinpinkney.com/img/YctQT8Mawa-320.webp 320w, https://justinpinkney.com/img/YctQT8Mawa-500.webp 500w, https://justinpinkney.com/img/YctQT8Mawa-800.webp 800w, https://justinpinkney.com/img/YctQT8Mawa-1024.webp 1024w, https://justinpinkney.com/img/YctQT8Mawa-1600.webp 1600w, https://justinpinkney.com/img/YctQT8Mawa-2231.webp 2231w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/YctQT8Mawa-200.jpeg 200w, https://justinpinkney.com/img/YctQT8Mawa-320.jpeg 320w, https://justinpinkney.com/img/YctQT8Mawa-500.jpeg 500w, https://justinpinkney.com/img/YctQT8Mawa-800.jpeg 800w, https://justinpinkney.com/img/YctQT8Mawa-1024.jpeg 1024w, https://justinpinkney.com/img/YctQT8Mawa-1600.jpeg 1600w, https://justinpinkney.com/img/YctQT8Mawa-2231.jpeg 2231w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/YctQT8Mawa-200.jpeg&quot; width=&quot;2231&quot; height=&quot;924&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;As part of my paper &lt;a href=&quot;https://arxiv.org/abs/2010.05334&quot;&gt;Resolution Dependent GAN Interpolation for Controllable Image Synthesis Between Domains&lt;/a&gt;[^rdgi] I use a dataset of Ukiyo-e face images for training a StyleGAN model. This post contains a link to, and details of, that dataset.&lt;/p&gt;
&lt;h2 id=&quot;updates&quot; tabindex=&quot;-1&quot;&gt;Updates &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/ukiyoe-dataset/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;V2 - Removed 28 bad quality images (poor alignment or not face).&lt;/li&gt;
&lt;li&gt;V1 - Initial release used in the paper &lt;em&gt;Resolution Dependent GAN Interpolation for Controllable Image Synthesis Between Domains&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;the-dataset&quot; tabindex=&quot;-1&quot;&gt;The dataset &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/ukiyoe-dataset/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
 &lt;link rel=&quot;stylesheet&quot; href=&quot;https://unpkg.com/leaflet@1.9.4/dist/leaflet.css&quot; integrity=&quot;sha256-p4NxAoJBhIIN+hmNHrzRCf9tD/miZyoHS5obTRR9BMY=&quot; crossorigin=&quot;&quot;&gt;
 &lt;!-- Make sure you put this AFTER Leaflet&#39;s CSS --&gt;
 &lt;script src=&quot;https://unpkg.com/leaflet@1.9.4/dist/leaflet.js&quot; integrity=&quot;sha256-20nQCchB9co0qIjJZRGuk2/Z9VM+kNiyxNV1lvTlZBo=&quot; crossorigin=&quot;&quot;&gt;&lt;/script&gt;
 &lt;div id=&quot;map&quot; style=&quot;height: 400px&quot;&gt;&lt;/div&gt;
 &lt;script&gt;

	const map = L.map(&#39;map&#39;, {
        crs: L.CRS.Simple
    }).setView([-0.25, 0.35], 12);

	const tiles = L.tileLayer(
        &quot;https://assets.justinpinkney.com/blog/ukiyoe/ukiyoe_files//{z}/{x}_{y}.jpg&quot;,
        {minZoom:9, maxZoom:15 }
    ).addTo(map);

&lt;/script&gt;
&lt;p&gt;The ukiyo-e faces dataset comprises 5209 images of faces from ukiyo-e prints. The images are 1024x1024 pixels in jpeg format and have been aligned using the procedure used for the &lt;a href=&quot;https://github.com/NVlabs/ffhq-dataset&quot;&gt;FFHQ dataset&lt;/a&gt;. Above is a map of (almost) all the images in the dataset, plotted such that similar faces appear close together[^map]. The images have been downscaled to 256x256 for display.&lt;/p&gt;
&lt;h2 id=&quot;further-details&quot; tabindex=&quot;-1&quot;&gt;Further details &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/ukiyoe-dataset/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Images are scraped from several museum websites. I then used Amazon Rekognition to detect faces and facial landmarks in each image. Rekognition does a reasonable job at both tasks, but is clearly imperfect: many faces are missed and there are alignment errors in many of the images. Many of the source images are not of very high resolution, so to produce a usable dataset at 1024x1024 resolution I used a pre-trained ESRGAN[^esrgan] model &lt;a href=&quot;https://upscale.wiki/wiki/Model_Database&quot;&gt;trained on the Manga109 dataset&lt;/a&gt; to upscale the images where required. This leaves some artifacts but generally does a good job.&lt;/p&gt;
&lt;h2 id=&quot;other-datasets&quot; tabindex=&quot;-1&quot;&gt;Other datasets &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/ukiyoe-dataset/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://github.com/rois-codh/kaokore&quot;&gt;KaoKore&lt;/a&gt; is another dataset of Ukiyo-e faces[^kao]. It is more varied and comes with labels; however, the image resolution is lower and the faces are not aligned.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/ukiyoe-dataset/kaokore_example.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/Ube0Y8o0Sz-200.webp 200w, https://justinpinkney.com/img/Ube0Y8o0Sz-320.webp 320w, https://justinpinkney.com/img/Ube0Y8o0Sz-500.webp 500w, https://justinpinkney.com/img/Ube0Y8o0Sz-800.webp 800w, https://justinpinkney.com/img/Ube0Y8o0Sz-1024.webp 1024w, https://justinpinkney.com/img/Ube0Y8o0Sz-1314.webp 1314w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/Ube0Y8o0Sz-200.jpeg 200w, https://justinpinkney.com/img/Ube0Y8o0Sz-320.jpeg 320w, https://justinpinkney.com/img/Ube0Y8o0Sz-500.jpeg 500w, https://justinpinkney.com/img/Ube0Y8o0Sz-800.jpeg 800w, https://justinpinkney.com/img/Ube0Y8o0Sz-1024.jpeg 1024w, https://justinpinkney.com/img/Ube0Y8o0Sz-1314.jpeg 1314w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/Ube0Y8o0Sz-200.jpeg&quot; width=&quot;1314&quot; height=&quot;1246&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&quot;license-and-usage&quot; tabindex=&quot;-1&quot;&gt;License and usage &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/ukiyoe-dataset/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;&lt;img src=&quot;https://i.creativecommons.org/l/by-sa/4.0/88x31.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;This dataset is provided under a &lt;a href=&quot;https://creativecommons.org/licenses/by-sa/4.0/&quot;&gt;Creative Commons Attribution-ShareAlike 4.0 International License&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;If using the dataset please cite as &amp;quot;Aligned ukiyo-e faces dataset, Justin Pinkney 2020&amp;quot; or for a bibtex entry:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@misc{pinkney2020ukiyoe,
      author = {Pinkney, Justin N. M.},
      title = {Aligned Ukiyo-e faces dataset},
      year={2020},
      howpublished= {&#92;url{https://www.justinpinkney.com/blog/2020/ukiyoe-dataset}}
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;[^rdgi]: Pinkney, Justin N. M., and Doron Adler. ‘Resolution Dependent GAN Interpolation for Controllable Image Synthesis Between Domains’. ArXiv:2010.05334 [Cs, Eess], 20 October 2020. http://arxiv.org/abs/2010.05334.&lt;/p&gt;
&lt;p&gt;[^map]: To generate this image I first extract CNN features from each image using a ResNet50 pre-trained on Imagenet. These high-dimensional feature vectors are then projected into two dimensions using UMAP, and grid assignment is done using the lapjv algorithm.&lt;/p&gt;
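&lt;p&gt;In code, that layout step looks roughly like the sketch below, assuming &lt;code&gt;features&lt;/code&gt; is an N x 2048 array of ResNet50 embeddings and, for simplicity, that N is a perfect square.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import numpy as np
import umap
from lapjv import lapjv

# Project the CNN features down to 2D and normalise to [0, 1].
embedding = umap.UMAP(n_components=2).fit_transform(features)
norm = (embedding - embedding.min(0)) / np.ptp(embedding, 0)

# Build a square grid of target positions and assign each image to a
# cell by solving the linear assignment problem (Jonker-Volgenant).
side = int(np.sqrt(len(norm)))
xs, ys = np.meshgrid(np.linspace(0, 1, side), np.linspace(0, 1, side))
grid = np.stack([xs.ravel(), ys.ravel()], axis=1)
cost = np.linalg.norm(norm[:, None, :] - grid[None, :, :], axis=-1)
row_ind, col_ind, _ = lapjv(cost)  # image i goes to grid cell row_ind[i]
&lt;/code&gt;&lt;/pre&gt;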
&lt;p&gt;[^esrgan]: Wang, Xintao, Ke Yu, Shixiang Wu, Jinjin Gu, Yihao Liu, Chao Dong, Chen Change Loy, Yu Qiao, and Xiaoou Tang. ‘ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks’. ArXiv:1809.00219 [Cs], 1 September 2018. http://arxiv.org/abs/1809.00219.&lt;/p&gt;
&lt;p&gt;[^kao]: Tian, Yingtao, Chikahiko Suzuki, Tarin Clanuwat, Mikel Bober-Irizar, Alex Lamb, and Asanobu Kitamoto. ‘KaoKore: A Pre-Modern Japanese Art Facial Expression Dataset’. ArXiv:2002.08595 [Cs, Stat], 20 February 2020. http://arxiv.org/abs/2002.08595.&lt;/p&gt;
</content>
	</entry>
	
	<entry>
		<title>Making Toonify Yourself</title>
		<link href="https://justinpinkney.com/blog/2020/making-toonify/"/>
		<updated>2020-09-20T00:00:00Z</updated>
		<id>https://justinpinkney.com/blog/2020/making-toonify/</id>
		<content type="html">&lt;p&gt;&lt;em&gt;If you&#39;d like to keep Toonify Yourself free for everyone to play with, please consider donating to cover running costs at Ko-fi:&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://ko-fi.com/W7W228W2S&quot;&gt;&lt;img src=&quot;https://www.ko-fi.com/img/githubbutton_sm.svg&quot; alt=&quot;ko-fi&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;So &lt;a href=&quot;https://linktr.ee/Norod78&quot;&gt;Doron Adler&lt;/a&gt; and I recently released our toonification translation model at our &lt;a href=&quot;https://toonify.justinpinkney.com&quot;&gt;Toonify Yourself&lt;/a&gt; website. It turned out to be pretty popular, with tens of thousands of people visiting in the 22 hours it was running, submitting almost a quarter of a million images for toonification.&lt;/p&gt;
&lt;p&gt;https://www.twitter.com/Buntworthy/status/1306236896125870080&lt;/p&gt;
&lt;p&gt;It got quite a bit of interest on social media, and was picked up by a few websites. Unfortunately we had to turn off the toonification server before costs started to get out of hand, but we&#39;re working on bringing it back so people can carry on playing with the model for free.&lt;/p&gt;
&lt;p&gt;A lot of people have expressed interest in how the model works and how the website was run. So here&#39;s a blog post with some details on the traffic and running costs, as well as the technical details of how to run a deep neural network in the cloud serving tens of thousands of requests an hour!&lt;/p&gt;
&lt;h2 id=&quot;making-an-efficient-toonification-model&quot; tabindex=&quot;-1&quot;&gt;Making an efficient Toonification model &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/making-toonify/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;If you want to know about the details of the original Toonification model, see &lt;a href=&quot;https://justinpinkney.com/blog/2020/toonify-yourself&quot;&gt;this blog post&lt;/a&gt;.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The original Toonification method involved an expensive optimisation process to encode a person&#39;s face using the blended StyleGAN model which can take several minutes to run even on a GPU. Clearly this wasn&#39;t going to cut it as a web app! A common pattern in deep learning is replacing expensive optimisations with more neural networks[^style-transfer]. We used the basic idea described in &lt;em&gt;StyleGAN2 Distillation for Feed-Forward Image Manipulation&lt;/em&gt;[^distillation], i.e. training a pix2pixHD model to apply the transformation to any arbitrary image, rather than first having to perform the optimisation step.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://justinpinkney.com/blog/2020/making-toonify/allen.jpg&quot; alt=&quot;Left: Original, Middle: Optimised, Right: pix2pixHD&quot;&gt;&lt;/p&gt;
&lt;p&gt;The novel part here is that the pairs of images we use for the training process are produced by the original FFHQ model and the blended model[^new-model]. Although the pix2pixHD model is only trained on images generated by the two StyleGAN models, once it&#39;s done we should be able to apply it to any image and get the same toonification result.&lt;/p&gt;
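&lt;p&gt;In pseudocode the data generation looks something like this, where &lt;code&gt;g_ffhq&lt;/code&gt;, &lt;code&gt;g_blended&lt;/code&gt;, and &lt;code&gt;save_pair&lt;/code&gt; are hypothetical stand-ins for the two generators and an image writer.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import torch

for i in range(50000):
    z = torch.randn(1, 512)
    w = g_ffhq.mapping(z)          # the same latent drives both models
    photo = g_ffhq.synthesis(w)    # pix2pixHD input
    toon = g_blended.synthesis(w)  # pix2pixHD target
    save_pair(photo, toon, i)
&lt;/code&gt;&lt;/pre&gt;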
&lt;p&gt;&lt;img src=&quot;https://justinpinkney.com/blog/2020/making-toonify/monkwithbook.jpg&quot; alt=&quot;It even works on paintings!&quot;&gt;&lt;/p&gt;
&lt;h2 id=&quot;deploying-the-model&quot; tabindex=&quot;-1&quot;&gt;Deploying the model &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/making-toonify/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;So after the initial interest on Twitter about my experiments putting together a local web app to run the Toonify model myself, I decided to take a crack at putting up a website where anybody could run it on their own images.&lt;/p&gt;
&lt;p&gt;First things first, I wasn&#39;t going to be running anything on a GPU, so I needed to get the pix2pixHD model runnable on a CPU. The original pix2pixHD repo has some bugs which prevent inference without a GPU and I fixed these &lt;a href=&quot;https://github.com/justinpinkney/pix2pixHD/&quot;&gt;on my fork&lt;/a&gt; if anyone is interested. In the end I actually decided to export the model to ONNX format so I could run it using the ONNX runtime. This makes the dependencies more lightweight than trying to run using PyTorch and (I hope) the ONNX runtime is built with performance in mind.&lt;/p&gt;
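&lt;p&gt;The export/inference split is pleasantly small. A rough sketch, with a hypothetical model and file name:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import torch
import onnxruntime as ort

# One-off export from PyTorch.
dummy = torch.randn(1, 3, 512, 512)
torch.onnx.export(model, dummy, &quot;toonify.onnx&quot;, opset_version=11)

# At serving time only onnxruntime is needed, with no PyTorch dependency.
session = ort.InferenceSession(&quot;toonify.onnx&quot;)
input_name = session.get_inputs()[0].name
output = session.run(None, {input_name: dummy.numpy()})[0]
&lt;/code&gt;&lt;/pre&gt;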
&lt;p&gt;I went for Google Cloud Run as a means of deploying the web app. All the thing needs to do is accept an image, run inference, and return the result. It&#39;s totally stateless, so a good fit for the Cloud Run model. The use of Docker containers in Cloud Run meant that it was easy to bundle up any required dependencies, scalability was built right in, and there is a generous free allowance (not generous enough, it would turn out!).&lt;/p&gt;
&lt;p&gt;So after a few evenings of putting together a small app using Flask and Bootstrap, things were ready to deploy!&lt;/p&gt;
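&lt;p&gt;The serving code itself is tiny. A minimal sketch of the kind of endpoint involved, where &lt;code&gt;run_model&lt;/code&gt; is a hypothetical wrapper around the ONNX session:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import io
from flask import Flask, request, send_file

app = Flask(__name__)

@app.route(&quot;/toonify&quot;, methods=[&quot;POST&quot;])
def toonify():
    image_bytes = request.files[&quot;image&quot;].read()
    result_jpeg = run_model(image_bytes)  # detect, align, infer, composite
    return send_file(io.BytesIO(result_jpeg), mimetype=&quot;image/jpeg&quot;)
&lt;/code&gt;&lt;/pre&gt;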
&lt;h2 id=&quot;toonification-in-the-wild&quot; tabindex=&quot;-1&quot;&gt;Toonification in the Wild &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/making-toonify/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;So after some beta testing with friends I announced the release of the &lt;a href=&quot;https://toonify.justinpinkney.com&quot;&gt;Toonify Yourself&lt;/a&gt; website on Twitter. It quickly got some reasonable traffic and people seemed to be enjoying trying the model out on themselves.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/making-toonify/toonify.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/dE4QFJoQEb-200.webp 200w, https://justinpinkney.com/img/dE4QFJoQEb-320.webp 320w, https://justinpinkney.com/img/dE4QFJoQEb-500.webp 500w, https://justinpinkney.com/img/dE4QFJoQEb-800.webp 800w, https://justinpinkney.com/img/dE4QFJoQEb-1024.webp 1024w, https://justinpinkney.com/img/dE4QFJoQEb-1278.webp 1278w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/dE4QFJoQEb-200.jpeg 200w, https://justinpinkney.com/img/dE4QFJoQEb-320.jpeg 320w, https://justinpinkney.com/img/dE4QFJoQEb-500.jpeg 500w, https://justinpinkney.com/img/dE4QFJoQEb-800.jpeg 800w, https://justinpinkney.com/img/dE4QFJoQEb-1024.jpeg 1024w, https://justinpinkney.com/img/dE4QFJoQEb-1278.jpeg 1278w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/dE4QFJoQEb-200.jpeg&quot; width=&quot;1278&quot; height=&quot;802&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Some were complaining that their faces were never detected no matter what they submitted, and I fairly quickly figured out (and many helpful people online started to point out) that it was an issue with image rotation on iPhones[^transpose].&lt;/p&gt;
&lt;p&gt;By the next morning traffic started to really pick up, partly due to getting on the front page of &lt;a href=&quot;https://news.ycombinator.com/item?id=24494377&quot;&gt;Hacker News&lt;/a&gt;. I was starting to get a little bit twitchy seeing the number of containers spun up on Cloud Run steadily increasing. As lunch time approached we were getting close to 25,000 page views an hour, at times requiring 100 containers to service the traffic, and things were going up fast.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/making-toonify/page-views.png&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/Jwo-BDg-cr-200.webp 200w, https://justinpinkney.com/img/Jwo-BDg-cr-320.webp 320w, https://justinpinkney.com/img/Jwo-BDg-cr-500.webp 500w, https://justinpinkney.com/img/Jwo-BDg-cr-800.webp 800w, https://justinpinkney.com/img/Jwo-BDg-cr-1024.webp 1024w, https://justinpinkney.com/img/Jwo-BDg-cr-1096.webp 1096w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/png&quot; srcset=&quot;https://justinpinkney.com/img/Jwo-BDg-cr-200.png 200w, https://justinpinkney.com/img/Jwo-BDg-cr-320.png 320w, https://justinpinkney.com/img/Jwo-BDg-cr-500.png 500w, https://justinpinkney.com/img/Jwo-BDg-cr-800.png 800w, https://justinpinkney.com/img/Jwo-BDg-cr-1024.png 1024w, https://justinpinkney.com/img/Jwo-BDg-cr-1096.png 1096w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/Jwo-BDg-cr-200.png&quot; width=&quot;1096&quot; height=&quot;253&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The measly number of free CPU and RAM minutes had long since evaporated, and I was getting a little concerned about what the cloud bill was going to be after I came back from an afternoon out of the house. So rather than throttle things to a level where most people would get no response from the site, I decided to turn off the model and switch to an apology message.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/making-toonify/offline.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/I45oWpjmQ0-200.webp 200w, https://justinpinkney.com/img/I45oWpjmQ0-320.webp 320w, https://justinpinkney.com/img/I45oWpjmQ0-500.webp 500w, https://justinpinkney.com/img/I45oWpjmQ0-800.webp 800w, https://justinpinkney.com/img/I45oWpjmQ0-1024.webp 1024w, https://justinpinkney.com/img/I45oWpjmQ0-1149.webp 1149w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/I45oWpjmQ0-200.jpeg 200w, https://justinpinkney.com/img/I45oWpjmQ0-320.jpeg 320w, https://justinpinkney.com/img/I45oWpjmQ0-500.jpeg 500w, https://justinpinkney.com/img/I45oWpjmQ0-800.jpeg 800w, https://justinpinkney.com/img/I45oWpjmQ0-1024.jpeg 1024w, https://justinpinkney.com/img/I45oWpjmQ0-1149.jpeg 1149w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/I45oWpjmQ0-200.jpeg&quot; width=&quot;1149&quot; height=&quot;815&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&quot;the-numbers&quot; tabindex=&quot;-1&quot;&gt;The numbers &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/making-toonify/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;In the end we had close to a quarter of a million page views before I shut down the toonification. Not every one of these corresponds to someone submitting an image, but it&#39;s not far off, and each user submitted around 3 or 4 images for toonification.&lt;/p&gt;
&lt;p&gt;So how much did this all cost? Not quite as much as I thought: I had set a personal limit of around $100 on this, and I didn&#39;t break it. But here are some details of what it costs to run a service like this.&lt;/p&gt;
&lt;p&gt;The model takes about 5 seconds to run inference on whatever hardware Cloud Run uses, and occupies around 1.4 GB of memory whilst doing it. It also takes a slightly astonishing 20 seconds to load the model the first time a container is brought up (and this happens more often than I&#39;d like), and memory peaks at well over 2 GB during this period. All this meant that processing a thousand images probably costs around 30 cents (footnote: there are also some other smaller costs like network egress to think about, but that&#39;s not much extra). That isn&#39;t too bad, but when you&#39;re trying to process 25,000 an hour, it starts to add up fast!&lt;/p&gt;
&lt;p&gt;I&#39;m still pretty amazed that it was so easy to build a site which could service so much traffic and do some serious image processing in the cloud. I&#39;ve never used any of this scalable serverless technology before, but it was incredibly easy to get going!&lt;/p&gt;
&lt;h2 id=&quot;feedback&quot; tabindex=&quot;-1&quot;&gt;Feedback &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/making-toonify/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;A lot of people commented that the images produced didn&#39;t preserve enough of the original character of the person, that they ended up looking pretty generic, and that a human cartoonist could do a far better job. I fully agree, there is no way deep learning is going to outperform a skilled artist any time soon! But there is also no way you could get skilled human artists to make cartoon versions of people for 30 cents per thousand.&lt;/p&gt;
&lt;p&gt;Despite me putting a line in the FAQ assuring people I was not storing or collecting their images (how on earth would I have afforded that!?), several people commented on it with scepticism: surely something free online must be harvesting your data somehow? But honestly, once the toonification was done the original image and the result were gone forever (for me at least). Plus I don&#39;t really see why people were worried; people have already happily uploaded millions of photos of themselves to the internet and social media sites, who explicitly do make money from your data. If companies want to collect face data, a silly website like mine is not the way to do it!&lt;/p&gt;
&lt;h2 id=&quot;the-future&quot; tabindex=&quot;-1&quot;&gt;The future &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/making-toonify/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;UPDATE: Toonify Yourself is back&lt;/strong&gt; thanks to the fantastic support of generous supporters on Ko-fi, as well as model hosting thanks to &lt;a href=&quot;https://deepai.org&quot;&gt;DeepAI&lt;/a&gt;. There&#39;s still costs involved in running the site, so if you&#39;d like to help keep things free for everyone to play with please think about &lt;a href=&quot;https://ko-fi.com/justinpinkney&quot;&gt;supporting me on Ko-fi&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://ko-fi.com/W7W228W2S&quot;&gt;&lt;img src=&quot;https://www.ko-fi.com/img/githubbutton_sm.svg&quot; alt=&quot;ko-fi&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&quot;twitter&quot; tabindex=&quot;-1&quot;&gt;Twitter &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/making-toonify/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Lots of people shared fun examples on Twitter, here are a few:&lt;/p&gt;
&lt;TwitterTimeline username=&quot;Buntworthy/timelines/1308129916114939904&quot; height=&quot;650px&quot; /&gt;
&lt;h2 id=&quot;coverage&quot; tabindex=&quot;-1&quot;&gt;Coverage &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/making-toonify/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Here are some links to the coverage this got.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://www.pocket-lint.com/apps/news/153848-how-to-toonify-yourself-see-what-you-d-look-like-in-a-cartoon-movie&quot;&gt;Pocket lint - How to Toonify yourself: See what you&#39;d look like in a cartoon movie&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://gigazine.net/gsc_news/en/20200917-toonify-yourself/&quot;&gt;Gigazine - I tried using &#39;Toonify Yourself!&#39; Which can convert your face into Disney animation style&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://meduza.io/shapito/2020/09/17/meduza-ispytala-toonify-servis-kotoryy-prevraschaet-vas-v-personazhey-multfilmov-nu-ili-v-cheloveka-epohi-neolita&quot;&gt;Meduza - Medusa tried Toonify, a service that turns you into cartoon characters. Well, or in a man of the Neolithic era&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://youtu.be/7Oqpiaj0IUM&quot;&gt;This AI Transform Faces into Hyper-Realistic Cartoon Characters&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://tech.sina.com.cn/roll/2020-09-20/doc-iivhuipp5388256.shtml&quot;&gt;The three giants of deep learning have also become cute, this one-click conversion of animated movie images is actually offline due to &amp;quot;too hot&amp;quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.boredpanda.com/turning-people-into-characters-toonify-justin-pinkney/&quot;&gt;This App That Turns People Into Pixar-Like Cartoon Characters Gets The Internet Buzzing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.genbeta.com/web/puedes-saber-que-aspecto-tendrias-como-personaje-pelicula-animacion-gracias-a-toonify-inteligencia-artificial&quot;&gt;You can now know what you would look like as a character in an animated film thanks to Toonify and artificial intelligence&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://hitek.fr/42/intelligence-artificielle-toonify-transforme-portrat-personnage-cartoon_8228&quot;&gt;This artificial intelligence turns your face into a cartoon character&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.20minutos.es/noticia/4397744/0/la-ultima-moda-viral-en-transformacion-del-rostro-esta-web-te-muestra-como-serias-si-fueras-un-personaje-de-pixar/&quot;&gt;The latest viral fashion: this website transforms you into a Pixar character&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://femina.hu/terasz/mesefigura-atalakito/&quot;&gt;What would you look like as a fairy tale figure? You can also try out the new favorite craze of the stars&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://otechnice.cz/toonify-dokaze-promenit-lidskou-tvar-v-oblicej-kreslene-postavicky/&quot;&gt;Toonify can turn a human face into the face of a cartoon character&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;and many many more...&lt;/p&gt;
&lt;p&gt;I was even interviewed on the excellent Cold Fusion YouTube channel:&lt;/p&gt;
&lt;p&gt;https://youtu.be/KZ7BnJb30Cc&lt;/p&gt;
&lt;p&gt;[^style-transfer]: For a classic example see the work on neural Style Transfer.&lt;/p&gt;
&lt;p&gt;[^distillation]: Viazovetskyi, Yuri, Vladimir Ivashkin, and Evgeny Kashin. ‘StyleGAN2 Distillation for Feed-Forward Image Manipulation’. ArXiv:2003.03581 [Cs], 7 March 2020. http://arxiv.org/abs/2003.03581.&lt;/p&gt;
&lt;p&gt;[^new-model]:  Doron actually spent some time assembling new datasets and training new models so the results you see are a bit different to the ones I originally shared.&lt;/p&gt;
&lt;p&gt;[^transpose]: Portrait images are actually saved as landscape on iPhone and the rotation is embedded in the metadata. You can apply this rotation in Pillow using &lt;code&gt;ImageOps.exif_transpose&lt;/code&gt;&lt;/p&gt;
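&lt;p&gt;A minimal example (with a hypothetical filename):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from PIL import Image, ImageOps

img = ImageOps.exif_transpose(Image.open(&#39;selfie.jpg&#39;))  # applies the EXIF rotation
&lt;/code&gt;&lt;/pre&gt;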
</content>
	</entry>
	
	<entry>
		<title>Toonify yourself</title>
		<link href="https://justinpinkney.com/blog/2020/toonify-yourself/"/>
		<updated>2020-09-01T00:00:00Z</updated>
		<id>https://justinpinkney.com/blog/2020/toonify-yourself/</id>
		<content type="html">&lt;p&gt;&lt;strong&gt;TLDR: If you want a Colab Notebook to toonify yourself click here:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://colab.research.google.com/drive/1s2XPNMwf6HDhrJ1FMwlW1jl-eQ2-_tlk?usp=sharing&quot;&gt;&lt;img src=&quot;https://colab.research.google.com/assets/colab-badge.svg&quot; alt=&quot;Open In Colab&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;If you&#39;re interested in how the website &lt;a href=&quot;https://toonify.justinpinkney.com&quot;&gt;Toonify Yourself&lt;/a&gt; works, see this &lt;a href=&quot;https://justinpinkney.com/blog/2020/making-toonify&quot;&gt;followup post&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;In a &lt;a href=&quot;https://justinpinkney.com/blog/2020/stylegan-network-blending&quot;&gt;previous post&lt;/a&gt; I introduced the idea of &lt;strong&gt;Layer Swapping&lt;/strong&gt; (or more generally network blending) for StyleGAN models. I briefly pointed to a fantastic model created by &lt;a href=&quot;https://linktr.ee/Norod78&quot;&gt;Doron Adler&lt;/a&gt; that generates almost photo-realistic people who seem to have come straight out of your favourite Disney/Pixar/Dreamworks animated movie.&lt;/p&gt;
&lt;p&gt;https://www.twitter.com/Norod78/status/1297513475258953728&lt;/p&gt;
&lt;h2 id=&quot;how-to-make-a-photo-realistic-cartoon-model&quot; tabindex=&quot;-1&quot;&gt;How to make a photo-realistic cartoon model &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/toonify-yourself/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;So what&#39;s going on here? This blended network is actually the product of a rather convoluted process which I&#39;ll try and briefly explain.&lt;/p&gt;
&lt;h3 id=&quot;transfer-learning&quot; tabindex=&quot;-1&quot;&gt;Transfer learning &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/toonify-yourself/&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;We start with the classic StyleGAN model which is trained on photos of people&#39;s faces. This was released with the StyleGAN2 code and paper and produces pretty fantastically high quality results.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://raw.githubusercontent.com/NVlabs/stylegan2/master/docs/stylegan2-teaser-1024x256.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;Not everyone has multiple GPUs and weeks of time to train a model, so a shortcut lots of people use is called transfer learning: take this &amp;quot;pre-trained model&amp;quot; and then train it on some new data. This gives good results really quickly, especially if the new dataset is also made up of faces.&lt;/p&gt;
&lt;p&gt;Doron fine-tuned the faces model on a dataset of various characters from animated films. It&#39;s only around 300 images but enough for the model to start learning what features these characters typically have.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/toonify-yourself/small-faces.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/FgvtALke0w-200.webp 200w, https://justinpinkney.com/img/FgvtALke0w-320.webp 320w, https://justinpinkney.com/img/FgvtALke0w-500.webp 500w, https://justinpinkney.com/img/FgvtALke0w-800.webp 800w, https://justinpinkney.com/img/FgvtALke0w-1024.webp 1024w, https://justinpinkney.com/img/FgvtALke0w-1600.webp 1600w, https://justinpinkney.com/img/FgvtALke0w-1766.webp 1766w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/FgvtALke0w-200.jpeg 200w, https://justinpinkney.com/img/FgvtALke0w-320.jpeg 320w, https://justinpinkney.com/img/FgvtALke0w-500.jpeg 500w, https://justinpinkney.com/img/FgvtALke0w-800.jpeg 800w, https://justinpinkney.com/img/FgvtALke0w-1024.jpeg 1024w, https://justinpinkney.com/img/FgvtALke0w-1600.jpeg 1600w, https://justinpinkney.com/img/FgvtALke0w-1766.jpeg 1766w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/FgvtALke0w-200.jpeg&quot; width=&quot;1766&quot; height=&quot;998&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Once the model is trained just a little bit, it gives outputs like those below.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/toonify-yourself/small-ffhq-cartoons-000038_fakes.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/AvZhynwa9M-200.webp 200w, https://justinpinkney.com/img/AvZhynwa9M-320.webp 320w, https://justinpinkney.com/img/AvZhynwa9M-500.webp 500w, https://justinpinkney.com/img/AvZhynwa9M-800.webp 800w, https://justinpinkney.com/img/AvZhynwa9M-1024.webp 1024w, https://justinpinkney.com/img/AvZhynwa9M-1200.webp 1200w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/AvZhynwa9M-200.jpeg 200w, https://justinpinkney.com/img/AvZhynwa9M-320.jpeg 320w, https://justinpinkney.com/img/AvZhynwa9M-500.jpeg 500w, https://justinpinkney.com/img/AvZhynwa9M-800.jpeg 800w, https://justinpinkney.com/img/AvZhynwa9M-1024.jpeg 1024w, https://justinpinkney.com/img/AvZhynwa9M-1200.jpeg 1200w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/AvZhynwa9M-200.jpeg&quot; width=&quot;1200&quot; height=&quot;719&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The output is OK for such a small amount of training on a small dataset; it&#39;s clearly got the big eyes thing down pretty well. The problem is that the style of images in the dataset is a bit of a mish-mash: some are CG, some are hand drawn, and lots are quite low resolution. The model tries to replicate all these things and comes off worse for it.&lt;/p&gt;
&lt;h3 id=&quot;blend-the-models&quot; tabindex=&quot;-1&quot;&gt;Blend the models &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/toonify-yourself/&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;But as this model was fine-tuned from the original faces model, we can perform a trick where we directly &lt;a href=&quot;https://justinpinkney.com/blog/2020/stylegan-network-blending/&quot;&gt;swap parts of the models&lt;/a&gt; around. This gets interesting because, due to the structure of StyleGAN, different layers in the model affect the appearance in different ways: low resolution layers affect the pose of the head and shape of the face, while high resolution layers control things like lighting and texture. Doron used my layer swapping script (&lt;a href=&quot;https://colab.research.google.com/drive/1tputbmA9EaXs9HL9iO21g7xN7jz_Xrko?usp=sharing&quot;&gt;Colab here&lt;/a&gt;) to take the high resolution layers from the original model and the low resolution layers from his fine-tuned cartoon model. You end up with a hybrid which has the structure of a cartoon face, but photo-realistic rendering!&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/toonify-yourself/m2.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/no_0msaVO--200.webp 200w, https://justinpinkney.com/img/no_0msaVO--320.webp 320w, https://justinpinkney.com/img/no_0msaVO--500.webp 500w, https://justinpinkney.com/img/no_0msaVO--800.webp 800w, https://justinpinkney.com/img/no_0msaVO--1024.webp 1024w, https://justinpinkney.com/img/no_0msaVO--1280.webp 1280w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/no_0msaVO--200.jpeg 200w, https://justinpinkney.com/img/no_0msaVO--320.jpeg 320w, https://justinpinkney.com/img/no_0msaVO--500.jpeg 500w, https://justinpinkney.com/img/no_0msaVO--800.jpeg 800w, https://justinpinkney.com/img/no_0msaVO--1024.jpeg 1024w, https://justinpinkney.com/img/no_0msaVO--1280.jpeg 1280w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/no_0msaVO--200.jpeg&quot; width=&quot;1280&quot; height=&quot;759&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
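&lt;p&gt;For the curious, the swap itself is only a few lines. Here&#39;s a minimal sketch assuming the TF StyleGAN2 Network API (clone, get_var, set_var) and synthesis variables named like G_synthesis/64x64/...; the Colab linked above does the real thing, including smoother interpolation around the swap point.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Sketch of layer swapping: structure from the fine-tuned model,
# rendering from the base model it was fine-tuned from.
import re

def swap_layers(Gs_base, Gs_finetuned, swap_res=32):
    Gs_blend = Gs_base.clone()
    for name in Gs_finetuned.trainables.keys():
        m = re.search(r&quot;G_synthesis/(\d+)x\d+&quot;, name)
        # overwrite synthesis layers at resolutions up to swap_res
        if m and int(m.group(1)) &amp;lt;= swap_res:
            Gs_blend.set_var(name, Gs_finetuned.get_var(name))
    return Gs_blend

# e.g. cartoon structure with photo-realistic rendering:
# Gs_toon = swap_layers(Gs_ffhq, Gs_cartoon, swap_res=32)
&lt;/code&gt;&lt;/pre&gt;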
&lt;p&gt;If you generate images using the original faces model and the blended cartoon model, you can see that there is a clear relationship between the two: the identities appear to be the same, but the features have been shifted to give them a cartoon look.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://justinpinkney.com/blog/2020/toonify-yourself/toon.gif&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;h2 id=&quot;now-toonify-yourself&quot; tabindex=&quot;-1&quot;&gt;Now toonify yourself &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/toonify-yourself/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;These StyleGAN face models can produce a huge diversity of faces, and it&#39;s actually a straightforward process to find basically any face inside the model. Given an example image, you can search for a &amp;quot;code&amp;quot; (aka latent vector) which, when given as an input to the model, will produce an output which looks almost exactly like the face you&#39;re looking for. Below the original is on the left, and the generated image on the right (it&#39;s hard to tell, right!?)&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/toonify-yourself/embedding.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/Kf_t_TsnkM-200.webp 200w, https://justinpinkney.com/img/Kf_t_TsnkM-320.webp 320w, https://justinpinkney.com/img/Kf_t_TsnkM-500.webp 500w, https://justinpinkney.com/img/Kf_t_TsnkM-800.webp 800w, https://justinpinkney.com/img/Kf_t_TsnkM-1024.webp 1024w, https://justinpinkney.com/img/Kf_t_TsnkM-1600.webp 1600w, https://justinpinkney.com/img/Kf_t_TsnkM-2088.webp 2088w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/Kf_t_TsnkM-200.jpeg 200w, https://justinpinkney.com/img/Kf_t_TsnkM-320.jpeg 320w, https://justinpinkney.com/img/Kf_t_TsnkM-500.jpeg 500w, https://justinpinkney.com/img/Kf_t_TsnkM-800.jpeg 800w, https://justinpinkney.com/img/Kf_t_TsnkM-1024.jpeg 1024w, https://justinpinkney.com/img/Kf_t_TsnkM-1600.jpeg 1600w, https://justinpinkney.com/img/Kf_t_TsnkM-2088.jpeg 2088w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/Kf_t_TsnkM-200.jpeg&quot; width=&quot;2088&quot; height=&quot;1044&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Now that you have a code that represents a face, you can give it as input to the blended model and, given that the two models are closely related, you will get the same face, but modified to look like a &amp;quot;toon&amp;quot; version!&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/toonify-yourself/abe_toon.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/uV-XEntdIV-200.webp 200w, https://justinpinkney.com/img/uV-XEntdIV-320.webp 320w, https://justinpinkney.com/img/uV-XEntdIV-500.webp 500w, https://justinpinkney.com/img/uV-XEntdIV-800.webp 800w, https://justinpinkney.com/img/uV-XEntdIV-1024.webp 1024w, https://justinpinkney.com/img/uV-XEntdIV-1600.webp 1600w, https://justinpinkney.com/img/uV-XEntdIV-2088.webp 2088w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/uV-XEntdIV-200.jpeg 200w, https://justinpinkney.com/img/uV-XEntdIV-320.jpeg 320w, https://justinpinkney.com/img/uV-XEntdIV-500.jpeg 500w, https://justinpinkney.com/img/uV-XEntdIV-800.jpeg 800w, https://justinpinkney.com/img/uV-XEntdIV-1024.jpeg 1024w, https://justinpinkney.com/img/uV-XEntdIV-1600.jpeg 1600w, https://justinpinkney.com/img/uV-XEntdIV-2088.jpeg 2088w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/uV-XEntdIV-200.jpeg&quot; width=&quot;2088&quot; height=&quot;1044&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
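&lt;p&gt;Put together, the whole toonify pipeline is just these two steps. A rough sketch, assuming the TF StyleGAN2 projector and a target image that has already been cropped, aligned, and scaled to the range the projector expects:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Sketch of the two steps: project into the original model, then
# render the recovered code with the blended model.
import projector  # from the NVlabs StyleGAN2 repo

def toonify(target, Gs_ffhq, Gs_blended):
    # Step 1: search the *original* faces model for a latent code
    # that reproduces the target face (shape: (1, 3, 1024, 1024)).
    proj = projector.Projector()
    proj.set_network(Gs_ffhq)
    proj.start(target)
    while proj.get_cur_step() &amp;lt; proj.num_steps:
        proj.step()
    w = proj.get_dlatents()  # the face&#39;s &quot;code&quot;, shape (1, 18, 512)

    # Step 2: the blended model turns the same code into a toon.
    return Gs_blended.components.synthesis.run(w, randomize_noise=False)
&lt;/code&gt;&lt;/pre&gt;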
&lt;p&gt;If you want to try this process yourself you can use &lt;a href=&quot;https://colab.research.google.com/drive/1s2XPNMwf6HDhrJ1FMwlW1jl-eQ2-_tlk?usp=sharing&quot;&gt;this colab notebook&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;BTW all credit to &lt;a href=&quot;https://twitter.com/Norod78&quot;&gt;Doron&lt;/a&gt; for making the model and showing that this self-toonification works wonderfully!&lt;/p&gt;
&lt;h2 id=&quot;next-steps&quot; tabindex=&quot;-1&quot;&gt;Next steps &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/toonify-yourself/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;I followed up this stuff by making a website where anyone could Toonify themselves in a few seconds, it got pretty popular (too popular!). See the details of &lt;a href=&quot;https://justinpinkney.com/blog/2020/making-toonify&quot;&gt;how it was made&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;more-examples&quot; tabindex=&quot;-1&quot;&gt;More examples &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/toonify-yourself/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/toonify-yourself/examples/toon1.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/LLxTvOGise-200.webp 200w, https://justinpinkney.com/img/LLxTvOGise-320.webp 320w, https://justinpinkney.com/img/LLxTvOGise-500.webp 500w, https://justinpinkney.com/img/LLxTvOGise-800.webp 800w, https://justinpinkney.com/img/LLxTvOGise-1024.webp 1024w, https://justinpinkney.com/img/LLxTvOGise-1044.webp 1044w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/LLxTvOGise-200.jpeg 200w, https://justinpinkney.com/img/LLxTvOGise-320.jpeg 320w, https://justinpinkney.com/img/LLxTvOGise-500.jpeg 500w, https://justinpinkney.com/img/LLxTvOGise-800.jpeg 800w, https://justinpinkney.com/img/LLxTvOGise-1024.jpeg 1024w, https://justinpinkney.com/img/LLxTvOGise-1044.jpeg 1044w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/LLxTvOGise-200.jpeg&quot; width=&quot;1044&quot; height=&quot;522&quot;&gt;&lt;/picture&gt;&lt;/a&gt;
&lt;a href=&quot;https://justinpinkney.com/blog/2020/toonify-yourself/examples/toon2.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/KraoEllR0T-200.webp 200w, https://justinpinkney.com/img/KraoEllR0T-320.webp 320w, https://justinpinkney.com/img/KraoEllR0T-500.webp 500w, https://justinpinkney.com/img/KraoEllR0T-800.webp 800w, https://justinpinkney.com/img/KraoEllR0T-1024.webp 1024w, https://justinpinkney.com/img/KraoEllR0T-1044.webp 1044w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/KraoEllR0T-200.jpeg 200w, https://justinpinkney.com/img/KraoEllR0T-320.jpeg 320w, https://justinpinkney.com/img/KraoEllR0T-500.jpeg 500w, https://justinpinkney.com/img/KraoEllR0T-800.jpeg 800w, https://justinpinkney.com/img/KraoEllR0T-1024.jpeg 1024w, https://justinpinkney.com/img/KraoEllR0T-1044.jpeg 1044w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/KraoEllR0T-200.jpeg&quot; width=&quot;1044&quot; height=&quot;522&quot;&gt;&lt;/picture&gt;&lt;/a&gt;
&lt;a href=&quot;https://justinpinkney.com/blog/2020/toonify-yourself/examples/toon3.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/AS5SMoQ7wB-200.webp 200w, https://justinpinkney.com/img/AS5SMoQ7wB-320.webp 320w, https://justinpinkney.com/img/AS5SMoQ7wB-500.webp 500w, https://justinpinkney.com/img/AS5SMoQ7wB-800.webp 800w, https://justinpinkney.com/img/AS5SMoQ7wB-1024.webp 1024w, https://justinpinkney.com/img/AS5SMoQ7wB-1044.webp 1044w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/AS5SMoQ7wB-200.jpeg 200w, https://justinpinkney.com/img/AS5SMoQ7wB-320.jpeg 320w, https://justinpinkney.com/img/AS5SMoQ7wB-500.jpeg 500w, https://justinpinkney.com/img/AS5SMoQ7wB-800.jpeg 800w, https://justinpinkney.com/img/AS5SMoQ7wB-1024.jpeg 1024w, https://justinpinkney.com/img/AS5SMoQ7wB-1044.jpeg 1044w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/AS5SMoQ7wB-200.jpeg&quot; width=&quot;1044&quot; height=&quot;522&quot;&gt;&lt;/picture&gt;&lt;/a&gt;
&lt;a href=&quot;https://justinpinkney.com/blog/2020/toonify-yourself/examples/toon4.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/jYvxAJH6jt-200.webp 200w, https://justinpinkney.com/img/jYvxAJH6jt-320.webp 320w, https://justinpinkney.com/img/jYvxAJH6jt-500.webp 500w, https://justinpinkney.com/img/jYvxAJH6jt-800.webp 800w, https://justinpinkney.com/img/jYvxAJH6jt-1024.webp 1024w, https://justinpinkney.com/img/jYvxAJH6jt-1044.webp 1044w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/jYvxAJH6jt-200.jpeg 200w, https://justinpinkney.com/img/jYvxAJH6jt-320.jpeg 320w, https://justinpinkney.com/img/jYvxAJH6jt-500.jpeg 500w, https://justinpinkney.com/img/jYvxAJH6jt-800.jpeg 800w, https://justinpinkney.com/img/jYvxAJH6jt-1024.jpeg 1024w, https://justinpinkney.com/img/jYvxAJH6jt-1044.jpeg 1044w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/jYvxAJH6jt-200.jpeg&quot; width=&quot;1044&quot; height=&quot;522&quot;&gt;&lt;/picture&gt;&lt;/a&gt;
&lt;a href=&quot;https://justinpinkney.com/blog/2020/toonify-yourself/examples/toon5.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/AVHH8twP6t-200.webp 200w, https://justinpinkney.com/img/AVHH8twP6t-320.webp 320w, https://justinpinkney.com/img/AVHH8twP6t-500.webp 500w, https://justinpinkney.com/img/AVHH8twP6t-800.webp 800w, https://justinpinkney.com/img/AVHH8twP6t-1024.webp 1024w, https://justinpinkney.com/img/AVHH8twP6t-1044.webp 1044w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/AVHH8twP6t-200.jpeg 200w, https://justinpinkney.com/img/AVHH8twP6t-320.jpeg 320w, https://justinpinkney.com/img/AVHH8twP6t-500.jpeg 500w, https://justinpinkney.com/img/AVHH8twP6t-800.jpeg 800w, https://justinpinkney.com/img/AVHH8twP6t-1024.jpeg 1024w, https://justinpinkney.com/img/AVHH8twP6t-1044.jpeg 1044w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/AVHH8twP6t-200.jpeg&quot; width=&quot;1044&quot; height=&quot;522&quot;&gt;&lt;/picture&gt;&lt;/a&gt;
&lt;a href=&quot;https://justinpinkney.com/blog/2020/toonify-yourself/examples/toon6.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/lb-tFhEl8g-200.webp 200w, https://justinpinkney.com/img/lb-tFhEl8g-320.webp 320w, https://justinpinkney.com/img/lb-tFhEl8g-500.webp 500w, https://justinpinkney.com/img/lb-tFhEl8g-800.webp 800w, https://justinpinkney.com/img/lb-tFhEl8g-1024.webp 1024w, https://justinpinkney.com/img/lb-tFhEl8g-1044.webp 1044w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/lb-tFhEl8g-200.jpeg 200w, https://justinpinkney.com/img/lb-tFhEl8g-320.jpeg 320w, https://justinpinkney.com/img/lb-tFhEl8g-500.jpeg 500w, https://justinpinkney.com/img/lb-tFhEl8g-800.jpeg 800w, https://justinpinkney.com/img/lb-tFhEl8g-1024.jpeg 1024w, https://justinpinkney.com/img/lb-tFhEl8g-1044.jpeg 1044w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/lb-tFhEl8g-200.jpeg&quot; width=&quot;1044&quot; height=&quot;522&quot;&gt;&lt;/picture&gt;&lt;/a&gt;
&lt;a href=&quot;https://justinpinkney.com/blog/2020/toonify-yourself/examples/toon7.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/mhqslsUd_h-200.webp 200w, https://justinpinkney.com/img/mhqslsUd_h-320.webp 320w, https://justinpinkney.com/img/mhqslsUd_h-500.webp 500w, https://justinpinkney.com/img/mhqslsUd_h-800.webp 800w, https://justinpinkney.com/img/mhqslsUd_h-1024.webp 1024w, https://justinpinkney.com/img/mhqslsUd_h-1044.webp 1044w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/mhqslsUd_h-200.jpeg 200w, https://justinpinkney.com/img/mhqslsUd_h-320.jpeg 320w, https://justinpinkney.com/img/mhqslsUd_h-500.jpeg 500w, https://justinpinkney.com/img/mhqslsUd_h-800.jpeg 800w, https://justinpinkney.com/img/mhqslsUd_h-1024.jpeg 1024w, https://justinpinkney.com/img/mhqslsUd_h-1044.jpeg 1044w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/mhqslsUd_h-200.jpeg&quot; width=&quot;1044&quot; height=&quot;522&quot;&gt;&lt;/picture&gt;&lt;/a&gt;
&lt;a href=&quot;https://justinpinkney.com/blog/2020/toonify-yourself/examples/toon8.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/xSJORQsRZC-200.webp 200w, https://justinpinkney.com/img/xSJORQsRZC-320.webp 320w, https://justinpinkney.com/img/xSJORQsRZC-500.webp 500w, https://justinpinkney.com/img/xSJORQsRZC-800.webp 800w, https://justinpinkney.com/img/xSJORQsRZC-1024.webp 1024w, https://justinpinkney.com/img/xSJORQsRZC-1044.webp 1044w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/xSJORQsRZC-200.jpeg 200w, https://justinpinkney.com/img/xSJORQsRZC-320.jpeg 320w, https://justinpinkney.com/img/xSJORQsRZC-500.jpeg 500w, https://justinpinkney.com/img/xSJORQsRZC-800.jpeg 800w, https://justinpinkney.com/img/xSJORQsRZC-1024.jpeg 1024w, https://justinpinkney.com/img/xSJORQsRZC-1044.jpeg 1044w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/xSJORQsRZC-200.jpeg&quot; width=&quot;1044&quot; height=&quot;522&quot;&gt;&lt;/picture&gt;&lt;/a&gt;
&lt;a href=&quot;https://justinpinkney.com/blog/2020/toonify-yourself/examples/toon9.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/EYf9qPJOyr-200.webp 200w, https://justinpinkney.com/img/EYf9qPJOyr-320.webp 320w, https://justinpinkney.com/img/EYf9qPJOyr-500.webp 500w, https://justinpinkney.com/img/EYf9qPJOyr-800.webp 800w, https://justinpinkney.com/img/EYf9qPJOyr-1024.webp 1024w, https://justinpinkney.com/img/EYf9qPJOyr-1044.webp 1044w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/EYf9qPJOyr-200.jpeg 200w, https://justinpinkney.com/img/EYf9qPJOyr-320.jpeg 320w, https://justinpinkney.com/img/EYf9qPJOyr-500.jpeg 500w, https://justinpinkney.com/img/EYf9qPJOyr-800.jpeg 800w, https://justinpinkney.com/img/EYf9qPJOyr-1024.jpeg 1024w, https://justinpinkney.com/img/EYf9qPJOyr-1044.jpeg 1044w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/EYf9qPJOyr-200.jpeg&quot; width=&quot;1044&quot; height=&quot;522&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
</content>
	</entry>
	
	<entry>
		<title>StyleGAN network blending</title>
		<link href="https://justinpinkney.com/blog/2020/stylegan-network-blending/"/>
		<updated>2020-08-25T00:00:00Z</updated>
		<id>https://justinpinkney.com/blog/2020/stylegan-network-blending/</id>
		<content type="html">&lt;h2 id=&quot;making-ukiyo-e-portraits-real&quot; tabindex=&quot;-1&quot;&gt;Making Ukiyo-e portraits real &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/stylegan-network-blending/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/stylegan-network-blending/ukiyoe/3.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/avfwlzLwff-200.webp 200w, https://justinpinkney.com/img/avfwlzLwff-320.webp 320w, https://justinpinkney.com/img/avfwlzLwff-500.webp 500w, https://justinpinkney.com/img/avfwlzLwff-800.webp 800w, https://justinpinkney.com/img/avfwlzLwff-1024.webp 1024w, https://justinpinkney.com/img/avfwlzLwff-1600.webp 1600w, https://justinpinkney.com/img/avfwlzLwff-2088.webp 2088w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/avfwlzLwff-200.jpeg 200w, https://justinpinkney.com/img/avfwlzLwff-320.jpeg 320w, https://justinpinkney.com/img/avfwlzLwff-500.jpeg 500w, https://justinpinkney.com/img/avfwlzLwff-800.jpeg 800w, https://justinpinkney.com/img/avfwlzLwff-1024.jpeg 1024w, https://justinpinkney.com/img/avfwlzLwff-1600.jpeg 1600w, https://justinpinkney.com/img/avfwlzLwff-2088.jpeg 2088w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/avfwlzLwff-200.jpeg&quot; width=&quot;2088&quot; height=&quot;1044&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;In my previous post about attempting to create an &lt;a href=&quot;https://justinpinkney.com/blog/2020/ukiyoe-yourself&quot;&gt;ukiyo-e portrait generator&lt;/a&gt; I introduced a concept I called &amp;quot;layer swapping&amp;quot; in order to mix two StyleGAN models[^version]. The aim was to blend a &lt;strong&gt;base model&lt;/strong&gt; and another created from that using transfer learning, the &lt;strong&gt;fine-tuned model&lt;/strong&gt;. The method was different to simply interpolating the weights of the two models[^interpolation] as it allows you to control independently which model you get low and high resolution features from; in my example I wanted to get the pose from normal photographs, and the texture/style from ukiyo-e prints[^style-transfer].&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/ukiyoe-yourself/mr79.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/dB65WPoc2Y-200.webp 200w, https://justinpinkney.com/img/dB65WPoc2Y-320.webp 320w, https://justinpinkney.com/img/dB65WPoc2Y-500.webp 500w, https://justinpinkney.com/img/dB65WPoc2Y-800.webp 800w, https://justinpinkney.com/img/dB65WPoc2Y-1024.webp 1024w, https://justinpinkney.com/img/dB65WPoc2Y-1600.webp 1600w, https://justinpinkney.com/img/dB65WPoc2Y-3072.webp 3072w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/dB65WPoc2Y-200.jpeg 200w, https://justinpinkney.com/img/dB65WPoc2Y-320.jpeg 320w, https://justinpinkney.com/img/dB65WPoc2Y-500.jpeg 500w, https://justinpinkney.com/img/dB65WPoc2Y-800.jpeg 800w, https://justinpinkney.com/img/dB65WPoc2Y-1024.jpeg 1024w, https://justinpinkney.com/img/dB65WPoc2Y-1600.jpeg 1600w, https://justinpinkney.com/img/dB65WPoc2Y-3072.jpeg 3072w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/dB65WPoc2Y-200.jpeg&quot; width=&quot;3072&quot; height=&quot;1024&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
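&lt;p&gt;For reference, that plain weight interpolation looks something like the sketch below (assuming the TF StyleGAN2 Network API of clone, get_var, and set_var): one global mixing factor for every weight, with no way to treat pose and texture differently.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Sketch of plain model interpolation: every weight is mixed with
# the same global alpha, in contrast to layer swapping.
def interpolate_models(Gs_a, Gs_b, alpha=0.5):
    Gs_mix = Gs_a.clone()
    for name in Gs_a.trainables.keys():
        a, b = Gs_a.get_var(name), Gs_b.get_var(name)
        Gs_mix.set_var(name, (1.0 - alpha) * a + alpha * b)
    return Gs_mix
&lt;/code&gt;&lt;/pre&gt;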
&lt;p&gt;The above example worked ok, but after a &lt;a href=&quot;https://twitter.com/AydaoGMan/status/1295876628762046464?s=20&quot;&gt;recent Twitter thread&lt;/a&gt; on model interpolation popped up again, I realised that I had missed a really obvious variation on my earlier experiments. Rather than taking the low resolution layers (pose) from normal photos and the high res layers (texture) from ukiyo-e, I figured it would surely be interesting to try the other way round[^texture].&lt;/p&gt;
&lt;p&gt;https://vimeo.com/451284388&lt;/p&gt;
&lt;p&gt;It was indeed interesting, and deeply weird too! Playing around with the different levels at which the swap occurs gives some control over how realistic the images are. If you&#39;ve saved a bunch of network snapshots during transfer learning, the degree to which the networks have diverged also gives some interesting effects.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/stylegan-network-blending/ukiyoe/77.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/tG9M8_mRsO-200.webp 200w, https://justinpinkney.com/img/tG9M8_mRsO-320.webp 320w, https://justinpinkney.com/img/tG9M8_mRsO-500.webp 500w, https://justinpinkney.com/img/tG9M8_mRsO-800.webp 800w, https://justinpinkney.com/img/tG9M8_mRsO-1024.webp 1024w, https://justinpinkney.com/img/tG9M8_mRsO-1600.webp 1600w, https://justinpinkney.com/img/tG9M8_mRsO-3132.webp 3132w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/tG9M8_mRsO-200.jpeg 200w, https://justinpinkney.com/img/tG9M8_mRsO-320.jpeg 320w, https://justinpinkney.com/img/tG9M8_mRsO-500.jpeg 500w, https://justinpinkney.com/img/tG9M8_mRsO-800.jpeg 800w, https://justinpinkney.com/img/tG9M8_mRsO-1024.jpeg 1024w, https://justinpinkney.com/img/tG9M8_mRsO-1600.jpeg 1600w, https://justinpinkney.com/img/tG9M8_mRsO-3132.jpeg 3132w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/tG9M8_mRsO-200.jpeg&quot; width=&quot;3132&quot; height=&quot;1044&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/stylegan-network-blending/ukiyoe/98.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/45RxxuW1fe-200.webp 200w, https://justinpinkney.com/img/45RxxuW1fe-320.webp 320w, https://justinpinkney.com/img/45RxxuW1fe-500.webp 500w, https://justinpinkney.com/img/45RxxuW1fe-800.webp 800w, https://justinpinkney.com/img/45RxxuW1fe-1024.webp 1024w, https://justinpinkney.com/img/45RxxuW1fe-1600.webp 1600w, https://justinpinkney.com/img/45RxxuW1fe-2088.webp 2088w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/45RxxuW1fe-200.jpeg 200w, https://justinpinkney.com/img/45RxxuW1fe-320.jpeg 320w, https://justinpinkney.com/img/45RxxuW1fe-500.jpeg 500w, https://justinpinkney.com/img/45RxxuW1fe-800.jpeg 800w, https://justinpinkney.com/img/45RxxuW1fe-1024.jpeg 1024w, https://justinpinkney.com/img/45RxxuW1fe-1600.jpeg 1600w, https://justinpinkney.com/img/45RxxuW1fe-2088.jpeg 2088w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/45RxxuW1fe-200.jpeg&quot; width=&quot;2088&quot; height=&quot;1044&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;You also see some wonderfully weird effects because ukiyo-e artists almost never drew faces straight on. As the original faces model is mostly straight-on, the model has a somewhat tough time adapting to this change.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/stylegan-network-blending/ukiyoe/84.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/TngOSNyOpS-200.webp 200w, https://justinpinkney.com/img/TngOSNyOpS-320.webp 320w, https://justinpinkney.com/img/TngOSNyOpS-500.webp 500w, https://justinpinkney.com/img/TngOSNyOpS-800.webp 800w, https://justinpinkney.com/img/TngOSNyOpS-1024.webp 1024w, https://justinpinkney.com/img/TngOSNyOpS-1600.webp 1600w, https://justinpinkney.com/img/TngOSNyOpS-3132.webp 3132w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/TngOSNyOpS-200.jpeg 200w, https://justinpinkney.com/img/TngOSNyOpS-320.jpeg 320w, https://justinpinkney.com/img/TngOSNyOpS-500.jpeg 500w, https://justinpinkney.com/img/TngOSNyOpS-800.jpeg 800w, https://justinpinkney.com/img/TngOSNyOpS-1024.jpeg 1024w, https://justinpinkney.com/img/TngOSNyOpS-1600.jpeg 1600w, https://justinpinkney.com/img/TngOSNyOpS-3132.jpeg 3132w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/TngOSNyOpS-200.jpeg&quot; width=&quot;3132&quot; height=&quot;1044&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&quot;blending-other-stylegan-models&quot; tabindex=&quot;-1&quot;&gt;Blending other StyleGAN models &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/stylegan-network-blending/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;You can also swap models which are trained on very different domains (but one still has to be fine-tuned from the other). For example the Frea Buckler model trained by &lt;a href=&quot;https://artificial-images.com/&quot;&gt;Derrick Schultz&lt;/a&gt;. Swapping the original FFHQ-trained model into this one in a sense takes the rendering from the faces model, but the structure from the new one.&lt;/p&gt;
&lt;p&gt;https://vimeo.com/451291240&lt;/p&gt;
&lt;p&gt;As well as being pretty mesmerising, it seems to give some hints as to how the model transferred domains. It looks like it&#39;s mostly adapted features corresponding to the background of the original images to serve as the structural elements of the new model. Many of the new images look like an &amp;quot;over the shoulder&amp;quot; viewpoint, and the original faces have been pushed out of frame (although as noted they are still lurking there &lt;a href=&quot;https://youtu.be/s3ZC2rMczt8&quot;&gt;deep in the model&lt;/a&gt;). Although you could probably understand these details of the mechanism of domain transfer using some statistical analysis of the network weights and internal activations, this is quite a simple and pretty way of getting an intuition.&lt;/p&gt;
&lt;h2 id=&quot;get-the-code-and-blend-your-own-networks&quot; tabindex=&quot;-1&quot;&gt;Get the code and blend your own networks &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/stylegan-network-blending/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;&lt;img src=&quot;https://justinpinkney.com/blog/2020/stylegan-network-blending/face-world.gif&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;I&#39;ve shared an initial version of some code to blend two networks in this layer swapping manner (with some interpolation thrown into the mix) in my &lt;a href=&quot;https://github.com/justinpinkney/stylegan2&quot;&gt;StyleGAN2 fork&lt;/a&gt; (see the blend_models.py file). There&#39;s also an &lt;a href=&quot;https://colab.research.google.com/drive/1tputbmA9EaXs9HL9iO21g7xN7jz_Xrko?usp=sharing&quot;&gt;example colab notebook&lt;/a&gt; to show how to blend some StyleGAN models[^models], in the example I use a small faces model and one I trained on satellite images of the earth above.&lt;/p&gt;
&lt;p&gt;I plan to write a more detailed article on some of the effects of different blending strategies and models but for now the rest of this is documenting some of the amazing things others have done with the approach.&lt;/p&gt;
&lt;h2 id=&quot;over-to-you-stylegan-twitter&quot; tabindex=&quot;-1&quot;&gt;Over to you StyleGAN Twitter! &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/stylegan-network-blending/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;h3 id=&quot;disneyfication&quot; tabindex=&quot;-1&quot;&gt;Disneyfication &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/stylegan-network-blending/&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Shortly after sharing my code and approach, some of the wonderful StyleGAN community on Twitter started trying things out. The first really amazing network blend was by Doron Adler: he blended in a model fine-tuned on &lt;a href=&quot;https://twitter.com/Buntworthy/status/1297976798236598274&quot;&gt;just a few images of Disney/Dreamworks/Pixar&lt;/a&gt; characters to give these uncannily cartoonish faces.&lt;/p&gt;
&lt;p&gt;https://www.twitter.com/Norod78/status/1297513475258953728&lt;/p&gt;
&lt;p&gt;He also used a StyleGAN encoder to find the latent representation of a real face in the &amp;quot;real face&amp;quot; model then generate an image from the representation using the &amp;quot;blended&amp;quot; model with amazing semi-real cartoonification results:&lt;/p&gt;
&lt;p&gt;https://www.twitter.com/Norod78/status/1297849293299212288&lt;/p&gt;
&lt;p&gt;I think this approach would make a great way of generating a paired image dataset for training a pix2pixHD model, i.e. the &lt;a href=&quot;https://arxiv.org/abs/2003.03581&quot;&gt;StyleGAN Distillation approach&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&quot;resolution-dependent-interpolation-foxes-ponies-people-and-furries&quot; tabindex=&quot;-1&quot;&gt;Resolution dependent interpolation: foxes, ponies, people, and furries &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/stylegan-network-blending/&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;It was originally &lt;a href=&quot;https://twitter.com/arfafax&quot;&gt;Arfa&lt;/a&gt; who asked me to share some of the layer swapping code I had been working on. He followed up by combining both the weight interpolation and layer swapping ideas, mixing a bunch of different models (with some neat visualisations):&lt;/p&gt;
&lt;p&gt;https://www.twitter.com/arfafax/status/1297694374470402055&lt;/p&gt;
&lt;p&gt;The results are pretty amazing: this sort of &lt;strong&gt;&amp;quot;resolution dependent model interpolation&amp;quot;&lt;/strong&gt; is the logical generalisation of both the interpolation and swapping ideas. It looks like it gives a completely new axis of control over a generative model (assuming you have some fine-tuned models which can be combined). Take these example frames from one of the above videos:&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/stylegan-network-blending/arfa-frames.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/DhONn6FFCp-200.webp 200w, https://justinpinkney.com/img/DhONn6FFCp-320.webp 320w, https://justinpinkney.com/img/DhONn6FFCp-500.webp 500w, https://justinpinkney.com/img/DhONn6FFCp-800.webp 800w, https://justinpinkney.com/img/DhONn6FFCp-1024.webp 1024w, https://justinpinkney.com/img/DhONn6FFCp-1600.webp 1600w, https://justinpinkney.com/img/DhONn6FFCp-2200.webp 2200w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/DhONn6FFCp-200.jpeg 200w, https://justinpinkney.com/img/DhONn6FFCp-320.jpeg 320w, https://justinpinkney.com/img/DhONn6FFCp-500.jpeg 500w, https://justinpinkney.com/img/DhONn6FFCp-800.jpeg 800w, https://justinpinkney.com/img/DhONn6FFCp-1024.jpeg 1024w, https://justinpinkney.com/img/DhONn6FFCp-1600.jpeg 1600w, https://justinpinkney.com/img/DhONn6FFCp-2200.jpeg 2200w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/DhONn6FFCp-200.jpeg&quot; width=&quot;2200&quot; height=&quot;720&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;On the left is the output of the &lt;strong&gt;anime model&lt;/strong&gt;, on the right the &lt;strong&gt;my little pony model&lt;/strong&gt;, and in the middle the mid-resolution layers have been transplanted from &lt;strong&gt;my little pony&lt;/strong&gt; into &lt;strong&gt;anime&lt;/strong&gt;. This essentially introduces middle resolution features such as the eyes and nose from &lt;strong&gt;my little pony&lt;/strong&gt; into &lt;strong&gt;anime&lt;/strong&gt; characters!&lt;/p&gt;
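&lt;p&gt;In code, resolution dependent interpolation just means giving each resolution its own mixing factor rather than one global alpha. A sketch under the same TF StyleGAN2 assumptions as before:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Sketch of resolution dependent model interpolation: each synthesis
# resolution gets its own mixing factor between models A and B.
import re

def blend_by_resolution(Gs_a, Gs_b, alpha_for_res):
    # alpha_for_res maps a resolution (4, 8, ..., 1024) to a mix factor
    Gs_mix = Gs_a.clone()
    for name in Gs_a.trainables.keys():
        m = re.search(r&quot;G_synthesis/(\d+)x\d+&quot;, name)
        if m is None:
            continue  # leave the mapping network untouched
        alpha = alpha_for_res.get(int(m.group(1)), 0.0)
        if alpha == 0.0:
            continue
        a, b = Gs_a.get_var(name), Gs_b.get_var(name)
        Gs_mix.set_var(name, (1.0 - alpha) * a + alpha * b)
    return Gs_mix

# e.g. transplant only the mid-resolution features (eyes, nose) from B:
# Gs_mix = blend_by_resolution(Gs_anime, Gs_pony, {32: 1.0, 64: 1.0})
&lt;/code&gt;&lt;/pre&gt;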
&lt;h3 id=&quot;parameter-tuning&quot; tabindex=&quot;-1&quot;&gt;Parameter tuning &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/stylegan-network-blending/&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;https://twitter.com/CitizenPlain&quot;&gt;Nathan Shipley&lt;/a&gt; made some beautiful experiments trying to get the &lt;a href=&quot;https://justinpinkney.com/blog/2020/toonify-yourself&quot;&gt;Toonification effect&lt;/a&gt; just right by adjusting two of the key parameters: the amount of transfer learning to apply (measured in thousands of iterations) and the resolution layer from which to swap. By tuning these two you can pick out just the degree of Toonification to apply; see this lovely figure made by Nathan:&lt;/p&gt;
 &lt;link rel=&quot;stylesheet&quot; href=&quot;https://unpkg.com/leaflet@1.9.4/dist/leaflet.css&quot; integrity=&quot;sha256-p4NxAoJBhIIN+hmNHrzRCf9tD/miZyoHS5obTRR9BMY=&quot; crossorigin=&quot;&quot;&gt;
 &lt;!-- Make sure you put this AFTER Leaflet&#39;s CSS --&gt;
 &lt;script src=&quot;https://unpkg.com/leaflet@1.9.4/dist/leaflet.js&quot; integrity=&quot;sha256-20nQCchB9co0qIjJZRGuk2/Z9VM+kNiyxNV1lvTlZBo=&quot; crossorigin=&quot;&quot;&gt;&lt;/script&gt;
 &lt;div id=&quot;map&quot; style=&quot;height: 400px&quot;&gt;&lt;/div&gt;
 &lt;script&gt;

	// Display the parameter sweep as a zoomable tiled image.
	const map = L.map(&#39;map&#39;, {
        crs: L.CRS.Simple
    }).setView([-0.25, 0.35], 10);

	const tiles = L.tileLayer(
        &quot;https://assets.justinpinkney.com/blog/toonification/toonify-obama_files/{z}/{x}_{y}.jpg&quot;,
        {minZoom:9, maxZoom:14 }
    ).addTo(map);

&lt;/script&gt;
&lt;p&gt;Then he applied the First Order Motion model and you get some pretty amazing results:&lt;/p&gt;
&lt;p&gt;https://www.twitter.com/CitizenPlain/status/1308824021803372549&lt;/p&gt;
&lt;h2 id=&quot;going-further&quot; tabindex=&quot;-1&quot;&gt;Going further &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/stylegan-network-blending/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;I think there&#39;s lots of potential to look at these blending strategies further, in particular interpolating between models not only dependent on the resolution, but also differently for different channels. If you can identify the subset of neurons which correspond (for example) to the my little pony eyes, you could swap those specifically into the anime model, and be able to modify the eyes without affecting other features, such as the nose. Simple clustering of the internal activations has already been shown to be an effective way of identifying neurons which correspond to attributes in the image in the &lt;a href=&quot;https://arxiv.org/abs/2004.14367&quot;&gt;Editing in Style paper&lt;/a&gt;, so this seems pretty straightforward to try!&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/stylegan-network-blending/clustering_activations.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/8BDpFwG1xZ-200.webp 200w, https://justinpinkney.com/img/8BDpFwG1xZ-320.webp 320w, https://justinpinkney.com/img/8BDpFwG1xZ-500.webp 500w, https://justinpinkney.com/img/8BDpFwG1xZ-800.webp 800w, https://justinpinkney.com/img/8BDpFwG1xZ-1024.webp 1024w, https://justinpinkney.com/img/8BDpFwG1xZ-1600.webp 1600w, https://justinpinkney.com/img/8BDpFwG1xZ-1737.webp 1737w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/8BDpFwG1xZ-200.jpeg 200w, https://justinpinkney.com/img/8BDpFwG1xZ-320.jpeg 320w, https://justinpinkney.com/img/8BDpFwG1xZ-500.jpeg 500w, https://justinpinkney.com/img/8BDpFwG1xZ-800.jpeg 800w, https://justinpinkney.com/img/8BDpFwG1xZ-1024.jpeg 1024w, https://justinpinkney.com/img/8BDpFwG1xZ-1600.jpeg 1600w, https://justinpinkney.com/img/8BDpFwG1xZ-1737.jpeg 1737w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/8BDpFwG1xZ-200.jpeg&quot; width=&quot;1737&quot; height=&quot;708&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;[^version]: I did all this work using StyleGAN2, but have generally taken to referring to both versions 1 and 2 as StyleGAN, StyleGAN 1 is just config-a in the StyleGAN 2 code.&lt;/p&gt;
&lt;p&gt;[^interpolation]: Interpolating two StyleGAN models has been used quite a bit by many on Twitter to mix models for interesting results. And as far as I&#39;m aware the idea first cropped up in generative models in the &lt;a href=&quot;https://arxiv.org/abs/1809.00219&quot;&gt;ESRGAN paper&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;[^style-transfer]: If this sounds a little bit like style transfer then you&#39;re not far off, there are some similarities. In fact StyleGAN&#39;s architecture was inspired by networks designed for style transfer.&lt;/p&gt;
&lt;p&gt;[^texture]: It seems to be a general rule that neural networks are better at adding texture than removing it.&lt;/p&gt;
&lt;p&gt;[^models]: If you&#39;re in search of some models to blend then see my collection of &lt;a href=&quot;https://justinpinkney.com/blog/2020/pretrained-stylegan/&quot;&gt;pretrained stylegan&lt;/a&gt; models (I intend to add a field as to which ones have been fine-tuned in the near future.)&lt;/p&gt;
</content>
	</entry>
	
	<entry>
		<title>From feature space to input space</title>
		<link href="https://justinpinkney.com/blog/2020/feature-space-input-space/"/>
		<updated>2020-08-05T00:00:00Z</updated>
		<id>https://justinpinkney.com/blog/2020/feature-space-input-space/</id>
		<content type="html">&lt;h2 id=&quot;input-space-and-feature-space-are-different-sides-of-the-looking-glass&quot; tabindex=&quot;-1&quot;&gt;Input space and Feature space are different sides of the looking glass &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/feature-space-input-space/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/feature-space-input-space/ext-alice1.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/YtEExgOCM0-200.webp 200w, https://justinpinkney.com/img/YtEExgOCM0-320.webp 320w, https://justinpinkney.com/img/YtEExgOCM0-500.webp 500w, https://justinpinkney.com/img/YtEExgOCM0-772.webp 772w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/YtEExgOCM0-200.jpeg 200w, https://justinpinkney.com/img/YtEExgOCM0-320.jpeg 320w, https://justinpinkney.com/img/YtEExgOCM0-500.jpeg 500w, https://justinpinkney.com/img/YtEExgOCM0-772.jpeg 772w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;alice in wonderland&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/YtEExgOCM0-200.jpeg&quot; width=&quot;772&quot; height=&quot;472&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Neural networks (for image tasks) take in images and put them through a series of, typically convolutional, transformations. The world of those input images, consisting of red, green, and blue pixel values, we can call &lt;strong&gt;input space&lt;/strong&gt;, and the intermediate values that get computed in the network, the transformed versions of the images, we might call &lt;strong&gt;feature space&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;There are many different levels of feature space, one for each layer in the network, and there is much work to try and understand what the values and dimensions in this space represent[^feature-vis]. But fundamentally these internal representations computed by the network just look like images, ones with (generally) many more channels than the three colours we are familiar with.&lt;/p&gt;
&lt;h2 id=&quot;how-is-feature-space-just-like-input-space&quot; tabindex=&quot;-1&quot;&gt;How is Feature space just like Input space? &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/feature-space-input-space/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As pointed out in the original batchnorm paper, we can consider any portion of a network as a &amp;quot;sub-network&amp;quot; which takes in some activations from the previous layers and applies transforms to them. &lt;strong&gt;In this sense the previous layer&#39;s feature space is just the input space for the later layers.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Given that there is such a strong connection between input and feature space, it makes sense that anything we can do to improve the training process of the network in one space might be useful to also do in the other.&lt;/p&gt;
&lt;h2 id=&quot;batch-normalisation-is-just-input-normalisation-in-feature-space&quot; tabindex=&quot;-1&quot;&gt;Batch normalisation is just input normalisation in feature space &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/feature-space-input-space/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;It has been  long  known that the network training converges faster if its inputs are whitened – i.e., linearly transformed to have zero means and unit variances, and decorrelated. As each layer observes the inputs produced by the layers below, it would be advantageous to achieve the same whitening of the inputs of each layer.[^ioffe2015batch]&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It is standard practice to normalise the inputs of a machine learning model, at least ensuring the inputs have mean zero and a standard deviation of 1. The reasoning which inspired batch normalisation is that if such an operation is beneficial in input space, it is logical to expect it to also be of benefit in feature space.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Convergence is usually faster if the average of each input variable over the training set is close to zero. [^lecun-backprop]&lt;/p&gt;
&lt;/blockquote&gt;
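&lt;p&gt;As a concrete reminder of what that standard practice looks like, here is a minimal per-channel normalisation of a batch of images (a sketch only; names and shapes are illustrative):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Standard input normalisation: zero mean, unit standard deviation,
# computed per channel over the training set.
import numpy as np

def normalise(images, eps=1e-8):
    # images: float array of shape (N, H, W, C)
    mean = images.mean(axis=(0, 1, 2))
    std = images.std(axis=(0, 1, 2))
    return (images - mean) / (std + eps)
&lt;/code&gt;&lt;/pre&gt;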
&lt;p&gt;The invention of batch normalisation was extremely important in deep learning: it allowed efficient training of increasingly large networks at higher learning rates and has become standard practice in many neural network architectures. Since the original paper&#39;s discussion of how batchnorm helps by reducing &amp;quot;internal covariate shift&amp;quot; there has been some confusion as to how exactly it works, but for the best discussion I&#39;ve seen, look at this Twitter thread by David Page:&lt;/p&gt;
&lt;p&gt;https://twitter.com/dcpage3/status/1171867587417952260&lt;/p&gt;
&lt;p&gt;Clearly, adapting the concept of input normalisation to feature space is not trivial, and batch normalisation requires various new mechanisms and non-obvious assumptions to work (e.g. that batch statistics are sufficient, learned scale and offset parameters, and changes in behaviour between train and test time). In fact many of these unusual features mean that Batch Normalisation often turns out to be a problematic layer.&lt;/p&gt;
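&lt;p&gt;To make those mechanisms concrete, here is a toy batchnorm forward pass. It&#39;s only a sketch (real framework implementations are fused and handle convolutional shapes), but it shows the batch statistics, the learned scale and offset, and the train/test switch all in one place:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Toy batch normalisation over features of shape (N, D).
import numpy as np

def batchnorm(x, gamma, beta, run_mean, run_var,
              training=True, momentum=0.9, eps=1e-5):
    if training:
        mean, var = x.mean(axis=0), x.var(axis=0)  # batch statistics
        run_mean = momentum * run_mean + (1 - momentum) * mean
        run_var = momentum * run_var + (1 - momentum) * var
    else:
        mean, var = run_mean, run_var  # frozen statistics at test time
    x_hat = (x - mean) / np.sqrt(var + eps)  # whiten in feature space
    return gamma * x_hat + beta, run_mean, run_var  # learned scale/offset
&lt;/code&gt;&lt;/pre&gt;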
&lt;h2 id=&quot;dropout-works-in-feature-space&quot; tabindex=&quot;-1&quot;&gt;Dropout works in feature space &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/feature-space-input-space/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/feature-space-input-space/ext-alice2.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/pZii0escYk-200.webp 200w, https://justinpinkney.com/img/pZii0escYk-320.webp 320w, https://justinpinkney.com/img/pZii0escYk-500.webp 500w, https://justinpinkney.com/img/pZii0escYk-800.webp 800w, https://justinpinkney.com/img/pZii0escYk-1000.webp 1000w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/pZii0escYk-200.jpeg 200w, https://justinpinkney.com/img/pZii0escYk-320.jpeg 320w, https://justinpinkney.com/img/pZii0escYk-500.jpeg 500w, https://justinpinkney.com/img/pZii0escYk-800.jpeg 800w, https://justinpinkney.com/img/pZii0escYk-1000.jpeg 1000w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/pZii0escYk-200.jpeg&quot; width=&quot;1000&quot; height=&quot;929&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This pattern of taking concepts from one space and translating them to the other has also been applied to the regularisation technique of dropout. In dropout we randomly set the activations of some neurons to zero during training. It was popularised by use in AlexNet and for a while was extremely common practice in deep learning. The fact that this rather extreme procedure of setting (often significant) amounts of the feature space pixels to zero should actually work, let alone be beneficial, is not at first glance obvious (at least not to me), but there are various intuitive explanations for why this should be.[^thinking-about-dropout]&lt;/p&gt;
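&lt;p&gt;For reference, the mechanism itself is tiny. A sketch of (inverted) dropout, where the rescaling keeps the expected activation the same between train and test time:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Inverted dropout: zero each activation with probability p at train
# time and rescale the survivors; do nothing at test time.
import numpy as np

def dropout(x, p=0.5, training=True):
    if not training:
        return x
    mask = (np.random.rand(*x.shape) &amp;gt;= p).astype(x.dtype)
    return x * mask / (1.0 - p)
&lt;/code&gt;&lt;/pre&gt;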
&lt;p&gt;Nowadays dropout is less prevalent than it used to be. This would seem to be in large part due to the fact that it was predominantly used in the final fully connected layers of networks which are no longer common. It doesn&#39;t work so well for convolutional layers, but this is something we&#39;ll come back to later.&lt;/p&gt;
&lt;h2 id=&quot;cutout-is-dropout-in-input-space&quot; tabindex=&quot;-1&quot;&gt;Cutout is dropout in Input space &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/feature-space-input-space/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/feature-space-input-space/ext-scissors.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/0mK0P7YQKm-200.webp 200w, https://justinpinkney.com/img/0mK0P7YQKm-320.webp 320w, https://justinpinkney.com/img/0mK0P7YQKm-500.webp 500w, https://justinpinkney.com/img/0mK0P7YQKm-800.webp 800w, https://justinpinkney.com/img/0mK0P7YQKm-1024.webp 1024w, https://justinpinkney.com/img/0mK0P7YQKm-1242.webp 1242w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/0mK0P7YQKm-200.jpeg 200w, https://justinpinkney.com/img/0mK0P7YQKm-320.jpeg 320w, https://justinpinkney.com/img/0mK0P7YQKm-500.jpeg 500w, https://justinpinkney.com/img/0mK0P7YQKm-800.jpeg 800w, https://justinpinkney.com/img/0mK0P7YQKm-1024.jpeg 1024w, https://justinpinkney.com/img/0mK0P7YQKm-1242.jpeg 1242w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/0mK0P7YQKm-200.jpeg&quot; width=&quot;1242&quot; height=&quot;879&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;So if dropout was a successful approach for regularisation which works by zeroing pixels in feature space, can this be translated to input space? One problem is that dropout only seems to work well for fully connected layers and not so much for convolutional ones.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;While dropout was found to be very effective at regularizing fully-connected layers, it appears to be less powerful when used with convolutional layers ...&lt;/p&gt;
&lt;p&gt;... neighbouring pixels in images share much of the same information.  If any of them are dropped out then the information they contain will likely still be passed on from the neighbouring pixels that are still active. [^cutout]&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That same reasoning is also going to apply to images, but like we saw before, porting these ideas from one space to another is never quite straightforward (if it was, it would have been done already).&lt;/p&gt;
&lt;p&gt;DeVries and Taylor introduced a data augmentation method called Cutout in the paper &amp;quot;Improved Regularization of Convolutional Neural Networks with Cutout&amp;quot; where they set random rectangles of the input images to be zero, and as they point out:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Cutout can be interpreted as applying a spatial prior to dropout in input space [^cutout]&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The spatial prior they mention above is the fact that they remove a contiguous patch of the input images (they use a rectangle but state that the shape is not important) rather than zeroing random pixels as in dropout.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/feature-space-input-space/flowers-cut.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/IRZ4WLf8X4-200.webp 200w, https://justinpinkney.com/img/IRZ4WLf8X4-320.webp 320w, https://justinpinkney.com/img/IRZ4WLf8X4-500.webp 500w, https://justinpinkney.com/img/IRZ4WLf8X4-800.webp 800w, https://justinpinkney.com/img/IRZ4WLf8X4-1024.webp 1024w, https://justinpinkney.com/img/IRZ4WLf8X4-1600.webp 1600w, https://justinpinkney.com/img/IRZ4WLf8X4-1814.webp 1814w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/IRZ4WLf8X4-200.jpeg 200w, https://justinpinkney.com/img/IRZ4WLf8X4-320.jpeg 320w, https://justinpinkney.com/img/IRZ4WLf8X4-500.jpeg 500w, https://justinpinkney.com/img/IRZ4WLf8X4-800.jpeg 800w, https://justinpinkney.com/img/IRZ4WLf8X4-1024.jpeg 1024w, https://justinpinkney.com/img/IRZ4WLf8X4-1600.jpeg 1600w, https://justinpinkney.com/img/IRZ4WLf8X4-1814.jpeg 1814w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/IRZ4WLf8X4-200.jpeg&quot; width=&quot;1814&quot; height=&quot;1210&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
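&lt;p&gt;Cutout itself is as simple as it sounds; a sketch of the augmentation on a single image (the patch size here is just an illustrative choice):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Cutout: zero a random square patch of the input image. Because the
# inputs are centred around zero, filling with zeros keeps the bias.
import numpy as np

def cutout(image, size=16):
    # image: (H, W, C) array, already normalised around zero
    h, w = image.shape[:2]
    cy, cx = np.random.randint(h), np.random.randint(w)
    y0, y1 = max(0, cy - size // 2), min(h, cy + size // 2)
    x0, x1 = max(0, cx - size // 2), min(w, cx + size // 2)
    out = image.copy()
    out[y0:y1, x0:x1, :] = 0.0
    return out
&lt;/code&gt;&lt;/pre&gt;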
&lt;p&gt;One other difference here is that the training and test time weight scaling used in dropout is no longer required. For the input space we centre our data around zero (which is not true in feature space), so as long as we fill our patches with zeros we won&#39;t affect the expected bias.&lt;/p&gt;
&lt;h2 id=&quot;feature-space-to-input-space-and-back-again&quot; tabindex=&quot;-1&quot;&gt;Feature space to input space and back again... &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/feature-space-input-space/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;In fact those same modifications to dropout that made it work for input space can be ported back to feature space to solve the problem we pointed out earlier: dropout doesn&#39;t work well for convolutional layers.&lt;/p&gt;
&lt;p&gt;The DropBlock paper shows how the idea of zeroing out a contiguous region allows us to use dropout in feature space effectively. As the authors state, &amp;quot;DropBlock generalizes Cutout by applying Cutout at every feature map in convolutional networks.&amp;quot;[^dropblock]&lt;/p&gt;
&lt;p&gt;Unfortunately getting dropblock to work effectively seems to require hyper-parameter tuning, in terms of block size and where it is applied, as well as tweaking training by gradually increasing the amount of drop. The paper also states that &amp;quot;Although Cutout improves accuracy on the CIFAR-10 dataset ... it does not improve the accuracy on the ImageNet dataset in our experiments&amp;quot;.&lt;/p&gt;
&lt;h2 id=&quot;what-other-innovations-can-be-made-by-generalising-ideas-between-spaces&quot; tabindex=&quot;-1&quot;&gt;What other innovations can be made by generalising ideas between spaces? &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/feature-space-input-space/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;So that&#39;s a couple of examples of how generalising ideas which were successful in one space to the other space can lead to useful innovations in deep learning, and there are more like MixUp and Manifold MixUp.&lt;/p&gt;
&lt;p&gt;What other concepts could be transferred from one space to the other? One possibility is the use of image transforms for data augmentation. Random horizontal flips, or small amounts of resizing, are commonly applied to input images. Could the same be applied to feature space? One complication, particularly for horizontal reflection, is that pixels in feature space are likely to have some directionality that pixels in image space do not. So maybe random rescaling is a better place to start? But as we&#39;ve seen above, methods often need some modification to be relevant in the other space.&lt;/p&gt;
&lt;p&gt;In fact we can also look for other &amp;quot;spaces&amp;quot; to apply our ideas. For example the parameters of the network itself can be considered to be a high-dimensional &lt;strong&gt;weight space&lt;/strong&gt;. And although the properties of this are likely to be somewhat different to feature or input space, perhaps some of the ideas can also transition.[^weight-standardisation]&lt;/p&gt;
&lt;h3 id=&quot;image-credits&quot; tabindex=&quot;-1&quot;&gt;Image credits &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/feature-space-input-space/&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://commons.wikimedia.org/wiki/Category:John_Tenniel%27s_illustrations_of_Through_the_Looking-Glass_and_What_Alice_Found_There&quot;&gt;Illustrations of Through the Looking Glass by John Tenniel&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.si.edu/object/editor-and-giraffe:chndm_1938-57-1070-184&quot;&gt;The Editor and the Giraffe by Frederick Stuart Church&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;[^feature-vis]: For beautiful visuals and great insights into what this feature space represents see the fantastic work on feature visualisation and the follow-ups by Chris Olah. &lt;br&gt;&lt;br&gt; Olah, Chris, Alexander Mordvintsev, and Ludwig Schubert. ‘Feature Visualization’. Distill 2, no. 11 (7 November 2017): e7. https://doi.org/10.23915/distill.00007.&lt;/p&gt;
&lt;p&gt;[^ioffe2015batch]: Ioffe, Sergey, and Christian Szegedy. ‘Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift’. ArXiv:1502.03167 [Cs], 2 March 2015. http://arxiv.org/abs/1502.03167.&lt;/p&gt;
&lt;p&gt;[^cutout]: DeVries, Terrance, and Graham W. Taylor. ‘Improved Regularization of Convolutional Neural Networks with Cutout’. ArXiv:1708.04552 [Cs], 29 November 2017. http://arxiv.org/abs/1708.04552.&lt;/p&gt;
&lt;p&gt;[^thinking-about-dropout]: There are various different ways to interpret how dropout is working: simply adding noise, approximating averaging an ensemble, or preventing co-adaptation. The original paper outlines these and it would be interesting to revisit what evidence there is for which intuition is correct.&lt;/p&gt;
&lt;p&gt;[^lecun-backprop]: LeCun, Yann A., Léon Bottou, Genevieve B. Orr, and Klaus-Robert Müller. ‘Efficient BackProp’. In Neural Networks: Tricks of the Trade: Second Edition, edited by Grégoire Montavon, Geneviève B. Orr, and Klaus-Robert Müller, 9–48. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer, 2012. https://doi.org/10.1007/978-3-642-35289-8_3.&lt;/p&gt;
&lt;p&gt;[^dropblock]: Ghiasi, Golnaz, Tsung-Yi Lin, and Quoc V. Le. ‘DropBlock: A Regularization Method for Convolutional Networks’. ArXiv:1810.12890 [Cs], 30 October 2018. http://arxiv.org/abs/1810.12890.&lt;/p&gt;
&lt;p&gt;[^weight-standardisation]: Qiao, Siyuan, Huiyu Wang, Chenxi Liu, Wei Shen, and Alan Yuille. ‘Weight Standardization’. ArXiv:1903.10520 [Cs], 25 March 2019. http://arxiv.org/abs/1903.10520.&lt;/p&gt;
</content>
	</entry>
	
	<entry>
		<title>About this site 🏡</title>
		<link href="https://justinpinkney.com/blog/2020/this-site/"/>
		<updated>2020-07-19T00:00:00Z</updated>
		<id>https://justinpinkney.com/blog/2020/this-site/</id>
		<content type="html">&lt;p&gt;&lt;em&gt;This is old, since then I gave up the pain of Gatsby and updated the site to run on 11ty&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;I&#39;ve had many different versions of a home page over the years. I&#39;ve left traces on lots of different social media sites (but only really use Twitter now), kept a food blog[^recipes], posted about generative art on a WordPress blog, even written &lt;a href=&quot;https://github.com/justinpinkney/sissigen&quot;&gt;my own simple static site generator&lt;/a&gt; (for no apparent reason).&lt;/p&gt;
&lt;p&gt;This is my latest attempt (and I&#39;m sure it&#39;ll stick this time, honest...). Hopefully this will be a place for me to share some of the more interesting things I make and do and even if it&#39;s not much interest to anyone else, hopefully I&#39;ll like it. Keep up to date with the &lt;a href=&quot;https://justinpinkney.com/blog/2020/this-site/rss.xml&quot;&gt;&lt;strong&gt;RSS feed&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The site is built with &lt;a href=&quot;https://www.gatsbyjs.org/&quot;&gt;Gatsby&lt;/a&gt; and will give me somewhere to play with web technologies and learn a little more JavaScript and React. Under the hood I&#39;m writing posts in MDX so I can include slightly &lt;a href=&quot;https://justinpinkney.com/blog/2020/trying-leaflet&quot;&gt;more interactive content&lt;/a&gt; than in plain old markdown blogs. I&#39;ve enjoyed seeing some of the recent activity around digital gardens and the indie web and hope to incorporate some of those ideas here too.&lt;/p&gt;
&lt;p&gt;Gatsby has impressed me with its ease of use and adaptability, it&#39;s made making websites fun again.&lt;/p&gt;
&lt;h2 id=&quot;making-use-of-these-great-things&quot; tabindex=&quot;-1&quot;&gt;Making use of these great things &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/this-site/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://www.gatsbyjs.org/&quot;&gt;Gatsby&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://mdxjs.com/&quot;&gt;MDX&lt;/a&gt; - for writing posts&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/PaulieScanlon/gatsby-mdx-embed&quot;&gt;MDX embed&lt;/a&gt; - for embedding tweets and other things&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://leafletjs.com/&quot;&gt;Leaflet&lt;/a&gt; - for &lt;a href=&quot;https://justinpinkney.com/blog/2020/trying-leaflet&quot;&gt;displaying big images&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/dweirich/gatsby-plugin-react-leaflet&quot;&gt;gatsby-plugin-react-leaflet&lt;/a&gt; - for working with leaflet&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/goblindegook/littlefoot&quot;&gt;Littlefoot&lt;/a&gt; - for footnote popovers[^littlefoot]&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://tinydb.readthedocs.io/en/stable/&quot;&gt;TinyDB&lt;/a&gt;, &lt;a href=&quot;https://pysftp.readthedocs.io/en/release_0.2.9/&quot;&gt;pysftp&lt;/a&gt;, &lt;a href=&quot;https://imageio.github.io/&quot;&gt;imageio&lt;/a&gt;, and &lt;a href=&quot;https://pillow.readthedocs.io/en/stable/&quot;&gt;Pillow&lt;/a&gt; for managing my stream (now dead)&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/TolonUK/Leaflet.EdgeBuffer&quot;&gt;Leaflet EdgeBuffer&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;[^littlefoot]: Getting Littlefoot to work correctly on the Gatsby production build actually requires something a little different to that described in the readme, see &lt;a href=&quot;https://github.com/goblindegook/littlefoot/issues/338&quot;&gt;this issue&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;[^recipes]: I still keep some recipes at https://recipes.justinpinkney.com but these are just for me and that food blog is long dead.&lt;/p&gt;
</content>
	</entry>
	
	<entry>
		<title>CakeGAN 🍰</title>
		<link href="https://justinpinkney.com/blog/2020/cakegan/"/>
		<updated>2020-07-14T00:00:00Z</updated>
		<id>https://justinpinkney.com/blog/2020/cakegan/</id>
		<content type="html">&lt;p&gt;These cakes are a lie, they&#39;re all generated by a neural network.&lt;/p&gt;
&lt;p align=&quot;center&quot;&gt;
&lt;video controls=&quot;&quot; src=&quot;https://assets.justinpinkney.com/blog/cakegan/gridloop.mp4&quot; loop=&quot;true&quot; preload=&quot;auto&quot;&gt;&lt;/video&gt;
&lt;/p&gt;
&lt;p&gt;Random fake cakes are nice, but eventually my grand idea is to make a bespoke cake designer (I&#39;m in this for the latent space, you know!). Before I can do that I need to train a model, let&#39;s call it CakeGAN. So here is a delightful recipe for how to bake your very own non-existent cakes.&lt;/p&gt;
&lt;h1 id=&quot;how-to-bake-a-cakegan&quot; tabindex=&quot;-1&quot;&gt;How to bake a CakeGAN &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/cakegan/&quot;&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;h2 id=&quot;ingredients&quot; tabindex=&quot;-1&quot;&gt;Ingredients &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/cakegan/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Data&lt;/strong&gt; - 40,000+ images of cakes of all varieties cut into equal bite-sized pieces&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Architecture&lt;/strong&gt; - StyleGAN 2&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Code&lt;/strong&gt; - &lt;a href=&quot;https://github.com/justinpinkney/stylegan2&quot;&gt;My StyleGAN2 fork&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Compute&lt;/strong&gt; - A powerful GPU (Google Colab will do just fine)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;method&quot; tabindex=&quot;-1&quot;&gt;Method &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/cakegan/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;h3 id=&quot;collect-your-data&quot; tabindex=&quot;-1&quot;&gt;Collect your data &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/cakegan/&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Scraping an image search engine is an easy way of assembling a pretty huge, but low quality, dataset. I used &lt;a href=&quot;https://github.com/ultralytics/google-images-download&quot;&gt;this Bing image search scraper&lt;/a&gt;[^1] to grab a bunch of images for every search term I could think of relating to cakes. After a little time I had many GBs of (mostly) cake images on my hard drive.&lt;/p&gt;
&lt;p&gt;That&#39;s the raw material assembled, but it&#39;s going to need some refinement first.&lt;/p&gt;
&lt;h3 id=&quot;prepare-your-data&quot; tabindex=&quot;-1&quot;&gt;Prepare your data &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/cakegan/&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The images you get back will probably be a pretty mixed bag:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://justinpinkney.com/blog/2020/cakegan/real_not_cakes.jpg&quot; alt=&quot;These are not cakes! For real this time.&quot;&gt;&lt;/p&gt;
&lt;p&gt;In their raw form they&#39;re not really suitable for training StyleGAN, in particular the images need to be a fixed aspect ratio (I&#39;m going to go for square). It&#39;s a generally accepted piece of Deep Learning Folklore that StyleGAN gives better quality results when trying to generate images which have centrally placed objects[^2].&lt;/p&gt;
&lt;p&gt;Conveniently there is a wealth of pre-trained cake detectors available! They aren&#39;t usually called that, but it happens that &amp;quot;cake&amp;quot; is one of the categories in the COCO object detection challenge, so any COCO detector can find cakes. Using the best model from &lt;a href=&quot;https://github.com/facebookresearch/detectron2&quot;&gt;Detectron2&lt;/a&gt; makes this a pretty easy task.&lt;/p&gt;
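&lt;p&gt;Something along these lines should do the trick (a sketch rather than my exact script; the choice of model config, score threshold, and file name here are all illustrative):&lt;/p&gt;
&lt;pre class=&quot;language-Python&quot; tabindex=&quot;0&quot;&gt;&lt;code class=&quot;language-Python&quot;&gt;import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.data import MetadataCatalog
from detectron2.engine import DefaultPredictor

# Load a COCO-trained model from the Detectron2 model zoo.
cfg = get_cfg()
cfg_file = &quot;COCO-InstanceSegmentation/mask_rcnn_X_101_32x8d_FPN_3x.yaml&quot;
cfg.merge_from_file(model_zoo.get_config_file(cfg_file))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(cfg_file)
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7
predictor = DefaultPredictor(cfg)

# &quot;cake&quot; is one of the 80 COCO categories, so look up its class id.
classes = MetadataCatalog.get(cfg.DATASETS.TRAIN[0]).thing_classes
cake_id = classes.index(&quot;cake&quot;)

instances = predictor(cv2.imread(&quot;maybe_a_cake.jpg&quot;))[&quot;instances&quot;]
cakes = instances[instances.pred_classes == cake_id]
boxes = cakes.pred_boxes  # crop around these, then pad/resize to square&lt;/code&gt;&lt;/pre&gt;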
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/cakegan/process.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/_Zhl3Mf7xe-200.webp 200w, https://justinpinkney.com/img/_Zhl3Mf7xe-320.webp 320w, https://justinpinkney.com/img/_Zhl3Mf7xe-500.webp 500w, https://justinpinkney.com/img/_Zhl3Mf7xe-800.webp 800w, https://justinpinkney.com/img/_Zhl3Mf7xe-1024.webp 1024w, https://justinpinkney.com/img/_Zhl3Mf7xe-1319.webp 1319w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/_Zhl3Mf7xe-200.jpeg 200w, https://justinpinkney.com/img/_Zhl3Mf7xe-320.jpeg 320w, https://justinpinkney.com/img/_Zhl3Mf7xe-500.jpeg 500w, https://justinpinkney.com/img/_Zhl3Mf7xe-800.jpeg 800w, https://justinpinkney.com/img/_Zhl3Mf7xe-1024.jpeg 1024w, https://justinpinkney.com/img/_Zhl3Mf7xe-1319.jpeg 1319w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/_Zhl3Mf7xe-200.jpeg&quot; width=&quot;1319&quot; height=&quot;790&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Then sprinkle on a bit of padding, cropping, and resizing as required and that&#39;s the main ingredient ready to go, here&#39;s what it should look like:&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/cakegan/cakes-prepped.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/jwvqirPSK_-200.webp 200w, https://justinpinkney.com/img/jwvqirPSK_-320.webp 320w, https://justinpinkney.com/img/jwvqirPSK_-500.webp 500w, https://justinpinkney.com/img/jwvqirPSK_-800.webp 800w, https://justinpinkney.com/img/jwvqirPSK_-1024.webp 1024w, https://justinpinkney.com/img/jwvqirPSK_-1218.webp 1218w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/jwvqirPSK_-200.jpeg 200w, https://justinpinkney.com/img/jwvqirPSK_-320.jpeg 320w, https://justinpinkney.com/img/jwvqirPSK_-500.jpeg 500w, https://justinpinkney.com/img/jwvqirPSK_-800.jpeg 800w, https://justinpinkney.com/img/jwvqirPSK_-1024.jpeg 1024w, https://justinpinkney.com/img/jwvqirPSK_-1218.jpeg 1218w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/jwvqirPSK_-200.jpeg&quot; width=&quot;1218&quot; height=&quot;671&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3 id=&quot;train-your-model&quot; tabindex=&quot;-1&quot;&gt;Train your model &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/cakegan/&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Next it&#39;s time to start baking your model in a warm GPU heated oven. The model I&#39;m training is 256x256 (config-e), which is a bit lighter weight than the full-fat 1024x1024 (config-f) model primarily used in the paper.&lt;/p&gt;
&lt;p&gt;I use Google Colab for training, and use a fork[^3] of &lt;a href=&quot;https://github.com/skyflynil/stylegan2&quot;&gt;Skyflynil&#39;s StyleGAN2&lt;/a&gt; repo which allows you to keep the training data as jpegs (stuffed inside tfrecords); this keeps the dataset size down to a reasonable number of GBs.&lt;/p&gt;
&lt;p&gt;If you get a P100 via Colab the training process takes about 9 minutes per tick[^4] (make sure you optimise your GPU memory usage for the smaller model by setting the &lt;code&gt;minibatch_gpu_base&lt;/code&gt; to 16). Calculating the FID takes around 10 minutes and I think it&#39;s worth it to keep track of progress.&lt;/p&gt;
&lt;p&gt;Overall I trained for around 5 million images (which works out to around 6 days of training), achieving an FID of 13.6. The loss and FID curves are shown below, and you can see that I could probably have kept on training (the FID is still going down and there doesn&#39;t seem to be much sign of the discriminator overfitting[^5]) but the gains seemed pretty marginal at that point.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/cakegan/fid.png&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/ZrEdzcOUzQ-200.webp 200w, https://justinpinkney.com/img/ZrEdzcOUzQ-320.webp 320w, https://justinpinkney.com/img/ZrEdzcOUzQ-500.webp 500w, https://justinpinkney.com/img/ZrEdzcOUzQ-540.webp 540w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/png&quot; srcset=&quot;https://justinpinkney.com/img/ZrEdzcOUzQ-200.png 200w, https://justinpinkney.com/img/ZrEdzcOUzQ-320.png 320w, https://justinpinkney.com/img/ZrEdzcOUzQ-500.png 500w, https://justinpinkney.com/img/ZrEdzcOUzQ-540.png 540w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/ZrEdzcOUzQ-200.png&quot; width=&quot;540&quot; height=&quot;396&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/cakegan/scores.png&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/8GocAK50-O-200.webp 200w, https://justinpinkney.com/img/8GocAK50-O-320.webp 320w, https://justinpinkney.com/img/8GocAK50-O-500.webp 500w, https://justinpinkney.com/img/8GocAK50-O-800.webp 800w, https://justinpinkney.com/img/8GocAK50-O-1024.webp 1024w, https://justinpinkney.com/img/8GocAK50-O-1104.webp 1104w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/png&quot; srcset=&quot;https://justinpinkney.com/img/8GocAK50-O-200.png 200w, https://justinpinkney.com/img/8GocAK50-O-320.png 320w, https://justinpinkney.com/img/8GocAK50-O-500.png 500w, https://justinpinkney.com/img/8GocAK50-O-800.png 800w, https://justinpinkney.com/img/8GocAK50-O-1024.png 1024w, https://justinpinkney.com/img/8GocAK50-O-1104.png 1104w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/8GocAK50-O-200.png&quot; width=&quot;1104&quot; height=&quot;387&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;You&#39;ll know your CakeGAN is fully baked when the FID no longer improves, a skewer inserted comes out clean, or it springs back lightly when poked. Here&#39;s a small selection of generated cakes.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/cakegan/fakesgrid.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/ZJJQ-J_rtK-200.webp 200w, https://justinpinkney.com/img/ZJJQ-J_rtK-320.webp 320w, https://justinpinkney.com/img/ZJJQ-J_rtK-500.webp 500w, https://justinpinkney.com/img/ZJJQ-J_rtK-800.webp 800w, https://justinpinkney.com/img/ZJJQ-J_rtK-1024.webp 1024w, https://justinpinkney.com/img/ZJJQ-J_rtK-1600.webp 1600w, https://justinpinkney.com/img/ZJJQ-J_rtK-4772.webp 4772w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/ZJJQ-J_rtK-200.jpeg 200w, https://justinpinkney.com/img/ZJJQ-J_rtK-320.jpeg 320w, https://justinpinkney.com/img/ZJJQ-J_rtK-500.jpeg 500w, https://justinpinkney.com/img/ZJJQ-J_rtK-800.jpeg 800w, https://justinpinkney.com/img/ZJJQ-J_rtK-1024.jpeg 1024w, https://justinpinkney.com/img/ZJJQ-J_rtK-1600.jpeg 1600w, https://justinpinkney.com/img/ZJJQ-J_rtK-4772.jpeg 4772w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/ZJJQ-J_rtK-200.jpeg&quot; width=&quot;4772&quot; height=&quot;2659&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;And below is a zoomable grid, arranged using features extracted with an ImageNet-pretrained CNN and projected into 2D using UMAP (a rough sketch of how this works follows the grid).&lt;/p&gt;
 &lt;link rel=&quot;stylesheet&quot; href=&quot;https://unpkg.com/leaflet@1.9.4/dist/leaflet.css&quot; integrity=&quot;sha256-p4NxAoJBhIIN+hmNHrzRCf9tD/miZyoHS5obTRR9BMY=&quot; crossorigin=&quot;&quot;&gt;
 &lt;!-- Make sure you put this AFTER Leaflet&#39;s CSS --&gt;
 &lt;script src=&quot;https://unpkg.com/leaflet@1.9.4/dist/leaflet.js&quot; integrity=&quot;sha256-20nQCchB9co0qIjJZRGuk2/Z9VM+kNiyxNV1lvTlZBo=&quot; crossorigin=&quot;&quot;&gt;&lt;/script&gt;
 &lt;div id=&quot;map&quot; style=&quot;height: 400px&quot;&gt;&lt;/div&gt;
 &lt;script&gt;

	const map = L.map(&#39;map&#39;, {
        crs: L.CRS.Simple
    }).setView([-0.25, 0.35], 12);

	const tiles = L.tileLayer(
        &quot;https://assets.justinpinkney.com/blog/cakegan/cakes-generated_files//{z}/{x}_{y}.jpg&quot;,
        {minZoom:9, maxZoom:15 }
    ).addTo(map);

&lt;/script&gt;
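&lt;p&gt;Roughly how that arrangement works (a sketch under assumptions: &lt;code&gt;image_paths&lt;/code&gt; stands in for your list of generated cake images, and a ResNet-50 is used here simply as &lt;em&gt;an&lt;/em&gt; ImageNet-pretrained feature extractor, not necessarily the exact one I used):&lt;/p&gt;
&lt;pre class=&quot;language-Python&quot; tabindex=&quot;0&quot;&gt;&lt;code class=&quot;language-Python&quot;&gt;import numpy as np
import torch
import umap
from PIL import Image
from torchvision import models, transforms

# Pretrained ResNet-50, minus its classification layer, as a generic
# ImageNet feature extractor.
model = models.resnet50(pretrained=True).eval()
extractor = torch.nn.Sequential(*list(model.children())[:-1])
prep = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

def embed(path):
    with torch.no_grad():
        x = prep(Image.open(path).convert(&quot;RGB&quot;))[None]
        return extractor(x).flatten().numpy()

# image_paths: your generated cake images (placeholder).
feats = np.stack([embed(p) for p in image_paths])
coords = umap.UMAP(n_components=2).fit_transform(feats)  # one (x, y) per cake&lt;/code&gt;&lt;/pre&gt;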
&lt;p&gt;You can download the final trained model from my &lt;a href=&quot;https://github.com/justinpinkney/awesome-pretrained-stylegan2/#cakes&quot;&gt;Awesome Pretrained StyleGAN 2 repo&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&quot;serving&quot; tabindex=&quot;-1&quot;&gt;Serving &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/cakegan/&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Once the model has cooled it should be serving up some delicious interpolation videos. But the best part of a well baked GAN is the smooth and creamy latent space. For a start StyleGAN is famous (in part) for its style mixing capabilities, so let&#39;s mix some cakes.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/cakegan/grid-0_4.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/Re52O6hJYV-200.webp 200w, https://justinpinkney.com/img/Re52O6hJYV-320.webp 320w, https://justinpinkney.com/img/Re52O6hJYV-500.webp 500w, https://justinpinkney.com/img/Re52O6hJYV-800.webp 800w, https://justinpinkney.com/img/Re52O6hJYV-1024.webp 1024w, https://justinpinkney.com/img/Re52O6hJYV-1600.webp 1600w, https://justinpinkney.com/img/Re52O6hJYV-1792.webp 1792w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/Re52O6hJYV-200.jpeg 200w, https://justinpinkney.com/img/Re52O6hJYV-320.jpeg 320w, https://justinpinkney.com/img/Re52O6hJYV-500.jpeg 500w, https://justinpinkney.com/img/Re52O6hJYV-800.jpeg 800w, https://justinpinkney.com/img/Re52O6hJYV-1024.jpeg 1024w, https://justinpinkney.com/img/Re52O6hJYV-1600.jpeg 1600w, https://justinpinkney.com/img/Re52O6hJYV-1792.jpeg 1792w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/Re52O6hJYV-200.jpeg&quot; width=&quot;1792&quot; height=&quot;1536&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;If I want to make a cake editor I need to try and find meaningful directions in the latent space. I&#39;ve got a dataset with some noisy labels I could try and use, but for a first pass using &lt;a href=&quot;https://github.com/harskish/ganspace&quot;&gt;GANSpace&lt;/a&gt; is a quick and easy way of trying to automatically find meaningful directions[^ganspace-notebook].&lt;/p&gt;
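&lt;p&gt;The core of the GANSpace approach is just PCA on sampled latents; a minimal sketch of the idea (where &lt;code&gt;mapping&lt;/code&gt; stands in for the StyleGAN mapping network and is not a real API):&lt;/p&gt;
&lt;pre class=&quot;language-Python&quot; tabindex=&quot;0&quot;&gt;&lt;code class=&quot;language-Python&quot;&gt;import numpy as np
from sklearn.decomposition import PCA

z = np.random.randn(10000, 512)      # random samples in z space
w = mapping(z)                       # placeholder: the z -&amp;gt; w mapping network
pca = PCA(n_components=20).fit(w)
directions = pca.components_         # each row is a candidate edit direction

w_edit = w[0] + 3.0 * directions[4]  # push one sample along component 4&lt;/code&gt;&lt;/pre&gt;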
&lt;p&gt;Some of the directions found by GANSpace seem promising: there are hints of slicing-the-cake, more-chocolate, and fruit-topping vectors. But the results below are very cherry-picked and these don&#39;t seem to generalise terribly well.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/cakegan/cake-fruit.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/Mp10TABy3P-200.webp 200w, https://justinpinkney.com/img/Mp10TABy3P-320.webp 320w, https://justinpinkney.com/img/Mp10TABy3P-500.webp 500w, https://justinpinkney.com/img/Mp10TABy3P-800.webp 800w, https://justinpinkney.com/img/Mp10TABy3P-1024.webp 1024w, https://justinpinkney.com/img/Mp10TABy3P-1119.webp 1119w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/Mp10TABy3P-200.jpeg 200w, https://justinpinkney.com/img/Mp10TABy3P-320.jpeg 320w, https://justinpinkney.com/img/Mp10TABy3P-500.jpeg 500w, https://justinpinkney.com/img/Mp10TABy3P-800.jpeg 800w, https://justinpinkney.com/img/Mp10TABy3P-1024.jpeg 1024w, https://justinpinkney.com/img/Mp10TABy3P-1119.jpeg 1119w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/Mp10TABy3P-200.jpeg&quot; width=&quot;1119&quot; height=&quot;899&quot;&gt;&lt;/picture&gt;&lt;/a&gt;
&lt;a href=&quot;https://justinpinkney.com/blog/2020/cakegan/cake-slice.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/Enr7jPKWX5-200.webp 200w, https://justinpinkney.com/img/Enr7jPKWX5-320.webp 320w, https://justinpinkney.com/img/Enr7jPKWX5-500.webp 500w, https://justinpinkney.com/img/Enr7jPKWX5-800.webp 800w, https://justinpinkney.com/img/Enr7jPKWX5-1024.webp 1024w, https://justinpinkney.com/img/Enr7jPKWX5-1119.webp 1119w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/Enr7jPKWX5-200.jpeg 200w, https://justinpinkney.com/img/Enr7jPKWX5-320.jpeg 320w, https://justinpinkney.com/img/Enr7jPKWX5-500.jpeg 500w, https://justinpinkney.com/img/Enr7jPKWX5-800.jpeg 800w, https://justinpinkney.com/img/Enr7jPKWX5-1024.jpeg 1024w, https://justinpinkney.com/img/Enr7jPKWX5-1119.jpeg 1119w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/Enr7jPKWX5-200.jpeg&quot; width=&quot;1119&quot; height=&quot;899&quot;&gt;&lt;/picture&gt;&lt;/a&gt;
&lt;a href=&quot;https://justinpinkney.com/blog/2020/cakegan/cake-chocolate.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/njsvw68s9S-200.webp 200w, https://justinpinkney.com/img/njsvw68s9S-320.webp 320w, https://justinpinkney.com/img/njsvw68s9S-500.webp 500w, https://justinpinkney.com/img/njsvw68s9S-800.webp 800w, https://justinpinkney.com/img/njsvw68s9S-1024.webp 1024w, https://justinpinkney.com/img/njsvw68s9S-1119.webp 1119w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/njsvw68s9S-200.jpeg 200w, https://justinpinkney.com/img/njsvw68s9S-320.jpeg 320w, https://justinpinkney.com/img/njsvw68s9S-500.jpeg 500w, https://justinpinkney.com/img/njsvw68s9S-800.jpeg 800w, https://justinpinkney.com/img/njsvw68s9S-1024.jpeg 1024w, https://justinpinkney.com/img/njsvw68s9S-1119.jpeg 1119w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/njsvw68s9S-200.jpeg&quot; width=&quot;1119&quot; height=&quot;899&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;[^1]: This scraper is based on a Google image scraper which now seems to have been broken by changes in the Google images search page.&lt;/p&gt;
&lt;p&gt;[^2]: This may be due to the fact that StyleGAN uses a learned constant as the input to the convolutions, but I&#39;m not aware of any actual evidence for this, (it&#39;s more deep learning hearsay than folklore).&lt;/p&gt;
&lt;p&gt;[^3]: There are so many different forks of the original StyleGAN2 repo that it is a bit of a nightmare. If someone was feeling adventurous they could try and collect all the features into one. The closest I know for this is &lt;a href=&quot;https://github.com/pbaylies/stylegan2&quot;&gt;Peter Baylies fork&lt;/a&gt;, but last time I tried to use it, it was broken 🤷.&lt;/p&gt;
&lt;p&gt;[^4]: 1 tick corresponds to 6000 images&lt;/p&gt;
&lt;p&gt;[^5]: To check this I&#39;d like to use the heuristic presented in &lt;a href=&quot;https://arxiv.org/abs/2006.06676&quot;&gt;&amp;quot;Training Generative Adversarial Networks with Limited Data&amp;quot;&lt;/a&gt; which is the average value of the sign of D(real), but I don&#39;t have access to this data.&lt;/p&gt;
&lt;p&gt;[^ganspace-notebook]: I&#39;ve been using this &lt;a href=&quot;https://twitter.com/realmeatyhuman/status/1263153937596719106&quot;&gt;handy Colab notebook&lt;/a&gt; from &lt;a href=&quot;https://twitter.com/realmeatyhuman&quot;&gt;@realmeatyhuman&lt;/a&gt; for running GANSpace&lt;/p&gt;
</content>
	</entry>
	
	<entry>
		<title>Big images with Leaflet 🗺️</title>
		<link href="https://justinpinkney.com/blog/2020/trying-leaflet/"/>
		<updated>2020-07-13T00:00:00Z</updated>
		<id>https://justinpinkney.com/blog/2020/trying-leaflet/</id>
		<content type="html">&lt;p&gt;&lt;em&gt;This is all old. Since then I&#39;ve updated my blog to run on 11ty and life is much simpler.&lt;/em&gt;&lt;/p&gt;
 &lt;link rel=&quot;stylesheet&quot; href=&quot;https://unpkg.com/leaflet@1.9.4/dist/leaflet.css&quot; integrity=&quot;sha256-p4NxAoJBhIIN+hmNHrzRCf9tD/miZyoHS5obTRR9BMY=&quot; crossorigin=&quot;&quot;&gt;
 &lt;!-- Make sure you put this AFTER Leaflet&#39;s CSS --&gt;
 &lt;script src=&quot;https://unpkg.com/leaflet@1.9.4/dist/leaflet.js&quot; integrity=&quot;sha256-20nQCchB9co0qIjJZRGuk2/Z9VM+kNiyxNV1lvTlZBo=&quot; crossorigin=&quot;&quot;&gt;&lt;/script&gt;
 &lt;div id=&quot;map&quot; style=&quot;height: 400px&quot;&gt;&lt;/div&gt;
 &lt;script&gt;

	const map = L.map(&#39;map&#39;, {
        crs: L.CRS.Simple
    }).setView([-0.25, 0.35], 10);

	const tiles = L.tileLayer(
        &quot;https://assets.justinpinkney.com/sandbox/montage/montage_files/{z}/{x}_{y}.jpg&quot;,
        {minZoom:9, maxZoom:14 }
    ).addTo(map);

&lt;/script&gt;
&lt;p&gt;&lt;strong&gt;Below is a big image produced by my neural network feature visualisation library &lt;a href=&quot;https://github.com/justinpinkney/sumie&quot;&gt;Sumie&lt;/a&gt;, please zoom in.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This page is mostly just a test of the integration of Leaflet for displaying big images on my Gatsby powered homepage. Below are some brief scratchings on how this works so I don&#39;t forget.&lt;/p&gt;
&lt;h2 id=&quot;what-is-mdx-actually-good-for&quot; tabindex=&quot;-1&quot;&gt;What is MDX actually good for? &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/trying-leaflet/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;I recently moved my website over to MDX as I was tempted by the prospect of adding more interesting and interactive content than is normal in plain old markdown blogs. MDX lets you include React components directly in your markdown syntax documents and render them to a static page with Gatsby. I&#39;ve wanted to put large zoomable images into some of my posts and this seems like a straightforward way.&lt;/p&gt;
&lt;h2 id=&quot;make-tiles-from-a-big-image&quot; tabindex=&quot;-1&quot;&gt;Make tiles from a big image &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/trying-leaflet/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Some of my generative art or visualisation algorithms make images in the range of a few hundred megapixels, and displaying these over the web takes a little care. The first step in translating a single giant image into an interactive zoomable one is to slice it up into a bunch of tiles at different resolutions. &lt;a href=&quot;https://github.com/openzoom/deepzoom.py&quot;&gt;This Python repo&lt;/a&gt; provides all you need for this step.&lt;/p&gt;
&lt;pre class=&quot;language-Python&quot; tabindex=&quot;0&quot;&gt;&lt;code class=&quot;language-Python&quot;&gt;import deepzoom

# Slice a large image into a Deep Zoom pyramid of 256px JPEG tiles.
creator = deepzoom.ImageCreator(
    tile_size=256,
    tile_overlap=0,
    tile_format=&quot;jpg&quot;,
    image_quality=0.8,
    resize_filter=&quot;bicubic&quot;,
)

# Writes zoomable.dzi plus a folder of tiles for each zoom level.
creator.create(&quot;input.tiff&quot;, &quot;output/zoomable.dzi&quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Some code like the above will convert a large tiff file into a folder containing subdirectories with many tiles at different resolutions, which I can then upload to my web host.&lt;/p&gt;
&lt;h2 id=&quot;display-with-leaflet&quot; tabindex=&quot;-1&quot;&gt;Display with leaflet &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/trying-leaflet/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Leaflet then provides a pan and zoomable display of the image. Leaflet is normally used for displaying maps, but it is simple to adapt to the tiles generated above by specifying the correctly formatted url, and &lt;a href=&quot;https://leafletjs.com/examples/crs-simple/crs-simple.html&quot;&gt;setting the co-ordinate reference system to &amp;quot;simple&amp;quot;&lt;/a&gt;, i.e. a plain grid.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;One small issue with the above is that Leaflet doesn&#39;t like fractional tiles so there are currently some weird edge effects which I could solve by making sure I pad all the tiles to the full dimensions.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Incorporating Leaflet into a Gatsby site is happily very simple[^1] thanks to the &lt;a href=&quot;https://github.com/dweirich/gatsby-plugin-react-leaflet&quot;&gt;Gatsby React-Leaflet plugin&lt;/a&gt; which takes care of properly wrapping up the existing React-Leaflet library (which itself makes Leaflet accessible as React components). Writing the React component required to display the image is very straightforward, and then I can directly write the following in my markdown file to give the zoomable image at the top of the page.&lt;/p&gt;
&lt;pre class=&quot;language-markdown&quot; tabindex=&quot;0&quot;&gt;&lt;code class=&quot;language-markdown&quot;&gt;&amp;lt;BigImage
    options={ {center:[-0.25, 0.35], zoom:10, minZoom:9, maxZoom:14 } }
    tile_url=&quot;http://assets.justinpinkney.com/sandbox/montage/montage_files/{z}/{x}_{y}.jpg&quot; /&amp;gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;One simple gotcha is highlighted in &lt;a href=&quot;https://github.com/dweirich/gatsby-plugin-react-leaflet#step-3&quot;&gt;gatsby-plugin-react-leaflet&#39;s readme&lt;/a&gt;. Make sure any usage of leaflet or react-leaflet is wrapped in a check that the window is defined. During build time the plugin will &lt;a href=&quot;https://www.gatsbyjs.org/docs/debugging-html-builds/#fixing-third-party-modules&quot;&gt;stub out the leaflet loader&lt;/a&gt;, so any imports will return undefined, and if you try to use these during build time (rather than run time, which is what the window check ensures) you are likely to get &amp;quot;x is undefined&amp;quot; type errors.&lt;/p&gt;
&lt;p&gt;[^1]:  I actually started off trying to use &lt;a href=&quot;https://openseadragon.github.io/&quot;&gt;OpenSeadragon&lt;/a&gt; which has a much smoother pan and zoom action, but with no existing Gatsby integration it seemed like the harder option to start with.&lt;/p&gt;
</content>
	</entry>
	
	<entry>
		<title>Awesome Pretrained StyleGAN</title>
		<link href="https://justinpinkney.com/blog/2020/pretrained-stylegan/"/>
		<updated>2020-07-03T00:00:00Z</updated>
		<id>https://justinpinkney.com/blog/2020/pretrained-stylegan/</id>
		<content type="html">&lt;p&gt;I maintain two collections of links to StyleGAN models pre-trained on a variety of datasets:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/justinpinkney/awesome-pretrained-stylegan&quot;&gt;Awesome Pretrained StyleGAN&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/justinpinkney/awesome-pretrained-stylegan2&quot;&gt;Awesome Pretrained StyleGAN 2&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Most of these have been shared via the very active StyleGAN creative community on Twitter, and if you&#39;re aware of any others then please send them my way: either create an issue or fill out one of the submission forms linked in the repositories.&lt;/p&gt;
&lt;p align=&quot;center&quot;&gt;
&lt;video controls=&quot;&quot; src=&quot;https://justinpinkney.com/blog/2020/pretrained-stylegan/tiled.mp4&quot; loop=&quot;true&quot;&gt;&lt;/video&gt;
&lt;/p&gt;
&lt;h2 id=&quot;pretrained-models-are-useful-for-lots-of-things&quot; tabindex=&quot;-1&quot;&gt;Pretrained models are useful for lots of things &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/pretrained-stylegan/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;h3 id=&quot;speed-up-model-training-with-transfer-learning&quot; tabindex=&quot;-1&quot;&gt;Speed up model training with transfer learning &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/pretrained-stylegan/&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Apart from just generating some example images of whatever the model was trained on, pre-trained models are super useful for training your own model using transfer learning.&lt;/p&gt;
&lt;p&gt;If you want to train a model which is similar to an existing one, for example &lt;a href=&quot;https://justinpinkney.com/blog/2020/ukiyoe-yourself&quot;&gt;Ukiyo-e faces&lt;/a&gt;, then taking a pre-trained model as your starting point can get you some decent results within just a few hours, rather than days, of training time.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/ukiyoe-yourself/fakes000312.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/CerxBeHzmA-200.webp 200w, https://justinpinkney.com/img/CerxBeHzmA-320.webp 320w, https://justinpinkney.com/img/CerxBeHzmA-500.webp 500w, https://justinpinkney.com/img/CerxBeHzmA-800.webp 800w, https://justinpinkney.com/img/CerxBeHzmA-1024.webp 1024w, https://justinpinkney.com/img/CerxBeHzmA-1600.webp 1600w, https://justinpinkney.com/img/CerxBeHzmA-7168.webp 7168w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/CerxBeHzmA-200.jpeg 200w, https://justinpinkney.com/img/CerxBeHzmA-320.jpeg 320w, https://justinpinkney.com/img/CerxBeHzmA-500.jpeg 500w, https://justinpinkney.com/img/CerxBeHzmA-800.jpeg 800w, https://justinpinkney.com/img/CerxBeHzmA-1024.jpeg 1024w, https://justinpinkney.com/img/CerxBeHzmA-1600.jpeg 1600w, https://justinpinkney.com/img/CerxBeHzmA-7168.jpeg 7168w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/CerxBeHzmA-200.jpeg&quot; width=&quot;7168&quot; height=&quot;4096&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3 id=&quot;get-weird-results-with-intermediate-models-and-mixing&quot; tabindex=&quot;-1&quot;&gt;Get weird results with intermediate models and mixing &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/pretrained-stylegan/&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Transfer learning (aka fine-tuning) also sometimes gives you some pretty interesting results just as the model starts to transform the generated images from the original objects to the new thing you&#39;re training for. See a couple of nice examples below.&lt;/p&gt;
&lt;p&gt;https://www.twitter.com/Norod78/status/1255200236181630979
https://www.twitter.com/mmariansky/status/1226756838613491713&lt;/p&gt;
&lt;p&gt;In fact you can go further and combine two models, one of which has been fine-tuned from the other, and either do weight averaging or &lt;a href=&quot;https://justinpinkney.com/blog/2020/ukiyoe-yourself&quot;&gt;layer swapping&lt;/a&gt; to effectively mix their outputs.&lt;/p&gt;
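&lt;p&gt;Both tricks boil down to blending two compatible sets of weights. Here&#39;s a rough sketch (assuming the two generators share an architecture and that parameter names encode the layer resolution, as the official StyleGAN2 names like &lt;code&gt;G_synthesis/64x64/...&lt;/code&gt; do):&lt;/p&gt;
&lt;pre class=&quot;language-Python&quot; tabindex=&quot;0&quot;&gt;&lt;code class=&quot;language-Python&quot;&gt;import re

def layer_resolution(name):
    # Pull the resolution out of a name like &quot;G_synthesis/64x64/Conv0/weight&quot;.
    m = re.search(r&quot;(\d+)x\d+&quot;, name)
    return int(m.group(1)) if m else None

def blend(weights_a, weights_b, swap_res=32, mode=&quot;swap&quot;):
    # weights_a / weights_b: dicts of parameter name -&amp;gt; array.
    blended = {}
    for name, w_a in weights_a.items():
        w_b = weights_b[name]
        if mode == &quot;average&quot;:
            blended[name] = 0.5 * (w_a + w_b)
        else:  # layer swapping: coarse layers from A, fine layers from B
            res = layer_resolution(name)
            blended[name] = w_a if res is not None and res &amp;lt;= swap_res else w_b
    return blended&lt;/code&gt;&lt;/pre&gt;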
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/ukiyoe-yourself/montage.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/r2lNykNcp0-200.webp 200w, https://justinpinkney.com/img/r2lNykNcp0-320.webp 320w, https://justinpinkney.com/img/r2lNykNcp0-500.webp 500w, https://justinpinkney.com/img/r2lNykNcp0-800.webp 800w, https://justinpinkney.com/img/r2lNykNcp0-1024.webp 1024w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/r2lNykNcp0-200.jpeg 200w, https://justinpinkney.com/img/r2lNykNcp0-320.jpeg 320w, https://justinpinkney.com/img/r2lNykNcp0-500.jpeg 500w, https://justinpinkney.com/img/r2lNykNcp0-800.jpeg 800w, https://justinpinkney.com/img/r2lNykNcp0-1024.jpeg 1024w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/r2lNykNcp0-200.jpeg&quot; width=&quot;1024&quot; height=&quot;256&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3 id=&quot;do-experiments-and-investigate-the-properties-of-a-latent-space&quot; tabindex=&quot;-1&quot;&gt;Do experiments and investigate the properties of a latent space &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/pretrained-stylegan/&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Pre-trained models can also be useful if you want to investigate the properties of specific modifications and manipulations of a trained GAN. In fact my Awesome StyleGAN made an appearance in the excellent &lt;a href=&quot;https://github.com/harskish/ganspace&quot;&gt;GANSpace&lt;/a&gt; &lt;a href=&quot;https://arxiv.org/abs/2004.02546&quot;&gt;paper&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;A well trained model is also useful if you just want to &lt;a href=&quot;https://justinpinkney.com/blog/2020/matlab-stylegan&quot;&gt;mess around with the internals of StyleGAN&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;maybe-transfer-learning-will-become-the-standard-way-to-train-a-gan&quot; tabindex=&quot;-1&quot;&gt;Maybe transfer learning will become the standard way to train a GAN &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/pretrained-stylegan/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Transfer learning is currently a very active area of research in the world of GANs, in particular there have been a bunch of publications recently looking at methods for preventing the discriminator from over-fitting when you only have a small dataset. These would seem to make it possible to get very good results using a pre-trained model and only a small amount of data.&lt;/p&gt;
&lt;h2 id=&quot;contribute&quot; tabindex=&quot;-1&quot;&gt;Contribute! &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/pretrained-stylegan/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;If you have a StyleGAN model you&#39;d like to share I&#39;d love it if you contribute to the appropriate repository. In particular I see lots of users of RunwayML sharing links to their models on Runway, but not sharing the .pkl files. Please set those models free!&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/justinpinkney/awesome-pretrained-stylegan&quot;&gt;Awesome Pretrained StyleGAN&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/justinpinkney/awesome-pretrained-stylegan2&quot;&gt;Awesome Pretrained StyleGAN 2&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/pretrained-stylegan/pretrained.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/fEx6vBmZ7g-200.webp 200w, https://justinpinkney.com/img/fEx6vBmZ7g-320.webp 320w, https://justinpinkney.com/img/fEx6vBmZ7g-500.webp 500w, https://justinpinkney.com/img/fEx6vBmZ7g-800.webp 800w, https://justinpinkney.com/img/fEx6vBmZ7g-959.webp 959w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/fEx6vBmZ7g-200.jpeg 200w, https://justinpinkney.com/img/fEx6vBmZ7g-320.jpeg 320w, https://justinpinkney.com/img/fEx6vBmZ7g-500.jpeg 500w, https://justinpinkney.com/img/fEx6vBmZ7g-800.jpeg 800w, https://justinpinkney.com/img/fEx6vBmZ7g-959.jpeg 959w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/fEx6vBmZ7g-200.jpeg&quot; width=&quot;959&quot; height=&quot;488&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
</content>
	</entry>
	
	<entry>
		<title>MATLAB StyleGAN Playground 🙃</title>
		<link href="https://justinpinkney.com/blog/2020/matlab-stylegan/"/>
		<updated>2020-06-18T00:00:00Z</updated>
		<id>https://justinpinkney.com/blog/2020/matlab-stylegan/</id>
		<content type="html">&lt;p&gt;Everyone who&#39;s ever seen output from GANs has probably seen faces generated by StyleGAN. Now you can &lt;a href=&quot;https://github.com/justinpinkney/stylegan-matlab-playground&quot;&gt;do the same in MATLAB!&lt;/a&gt;&lt;/p&gt;
&lt;p align=&quot;center&quot;&gt;
&lt;video controls=&quot;&quot; src=&quot;https://justinpinkney.com/blog/2020/matlab-stylegan/circle.mp4&quot; loop=&quot;true&quot;&gt;&lt;/video&gt;
&lt;/p&gt;
&lt;p&gt;StyleGAN (and its successor) has had a big impact on the use and application of generative models, particularly among artists. Much of this comes from a combination of accessible and (fairly) straightforward-to-run code, great stability in training, a particularly well-formed and editable latent space representation, and ease of transfer learning. It&#39;s been taken up very readily by people on Twitter, and tools like RunwayML make it very accessible for non-programmers too. There&#39;s also a wealth of &lt;a href=&quot;https://github.com/justinpinkney/awesome-pretrained-stylegan&quot;&gt;pre-trained models&lt;/a&gt; available, and I maintain my own &lt;a href=&quot;https://justinpinkney.com/blog/2020/pretrained-stylegan&quot;&gt;collection of them&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;One thing that makes StyleGAN less appealing is that it&#39;s written in TensorFlow. And although it&#39;s very performant, this makes it much less accessible to modify and adapt. In fact it&#39;s a telling reflection on the differences between deep learning frameworks that many subsequent papers proposing modifications and extensions to StyleGAN have actually chosen to use a PyTorch port rather than the TensorFlow original.&lt;/p&gt;
&lt;h2 id=&quot;stylegan-in-matlab-yes-really&quot; tabindex=&quot;-1&quot;&gt;StyleGAN in MATLAB (yes really) &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/matlab-stylegan/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Since R2019b MATLAB has had support for a low-level interface to deep learning. Rather than dealing with layers, you can write networks as normal functions. This makes it a good deal easier to port things from other frameworks (in particular PyTorch).&lt;/p&gt;
&lt;p&gt;So as an exercise in testing out the new features and really getting to grips with the details of StyleGAN, I ported the original to MATLAB. Below is a version of the classic style mixing figure from the original paper (using the original FFHQ model).&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/matlab-stylegan/style_mixing.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/CNv0Nx_VVn-200.webp 200w, https://justinpinkney.com/img/CNv0Nx_VVn-320.webp 320w, https://justinpinkney.com/img/CNv0Nx_VVn-500.webp 500w, https://justinpinkney.com/img/CNv0Nx_VVn-800.webp 800w, https://justinpinkney.com/img/CNv0Nx_VVn-1024.webp 1024w, https://justinpinkney.com/img/CNv0Nx_VVn-1600.webp 1600w, https://justinpinkney.com/img/CNv0Nx_VVn-2048.webp 2048w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/CNv0Nx_VVn-200.jpeg 200w, https://justinpinkney.com/img/CNv0Nx_VVn-320.jpeg 320w, https://justinpinkney.com/img/CNv0Nx_VVn-500.jpeg 500w, https://justinpinkney.com/img/CNv0Nx_VVn-800.jpeg 800w, https://justinpinkney.com/img/CNv0Nx_VVn-1024.jpeg 1024w, https://justinpinkney.com/img/CNv0Nx_VVn-1600.jpeg 1600w, https://justinpinkney.com/img/CNv0Nx_VVn-2048.jpeg 2048w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/CNv0Nx_VVn-200.jpeg&quot; width=&quot;2048&quot; height=&quot;2048&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;To use a pre-trained generator I have a &lt;a href=&quot;https://github.com/justinpinkney/stylegan-matlab-playground/blob/master/scripts/stylegan_convertor.ipynb&quot;&gt;small bit of Python code&lt;/a&gt; (which you can run freely in Colab without needing any Python setup) which allows you to take a model trained in the original implementation and convert the weights to a .mat file for use in MATLAB. If you&#39;re looking for some pre-trained models a good place is &lt;a href=&quot;https://github.com/justinpinkney/awesome-pretrained-stylegan&quot;&gt;my pretrained StyleGAN model collection&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;As well as simply creating faces using the generator it&#39;s now very simple to use the rest of the deep learning features in MATLAB. For example you can use one of the pre-trained models to compute a perceptual path length to see the differences between interpolation methods (note the original uses a different perceptual loss, so the numbers aren&#39;t comparable to those in the paper). Interpolation in z space makes some random glasses appear, whereas using w space gives a much smoother transition.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://justinpinkney.com/blog/2020/matlab-stylegan/interp.gif&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
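&lt;p&gt;For reference, the metric itself is easy to sketch (in Python here, illustrative only: &lt;code&gt;G&lt;/code&gt; is a placeholder generator returning images in [-1, 1], and the pip-installable &lt;code&gt;lpips&lt;/code&gt; package stands in for the paper&#39;s VGG-based distance):&lt;/p&gt;
&lt;pre class=&quot;language-Python&quot; tabindex=&quot;0&quot;&gt;&lt;code class=&quot;language-Python&quot;&gt;import torch
import lpips

dist = lpips.LPIPS(net=&quot;vgg&quot;)
eps = 1e-4

def slerp(a, b, t):
    # Spherical interpolation between two latent vectors.
    omega = torch.acos(torch.clamp(
        torch.sum(a * b) / (a.norm() * b.norm()), -1.0, 1.0))
    return (torch.sin((1 - t) * omega) * a
            + torch.sin(t * omega) * b) / torch.sin(omega)

def ppl_sample(G, dim=512):
    # One Monte Carlo sample of path length: perceptual distance between
    # two nearby points on an interpolation path, scaled by 1/eps^2.
    # Average many of these to estimate the PPL.
    z1, z2 = torch.randn(dim), torch.randn(dim)
    t = torch.rand(())
    img_a = G(slerp(z1, z2, t)[None])      # G: placeholder latent -&amp;gt; image
    img_b = G(slerp(z1, z2, t + eps)[None])
    return dist(img_a, img_b).item() / eps ** 2&lt;/code&gt;&lt;/pre&gt;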
&lt;h2 id=&quot;hacking-the-model&quot; tabindex=&quot;-1&quot;&gt;Hacking the Model &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/matlab-stylegan/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The big advantage (at least for me) of having an implementation in MATLAB, rather than TensorFlow, is that it becomes a lot easier to monkey around with the internals of the model.&lt;/p&gt;
&lt;p&gt;Currently there is a system of callbacks so that you can reach into the model and mess around with things like the internal activations. Here&#39;s an example where you can modify the learned constant at the beginning of the model to change the content of the generated image. If you extend it you can create more background (or endless hair); if you set it to some random numbers you get a mish mash of facial features.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/matlab-stylegan/padded.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/DltellV5vI-200.webp 200w, https://justinpinkney.com/img/DltellV5vI-320.webp 320w, https://justinpinkney.com/img/DltellV5vI-500.webp 500w, https://justinpinkney.com/img/DltellV5vI-800.webp 800w, https://justinpinkney.com/img/DltellV5vI-1024.webp 1024w, https://justinpinkney.com/img/DltellV5vI-1600.webp 1600w, https://justinpinkney.com/img/DltellV5vI-2048.webp 2048w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/DltellV5vI-200.jpeg 200w, https://justinpinkney.com/img/DltellV5vI-320.jpeg 320w, https://justinpinkney.com/img/DltellV5vI-500.jpeg 500w, https://justinpinkney.com/img/DltellV5vI-800.jpeg 800w, https://justinpinkney.com/img/DltellV5vI-1024.jpeg 1024w, https://justinpinkney.com/img/DltellV5vI-1600.jpeg 1600w, https://justinpinkney.com/img/DltellV5vI-2048.jpeg 2048w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/DltellV5vI-200.jpeg&quot; width=&quot;2048&quot; height=&quot;1024&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/matlab-stylegan/randomised.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/huGWSWshD0-200.webp 200w, https://justinpinkney.com/img/huGWSWshD0-320.webp 320w, https://justinpinkney.com/img/huGWSWshD0-500.webp 500w, https://justinpinkney.com/img/huGWSWshD0-800.webp 800w, https://justinpinkney.com/img/huGWSWshD0-1024.webp 1024w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/huGWSWshD0-200.jpeg 200w, https://justinpinkney.com/img/huGWSWshD0-320.jpeg 320w, https://justinpinkney.com/img/huGWSWshD0-500.jpeg 500w, https://justinpinkney.com/img/huGWSWshD0-800.jpeg 800w, https://justinpinkney.com/img/huGWSWshD0-1024.jpeg 1024w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/huGWSWshD0-200.jpeg&quot; width=&quot;1024&quot; height=&quot;1024&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;And here&#39;s another where I&#39;m just rotating the activations at a particular layer. If you&#39;re interested in what can be done with this sort of approach, see the &lt;a href=&quot;https://terencebroad.com/research/network-bending&quot;&gt;work by Terence Broad&lt;/a&gt; which takes the idea a lot further.&lt;/p&gt;
&lt;p align=&quot;center&quot;&gt;
&lt;video controls=&quot;&quot; src=&quot;https://justinpinkney.com/blog/2020/matlab-stylegan/rotate.mp4&quot; loop=&quot;true&quot;&gt;&lt;/video&gt;
&lt;/p&gt;
&lt;h2 id=&quot;no-training&quot; tabindex=&quot;-1&quot;&gt;No Training &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/matlab-stylegan/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Unfortunately MATLAB doesn&#39;t yet support double differentiation so it&#39;s not possible to train a StyleGAN model in MATLAB.&lt;/p&gt;
&lt;h2 id=&quot;the-code&quot; tabindex=&quot;-1&quot;&gt;The Code &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/matlab-stylegan/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The StyleGAN implementation is released as an open-source &lt;a href=&quot;https://github.com/justinpinkney/stylegan-matlab-playground&quot;&gt;project on GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The code is currently in a state that it&#39;s probably not super comprehensible to someone else (there are a few &lt;a href=&quot;https://github.com/justinpinkney/stylegan-matlab-playground#examples&quot;&gt;example live scripts&lt;/a&gt; at least). If anyone is actually interested in using this, create an issue or reach out to me &lt;a href=&quot;https://twitter.com/buntworthy&quot;&gt;on Twitter&lt;/a&gt; and I&#39;ll endeavour to get round to making it a bit more useable.&lt;/p&gt;
&lt;p&gt;Of course shortly after I&#39;d done all of this Nvidia went ahead and released &lt;a href=&quot;https://github.com/NVlabs/stylegan2&quot;&gt;StyleGAN2&lt;/a&gt; which redefined the state of the art again!&lt;/p&gt;
&lt;h2 id=&quot;and-finally&quot; tabindex=&quot;-1&quot;&gt;And finally &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/matlab-stylegan/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;If you want a peek at what things I&#39;ve used this for I&#39;ve collected some of my experiments with generated faces is this Twitter thread:&lt;/p&gt;
&lt;p&gt;https://www.twitter.com/Buntworthy/status/1275175544087367682&lt;/p&gt;
</content>
	</entry>
	
	<entry>
		<title>Ukiyo-e Yourself with StyleGAN 2</title>
		<link href="https://justinpinkney.com/blog/2020/ukiyoe-yourself/"/>
		<updated>2020-06-10T00:00:00Z</updated>
		<id>https://justinpinkney.com/blog/2020/ukiyoe-yourself/</id>
		<content type="html">&lt;!-- --&gt;
&lt;p&gt;I&#39;ve spent some time training a StyleGAN2 model on ukiyo-e faces. Here are some results from training and some experimentation with model interpolation.&lt;/p&gt;
&lt;p&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/CerxBeHzmA-100.webp 100w, https://justinpinkney.com/img/CerxBeHzmA-200.webp 200w, https://justinpinkney.com/img/CerxBeHzmA-500.webp 500w, https://justinpinkney.com/img/CerxBeHzmA-1024.webp 1024w&quot; sizes=&quot;(max-width:800px) 50vw, 50vw&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/CerxBeHzmA-100.jpeg 100w, https://justinpinkney.com/img/CerxBeHzmA-200.jpeg 200w, https://justinpinkney.com/img/CerxBeHzmA-500.jpeg 500w, https://justinpinkney.com/img/CerxBeHzmA-1024.jpeg 1024w&quot; sizes=&quot;(max-width:800px) 50vw, 50vw&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/CerxBeHzmA-100.jpeg&quot; width=&quot;1024&quot; height=&quot;585&quot;&gt;&lt;/picture&gt;&lt;/p&gt;
&lt;h2 id=&quot;dataset&quot; tabindex=&quot;-1&quot;&gt;Dataset &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/ukiyoe-yourself/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;I&#39;ve spent some time collecting face images from museum image collections; surprisingly, AWS Rekognition does a reasonable job of detecting faces and landmarks. As most of the images provided by museums are not very high resolution, I&#39;ve also used ESRGAN to upscale to a final 1024x1024 resolution.&lt;/p&gt;
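&lt;p&gt;For reference, the Rekognition call is only a few lines with boto3. A sketch, assuming AWS credentials are already configured:&lt;/p&gt;
&lt;pre class=&quot;language-python&quot; tabindex=&quot;0&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;import boto3

def detect_faces(image_bytes):
    # Returns one (bounding box, landmarks) pair per detected face.
    # BoundingBox values are fractions of the image dimensions.
    client = boto3.client(&quot;rekognition&quot;)
    response = client.detect_faces(
        Image={&quot;Bytes&quot;: image_bytes},
        Attributes=[&quot;DEFAULT&quot;],  # box, pose, quality, and landmarks
    )
    return [(face[&quot;BoundingBox&quot;], face[&quot;Landmarks&quot;])
            for face in response[&quot;FaceDetails&quot;]]
&lt;/code&gt;&lt;/pre&gt;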
&lt;p&gt;Now I&#39;ve got a few thousand high-resolution and aligned ukiyo-e faces ready for StyleGAN training.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/ukiyoe-yourself/ukiyoe-grid.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/1Nm-0Ds6Oc-200.webp 200w, https://justinpinkney.com/img/1Nm-0Ds6Oc-320.webp 320w, https://justinpinkney.com/img/1Nm-0Ds6Oc-500.webp 500w, https://justinpinkney.com/img/1Nm-0Ds6Oc-800.webp 800w, https://justinpinkney.com/img/1Nm-0Ds6Oc-1024.webp 1024w, https://justinpinkney.com/img/1Nm-0Ds6Oc-1600.webp 1600w, https://justinpinkney.com/img/1Nm-0Ds6Oc-2394.webp 2394w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/1Nm-0Ds6Oc-200.jpeg 200w, https://justinpinkney.com/img/1Nm-0Ds6Oc-320.jpeg 320w, https://justinpinkney.com/img/1Nm-0Ds6Oc-500.jpeg 500w, https://justinpinkney.com/img/1Nm-0Ds6Oc-800.jpeg 800w, https://justinpinkney.com/img/1Nm-0Ds6Oc-1024.jpeg 1024w, https://justinpinkney.com/img/1Nm-0Ds6Oc-1600.jpeg 1600w, https://justinpinkney.com/img/1Nm-0Ds6Oc-2394.jpeg 2394w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/1Nm-0Ds6Oc-200.jpeg&quot; width=&quot;2394&quot; height=&quot;1348&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&quot;training-results&quot; tabindex=&quot;-1&quot;&gt;Training results &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/ukiyoe-yourself/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;I&#39;m fine-tuning from the original FFHQ model (config-e), which has lots of frontal portraits, but these are very uncommon in ukiyo-e pictures. The model quickly learns the basic style; then spends a while with a dilemma of how best to turn people&#39;s heads; then settles on one side.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/ukiyoe-yourself/training.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/1AdtD62pqM-200.webp 200w, https://justinpinkney.com/img/1AdtD62pqM-320.webp 320w, https://justinpinkney.com/img/1AdtD62pqM-500.webp 500w, https://justinpinkney.com/img/1AdtD62pqM-800.webp 800w, https://justinpinkney.com/img/1AdtD62pqM-1024.webp 1024w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/1AdtD62pqM-200.jpeg 200w, https://justinpinkney.com/img/1AdtD62pqM-320.jpeg 320w, https://justinpinkney.com/img/1AdtD62pqM-500.jpeg 500w, https://justinpinkney.com/img/1AdtD62pqM-800.jpeg 800w, https://justinpinkney.com/img/1AdtD62pqM-1024.jpeg 1024w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/1AdtD62pqM-200.jpeg&quot; width=&quot;1024&quot; height=&quot;256&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Eventually the model basically learns not to produce frontal faces, and interpolations generally give sharp transitions from left-looking to right-looking faces.&lt;/p&gt;
&lt;p align=&quot;center&quot;&gt;
&lt;video controls=&quot;&quot; src=&quot;https://justinpinkney.com/blog/2020/ukiyoe-yourself/ukiyoe-interpolation.mp4&quot;&gt;&lt;/video&gt;
&lt;/p&gt;
&lt;h2 id=&quot;ukiyo-e-yourself&quot; tabindex=&quot;-1&quot;&gt;Ukiyo-e yourself &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/ukiyoe-yourself/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The early stages of training are interesting in that you often see &amp;quot;ukiyo-e ish&amp;quot; looking faces that still retain traits of the original model. Take a look at this example of the same seed at different iterations of training: in the second panel he&#39;s clearly still wearing a suit!&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/ukiyoe-yourself/montage.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/r2lNykNcp0-200.webp 200w, https://justinpinkney.com/img/r2lNykNcp0-320.webp 320w, https://justinpinkney.com/img/r2lNykNcp0-500.webp 500w, https://justinpinkney.com/img/r2lNykNcp0-800.webp 800w, https://justinpinkney.com/img/r2lNykNcp0-1024.webp 1024w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/r2lNykNcp0-200.jpeg 200w, https://justinpinkney.com/img/r2lNykNcp0-320.jpeg 320w, https://justinpinkney.com/img/r2lNykNcp0-500.jpeg 500w, https://justinpinkney.com/img/r2lNykNcp0-800.jpeg 800w, https://justinpinkney.com/img/r2lNykNcp0-1024.jpeg 1024w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/r2lNykNcp0-200.jpeg&quot; width=&quot;1024&quot; height=&quot;256&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This early stage of training seems like it might be a promising way of creating a &amp;quot;real&amp;quot; to &amp;quot;ukiyo-e&amp;quot; face model, but at this point the model is already starting to twist the pose of the faces.&lt;/p&gt;
&lt;h3 id=&quot;model-interpolation&quot; tabindex=&quot;-1&quot;&gt;Model interpolation &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/ukiyoe-yourself/&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Playing with the well-known technique of averaging the weights of the base and the transferred model helps get a bit closer to the original, but the pose is still way off. Here are a few frames of interpolation from the weights of one model to the other.&lt;/p&gt;
&lt;p align=&quot;center&quot;&gt;
&lt;video controls=&quot;&quot; src=&quot;https://justinpinkney.com/blog/2020/ukiyoe-yourself/averaging.mp4&quot;&gt;&lt;/video&gt;
&lt;/p&gt;
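&lt;p&gt;The averaging itself is a one-liner per parameter. A minimal sketch in PyTorch, assuming the two checkpoints are compatible state dicts (this isn&#39;t the code used for these experiments, just an illustration of the idea):&lt;/p&gt;
&lt;pre class=&quot;language-python&quot; tabindex=&quot;0&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;import torch

def average_weights(sd_base, sd_tuned, alpha=0.5):
    # Linearly interpolate every parameter between two compatible
    # state dicts: alpha=0 gives the base model, alpha=1 the tuned one.
    return {name: torch.lerp(sd_base[name].float(), sd_tuned[name].float(), alpha)
            for name in sd_base}

# Usage (hypothetical model names):
# blended.load_state_dict(
#     average_weights(g_ffhq.state_dict(), g_ukiyoe.state_dict(), 0.5))
&lt;/code&gt;&lt;/pre&gt;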
&lt;h3 id=&quot;layer-swapping&quot; tabindex=&quot;-1&quot;&gt;Layer Swapping &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/ukiyoe-yourself/&quot;&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;So I&#39;ve also been playing with something you could call &amp;quot;layer swapping&amp;quot;: taking different-resolution layers from the two models and combining them. This helps avoid changing the pose (which is controlled by the early, low-resolution layers). Here&#39;s an animation as I progressively swap more layers from the original FFHQ model into the ukiyo-e model.&lt;/p&gt;
&lt;p align=&quot;center&quot;&gt;
&lt;video controls=&quot;&quot; src=&quot;https://justinpinkney.com/blog/2020/ukiyoe-yourself/swapping.mp4&quot;&gt;&lt;/video&gt;
&lt;/p&gt;
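&lt;p&gt;The swap itself is just a selective copy of parameters. A sketch, assuming a PyTorch StyleGAN2 port whose synthesis parameters are named by block resolution, e.g. &amp;quot;synthesis.b8.conv1.weight&amp;quot; (that naming is an assumption; adjust the pattern to whichever port you use):&lt;/p&gt;
&lt;pre class=&quot;language-python&quot; tabindex=&quot;0&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;import re
import torch

def swap_low_res_layers(sd_transfer, sd_base, max_res=32):
    # Copy every synthesis parameter at or below max_res from the base
    # (FFHQ) model into the transferred (ukiyo-e) model, leaving the
    # higher-resolution layers, which carry the style, untouched.
    swapped = {name: tensor.clone() for name, tensor in sd_transfer.items()}
    for name, tensor in sd_base.items():
        match = re.search(r&quot;\bb(\d+)\.&quot;, name)
        if match and int(match.group(1)) &lt;= max_res:
            swapped[name] = tensor.clone()
    return swapped
&lt;/code&gt;&lt;/pre&gt;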
&lt;p&gt;Swapping in only the lower resolution layers from FFHQ into Ukiyo-e serves to preserve the pose of the generated face while still transferring the features and style of a typical ukiyo-e portrait. Here&#39;s a detail of the point at which I think it looks best.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/ukiyoe-yourself/mr79.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/dB65WPoc2Y-200.webp 200w, https://justinpinkney.com/img/dB65WPoc2Y-320.webp 320w, https://justinpinkney.com/img/dB65WPoc2Y-500.webp 500w, https://justinpinkney.com/img/dB65WPoc2Y-800.webp 800w, https://justinpinkney.com/img/dB65WPoc2Y-1024.webp 1024w, https://justinpinkney.com/img/dB65WPoc2Y-1600.webp 1600w, https://justinpinkney.com/img/dB65WPoc2Y-3072.webp 3072w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/dB65WPoc2Y-200.jpeg 200w, https://justinpinkney.com/img/dB65WPoc2Y-320.jpeg 320w, https://justinpinkney.com/img/dB65WPoc2Y-500.jpeg 500w, https://justinpinkney.com/img/dB65WPoc2Y-800.jpeg 800w, https://justinpinkney.com/img/dB65WPoc2Y-1024.jpeg 1024w, https://justinpinkney.com/img/dB65WPoc2Y-1600.jpeg 1600w, https://justinpinkney.com/img/dB65WPoc2Y-3072.jpeg 3072w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/dB65WPoc2Y-200.jpeg&quot; width=&quot;3072&quot; height=&quot;1024&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;There are still some artifacts, but I think it&#39;s a fun technique that sometimes gives some nice results. Here are a bunch more cherry-picked examples:&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/ukiyoe-yourself/examples/seed0014.png.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/paVJsYk4nO-200.webp 200w, https://justinpinkney.com/img/paVJsYk4nO-320.webp 320w, https://justinpinkney.com/img/paVJsYk4nO-500.webp 500w, https://justinpinkney.com/img/paVJsYk4nO-512.webp 512w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/paVJsYk4nO-200.jpeg 200w, https://justinpinkney.com/img/paVJsYk4nO-320.jpeg 320w, https://justinpinkney.com/img/paVJsYk4nO-500.jpeg 500w, https://justinpinkney.com/img/paVJsYk4nO-512.jpeg 512w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/paVJsYk4nO-200.jpeg&quot; width=&quot;512&quot; height=&quot;256&quot;&gt;&lt;/picture&gt;&lt;/a&gt;
&lt;a href=&quot;https://justinpinkney.com/blog/2020/ukiyoe-yourself/examples/seed0005.png.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/TUEqJarnGy-200.webp 200w, https://justinpinkney.com/img/TUEqJarnGy-320.webp 320w, https://justinpinkney.com/img/TUEqJarnGy-500.webp 500w, https://justinpinkney.com/img/TUEqJarnGy-512.webp 512w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/TUEqJarnGy-200.jpeg 200w, https://justinpinkney.com/img/TUEqJarnGy-320.jpeg 320w, https://justinpinkney.com/img/TUEqJarnGy-500.jpeg 500w, https://justinpinkney.com/img/TUEqJarnGy-512.jpeg 512w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/TUEqJarnGy-200.jpeg&quot; width=&quot;512&quot; height=&quot;256&quot;&gt;&lt;/picture&gt;&lt;/a&gt;
&lt;a href=&quot;https://justinpinkney.com/blog/2020/ukiyoe-yourself/examples/seed0034.png.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/0HQVuJQwIb-200.webp 200w, https://justinpinkney.com/img/0HQVuJQwIb-320.webp 320w, https://justinpinkney.com/img/0HQVuJQwIb-500.webp 500w, https://justinpinkney.com/img/0HQVuJQwIb-512.webp 512w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/0HQVuJQwIb-200.jpeg 200w, https://justinpinkney.com/img/0HQVuJQwIb-320.jpeg 320w, https://justinpinkney.com/img/0HQVuJQwIb-500.jpeg 500w, https://justinpinkney.com/img/0HQVuJQwIb-512.jpeg 512w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/0HQVuJQwIb-200.jpeg&quot; width=&quot;512&quot; height=&quot;256&quot;&gt;&lt;/picture&gt;&lt;/a&gt;
&lt;a href=&quot;https://justinpinkney.com/blog/2020/ukiyoe-yourself/examples/seed0053.png.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/h6D-pI51p0-200.webp 200w, https://justinpinkney.com/img/h6D-pI51p0-320.webp 320w, https://justinpinkney.com/img/h6D-pI51p0-500.webp 500w, https://justinpinkney.com/img/h6D-pI51p0-512.webp 512w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/h6D-pI51p0-200.jpeg 200w, https://justinpinkney.com/img/h6D-pI51p0-320.jpeg 320w, https://justinpinkney.com/img/h6D-pI51p0-500.jpeg 500w, https://justinpinkney.com/img/h6D-pI51p0-512.jpeg 512w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/h6D-pI51p0-200.jpeg&quot; width=&quot;512&quot; height=&quot;256&quot;&gt;&lt;/picture&gt;&lt;/a&gt;
&lt;a href=&quot;https://justinpinkney.com/blog/2020/ukiyoe-yourself/examples/seed0056.png.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/eVyp81ptVp-200.webp 200w, https://justinpinkney.com/img/eVyp81ptVp-320.webp 320w, https://justinpinkney.com/img/eVyp81ptVp-500.webp 500w, https://justinpinkney.com/img/eVyp81ptVp-512.webp 512w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/eVyp81ptVp-200.jpeg 200w, https://justinpinkney.com/img/eVyp81ptVp-320.jpeg 320w, https://justinpinkney.com/img/eVyp81ptVp-500.jpeg 500w, https://justinpinkney.com/img/eVyp81ptVp-512.jpeg 512w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/eVyp81ptVp-200.jpeg&quot; width=&quot;512&quot; height=&quot;256&quot;&gt;&lt;/picture&gt;&lt;/a&gt;
&lt;a href=&quot;https://justinpinkney.com/blog/2020/ukiyoe-yourself/examples/seed0089.png.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/B0O_CFv-fb-200.webp 200w, https://justinpinkney.com/img/B0O_CFv-fb-320.webp 320w, https://justinpinkney.com/img/B0O_CFv-fb-500.webp 500w, https://justinpinkney.com/img/B0O_CFv-fb-512.webp 512w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/B0O_CFv-fb-200.jpeg 200w, https://justinpinkney.com/img/B0O_CFv-fb-320.jpeg 320w, https://justinpinkney.com/img/B0O_CFv-fb-500.jpeg 500w, https://justinpinkney.com/img/B0O_CFv-fb-512.jpeg 512w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/B0O_CFv-fb-200.jpeg&quot; width=&quot;512&quot; height=&quot;256&quot;&gt;&lt;/picture&gt;&lt;/a&gt;
&lt;a href=&quot;https://justinpinkney.com/blog/2020/ukiyoe-yourself/examples/seed0092.png.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/A1m0sfz7WS-200.webp 200w, https://justinpinkney.com/img/A1m0sfz7WS-320.webp 320w, https://justinpinkney.com/img/A1m0sfz7WS-500.webp 500w, https://justinpinkney.com/img/A1m0sfz7WS-512.webp 512w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/A1m0sfz7WS-200.jpeg 200w, https://justinpinkney.com/img/A1m0sfz7WS-320.jpeg 320w, https://justinpinkney.com/img/A1m0sfz7WS-500.jpeg 500w, https://justinpinkney.com/img/A1m0sfz7WS-512.jpeg 512w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/A1m0sfz7WS-200.jpeg&quot; width=&quot;512&quot; height=&quot;256&quot;&gt;&lt;/picture&gt;&lt;/a&gt;
&lt;a href=&quot;https://justinpinkney.com/blog/2020/ukiyoe-yourself/examples/seed0094.png.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/Xj0j3BPDmY-200.webp 200w, https://justinpinkney.com/img/Xj0j3BPDmY-320.webp 320w, https://justinpinkney.com/img/Xj0j3BPDmY-500.webp 500w, https://justinpinkney.com/img/Xj0j3BPDmY-512.webp 512w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/Xj0j3BPDmY-200.jpeg 200w, https://justinpinkney.com/img/Xj0j3BPDmY-320.jpeg 320w, https://justinpinkney.com/img/Xj0j3BPDmY-500.jpeg 500w, https://justinpinkney.com/img/Xj0j3BPDmY-512.jpeg 512w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/Xj0j3BPDmY-200.jpeg&quot; width=&quot;512&quot; height=&quot;256&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Since this original example I have put a bit more work into testing out these layer swapping ideas. You can get code to do it yourself, and see what other weird and wonderful things myself and others have come up with &lt;a href=&quot;https://justinpinkney.com/blog/2020/stylegan-network-blending&quot;&gt;in this post&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;other-thoughts&quot; tabindex=&quot;-1&quot;&gt;Other thoughts &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/ukiyoe-yourself/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;I tried using CycleGAN to iron out the defects in the outputs of the combined model, training it to convert from the modified model&#39;s outputs to the original ukiyo-e faces, but the brief attempts I&#39;ve made haven&#39;t produced any good results:&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/ukiyoe-yourself/cyclegan.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/4eNosUot3--200.webp 200w, https://justinpinkney.com/img/4eNosUot3--320.webp 320w, https://justinpinkney.com/img/4eNosUot3--500.webp 500w, https://justinpinkney.com/img/4eNosUot3--512.webp 512w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/4eNosUot3--200.jpeg 200w, https://justinpinkney.com/img/4eNosUot3--320.jpeg 320w, https://justinpinkney.com/img/4eNosUot3--500.jpeg 500w, https://justinpinkney.com/img/4eNosUot3--512.jpeg 512w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/4eNosUot3--200.jpeg&quot; width=&quot;512&quot; height=&quot;512&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;While I&#39;m at it, a CycleGAN trained to turn Ukiyo-e faces into real ones does give some pretty amusing results:&lt;/p&gt;
&lt;p align=&quot;center&quot;&gt;
&lt;video controls=&quot;&quot; src=&quot;https://justinpinkney.com/blog/2020/ukiyoe-yourself/ukiyoe_face.mp4&quot;&gt;&lt;/video&gt;
&lt;/p&gt;
&lt;p&gt;Currently my &amp;quot;layer swapping&amp;quot; is just a straight copy of the weights of certain layers from one model to the other, which leads to a harsh transition in the model. Maybe a smoother transition would give nicer results, for example gradually interpolating from one model to the other as you move through the layers.&lt;/p&gt;
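&lt;p&gt;That smoother version might look something like this (same naming assumption as the earlier swap sketch): the interpolation weight follows a sigmoid in log-resolution, so the lowest-resolution layers come almost entirely from one model and the highest-resolution layers from the other.&lt;/p&gt;
&lt;pre class=&quot;language-python&quot; tabindex=&quot;0&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;import math
import re
import torch

def blend_layers(sd_base, sd_transfer, mid_res=32, width=1.0):
    # Per-layer interpolation: alpha rises smoothly from 0 (keep the
    # base model) to 1 (use the transferred model) as the block
    # resolution increases past mid_res.
    blended = {}
    for name in sd_base:
        match = re.search(r&quot;\bb(\d+)\.&quot;, name)
        if match:
            x = (math.log2(int(match.group(1))) - math.log2(mid_res)) / width
            alpha = 1.0 / (1.0 + math.exp(-x))
        else:
            alpha = 1.0  # non-synthesis parameters stay with the transferred model
        blended[name] = torch.lerp(sd_base[name].float(),
                                   sd_transfer[name].float(), alpha)
    return blended
&lt;/code&gt;&lt;/pre&gt;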
&lt;p&gt;Another completely different idea would be to simply not train the lower-resolution layers to begin with. This might prevent the model from changing the low-resolution appearance (like pose) while still getting the appearance of ukiyo-e right.&lt;/p&gt;
&lt;h2 id=&quot;pix2pixhd-model-distillation&quot; tabindex=&quot;-1&quot;&gt;pix2pixHD model distillation &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/ukiyoe-yourself/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;So one thing to remember is that the people above are &lt;em&gt;not&lt;/em&gt; real people; they&#39;re generated by the FFHQ-trained model. To apply this to a real person&#39;s face we&#39;d first have to find their latent representation using an optimisation process (like the projector included in StyleGAN2). To skip this laborious step I tried to train a pix2pixHD model (as in the paper &lt;a href=&quot;https://github.com/EvgenyKashin/stylegan2-distillation&quot;&gt;&lt;em&gt;StyleGAN2 Distillation for Feed-forward Image Manipulation&lt;/em&gt;&lt;/a&gt;) to &amp;quot;distill&amp;quot; the transformation. This way I can give it an arbitrary image of a person and get back the ukiyo-e version. The end results aren&#39;t too bad.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/ukiyoe-yourself/arnold_01_synthesized_image.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/e8fI7tyMjf-200.webp 200w, https://justinpinkney.com/img/e8fI7tyMjf-320.webp 320w, https://justinpinkney.com/img/e8fI7tyMjf-500.webp 500w, https://justinpinkney.com/img/e8fI7tyMjf-512.webp 512w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/e8fI7tyMjf-200.jpeg 200w, https://justinpinkney.com/img/e8fI7tyMjf-320.jpeg 320w, https://justinpinkney.com/img/e8fI7tyMjf-500.jpeg 500w, https://justinpinkney.com/img/e8fI7tyMjf-512.jpeg 512w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/e8fI7tyMjf-200.jpeg&quot; width=&quot;512&quot; height=&quot;512&quot;&gt;&lt;/picture&gt;&lt;/a&gt;
&lt;a href=&quot;https://justinpinkney.com/blog/2020/ukiyoe-yourself/beat_01_synthesized_image.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/9UBwMOKuh--200.webp 200w, https://justinpinkney.com/img/9UBwMOKuh--320.webp 320w, https://justinpinkney.com/img/9UBwMOKuh--500.webp 500w, https://justinpinkney.com/img/9UBwMOKuh--512.webp 512w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/9UBwMOKuh--200.jpeg 200w, https://justinpinkney.com/img/9UBwMOKuh--320.jpeg 320w, https://justinpinkney.com/img/9UBwMOKuh--500.jpeg 500w, https://justinpinkney.com/img/9UBwMOKuh--512.jpeg 512w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/9UBwMOKuh--200.jpeg&quot; width=&quot;512&quot; height=&quot;512&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
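&lt;p&gt;Building the paired training set for this kind of distillation is straightforward: render the same latent through the plain FFHQ generator (the input) and through the blended generator (the target). A sketch, assuming each generator is a callable mapping a latent batch to an image batch:&lt;/p&gt;
&lt;pre class=&quot;language-python&quot; tabindex=&quot;0&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;import torch

@torch.no_grad()
def make_training_pairs(g_input, g_target, n_pairs, z_dim=512, device=&quot;cuda&quot;):
    # Identical latents go through both generators, so pix2pixHD can
    # learn the mapping from FFHQ-style faces to ukiyo-e faces directly.
    pairs = []
    for _ in range(n_pairs):
        z = torch.randn(1, z_dim, device=device)
        pairs.append((g_input(z).cpu(), g_target(z).cpu()))
    return pairs
&lt;/code&gt;&lt;/pre&gt;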
&lt;p&gt;But there are still some interesting failure cases when the input image isn&#39;t aligned in the way the training images are.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/ukiyoe-yourself/beat_synthesized_image.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/vIiWftPNLy-200.webp 200w, https://justinpinkney.com/img/vIiWftPNLy-320.webp 320w, https://justinpinkney.com/img/vIiWftPNLy-500.webp 500w, https://justinpinkney.com/img/vIiWftPNLy-800.webp 800w, https://justinpinkney.com/img/vIiWftPNLy-1024.webp 1024w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/vIiWftPNLy-200.jpeg 200w, https://justinpinkney.com/img/vIiWftPNLy-320.jpeg 320w, https://justinpinkney.com/img/vIiWftPNLy-500.jpeg 500w, https://justinpinkney.com/img/vIiWftPNLy-800.jpeg 800w, https://justinpinkney.com/img/vIiWftPNLy-1024.jpeg 1024w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/vIiWftPNLy-200.jpeg&quot; width=&quot;1024&quot; height=&quot;1536&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
</content>
	</entry>
	
	<entry>
		<title>MATLAB pix2pix 🏢</title>
		<link href="https://justinpinkney.com/blog/2020/matlab-pix2pix/"/>
		<updated>2020-06-09T00:00:00Z</updated>
		<id>https://justinpinkney.com/blog/2020/matlab-pix2pix/</id>
		<content type="html">&lt;p&gt;Of all the GAN architectures pix2pix is a personal favourite. It popularised the use of GANs for image to image translation, it&#39;s nice and simple, trains relatively quickly, and invariably produces some surprisingly pleasant results. If you don&#39;t know what pix2pix is, see the &lt;a href=&quot;https://phillipi.github.io/pix2pix/&quot;&gt;original project page&lt;/a&gt; which has some nice examples and a demo. Or just &lt;a href=&quot;https://twitter.com/search?q=%23pix2pix&quot;&gt;search Twitter for #pix2pix&lt;/a&gt; for some fun examples.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://justinpinkney.com/blog/2020/matlab-pix2pix/result.jpg&quot; alt=&quot;Input to the model on the left and output on the right&quot;&gt;&lt;/p&gt;
&lt;p&gt;Since R2019b, MATLAB has had support for automatic differentiation, making it possible to train GANs. So one of the first things I tried was writing a MATLAB version of pix2pix (unlike in Python, it&#39;s easy to be the first to write an implementation of a deep learning paper in MATLAB).&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://raw.githubusercontent.com/matlab-deep-learning/pix2pix/master/docs/training.gif&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;The classic example for pix2pix is the facades dataset, and running this gives similar results to the original paper (although training time is quite a lot slower).&lt;/p&gt;
&lt;p&gt;I&#39;ve released this as an open-source &lt;a href=&quot;https://github.com/matlab-deep-learning/pix2pix&quot;&gt;project on GitHub&lt;/a&gt;; if you run into any problems please file an &lt;a href=&quot;https://github.com/matlab-deep-learning/pix2pix/issues&quot;&gt;issue.&lt;/a&gt;&lt;/p&gt;
</content>
	</entry>
	
	<entry>
		<title>Colour Sorter 🌈</title>
		<link href="https://justinpinkney.com/blog/2020/colour-sorter/"/>
		<updated>2020-06-08T00:00:00Z</updated>
		<id>https://justinpinkney.com/blog/2020/colour-sorter/</id>
		<content type="html">&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/colour-sorter/ColourSorterLogo.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/ndfHnZR2yR-200.webp 200w, https://justinpinkney.com/img/ndfHnZR2yR-320.webp 320w, https://justinpinkney.com/img/ndfHnZR2yR-500.webp 500w, https://justinpinkney.com/img/ndfHnZR2yR-800.webp 800w, https://justinpinkney.com/img/ndfHnZR2yR-1024.webp 1024w, https://justinpinkney.com/img/ndfHnZR2yR-1600.webp 1600w, https://justinpinkney.com/img/ndfHnZR2yR-2814.webp 2814w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/ndfHnZR2yR-200.jpeg 200w, https://justinpinkney.com/img/ndfHnZR2yR-320.jpeg 320w, https://justinpinkney.com/img/ndfHnZR2yR-500.jpeg 500w, https://justinpinkney.com/img/ndfHnZR2yR-800.jpeg 800w, https://justinpinkney.com/img/ndfHnZR2yR-1024.jpeg 1024w, https://justinpinkney.com/img/ndfHnZR2yR-1600.jpeg 1600w, https://justinpinkney.com/img/ndfHnZR2yR-2814.jpeg 2814w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/ndfHnZR2yR-200.jpeg&quot; width=&quot;2814&quot; height=&quot;795&quot;&gt;&lt;/picture&gt;&lt;/a&gt;
&lt;strong&gt;Takes a picture and rearranges the pixels to make another.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;A little generative-art Java application I made a long time ago. It implements an algorithm that takes all the pixels from one image and uses them to make a new image, based on a (sort of) sorting algorithm.&lt;/p&gt;
&lt;p&gt;https://youtu.be/FCp3UCBGCmc&lt;/p&gt;
&lt;p&gt;This was all inspired by &lt;a href=&quot;https://web.archive.org/web/20180812152126/http://joco.name/2014/03/02/all-rgb-colors-in-one-image/&quot;&gt;József Fejes&#39;s all-RGB sorting&lt;/a&gt; &lt;a href=&quot;http://rainbowsmoke.hu/home&quot;&gt;work&lt;/a&gt;. I implemented his basic concept for pixel colour sorting, but using an image as the source of pixels, and added lots of options for sort order, distance metrics, and starting patterns. The combination of options, as well as the sheer variety of possible starting material (i.e. any image or photograph), produces a huge variety of results, which is very fun to curate.&lt;/p&gt;
&lt;p&gt;You can find the &lt;a href=&quot;https://github.com/justinpinkney/ColourSorter&quot;&gt;source code on GitHub.&lt;/a&gt;&lt;/p&gt;
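&lt;p&gt;The real application grows the image pixel by pixel, but the core matching idea can be caricatured in a few lines: sort the source pixels and the target positions by the same key and pair them up. A rough Python sketch (not the actual Java implementation):&lt;/p&gt;
&lt;pre class=&quot;language-python&quot; tabindex=&quot;0&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;import numpy as np
from PIL import Image

def rearrange_pixels(source_path, target_path, output_path):
    # Place the source image&#39;s pixels, ordered by brightness, at the
    # target image&#39;s pixel positions ordered the same way.
    target = Image.open(target_path).convert(&quot;RGB&quot;)
    w, h = target.size
    source = Image.open(source_path).convert(&quot;RGB&quot;).resize((w, h))
    src = np.asarray(source).reshape(-1, 3)
    tgt = np.asarray(target).reshape(-1, 3)
    result = np.empty_like(tgt)
    result[np.argsort(tgt.sum(axis=1))] = src[np.argsort(src.sum(axis=1))]
    Image.fromarray(result.reshape(h, w, 3)).save(output_path)
&lt;/code&gt;&lt;/pre&gt;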
&lt;h2 id=&quot;examples&quot; tabindex=&quot;-1&quot;&gt;Examples &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/colour-sorter/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/colour-sorter/0.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/4BGy-mkAu3-200.webp 200w, https://justinpinkney.com/img/4BGy-mkAu3-320.webp 320w, https://justinpinkney.com/img/4BGy-mkAu3-500.webp 500w, https://justinpinkney.com/img/4BGy-mkAu3-800.webp 800w, https://justinpinkney.com/img/4BGy-mkAu3-1024.webp 1024w, https://justinpinkney.com/img/4BGy-mkAu3-1600.webp 1600w, https://justinpinkney.com/img/4BGy-mkAu3-1920.webp 1920w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/4BGy-mkAu3-200.jpeg 200w, https://justinpinkney.com/img/4BGy-mkAu3-320.jpeg 320w, https://justinpinkney.com/img/4BGy-mkAu3-500.jpeg 500w, https://justinpinkney.com/img/4BGy-mkAu3-800.jpeg 800w, https://justinpinkney.com/img/4BGy-mkAu3-1024.jpeg 1024w, https://justinpinkney.com/img/4BGy-mkAu3-1600.jpeg 1600w, https://justinpinkney.com/img/4BGy-mkAu3-1920.jpeg 1920w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/4BGy-mkAu3-200.jpeg&quot; width=&quot;1920&quot; height=&quot;1080&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/colour-sorter/1.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/V7SVVV0-bQ-200.webp 200w, https://justinpinkney.com/img/V7SVVV0-bQ-320.webp 320w, https://justinpinkney.com/img/V7SVVV0-bQ-500.webp 500w, https://justinpinkney.com/img/V7SVVV0-bQ-800.webp 800w, https://justinpinkney.com/img/V7SVVV0-bQ-1024.webp 1024w, https://justinpinkney.com/img/V7SVVV0-bQ-1600.webp 1600w, https://justinpinkney.com/img/V7SVVV0-bQ-1920.webp 1920w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/V7SVVV0-bQ-200.jpeg 200w, https://justinpinkney.com/img/V7SVVV0-bQ-320.jpeg 320w, https://justinpinkney.com/img/V7SVVV0-bQ-500.jpeg 500w, https://justinpinkney.com/img/V7SVVV0-bQ-800.jpeg 800w, https://justinpinkney.com/img/V7SVVV0-bQ-1024.jpeg 1024w, https://justinpinkney.com/img/V7SVVV0-bQ-1600.jpeg 1600w, https://justinpinkney.com/img/V7SVVV0-bQ-1920.jpeg 1920w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/V7SVVV0-bQ-200.jpeg&quot; width=&quot;1920&quot; height=&quot;1080&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/colour-sorter/2.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/HLE-tjIXPY-200.webp 200w, https://justinpinkney.com/img/HLE-tjIXPY-320.webp 320w, https://justinpinkney.com/img/HLE-tjIXPY-500.webp 500w, https://justinpinkney.com/img/HLE-tjIXPY-800.webp 800w, https://justinpinkney.com/img/HLE-tjIXPY-1024.webp 1024w, https://justinpinkney.com/img/HLE-tjIXPY-1600.webp 1600w, https://justinpinkney.com/img/HLE-tjIXPY-1920.webp 1920w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/HLE-tjIXPY-200.jpeg 200w, https://justinpinkney.com/img/HLE-tjIXPY-320.jpeg 320w, https://justinpinkney.com/img/HLE-tjIXPY-500.jpeg 500w, https://justinpinkney.com/img/HLE-tjIXPY-800.jpeg 800w, https://justinpinkney.com/img/HLE-tjIXPY-1024.jpeg 1024w, https://justinpinkney.com/img/HLE-tjIXPY-1600.jpeg 1600w, https://justinpinkney.com/img/HLE-tjIXPY-1920.jpeg 1920w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/HLE-tjIXPY-200.jpeg&quot; width=&quot;1920&quot; height=&quot;1080&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/colour-sorter/3.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/wmW9CLd_jm-200.webp 200w, https://justinpinkney.com/img/wmW9CLd_jm-320.webp 320w, https://justinpinkney.com/img/wmW9CLd_jm-500.webp 500w, https://justinpinkney.com/img/wmW9CLd_jm-800.webp 800w, https://justinpinkney.com/img/wmW9CLd_jm-1024.webp 1024w, https://justinpinkney.com/img/wmW9CLd_jm-1600.webp 1600w, https://justinpinkney.com/img/wmW9CLd_jm-1920.webp 1920w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/wmW9CLd_jm-200.jpeg 200w, https://justinpinkney.com/img/wmW9CLd_jm-320.jpeg 320w, https://justinpinkney.com/img/wmW9CLd_jm-500.jpeg 500w, https://justinpinkney.com/img/wmW9CLd_jm-800.jpeg 800w, https://justinpinkney.com/img/wmW9CLd_jm-1024.jpeg 1024w, https://justinpinkney.com/img/wmW9CLd_jm-1600.jpeg 1600w, https://justinpinkney.com/img/wmW9CLd_jm-1920.jpeg 1920w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/wmW9CLd_jm-200.jpeg&quot; width=&quot;1920&quot; height=&quot;1080&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/colour-sorter/4.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/uVE-XLCNhw-200.webp 200w, https://justinpinkney.com/img/uVE-XLCNhw-320.webp 320w, https://justinpinkney.com/img/uVE-XLCNhw-500.webp 500w, https://justinpinkney.com/img/uVE-XLCNhw-800.webp 800w, https://justinpinkney.com/img/uVE-XLCNhw-1024.webp 1024w, https://justinpinkney.com/img/uVE-XLCNhw-1600.webp 1600w, https://justinpinkney.com/img/uVE-XLCNhw-1920.webp 1920w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/uVE-XLCNhw-200.jpeg 200w, https://justinpinkney.com/img/uVE-XLCNhw-320.jpeg 320w, https://justinpinkney.com/img/uVE-XLCNhw-500.jpeg 500w, https://justinpinkney.com/img/uVE-XLCNhw-800.jpeg 800w, https://justinpinkney.com/img/uVE-XLCNhw-1024.jpeg 1024w, https://justinpinkney.com/img/uVE-XLCNhw-1600.jpeg 1600w, https://justinpinkney.com/img/uVE-XLCNhw-1920.jpeg 1920w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/uVE-XLCNhw-200.jpeg&quot; width=&quot;1920&quot; height=&quot;1080&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/colour-sorter/5.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/KgbW4cZ5iX-200.webp 200w, https://justinpinkney.com/img/KgbW4cZ5iX-320.webp 320w, https://justinpinkney.com/img/KgbW4cZ5iX-500.webp 500w, https://justinpinkney.com/img/KgbW4cZ5iX-800.webp 800w, https://justinpinkney.com/img/KgbW4cZ5iX-1024.webp 1024w, https://justinpinkney.com/img/KgbW4cZ5iX-1600.webp 1600w, https://justinpinkney.com/img/KgbW4cZ5iX-1920.webp 1920w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/KgbW4cZ5iX-200.jpeg 200w, https://justinpinkney.com/img/KgbW4cZ5iX-320.jpeg 320w, https://justinpinkney.com/img/KgbW4cZ5iX-500.jpeg 500w, https://justinpinkney.com/img/KgbW4cZ5iX-800.jpeg 800w, https://justinpinkney.com/img/KgbW4cZ5iX-1024.jpeg 1024w, https://justinpinkney.com/img/KgbW4cZ5iX-1600.jpeg 1600w, https://justinpinkney.com/img/KgbW4cZ5iX-1920.jpeg 1920w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/KgbW4cZ5iX-200.jpeg&quot; width=&quot;1920&quot; height=&quot;1080&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/colour-sorter/6.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/_AKm-s8fQ_-200.webp 200w, https://justinpinkney.com/img/_AKm-s8fQ_-320.webp 320w, https://justinpinkney.com/img/_AKm-s8fQ_-500.webp 500w, https://justinpinkney.com/img/_AKm-s8fQ_-800.webp 800w, https://justinpinkney.com/img/_AKm-s8fQ_-1024.webp 1024w, https://justinpinkney.com/img/_AKm-s8fQ_-1600.webp 1600w, https://justinpinkney.com/img/_AKm-s8fQ_-1920.webp 1920w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/_AKm-s8fQ_-200.jpeg 200w, https://justinpinkney.com/img/_AKm-s8fQ_-320.jpeg 320w, https://justinpinkney.com/img/_AKm-s8fQ_-500.jpeg 500w, https://justinpinkney.com/img/_AKm-s8fQ_-800.jpeg 800w, https://justinpinkney.com/img/_AKm-s8fQ_-1024.jpeg 1024w, https://justinpinkney.com/img/_AKm-s8fQ_-1600.jpeg 1600w, https://justinpinkney.com/img/_AKm-s8fQ_-1920.jpeg 1920w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/_AKm-s8fQ_-200.jpeg&quot; width=&quot;1920&quot; height=&quot;1080&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/colour-sorter/8.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/zNV3pZH_ff-200.webp 200w, https://justinpinkney.com/img/zNV3pZH_ff-320.webp 320w, https://justinpinkney.com/img/zNV3pZH_ff-500.webp 500w, https://justinpinkney.com/img/zNV3pZH_ff-800.webp 800w, https://justinpinkney.com/img/zNV3pZH_ff-1024.webp 1024w, https://justinpinkney.com/img/zNV3pZH_ff-1600.webp 1600w, https://justinpinkney.com/img/zNV3pZH_ff-1920.webp 1920w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/zNV3pZH_ff-200.jpeg 200w, https://justinpinkney.com/img/zNV3pZH_ff-320.jpeg 320w, https://justinpinkney.com/img/zNV3pZH_ff-500.jpeg 500w, https://justinpinkney.com/img/zNV3pZH_ff-800.jpeg 800w, https://justinpinkney.com/img/zNV3pZH_ff-1024.jpeg 1024w, https://justinpinkney.com/img/zNV3pZH_ff-1600.jpeg 1600w, https://justinpinkney.com/img/zNV3pZH_ff-1920.jpeg 1920w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/zNV3pZH_ff-200.jpeg&quot; width=&quot;1920&quot; height=&quot;1080&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/colour-sorter/9.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/Y1D0Tisrs6-200.webp 200w, https://justinpinkney.com/img/Y1D0Tisrs6-320.webp 320w, https://justinpinkney.com/img/Y1D0Tisrs6-500.webp 500w, https://justinpinkney.com/img/Y1D0Tisrs6-800.webp 800w, https://justinpinkney.com/img/Y1D0Tisrs6-1024.webp 1024w, https://justinpinkney.com/img/Y1D0Tisrs6-1600.webp 1600w, https://justinpinkney.com/img/Y1D0Tisrs6-1920.webp 1920w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/Y1D0Tisrs6-200.jpeg 200w, https://justinpinkney.com/img/Y1D0Tisrs6-320.jpeg 320w, https://justinpinkney.com/img/Y1D0Tisrs6-500.jpeg 500w, https://justinpinkney.com/img/Y1D0Tisrs6-800.jpeg 800w, https://justinpinkney.com/img/Y1D0Tisrs6-1024.jpeg 1024w, https://justinpinkney.com/img/Y1D0Tisrs6-1600.jpeg 1600w, https://justinpinkney.com/img/Y1D0Tisrs6-1920.jpeg 1920w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/Y1D0Tisrs6-200.jpeg&quot; width=&quot;1920&quot; height=&quot;1080&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/colour-sorter/10.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/ObVaFdnIs5-200.webp 200w, https://justinpinkney.com/img/ObVaFdnIs5-320.webp 320w, https://justinpinkney.com/img/ObVaFdnIs5-500.webp 500w, https://justinpinkney.com/img/ObVaFdnIs5-800.webp 800w, https://justinpinkney.com/img/ObVaFdnIs5-1024.webp 1024w, https://justinpinkney.com/img/ObVaFdnIs5-1600.webp 1600w, https://justinpinkney.com/img/ObVaFdnIs5-1920.webp 1920w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/ObVaFdnIs5-200.jpeg 200w, https://justinpinkney.com/img/ObVaFdnIs5-320.jpeg 320w, https://justinpinkney.com/img/ObVaFdnIs5-500.jpeg 500w, https://justinpinkney.com/img/ObVaFdnIs5-800.jpeg 800w, https://justinpinkney.com/img/ObVaFdnIs5-1024.jpeg 1024w, https://justinpinkney.com/img/ObVaFdnIs5-1600.jpeg 1600w, https://justinpinkney.com/img/ObVaFdnIs5-1920.jpeg 1920w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/ObVaFdnIs5-200.jpeg&quot; width=&quot;1920&quot; height=&quot;1080&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/colour-sorter/11.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/ChTgqUYxvk-200.webp 200w, https://justinpinkney.com/img/ChTgqUYxvk-320.webp 320w, https://justinpinkney.com/img/ChTgqUYxvk-500.webp 500w, https://justinpinkney.com/img/ChTgqUYxvk-800.webp 800w, https://justinpinkney.com/img/ChTgqUYxvk-1024.webp 1024w, https://justinpinkney.com/img/ChTgqUYxvk-1600.webp 1600w, https://justinpinkney.com/img/ChTgqUYxvk-1920.webp 1920w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/ChTgqUYxvk-200.jpeg 200w, https://justinpinkney.com/img/ChTgqUYxvk-320.jpeg 320w, https://justinpinkney.com/img/ChTgqUYxvk-500.jpeg 500w, https://justinpinkney.com/img/ChTgqUYxvk-800.jpeg 800w, https://justinpinkney.com/img/ChTgqUYxvk-1024.jpeg 1024w, https://justinpinkney.com/img/ChTgqUYxvk-1600.jpeg 1600w, https://justinpinkney.com/img/ChTgqUYxvk-1920.jpeg 1920w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/ChTgqUYxvk-200.jpeg&quot; width=&quot;1920&quot; height=&quot;1080&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
</content>
	</entry>
	
	<entry>
		<title>My Day Job 🧑‍💻</title>
		<link href="https://justinpinkney.com/blog/day-job/"/>
		<updated>2020-06-07T00:00:00Z</updated>
		<id>https://justinpinkney.com/blog/day-job/</id>
		<content type="html">&lt;h2 id=&quot;midjourney&quot; tabindex=&quot;-1&quot;&gt;Midjourney &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/day-job/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;I work as a machine learning researcher at Midjourney on state-of-the-art text-to-image models.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&quot;before&quot; tabindex=&quot;-1&quot;&gt;Before &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/day-job/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Lambda Labs&lt;/strong&gt; - I worked as a Senior Machine Learning Researcher at Lambda Labs mostly with a focus on computer vision and generative models.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;MathWorks&lt;/strong&gt; - I used to work as a senior consultant at MathWorks helping people to: develop computer vision and deep learning algorithms, write better MATLAB code and embrace good software development practices, and develop and deploy more complex applications than your average MATLAB user usually thinks about.&lt;/p&gt;
&lt;p&gt;As part of that job I was able to release some open source projects:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://justinpinkney.com/blog/matlab-pix2pix&quot;&gt;pix2pix - Image to image translation with GANs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://justinpinkney.com/blog/matlab-mtcnn&quot;&gt;MTCNN - Deep learning face detection&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Sagentia&lt;/strong&gt; - I used to work at a product development consultancy called Sagentia.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;PhD&lt;/strong&gt; - Before I had a real job I did a PhD in &lt;a href=&quot;https://justinpinkney.com/blog/2020/biophysics/&quot;&gt;Biophysics&lt;/a&gt; at the University of Oxford.&lt;/p&gt;
</content>
	</entry>
	
	<entry>
		<title>MATLAB Face Detection with MTCNN 🔎😄</title>
		<link href="https://justinpinkney.com/blog/2020/matlab-mtcnn/"/>
		<updated>2020-06-06T00:00:00Z</updated>
		<id>https://justinpinkney.com/blog/2020/matlab-mtcnn/</id>
		<content type="html">&lt;p&gt;&lt;strong&gt;Get a fast and accurate face and facial feature detector for MATLAB &lt;a href=&quot;https://github.com/matlab-deep-learning/mtcnn-face-detection/releases&quot;&gt;here&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p align=&quot;center&quot;&gt;
&lt;video controls=&quot;&quot; src=&quot;https://justinpinkney.com/blog/2020/matlab-mtcnn/pretty-good.mp4&quot;&gt;&lt;/video&gt;
&lt;/p&gt;
&lt;h2 id=&quot;intro&quot; tabindex=&quot;-1&quot;&gt;Intro &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/matlab-mtcnn/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Everyone pretty much takes good-quality face detection for granted these days, and it&#39;s essentially a solved problem in computer vision. Everyone&#39;s photo app can detect all the faces you care about, and computer vision competition results are increasingly focussing on performance for difficult, small, or occluded faces.&lt;/p&gt;
&lt;p&gt;Although MATLAB has a face detector built into the Computer Vision Toolbox, I&#39;m sure most who have had some reason to use it have come away disappointed. Honestly, it&#39;s pretty outdated, as deep learning has totally revolutionised computer vision since that face detector was released.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://justinpinkney.com/blog/2020/matlab-mtcnn/compare.jpg&quot; alt=&quot;MATLAB&#39;s face detection in yellow, MTCNN in teal.&quot;&gt;&lt;/p&gt;
&lt;h2 id=&quot;enter-mtcnn&quot; tabindex=&quot;-1&quot;&gt;Enter MTCNN &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/matlab-mtcnn/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;There are now tons of deep-learning-based face detectors, each of them eking out more and more performance on standard benchmarks. But sometimes you just want something that is pretty good and well tested in real life.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://justinpinkney.com/blog/2020/matlab-mtcnn/trek-faces.jpg&quot; alt=&quot;Image source: NASA&quot;&gt;&lt;/p&gt;
&lt;p&gt;The Multi-task Cascaded Convolutional Neural Network (MTCNN) is a little old but has a fairly simple architecture, is small and fast, and performs well. It also has the additional advantage of outputting the locations of facial features (eyes, nose, and mouth). It&#39;s been widely used as a pre-processing step in lots of other applications, and it works well and reliably.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://justinpinkney.com/blog/2020/matlab-mtcnn/sota.png&quot; alt=&quot;Multitask Cascade CNN (MTCNN) was state of the art in 2016 and is still pretty good for most faces.&quot;&gt;&lt;/p&gt;
&lt;h2 id=&quot;mtcnn-in-matlab&quot; tabindex=&quot;-1&quot;&gt;MTCNN in MATLAB &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/matlab-mtcnn/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;I&#39;ve ported the &lt;a href=&quot;https://kpzhang93.github.io/MTCNN_face_detection_alignment/&quot;&gt;original MTCNN&lt;/a&gt; pre-trained weights into MATLAB, using some of the deep learning features introduced in R2019b. (I&#39;ve also done some work to make sure that it still runs in R2019a, although it&#39;s a little slower.)&lt;/p&gt;
&lt;p&gt;I&#39;ve released this as an open source project; the code and toolbox (for simple installation) are &lt;a href=&quot;https://github.com/matlab-deep-learning/mtcnn-face-detection&quot;&gt;available on GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;p align=&quot;center&quot;&gt;
&lt;video controls=&quot;&quot; src=&quot;https://justinpinkney.com/blog/2020/matlab-mtcnn/crowd.mp4&quot;&gt;&lt;/video&gt;
&lt;/p&gt;
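&lt;p&gt;If you&#39;re working in Python rather than MATLAB, the same detector is available in several packages. For example, with facenet-pytorch (just one option, unrelated to this MATLAB port) detection looks roughly like this:&lt;/p&gt;
&lt;pre class=&quot;language-python&quot; tabindex=&quot;0&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;from PIL import Image
from facenet_pytorch import MTCNN

# keep_all=True returns every detected face rather than just the largest.
detector = MTCNN(keep_all=True)
image = Image.open(&quot;crowd.jpg&quot;)  # any image file
boxes, probs, landmarks = detector.detect(image, landmarks=True)
# boxes: (n, 4) corner coordinates; landmarks: (n, 5, 2) eyes, nose, mouth.
&lt;/code&gt;&lt;/pre&gt;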
&lt;h2 id=&quot;training&quot; tabindex=&quot;-1&quot;&gt;Training? &lt;a class=&quot;header-anchor&quot; href=&quot;https://justinpinkney.com/blog/2020/matlab-mtcnn/&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;One thing to note is that the current code doesn&#39;t support training a new model, although this would be perfectly possible to add. People don&#39;t seem particularly interested in training new models, just in using the standard pre-trained weights. If I&#39;m wrong about this and you&#39;re desperate to train your own models, please &lt;a href=&quot;https://github.com/matlab-deep-learning/mtcnn-face-detection/issues/1&quot;&gt;comment on this issue&lt;/a&gt; to let me know.&lt;/p&gt;
</content>
	</entry>
	
	<entry>
		<title>Biophysics 🔬🧬</title>
		<link href="https://justinpinkney.com/blog/2020/biophysics/"/>
		<updated>2020-06-04T00:00:00Z</updated>
		<id>https://justinpinkney.com/blog/2020/biophysics/</id>
		<content type="html">&lt;p&gt;My biophysics phd was focused on single molecule fluorescence methods for studying DNA-protein interactions.&lt;/p&gt;
&lt;p&gt;A bit more specifically, I used single-molecule methods such as FRET (Förster resonance energy transfer), PIFE (protein-induced fluorescence enhancement), and a method I developed, TFM (tethered fluorophore motion), to study a range of DNA-processing proteins (DNA polymerase, recombinases, and motor proteins).&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/biophysics/biophysics.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/b5jOK8DGFy-200.webp 200w, https://justinpinkney.com/img/b5jOK8DGFy-320.webp 320w, https://justinpinkney.com/img/b5jOK8DGFy-500.webp 500w, https://justinpinkney.com/img/b5jOK8DGFy-800.webp 800w, https://justinpinkney.com/img/b5jOK8DGFy-1024.webp 1024w, https://justinpinkney.com/img/b5jOK8DGFy-1174.webp 1174w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/b5jOK8DGFy-200.jpeg 200w, https://justinpinkney.com/img/b5jOK8DGFy-320.jpeg 320w, https://justinpinkney.com/img/b5jOK8DGFy-500.jpeg 500w, https://justinpinkney.com/img/b5jOK8DGFy-800.jpeg 800w, https://justinpinkney.com/img/b5jOK8DGFy-1024.jpeg 1024w, https://justinpinkney.com/img/b5jOK8DGFy-1174.jpeg 1174w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/b5jOK8DGFy-200.jpeg&quot; width=&quot;1174&quot; height=&quot;874&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This was where I first started to write code in earnest, learnt the appeal of building tools and methods, started processing images, and got to play with lasers.&lt;/p&gt;
&lt;p align=&quot;center&quot;&gt;
&lt;video controls=&quot;&quot; src=&quot;https://justinpinkney.com/blog/2020/biophysics/microscopebuilding.mp4&quot; loop=&quot;true&quot;&gt;&lt;/video&gt;
&lt;/p&gt;
&lt;p&gt;Here are the papers I published back then:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/biophysics/Capturing%20reaction%20paths%20and%20intermediates%20in%20Cre-loxP%20recombination%20using%20single-molecule%20fluorescence.pdf&quot;&gt;Capturing reaction paths and intermediates in Cre-loxP recombination using single-molecule fluorescence&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/biophysics/Conformational%20transitions%20during%20FtsK%20translocase%20activation%20of%20individual%20XerCD–dif%20recombination%20complexes.pdf&quot;&gt;Conformational transitions during FtsK translocase activation of individual XerCD–dif recombination complexes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/biophysics/Extending%20and%20combining%20single%20molecule%20methods%20to%20study%20site%20specific%20recombination.pdf&quot;&gt;Extending and combining single molecule methods to study site specific recombination&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/biophysics/Rotavirus%20mRNAS%20are%20released%20by%20transcript-specific%20channels%20in%20the%20double-layered%20viral%20capsid.pdf&quot;&gt;Rotavirus mRNAs are released by transcript-specific channels in the double-layered viral capsid&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2020/biophysics/Tethered%20Fluorophore%20Motion%20Studying%20Large%20DNA%20Conformational%20Changes%20by%20Single-fluorophore%20Imaging.pdf&quot;&gt;Tethered Fluorophore Motion: Studying Large DNA Conformational Changes by Single-fluorophore Imaging&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</content>
	</entry>
	
	<entry>
		<title>Better OBJ model loading in Processing</title>
		<link href="https://justinpinkney.com/blog/2015/better-obj/"/>
		<updated>2015-04-12T00:00:00Z</updated>
		<id>https://justinpinkney.com/blog/2015/better-obj/</id>
		<content type="html">&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2015/better-obj/2015-04-12_screenshot_001.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/UUAeptuPrg-200.webp 200w, https://justinpinkney.com/img/UUAeptuPrg-320.webp 320w, https://justinpinkney.com/img/UUAeptuPrg-500.webp 500w, https://justinpinkney.com/img/UUAeptuPrg-720.webp 720w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/UUAeptuPrg-200.jpeg 200w, https://justinpinkney.com/img/UUAeptuPrg-320.jpeg 320w, https://justinpinkney.com/img/UUAeptuPrg-500.jpeg 500w, https://justinpinkney.com/img/UUAeptuPrg-720.jpeg 720w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;OBJ model loaded in Processing&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/UUAeptuPrg-200.jpeg&quot; width=&quot;720&quot; height=&quot;360&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Processing is great at many things; it&#39;s so easy to do so much, including displaying things in 3D. Still, it&#39;s easy to make things grind to a halt when trying to do too much. But there are a lot of things you can do to speed things up, like &lt;a href=&quot;https://processing.org/tutorials/pshape/&quot; target=&quot;_blank&quot;&gt;PShape recording&lt;/a&gt;, using &lt;a href=&quot;https://github.com/processing/processing/wiki/Advanced-OpenGL&quot; target=&quot;_blank&quot;&gt;Vertex Arrays to display lots of points&lt;/a&gt;, or &lt;a href=&quot;https://processing.org/tutorials/pshader/&quot; target=&quot;_blank&quot;&gt;writing GLSL shaders&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Loading a .obj model in Processing is as easy as calling loadShape(), but load a reasonably complicated textured obj model and things slow down dramatically. Everything seems to run happily when there are no textures involved. I think it&#39;s something to do with the fact that each face of the model is loaded as an individual PShape, and these are all lumped together (though that&#39;s done for a &lt;a href=&quot;https://github.com/processing/processing/issues/2873&quot; target=&quot;_blank&quot;&gt;good reason&lt;/a&gt;).&lt;/p&gt;
&lt;p align=&quot;center&quot;&gt;
&lt;video controls=&quot;&quot; src=&quot;https://justinpinkney.com/blog/2015/better-obj/objModel2.mp4&quot; loop=&quot;true&quot;&gt;&lt;/video&gt;
&lt;/p&gt;
&lt;p&gt;You can speed things up pretty easily by taking the PShape made by loadShape, grabbing all the child shapes, and putting them into a single PShape. You can&#39;t mix quads and triangles in a single PShape, and different textures and material properties would need to be split up too. Still, from a crawling 4 fps we easily get back up to 60 fps, which is cool. Hopefully that&#39;s helpful to someone, as it took me ages to figure out why things were going so slowly. At some point I might have a crack at writing my own version of the OBJ loader.&lt;/p&gt;
&lt;pre class=&quot;language-java&quot; tabindex=&quot;0&quot;&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;token class-name&quot;&gt;PShape&lt;/span&gt; objShape&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; triShape&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; quadShape&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;setup&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;token function&quot;&gt;size&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;720&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;360&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token constant&quot;&gt;P3D&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;token comment&quot;&gt;// Load the obj model normally&lt;/span&gt;
  objShape &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;loadShape&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;heartDecim.obj&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

  &lt;span class=&quot;token comment&quot;&gt;// Make a PShape with all the faces that have three vertices&lt;/span&gt;
  triShape &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;createShapeTri&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;objShape&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;token comment&quot;&gt;// Make a PShape with all the faces that have four vertices&lt;/span&gt;
  quadShape &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;createShapeQuad&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;objShape&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;token keyword&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;draw&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;token function&quot;&gt;background&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;255&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

  &lt;span class=&quot;token function&quot;&gt;translate&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;width&lt;span class=&quot;token operator&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; height&lt;span class=&quot;token operator&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;token function&quot;&gt;scale&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;20&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;token function&quot;&gt;rotateX&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token constant&quot;&gt;PI&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;token function&quot;&gt;rotateY&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;mouseX&lt;span class=&quot;token operator&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;100.0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;token comment&quot;&gt;// Set the lights before drawing so the geometry is lit this frame&lt;/span&gt;
  &lt;span class=&quot;token function&quot;&gt;lights&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;token function&quot;&gt;shape&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;triShape&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;token function&quot;&gt;shape&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;quadShape&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;token comment&quot;&gt;//shape(objShape);&lt;/span&gt;

  &lt;span class=&quot;token function&quot;&gt;println&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;frameRate&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;token class-name&quot;&gt;PShape&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;createShapeTri&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;PShape&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;token class-name&quot;&gt;PImage&lt;/span&gt; tex &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;loadImage&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;HeartC.JPG&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;token class-name&quot;&gt;PShape&lt;/span&gt; s &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;createShape&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
  s&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;beginShape&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token constant&quot;&gt;TRIANGLES&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
  s&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;noStroke&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
  s&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;texture&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;tex&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
  s&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;textureMode&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token constant&quot;&gt;NORMAL&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt; i&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; i&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getChildCount&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; i&lt;span class=&quot;token operator&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getChild&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;i&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getVertexCount&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;==&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt; j&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; j&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getChild&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;i&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getVertexCount&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; j&lt;span class=&quot;token operator&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;token class-name&quot;&gt;PVector&lt;/span&gt; p &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getChild&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;i&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getVertex&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;j&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;token class-name&quot;&gt;PVector&lt;/span&gt; n &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getChild&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;i&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getNormal&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;j&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;token keyword&quot;&gt;float&lt;/span&gt; u &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getChild&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;i&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getTextureU&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;j&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;token keyword&quot;&gt;float&lt;/span&gt; v &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getChild&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;i&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getTextureV&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;j&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
        s&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;normal&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;n&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;x&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; n&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;y&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; n&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;z&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
        s&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;vertex&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;p&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;x&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; p&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;y&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; p&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;z&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; u&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; v&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
      &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
  s&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;endShape&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; s&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;token class-name&quot;&gt;PShape&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;createShapeQuad&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;PShape&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;token class-name&quot;&gt;PImage&lt;/span&gt; tex &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;loadImage&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;HeartC.JPG&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;token class-name&quot;&gt;PShape&lt;/span&gt; s &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;createShape&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
  s&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;beginShape&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token constant&quot;&gt;QUADS&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
  s&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;noStroke&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
  s&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;texture&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;tex&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
  s&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;textureMode&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token constant&quot;&gt;NORMAL&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt; i&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; i&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getChildCount&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; i&lt;span class=&quot;token operator&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getChild&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;i&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getVertexCount&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;==&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt; j&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; j&lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt;r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getChild&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;i&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getVertexCount&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; j&lt;span class=&quot;token operator&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;token class-name&quot;&gt;PVector&lt;/span&gt; p &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getChild&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;i&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getVertex&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;j&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;token class-name&quot;&gt;PVector&lt;/span&gt; n &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getChild&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;i&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getNormal&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;j&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;token keyword&quot;&gt;float&lt;/span&gt; u &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getChild&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;i&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getTextureU&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;j&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;token keyword&quot;&gt;float&lt;/span&gt; v &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getChild&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;i&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getTextureV&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;j&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
        s&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;normal&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;n&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;x&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; n&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;y&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; n&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;z&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
        s&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;vertex&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;p&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;x&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; p&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;y&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; p&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;z&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; u&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; v&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
      &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
  s&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;endShape&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; s&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
</content>
	</entry>
	
	<entry>
		<title>DNA Lamp</title>
		<link href="https://justinpinkney.com/blog/2014/dna-lamp/"/>
		<updated>2014-11-23T00:00:00Z</updated>
		<id>https://justinpinkney.com/blog/2014/dna-lamp/</id>
		<content type="html">&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2014/dna-lamp/DSC08404.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/589MgO0_9y-200.webp 200w, https://justinpinkney.com/img/589MgO0_9y-320.webp 320w, https://justinpinkney.com/img/589MgO0_9y-500.webp 500w, https://justinpinkney.com/img/589MgO0_9y-720.webp 720w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/589MgO0_9y-200.jpeg 200w, https://justinpinkney.com/img/589MgO0_9y-320.jpeg 320w, https://justinpinkney.com/img/589MgO0_9y-500.jpeg 500w, https://justinpinkney.com/img/589MgO0_9y-720.jpeg 720w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;LED bokeh&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/589MgO0_9y-200.jpeg&quot; width=&quot;720&quot; height=&quot;482&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;A prototype of an idea stolen from a friend: an LED lamp that displays a DNA sequence.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;http://www.youtube.com/tCwi4XRZ3RU&quot; target=&quot;_blank&quot;&gt;http://www.youtube.com/tCwi4XRZ3RU&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;My first attempt at controlling addressable LEDs with Processing and the fantastic &lt;a href=&quot;http://www.misc.name/#/fadecandy/&quot; target=&quot;_blank&quot;&gt;Fadecandy&lt;/a&gt; board from &lt;a href=&quot;http://scanlime.org/&quot; target=&quot;_blank&quot;&gt;Micah Scott&lt;/a&gt;. Of course it would be way more fitting to have two LED strips and arrange them in a double helix...&lt;/p&gt;
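&lt;p&gt;The lamp itself was driven from Processing, but as a rough illustration of the idea, here is a minimal Python sketch assuming the opc.py Open Pixel Control client that ships with the Fadecandy examples; the base-to-colour mapping, sequence, and LED count are all made up for illustration:&lt;/p&gt;
&lt;pre class=&quot;language-python&quot; tabindex=&quot;0&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;# Hypothetical sketch: scroll a DNA sequence along an addressable LED strip.
# Assumes the opc.py client from the Fadecandy examples and a fadecandy
# server listening on its default port.
import time
import opc

BASE_COLOURS = {
    &quot;A&quot;: (0, 255, 0),    # adenine: green
    &quot;T&quot;: (255, 0, 0),    # thymine: red
    &quot;G&quot;: (255, 255, 0),  # guanine: yellow
    &quot;C&quot;: (0, 0, 255),    # cytosine: blue
}
sequence = &quot;ATGGCGTACGATCGATTACG&quot;  # placeholder sequence
num_leds = 64                      # placeholder strip length

client = opc.Client(&quot;localhost:7890&quot;)
offset = 0
while True:
    # One colour per LED, reading the sequence circularly
    pixels = [BASE_COLOURS[sequence[(offset + i) % len(sequence)]]
              for i in range(num_leds)]
    client.put_pixels(pixels)
    offset += 1
    time.sleep(0.2)  # scroll speed
&lt;/code&gt;&lt;/pre&gt;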
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2014/dna-lamp/two_images.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/XC7UwUQ89m-200.webp 200w, https://justinpinkney.com/img/XC7UwUQ89m-320.webp 320w, https://justinpinkney.com/img/XC7UwUQ89m-500.webp 500w, https://justinpinkney.com/img/XC7UwUQ89m-720.webp 720w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/XC7UwUQ89m-200.jpeg 200w, https://justinpinkney.com/img/XC7UwUQ89m-320.jpeg 320w, https://justinpinkney.com/img/XC7UwUQ89m-500.jpeg 500w, https://justinpinkney.com/img/XC7UwUQ89m-720.jpeg 720w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;DNA LED lamp&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/XC7UwUQ89m-200.jpeg&quot; width=&quot;720&quot; height=&quot;538&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2014/dna-lamp/DSC08396.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/KOTpYqNSP7-200.webp 200w, https://justinpinkney.com/img/KOTpYqNSP7-320.webp 320w, https://justinpinkney.com/img/KOTpYqNSP7-500.webp 500w, https://justinpinkney.com/img/KOTpYqNSP7-720.webp 720w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/KOTpYqNSP7-200.jpeg 200w, https://justinpinkney.com/img/KOTpYqNSP7-320.jpeg 320w, https://justinpinkney.com/img/KOTpYqNSP7-500.jpeg 500w, https://justinpinkney.com/img/KOTpYqNSP7-720.jpeg 720w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;Fadecandy board&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/KOTpYqNSP7-200.jpeg&quot; width=&quot;720&quot; height=&quot;482&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2014/dna-lamp/DSC08409.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/SP8zmpQKjQ-200.webp 200w, https://justinpinkney.com/img/SP8zmpQKjQ-320.webp 320w, https://justinpinkney.com/img/SP8zmpQKjQ-500.webp 500w, https://justinpinkney.com/img/SP8zmpQKjQ-720.webp 720w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/SP8zmpQKjQ-200.jpeg 200w, https://justinpinkney.com/img/SP8zmpQKjQ-320.jpeg 320w, https://justinpinkney.com/img/SP8zmpQKjQ-500.jpeg 500w, https://justinpinkney.com/img/SP8zmpQKjQ-720.jpeg 720w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;Processing LED sketch&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/SP8zmpQKjQ-200.jpeg&quot; width=&quot;720&quot; height=&quot;720&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2014/dna-lamp/DSC08383.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/G1yF2KzVYt-200.webp 200w, https://justinpinkney.com/img/G1yF2KzVYt-320.webp 320w, https://justinpinkney.com/img/G1yF2KzVYt-500.webp 500w, https://justinpinkney.com/img/G1yF2KzVYt-720.webp 720w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/G1yF2KzVYt-200.jpeg 200w, https://justinpinkney.com/img/G1yF2KzVYt-320.jpeg 320w, https://justinpinkney.com/img/G1yF2KzVYt-500.jpeg 500w, https://justinpinkney.com/img/G1yF2KzVYt-720.jpeg 720w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;Fadecandy board&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/G1yF2KzVYt-200.jpeg&quot; width=&quot;720&quot; height=&quot;482&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2014/dna-lamp/DSC08395.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/JGh2PzqhET-200.webp 200w, https://justinpinkney.com/img/JGh2PzqhET-320.webp 320w, https://justinpinkney.com/img/JGh2PzqhET-500.webp 500w, https://justinpinkney.com/img/JGh2PzqhET-720.webp 720w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/JGh2PzqhET-200.jpeg 200w, https://justinpinkney.com/img/JGh2PzqhET-320.jpeg 320w, https://justinpinkney.com/img/JGh2PzqhET-500.jpeg 500w, https://justinpinkney.com/img/JGh2PzqhET-720.jpeg 720w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;Addressable RGB LED&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/JGh2PzqhET-200.jpeg&quot; width=&quot;720&quot; height=&quot;482&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
</content>
	</entry>
	
	<entry>
		<title>Visualising SIFT descriptors</title>
		<link href="https://justinpinkney.com/blog/2014/sift-features/"/>
		<updated>2014-11-02T00:00:00Z</updated>
		<id>https://justinpinkney.com/blog/2014/sift-features/</id>
		<content type="html">&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2014/sift-features/walle1-copy.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/ANM4yNTlIH-200.webp 200w, https://justinpinkney.com/img/ANM4yNTlIH-320.webp 320w, https://justinpinkney.com/img/ANM4yNTlIH-500.webp 500w, https://justinpinkney.com/img/ANM4yNTlIH-720.webp 720w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/ANM4yNTlIH-200.jpeg 200w, https://justinpinkney.com/img/ANM4yNTlIH-320.jpeg 320w, https://justinpinkney.com/img/ANM4yNTlIH-500.jpeg 500w, https://justinpinkney.com/img/ANM4yNTlIH-720.jpeg 720w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;Visualisation of SIFT features of a frame from Wall-E&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/ANM4yNTlIH-200.jpeg&quot; width=&quot;720&quot; height=&quot;601&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Having worked on a few computer vision projects at work recently, I&#39;ve been interested in trying to understand what the computer is seeing. A lot of image processing algorithms involve putting the image through filters or transforms, or extracting local descriptions of portions of the image. Often these are modelled on what the human visual system is understood to be doing, but of course our brain hides this low-level processing from us. It seems really interesting to try and visualise some of the intermediate steps of these algorithms to get a better understanding of how the computer is interpreting an image.&lt;/p&gt;
&lt;p&gt;Of course it&#39;s nothing new: computer vision papers explaining or describing algorithms often contain interesting visualisations (just try searching for &lt;a href=&quot;https://www.google.co.uk/search?q=eigenfaces&amp;amp;biw=1536&amp;amp;bih=764&amp;amp;source=lnms&amp;amp;tbm=isch&amp;amp;sa=X&amp;amp;ei=uf1PVMBB1dlqgp-CqAQ&amp;amp;ved=0CAYQ_AUoAQ&quot; target=&quot;_blank&quot;&gt;eigenfaces&lt;/a&gt; or &lt;a href=&quot;https://twitter.com/Buntworthy/status/436260777888976896&quot; target=&quot;_blank&quot;&gt;HOG pedestrian detection&lt;/a&gt;). I also came across a few really nice examples of people specifically trying to visualise (or reconstruct from) image features, including a &lt;a href=&quot;http://web.mit.edu/vondrick/ihog/&quot; target=&quot;_blank&quot;&gt;visualisation of HOG features&lt;/a&gt; and &lt;a href=&quot;https://hal.archives-ouvertes.fr/file/index/docid/567194/filename/weinzaepfel_cvpr11.pdf&quot; target=&quot;_blank&quot;&gt;image reconstruction from SIFT descriptors&lt;/a&gt;; both produce some fantastic and interesting images.&lt;/p&gt;
&lt;h2&gt;Visualising SIFT&lt;/h2&gt;
&lt;p&gt;The Scale Invariant Feature Transform is a commonly used method for detecting and describing local &#39;features&#39; in an image; for a good description of what it is and how it works see the &lt;a href=&quot;http://www.vlfeat.org/api/sift.html&quot; target=&quot;_blank&quot;&gt;VLFeat API documentation&lt;/a&gt;. Basically, SIFT finds points in the image that are likely to be interesting and distinctive, and describes each one with a descriptor built from a small patch of pixels by computing local histograms of intensity gradients. I&#39;ve used these descriptors several times for detecting and matching objects in scenes, and have always wanted to better understand what the computer is seeing and what it gives importance to. Typically SIFT descriptors are visualised as boxes with many arrows, which do give a hint of what the underlying algorithm is producing, but I wanted to try and produce something a little more visually pleasing (if less accurate).&lt;/p&gt;
&lt;p&gt;I came up with a simple visualisation model for a SIFT descriptor. The descriptor is a 128-element vector: a 4x4 grid of 16 patches around the keypoint, each contributing an 8-bin histogram of the local intensity gradients. An example of my representation for a few keypoints is shown below:&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;http://youtu.be/RWi813cNN_k&quot; target=&quot;_blank&quot;&gt;http://youtu.be/RWi813cNN_k&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Using OpenCV (in Python) to do the SIFT detection&lt;span class=&quot;sidenote&quot;&gt;
&lt;label class=&quot;sidenote-label&quot; for=&quot;side-note-1&quot;&gt;&lt;sup&gt;[1]&lt;/sup&gt;&lt;/label&gt;
&lt;input class=&quot;sidenote-checkbox&quot; type=&quot;checkbox&quot; id=&quot;side-note-1&quot;&gt;
&lt;span class=&quot;sidenote-content&quot;&gt;1. There is a good explanation of the slightly confusing OpenCV code for the SIFT keypoints &lt;a href=&quot;http://stackoverflow.com/questions/17015995/opencv-sift-descriptor-keypoint-radius&quot;&gt;here&lt;/a&gt;.&lt;/span&gt;
&lt;/span&gt; and description, I placed descriptor visualisations into a blank image (scaled and rotated appropriately) and slowly saw some of the original structure of the image reappear in a ghostly form.&lt;/p&gt;
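&lt;p&gt;As a rough sketch of the pipeline (not the exact code behind these images: it assumes a recent OpenCV build with SIFT in the main module, works on a single grayscale channel, and splats each descriptor as a simple blob rather than my full scaled-and-rotated rendering):&lt;/p&gt;
&lt;pre class=&quot;language-python&quot; tabindex=&quot;0&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;# Minimal sketch: detect SIFT keypoints, then splat a crude visualisation
# of each descriptor back onto a blank canvas.
import cv2
import numpy as np

img = cv2.imread(&quot;frame.jpg&quot;, cv2.IMREAD_GRAYSCALE)  # placeholder filename
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)

canvas = np.zeros(img.shape, dtype=np.float32)
for kp, desc in zip(keypoints, descriptors):
    # Each 128-element descriptor is a 4x4 grid of patches, each patch
    # an 8-bin histogram of local gradient orientations.
    hist = desc.reshape(4, 4, 8)
    x, y = int(kp.pt[0]), int(kp.pt[1])
    radius = max(int(kp.size / 2), 1)  # keypoint size sets the scale
    strength = float(hist.mean())      # crude stand-in for drawing the bins
    cv2.circle(canvas, (x, y), radius, strength, -1)

canvas = cv2.normalize(canvas, None, 0, 255, cv2.NORM_MINMAX)
cv2.imwrite(&quot;sift_viz.png&quot;, canvas.astype(np.uint8))
&lt;/code&gt;&lt;/pre&gt;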
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2014/sift-features/sift_lantern.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/os9xeLqJiA-200.webp 200w, https://justinpinkney.com/img/os9xeLqJiA-320.webp 320w, https://justinpinkney.com/img/os9xeLqJiA-500.webp 500w, https://justinpinkney.com/img/os9xeLqJiA-720.webp 720w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/os9xeLqJiA-200.jpeg 200w, https://justinpinkney.com/img/os9xeLqJiA-320.jpeg 320w, https://justinpinkney.com/img/os9xeLqJiA-500.jpeg 500w, https://justinpinkney.com/img/os9xeLqJiA-720.jpeg 720w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;SIFT features of lantern&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/os9xeLqJiA-200.jpeg&quot; width=&quot;720&quot; height=&quot;537&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The colour in the images comes from doing the feature detection separately in the red, green, and blue channels and adding appropriately coloured keypoints to the image. It&#39;s surprising just how much of the original detail of the image begins to reappear, and it&#39;s also interesting to see what the SIFT algorithm pays attention to in the image and what it doesn&#39;t (for example it has no interest in EVE in the frame from Wall-E above; she&#39;s just too sleek, uniform and smooth I guess).&lt;/p&gt;
&lt;p&gt;The current implementation is written in Python and is painfully slow, so rendering even short stretches of video is fun but takes a long time (I guess I need to look into how to use Cython, or learn C++...). The shimmering, ghostly movement of the rendered motion is particularly nice (and would be better if I could render it at higher resolution!)&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;http://youtu.be/Nw-8KRUAJBg&quot; target=&quot;_blank&quot;&gt;http://youtu.be/Nw-8KRUAJBg&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2014/sift-features/big-walle.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/mD9DRMXzdi-200.webp 200w, https://justinpinkney.com/img/mD9DRMXzdi-320.webp 320w, https://justinpinkney.com/img/mD9DRMXzdi-500.webp 500w, https://justinpinkney.com/img/mD9DRMXzdi-720.webp 720w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/mD9DRMXzdi-200.jpeg 200w, https://justinpinkney.com/img/mD9DRMXzdi-320.jpeg 320w, https://justinpinkney.com/img/mD9DRMXzdi-500.jpeg 500w, https://justinpinkney.com/img/mD9DRMXzdi-720.jpeg 720w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;High detail SIFT feature visualiasation&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/mD9DRMXzdi-200.jpeg&quot; width=&quot;720&quot; height=&quot;360&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;
Another Wall-E image (in case you couldn&#39;t &lt;a href=&quot;http://cdn.twitrcovers.com/wp-content/uploads/2012/11/Wall-E-l.jpg&quot;&gt;tell&lt;/a&gt;)
&lt;/div&gt;
</content>
	</entry>
	
	<entry>
		<title>Abstract landscapes in Blender</title>
		<link href="https://justinpinkney.com/blog/2014/abstract-landscapes/"/>
		<updated>2014-08-25T00:00:00Z</updated>
		<id>https://justinpinkney.com/blog/2014/abstract-landscapes/</id>
		<content type="html">&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2014/abstract-landscapes/craters1-copy-copy.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/cz3orvMH2P-200.webp 200w, https://justinpinkney.com/img/cz3orvMH2P-320.webp 320w, https://justinpinkney.com/img/cz3orvMH2P-500.webp 500w, https://justinpinkney.com/img/cz3orvMH2P-720.webp 720w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/cz3orvMH2P-200.jpeg 200w, https://justinpinkney.com/img/cz3orvMH2P-320.jpeg 320w, https://justinpinkney.com/img/cz3orvMH2P-500.jpeg 500w, https://justinpinkney.com/img/cz3orvMH2P-720.jpeg 720w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;Abstract 3D pillar landscape created in Blender&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/cz3orvMH2P-200.jpeg&quot; width=&quot;720&quot; height=&quot;405&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I recently saw some great work by Lee Griggs, using Maya to create wonderful &lt;a href=&quot;https://leegriggs.wordpress.com/2014/06/28/xgen-color/&quot; target=&quot;_blank&quot;&gt;abstract landscapes&lt;/a&gt; out of millions of small coloured blocks or pillars. They&#39;re fantastic images, and I wanted to try and emulate the effect in Blender as part of my &lt;a title=&quot;Exporting from Processing to Blender&quot; href=&quot;https://www.cutsquash.com/2014/04/exporting-processing-blender/&quot; target=&quot;_blank&quot;&gt;ongoing&lt;/a&gt; &lt;a title=&quot;Low poly&quot; href=&quot;https://www.cutsquash.com/2014/02/low-poly/&quot; target=&quot;_blank&quot;&gt;efforts&lt;/a&gt; to get to grips with the program. It turned out to be a lot easier than I expected: with just a couple of modifiers and some duplication you can create some fairly nice effects. So here is a tutorial on how to achieve that blocky abstract landscape look with just an interesting image and some modifiers. Although it wasn&#39;t hard to get the basics in place, my results are still not a patch on the originals; the next plan is to generate some interesting images in Processing and use those as the basis for a landscape.&lt;/p&gt;
&lt;h2&gt;Blender abstract landscape tutorial&lt;/h2&gt;
&lt;h3&gt;1. Find an image&lt;/h3&gt;
First off, find an image to turn into a landscape. Something colourful (or not?) with some interesting structures seems to work well. I chose this stunning microscope image from Eckhard Völcker; any of his beautiful images would be fantastic (his &lt;a href=&quot;https://www.wunderkanone.de/&quot; target=&quot;_blank&quot;&gt;website&lt;/a&gt; and &lt;a href=&quot;https://www.flickr.com/photos/wunderkanone/&quot; target=&quot;_blank&quot;&gt;Flickr stream&lt;/a&gt; are highly recommended).
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2014/abstract-landscapes/colour_map.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/qfKCiOwUPX-200.webp 200w, https://justinpinkney.com/img/qfKCiOwUPX-320.webp 320w, https://justinpinkney.com/img/qfKCiOwUPX-500.webp 500w, https://justinpinkney.com/img/qfKCiOwUPX-640.webp 640w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/qfKCiOwUPX-200.jpeg 200w, https://justinpinkney.com/img/qfKCiOwUPX-320.jpeg 320w, https://justinpinkney.com/img/qfKCiOwUPX-500.jpeg 500w, https://justinpinkney.com/img/qfKCiOwUPX-640.jpeg 640w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;Osmunda regalis, 10x by Eckhard Völcker&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/qfKCiOwUPX-200.jpeg&quot; width=&quot;640&quot; height=&quot;426&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Load the image into your favourite image editing program and resize or crop it to be square (this makes scaling the image correctly later a little easier). Save a colour copy, and save one in black and white (you might want to adjust the black and white conversion to get the height map you want), bearing in mind that light areas will appear higher than dark ones.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2014/abstract-landscapes/height_map.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/9HolpW2yez-200.webp 200w, https://justinpinkney.com/img/9HolpW2yez-320.webp 320w, https://justinpinkney.com/img/9HolpW2yez-500.webp 500w, https://justinpinkney.com/img/9HolpW2yez-640.webp 640w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/9HolpW2yez-200.jpeg 200w, https://justinpinkney.com/img/9HolpW2yez-320.jpeg 320w, https://justinpinkney.com/img/9HolpW2yez-500.jpeg 500w, https://justinpinkney.com/img/9HolpW2yez-640.jpeg 640w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;Height map from Osmunda regalis, 10x by Eckhard Völcker&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/9HolpW2yez-200.jpeg&quot; width=&quot;640&quot; height=&quot;640&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;
The height map generated from the original image, white is higher.
&lt;/div&gt;
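&lt;p&gt;If you&#39;d rather script the cropping and conversion above, a few lines of Pillow will do it (this is my own sketch rather than part of the original workflow, and the filenames are placeholders):&lt;/p&gt;
&lt;pre class=&quot;language-python&quot; tabindex=&quot;0&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;# Prepare the square colour map and grayscale height map with Pillow
from PIL import Image, ImageOps

img = Image.open(&quot;original.jpg&quot;)       # placeholder filename
side = min(img.size)
img = ImageOps.fit(img, (side, side))  # centre-crop to a square
img.save(&quot;colour_map.png&quot;)
ImageOps.grayscale(img).save(&quot;height_map.png&quot;)  # lighter areas end up higher
&lt;/code&gt;&lt;/pre&gt;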
&lt;h3&gt;2. Add some shapes&lt;/h3&gt;
Fire up Blender and delete the starting cube. Add a plane and set its scale and dimensions to 1 in both x and y. Press Tab to enter edit mode, and subdivide the plane 10 times.
&lt;p&gt;Next, add your base object; it can be anything, but a cylinder or cube is a good starting point. Generally it&#39;s going to have to be long and thin, but the exact dimensions will depend on how many subdivisions you&#39;re going to apply to the parent object later.&lt;/p&gt;
&lt;p&gt;Set up the dupliverts, as explained &lt;a href=&quot;https://wiki.blender.org/index.php/Doc:2.6/Manual/Modeling/Objects/Duplication/DupliVerts&quot; target=&quot;_blank&quot;&gt;here&lt;/a&gt;. Basically, set the parent (Ctrl-P) of the object you created above to be the plane, then go into the object menu and set duplication to vertices. So we have a grid of objects that are going to make up our landscape; now we need a hell of a lot more of them, and also some landscape!&lt;/p&gt;
&lt;h3&gt;3. Add modifiers&lt;/h3&gt;
We’ll add two modifiers: a displace modifier to create the landscape, and a subsurf modifier to increase the number of vertices, and therefore landscape elements, in our scene.
&lt;p&gt;First, add a simple subsurf modifier to increase the number of landscape elements. I&#39;ve been using anywhere from 4 to 7 subdivisions (7 gives you well over a million landscape elements, so be prepared for a bit of a slowdown when trying to find a good view in the scene). You&#39;ll also probably want to adjust the x and y dimensions of the landscape elements to accommodate the change in number caused by the different subdivision levels.&lt;/p&gt;
&lt;p&gt;Next, add a displace modifier, and set the displacement texture to be the grayscale height map you created earlier.
You’ll have to tune the strength of the displacement to give you the effect you want, but that’s a little tricky to tell
until you have the elements set up. In general though a strength of 1 will be too much.&lt;/p&gt;
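&lt;p&gt;For reference, here is a rough Python sketch of steps 2 and 3 combined. This is an assumption-laden outline for recent Blender versions (where dupliverts are exposed as vertex instancing); the counts, element dimensions, displacement strength, and file name are all placeholders to tune:&lt;/p&gt;
&lt;pre class=&quot;language-python&quot; tabindex=&quot;0&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;# Rough bpy sketch: subdivided plane, vertex-duplicated element,
# then subsurf + displace modifiers. All values are placeholders.
import bpy

# Add the parent plane and subdivide it
bpy.ops.mesh.primitive_plane_add(size=1)
plane = bpy.context.active_object
bpy.ops.object.mode_set(mode=&quot;EDIT&quot;)
bpy.ops.mesh.subdivide(number_cuts=10)
bpy.ops.object.mode_set(mode=&quot;OBJECT&quot;)

# Add a long thin landscape element and duplicate it on every vertex
bpy.ops.mesh.primitive_cylinder_add(radius=0.005, depth=0.1)
element = bpy.context.active_object
element.parent = plane
plane.instance_type = &quot;VERTS&quot;  # the dupliverts step

# More vertices means more landscape elements
subsurf = plane.modifiers.new(&quot;subsurf&quot;, &quot;SUBSURF&quot;)
subsurf.subdivision_type = &quot;SIMPLE&quot;  # just split the faces, no smoothing
subsurf.levels = subsurf.render_levels = 5

# Displace the plane with the grayscale height map
tex = bpy.data.textures.new(&quot;height&quot;, type=&quot;IMAGE&quot;)
tex.image = bpy.data.images.load(&quot;height_map.png&quot;)
displace = plane.modifiers.new(&quot;displace&quot;, &quot;DISPLACE&quot;)
displace.texture = tex
displace.strength = 0.3  # a strength of 1 is usually too much
&lt;/code&gt;&lt;/pre&gt;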
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2014/abstract-landscapes/displace-copy-copy.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/tak-JJxLLw-200.webp 200w, https://justinpinkney.com/img/tak-JJxLLw-320.webp 320w, https://justinpinkney.com/img/tak-JJxLLw-500.webp 500w, https://justinpinkney.com/img/tak-JJxLLw-720.webp 720w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/tak-JJxLLw-200.jpeg 200w, https://justinpinkney.com/img/tak-JJxLLw-320.jpeg 320w, https://justinpinkney.com/img/tak-JJxLLw-500.jpeg 500w, https://justinpinkney.com/img/tak-JJxLLw-720.jpeg 720w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;Landscape tutorial screen shot&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/tak-JJxLLw-200.jpeg&quot; width=&quot;720&quot; height=&quot;408&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;4. Set up the material&lt;/h3&gt;
Set up the material of the landscape elements by selecting the original element and assigning a material something like
the one illustrated in the image below.
&lt;p&gt;Basically we are using the colour image as a texture input to the colour of the element, using the object info node to
give the position of the element on the image. You’ll also have to add 0.5 to both the x and y values of the object
position vector to make sure the origin of the image isn&#39;t in the middle of your plane.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2014/abstract-landscapes/material1.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/HQu9oU0tRT-200.webp 200w, https://justinpinkney.com/img/HQu9oU0tRT-320.webp 320w, https://justinpinkney.com/img/HQu9oU0tRT-500.webp 500w, https://justinpinkney.com/img/HQu9oU0tRT-800.webp 800w, https://justinpinkney.com/img/HQu9oU0tRT-1024.webp 1024w, https://justinpinkney.com/img/HQu9oU0tRT-1466.webp 1466w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/HQu9oU0tRT-200.jpeg 200w, https://justinpinkney.com/img/HQu9oU0tRT-320.jpeg 320w, https://justinpinkney.com/img/HQu9oU0tRT-500.jpeg 500w, https://justinpinkney.com/img/HQu9oU0tRT-800.jpeg 800w, https://justinpinkney.com/img/HQu9oU0tRT-1024.jpeg 1024w, https://justinpinkney.com/img/HQu9oU0tRT-1466.jpeg 1466w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;Material node set up for tutorial&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/HQu9oU0tRT-200.jpeg&quot; width=&quot;1466&quot; height=&quot;584&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
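&lt;p&gt;The same node tree can be sketched in Python too; again this is a guess at the equivalent for recent Blender versions, with the node and socket names and the file name being my assumptions:&lt;/p&gt;
&lt;pre class=&quot;language-python&quot; tabindex=&quot;0&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;# Rough bpy sketch of the material: object position, shifted by 0.5
# in x and y, drives an image texture feeding a diffuse shader.
import bpy

mat = bpy.data.materials.new(&quot;landscape&quot;)
mat.use_nodes = True
nodes, links = mat.node_tree.nodes, mat.node_tree.links
nodes.clear()

obj_info = nodes.new(&quot;ShaderNodeObjectInfo&quot;)
offset = nodes.new(&quot;ShaderNodeVectorMath&quot;)  # shifts the image origin off centre
offset.operation = &quot;ADD&quot;
offset.inputs[1].default_value = (0.5, 0.5, 0.0)
tex = nodes.new(&quot;ShaderNodeTexImage&quot;)
tex.image = bpy.data.images.load(&quot;colour_map.png&quot;)
bsdf = nodes.new(&quot;ShaderNodeBsdfDiffuse&quot;)
output = nodes.new(&quot;ShaderNodeOutputMaterial&quot;)

links.new(obj_info.outputs[&quot;Location&quot;], offset.inputs[0])
links.new(offset.outputs[&quot;Vector&quot;], tex.inputs[&quot;Vector&quot;])
links.new(tex.outputs[&quot;Color&quot;], bsdf.inputs[&quot;Color&quot;])
links.new(bsdf.outputs[&quot;BSDF&quot;], output.inputs[&quot;Surface&quot;])

# Finally assign the material to the original landscape element, e.g.
# element.data.materials.append(mat)
&lt;/code&gt;&lt;/pre&gt;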
&lt;h3&gt;5. Render!&lt;/h3&gt;
Now you’re all set up: find a good viewpoint, adjust the materials, lighting set-up, and geometry of the landscape elements, and let Blender render the scene (I&#39;ve been using Cycles rather than Blender Internal). Some more example images I&#39;ve generated are below.
&lt;p&gt;&lt;strong&gt;Update:&lt;/strong&gt; I&#39;ve uploaded one of my .blend files, which you can download &lt;a href=&quot;https://assets.justinpinkney.com/cutsquash/Landscape%20tut.zip&quot;&gt;here&lt;/a&gt;. I&#39;ve been really happy to see people posting examples they&#39;ve made following this tutorial: &lt;a href=&quot;https://imgur.com/hpMLOnk&quot; target=&quot;_blank&quot;&gt;1&lt;/a&gt;, &lt;a href=&quot;https://imgur.com/quaACxs&quot; target=&quot;_blank&quot;&gt;2&lt;/a&gt;, &lt;a href=&quot;https://imgur.com/15ivLm1&quot; target=&quot;_blank&quot;&gt;3&lt;/a&gt;, &lt;a href=&quot;https://imgur.com/UusD3gd&quot; target=&quot;_blank&quot;&gt;4&lt;/a&gt;, and possibly &lt;a href=&quot;https://imgur.com/arHw4wH&quot; target=&quot;_blank&quot;&gt;5&lt;/a&gt;; let me know if you make any more!&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2014/abstract-landscapes/abstract-landscape.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/HUEl7IDOEF-200.webp 200w, https://justinpinkney.com/img/HUEl7IDOEF-320.webp 320w, https://justinpinkney.com/img/HUEl7IDOEF-500.webp 500w, https://justinpinkney.com/img/HUEl7IDOEF-720.webp 720w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/HUEl7IDOEF-200.jpeg 200w, https://justinpinkney.com/img/HUEl7IDOEF-320.jpeg 320w, https://justinpinkney.com/img/HUEl7IDOEF-500.jpeg 500w, https://justinpinkney.com/img/HUEl7IDOEF-720.jpeg 720w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;Abstract 3D pillar landscape created in Blender&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/HUEl7IDOEF-200.jpeg&quot; width=&quot;720&quot; height=&quot;405&quot;&gt;&lt;/picture&gt;&lt;/a&gt;
&lt;a href=&quot;https://justinpinkney.com/blog/2014/abstract-landscapes/3D-abstract-pillars.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/zO2binvFh_-200.webp 200w, https://justinpinkney.com/img/zO2binvFh_-320.webp 320w, https://justinpinkney.com/img/zO2binvFh_-500.webp 500w, https://justinpinkney.com/img/zO2binvFh_-720.webp 720w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/zO2binvFh_-200.jpeg 200w, https://justinpinkney.com/img/zO2binvFh_-320.jpeg 320w, https://justinpinkney.com/img/zO2binvFh_-500.jpeg 500w, https://justinpinkney.com/img/zO2binvFh_-720.jpeg 720w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;Abstract 3D pillar landscape created in Blender&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/zO2binvFh_-200.jpeg&quot; width=&quot;720&quot; height=&quot;405&quot;&gt;&lt;/picture&gt;&lt;/a&gt;
&lt;a href=&quot;https://justinpinkney.com/blog/2014/abstract-landscapes/Blurred-pillars.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/f7X8sB8o-z-200.webp 200w, https://justinpinkney.com/img/f7X8sB8o-z-320.webp 320w, https://justinpinkney.com/img/f7X8sB8o-z-500.webp 500w, https://justinpinkney.com/img/f7X8sB8o-z-720.webp 720w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/f7X8sB8o-z-200.jpeg 200w, https://justinpinkney.com/img/f7X8sB8o-z-320.jpeg 320w, https://justinpinkney.com/img/f7X8sB8o-z-500.jpeg 500w, https://justinpinkney.com/img/f7X8sB8o-z-720.jpeg 720w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;Abstract blurred 3D pillar landscape created in&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/f7X8sB8o-z-200.jpeg&quot; width=&quot;720&quot; height=&quot;405&quot;&gt;&lt;/picture&gt;&lt;/a&gt;
&lt;a href=&quot;https://justinpinkney.com/blog/2014/abstract-landscapes/blocky-river-render.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/JmGBXOyLJ--200.webp 200w, https://justinpinkney.com/img/JmGBXOyLJ--320.webp 320w, https://justinpinkney.com/img/JmGBXOyLJ--500.webp 500w, https://justinpinkney.com/img/JmGBXOyLJ--720.webp 720w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/JmGBXOyLJ--200.jpeg 200w, https://justinpinkney.com/img/JmGBXOyLJ--320.jpeg 320w, https://justinpinkney.com/img/JmGBXOyLJ--500.jpeg 500w, https://justinpinkney.com/img/JmGBXOyLJ--720.jpeg 720w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;Abstract 3D river landscape created in Blender&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/JmGBXOyLJ--200.jpeg&quot; width=&quot;720&quot; height=&quot;405&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
</content>
	</entry>
	
	<entry>
		<title>Water drop photography</title>
		<link href="https://justinpinkney.com/blog/2012/water-drops/"/>
		<updated>2012-11-13T00:00:00Z</updated>
		<id>https://justinpinkney.com/blog/2012/water-drops/</id>
		<content type="html">&lt;div class=&quot;info&quot;&gt;
testing
&lt;/div&gt;
&lt;p&gt;After seeing many amazing water drop photos around the internet I made some quick attempts at droplet collision photos of my own. Armed with a cheap speedlite, a tray of water, and a syringe, it was actually fairly easy to get some decent splashes. It still takes a lot of lucky timing, and many failed attempts, so building an Arduino-controlled water dropper sounds like a pretty fun project, and evidently leads to some &lt;a href=&quot;https://web.archive.org/web/20120531111427/http://www.photosbykev.com/wordpress/tips-and-trick/water-droplet-photography/&quot;&gt;amazing&lt;/a&gt; &lt;a href=&quot;https://www.scantips.com/shako/index.html&quot;&gt;pictures&lt;/a&gt;. There&#39;s also a surprisingly large difference between how milk and water behave at these scales, but both make incredible shapes.&lt;/p&gt;
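&lt;p&gt;For the curious, the timing logic for such a dropper is simple: release two drops a fixed interval apart, then fire the flash just as the second drop hits the rebound of the first. Here&#39;s a rough sketch of that logic in Python, using a Raspberry Pi and the RPi.GPIO library rather than an Arduino; the pin numbers and all the delays are made-up placeholders you&#39;d tune by trial and error:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import time
import RPi.GPIO as GPIO

VALVE_PIN = 17  # solenoid valve that releases the drops (placeholder pin)
FLASH_PIN = 27  # optocoupler that triggers the speedlite (placeholder pin)

DROP_MS = 40        # how long the valve opens to release one drop
GAP_MS = 120        # gap between the first and second drops
COLLISION_MS = 180  # delay from the second drop to the collision

def pulse(pin, duration_ms):
    # Hold a pin high for duration_ms milliseconds, then drop it low
    GPIO.output(pin, GPIO.HIGH)
    time.sleep(duration_ms / 1000.0)
    GPIO.output(pin, GPIO.LOW)

GPIO.setmode(GPIO.BCM)
GPIO.setup([VALVE_PIN, FLASH_PIN], GPIO.OUT, initial=GPIO.LOW)

try:
    pulse(VALVE_PIN, DROP_MS)  # first drop
    time.sleep(GAP_MS / 1000.0)
    pulse(VALVE_PIN, DROP_MS)  # second drop, aimed at the rebound
    time.sleep(COLLISION_MS / 1000.0)
    pulse(FLASH_PIN, 5)        # fire the flash in a darkened room
finally:
    GPIO.cleanup()
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With the camera on bulb exposure in a dark room, the flash duration rather than the shutter freezes the motion, so the only number that really matters is that collision delay.&lt;/p&gt;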
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2012/water-drops/DSC04021.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/5p_Bfmrxlt-200.webp 200w, https://justinpinkney.com/img/5p_Bfmrxlt-320.webp 320w, https://justinpinkney.com/img/5p_Bfmrxlt-500.webp 500w, https://justinpinkney.com/img/5p_Bfmrxlt-800.webp 800w, https://justinpinkney.com/img/5p_Bfmrxlt-1024.webp 1024w, https://justinpinkney.com/img/5p_Bfmrxlt-1600.webp 1600w, https://justinpinkney.com/img/5p_Bfmrxlt-2048.webp 2048w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/5p_Bfmrxlt-200.jpeg 200w, https://justinpinkney.com/img/5p_Bfmrxlt-320.jpeg 320w, https://justinpinkney.com/img/5p_Bfmrxlt-500.jpeg 500w, https://justinpinkney.com/img/5p_Bfmrxlt-800.jpeg 800w, https://justinpinkney.com/img/5p_Bfmrxlt-1024.jpeg 1024w, https://justinpinkney.com/img/5p_Bfmrxlt-1600.jpeg 1600w, https://justinpinkney.com/img/5p_Bfmrxlt-2048.jpeg 2048w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;water droplet collision&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/5p_Bfmrxlt-200.jpeg&quot; width=&quot;2048&quot; height=&quot;1181&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;
Hovering
&lt;/div&gt;
&lt;a href=&quot;https://justinpinkney.com/blog/2012/water-drops/DSC03899-Edit.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/pqTRS1Hd4o-200.webp 200w, https://justinpinkney.com/img/pqTRS1Hd4o-320.webp 320w, https://justinpinkney.com/img/pqTRS1Hd4o-500.webp 500w, https://justinpinkney.com/img/pqTRS1Hd4o-800.webp 800w, https://justinpinkney.com/img/pqTRS1Hd4o-1024.webp 1024w, https://justinpinkney.com/img/pqTRS1Hd4o-1600.webp 1600w, https://justinpinkney.com/img/pqTRS1Hd4o-2048.webp 2048w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/pqTRS1Hd4o-200.jpeg 200w, https://justinpinkney.com/img/pqTRS1Hd4o-320.jpeg 320w, https://justinpinkney.com/img/pqTRS1Hd4o-500.jpeg 500w, https://justinpinkney.com/img/pqTRS1Hd4o-800.jpeg 800w, https://justinpinkney.com/img/pqTRS1Hd4o-1024.jpeg 1024w, https://justinpinkney.com/img/pqTRS1Hd4o-1600.jpeg 1600w, https://justinpinkney.com/img/pqTRS1Hd4o-2048.jpeg 2048w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;water droplet collision&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/pqTRS1Hd4o-200.jpeg&quot; width=&quot;2048&quot; height=&quot;1084&quot;&gt;&lt;/picture&gt;&lt;/a&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2012/water-drops/DSC04098.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/nbLkC8Mfjd-200.webp 200w, https://justinpinkney.com/img/nbLkC8Mfjd-320.webp 320w, https://justinpinkney.com/img/nbLkC8Mfjd-500.webp 500w, https://justinpinkney.com/img/nbLkC8Mfjd-800.webp 800w, https://justinpinkney.com/img/nbLkC8Mfjd-1024.webp 1024w, https://justinpinkney.com/img/nbLkC8Mfjd-1600.webp 1600w, https://justinpinkney.com/img/nbLkC8Mfjd-2048.webp 2048w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/nbLkC8Mfjd-200.jpeg 200w, https://justinpinkney.com/img/nbLkC8Mfjd-320.jpeg 320w, https://justinpinkney.com/img/nbLkC8Mfjd-500.jpeg 500w, https://justinpinkney.com/img/nbLkC8Mfjd-800.jpeg 800w, https://justinpinkney.com/img/nbLkC8Mfjd-1024.jpeg 1024w, https://justinpinkney.com/img/nbLkC8Mfjd-1600.jpeg 1600w, https://justinpinkney.com/img/nbLkC8Mfjd-2048.jpeg 2048w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;Droopy water droplet collision&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/nbLkC8Mfjd-200.jpeg&quot; width=&quot;2048&quot; height=&quot;1144&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;
Pink water drop
&lt;/div&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2012/water-drops/DSC04032.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/7uDSrPXano-200.webp 200w, https://justinpinkney.com/img/7uDSrPXano-320.webp 320w, https://justinpinkney.com/img/7uDSrPXano-500.webp 500w, https://justinpinkney.com/img/7uDSrPXano-800.webp 800w, https://justinpinkney.com/img/7uDSrPXano-1024.webp 1024w, https://justinpinkney.com/img/7uDSrPXano-1600.webp 1600w, https://justinpinkney.com/img/7uDSrPXano-2048.webp 2048w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/7uDSrPXano-200.jpeg 200w, https://justinpinkney.com/img/7uDSrPXano-320.jpeg 320w, https://justinpinkney.com/img/7uDSrPXano-500.jpeg 500w, https://justinpinkney.com/img/7uDSrPXano-800.jpeg 800w, https://justinpinkney.com/img/7uDSrPXano-1024.jpeg 1024w, https://justinpinkney.com/img/7uDSrPXano-1600.jpeg 1600w, https://justinpinkney.com/img/7uDSrPXano-2048.jpeg 2048w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;Red water drop hand shaped splash&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/7uDSrPXano-200.jpeg&quot; width=&quot;2048&quot; height=&quot;1502&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;
A hand?
&lt;/div&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2012/water-drops/DSC04108.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/lbFOw_CQ-Q-200.webp 200w, https://justinpinkney.com/img/lbFOw_CQ-Q-320.webp 320w, https://justinpinkney.com/img/lbFOw_CQ-Q-500.webp 500w, https://justinpinkney.com/img/lbFOw_CQ-Q-800.webp 800w, https://justinpinkney.com/img/lbFOw_CQ-Q-1024.webp 1024w, https://justinpinkney.com/img/lbFOw_CQ-Q-1600.webp 1600w, https://justinpinkney.com/img/lbFOw_CQ-Q-2048.webp 2048w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/lbFOw_CQ-Q-200.jpeg 200w, https://justinpinkney.com/img/lbFOw_CQ-Q-320.jpeg 320w, https://justinpinkney.com/img/lbFOw_CQ-Q-500.jpeg 500w, https://justinpinkney.com/img/lbFOw_CQ-Q-800.jpeg 800w, https://justinpinkney.com/img/lbFOw_CQ-Q-1024.jpeg 1024w, https://justinpinkney.com/img/lbFOw_CQ-Q-1600.jpeg 1600w, https://justinpinkney.com/img/lbFOw_CQ-Q-2048.jpeg 2048w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;UFO shaped milk splash in water&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/lbFOw_CQ-Q-200.jpeg&quot; width=&quot;2048&quot; height=&quot;1333&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;
Milk and water drop ufo
&lt;/div&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2012/water-drops/DSC04087.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/0bkT76IOIL-200.webp 200w, https://justinpinkney.com/img/0bkT76IOIL-320.webp 320w, https://justinpinkney.com/img/0bkT76IOIL-500.webp 500w, https://justinpinkney.com/img/0bkT76IOIL-800.webp 800w, https://justinpinkney.com/img/0bkT76IOIL-1024.webp 1024w, https://justinpinkney.com/img/0bkT76IOIL-1600.webp 1600w, https://justinpinkney.com/img/0bkT76IOIL-2048.webp 2048w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/0bkT76IOIL-200.jpeg 200w, https://justinpinkney.com/img/0bkT76IOIL-320.jpeg 320w, https://justinpinkney.com/img/0bkT76IOIL-500.jpeg 500w, https://justinpinkney.com/img/0bkT76IOIL-800.jpeg 800w, https://justinpinkney.com/img/0bkT76IOIL-1024.jpeg 1024w, https://justinpinkney.com/img/0bkT76IOIL-1600.jpeg 1600w, https://justinpinkney.com/img/0bkT76IOIL-2048.jpeg 2048w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;Pink high speed water splash crown&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/0bkT76IOIL-200.jpeg&quot; width=&quot;2048&quot; height=&quot;1297&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;
Water splash crown
&lt;/div&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2012/water-drops/DSC04128.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/KjO8aRXqWO-200.webp 200w, https://justinpinkney.com/img/KjO8aRXqWO-320.webp 320w, https://justinpinkney.com/img/KjO8aRXqWO-500.webp 500w, https://justinpinkney.com/img/KjO8aRXqWO-800.webp 800w, https://justinpinkney.com/img/KjO8aRXqWO-1024.webp 1024w, https://justinpinkney.com/img/KjO8aRXqWO-1600.webp 1600w, https://justinpinkney.com/img/KjO8aRXqWO-2048.webp 2048w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/KjO8aRXqWO-200.jpeg 200w, https://justinpinkney.com/img/KjO8aRXqWO-320.jpeg 320w, https://justinpinkney.com/img/KjO8aRXqWO-500.jpeg 500w, https://justinpinkney.com/img/KjO8aRXqWO-800.jpeg 800w, https://justinpinkney.com/img/KjO8aRXqWO-1024.jpeg 1024w, https://justinpinkney.com/img/KjO8aRXqWO-1600.jpeg 1600w, https://justinpinkney.com/img/KjO8aRXqWO-2048.jpeg 2048w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;Milk into water mushroom shaped splash&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/KjO8aRXqWO-200.jpeg&quot; width=&quot;2048&quot; height=&quot;1111&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;
Water splash mushroom
&lt;/div&gt;
&lt;p&gt;&lt;a href=&quot;https://justinpinkney.com/blog/2012/water-drops/DSC04090.jpg&quot;&gt;&lt;picture&gt;&lt;source type=&quot;image/webp&quot; srcset=&quot;https://justinpinkney.com/img/14riiYB3w7-200.webp 200w, https://justinpinkney.com/img/14riiYB3w7-320.webp 320w, https://justinpinkney.com/img/14riiYB3w7-500.webp 500w, https://justinpinkney.com/img/14riiYB3w7-800.webp 800w, https://justinpinkney.com/img/14riiYB3w7-1024.webp 1024w, https://justinpinkney.com/img/14riiYB3w7-1600.webp 1600w, https://justinpinkney.com/img/14riiYB3w7-2048.webp 2048w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;source type=&quot;image/jpeg&quot; srcset=&quot;https://justinpinkney.com/img/14riiYB3w7-200.jpeg 200w, https://justinpinkney.com/img/14riiYB3w7-320.jpeg 320w, https://justinpinkney.com/img/14riiYB3w7-500.jpeg 500w, https://justinpinkney.com/img/14riiYB3w7-800.jpeg 800w, https://justinpinkney.com/img/14riiYB3w7-1024.jpeg 1024w, https://justinpinkney.com/img/14riiYB3w7-1600.jpeg 1600w, https://justinpinkney.com/img/14riiYB3w7-2048.jpeg 2048w&quot; sizes=&quot;(max-width:640px) 100vw, 640px&quot;&gt;&lt;img alt=&quot;Mushroom shaped water drop&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; src=&quot;https://justinpinkney.com/img/14riiYB3w7-200.jpeg&quot; width=&quot;2048&quot; height=&quot;1208&quot;&gt;&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;
Another mushroom
&lt;/div&gt;
</content>
	</entry>
</feed>
