It is June 2023, and I am sitting on Koh Phangan, a Thai paradise island, recovering from a norovirus infection. So I am finally taking the time to write a few lines about this website's cover pictures.
Lacking photos of my own when I started my journey back in December 2022, I fed prompts describing experiences I was anticipating into Stable Diffusion, a "text-to-image diffusion model capable of generating photo-realistic images". Today, I am sharing some of the generated pictures along with my text prompts. I have also fed the same prompts into Adobe's Firefly image generator, which was released in March 2023 (currently free; registration required). It is interesting to see where the two models' outputs resemble each other and where they differ.
Prompt: The silhouette of a man standing on the rooftop of skyscraper of a city at night. Use only eight colors, mostly dark blue.
Remark: For each prompt, the first three images are from Stable Diffusion (often picked from multiple runs). The second set of three images was created by Firefly in a single run with the default preset (Shape: square, Style: art).
Prompt: Artistic cartoon of a southeast asian beach with five people drinking cocktails in a bar.
Prompt: Monet style painting of abandoned temple ruins in Cambodia.
Prompt: Cartoon of a man with backpack hiking on a volcano in the jungle with colorful birds in the sky. Be very colorful.
Prompt: Cartoon of a jungle landscape with a man lying in a hammock while reading a book. Use only eight colors, mostly green.
Prompt: Pencil drawing of a bearded man with laptop sitting on the porch of a bamboo hut surrounded by palm trees.
Prompt: Mannerism style painting of a man scuba diving who is surrounded with colorful fish.
Prompt: Oil painting of a sunset over the ocean with reflections in the water. In the right half, the sillouette of a man in a small boat who looks through binoculars like a pirate captain.
There are striking similarities between the two generators, especially in image composition. These likely reflect, and amplify, compositional biases in the training data. As artificially generated images become more prevalent in our environment, I expect our collective idea of how a scuba diving scene or a sunset over the sea is supposed to look to converge more and more. In other words, we might end up in a collective bubble much narrower than the range of ideas we hold today.
Yet there are also noticeable differences. While Adobe's Firefly generates higher-resolution images and shows a more detailed understanding of complicated prompts (e.g., the binoculars in the sunset scene), Stable Diffusion offers a richer set of styles (Mannerism, Monet style, ...). In general, Firefly's images often look more cartoonish, are more consistent across image instances, and show less of what I would call "artistic expression" than Stable Diffusion's. Finally, Firefly clearly showed more ethnic diversity among the protagonists, often bringing Black or Asian people into the scene, while Stable Diffusion's protagonists were still predominantly White.