AI image generation is here in a big way. A new open-source image synthesis model called Stable Diffusion allows anyone with a PC and a decent GPU to conjure almost any visual reality they can imagine. It can mimic virtually any visual style, and if you give it a descriptive phrase, the results appear on your screen like magic.
Some artists are delighted by the prospect, others are not happy about it, and society as a whole still seems largely unaware of the rapidly evolving technological revolution taking place across communities on Twitter, Discord, and GitHub. Image synthesis arguably has implications as important as the invention of the camera or perhaps the creation of visual art itself. Even our sense of history could be at stake, depending on how things unfold. Either way, Stable Diffusion is leading a new wave of deep learning creative tools that are poised to revolutionize visual media creation.
The Rise of Deep Learning Image Synthesis
Stable Diffusion is the brainchild of Emad Mostaque, a London-based former hedge fund manager whose aim is to bring new deep learning applications to the masses through his company, Stability AI. But the roots of modern image synthesis go back to 2014, and Stable Diffusion wasn’t the first image synthesis model (ISM) to make waves this year.
In April 2022, OpenAI announced DALL-E 2, which shocked social media with its ability to transform a scene written in words (called a “prompt”) into a myriad of visual styles that can be fantastical, photorealistic, or even mundane. People with privileged access to the closed tool have generated astronauts on horseback, teddy bears buying bread in ancient Egypt, never-before-seen sculptures in the style of famous artists, and much more.
Shortly after DALL-E 2, Google and Meta announced their own text-to-image AI models. MidJourney, available as a Discord server since March 2022 and open to the public a few months later, charges for access and produces similar effects but with a more painterly and illustrative quality by default.
Then there is Stable Diffusion. On August 22, Stability AI released its open source image generation model, which arguably matches DALL-E 2 in quality. Stability AI also launched its own commercial website, called DreamStudio, which sells access to compute time for generating images with Stable Diffusion. Unlike DALL-E 2, anyone can use it, and because the Stable Diffusion code is open source, projects can build on it with few restrictions.
In the past week alone, dozens of projects that take Stable Diffusion in radical new directions have emerged. People have also gotten unexpected results using a technique called “img2img” that has “upscaled” MS-DOS game art, converted Minecraft graphics into realistic renderings, transformed a scene from Aladdin into 3D, turned childish scribbles into rich illustrations, and much more. Image synthesis may bring the ability to richly visualize ideas to mass audiences, lowering barriers to entry while accelerating the capabilities of artists who embrace the technology, much as Adobe Photoshop did in the 1990s.
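To make the img2img idea concrete: instead of starting from pure noise, the model starts from an existing image and re-renders it according to a text prompt, with a strength setting controlling how far the output may drift from the input. The sketch below uses Hugging Face's `diffusers` library, which is an assumption on our part; the article names no specific toolchain, and the model ID and parameter names are that library's, not the article's.

```python
def stylize_image(input_path, prompt, out_path="stylized.png", strength=0.75):
    """Re-render an existing image according to a text prompt (img2img)."""
    # Heavy dependencies are imported lazily so this helper can be defined
    # even on a machine without torch/diffusers installed.
    import torch
    from PIL import Image
    from diffusers import StableDiffusionImg2ImgPipeline

    device = "cuda" if torch.cuda.is_available() else "cpu"
    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4"  # several GB of weights, fetched on first use
    ).to(device)

    # Stable Diffusion v1 works best at 512x512.
    init_image = Image.open(input_path).convert("RGB").resize((512, 512))
    result = pipe(
        prompt=prompt,
        image=init_image,     # the starting picture to be re-rendered
        strength=strength,    # near 0 keeps the input; near 1 nearly ignores it
        guidance_scale=7.5,   # how strongly to follow the prompt
    ).images[0]
    result.save(out_path)
    return out_path

# Example (requires a GPU and the downloaded weights; filenames are hypothetical):
# stylize_image("minecraft_screenshot.png", "a photorealistic mountain landscape")
```

A call like the commented example above is roughly how the Minecraft-to-realism conversions described here are produced.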
You can run Stable Diffusion locally yourself if you follow a somewhat obscure series of steps. For the past two weeks, we’ve been running it on a Windows PC with an Nvidia RTX 3060 12GB GPU. It can generate 512×512 images in about 10 seconds; on a 3090 Ti, that drops to about four seconds per image. Interfaces also continue to evolve rapidly, moving from crude command-line tools and Google Colab notebooks to more polished (but still complex) GUI front ends, with much more refined interfaces to come. So if you’re not technically inclined, hold on: easier solutions are on the way. And if all else fails, you can try an online demo.
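For those who want a sense of what running it locally involves, the core loop reduces to a few lines once the weights are downloaded. This is a minimal text-to-image sketch, again assuming Hugging Face's `diffusers` library; the article does not prescribe a particular setup, and the model ID and defaults here are that library's.

```python
def generate_image(prompt, out_path="out.png", steps=50, guidance=7.5, seed=None):
    """Generate one 512x512 image from a text prompt with Stable Diffusion."""
    # Imports are deferred so the function can be defined without the
    # heavy dependencies installed.
    import torch
    from diffusers import StableDiffusionPipeline

    device = "cuda" if torch.cuda.is_available() else "cpu"
    pipe = StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4"  # weights are downloaded on first run
    ).to(device)

    generator = None
    if seed is not None:
        # A fixed seed makes a given prompt reproducible.
        generator = torch.Generator(device=device).manual_seed(seed)

    image = pipe(
        prompt,
        num_inference_steps=steps,  # more steps: slower, often finer detail
        guidance_scale=guidance,    # how strongly to follow the prompt
        generator=generator,
    ).images[0]
    image.save(out_path)
    return out_path

# Example (requires a GPU and the model weights):
# generate_image("an astronaut riding a horse, oil painting", seed=42)
```

On the hardware described above, each call to this function corresponds to one roughly 10-second (RTX 3060) or 4-second (3090 Ti) generation.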