Version 2.0 of Stable Diffusion brings many advances. The most important new feature is the improved OpenCLIP text encoder.
In August 2022, AI startup Stability AI, in collaboration with RunwayML, LMU Munich, EleutherAI and LAION, released Stable Diffusion, an open source image AI that was immediately well received by the community.
Stable Diffusion can be used online for a fee and with content filters, or downloaded for free and used locally with no content restrictions. Version 2.0 continues this open source approach, with Stability AI leading the way.
Improved text encoder and new image modes
For version 2.0, the team used OpenCLIP (Contrastive Language-Image Pre-training), an enhanced version of the multi-modal AI system that learns visual concepts from natural language in a self-supervised way. OpenCLIP was released by LAION in three versions in mid-September and is now implemented in Stable Diffusion. Stability AI supported the development of OpenCLIP. CLIP models compute representations of images and text as embeddings and compare their similarity. This way, an AI system can generate an image that matches a given text.
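The comparison CLIP performs boils down to measuring how close an image embedding and a text embedding point in the same direction. A minimal sketch with NumPy, using made-up toy vectors (real CLIP embeddings have hundreds of dimensions):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors (1.0 = same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-ins for CLIP embeddings; values are illustrative only.
text_embedding = np.array([0.2, 0.8, 0.1, 0.4])
matching_image = np.array([0.25, 0.75, 0.15, 0.35])   # similar direction: high score
unrelated_image = np.array([0.9, -0.1, 0.7, -0.5])    # different direction: low score

print(cosine_similarity(text_embedding, matching_image))   # near 1.0
print(cosine_similarity(text_embedding, unrelated_image))  # near 0 or negative
```

A text-to-image system can use scores like these to steer generation toward images whose embedding is close to that of the prompt.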
Thanks to this new text encoder, Stable Diffusion 2.0 can generate significantly better images than version 1.0, according to Stability AI. The model can generate images at resolutions of 512×512 and 768×768 pixels, which can then be scaled up to 2048×2048 pixels by an upscaler diffusion model that is also new.
The new OpenCLIP model was trained on an "aesthetic dataset" compiled by Stability AI based on the LAION-5B dataset. Sexual and pornographic content was filtered out beforehand.
Also new is an image depth model, which analyzes the depth of an input image and then uses text input to transform it into new images that retain the outlines and depth structure of the original.
Stable Diffusion version 2.0 also gets an inpainting model that can be used to replace individual elements in an existing image, such as painting a cap or a VR headset onto someone's head.
Finally, we also include a new text-guided inpainting model, refined on the new Stable Diffusion 2.0 base text-to-image model, which makes it very easy and quick to change parts of an image.
– Stability AI (@StabilityAI) November 24, 2022
Open source as a model of success
Despite the many improvements, Stable Diffusion version 2.0 should still work locally on a single powerful graphics card with enough memory.
We have already seen that when millions of people get their hands on these models, they collectively create some truly amazing things. This is the power of open source: harnessing the vast potential of millions of talented people who may not have the resources to train a cutting-edge model, but have the ability to do something amazing with one.
More information and access to the new models is available on GitHub. The models should also be available in the Stable Diffusion DreamStudio web interface in the next few days. Developers can find more information in the Stability AI API documentation.