Yesterday, Stability AI released a new model called “Stable Cascade”, and it has been generating significant excitement across social media. Few resources and support materials are available yet for developers or users who want to start experimenting with it. Within a week or so, however, support should arrive in Automatic1111 and ComfyUI.
If you cannot wait to try it, a demo is available on the Hugging Face page. Enter your prompt and click ‘Run’. After a few seconds of GPU processing, you will see the results. The sample results so far are very encouraging: image composition is strong and text is placed accurately. The model supports natural-language prompts, but it works equally well with older-style prompts.
Stable Cascade is built on the Würstchen architecture and distinguishes itself from models like Stable Diffusion by working in a much smaller latent space. A smaller latent space means faster inference and lower training cost. Stable Diffusion uses a compression factor of 8, encoding a 1024×1024 image to 128×128. Stable Cascade achieves a compression factor of 42, encoding the same image to 24×24 while maintaining sharp reconstructions. Known extensions such as finetuning, LoRA, ControlNet, IP-Adapter, and LCM are also possible with this method.
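To make the compression numbers concrete, here is a minimal sketch of the arithmetic in Python. The helper names `latent_side` and `spatial_positions` are made up for illustration and are not part of any Stable Cascade or Stable Diffusion API. The sketch counts only spatial positions in the latent grid and ignores channel counts, which differ between the models, so treat the ratio as a rough indication rather than a measure of actual speed-up.

```python
def latent_side(image_side: int, compression_factor: int) -> int:
    """Side length of the latent grid, rounded down to a whole cell."""
    return image_side // compression_factor


def spatial_positions(side: int) -> int:
    """Number of cells in a square latent grid with the given side."""
    return side * side


IMAGE_SIDE = 1024  # pixels per side, as in the example above

# Compression factors quoted in the text.
sd_side = latent_side(IMAGE_SIDE, 8)
cascade_side = latent_side(IMAGE_SIDE, 42)

sd_positions = spatial_positions(sd_side)
cascade_positions = spatial_positions(cascade_side)

# How many times fewer spatial positions the smaller grid has.
ratio = sd_positions / cascade_positions

print(f"Factor 8:  {sd_side}x{sd_side} = {sd_positions} positions")
print(f"Factor 42: {cascade_side}x{cascade_side} = {cascade_positions} positions")
print(f"Roughly {ratio:.1f}x fewer spatial positions at factor 42")
```

Note that 1024 divided by 42 is not a whole number (about 24.4), so the 24×24 figure corresponds to rounding down, which is what the integer division here does.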
A couple of sample images were created with the Hugging Face demo to showcase the new release. One features a closed red book on a wooden table with a cup of coffee resting on it; the other depicts the word “love” emblazoned on a heart-shaped balloon in a botanical garden. Both show attractive compositions.
Early observations suggest that Stable Cascade produces promising results that are more refined than those of SDXL. Composition and integrated text are better, as is the model’s comprehension of natural-language prompts. More experiments will follow as user interfaces such as Automatic1111 and ComfyUI start supporting the new model. The wait should not be much longer.