Riffusion generates AI music from Stable Diffusion images


Researchers create music from text via a Stable Diffusion detour.

Despite the significant impact generative AI models have had on the text and image industries, the music industry has yet to experience such a drastic transformation.

There are, however, early examples of generative AI in audio, such as Riffusion, an AI music generator developed by entrepreneur Seth Forsgren and engineer Hayk Martiros. Based on the open-source Stable Diffusion model, originally designed for images, Riffusion demonstrates the potential of AI to shape the future of music creation.

Riffusion generates music from Stable Diffusion images. As with image AI systems, all you need is a prompt. | Image: Screenshot / THE DECODER

Stable Diffusion generates spectrograms, which then become music

Riffusion takes a simple approach to music generation: Stable Diffusion v1.5 generates spectrogram images, which are then converted into audio. The model is simply fine-tuned on images of spectrograms rather than retrained from scratch, the developers write.
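The second step of that pipeline, turning a magnitude spectrogram back into a waveform, can be approximated with the classic Griffin-Lim algorithm, which iteratively re-estimates the phase that the image does not store. The following is a minimal NumPy sketch of that idea, not Riffusion's actual code; the FFT size, hop length, and iteration count are arbitrary illustrative choices:

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    # Windowed short-time Fourier transform: one complex row per frame
    w = np.hanning(n_fft)
    n = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * w for i in range(n)])
    return np.fft.rfft(frames, axis=1)

def istft(S, n_fft=512, hop=128):
    # Inverse STFT via windowed overlap-add
    w = np.hanning(n_fft)
    frames = np.fft.irfft(S, n=n_fft, axis=1) * w
    length = n_fft + hop * (len(frames) - 1)
    x = np.zeros(length)
    norm = np.zeros(length)
    for i, f in enumerate(frames):
        x[i * hop : i * hop + n_fft] += f
        norm[i * hop : i * hop + n_fft] += w ** 2
    return x / np.maximum(norm, 1e-8)

def griffin_lim(mag, iters=32):
    # Start from random phase, then alternate: go to the time domain,
    # come back, keep the new phase but force the known magnitudes.
    S = mag * np.exp(2j * np.pi * np.random.default_rng(0).random(mag.shape))
    for _ in range(iters):
        x = istft(S)
        S = mag * np.exp(1j * np.angle(stft(x)))
    return istft(S)

# Round trip: the magnitudes of a 440 Hz sine come back as a sine-like wave
sr = 22050
t = np.arange(sr) / sr
mag = np.abs(stft(np.sin(2 * np.pi * 440 * t)))
audio = griffin_lim(mag)
```

Because a spectrogram image discards phase, any reconstruction like this is approximate, which is one reason generated audio can sound slightly smeared.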


A spectrogram is a visual representation of the frequency content of a section of audio. The x-axis represents time, the y-axis represents frequency, and the color of each pixel indicates the amplitude of the sound at that point.
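To make this concrete, a magnitude spectrogram can be computed with a short-time Fourier transform in plain NumPy. This is an illustration, not Riffusion's code; the FFT size and hop length are arbitrary choices:

```python
import numpy as np

def spectrogram(signal, n_fft=512, hop=128):
    """Magnitude spectrogram: rows are frequency bins, columns are time frames."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    # rfft yields n_fft // 2 + 1 frequency bins per frame
    return np.abs(np.fft.rfft(frames, axis=1)).T

# A one-second 440 Hz sine at a 22,050 Hz sample rate
sr = 22050
t = np.arange(sr) / sr
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
peak_bin = int(spec.sum(axis=1).argmax())
print(spec.shape)            # (frequency bins, time frames)
print(peak_bin * sr / 512)   # energy concentrated near 440 Hz
```

Rendered as an image, bright pixels in such an array are exactly what Stable Diffusion learns to paint.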


Riffusion can create endless variations of a prompt by varying the seed. All known Stable Diffusion techniques such as img2img, inpainting, and negative prompts are ready to use.

When providing prompts, get creative! Try your favorite styles, instruments like saxophone or violin, modifiers like Arabic or Jamaican, genres like jazz or rock, sounds like church bells or rain, or any combination. Many words that are not present in the training data still work because the text encoder can associate words with similar semantics.

The closer a prompt is in spirit to the original frame and BPM, the better the results. For example, a prompt for a genre whose BPM is much faster than the starting frame will result in a poor, generic sound.


Try Riffusion for free

You can try Riffusion directly on the official website without registration. The parameters are limited to five different starting images, which affect the melodic patterns, and four levels of denoising. The higher the denoising factor you choose, the more creative the result will be, but the less it will stay on the beat.



Riffusion Demo – Prompt: “A robotic skull with a half-seen neural network in the brain and a violin on the shoulder”.

Riffusion allows users to share their generated beats with others via a link or download a five-second snippet in MP3 format for further processing in audio software. User-generated sounds can also be found on the Riffusion subreddit.

Additionally, users can create custom Riffusion models trained on specific artists or bands, such as Rammstein (sound samples are available). Although the generated sound is not of the highest quality, the distinctive style of the chosen band is clearly recognizable. A tutorial on how to create these custom models can be found on Reddit.
