fal-ai/stable-audio-3/medium/audio-to-audio

Stable Audio 3 Medium audio-to-audio is a 1.4-billion-parameter latent diffusion model that transforms an input audio clip into new stereo variations of up to 6 minutes, guided by a text prompt.

Inference

Commercial use

Prompt examples

These examples were generated with Stable Audio 3 Medium (Audio to Audio). To customize one, click its "Playground" button.

arcade funk slap bass sparkle

Transform the source into bright arcade funk instrumental with slap bass, talkbox-style synth lead, and tight disco claps; preserve the main rhythmic contour while changing instrumentation, space, and groove. No vocals.

num_inference_steps: 8

guidance_scale: 1

seed: 730910
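
The example above could be reproduced programmatically along the following lines. This is a minimal sketch, not the endpoint's documented schema: the parameter names `prompt`, `num_inference_steps`, `guidance_scale`, and `seed` come from the example on this page, while the input field name `audio_url`, the helper names `build_arguments` and `transform`, and the fallback values 8 and 1 (copied from the example, not confirmed defaults) are assumptions. It also assumes the `fal-client` Python package is installed and a `FAL_KEY` environment variable is set.

```python
from typing import Any, Dict, Optional

# Endpoint identifier as shown on this page.
MODEL_ID = "fal-ai/stable-audio-3/medium/audio-to-audio"


def build_arguments(
    prompt: str,
    source_audio_url: str,
    num_inference_steps: int = 8,   # value from the example, not a confirmed default
    guidance_scale: float = 1,      # value from the example, not a confirmed default
    seed: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble the request payload (hypothetical helper)."""
    arguments: Dict[str, Any] = {
        "prompt": prompt,
        # ASSUMPTION: the input clip field is called "audio_url";
        # check the endpoint's schema before relying on this name.
        "audio_url": source_audio_url,
        "num_inference_steps": num_inference_steps,
        "guidance_scale": guidance_scale,
    }
    # Only send a seed when one was requested.
    if seed is not None:
        arguments["seed"] = seed
    return arguments


def transform(arguments: Dict[str, Any]) -> Any:
    """Submit the request and block until the result is ready (hypothetical helper)."""
    # Imported lazily so the payload helper works without the package installed.
    import fal_client  # pip install fal-client; reads FAL_KEY from the environment

    return fal_client.subscribe(MODEL_ID, arguments=arguments)


# Usage (requires network access and a valid FAL_KEY):
# args = build_arguments(
#     "Transform the source into bright arcade funk instrumental ... No vocals.",
#     "https://example.com/source.wav",  # placeholder URL
#     seed=730910,
# )
# result = transform(args)
```

Fixing the seed, as the example does, is what makes a run repeatable; omitting it leaves seed selection to the service.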

Stable Audio 3 Medium (Audio to Audio) API on fal