fal-ai/diffrhythm

DiffRhythm is a fast model for turning lyrics into full songs. It can generate a full song in less than 30 seconds.

Input

Lyrics*

[00:10.00]Moonlight spills through broken blinds
[00:13.20]Your shadow dances on the dashboard shrine
[00:16.85]Neon ghosts in gasoline rain
[00:20.40]I hear your laughter down the midnight train
[00:24.15]Static whispers through frayed wires
[00:27.65]Guitar strings hum our cathedral choirs
[00:31.30]Flicker screens show reruns of June
[00:34.90]I'm drowning in this mercury lagoon
[00:38.55]Electric veins pulse through concrete skies
[00:42.10]Your name echoes in the hollow where my heartbeat lies
[00:45.75]We're satellites trapped in parallel light
[00:49.25]Burning through the atmosphere of endless night
[01:00.00]Dusty vinyl spins reverse
[01:03.45]Our polaroid timeline bleeds through the verse
[01:07.10]Telescope aimed at dead stars
[01:10.65]Still tracing constellations through prison bars
[01:14.30]Electric veins pulse through concrete skies
[01:17.85]Your name echoes in the hollow where my heartbeat lies
[01:21.50]We're satellites trapped in parallel light
[01:25.05]Burning through the atmosphere of endless night
[02:10.00]Clockwork gears grind moonbeams to rust
[02:13.50]Our fingerprint smudged by interstellar dust
[02:17.15]Velvet thunder rolls through my veins
[02:20.70]Chasing phantom trains through solar plane
[02:24.35]Electric veins pulse through concrete skies
[02:27.90]Your name echoes in the hollow where my heartbeat lies

Reference Audio URL

Provide a URL to a reference audio file. Accepted file types: mp3, ogg, wav, m4a, aac

Result

{
  "audio": {
    "file_size": 33554520,
    "url": "https://v3.fal.media/files/elephant/VV4wtKXBpZL1bNv6en36t_output.wav",
    "file_name": "output.wav",
    "content_type": "application/octet-stream"
  }
}
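
As a minimal sketch of consuming this result, the snippet below parses the JSON shown above and pulls out the download URL and file size. The helper name `summarize_audio` is hypothetical. Because the response reports the generic `application/octet-stream` content type, the sketch infers the container from `file_name` instead; that is a design choice of this example, not documented behaviour.

```python
import json
from pathlib import PurePosixPath

# Response body copied from the example output above.
RESPONSE = """
{
  "audio": {
    "file_size": 33554520,
    "url": "https://v3.fal.media/files/elephant/VV4wtKXBpZL1bNv6en36t_output.wav",
    "file_name": "output.wav",
    "content_type": "application/octet-stream"
  }
}
"""


def summarize_audio(response_text: str) -> dict:
    """Extract the fields a caller typically needs from the result JSON."""
    audio = json.loads(response_text)["audio"]
    # content_type is generic here, so read the format from the file name.
    extension = PurePosixPath(audio["file_name"]).suffix.lstrip(".").lower()
    return {
        "url": audio["url"],
        "extension": extension,
        "size_mib": round(audio["file_size"] / (1024 * 1024), 2),
    }


summary = summarize_audio(RESPONSE)
print(summary)
```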

Your request will cost $0.01 per 10 seconds of generated audio. For $1 you can generate 1,000 seconds of music from lyrics.

DiffRhythm: Lyrics to Song | [text-to-audio]

DiffRhythm generates full songs from timestamped lyrics in under 30 seconds at $0.001 per second of audio. It trades maximum musical complexity for speed and cost efficiency, producing tracks of 95 or 285 seconds with reference audio conditioning and style control. It is built for developers who need rapid music prototyping without per-generation costs spiraling into double digits.

Use Cases: Lyric Demo Creation | Video Background Music | Rapid Music Prototyping

Performance

At $0.01 per 10 seconds versus $0.05+ for alternatives, DiffRhythm delivers 5-10x cost efficiency for full-length music generation.

Metric	Result	Context
Generation Speed	Under 30 seconds	Full songs of 95 or 285 seconds
Cost per 10 seconds	$0.01	1,000 seconds per $1.00 on fal
Track Duration	95 or 285 seconds	Two duration modes: standard (95s) or extended (285s)
Reference Audio	Supported	Style transfer via URL input
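
The pricing figures above can be checked with a little arithmetic. The sketch below assumes billing is prorated linearly at $0.01 per 10 seconds; the page does not say whether partial 10-second blocks are rounded up, so treat the results as estimates. The function names are hypothetical.

```python
from decimal import Decimal

# Price quoted on this page: $0.01 per 10 seconds of generated audio.
PRICE_PER_10_SECONDS = Decimal("0.01")


def estimate_cost(duration_seconds: int) -> Decimal:
    """Estimated cost in dollars, assuming linear (prorated) billing."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return PRICE_PER_10_SECONDS * Decimal(duration_seconds) / Decimal(10)


def seconds_per_dollar() -> int:
    """Seconds of audio that one dollar buys at the quoted price."""
    return int(Decimal(10) / PRICE_PER_10_SECONDS)


# The two duration modes listed in the table.
for mode_seconds in (95, 285):
    print(mode_seconds, "seconds:", estimate_cost(mode_seconds), "USD")
print("seconds per dollar:", seconds_per_dollar())
```

Under that assumption a standard 95-second track comes to about $0.095 and an extended 285-second track to about $0.285.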

Structured Music Control Without DAW Complexity

DiffRhythm processes timestamped lyrics with precise timing markers, each line tagged with exact second placement. This structured input format contrasts with freeform text-to-music models that interpret vague descriptions.
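
To make the input format concrete, here is a minimal sketch of a parser for `[mm:ss.xx]text` lines like those in the example lyrics. The checks it applies (two-digit fields, seconds below 60, strictly increasing timestamps) are assumptions made for this illustration; the page does not state what the model itself enforces. `LyricLine` and `parse_lyrics` are hypothetical names.

```python
import re
from typing import List, NamedTuple


class LyricLine(NamedTuple):
    seconds: float  # offset from the start of the track
    text: str


# Matches lines such as "[00:10.00]Moonlight spills through broken blinds".
LINE_PATTERN = re.compile(r"^\[(\d{2}):(\d{2})\.(\d{2})\](.+)$")


def parse_lyrics(lyrics: str) -> List[LyricLine]:
    """Parse timestamped lyrics, raising ValueError on malformed lines."""
    parsed: List[LyricLine] = []
    for number, raw in enumerate(lyrics.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between verses
        match = LINE_PATTERN.match(line)
        if match is None:
            raise ValueError(f"line {number} is not in [mm:ss.xx]text form: {line!r}")
        minutes, secs, hundredths, text = match.groups()
        if int(secs) >= 60:
            raise ValueError(f"line {number} has seconds >= 60: {line!r}")
        # Work in hundredths of a second, then convert once to avoid drift.
        centiseconds = int(minutes) * 6000 + int(secs) * 100 + int(hundredths)
        seconds = centiseconds / 100
        if parsed and seconds <= parsed[-1].seconds:
            raise ValueError(f"line {number} does not come after the previous line")
        parsed.append(LyricLine(seconds, text.strip()))
    return parsed


SAMPLE = """[00:10.00]Moonlight spills through broken blinds
[00:13.20]Your shadow dances on the dashboard shrine
[01:00.00]Dusty vinyl spins reverse"""

lines = parse_lyrics(SAMPLE)
for entry in lines:
    print(entry.seconds, entry.text)
```

Running a check like this before submitting a request catches formatting slips locally instead of after a generation attempt.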

What this means for you:

Exact timing control: Input format `[00:10.00]Moonlight spills through broken blinds` ensures lyrics sync to specific timestamps, not algorithmic guesses
Reference audio conditioning: Supply existing tracks via URL to guide musical style, instrumentation, and genre characteristics
Configurable generation parameters: Adjust CFG strength (1-10 range), scheduler type (Euler/Midpoint/RK4/Implicit Adams), and inference steps (10-100) for quality-speed tradeoffs
Dual duration modes: Generate 95-second tracks for rapid iteration or 285-second extended versions for full song development
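
The options listed above can be gathered into a request payload along the following lines. This is a sketch under stated assumptions: the field names (`lyrics`, `reference_audio_url`, `music_duration`, `cfg_strength`, `scheduler`, `num_inference_steps`), the scheduler spellings, and the `"95s"`/`"285s"` duration strings are guesses for illustration and should be checked against the model's schema. Only the value ranges come from this page. No defaults are invented; a field is sent only when the caller supplies it.

```python
from typing import Any, Dict, Optional

# Scheduler spellings are assumed; the page lists Euler/Midpoint/RK4/Implicit Adams.
SCHEDULERS = {"euler", "midpoint", "rk4", "implicit_adams"}
DURATIONS = {95, 285}  # the two duration modes described on this page


def build_arguments(
    lyrics: str,
    reference_audio_url: Optional[str] = None,
    duration_seconds: Optional[int] = None,
    cfg_strength: Optional[float] = None,
    scheduler: Optional[str] = None,
    inference_steps: Optional[int] = None,
) -> Dict[str, Any]:
    """Validate options against the ranges on this page and build a payload."""
    if not lyrics.strip():
        raise ValueError("lyrics must not be empty")
    arguments: Dict[str, Any] = {"lyrics": lyrics}
    if reference_audio_url is not None:
        arguments["reference_audio_url"] = reference_audio_url  # assumed field name
    if duration_seconds is not None:
        if duration_seconds not in DURATIONS:
            raise ValueError("duration_seconds must be 95 or 285")
        arguments["music_duration"] = f"{duration_seconds}s"  # assumed encoding
    if cfg_strength is not None:
        if not 1 <= cfg_strength <= 10:
            raise ValueError("cfg_strength must be between 1 and 10")
        arguments["cfg_strength"] = cfg_strength
    if scheduler is not None:
        if scheduler not in SCHEDULERS:
            raise ValueError(f"scheduler must be one of {sorted(SCHEDULERS)}")
        arguments["scheduler"] = scheduler
    if inference_steps is not None:
        if not 10 <= inference_steps <= 100:
            raise ValueError("inference_steps must be between 10 and 100")
        arguments["num_inference_steps"] = inference_steps  # assumed field name
    return arguments


arguments = build_arguments(
    "[00:10.00]Moonlight spills through broken blinds",
    duration_seconds=95,
    cfg_strength=4.0,
    scheduler="euler",
    inference_steps=32,
)
print(arguments)
```

The resulting dictionary is what a client library would send as the request arguments for `fal-ai/diffrhythm`. Fewer inference steps should generally favour speed and more steps quality, per the tradeoff described above.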

Technical Specifications

Spec	Details
Architecture	DiffRhythm
Input Formats	Timestamped lyrics (text), reference audio URL (mp3/wav/m4a/aac/ogg)
Output Formats	WAV audio (application/octet-stream)
Duration Options	95 seconds, 285 seconds
License	Commercial use permitted

How It Stacks Up

Sonauto V2 Text to Audio – DiffRhythm ($0.001/sec) prioritizes timestamped lyric control and reference audio conditioning for structured song generation at 5x lower cost than Sonauto's freeform text-to-music approach. Sonauto V2 emphasizes natural language music descriptions without timestamp requirements for more exploratory creative workflows.