DiffRhythm: Lyrics to Song | Text to Audio
Your request will cost $0.01 per 10 seconds of generated audio. For $1 you can generate 1,000 seconds of music from lyrics.
DiffRhythm generates full songs from timestamped lyrics in under 30 seconds at $0.001 per second of audio. Trading maximum musical complexity for speed and cost efficiency, the model produces 95-second or 285-second tracks with reference audio conditioning and style control. Built for developers who need rapid music prototyping without per-generation costs spiraling into double digits.
Use Cases: Lyric Demo Creation | Video Background Music | Rapid Music Prototyping
Performance
At $0.01 per 10 seconds of audio, versus $0.05 or more for alternatives, DiffRhythm is roughly 5-10x cheaper for full-length music generation.
| Metric | Result | Context |
|---|---|---|
| Generation Speed | Under 30 seconds | Full 95- or 285-second songs |
| Cost per 10 seconds | $0.01 | 1,000 seconds per $1.00 on fal |
| Track Duration | 95 or 285 seconds | Two duration modes: standard (95s) or extended (285s) |
| Reference Audio | Supported | Style transfer via URL input |
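The pricing row above reduces to simple arithmetic. The sketch below works it out for the two duration modes, assuming billing is strictly linear in generated seconds (whether usage is rounded up to 10-second blocks is not stated here). The function name `estimate_cost_usd` is illustrative, not part of any API.

```python
from decimal import Decimal

# $0.01 per 10 seconds of audio, i.e. $0.001 per second (from the table above).
PRICE_PER_SECOND = Decimal("0.001")


def estimate_cost_usd(audio_seconds: int) -> Decimal:
    """Estimate the cost of a track, ASSUMING linear per-second billing."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    return PRICE_PER_SECOND * audio_seconds


if __name__ == "__main__":
    for duration in (95, 285):  # the two documented duration modes
        print(f"{duration}s track: ${estimate_cost_usd(duration)}")
```

Under that assumption, a standard 95-second track comes to $0.095 and an extended 285-second track to $0.285.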
Structured Music Control Without DAW Complexity
DiffRhythm processes timestamped lyrics with precise timing markers, each line tagged with exact second placement. This structured input format contrasts with freeform text-to-music models that interpret vague descriptions.
What this means for you:
- Exact timing control: Input format `[00:10.00]Moonlight spills through broken blinds` ensures lyrics sync to specific timestamps, not algorithmic guesses
- Reference audio conditioning: Supply existing tracks via URL to guide musical style, instrumentation, and genre characteristics
- Configurable generation parameters: Adjust CFG strength (1-10 range), scheduler type (Euler/Midpoint/RK4/Implicit Adams), and inference steps (10-100) for quality-speed tradeoffs
- Dual duration modes: Generate 95-second tracks for rapid iteration or 285-second extended versions for full song development
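The timestamped lyric format described above can be produced programmatically. This is a minimal sketch that renders `(start_seconds, text)` pairs in the `[mm:ss.xx]` form shown in the example line. It assumes one lyric line per text line, in ascending time order; neither detail is confirmed here. The helper names and the second lyric line are made up for illustration.

```python
from typing import Iterable, Tuple


def format_timestamp(seconds: float) -> str:
    """Render a start time in seconds as [mm:ss.xx]."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    total_centis = round(seconds * 100)
    minutes, remainder = divmod(total_centis, 6000)
    secs, centis = divmod(remainder, 100)
    return f"[{minutes:02d}:{secs:02d}.{centis:02d}]"


def build_lyrics(lines: Iterable[Tuple[float, str]]) -> str:
    """Join (start_seconds, text) pairs into newline-separated timestamped lyrics.

    Sorting by start time and using newlines as separators are assumptions.
    """
    ordered = sorted(lines, key=lambda item: item[0])
    return "\n".join(f"{format_timestamp(start)}{text}" for start, text in ordered)


if __name__ == "__main__":
    print(build_lyrics([
        (10.0, "Moonlight spills through broken blinds"),
        (75.5, "A placeholder second line"),  # illustrative text only
    ]))
```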
Technical Specifications
| Spec | Details |
|---|---|
| Architecture | DiffRhythm |
| Input Formats | Timestamped lyrics (text), reference audio URL (mp3/wav/m4a/aac/ogg) |
| Output Formats | WAV audio (application/octet-stream) |
| Duration Options | 95 seconds, 285 seconds |
| License | Commercial use permitted |
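The documented ranges and formats can be checked client-side before a request is sent. The sketch below assembles a request payload as a plain dictionary and validates it against the limits stated on this page. Every field name (`lyrics`, `duration_seconds`, `reference_audio_url`, `cfg_strength`, `scheduler`, `inference_steps`) and the scheduler spellings are hypothetical placeholders; consult the API documentation for the real schema.

```python
from typing import Optional
from urllib.parse import urlparse

ALLOWED_DURATIONS = (95, 285)  # seconds, per the spec table
ALLOWED_SCHEDULERS = ("euler", "midpoint", "rk4", "implicit_adams")  # spellings assumed
ALLOWED_AUDIO_EXTENSIONS = (".mp3", ".wav", ".m4a", ".aac", ".ogg")


def build_request(
    lyrics: str,
    duration_seconds: int = 95,
    reference_audio_url: Optional[str] = None,
    cfg_strength: Optional[float] = None,
    scheduler: Optional[str] = None,
    inference_steps: Optional[int] = None,
) -> dict:
    """Validate inputs against the documented limits; field names are hypothetical."""
    if not lyrics.strip():
        raise ValueError("lyrics must not be empty")
    if duration_seconds not in ALLOWED_DURATIONS:
        raise ValueError(f"duration must be one of {ALLOWED_DURATIONS}")
    payload = {"lyrics": lyrics, "duration_seconds": duration_seconds}

    if reference_audio_url is not None:
        # Assumes the URL path ends in the file extension; real URLs may not.
        path = urlparse(reference_audio_url).path.lower()
        if not path.endswith(ALLOWED_AUDIO_EXTENSIONS):
            raise ValueError("unsupported reference audio type")
        payload["reference_audio_url"] = reference_audio_url
    if cfg_strength is not None:
        if not 1 <= cfg_strength <= 10:
            raise ValueError("cfg_strength must be in the 1-10 range")
        payload["cfg_strength"] = cfg_strength
    if scheduler is not None:
        if scheduler not in ALLOWED_SCHEDULERS:
            raise ValueError(f"scheduler must be one of {ALLOWED_SCHEDULERS}")
        payload["scheduler"] = scheduler
    if inference_steps is not None:
        if not 10 <= inference_steps <= 100:
            raise ValueError("inference_steps must be in the 10-100 range")
        payload["inference_steps"] = inference_steps
    return payload
```

Optional parameters are left out of the payload when not supplied, so the service's own defaults (not documented here) would apply.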
How It Stacks Up
Sonauto V2 Text to Audio – DiffRhythm ($0.001/sec) prioritizes timestamped lyric control and reference audio conditioning for structured song generation at 5x lower cost than Sonauto's freeform text-to-music approach. Sonauto V2 emphasizes natural language music descriptions without timestamp requirements for more exploratory creative workflows.