MiniMax Music Text to Audio
Music 2.0 | [text-to-audio]
MiniMax's Music 2.0 model generates complete songs with synchronized lyrics at $0.03 per generation. Composition is structured and lyric-driven: the model requires both a musical direction and the lyrics upfront, and it offers no post-generation editing. It is built for creators who know exactly what they want to say and how they want it to sound.
Use Cases: Indie music production | Content soundtrack creation | Lyric-driven composition
Performance
At $0.03 per generation, MiniMax Music 2.0 delivers complete songs with synchronized vocals. One dollar covers 33 generations, which keeps rapid iteration affordable during creative development.
| Metric | Result | Context |
|---|---|---|
| Input Requirements | Dual-prompt system | Style prompt of 10-300 characters plus lyrics of 10-3,000 characters |
| Output Format | MP3 audio file | Complete song with vocals |
| Cost per Generation | $0.03 | 33 generations per $1.00 on fal |
| Lyric Structure | Tag-based arrangement | Supports [Intro], [Verse], [Chorus], [Bridge], [Outro] |
| Related Endpoints | v1, v1.5 | Earlier versions with different capabilities |
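The pricing row above reduces to simple arithmetic. The following is a minimal sketch, assuming the flat $0.03 rate quoted on this page and working in integer cents to avoid floating-point rounding. The function names are illustrative and are not part of any fal or MiniMax API.

```python
PRICE_CENTS = 3  # $0.03 per generation, as quoted on this page


def generations_for_budget(budget_cents: int) -> int:
    """Return how many whole generations fit in a budget given in cents."""
    if budget_cents < 0:
        raise ValueError("budget must be non-negative")
    return budget_cents // PRICE_CENTS


def cost_cents(generations: int) -> int:
    """Return the total cost in cents for a number of generations."""
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return generations * PRICE_CENTS


print(generations_for_budget(100))  # 33 generations for $1.00
print(cost_cents(50))               # 150 cents, i.e. $1.50 for 50 drafts
```

Integer division is the reason $1.00 yields 33 generations rather than 33.3: the leftover cent does not buy a partial song.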
Structured Composition Through Dual-Prompt Architecture
MiniMax Music 2.0 separates musical direction from lyrical content through a two-input system. Your style prompt (10-300 characters) defines mood, genre, and atmosphere, for example "Indie folk, melancholic, introspective, longing, solitary walk, coffee shop", while your lyrics prompt (10-3,000 characters) provides the actual words and song structure.
What this means for you:
- Precise arrangement control: Insert structural tags like [Verse], [Chorus], [Bridge] directly into lyrics to guide composition flow. The model builds the arrangement around your specified song structure.
- Lyric-synchronized generation: Audio output matches your exact lyrics with appropriate vocal delivery, eliminating the gap between instrumental generation and vocal production.
- Style-mood separation: Define musical characteristics independently from lyrical content, allowing you to experiment with different genres against the same lyrics.
- Extended lyric support: Up to 3,000 characters accommodates full song lyrics, including multiple verses, choruses, and bridge sections, without truncation.
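The two-input system described above can be checked locally before spending a generation. The sketch below validates the documented length limits and structure tags, then assembles a request body. The field names `prompt` and `lyrics_prompt` are assumptions made for illustration, so confirm them against the endpoint's API documentation. Rejecting tags outside the documented set is a conservative choice made here, not a confirmed rule of the model, and the sample lyrics are placeholders.

```python
import re

STYLE_LIMITS = (10, 300)     # style prompt length, in characters
LYRICS_LIMITS = (10, 3000)   # lyrics length, in characters
KNOWN_TAGS = {"Intro", "Verse", "Chorus", "Bridge", "Outro"}


def unknown_tags(lyrics: str) -> list[str]:
    """Return bracketed tags in the lyrics that are not in the documented set."""
    found = re.findall(r"\[([^\[\]\n]+)\]", lyrics)
    return [tag for tag in found if tag not in KNOWN_TAGS]


def build_request(style: str, lyrics: str) -> dict:
    """Validate both prompts and return a request body as a plain dict."""
    checks = (("style", style, STYLE_LIMITS), ("lyrics", lyrics, LYRICS_LIMITS))
    for name, text, (low, high) in checks:
        if not low <= len(text) <= high:
            raise ValueError(f"{name} must be {low}-{high} characters, got {len(text)}")
    bad = unknown_tags(lyrics)
    if bad:
        raise ValueError(f"unsupported structure tags: {bad}")
    # Assumed field names; verify against the endpoint schema before use.
    return {"prompt": style, "lyrics_prompt": lyrics}


# Placeholder lyrics reused across two styles to compare genres.
lyrics = (
    "[Verse]\nStreetlights hum along the empty lane\n"
    "[Chorus]\nI keep walking, I keep walking home"
)
for style in ("Indie folk, melancholic, introspective", "Synth-pop, upbeat, driving beat"):
    payload = build_request(style, lyrics)
    print(payload["prompt"], "->", len(payload["lyrics_prompt"]), "characters of lyrics")
```

Keeping the lyrics fixed while looping over style prompts mirrors the style-mood separation noted above: only the musical direction changes between requests.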
Technical Specifications
| Spec | Details |
|---|---|
| Architecture | MiniMax Music 2.0 |
| Input Formats | Text prompts (style + lyrics) |
| Output Formats | MP3 audio |
| Lyric Length | 10-3,000 characters |
| License | Commercial use via fal partnership |
How It Stacks Up
MiniMax Music v1.5 ($0.03) – MiniMax Music 2.0 expands lyric capacity and structural control through enhanced tag-based arrangement at the same $0.03 price point. The v1.5 endpoint remains an option for workflows that prefer its different generation characteristics.
MiniMax Video 01 Live – While both models come from MiniMax's generative AI suite, Video 01 Live focuses on text-to-video generation with visual motion synthesis. Music 2.0 specializes in audio-only generation with lyric synchronization. The two are different modalities serving distinct creative needs.