MiniMax Music 2.0: State-of-the-Art AI Text-to-Music Generator

Music 2.0 | [text-to-audio]

MiniMax's Music 2.0 model generates complete songs with synchronized lyrics at $0.03 per generation. Composition is structured and lyric-driven: you supply both the musical direction and the lyrics upfront, and there is no post-generation editing. It is built for creators who know exactly what they want to say and how they want it to sound.

Use Cases: Indie music production | Content soundtrack creation | Lyric-driven composition

Performance

At $0.03 per generation, MiniMax Music 2.0 delivers complete songs with synchronized vocals. That works out to 33 generations per dollar, which keeps rapid iteration affordable during creative development.

Metric	Result	Context
Input Requirements	Dual-prompt system	10-300 char style + 10-3000 char lyrics
Output Format	MP3 audio file	Complete song with vocals
Cost per Generation	$0.03	33 generations per $1.00 on fal
Lyric Structure	Tag-based arrangement	Supports [Intro], [Verse], [Chorus], [Bridge], [Outro]
Related Endpoints	v1, v1.5	Earlier versions with different capabilities
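
The per-dollar figure in the table follows directly from the listed price. As a minimal sketch of that arithmetic (the function names are illustrative and not part of any fal or MiniMax API), a budget can be converted to whole generations like this:

```python
from decimal import Decimal, ROUND_DOWN

# Listed price per generation in USD, taken from the table above.
PRICE_PER_GENERATION = Decimal("0.03")


def generations_for_budget(budget_usd: str) -> int:
    """Return how many whole generations a budget covers at the listed price."""
    ratio = Decimal(budget_usd) / PRICE_PER_GENERATION
    # Partial generations cannot be bought, so round down.
    return int(ratio.to_integral_value(rounding=ROUND_DOWN))


def cost_of(generations: int) -> Decimal:
    """Return the total cost in USD for a number of generations."""
    return PRICE_PER_GENERATION * generations


if __name__ == "__main__":
    print(generations_for_budget("1.00"))
    print(cost_of(33))
```

Decimal is used instead of float so that the division and the totals are not affected by binary rounding.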

Structured Composition Through Dual-Prompt Architecture

MiniMax Music 2.0 separates musical direction from lyrical content through a two-input system. Your style prompt (10-300 characters) defines mood, genre, and atmosphere, for example "Indie folk, melancholic, introspective, longing, solitary walk, coffee shop", while your lyrics prompt (10-3000 characters) provides the actual words and song structure.
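
Because both inputs have documented length limits, it can help to check them locally before spending a generation. The following is a minimal sketch, assuming the limits are counted as Python string length (the service may count characters differently); check_prompts is a hypothetical helper, not part of any SDK.

```python
# Documented limits from this page: (minimum, maximum) characters.
STYLE_LIMITS = (10, 300)
LYRICS_LIMITS = (10, 3000)


def check_prompts(style: str, lyrics: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means both fit."""
    problems = []
    checks = (
        ("style", style, STYLE_LIMITS),
        ("lyrics", lyrics, LYRICS_LIMITS),
    )
    for name, text, (lo, hi) in checks:
        n = len(text)  # assumption: length is measured in code points
        if not lo <= n <= hi:
            problems.append(f"{name} prompt is {n} characters; expected {lo}-{hi}")
    return problems


if __name__ == "__main__":
    style = "Indie folk, melancholic, introspective, longing, solitary walk, coffee shop"
    lyrics = "[Verse]\nWalking home alone"
    print(check_prompts(style, lyrics) or "both prompts are within limits")
```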

What this means for you:

Precise arrangement control: Insert structural tags like [Verse], [Chorus], [Bridge] directly into lyrics to guide composition flow; the model builds the arrangement around your specified song structure
Lyric-synchronized generation: Audio output matches your exact lyrics with appropriate vocal delivery, eliminating the gap between instrumental generation and vocal production
Style-mood separation: Define musical characteristics independently from lyrical content, allowing you to experiment with different genres against the same lyrics
Extended lyric support: Up to 3,000 characters accommodates full song lyrics including multiple verses, choruses, and bridge sections without truncation
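
The tag-based structure described above can be assembled programmatically. This is a minimal sketch under stated assumptions: the tag names and the 10-3,000 character range come from this page, while the layout (each tag on its own line, a blank line between sections) and the build_lyrics helper are illustrative choices rather than a documented requirement.

```python
# Structural tags listed on this page.
ALLOWED_TAGS = ("Intro", "Verse", "Chorus", "Bridge", "Outro")
MIN_LYRICS_CHARS = 10
MAX_LYRICS_CHARS = 3000


def build_lyrics(sections: list[tuple[str, str]]) -> str:
    """Join (tag, text) pairs into one tagged lyrics string.

    Raises ValueError for an unknown tag or a result outside the length range.
    """
    blocks = []
    for tag, text in sections:
        if tag not in ALLOWED_TAGS:
            raise ValueError(f"unsupported tag: [{tag}]")
        body = text.strip()
        # A tag with no text (for example an instrumental intro) stays on its own.
        blocks.append(f"[{tag}]\n{body}" if body else f"[{tag}]")
    lyrics = "\n\n".join(blocks)
    if not MIN_LYRICS_CHARS <= len(lyrics) <= MAX_LYRICS_CHARS:
        raise ValueError(f"lyrics are {len(lyrics)} characters; expected 10-3000")
    return lyrics


if __name__ == "__main__":
    song = build_lyrics([
        ("Verse", "Walking home alone"),
        ("Chorus", "Coffee's gone cold"),
    ])
    print(song)
```

Since style and lyrics are separate inputs, the same string returned here can be paired with several different style prompts to compare genres against identical words.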

Technical Specifications

Spec	Details
Architecture	MiniMax Music 2.0
Input Formats	Text prompts (style + lyrics)
Output Formats	MP3 audio
Lyric Length	10-3,000 characters
License	Commercial use via fal partnership


How It Stacks Up

MiniMax Music v1.5 ($0.03) – MiniMax Music 2.0 expands lyric capacity and structural control through enhanced tag-based arrangement at the same $0.03 price point. The v1.5 endpoint offers different generation characteristics for workflows prioritizing alternative musical outputs.

MiniMax Video 01 Live – While both models come from MiniMax's generative AI suite, Video 01 Live focuses on text-to-video generation with visual motion synthesis, whereas Music 2.0 specializes in audio-only generation with lyric synchronization. The two cover different modalities and serve distinct creative needs.

Endpoint: fal-ai/minimax-music/v2
