
MiniMax Music Text to Audio

fal-ai/minimax-music/v2
Generate complete songs with vocals from a style prompt and lyrics using the MiniMax Music 2.0 model.
Inference | Commercial use | Partner

Music 2.0 | [text-to-audio]

MiniMax's Music 2.0 model generates complete songs with synchronized lyrics at $0.03 per generation. Composition is lyric-driven: you supply both the musical direction and the full lyrics up front, and the output cannot be edited after generation. It is built for creators who know exactly what they want to say and how they want it to sound.

Use Cases: Indie music production | Content soundtrack creation | Lyric-driven composition


Performance

At $0.03 per generation, MiniMax Music 2.0 produces complete songs with synchronized vocals. That works out to 33 generations per dollar, which keeps rapid iteration affordable during creative development.
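The per-dollar figure is simple arithmetic on the flat $0.03 price. A minimal Python sketch of that budgeting, assuming the price is a flat rate per generation with no volume discounts; `generations_for_budget` and `cost_for_generations` are hypothetical helpers, not part of any fal SDK:

```python
from decimal import Decimal, ROUND_FLOOR

# Per-generation price stated on this page; assumed to be a flat rate.
PRICE_PER_GENERATION = Decimal("0.03")


def generations_for_budget(budget_usd: str) -> int:
    """Return how many whole generations a budget covers at the flat price."""
    budget = Decimal(budget_usd)
    # Decimal keeps the division exact enough that flooring gives whole generations.
    return int((budget / PRICE_PER_GENERATION).to_integral_value(rounding=ROUND_FLOOR))


def cost_for_generations(count: int) -> Decimal:
    """Return the total cost of `count` generations."""
    return PRICE_PER_GENERATION * count


print(generations_for_budget("1.00"))  # 33
print(cost_for_generations(100))       # 3.00
```

Using `Decimal` rather than floats avoids rounding drift when budgets are summed over many iterations.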

| Metric | Result | Context |
|---|---|---|
| Input requirements | Dual-prompt system | 10-300 character style prompt + 10-3,000 character lyrics prompt |
| Output format | MP3 audio file | Complete song with vocals |
| Cost per generation | $0.03 | 33 generations per $1.00 on fal |
| Lyric structure | Tag-based arrangement | Supports [Intro], [Verse], [Chorus], [Bridge], [Outro] |
| Related endpoints | v1, v1.5 | Earlier versions with different capabilities |

Structured Composition Through Dual-Prompt Architecture

MiniMax Music 2.0 separates musical direction from lyrical content through a two-input system. The style prompt (10-300 characters) defines mood, genre, and atmosphere, for example "Indie folk, melancholic, introspective, longing, solitary walk, coffee shop". The lyrics prompt (10-3,000 characters) provides the actual words and the song structure.

What this means for you:

  • Precise arrangement control: Insert structural tags such as [Verse], [Chorus], and [Bridge] directly into the lyrics to guide the composition flow. The model builds the arrangement around the song structure you specify.

  • Lyric-synchronized generation: The audio follows your exact lyrics with appropriate vocal delivery, so there is no separate step of generating an instrumental and then producing vocals for it

  • Style-mood separation: Define musical characteristics independently from lyrical content, allowing you to experiment with different genres against the same lyrics

  • Extended lyric support: Up to 3,000 characters accommodates full song lyrics including multiple verses, choruses, and bridge sections without truncation
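The tag-based structure and the style/lyrics split lend themselves to a small client-side helper. The sketch below is illustrative only: `build_lyrics` and `style_variants` are hypothetical helpers, the code treats the five tags listed on this page as the complete set (the model may accept others), and the request keys `prompt` and `lyrics_prompt` are assumed names to confirm against the API documentation.

```python
# Tags listed on this page; treating them as exhaustive is an assumption.
SUPPORTED_TAGS = {"Intro", "Verse", "Chorus", "Bridge", "Outro"}
LYRICS_MAX = 3000  # documented upper limit for the lyrics prompt


def build_lyrics(sections: list[tuple[str, str]]) -> str:
    """Join (tag, text) pairs into one tagged lyrics string.

    Each section becomes "[Tag]" on its own line followed by its text.
    Raises ValueError for an unlisted tag or if the result exceeds the limit.
    """
    parts = []
    for tag, text in sections:
        if tag not in SUPPORTED_TAGS:
            raise ValueError(f"unsupported tag: [{tag}]")
        parts.append(f"[{tag}]\n{text.strip()}")
    lyrics = "\n".join(parts)
    if len(lyrics) > LYRICS_MAX:
        raise ValueError(f"lyrics are {len(lyrics)} characters; limit is {LYRICS_MAX}")
    return lyrics


def style_variants(lyrics: str, styles: list[str]) -> list[dict]:
    """Pair the same lyrics with several style prompts for comparison.

    HYPOTHETICAL keys "prompt" and "lyrics_prompt"; check the API docs.
    """
    return [{"prompt": style, "lyrics_prompt": lyrics} for style in styles]


song = build_lyrics([
    ("Verse", "Streetlights hum along the empty road"),
    ("Chorus", "I keep walking home alone"),
])
variants = style_variants(song, ["Indie folk, melancholic", "Synthwave, nostalgic, driving"])
print(song)
```

Keeping the lyrics fixed while only the style prompt varies isolates the effect of the genre and mood description on the result.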


Technical Specifications

| Spec | Details |
|---|---|
| Architecture | MiniMax Music 2.0 |
| Input formats | Text prompts (style + lyrics) |
| Output formats | MP3 audio |
| Lyric length | 10-3,000 characters |
| License | Commercial use via fal partnership |
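Because both prompts have documented length limits (10-300 characters for style, 10-3,000 for lyrics), it can help to validate them before sending a request. A minimal sketch under stated assumptions: `MusicRequest` is a hypothetical class, and the payload keys `prompt` and `lyrics_prompt` are assumed names, not confirmed by this page.

```python
from dataclasses import dataclass

# Limits stated on this page: style 10-300 characters, lyrics 10-3,000 characters.
STYLE_RANGE = (10, 300)
LYRICS_RANGE = (10, 3000)


@dataclass(frozen=True)
class MusicRequest:
    """Hypothetical client-side model of the two prompts."""

    style: str   # mood, genre, and atmosphere
    lyrics: str  # the words, optionally with [Verse]/[Chorus]-style tags

    def validate(self) -> None:
        """Raise ValueError if either prompt falls outside its documented range."""
        for name, text, (lo, hi) in (
            ("style", self.style, STYLE_RANGE),
            ("lyrics", self.lyrics, LYRICS_RANGE),
        ):
            if not lo <= len(text) <= hi:
                raise ValueError(f"{name} must be {lo}-{hi} characters, got {len(text)}")

    def to_payload(self) -> dict:
        """Validate, then build a request body.

        ASSUMPTION: the keys "prompt" (style) and "lyrics_prompt" are
        illustrative; confirm them against the endpoint's API documentation.
        """
        self.validate()
        return {"prompt": self.style, "lyrics_prompt": self.lyrics}


request = MusicRequest(
    style="Indie folk, melancholic, introspective, longing, solitary walk, coffee shop",
    lyrics="[Verse]\nStreetlights hum along the empty road\n[Chorus]\nI keep walking home alone",
)
payload = request.to_payload()
print(sorted(payload))  # ['lyrics_prompt', 'prompt']
```

The resulting dict could then be passed as `arguments` to a client call such as `fal_client.subscribe("fal-ai/minimax-music/v2", arguments=payload)` in fal's Python client; check the API documentation for the exact input schema and response format.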



How It Stacks Up

MiniMax Music v1.5 ($0.03) – At the same $0.03 price point, MiniMax Music 2.0 expands lyric capacity and adds finer structural control through tag-based arrangement. The v1.5 endpoint has different generation characteristics and remains an option for workflows that prefer its output.

MiniMax Video 01 Live – Both models come from MiniMax's generative AI suite, but Video 01 Live generates video from text with visual motion synthesis, while Music 2.0 specializes in audio-only generation with lyric synchronization. The two cover different modalities and serve distinct creative needs.