Sonauto V2 Text to Audio

sonauto/v2/text-to-music
Create full songs in any style
Inference | Commercial use | Partner

Sonauto V2 | [text-to-audio]

Sonauto V2 generates complete songs with vocals and instrumentals at $0.075 per track. It gives up the near-instant generation of text-to-speech models in exchange for full musical composition: the model creates structured songs from text descriptions, handling melody, harmony, rhythm, lyrics, and arrangement in a single inference.

Use Cases: Content Creation Soundtracks | Marketing Campaign Music | Prototype Music Production


Performance

At $0.075 per generation, Sonauto V2 delivers complete song creation at a small fraction of traditional music production costs, positioning it as a rapid prototyping tool for creators who need custom music without licensing fees or studio time.

Metric | Result | Context
Output Format | WAV, FLAC, MP3, OGG, M4A | Multiple export formats for different platforms
Cost per Generation | $0.075 | 13 generations per $1.00 on fal
Multi-Generation | 2 songs at 1.5x cost | Generate variations for $0.1125 total
BPM Control | Auto or manual (integer) | Tempo matching for specific use cases
Bit Rate Options | 128-320 kbps | Quality control for MP3/M4A exports
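
The pricing figures in the table can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes the only request sizes are one song at the base price or two songs at 1.5x, as the table states.

```python
from decimal import Decimal

# Figures taken from the table above.
PRICE_PER_GENERATION = Decimal("0.075")  # USD for a single-song request
TWO_SONG_MULTIPLIER = Decimal("1.5")     # a 2-song request costs 1.5x

def generations_per_budget(budget_usd: str) -> int:
    """Whole single-song generations that fit in the given budget."""
    return int(Decimal(budget_usd) // PRICE_PER_GENERATION)

def request_cost(num_songs: int) -> Decimal:
    """Cost of one request for 1 or 2 songs (hypothetical helper)."""
    if num_songs == 1:
        return PRICE_PER_GENERATION
    if num_songs == 2:
        return PRICE_PER_GENERATION * TWO_SONG_MULTIPLIER
    raise ValueError("this sketch only models 1 or 2 songs per request")

print(generations_per_budget("1.00"))  # 13
print(request_cost(2))                 # 0.1125
print(request_cost(2) / 2)             # 0.05625 per song
```

Decimal is used instead of float so that small prices such as $0.075 do not pick up binary rounding error.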

Full Song Generation From Text Descriptions

Sonauto V2 takes text prompts and generates complete musical compositions with vocals, instrumentals, and structure, handling the entire production pipeline that would typically require separate tools for lyrics, melody, arrangement, and mixing.

What this means for you:

  • Automatic lyric generation: Input high-level descriptions like "A pop song about turtles flying" and the model generates appropriate lyrics, or provide custom lyrics for precise control over narrative content

  • Style tag conditioning: Access a comprehensive tag explorer to specify musical genres, moods, and instrumentation, combining tags like "indie folk" + "melancholic" + "acoustic guitar" for targeted aesthetic control

  • Seed-based reproducibility: Reuse a seed value with identical parameters (lyrics + tags) to reproduce a specific output, enabling iterative refinement and A/B testing of variations on successful generations

  • Prompt strength tuning: Control the balance between creative interpretation and prompt adherence via CFG scaling (1.4-3.1 range), trading naturalness for precision when you need exact style matching
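
The controls listed above could be combined into a single request payload along these lines. This is a minimal sketch, not the documented API: the field names (`prompt`, `tags`, `lyrics_prompt`, `seed`, `prompt_strength`) and the helper `build_request` are assumptions made for illustration. Only the 1.4-3.1 prompt strength range comes from the text above.

```python
from typing import Optional

# Prompt strength (CFG) range quoted in the text above.
CFG_MIN, CFG_MAX = 1.4, 3.1

def build_request(
    prompt: Optional[str] = None,
    tags: Optional[list] = None,
    lyrics: Optional[str] = None,
    seed: Optional[int] = None,
    prompt_strength: Optional[float] = None,
) -> dict:
    """Assemble a request payload; all field names here are assumed."""
    if prompt is None and not tags and lyrics is None:
        raise ValueError("provide a prompt, style tags, or lyrics")
    if prompt_strength is not None and not CFG_MIN <= prompt_strength <= CFG_MAX:
        raise ValueError(f"prompt_strength must be in [{CFG_MIN}, {CFG_MAX}]")

    payload = {}
    if prompt is not None:
        payload["prompt"] = prompt               # high-level description
    if tags:
        payload["tags"] = list(tags)             # style tag conditioning
    if lyrics is not None:
        payload["lyrics_prompt"] = lyrics        # custom lyrics
    if seed is not None:
        payload["seed"] = seed                   # fixed seed for reproducibility
    if prompt_strength is not None:
        payload["prompt_strength"] = prompt_strength
    return payload

# Same tags + same seed, two prompt strengths: an A/B pair.
base = dict(tags=["indie folk", "melancholic", "acoustic guitar"], seed=42)
variant_a = build_request(prompt_strength=1.8, **base)
variant_b = build_request(prompt_strength=2.8, **base)
```

The resulting dictionaries would then be sent to the `sonauto/v2/text-to-music` endpoint with whichever client you use; check the endpoint's schema for the real field names before relying on these.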


Technical Specifications

Spec | Details
Architecture | Sonauto V2
Input Formats | Text prompts, style tags, lyrics, seed values
Output Formats | WAV, FLAC, MP3, OGG, M4A with configurable bit rates
Generation Control | Prompt strength (CFG), balance strength, BPM conditioning
License | Commercial use enabled via Partner schema
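
As a rough illustration of how the output options fit together, the sketch below validates a format, bit rate, and BPM choice before a request is made. The helper `output_settings` and the keys `output_format`, `output_bit_rate`, and `bpm` are hypothetical. It also assumes the 128-320 kbps figure is a continuous range that applies only to MP3 and M4A; the real endpoint may accept only specific values.

```python
FORMATS = {"wav", "flac", "mp3", "ogg", "m4a"}
BIT_RATE_FORMATS = {"mp3", "m4a"}   # formats the bit rate option applies to
BIT_RATE_MIN, BIT_RATE_MAX = 128, 320  # kbps

def output_settings(fmt: str, bit_rate_kbps=None, bpm="auto") -> dict:
    """Validate output options and return them as a dict (assumed keys)."""
    fmt = fmt.lower()
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    if bit_rate_kbps is not None:
        if fmt not in BIT_RATE_FORMATS:
            raise ValueError("bit rate only applies to MP3/M4A exports")
        if not BIT_RATE_MIN <= bit_rate_kbps <= BIT_RATE_MAX:
            raise ValueError("bit rate must be within 128-320 kbps")
    # BPM is either the string "auto" or a positive integer.
    if bpm != "auto":
        if isinstance(bpm, bool) or not isinstance(bpm, int) or bpm <= 0:
            raise ValueError('bpm must be "auto" or a positive integer')

    settings = {"output_format": fmt, "bpm": bpm}
    if bit_rate_kbps is not None:
        settings["output_bit_rate"] = bit_rate_kbps
    return settings

print(output_settings("MP3", bit_rate_kbps=320, bpm=120))
print(output_settings("wav"))
```

Validating locally keeps an obviously invalid combination, such as a bit rate on a WAV export, from costing a round trip to the API.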


How It Stacks Up

Beatoven Music Text to Audio ($0.05) vs. Sonauto V2 ($0.075): Sonauto V2 offers lyric generation and vocal synthesis alongside instrumentals, whereas Beatoven focuses on instrumental background music. Sonauto V2 trades cost efficiency for complete song creation with vocals: it costs 1.5x as much per generation but delivers full production capability. Beatoven remains ideal for background scores and ambient content where vocals aren't required.