Sonauto V2 Text to Audio
Sonauto V2 | [text-to-audio]
Sonauto V2 generates complete songs with vocals and instrumentals at $0.075 per track. It gives up the near-instant generation of text-to-speech models in exchange for full musical composition. The model creates structured songs from text descriptions, handling melody, harmony, rhythm, lyrics, and arrangement in a single inference.
Use Cases: Content Creation Soundtracks | Marketing Campaign Music | Prototype Music Production
Performance
At $0.075 per generation, Sonauto V2 delivers complete song creation at a small fraction of traditional music production costs, positioning it as a rapid prototyping tool for creators who need custom music without licensing fees or studio time.
| Metric | Result | Context |
|---|---|---|
| Output Format | WAV, FLAC, MP3, OGG, M4A | Multiple export formats for different platforms |
| Cost per Generation | $0.075 | 13 full generations per $1.00 on fal |
| Multi-Generation | 2 songs at 1.5x cost | Two variations for $0.1125 total ($0.05625 per song) |
| BPM Control | Auto or manual (integer) | Tempo matching for specific use cases |
| Bit Rate Options | 128-320 kbps | Quality control for MP3/M4A exports |
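The pricing rows above reduce to simple arithmetic. The sketch below only restates the table's numbers; the function and constant names are illustrative and not part of any Sonauto or fal API.

```python
from decimal import Decimal

# Figures taken from the table above; names are illustrative only.
PRICE_PER_GENERATION = Decimal("0.075")  # USD for a single-song request
TWO_SONG_MULTIPLIER = Decimal("1.5")     # "2 songs at 1.5x cost"


def generation_cost(num_songs: int = 1) -> Decimal:
    """Cost in USD of one request that returns 1 or 2 songs."""
    if num_songs == 1:
        return PRICE_PER_GENERATION
    if num_songs == 2:
        return PRICE_PER_GENERATION * TWO_SONG_MULTIPLIER
    raise ValueError("the table only prices 1 or 2 songs per request")


def generations_per_budget(budget_usd: Decimal, num_songs: int = 1) -> int:
    """Number of whole requests that fit in a budget (rounded down)."""
    return int(budget_usd // generation_cost(num_songs))


if __name__ == "__main__":
    print(generation_cost(1))                         # single song
    print(generation_cost(2))                         # two variations
    print(generations_per_budget(Decimal("1.00")))    # whole requests per $1
```

`Decimal` is used instead of `float` so that prices such as $0.1125 stay exact when multiplied or divided.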
Full Song Generation From Text Descriptions
Sonauto V2 takes text prompts and generates complete musical compositions with vocals, instrumentals, and structure, handling the entire production pipeline that would typically require separate tools for lyrics, melody, arrangement, and mixing.
What this means for you:
- Automatic lyric generation: Input high-level descriptions like "A pop song about turtles flying" and the model generates appropriate lyrics, or provide custom lyrics for precise control over narrative content
- Style tag conditioning: Access a comprehensive tag explorer to specify musical genres, moods, and instrumentation, combining tags like "indie folk" + "melancholic" + "acoustic guitar" for targeted aesthetic control
- Seed-based reproducibility: Lock in specific outputs using seed values with identical parameters (lyrics + tags), enabling iterative refinement and A/B testing of variations on successful generations
- Prompt strength tuning: Control the balance between creative interpretation and prompt adherence via CFG scaling (1.4-3.1 range), trading naturalness for precision when you need exact style matching
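As a rough illustration of how these controls fit together, the sketch below assembles a request payload and checks it against the ranges quoted on this page (prompt strength 1.4-3.1, bit rate 128-320 kbps for MP3/M4A, integer or automatic BPM). The field names (`prompt`, `tags`, `lyrics`, `seed`, `prompt_strength`, `bpm`, `output_format`, `bit_rate_kbps`, `num_songs`) and the `build_request` helper are assumptions made for this example, not the documented API schema; check the API documentation for the real parameter names before sending anything.

```python
from typing import List, Optional, Union

# Values quoted on this page; the field names below are hypothetical.
OUTPUT_FORMATS = {"wav", "flac", "mp3", "ogg", "m4a"}
BIT_RATE_FORMATS = {"mp3", "m4a"}          # formats the bit rate applies to
PROMPT_STRENGTH_RANGE = (1.4, 3.1)         # CFG scaling range
BIT_RATE_RANGE_KBPS = (128, 320)


def build_request(
    prompt: Optional[str] = None,
    tags: Optional[List[str]] = None,
    lyrics: Optional[str] = None,
    seed: Optional[int] = None,
    prompt_strength: Optional[float] = None,
    bpm: Union[int, str] = "auto",
    output_format: str = "wav",
    bit_rate_kbps: Optional[int] = None,
    num_songs: int = 1,
) -> dict:
    """Validate the inputs and return a payload dict (hypothetical schema)."""
    if not (prompt or tags or lyrics):
        raise ValueError("provide a prompt, tags, or lyrics")
    if prompt_strength is not None:
        low, high = PROMPT_STRENGTH_RANGE
        if not low <= prompt_strength <= high:
            raise ValueError(f"prompt_strength must be within {low}-{high}")
    if bpm != "auto" and (isinstance(bpm, bool) or not isinstance(bpm, int)):
        raise ValueError('bpm must be "auto" or an integer')
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output format: {output_format}")
    if bit_rate_kbps is not None:
        if output_format not in BIT_RATE_FORMATS:
            raise ValueError("bit rate only applies to MP3/M4A exports")
        low, high = BIT_RATE_RANGE_KBPS
        if not low <= bit_rate_kbps <= high:
            raise ValueError(f"bit rate must be within {low}-{high} kbps")
    if num_songs not in (1, 2):
        raise ValueError("num_songs must be 1 or 2")

    payload = {
        "prompt": prompt, "tags": tags, "lyrics": lyrics, "seed": seed,
        "prompt_strength": prompt_strength, "bpm": bpm,
        "output_format": output_format, "bit_rate_kbps": bit_rate_kbps,
        "num_songs": num_songs,
    }
    # Drop unset optional fields so the service defaults would apply.
    return {key: value for key, value in payload.items() if value is not None}


if __name__ == "__main__":
    request = build_request(
        tags=["indie folk", "melancholic", "acoustic guitar"],
        lyrics="Placeholder lyrics for the example",
        seed=42,                 # fixed seed + same lyrics and tags
        prompt_strength=2.0,
        output_format="mp3",
        bit_rate_kbps=192,
    )
    print(request)
```

For an A/B comparison, one approach is to keep `seed`, `lyrics`, and `tags` fixed and change a single field such as `prompt_strength` between two payloads, so any difference in the output can be attributed to that field.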
Technical Specifications
| Spec | Details |
|---|---|
| Architecture | Sonauto V2 |
| Input Formats | Text prompts, style tags, lyrics, seed values |
| Output Formats | WAV, FLAC, MP3, OGG, M4A with configurable bit rates |
| Generation Control | Prompt strength (CFG), balance strength, BPM conditioning |
| License | Commercial use enabled via Partner schema |
How It Stacks Up
Beatoven Music Text to Audio ($0.05): Sonauto V2 ($0.075) offers lyric generation and vocal synthesis alongside instrumentals, whereas Beatoven focuses on instrumental background music. Sonauto V2 costs 1.5x as much per generation ($0.075 versus $0.05) but produces complete songs with vocals. Beatoven remains the better fit for background scores and ambient content where vocals aren't required.