MiniMax Speech-02 HD Text to Speech
Speech 02 HD - High-Quality Text-to-Speech API
Transform text into natural-sounding speech with Speech 02 HD, a powerful text-to-speech model optimized for lifelike voice synthesis. Built by MiniMax and available through fal.ai, this model delivers enterprise-grade speech generation with support for 30+ languages.
Overview
Speech 02 HD excels at converting text into speech across multiple use cases:
- Professional voiceovers and narration
- Real-time voice assistants
- Audiobook production
- Educational content creation
- Customer service automation
The model supports multiple languages and voices while maintaining high quality, emotion control, and natural prosody.
Getting Started
Getting up and running with Speech 02 HD takes just a few minutes. Here's how to begin:
- Create your fal.ai API key in the dashboard
- Install the client library for your preferred language
- Make your first API call
For JavaScript/TypeScript: npm install --save @fal-ai/client
For Python: pip install fal-client
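A first call from Python might look like the sketch below. It assumes the fal-client package is installed and the API key is exported as the FAL_KEY environment variable. The endpoint ID and the text field name are assumptions that should be checked against the API documentation.

```python
import os

# Assumed endpoint ID; confirm it on the model's API page.
ENDPOINT = "fal-ai/minimax/speech-02-hd"


def build_request(text: str) -> dict:
    """Build a minimal request body (the field name "text" is an assumption)."""
    if not text.strip():
        raise ValueError("text must not be empty")
    return {"text": text}


def synthesize(text: str) -> dict:
    """Send one blocking request and return the raw result dictionary."""
    if not os.environ.get("FAL_KEY"):
        raise RuntimeError("Set the FAL_KEY environment variable first")
    # Imported lazily so build_request can be used without the package.
    import fal_client

    return fal_client.subscribe(ENDPOINT, arguments=build_request(text))
```

Calling synthesize("Hello from Speech 02 HD.") returns the response as a dictionary. Print it once to see where the generated audio is referenced, since the response shape is not described here.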
Technical Specifications
Performance Metrics:
- 30+ languages supported with native pronunciation
- 300+ pre-built voices across different demographics
- Real-time streaming capabilities
- Up to 5,000 characters processed in real time
- Up to 1 million characters processed asynchronously
- Maximum text length: 200,000 characters
Voice Capabilities:
- Emotion control (happy, sad, angry, fearful, disgusted, surprised, neutral)
- Speed adjustment (0.5x to 2.0x)
- Volume control
- Pitch adjustment
- Support for multiple audio formats (MP3, WAV, FLAC, PCM)
Advanced Features
Fine-tune your speech output with advanced parameters:
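As an illustration, the sketch below builds a request that sets speed and emotion, validated against the ranges listed under Voice Capabilities. The voice_setting structure, its field names, and the voice ID used in the usage note are assumptions, not confirmed parts of the API. Volume and pitch are left out because their valid ranges are not stated here.

```python
# Emotions listed under Voice Capabilities.
EMOTIONS = {"happy", "sad", "angry", "fearful", "disgusted", "surprised", "neutral"}


def build_voice_request(text: str, voice_id: str,
                        speed: float = 1.0, emotion: str = "neutral") -> dict:
    """Return a request body with voice settings (field names are assumptions)."""
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    if emotion not in EMOTIONS:
        raise ValueError(f"unsupported emotion: {emotion}")
    return {
        "text": text,
        "voice_setting": {
            "voice_id": voice_id,
            "speed": speed,
            "emotion": emotion,
        },
    }
```

For example, build_voice_request("Welcome back!", "example-voice", speed=1.2, emotion="happy") produces a body that can be passed as the request arguments. The ID "example-voice" is a placeholder, not a real voice.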
Queue-Based Processing
For handling asynchronous generation:
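One way to use the queue from Python is to submit a job, keep its request ID, and poll until it completes. The sketch below assumes the fal-client package and the FAL_KEY environment variable. The endpoint ID and the text field are assumptions, and poll_until_done is a hypothetical helper written for this example.

```python
import time

ENDPOINT = "fal-ai/minimax/speech-02-hd"  # assumed endpoint ID


def poll_until_done(is_done, fetch, poll_seconds=2.0, timeout=300.0):
    """Call is_done() until it returns True, then return fetch()."""
    deadline = time.monotonic() + timeout
    while not is_done():
        if time.monotonic() >= deadline:
            raise TimeoutError("request did not complete in time")
        time.sleep(poll_seconds)
    return fetch()


def submit_job(text: str) -> str:
    """Queue a request and return its request ID without waiting."""
    import fal_client  # lazy import; needs `pip install fal-client`

    handle = fal_client.submit(ENDPOINT, arguments={"text": text})
    return handle.request_id


def wait_for_result(request_id: str) -> dict:
    """Poll the queue for a submitted request and return its result."""
    import fal_client

    return poll_until_done(
        is_done=lambda: isinstance(
            fal_client.status(ENDPOINT, request_id), fal_client.Completed
        ),
        fetch=lambda: fal_client.result(ENDPOINT, request_id),
    )
```

Keeping the request ID lets a separate process or a later run collect the result, which suits long texts that should not block the caller.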
Best Practices
Make the most of Speech 02 HD by following these guidelines:
- Include proper punctuation for natural pausing and intonation
- Use the flag for better number reading in English
- Process long texts in chunks for optimal performance
- Monitor your usage through the dashboard
- Cache frequently generated speech outputs
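The chunking and caching guidelines can be sketched in plain Python. split_text breaks text after sentence-ending punctuation so that each chunk stays within a limit (5,000 characters here, matching the real-time figure above), and SpeechCache avoids repeating identical requests. Both names are hypothetical helpers, and the synthesize callable passed to the cache is whatever function actually calls the API.

```python
import hashlib
import re

MAX_CHARS = 5000  # real-time character limit quoted in this document


def split_text(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most max_chars, breaking after sentences."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


class SpeechCache:
    """In-memory cache keyed by a hash of the voice ID and the text."""

    def __init__(self, synthesize):
        self._synthesize = synthesize  # callable(text, voice_id) -> result
        self._store = {}

    def get(self, text: str, voice_id: str):
        key = hashlib.sha256(f"{voice_id}\n{text}".encode("utf-8")).hexdigest()
        if key not in self._store:
            self._store[key] = self._synthesize(text, voice_id)
        return self._store[key]
```

An in-memory dictionary is enough to show the idea. A production setup would more likely store the generated audio on disk or in object storage under the same key.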
Supported Languages
Speech 02 HD supports 30+ languages including:
- Chinese, Chinese (Yue/Cantonese)
- English, Spanish, French, German, Italian, Portuguese
- Japanese, Korean
- Arabic, Russian, Turkish, Dutch, Ukrainian
- Vietnamese, Indonesian, Thai, Hindi
- Polish, Romanian, Greek, Czech, Finnish
Audio Output Formats
- MP3: Default format, good compression
- WAV: Uncompressed, high quality
- FLAC: Lossless compression
- PCM: Raw audio data
Sample rates: 8000, 16000, 22050, 24000, 32000, 44100 Hz
Bitrates: 64000 to 320000 bps
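The values above can be checked before a request is sent. The sketch below validates format, sample rate, and bitrate against the lists in this section. The audio_setting field names and the default values are assumptions chosen for illustration.

```python
FORMATS = {"mp3", "wav", "flac", "pcm"}
SAMPLE_RATES = {8000, 16000, 22050, 24000, 32000, 44100}
MIN_BITRATE, MAX_BITRATE = 64_000, 320_000


def build_audio_setting(fmt: str = "mp3", sample_rate: int = 32000,
                        bitrate: int = 128_000) -> dict:
    """Return audio settings after checking them against the listed values."""
    fmt = fmt.lower()
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    if sample_rate not in SAMPLE_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    if not MIN_BITRATE <= bitrate <= MAX_BITRATE:
        raise ValueError(f"bitrate out of range: {bitrate}")
    # Field names below are assumed, not confirmed by this document.
    return {"format": fmt, "sample_rate": sample_rate, "bitrate": bitrate}
```

Validating locally gives a clearer error than a rejected request and avoids a wasted round trip.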
Pricing and Usage
- Cost: $0.1 per 1000 characters
- Transparent, usage-based pricing
- No subscription necessary
- No hidden fees or minimum commitments
View detailed pricing or contact sales for enterprise solutions.
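At the listed rate, a cost estimate is simple arithmetic. The sketch below assumes billing is proportional to the character count with no rounding to whole blocks of 1,000, which is not confirmed here.

```python
from decimal import Decimal

PRICE_PER_1000_CHARS = Decimal("0.1")  # USD, from the pricing above


def estimate_cost_usd(text: str) -> Decimal:
    """Estimate the cost of synthesizing text, assuming proportional billing."""
    return PRICE_PER_1000_CHARS * Decimal(len(text)) / Decimal(1000)
```

Under this assumption, a 5,000-character script would cost about $0.50.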
Available Models
MiniMax offers multiple TTS models:
- speech-02-hd: High-definition quality, best for production use
- speech-02-turbo: Optimized for real-time applications with low latency
- speech-01-hd: Previous generation HD model
- speech-01-turbo: Previous generation turbo model
Support and Resources
We're here to help you succeed:
- API Documentation
- General fal.ai Documentation
- Active developer community
- Technical support via email
- Regular model updates and improvements
Get started with Speech 02 HD today and experience the next generation of text-to-speech technology. Visit our dashboard to create your API key and begin transforming text into natural, expressive speech.