MiniMax Speech-02 HD Text to Speech

fal-ai/minimax/speech-02-hd
Generate speech from text in a choice of voices using the MiniMax Speech-02 HD model, which produces high-quality text-to-speech audio.
Inference · Commercial use · Partner

Your request will cost $0.1 per 1000 characters.

Speech 02 HD - High-Quality Text-to-Speech API

Speech 02 HD is a MiniMax text-to-speech model, available through fal.ai, that converts text into natural-sounding, lifelike speech in 30+ languages.

Overview

Speech 02 HD excels at converting text into speech across multiple use cases:

  • Professional voiceovers and narration
  • Real-time voice assistants
  • Audiobook production
  • Educational content creation
  • Customer service automation

The model supports multiple languages and voices, with emotion control and natural prosody, while maintaining high output quality.

Getting Started

Getting up and running with Speech 02 HD takes just a few minutes. Here's how to begin:

  1. Create your fal.ai API key in the dashboard
  2. Install the client library for your preferred language
  3. Make your first API call

For JavaScript/TypeScript:


For Python:
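
A minimal synchronous call might look like the sketch below. It assumes the `fal-client` package (`pip install fal-client`), an API key exported as the `FAL_KEY` environment variable, a request field named `text`, and a response containing `audio.url`; check the endpoint's API schema before relying on these names. The client import sits inside the function, so the helper that builds the request works even without the package installed.

```python
import os

# Endpoint ID as shown on this page.
MODEL_ID = "fal-ai/minimax/speech-02-hd"


def build_arguments(text: str) -> dict:
    """Minimal request body: just the text to speak ("text" is an assumed field name)."""
    if not text.strip():
        raise ValueError("text must not be empty")
    return {"text": text}


def generate_speech(text: str) -> str:
    """Run a blocking request and return the URL of the generated audio."""
    import fal_client  # lazy import; requires `pip install fal-client`

    if "FAL_KEY" not in os.environ:
        raise RuntimeError("set FAL_KEY to your fal.ai API key")
    result = fal_client.subscribe(MODEL_ID, arguments=build_arguments(text))
    # Assumed response shape: {"audio": {"url": "..."}, ...}
    return result["audio"]["url"]
```

Calling `generate_speech("Hello from Speech 02 HD!")` would block until the audio is ready and then return its URL.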


Technical Specifications

Capabilities and Limits:

  • 30+ languages supported with native pronunciation
  • 300+ pre-built voices across different demographics
  • Real-time streaming capabilities
  • Real-time processing of up to 5,000 characters
  • Asynchronous processing of up to 1 million characters
  • Maximum text length: 200,000 characters

Voice Capabilities:

  • Emotion control (happy, sad, angry, fearful, disgusted, surprised, neutral)
  • Speed adjustment (0.5x to 2.0x)
  • Volume control
  • Pitch adjustment
  • Support for multiple audio formats (MP3, WAV, FLAC, PCM)
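
Validating the voice options listed above on the client side can catch bad requests before they are sent. This sketch hard-codes the seven emotion labels and the 0.5x to 2.0x speed range from the list; the dictionary keys it returns (`speed`, `emotion`) are assumed names, not confirmed API fields.

```python
# Values copied from the Voice Capabilities list; key names are assumptions.
EMOTIONS = {"happy", "sad", "angry", "fearful", "disgusted", "surprised", "neutral"}
MIN_SPEED, MAX_SPEED = 0.5, 2.0


def validate_voice_settings(speed: float = 1.0, emotion: str = "neutral") -> dict:
    """Check speed and emotion against the documented ranges and return them."""
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}, got {speed}")
    if emotion not in EMOTIONS:
        raise ValueError(f"unknown emotion {emotion!r}; expected one of {sorted(EMOTIONS)}")
    return {"speed": speed, "emotion": emotion}
```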

Advanced Features

Fine-tune your speech output with advanced parameters:
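
The following sketch assembles a request body that sets voice and audio options together. The nested `voice_setting` / `audio_setting` structure, every key inside them (`voice_id`, `vol`, `pitch`, `format`, `sample_rate`), and the default values are assumptions for illustration; confirm them against the endpoint's input schema.

```python
def build_advanced_arguments(
    text: str,
    voice_id: str,
    speed: float = 1.0,
    volume: float = 1.0,
    pitch: int = 0,
    emotion: str = "neutral",
    audio_format: str = "mp3",
    sample_rate: int = 32000,
) -> dict:
    """Assemble a request body with voice and audio settings.

    All field names here are assumed, not taken from an official schema.
    """
    return {
        "text": text,
        "voice_setting": {
            "voice_id": voice_id,
            "speed": speed,
            "vol": volume,
            "pitch": pitch,
            "emotion": emotion,
        },
        "audio_setting": {
            "format": audio_format,
            "sample_rate": sample_rate,
        },
    }
```

The returned dictionary would be passed as `arguments` to a client call such as `fal_client.subscribe("fal-ai/minimax/speech-02-hd", arguments=...)`. A placeholder such as `"example_voice_id"` is hypothetical and should be replaced with a real voice ID.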


Queue-Based Processing

For handling asynchronous generation:
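
With queue-based processing, a job is submitted, its request ID is stored, and the result is fetched later. The sketch below assumes the `fal-client` Python package, using its module-level `submit` (which returns a handle exposing `request_id`) and `result` functions; the client is passed in as a parameter so the helpers can also be exercised with a stand-in object. Verify the exact calls against the fal-client documentation.

```python
MODEL_ID = "fal-ai/minimax/speech-02-hd"


def submit_job(client, arguments: dict) -> str:
    """Enqueue a request and return its request ID without waiting for the audio."""
    handle = client.submit(MODEL_ID, arguments=arguments)
    return handle.request_id


def fetch_result(client, request_id: str) -> dict:
    """Retrieve the result for a previously submitted request."""
    return client.result(MODEL_ID, request_id)
```

In practice you would pass the module itself: `import fal_client`, then `request_id = submit_job(fal_client, {"text": "..."})`, and later `fetch_result(fal_client, request_id)`.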


Best Practices

Make the most of Speech 02 HD by following these guidelines:

  • Include proper punctuation for natural pausing and intonation
  • Use the flag for better number reading in English
  • Process long texts in chunks for optimal performance
  • Monitor your usage through the dashboard
  • Cache frequently generated speech outputs
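
Two of these practices, chunking long text and caching repeated outputs, can be sketched as plain helpers. The 5,000-character default mirrors the real-time limit listed under Technical Specifications. Splitting on `.`, `!`, or `?` followed by whitespace is a simplification: it does not handle abbreviations, and text using `。` (as in Chinese or Japanese) falls back to hard splits. The cache-key format is an arbitrary choice, not part of the API.

```python
import hashlib
import json
import re


def chunk_text(text: str, max_chars: int = 5000) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split any sentence that alone exceeds the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def cache_key(text: str, settings: dict) -> str:
    """Stable key: identical text and settings always map to the same hash."""
    payload = json.dumps({"text": text, "settings": settings}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```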

Supported Languages

Speech 02 HD supports 30+ languages including:

  • Chinese, Chinese (Yue/Cantonese)
  • English, Spanish, French, German, Italian, Portuguese
  • Japanese, Korean
  • Arabic, Russian, Turkish, Dutch, Ukrainian
  • Vietnamese, Indonesian, Thai, Hindi
  • Polish, Romanian, Greek, Czech, Finnish

Audio Output Formats

  • MP3: Default format, good compression
  • WAV: Uncompressed, high quality
  • FLAC: Lossless compression
  • PCM: Raw audio data

Sample rates: 8000, 16000, 22050, 24000, 32000, 44100 Hz
Bitrates: 64000 to 320000 bps
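
A small validator for the format, sample-rate, and bitrate values listed in this section follows. The allowed sets come straight from the lists here; the returned keys (`format`, `sample_rate`, `bitrate`) are assumed field names, and whether the endpoint accepts every combination (for example, a bitrate alongside uncompressed WAV) is not stated on this page.

```python
# Allowed values as listed under Audio Output Formats.
FORMATS = {"mp3", "wav", "flac", "pcm"}
SAMPLE_RATES = {8000, 16000, 22050, 24000, 32000, 44100}
MIN_BITRATE, MAX_BITRATE = 64_000, 320_000


def validate_audio_settings(fmt: str = "mp3", sample_rate: int = 32000, bitrate: int = 128_000) -> dict:
    """Normalize the format name and check all three values against the documented lists."""
    fmt = fmt.lower()
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format {fmt!r}; expected one of {sorted(FORMATS)}")
    if sample_rate not in SAMPLE_RATES:
        raise ValueError(f"unsupported sample rate {sample_rate}; expected one of {sorted(SAMPLE_RATES)}")
    if not MIN_BITRATE <= bitrate <= MAX_BITRATE:
        raise ValueError(f"bitrate must be between {MIN_BITRATE} and {MAX_BITRATE} bps")
    return {"format": fmt, "sample_rate": sample_rate, "bitrate": bitrate}
```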

Pricing and Usage

  • Cost: $0.1 per 1000 characters
  • Transparent, usage-based pricing
  • No subscription necessary
  • No hidden fees or minimum commitments
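
Because pricing is linear in text length, a cost estimate is simple arithmetic: cost = characters / 1000 × $0.10, so 5,000 characters come to $0.50 and 200,000 characters to $20.00. The sketch assumes billing is exactly proportional (no rounding up to whole 1,000-character blocks) and that the billed count equals Python's `len()`, which counts Unicode code points; both are assumptions.

```python
PRICE_PER_1000_CHARS = 0.10  # USD, from this page


def estimate_cost(text: str) -> float:
    """Estimated USD cost, assuming charges scale exactly with character count."""
    return round(len(text) / 1000 * PRICE_PER_1000_CHARS, 4)
```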

View detailed pricing or contact sales for enterprise solutions.

Available Models

MiniMax offers multiple TTS models:

  • speech-02-hd: High-definition quality, best for production use
  • speech-02-turbo: Optimized for real-time applications with low latency
  • speech-01-hd: Previous generation HD model
  • speech-01-turbo: Previous generation turbo model

Support and Resources

We're here to help you succeed:

Get started with Speech 02 HD today and experience the next generation of text-to-speech technology. Visit our dashboard to create your API key and begin transforming text into natural, expressive speech.