MiniMax Speech-02 HD (Text to Speech) API on fal

Speech 02 HD - High-Quality Text-to-Speech API

Speech 02 HD is a text-to-speech model built by MiniMax and available through fal.ai. It converts text into natural-sounding speech in 30+ languages, with 300+ pre-built voices and control over emotion, speed, volume, and pitch.

Overview

Speech 02 HD excels at converting text into speech across multiple use cases:

Professional voiceovers and narration
Real-time voice assistants
Audiobook production
Educational content creation
Customer service automation

The model supports multiple languages and voices, with emotion control and natural prosody.

Getting Started

Getting up and running with Speech 02 HD takes just a few minutes. Here's how to begin:

Create your fal.ai API key in the dashboard
Install the client library for your preferred language
Make your first API call

For JavaScript/TypeScript:

javascript
import { fal } from "@fal-ai/client";

fal.config({
  credentials: "YOUR_FAL_KEY"
});

const result = await fal.subscribe("fal-ai/minimax/speech-02-hd", {
  input: {
    text: "Hello world! This is a test of the text-to-speech system."
  }
});

For Python:

python
import fal_client

# fal_client reads the API key from the FAL_KEY environment variable.
result = fal_client.subscribe(
    "fal-ai/minimax/speech-02-hd",
    arguments={
        "text": "Hello world! This is a test of the text-to-speech system."
    }
)
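Once a call returns, the result still has to be unpacked. The sketch below shows one way to pull the audio URL out of a result payload. It assumes the payload is a dictionary shaped like `{"audio": {"url": "..."}}`; that shape and the helper name `extract_audio_url` are assumptions for illustration, so check the endpoint's output schema before relying on them.

```python
from collections.abc import Mapping
from typing import Any


def extract_audio_url(result: Mapping[str, Any]) -> str:
    """Return the audio URL from a result payload.

    Assumes the payload looks like {"audio": {"url": "..."}}. This shape is
    an assumption, not a confirmed schema.
    """
    audio = result.get("audio")
    if not isinstance(audio, Mapping) or not audio.get("url"):
        raise ValueError("result does not contain an audio URL")
    return str(audio["url"])


# Hypothetical payload, used only to illustrate the helper.
sample_result = {"audio": {"url": "https://example.com/speech.mp3"}}
print(extract_audio_url(sample_result))
```

Raising on a missing URL, instead of returning `None`, makes a failed generation visible at the call site.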

Technical Specifications

Performance Metrics:

30+ languages supported with native pronunciation
300+ pre-built voices across different demographics
Real-time streaming capabilities
Processing up to 5,000 characters in real-time
Up to 1 million characters asynchronously
Maximum text length: 200,000 characters

Voice Capabilities:

Emotion control (happy, sad, angry, fearful, disgusted, surprised, neutral)
Speed adjustment (0.5x to 2.0x)
Volume control
Pitch adjustment
Support for multiple audio formats (MP3, WAV, FLAC, PCM)
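The ranges listed above can be checked client-side before a request is sent. The following is a minimal sketch, not part of the API: `build_voice_setting` is a hypothetical helper, and the `emotion` key name is an assumption. The speed range and emotion names come from the list above.

```python
# Emotion names taken from the capability list; the "emotion" key is assumed.
EMOTIONS = {"happy", "sad", "angry", "fearful", "disgusted", "surprised", "neutral"}
MIN_SPEED, MAX_SPEED = 0.5, 2.0


def build_voice_setting(voice_id, speed=1.0, emotion=None):
    """Build a voice_setting dict, rejecting values outside the documented ranges."""
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
    if emotion is not None and emotion not in EMOTIONS:
        raise ValueError(f"unsupported emotion: {emotion!r}")
    setting = {"voice_id": voice_id, "speed": speed}
    if emotion is not None:
        setting["emotion"] = emotion
    return setting


print(build_voice_setting("Wise_Woman", speed=1.2, emotion="happy"))
```

Validating locally avoids spending a round trip on a request that would be rejected anyway.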

Advanced Features

Fine-tune your speech output with advanced parameters:

javascript
const result = await fal.subscribe("fal-ai/minimax/speech-02-hd", {
  input: {
    text: "Your text here",
    voice_setting: {
      voice_id: "Wise_Woman",
      speed: 1.0,
      vol: 1.0,
      pitch: 0,
      english_normalization: false
    },
    output_format: "mp3"
  }
});

Queue-Based Processing

For long-running or asynchronous generation, submit the request to the queue, then check its status and fetch the result:

javascript
// Submit request
const { request_id } = await fal.queue.submit("fal-ai/minimax/speech-02-hd", {
  input: {
    text: "Your text here"
  },
  webhookUrl: "https://optional.webhook.url/for/results"
});

// Check status
const status = await fal.queue.status("fal-ai/minimax/speech-02-hd", {
  requestId: request_id,
  logs: true
});

// Get result
const result = await fal.queue.result("fal-ai/minimax/speech-02-hd", {
  requestId: request_id
});
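When no webhook is configured, the submit, status, and result steps are usually joined by a polling loop. The sketch below shows that loop in Python with the client calls injected as plain callables, so it makes no claim about any client's exact signatures. The helper name `wait_for_result` and the `"COMPLETED"` status string are assumptions for illustration.

```python
import time
from typing import Any, Callable


def wait_for_result(
    get_status: Callable[[], str],
    get_result: Callable[[], Any],
    poll_interval: float = 1.0,
    timeout: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Poll get_status() until it reports completion, then return get_result().

    The "COMPLETED" status value is an assumption; adapt it to the status
    values your client actually reports.
    """
    deadline = time.monotonic() + timeout
    while True:
        if get_status() == "COMPLETED":
            return get_result()
        if time.monotonic() >= deadline:
            raise TimeoutError("request did not complete before the timeout")
        sleep(poll_interval)


# Simulated status sequence standing in for real queue calls.
states = iter(["IN_QUEUE", "IN_PROGRESS", "COMPLETED"])
print(wait_for_result(lambda: next(states), lambda: {"done": True}, poll_interval=0))
```

In real use, `get_status` and `get_result` would wrap the client's status and result calls for a given request id. A webhook avoids polling entirely and is usually the better choice for long texts.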

Best Practices

Make the most of Speech 02 HD by following these guidelines:

Include proper punctuation for natural pausing and intonation
Use the `english_normalization` flag for better number reading in English
Process long texts in chunks for optimal performance
Monitor your usage through the dashboard
Cache frequently generated speech outputs
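Two of these guidelines, chunking and caching, are easy to get subtly wrong, so a sketch may help. It splits text at sentence boundaries and derives a stable cache key from the request inputs. The helper names `chunk_text` and `cache_key` are hypothetical, and the 5,000-character default is only an illustrative limit borrowed from the real-time figure quoted earlier.

```python
import hashlib
import json
import re


def chunk_text(text: str, max_chars: int = 5000) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split any single sentence that exceeds the limit on its own.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def cache_key(text: str, voice_setting: dict) -> str:
    """Return a deterministic key for a (text, voice_setting) pair."""
    payload = json.dumps({"text": text, "voice_setting": voice_setting}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


print(chunk_text("One. Two. Three.", max_chars=9))
```

Splitting on sentence ends keeps each chunk's intonation natural. Hashing the text together with the voice settings ensures a cached clip is reused only when both match.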

Supported Languages

Speech 02 HD supports 30+ languages including:

Chinese, Chinese (Yue/Cantonese)
English, Spanish, French, German, Italian, Portuguese
Japanese, Korean
Arabic, Russian, Turkish, Dutch, Ukrainian
Vietnamese, Indonesian, Thai, Hindi
Polish, Romanian, Greek, Czech, Finnish

Audio Output Formats

MP3: Default format, good compression
WAV: Uncompressed, high quality
FLAC: Lossless compression
PCM: Raw audio data

Sample rates: 8000, 16000, 22050, 24000, 32000, 44100 Hz
Bitrates: 64000 to 320000 bps
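The formats, sample rates, and bitrate range above can be enforced before a request is built. The sketch below is illustrative only: `build_audio_setting` is a hypothetical helper, the dictionary keys `format`, `sample_rate`, and `bitrate` are assumed names, and the defaults are arbitrary picks from the listed values, not the API's defaults.

```python
# Allowed values copied from the lists above.
FORMATS = ("mp3", "wav", "flac", "pcm")
SAMPLE_RATES = (8000, 16000, 22050, 24000, 32000, 44100)
MIN_BITRATE, MAX_BITRATE = 64000, 320000


def build_audio_setting(fmt="mp3", sample_rate=32000, bitrate=128000):
    """Build an audio settings dict, rejecting values outside the listed options."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    if sample_rate not in SAMPLE_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    if not MIN_BITRATE <= bitrate <= MAX_BITRATE:
        raise ValueError(f"bitrate must be between {MIN_BITRATE} and {MAX_BITRATE}")
    # Key names are assumptions; match them to the endpoint's input schema.
    return {"format": fmt, "sample_rate": sample_rate, "bitrate": bitrate}


print(build_audio_setting("flac", sample_rate=44100))
```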

Pricing and Usage

Cost: $0.10 per 1,000 characters
Transparent, usage-based pricing
No subscription necessary
No hidden fees or minimum commitments
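The per-character rate makes it easy to estimate costs before generating anything. The sketch below assumes billing is strictly proportional to character count, with no rounding up to whole blocks of 1,000 characters. That billing detail is an assumption, and `estimate_cost` is a hypothetical helper.

```python
from decimal import Decimal

# Rate taken from the pricing list: $0.10 per 1,000 characters.
PRICE_PER_1000_CHARS = Decimal("0.10")


def estimate_cost(text: str) -> Decimal:
    """Estimate the cost in USD, assuming strictly proportional billing."""
    return Decimal(len(text)) / Decimal(1000) * PRICE_PER_1000_CHARS


# A 2,500-character script as an example input.
script = "a" * 2500
print(f"Estimated cost: ${estimate_cost(script)}")
```

`Decimal` is used instead of floats so that sums over many requests do not accumulate rounding error.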

View detailed pricing or contact sales for enterprise solutions.

Available Models

MiniMax offers multiple TTS models:

speech-02-hd: High-definition quality, best for production use
speech-02-turbo: Optimized for real-time applications with low latency
speech-01-hd: Previous generation HD model
speech-01-turbo: Previous generation turbo model

Support and Resources

We're here to help you succeed:

API Documentation
General fal.ai Documentation
Active developer community
Technical support via email
Regular model updates and improvements

Get started with Speech 02 HD today and experience the next generation of text-to-speech technology. Visit our dashboard to create your API key and begin transforming text into natural, expressive speech.
