Dia Text to Speech

fal-ai/dia-tts
Dia generates realistic dialogue directly from transcripts. Audio conditioning enables emotion control, and the model produces natural nonverbals such as laughter and throat clearing.
Inference
Commercial use

Dia TTS: Professional Text-to-Speech API

Transform your text into natural-sounding speech with Dia TTS, a state-of-the-art text-to-speech API that delivers studio-quality voice output with realistic dialogue generation.

Overview

Dia TTS lets developers generate lifelike speech from text, with clear articulation and natural prosody. It works directly from a transcript: speaker tags mark who is talking, audio conditioning controls emotion, and the model renders nonverbals such as laughter and throat clearing.

Key Features
Advanced Neural Voice Technology

Generate remarkably natural speech with a 1.6 billion parameter model trained on extensive voice datasets. The model captures subtle nuances in pronunciation, intonation, and emotional expression, specializing in realistic dialogue synthesis.

Dialogue and Multi-Speaker Support
  • Supports multi-speaker conversations using [S1], [S2] speaker tags
  • Produces natural nonverbals like laughter and throat clearing
  • Generates realistic dialogue directly from transcripts
  • Audio conditioning enables emotion control
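As a rough illustration of the tag format described above, the following sketch assembles a tagged transcript from a list of turns. `buildTranscript` is a hypothetical helper, not part of the fal client, and limiting speakers to 1 and 2 is an assumption based on the two tags mentioned here.

```javascript
// Hypothetical helper: not part of the fal client or the Dia API.
// Turns an ordered list of speaker turns into one tagged transcript string.
function buildTranscript(turns) {
  return turns
    .map(({ speaker, line, cue }) => {
      // Assumption: only [S1] and [S2] are valid, as in the examples here.
      if (speaker !== 1 && speaker !== 2) {
        throw new RangeError(`Unsupported speaker: ${speaker}`);
      }
      // Append an optional nonverbal cue in parentheses, e.g. "(laughs)".
      const suffix = cue ? ` (${cue})` : "";
      return `[S${speaker}] ${line.trim()}${suffix}`;
    })
    .join(" ");
}

const transcript = buildTranscript([
  { speaker: 1, line: "Welcome to Dia TTS.", cue: "laughs" },
  { speaker: 2, line: "It sounds incredibly natural!" },
]);
console.log(transcript);
```

The resulting string can be passed as the `text` input of a request.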
Voice Cloning Capabilities

Clone a speaker's voice from a sample audio clip, then generate new dialogue from text prompts in that voice. This keeps each voice consistent across generations.
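A voice-cloning request might be prepared along the following lines. This is only a sketch: the endpoint id `fal-ai/dia-tts/voice-clone` and the field names `ref_audio_url` and `ref_text` are assumptions that should be checked against the API schema, and `buildVoiceCloneInput` is a hypothetical helper.

```javascript
// Assumed endpoint id; confirm it in the API documentation before use.
const VOICE_CLONE_ENDPOINT = "fal-ai/dia-tts/voice-clone";

// Hypothetical helper that validates and shapes the request input.
// The snake_case field names below are assumptions, not confirmed API fields.
function buildVoiceCloneInput({ text, refAudioUrl, refText }) {
  for (const [name, value] of Object.entries({ text, refAudioUrl, refText })) {
    if (typeof value !== "string" || value.trim() === "") {
      throw new TypeError(`${name} must be a non-empty string`);
    }
  }
  return {
    text, // dialogue to generate
    ref_audio_url: refAudioUrl, // sample audio of the voices to clone
    ref_text: refText, // transcript of the sample audio
  };
}

// Possible usage with the configured client:
// const result = await fal.subscribe(VOICE_CLONE_ENDPOINT, {
//   input: buildVoiceCloneInput({ text, refAudioUrl, refText }),
// });
```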

Getting Started

Getting started with Dia TTS takes just a few minutes. Here's how:

  1. Install the SDK:
bash
# Using npm
npm install --save @fal-ai/client

# Using pip
pip install fal-client
  2. Configure your API key:
javascript
import { fal } from "@fal-ai/client";

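fal.config({
  // Replace the placeholder with your key. Avoid shipping a real key in
  // client-side code; load it from a server-side secret or environment variable.
  credentials: "YOUR_FAL_KEY_HERE"
});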
  3. Make your first API call:
javascript
const result = await fal.subscribe("fal-ai/dia-tts", {
  input: {
    text: "[S1] Welcome to Dia TTS. This is an example of generated speech. (laughs) [S2] It sounds incredibly natural!"
  }
});
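Once the call resolves, the generated audio still has to be read from the response. The sketch below assumes the client resolves to an object of the form `{ data, requestId }` and that the model output exposes the file at `data.audio.url`; both are assumptions to verify against the API schema. `extractAudioUrl` is a hypothetical helper.

```javascript
// Assumption: result looks like { data: { audio: { url } }, requestId }.
// Verify the real response shape in the API schema before relying on this.
function extractAudioUrl(result) {
  const url = result?.data?.audio?.url;
  if (typeof url !== "string" || url.length === 0) {
    throw new Error("No audio URL found in the response");
  }
  return url;
}

// Stand-in response used only to illustrate the assumed shape.
const sampleResult = {
  requestId: "example-request-id",
  data: { audio: { url: "https://example.com/output.mp3" } },
};
console.log(extractAudioUrl(sampleResult));
```

Failing loudly on a missing URL makes a schema mismatch visible immediately instead of passing `undefined` to an audio player or download step.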
Technical Specifications
Audio Output
  • Format: MP3
  • Optimized for dialogue and conversational speech
  • Supports nonverbal cues and emotional expression
Model Capabilities
  • 1.6 billion parameter model by Nari Labs
  • Specialized for realistic dialogue synthesis
  • Zero-shot voice variety (produces a new synthetic voice on each run)
  • Supports voice cloning with reference audio
Best Practices
Optimize Your Results
  • Use [S1], [S2] tags for multi-speaker dialogues
  • Include nonverbal cues in parentheses: (laughs), (sighs), (clears throat)
  • Break long text into natural conversation chunks
  • Use voice cloning for consistent character voices across longer content
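One way to break long text into conversation chunks is to split only at speaker tags, so no turn is cut mid-sentence. The sketch below is illustrative: `chunkDialogue` is a hypothetical helper, and the character budget is a value you choose, not a documented API limit.

```javascript
// Hypothetical helper: groups whole speaker turns into chunks of at most
// maxChars characters. A single turn longer than maxChars is kept whole.
function chunkDialogue(text, maxChars) {
  // Split before each speaker tag so every piece starts with [S1] or [S2].
  const turns = text
    .split(/(?=\[S[12]\])/)
    .map((turn) => turn.trim())
    .filter(Boolean);

  const chunks = [];
  let current = "";
  for (const turn of turns) {
    const candidate = current ? `${current} ${turn}` : turn;
    if (current && candidate.length > maxChars) {
      // Adding this turn would exceed the budget: close the current chunk.
      chunks.push(current);
      current = turn;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

const chunks = chunkDialogue(
  "[S1] Hello there. [S2] Hi! [S1] How are you today? [S2] Great, thanks.",
  30
);
console.log(chunks);
```

Each chunk can then be sent as a separate request. Because a new synthetic voice is produced on each run, pairing chunked requests with voice cloning is what keeps the speakers consistent across chunks.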
Dialogue Formatting
  • Use speaker tags: [S1] for first speaker, [S2] for second speaker
  • Include actions and emotions: (whispers), (excited), (chuckles)
  • Structure as natural conversation flow
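A small pre-flight check can catch formatting slips before a request is sent. The rules in this sketch (the transcript starts with a speaker tag, and parentheses around cues are balanced) are illustrative conventions inferred from the guidance above, not validation performed by the API, and `checkDialogue` is a hypothetical helper.

```javascript
// Hypothetical lint helper: returns a list of human-readable problems.
// An empty list means none of these illustrative checks failed.
function checkDialogue(text) {
  const problems = [];

  // Check 1: the transcript should open with [S1] or [S2].
  if (!/^\s*\[S[12]\]/.test(text)) {
    problems.push("Transcript should start with a speaker tag such as [S1].");
  }

  // Check 2: every "(" opening a nonverbal cue should have a matching ")".
  const opens = (text.match(/\(/g) || []).length;
  const closes = (text.match(/\)/g) || []).length;
  if (opens !== closes) {
    problems.push("Unbalanced parentheses around a nonverbal cue.");
  }

  return problems;
}

console.log(checkDialogue("[S1] Hi. (laughs) [S2] Hello!"));
console.log(checkDialogue("Hi (laughs"));
```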
Error Handling

Implement robust error handling to manage API responses:

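javascript
import { fal } from "@fal-ai/client";

// Example input; replace with your own transcript.
const inputText = "[S1] Hello there. [S2] Hi! (laughs)";

try {
  const result = await fal.subscribe("fal-ai/dia-tts", {
    input: {
      text: inputText
    }
  });
  // result is scoped to this block, so use it here.
  console.log("Request ID:", result.requestId);
} catch (error) {
  console.error("TTS generation failed:", error.message);
}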
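For transient failures such as network interruptions, a retry with exponential backoff is a common pattern. The sketch below is generic: `withRetry` is a hypothetical helper, the retry count and delays are arbitrary defaults, and it retries every error, so in practice you would likely skip retries for errors caused by invalid input.

```javascript
// Hypothetical helper: retries an async operation with exponential backoff.
// It makes one initial attempt plus up to `retries` further attempts.
async function withRetry(operation, { retries = 3, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt === retries) break;
      // Delay doubles each time: baseDelayMs, 2x, 4x, ...
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  // All attempts failed: surface the most recent error to the caller.
  throw lastError;
}

// Possible usage with the configured client:
// const result = await withRetry(() =>
//   fal.subscribe("fal-ai/dia-tts", { input: { text: inputText } })
// );
```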
Pricing and Usage

Your request will cost $0.04 per 1000 characters. Our transparent, usage-based pricing scales with your needs:

  • Pay-per-use model based on character count
  • No subscription required
  • Competitive rates for dialogue generation
  • No hidden fees or minimum commitments
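As a back-of-the-envelope aid, the quoted rate can be turned into a cost estimate. This sketch assumes every character of the input counts toward billing, including speaker tags and cues, and that no rounding or minimum charge applies; neither is confirmed here, so treat the result as an approximation. `estimateCostUsd` is a hypothetical helper.

```javascript
// Rate quoted above: $0.04 per 1000 characters.
const PRICE_PER_1000_CHARS_USD = 0.04;

// Hypothetical estimator. Assumes all characters are billable and that
// billing is linear with no rounding; actual billing may differ.
// Note: String.length counts UTF-16 code units, so emoji count as 2.
function estimateCostUsd(text) {
  return (text.length / 1000) * PRICE_PER_1000_CHARS_USD;
}

const script = "[S1] Welcome to Dia TTS. (laughs) [S2] It sounds incredibly natural!";
console.log(`Estimated cost: $${estimateCostUsd(script).toFixed(5)}`);
```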

View detailed pricing or contact sales for enterprise solutions.

Support Resources

We're here to help you succeed:

  • Comprehensive API documentation at docs.fal.ai
  • Interactive code examples
  • Community forums
  • Direct technical support

Ready to transform your text into natural speech? Sign up now to get your API key and start building with Dia TTS.