
Overview

Models for audio processing including text-to-speech, speech-to-text, voice cloning, and music generation.

Top Models

Whisper API

Whisper is a model for speech transcription and translation.

Kling Video Create Voice API

Create voices for use with Kling models' Voice Control.

xAI Text to Speech API

Generate speech with expressive, realistic voices from xAI.
Explore all audio models on fal.ai/models.

Quick Start

Get started with Whisper:
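# Requires the fal-client package and an API key in the FAL_KEY environment variable.
import fal_client

def on_queue_update(update):
    # Print log messages while the request is being processed.
    if isinstance(update, fal_client.InProgress):
        for log in update.logs:
            print(log["message"])

# Submit the request and block until the result is ready.
result = fal_client.subscribe(
    "fal-ai/whisper",
    arguments={
        "audio_url": "https://storage.googleapis.com/falserverless/model_tests/whisper/dinner_conversation.mp3"
    },
    with_logs=True,
    on_queue_update=on_queue_update,
)
print(result)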
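
The result printed above is a dictionary. As a minimal sketch, assuming the response carries a `text` field with the full transcript and a `chunks` list whose items hold a `timestamp` pair and a `text` string (check the Whisper model page for the actual output schema), a small helper could turn it into timestamped lines. The `format_transcript` name and the `sample_result` data are hypothetical and used only for illustration; they are not real model output.

```python
def format_transcript(result):
    """Return readable lines from a Whisper-style result dict.

    Assumed shape (verify against the model page):
    {"text": str, "chunks": [{"timestamp": [start, end], "text": str}, ...]}
    """
    lines = []
    for chunk in result.get("chunks", []):
        # A chunk may lack a timestamp, or have an open-ended one.
        start, end = chunk.get("timestamp") or (None, None)
        text = chunk.get("text", "").strip()
        if start is None or end is None:
            lines.append(text)
        else:
            lines.append(f"[{start:.2f}s - {end:.2f}s] {text}")
    # Fall back to the full transcript when no chunks are present.
    if not lines and result.get("text"):
        lines.append(result["text"].strip())
    return lines


# Hypothetical sample data standing in for a real response.
sample_result = {
    "text": "Hello there. How are you?",
    "chunks": [
        {"timestamp": [0.0, 1.2], "text": " Hello there."},
        {"timestamp": [1.2, 2.5], "text": " How are you?"},
    ],
}

for line in format_transcript(sample_result):
    print(line)
```

With a real response, the same helper would be called as `format_transcript(result)` after `fal_client.subscribe` returns, provided the assumed fields are present.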

Pricing

For detailed pricing information, see the fal.ai pricing page or individual model pages.