# Gemini TTS

> Use Gemini TTS models to convert your prompts into spoken audio.


## Overview

- **Endpoint**: `https://fal.run/fal-ai/gemini-tts`
- **Model ID**: `fal-ai/gemini-tts`
- **Category**: text-to-audio
- **Kind**: inference
- **Tags**: text-to-speech, audio, gemini



## Pricing

Requests cost **$0.50** per 1M input tokens and **$10** per 1M output tokens with the flash model. The pro model costs twice as much.

For more details, see [fal.ai pricing](https://fal.ai/pricing).
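
As a rough illustration of the rates above, the sketch below estimates the cost of a request from token counts. `estimate_cost_usd` and `RATES_PER_MILLION` are hypothetical names introduced here, and the pro rates are simply the flash rates doubled, as stated above. This section does not say how tokens are counted for a given request, so treat the result as an estimate only.

```python
# Rates in USD per 1M tokens, copied from the pricing text above.
# The pro rates assume "twice the flash rates", as stated there.
RATES_PER_MILLION = {
    "gemini-2.5-flash-tts": {"input": 0.5, "output": 10.0},
    "gemini-2.5-pro-tts": {"input": 1.0, "output": 20.0},
}


def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      model: str = "gemini-2.5-flash-tts") -> float:
    """Hypothetical helper: estimate the cost of one request in USD."""
    rates = RATES_PER_MILLION[model]
    total = input_tokens * rates["input"] + output_tokens * rates["output"]
    return total / 1_000_000
```

For example, `estimate_cost_usd(1_000_000, 1_000_000)` adds the input and output rates for the flash model, and passing `model="gemini-2.5-pro-tts"` doubles that figure.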

## API Information

This model can be used via our HTTP API or more conveniently via our client libraries.
See the input and output schema below, as well as the usage examples.


### Input Schema

The API accepts the following input parameters:


- **`prompt`** (`string`, _required_):
  The text to convert to speech. Gemini TTS supports natural-language prompting for style, pace, accent, and emotional expression: include delivery instructions inline with the text (e.g. 'Say cheerfully: Have a wonderful day!'). For multi-speaker synthesis, prefix each line with a speaker alias defined in the `speakers` field (e.g. 'Alice: Hello!\nBob: Hi!'). Inline pace and style markers such as [slowly], [whispering], [excited], and [extremely fast] are also supported.
  - Examples: "Host: Welcome back to AI Frontiers, the podcast where we explore the latest breakthroughs in artificial intelligence. Today we have a very special guest. Doctor Chen, thank you for joining us!\nDrChen: Thanks for having me! I'm excited to be here.\nHost: So, let's dive right in. Your recent paper on neural architecture search has been making waves. Can you tell our listeners what inspired this research?\nDrChen: Absolutely. It all started when we noticed that most existing approaches were optimizing for the wrong metrics. We asked ourselves, what if we could let the model design itself?"

- **`style_instructions`** (`string`, _optional_):
  Optional style and delivery instructions prepended to the prompt. Controls expressiveness, accent, pace, tone, and emotional expression using natural language. Use this to separate style control from the text content. Examples: 'Speak warmly and slowly', 'Read this as a dramatic newscast', 'Use a British accent with a cheerful tone', 'Whisper mysteriously'.
  - Examples: "Say the following in a warm, conversational tone", "Read this as a dramatic newscast with gravitas", "Speak with a British accent, cheerfully and energetically", "This is a podcast conversation. The host is enthusiastic and curious, the guest is knowledgeable and articulate"

- **`voice`** (`VoiceEnum`, _optional_):
  Voice preset for single-speaker synthesis. 30 distinct voices are available. Ignored when speakers is set. Popular choices: Kore (strong, firm female), Puck (upbeat, lively male), Charon (calm, professional male), Zephyr (bright, clear female), Aoede (warm, melodic female). Default value: `"Kore"`
  - Default: `"Kore"`
  - Options: `"Achernar"`, `"Achird"`, `"Algenib"`, `"Algieba"`, `"Alnilam"`, `"Aoede"`, `"Autonoe"`, `"Callirrhoe"`, `"Charon"`, `"Despina"`, `"Enceladus"`, `"Erinome"`, `"Fenrir"`, `"Gacrux"`, `"Iapetus"`, `"Kore"`, `"Laomedeia"`, `"Leda"`, `"Orus"`, `"Pulcherrima"`, `"Puck"`, `"Rasalgethi"`, `"Sadachbia"`, `"Sadaltager"`, `"Schedar"`, `"Sulafat"`, `"Umbriel"`, `"Vindemiatrix"`, `"Zephyr"`, `"Zubenelgenubi"`

- **`model`** (`ModelEnum`, _optional_):
  Which Gemini TTS model to use. gemini-2.5-flash-tts: low latency, cost-efficient for everyday applications (recommended). gemini-2.5-pro-tts: highest quality, best for structured workflows like podcasts, audiobooks, and customer support. Default value: `"gemini-2.5-flash-tts"`
  - Default: `"gemini-2.5-flash-tts"`
  - Options: `"gemini-2.5-flash-tts"`, `"gemini-2.5-pro-tts"`

- **`language_code`** (`Enum`, _optional_):
  Language for multilingual synthesis. When set, steers the model to speak in the specified language. Supports 24 GA languages and 60+ Preview languages. If not set, the model auto-detects the language from the text.
  - Options: `"Arabic (Egypt)"`, `"Bangla (Bangladesh)"`, `"Dutch (Netherlands)"`, `"English (India)"`, `"English (US)"`, `"French (France)"`, `"German (Germany)"`, `"Hindi (India)"`, `"Indonesian (Indonesia)"`, `"Italian (Italy)"`, `"Japanese (Japan)"`, `"Korean (South Korea)"`, `"Marathi (India)"`, `"Polish (Poland)"`, `"Portuguese (Brazil)"`, `"Romanian (Romania)"`, `"Russian (Russia)"`, `"Spanish (Spain)"`, `"Tamil (India)"`, `"Telugu (India)"`, `"Thai (Thailand)"`, `"Turkish (Turkey)"`, `"Ukrainian (Ukraine)"`, `"Vietnamese (Vietnam)"`, `"Afrikaans (South Africa)"`, `"Albanian (Albania)"`, `"Amharic (Ethiopia)"`, `"Arabic (World)"`, `"Armenian (Armenia)"`, `"Azerbaijani (Azerbaijan)"`, `"Basque (Spain)"`, `"Belarusian (Belarus)"`, `"Bulgarian (Bulgaria)"`, `"Burmese (Myanmar)"`, `"Catalan (Spain)"`, `"Cebuano (Philippines)"`, `"Chinese Mandarin (China)"`, `"Chinese Mandarin (Taiwan)"`, `"Croatian (Croatia)"`, `"Czech (Czech Republic)"`, `"Danish (Denmark)"`, `"English (Australia)"`, `"English (UK)"`, `"Estonian (Estonia)"`, `"Filipino (Philippines)"`, `"Finnish (Finland)"`, `"French (Canada)"`, `"Galician (Spain)"`, `"Georgian (Georgia)"`, `"Greek (Greece)"`, `"Gujarati (India)"`, `"Haitian Creole (Haiti)"`, `"Hebrew (Israel)"`, `"Hungarian (Hungary)"`, `"Icelandic (Iceland)"`, `"Javanese (Java)"`, `"Kannada (India)"`, `"Konkani (India)"`, `"Lao (Laos)"`, `"Latin (Vatican City)"`, `"Latvian (Latvia)"`, `"Lithuanian (Lithuania)"`, `"Luxembourgish (Luxembourg)"`, `"Macedonian (North Macedonia)"`, `"Maithili (India)"`, `"Malagasy (Madagascar)"`, `"Malay (Malaysia)"`, `"Malayalam (India)"`, `"Mongolian (Mongolia)"`, `"Nepali (Nepal)"`, `"Norwegian Bokmal (Norway)"`, `"Norwegian Nynorsk (Norway)"`, `"Odia (India)"`, `"Pashto (Afghanistan)"`, `"Persian (Iran)"`, `"Portuguese (Portugal)"`, `"Punjabi (India)"`, `"Serbian (Serbia)"`, `"Sindhi (India)"`, `"Sinhala (Sri Lanka)"`, `"Slovak (Slovakia)"`, `"Slovenian (Slovenia)"`, `"Spanish (Latin America)"`, `"Spanish (Mexico)"`, `"Swahili (Kenya)"`, `"Swedish (Sweden)"`, `"Urdu (Pakistan)"`
  - Examples: "English (US)", "French (France)", "Japanese (Japan)"

- **`speakers`** (`list<SpeakerConfig>`, _optional_):
  Multi-speaker voice configuration. When set, enables multi-speaker synthesis, where different parts of the text are spoken by different voices. Each speaker needs a `voice` and a `speaker_id` (alias) that matches the prefixes used in the prompt. Requires the gemini-2.5-pro-tts or gemini-2.5-flash-tts model. Not supported with gemini-2.5-flash-lite-preview-tts.
  - Array of SpeakerConfig
  - Examples: [{"speaker_id":"Host","voice":"Charon"},{"speaker_id":"DrChen","voice":"Kore"}]

- **`temperature`** (`float`, _optional_):
  Controls the randomness of the speech output. Higher values produce more creative and varied delivery, while lower values make the output more predictable and focused. Default value: `1`
  - Default: `1`
  - Range: `0` to `2`

- **`output_format`** (`OutputFormatEnum`, _optional_):
  Audio output format. mp3: compressed, small file size (recommended). wav: uncompressed PCM wrapped in WAV (24 kHz, 16-bit mono). ogg_opus: Ogg container with Opus codec, good quality-to-size ratio. Default value: `"mp3"`
  - Default: `"mp3"`
  - Options: `"wav"`, `"mp3"`, `"ogg_opus"`
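
The constraints listed above (allowed voices, models, output formats, and the `temperature` range) can be checked on the client before a request is sent. The following is a minimal sketch of such a check. `validate_arguments` is a hypothetical helper, not part of any fal client library, and treating an empty `prompt` as invalid is an assumption. The API performs its own validation regardless.

```python
# Allowed values copied from the input schema above.
VOICES = {
    "Achernar", "Achird", "Algenib", "Algieba", "Alnilam", "Aoede",
    "Autonoe", "Callirrhoe", "Charon", "Despina", "Enceladus", "Erinome",
    "Fenrir", "Gacrux", "Iapetus", "Kore", "Laomedeia", "Leda", "Orus",
    "Pulcherrima", "Puck", "Rasalgethi", "Sadachbia", "Sadaltager",
    "Schedar", "Sulafat", "Umbriel", "Vindemiatrix", "Zephyr",
    "Zubenelgenubi",
}
MODELS = {"gemini-2.5-flash-tts", "gemini-2.5-pro-tts"}
OUTPUT_FORMATS = {"wav", "mp3", "ogg_opus"}


def validate_arguments(args: dict) -> list[str]:
    """Hypothetical client-side check; returns a list of problems found."""
    errors = []
    prompt = args.get("prompt")
    # Assumption: an empty or whitespace-only prompt is treated as invalid.
    if not isinstance(prompt, str) or not prompt.strip():
        errors.append("prompt is required and must be a non-empty string")
    if args.get("voice", "Kore") not in VOICES:
        errors.append(f"unknown voice: {args['voice']!r}")
    if args.get("model", "gemini-2.5-flash-tts") not in MODELS:
        errors.append(f"unknown model: {args['model']!r}")
    temperature = args.get("temperature", 1)
    if not isinstance(temperature, (int, float)) or not 0 <= temperature <= 2:
        errors.append("temperature must be a number between 0 and 2")
    if args.get("output_format", "mp3") not in OUTPUT_FORMATS:
        errors.append(f"unknown output_format: {args['output_format']!r}")
    # Each speaker entry needs a known voice as well.
    for speaker in args.get("speakers", []):
        if speaker.get("voice") not in VOICES:
            errors.append(f"unknown speaker voice: {speaker.get('voice')!r}")
    return errors
```

An empty list means the arguments passed every check in this sketch. Parameters that are left out fall back to the defaults documented above.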



**Required Parameters Example**:

```json
{
  "prompt": "Host: Welcome back to AI Frontiers, the podcast where we explore the latest breakthroughs in artificial intelligence. Today we have a very special guest. Doctor Chen, thank you for joining us!\nDrChen: Thanks for having me! I'm excited to be here.\nHost: So, let's dive right in. Your recent paper on neural architecture search has been making waves. Can you tell our listeners what inspired this research?\nDrChen: Absolutely. It all started when we noticed that most existing approaches were optimizing for the wrong metrics. We asked ourselves, what if we could let the model design itself?"
}
```

**Full Example**:

```json
{
  "prompt": "Host: Welcome back to AI Frontiers, the podcast where we explore the latest breakthroughs in artificial intelligence. Today we have a very special guest. Doctor Chen, thank you for joining us!\nDrChen: Thanks for having me! I'm excited to be here.\nHost: So, let's dive right in. Your recent paper on neural architecture search has been making waves. Can you tell our listeners what inspired this research?\nDrChen: Absolutely. It all started when we noticed that most existing approaches were optimizing for the wrong metrics. We asked ourselves, what if we could let the model design itself?",
  "style_instructions": "Say the following in a warm, conversational tone",
  "voice": "Kore",
  "model": "gemini-2.5-flash-tts",
  "language_code": "English (US)",
  "speakers": [
    {
      "speaker_id": "Host",
      "voice": "Charon"
    },
    {
      "speaker_id": "DrChen",
      "voice": "Kore"
    }
  ],
  "temperature": 1,
  "output_format": "mp3"
}
```
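
In the full example, every `Speaker:` prefix in `prompt` has a matching `speaker_id` in `speakers`. One way to keep the two in sync is to build both from the same list of dialogue turns. The sketch below does that. `build_multi_speaker_arguments` is a hypothetical helper introduced for illustration only.

```python
def build_multi_speaker_arguments(turns, voices):
    """Hypothetical helper: build `prompt` and `speakers` together.

    turns:  list of (speaker_id, text) tuples, in speaking order.
    voices: dict mapping each speaker_id to a voice name.
    """
    # Every alias used in the dialogue must have a configured voice.
    missing = {speaker for speaker, _ in turns} - set(voices)
    if missing:
        raise ValueError(f"no voice configured for: {sorted(missing)}")
    # One "Alias: text" line per turn, joined with newlines.
    prompt = "\n".join(f"{speaker}: {text}" for speaker, text in turns)
    speakers = [
        {"speaker_id": speaker_id, "voice": voice}
        for speaker_id, voice in voices.items()
    ]
    return {"prompt": prompt, "speakers": speakers}
```

The returned dictionary can be merged with other parameters such as `model` or `output_format` before it is sent as the request input.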


### Output Schema

The API returns the following output format:

- **`audio`** (`File`, _required_):
  The generated audio file.
  - Examples: {"url":"https://v3b.fal.media/files/b/0a935d4f/Ez4NpcnFTuGsu2FHDaJTR_gemini_tts_output.mp3"}



**Example Response**:

```json
{
  "audio": {
    "url": "https://v3b.fal.media/files/b/0a935d4f/Ez4NpcnFTuGsu2FHDaJTR_gemini_tts_output.mp3"
  }
}
```
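
The response carries only a URL, so the audio still has to be downloaded. As a small sketch, the helper below derives a local file name from `audio.url` using the standard library. `audio_filename` is a hypothetical name, and the sketch assumes the URL path ends in the file name, as it does in the example response above.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def audio_filename(result: dict) -> str:
    """Hypothetical helper: take the last path segment of result["audio"]["url"]."""
    url = result["audio"]["url"]
    return PurePosixPath(urlparse(url).path).name
```

With the file name in hand, the audio could then be fetched with, for instance, `urllib.request.urlretrieve(result["audio"]["url"], audio_filename(result))`.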


## Usage Examples

### cURL

```bash
# The JSON body is passed through a quoted heredoc so that the apostrophes
# in the prompt do not terminate the shell string.
curl --request POST \
  --url https://fal.run/fal-ai/gemini-tts \
  --header "Authorization: Key $FAL_KEY" \
  --header "Content-Type: application/json" \
  --data @- <<'EOF'
{
  "prompt": "Host: Welcome back to AI Frontiers, the podcast where we explore the latest breakthroughs in artificial intelligence. Today we have a very special guest. Doctor Chen, thank you for joining us!\nDrChen: Thanks for having me! I'm excited to be here.\nHost: So, let's dive right in. Your recent paper on neural architecture search has been making waves. Can you tell our listeners what inspired this research?\nDrChen: Absolutely. It all started when we noticed that most existing approaches were optimizing for the wrong metrics. We asked ourselves, what if we could let the model design itself?"
}
EOF
```
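
For readers who prefer to call the HTTP endpoint without a client library, the same request can be sketched with Python's standard library. The helper below only builds the request object and does not send it. `build_request` is a hypothetical name, and the headers mirror the ones in the cURL call above.

```python
import json
import urllib.request

ENDPOINT = "https://fal.run/fal-ai/gemini-tts"


def build_request(arguments: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the cURL call."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(arguments).encode("utf-8"),  # JSON-encoded body
        headers={
            "Authorization": f"Key {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` sends the request, which needs network access and a valid key. The response body can then be parsed with `json.load`.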

### Python

Ensure you have the Python client installed:

```bash
pip install fal-client
```

Then use the API client to make requests:

```python
import fal_client

def on_queue_update(update):
    if isinstance(update, fal_client.InProgress):
        for log in update.logs:
            print(log["message"])

result = fal_client.subscribe(
    "fal-ai/gemini-tts",
    arguments={
        # Adjacent string literals are concatenated; "\n" separates speaker turns.
        "prompt": (
            "Host: Welcome back to AI Frontiers, the podcast where we explore the latest breakthroughs in artificial intelligence. Today we have a very special guest. Doctor Chen, thank you for joining us!\n"
            "DrChen: Thanks for having me! I'm excited to be here.\n"
            "Host: So, let's dive right in. Your recent paper on neural architecture search has been making waves. Can you tell our listeners what inspired this research?\n"
            "DrChen: Absolutely. It all started when we noticed that most existing approaches were optimizing for the wrong metrics. We asked ourselves, what if we could let the model design itself?"
        ),
    },
    with_logs=True,
    on_queue_update=on_queue_update,
)
print(result)
```

### JavaScript

Ensure you have the JavaScript client installed:

```bash
npm install --save @fal-ai/client
```

Then use the API client to make requests:

```javascript
import { fal } from "@fal-ai/client";

const result = await fal.subscribe("fal-ai/gemini-tts", {
  input: {
    // "\n" separates speaker turns; the pieces are joined into one string.
    prompt:
      "Host: Welcome back to AI Frontiers, the podcast where we explore the latest breakthroughs in artificial intelligence. Today we have a very special guest. Doctor Chen, thank you for joining us!\n" +
      "DrChen: Thanks for having me! I'm excited to be here.\n" +
      "Host: So, let's dive right in. Your recent paper on neural architecture search has been making waves. Can you tell our listeners what inspired this research?\n" +
      "DrChen: Absolutely. It all started when we noticed that most existing approaches were optimizing for the wrong metrics. We asked ourselves, what if we could let the model design itself?",
  },
  logs: true,
  onQueueUpdate: (update) => {
    if (update.status === "IN_PROGRESS") {
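      // Wrap console.log so forEach's extra index/array arguments are not printed.
      update.logs.map((log) => log.message).forEach((message) => console.log(message));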
    }
  },
});
console.log(result.data);
console.log(result.requestId);
```


## Additional Resources

### Documentation

- [Model Playground](https://fal.ai/models/fal-ai/gemini-tts)
- [API Documentation](https://fal.ai/models/fal-ai/gemini-tts/api)
- [OpenAPI Schema](https://fal.ai/api/openapi/queue/openapi.json?endpoint_id=fal-ai/gemini-tts)

### fal.ai Platform

- [Platform Documentation](https://docs.fal.ai)
- [Python Client](https://docs.fal.ai/clients/python)
- [JavaScript Client](https://docs.fal.ai/clients/javascript)
