Endpoint: POST https://fal.run/fal-ai/whisper Endpoint ID: fal-ai/whisper


Quick Start

import fal_client

def on_queue_update(update):
    if isinstance(update, fal_client.InProgress):
        for log in update.logs:
            print(log["message"])

result = fal_client.subscribe(
    "fal-ai/whisper",
    arguments={
        "audio_url": "https://storage.googleapis.com/falserverless/model_tests/whisper/dinner_conversation.mp3"
    },
    with_logs=True,
    on_queue_update=on_queue_update,
)
print(result)
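The same call accepts the other input fields documented in the API Reference. A hedged sketch of a request that translates and diarizes the sample clip (the `num_speakers` value of 2 is an illustrative assumption, not a property of the sample audio; omit it to auto-detect):

```python
# Request arguments for a translate-and-diarize run; field names and
# allowed values follow the input schema of fal-ai/whisper.
arguments = {
    "audio_url": "https://storage.googleapis.com/falserverless/model_tests/whisper/dinner_conversation.mp3",
    "task": "translate",    # output is English regardless of source language
    "diarize": True,        # adds cost proportional to diarization inference time
    "chunk_level": "word",  # word-level timestamp chunks
    "num_speakers": 2,      # assumed for illustration; omit to auto-detect
}
```

Pass this dict as `arguments=` to `fal_client.subscribe("fal-ai/whisper", ...)` exactly as in the Quick Start.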

Capabilities

  • Audio input
  • Text prompt input

API Reference

Input Schema

audio_url
string
required
URL of the audio file to transcribe. Supported formats: mp3, mp4, mpeg, mpga, m4a, wav or webm.
task
TaskEnum
default:"transcribe"
Task to perform on the audio file. Either transcribe or translate. Default value: "transcribe". Possible values: transcribe, translate
language
Enum
Language of the audio file. If set to null, the language will be detected automatically. Defaults to null. If translate is selected as the task, the audio will be translated to English regardless of the language selected. Possible values: af, am, ar, as, az, ba, be, bg, bn, bo, br, bs, ca, cs, cy, da, de, el, en, es, et, eu, fa, fi, fo, fr, gl, gu, ha, haw, he, hi, hr, ht, hu, hy, id, is, it, ja, jw, ka, kk, km, kn, ko, la, lb, ln, lo, lt, lv, mg, mi, mk, ml, mn, mr, ms, mt, my, ne, nl, nn, no, oc, pa, pl, ps, pt, ro, ru, sa, sd, si, sk, sl, sn, so, sq, sr, su, sv, sw, ta, te, tg, th, tk, tl, tr, tt, uk, ur, uz, vi, yi, yo, zh
diarize
boolean
default:"false"
Whether to diarize the audio file. Defaults to false. Setting to true will add costs proportional to diarization inference time.
chunk_level
ChunkLevelEnum
default:"segment"
Level of the chunks to return: none, segment, or word. With none, the whole audio is transcribed without timestamp tokens and returned as a single chunk containing the full transcription. If you are not satisfied with the transcription quality, switching to none usually improves it, and it also gives a minor speed-up because fewer tokens are generated. Default value: "segment". Possible values: none, segment, word
batch_size
integer
default:"64"
Default value: 64. Range: 1 to 64
prompt
string
default:""
Prompt to use for generation. Default value: ""
num_speakers
integer
Number of speakers in the audio file. Defaults to null. If not provided, the number of speakers will be automatically detected.
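The constraints above can be checked client-side before submitting. A minimal sketch, assuming the documented defaults and ranges (the endpoint performs its own validation, so this is only a convenience; `validate_whisper_input` is a hypothetical helper, not part of `fal_client`):

```python
# Allowed values taken from the input schema of fal-ai/whisper.
ALLOWED_TASKS = {"transcribe", "translate"}
ALLOWED_CHUNK_LEVELS = {"none", "segment", "word"}

def validate_whisper_input(args: dict) -> None:
    """Raise ValueError if args violates the documented input schema."""
    if "audio_url" not in args:
        raise ValueError("audio_url is required")
    if args.get("task", "transcribe") not in ALLOWED_TASKS:
        raise ValueError(f"task must be one of {sorted(ALLOWED_TASKS)}")
    if args.get("chunk_level", "segment") not in ALLOWED_CHUNK_LEVELS:
        raise ValueError(f"chunk_level must be one of {sorted(ALLOWED_CHUNK_LEVELS)}")
    if not 1 <= args.get("batch_size", 64) <= 64:
        raise ValueError("batch_size must be in the range 1 to 64")
```

Calling it with the Quick Start arguments passes silently; an out-of-range `batch_size` or an unknown `task` raises `ValueError`.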

Output Schema

text
string
required
Transcription of the audio file
chunks
list<WhisperChunk>
Timestamp chunks of the audio file
inferred_languages
list<Enum>
required
List of languages inferred from the audio file. Defaults to null.
diarization_segments
list<DiarizationSegment>
required
Speaker diarization segments of the audio file. Only present if diarization is enabled.
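A sketch of reading the documented output fields from a result dict, keeping to the keys shown in the schema and output example (`text` on the result and on each chunk, `speaker` on each diarization segment); the sample values below are placeholders, not real endpoint output:

```python
def summarize_result(result: dict) -> list[str]:
    """Collect the transcription, chunk texts, and speaker labels."""
    lines = [result["text"]]  # full transcription (required)
    for chunk in result.get("chunks") or []:  # timestamped chunks, if any
        lines.append(chunk["text"])
    for seg in result.get("diarization_segments") or []:  # only when diarize=true
        lines.append(seg["speaker"])
    return lines

# Placeholder response shaped like the output schema above.
sample = {
    "text": "hello world",
    "chunks": [{"text": "hello"}, {"text": "world"}],
    "diarization_segments": [{"speaker": "SPEAKER_00"}],
}
print(summarize_result(sample))
```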

Input Example

{
  "audio_url": "https://storage.googleapis.com/falserverless/model_tests/whisper/dinner_conversation.mp3",
  "task": "transcribe",
  "diarize": false,
  "chunk_level": "segment",
  "batch_size": 64,
  "prompt": "",
  "num_speakers": null
}

Output Example

{
  "text": "María, ¿qué cenamos hoy? No sé, ¿qué cenamos? ¿Cenamos pollo frito o pollo asado o algo? Mejor a la plancha, quiero una salada. A la plancha, vale. Y hacemos una ensalada con tomate y esas cosas. Vale. Pues eso lo hacemos, ¿vale? Venga, vale.",
  "chunks": [
    {
      "text": ""
    }
  ],
  "diarization_segments": [
    {
      "speaker": ""
    }
  ]
}

Limitations

  • task restricted to: transcribe, translate
  • chunk_level restricted to: none, segment, word
  • batch_size range: 1 to 64