POST https://fal.run/fal-ai/whisper
Endpoint ID: fal-ai/whisper
Quick Start
Capabilities
- Audio input
- Text prompt input
API Reference
Input Schema
- URL of the audio file to transcribe. Supported formats: mp3, mp4, mpeg, mpga, m4a, wav, or webm.
- Task to perform on the audio file (task). Either transcribe or translate. Default value: "transcribe". Possible values: transcribe, translate.
- Language of the audio file. If set to null, the language is detected automatically. Defaults to null. If translate is selected as the task, the audio is translated to English regardless of the language selected. Possible values: af, am, ar, as, az, ba, be, bg, bn, bo, br, bs, ca, cs, cy, da, de, el, en, es, et, eu, fa, fi, fo, fr, gl, gu, ha, haw, he, hi, hr, ht, hu, hy, id, is, it, ja, jw, ka, kk, km, kn, ko, la, lb, ln, lo, lt, lv, mg, mi, mk, ml, mn, mr, ms, mt, my, ne, nl, nn, no, oc, pa, pl, ps, pt, ro, ru, sa, sd, si, sk, sl, sn, so, sq, sr, su, sv, sw, ta, te, tg, th, tk, tl, tr, tt, uk, ur, uz, vi, yi, yo, zh.
- Whether to diarize the audio file. Defaults to false. Setting it to true adds costs proportional to diarization inference time.
- Level of the chunks to return (chunk_level). Either none, segment, or word. With none, the whole audio is transcribed without timestamp tokens. Try none if you are not satisfied with the transcription quality, since it usually improves the results. It also gives a minor speedup because fewer tokens are generated. Note that none produces a single chunk containing the whole transcription. Default value: "segment". Possible values: none, segment, word.
- Batch size (batch_size). Default value: 64. Range: 1 to 64.
- Prompt to use for generation. Default value: "" (an empty string).
- Number of speakers in the audio file. Defaults to null. If not provided, the number of speakers is detected automatically.
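As a rough illustration of how these inputs might be sent, the sketch below builds (but does not send) a JSON POST request for the endpoint shown at the top of this page. Only `task`, `chunk_level`, and `batch_size` are named on this page (in the Limitations section); the `audio_url` key, the `Authorization: Key ...` header scheme, the `build_request` helper, the example URL, and the placeholder key are assumptions made for illustration and should be checked against the live API reference.

```python
import json
import urllib.request

ENDPOINT = "https://fal.run/fal-ai/whisper"  # endpoint from the top of this page


def build_request(audio_url, api_key, **options):
    """Build (but do not send) a JSON POST request for the endpoint."""
    # ASSUMPTION: "audio_url" is a guessed key; this page only describes the field.
    payload = {"audio_url": audio_url}
    # Leave out unset options so the documented defaults apply.
    payload.update({k: v for k, v in options.items() if v is not None})
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # ASSUMPTION: the auth header scheme is not stated on this page.
            "Authorization": f"Key {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request(
    "https://example.com/sample.mp3",  # placeholder audio URL
    api_key="YOUR_API_KEY",            # placeholder credential
    task="transcribe",
    chunk_level="segment",
    batch_size=64,
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing `req` to `urllib.request.urlopen` would perform the actual call; it is left out here so the sketch runs without network access or a real key.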
Output Schema
- Transcription of the audio file.
- Timestamp chunks of the audio file.
- List of languages inferred for the audio file. Defaults to null.
- Speaker diarization segments of the audio file. Only present if diarization is enabled.
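A minimal sketch of how a response with these fields might be consumed. The output key names are not given on this page, so `text`, `chunks`, `timestamp`, and `diarization_segments` are assumed names, `format_chunks` is a hypothetical helper, and `sample` is hand-written illustrative data rather than real model output.

```python
# Hand-written sample shaped like the fields described above.
# ASSUMPTION: all key names here are guesses, and this is not real API output.
sample = {
    "text": "Hello world. How are you?",
    "chunks": [
        {"timestamp": [0.0, 1.2], "text": "Hello world."},
        {"timestamp": [1.2, 2.5], "text": "How are you?"},
    ],
    "inferred_languages": ["en"],
}


def format_chunks(result):
    """Return one '[start - end] text' line per timestamp chunk."""
    lines = []
    for chunk in result.get("chunks") or []:
        start, end = chunk["timestamp"]
        lines.append(f"[{start:.2f}s - {end:.2f}s] {chunk['text'].strip()}")
    return lines


for line in format_chunks(sample):
    print(line)

# Diarization segments are only present when diarization is enabled,
# so read them with a default instead of indexing directly.
segments = sample.get("diarization_segments", [])
print(len(segments), "diarization segments")
```

Reading optional fields with `.get` keeps the same code working whether or not diarization was requested.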
Input Example
Output Example
Limitations
- task restricted to: transcribe, translate
- chunk_level restricted to: none, segment, word
- batch_size range: 1 to 64
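These constraints can be checked on the client before a request is sent. The sketch below is one possible way to do that; `validate_options` is a hypothetical helper, not part of any SDK, and its default arguments mirror the defaults listed in the input schema.

```python
ALLOWED_TASKS = ("transcribe", "translate")
ALLOWED_CHUNK_LEVELS = ("none", "segment", "word")
BATCH_SIZE_MIN, BATCH_SIZE_MAX = 1, 64


def validate_options(task="transcribe", chunk_level="segment", batch_size=64):
    """Return a list of error messages; an empty list means the options are valid."""
    errors = []
    if task not in ALLOWED_TASKS:
        errors.append(f"task must be one of {ALLOWED_TASKS}, got {task!r}")
    if chunk_level not in ALLOWED_CHUNK_LEVELS:
        errors.append(
            f"chunk_level must be one of {ALLOWED_CHUNK_LEVELS}, got {chunk_level!r}"
        )
    if not (isinstance(batch_size, int) and BATCH_SIZE_MIN <= batch_size <= BATCH_SIZE_MAX):
        errors.append(
            f"batch_size must be an integer from {BATCH_SIZE_MIN} to {BATCH_SIZE_MAX}, "
            f"got {batch_size!r}"
        )
    return errors


print(validate_options())                                # defaults pass
print(validate_options(task="summarize", batch_size=0))  # two violations
```

Returning a list of messages rather than raising on the first problem lets a caller report every invalid option at once.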