

fal-ai/workflow-utilities/auto-subtitle
Add automatic subtitles to videos
Inference
Commercial use

About

Automatically generate and add subtitles to video.

Uses speech-to-text to transcribe audio and adds karaoke-style subtitles with word-level highlighting. Supports multiple languages, Google Fonts, and customizable styling including animation effects. This endpoint:

  1. Extracts audio from video
  2. Transcribes audio with word-level timing
  3. Groups words into readable subtitle segments
  4. Adds styled subtitles to video with customizable font and colors
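
Step 3 can be pictured with a small sketch. The service's actual grouping logic is not documented on this page, so the function below is only an illustration: it assumes each transcribed word carries `start` and `end` times in seconds and chunks the words into segments of at most `maxWords`, mirroring the `words_per_subtitle` input. The name `groupWords` and the word shape are hypothetical, not part of the API.

```javascript
// Hypothetical sketch: chunk timed words into subtitle segments.
// Assumes each word looks like { text, start, end }, with times in seconds.
function groupWords(words, maxWords = 3) {
  if (!Number.isInteger(maxWords) || maxWords < 1) {
    throw new RangeError("maxWords must be a positive integer");
  }
  const segments = [];
  for (let i = 0; i < words.length; i += maxWords) {
    const chunk = words.slice(i, i + maxWords);
    segments.push({
      start: chunk[0].start, // segment starts with its first word
      end: chunk[chunk.length - 1].end, // and ends with its last word
      text: chunk.map((word) => word.text).join(" "),
    });
  }
  return segments;
}

const sampleWords = [
  { text: "Hello", start: 0.0, end: 0.4 },
  { text: "and", start: 0.4, end: 0.6 },
  { text: "welcome", start: 0.6, end: 1.1 },
  { text: "back", start: 1.1, end: 1.5 },
];
console.log(groupWords(sampleWords, 3));
```

Each returned object uses the same `start`, `end`, and `text` fields as the SubtitleSegment type in the schema.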

1. Calling the API

Install the client

The client provides a convenient way to interact with the model API.

npm install --save @fal-ai/client

Set up your API key

Set FAL_KEY as an environment variable in your runtime.

export FAL_KEY="YOUR_API_KEY"

Submit a request

The client handles the submit protocol for you: it tracks request status updates and returns the result once the request completes.

import { fal } from "@fal-ai/client";

const result = await fal.subscribe("fal-ai/workflow-utilities/auto-subtitle", {
  input: {
    video_url: "https://v3b.fal.media/files/b/kangaroo/oUCiZjQwEy6bIQdPUSLDF_output.mp4"
  },
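  logs: true, // include log messages in queue updates
  onQueueUpdate: (update) => {
    // Print log messages while the request is running
    if (update.status === "IN_PROGRESS") {
      update.logs.map((log) => log.message).forEach((message) => console.log(message));
    }
  },
});
console.log(result.data); // output payload, see the Output schema
console.log(result.requestId);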

2. Authentication

The API uses an API key for authentication. We recommend setting the FAL_KEY environment variable in your runtime whenever possible.

API Key

If your app runs in an environment where you cannot set environment variables, you can set the API key manually in the client configuration.

import { fal } from "@fal-ai/client";

fal.config({
  credentials: "YOUR_FAL_KEY"
});

3. Queue

Submit a request

The client API provides a convenient way to submit requests to the model.

import { fal } from "@fal-ai/client";

const { request_id } = await fal.queue.submit("fal-ai/workflow-utilities/auto-subtitle", {
  input: {
    video_url: "https://v3b.fal.media/files/b/kangaroo/oUCiZjQwEy6bIQdPUSLDF_output.mp4"
  },
  webhookUrl: "https://optional.webhook.url/for/results",
});

Fetch request status

You can fetch the status of a request to check if it is completed or still in progress.

import { fal } from "@fal-ai/client";

const status = await fal.queue.status("fal-ai/workflow-utilities/auto-subtitle", {
  requestId: "764cabcf-b745-4b3e-ae38-1200304cf45b",
  logs: true,
});
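
A single status call only gives a snapshot, so long-running requests are usually polled. The helper below is a minimal sketch of such a loop, not part of the fal client: `waitForCompletion`, `intervalMs`, and `maxAttempts` are hypothetical names, and it assumes a finished request reports the status string `"COMPLETED"`.

```javascript
// Hypothetical helper: poll until the request reports COMPLETED.
// `fetchStatus` is any async function that resolves to an object with a
// `status` field. The "COMPLETED" value is an assumption about the API.
async function waitForCompletion(fetchStatus, { intervalMs = 1000, maxAttempts = 60 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const update = await fetchStatus();
    if (update.status === "COMPLETED") {
      return update;
    }
    // Any other status (for example "IN_PROGRESS") means keep waiting.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Request did not complete after ${maxAttempts} attempts`);
}
```

With the client from the examples here, `fetchStatus` could be `() => fal.queue.status("fal-ai/workflow-utilities/auto-subtitle", { requestId })`. Passing the status call in as a function keeps the loop independent of the client, which also makes it easy to exercise with a stub.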

Get the result

Once the request is completed, you can fetch the result. See the Output Schema for the expected result format.

import { fal } from "@fal-ai/client";

const result = await fal.queue.result("fal-ai/workflow-utilities/auto-subtitle", {
  requestId: "764cabcf-b745-4b3e-ae38-1200304cf45b"
});
console.log(result.data);
console.log(result.requestId);

4. Files

Some attributes in the API accept file URLs as input. In those cases, you can pass your own URL or a Base64 data URI.

Data URI (base64)

You can pass a Base64 data URI as a file input. The API will handle the file decoding for you. Keep in mind that for large files this alternative, although convenient, can slow down the request.
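
As a minimal sketch of what such a data URI looks like, the helper below encodes raw bytes in Node.js. `toDataUri` is a hypothetical name, not a function of the fal client, and the tiny text payload only stands in for real file bytes.

```javascript
// Hypothetical helper (Node.js): encode bytes as a Base64 data URI.
function toDataUri(bytes, mimeType) {
  const base64 = Buffer.from(bytes).toString("base64");
  return `data:${mimeType};base64,${base64}`;
}

// A tiny text payload for illustration. For a real request you would read
// the video bytes (for example with fs.readFileSync) and use "video/mp4".
console.log(toDataUri(Buffer.from("Hello"), "text/plain"));
// prints "data:text/plain;base64,SGVsbG8="
```

Base64 output is about one third larger than the bytes it encodes, which is one reason large files are better uploaded or hosted.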

Hosted files (URL)

You can also pass your own URLs as long as they are publicly accessible. Be aware that some hosts might block cross-site requests, rate-limit them, or treat the request as coming from a bot.

Uploading files

We provide convenient file storage for your inputs. Upload a file with the client API, then use the returned URL in your requests.

import { fal } from "@fal-ai/client";

const file = new File(["Hello, World!"], "hello.txt", { type: "text/plain" });
const url = await fal.storage.upload(file);

Read more about file handling in our file upload guide.

5. Schema

Input

video_url string* required

URL of the video file to add automatic subtitles to

language string

Language code for transcription (e.g., 'en', 'es', 'fr', 'de', 'it', 'pt', 'nl', 'ja', 'zh', 'ko') or 3-letter ISO code (e.g., 'eng', 'spa', 'fra') Default value: "en"

font_name string

Any Google Font name from fonts.google.com (e.g., 'Montserrat', 'Poppins', 'BBH Sans Hegarty') Default value: "Montserrat"

font_size integer

Font size for subtitles (TikTok style uses larger text) Default value: 100

font_weight FontWeightEnum

Font weight (TikTok style typically uses bold or black) Default value: "bold"

Possible enum values: normal, bold, black

font_color FontColorEnum

Subtitle text color for non-active words Default value: "white"

Possible enum values: white, black, red, green, blue, yellow, orange, purple, pink, brown, gray, cyan, magenta

highlight_color HighlightColorEnum

Color for the currently speaking word (karaoke-style highlight) Default value: "purple"

Possible enum values: white, black, red, green, blue, yellow, orange, purple, pink, brown, gray, cyan, magenta

stroke_width integer

Text stroke/outline width in pixels (0 for no stroke) Default value: 3

stroke_color StrokeColorEnum

Text stroke/outline color Default value: "black"

Possible enum values: black, white, red, green, blue, yellow, orange, purple, pink, brown, gray, cyan, magenta

background_color BackgroundColorEnum

Background color behind text ('none' or 'transparent' for no background) Default value: "none"

Possible enum values: black, white, red, green, blue, yellow, orange, purple, pink, brown, gray, cyan, magenta, none, transparent

background_opacity float

Background opacity (0.0 = fully transparent, 1.0 = fully opaque)

position PositionEnum

Vertical position of subtitles Default value: "bottom"

Possible enum values: top, center, bottom

y_offset integer

Vertical offset in pixels (positive = move down, negative = move up) Default value: 75

words_per_subtitle integer

Maximum number of words per subtitle segment. Use 1 for single-word display, 2-3 for short phrases, or 8-12 for full sentences. Default value: 3

enable_animation boolean

Enable animation effects for subtitles (bounce style entrance) Default value: true

{
  "video_url": "https://v3b.fal.media/files/b/kangaroo/oUCiZjQwEy6bIQdPUSLDF_output.mp4",
  "language": "en",
  "font_name": "Montserrat",
  "font_size": 100,
  "font_weight": "bold",
  "font_color": "white",
  "highlight_color": "purple",
  "stroke_width": 3,
  "stroke_color": "black",
  "background_color": "none",
  "position": "bottom",
  "y_offset": 75,
  "words_per_subtitle": 1,
  "enable_animation": true
}
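
Because several fields only accept fixed enum values, it can help to check an input object before submitting it. The sketch below copies the defaults and enum lists from the schema above into a hypothetical `buildInput` helper; the helper name and its error behavior are assumptions for illustration, and the service performs its own validation regardless.

```javascript
// Defaults and enum values copied from the input schema above.
const COLORS = [
  "white", "black", "red", "green", "blue", "yellow", "orange",
  "purple", "pink", "brown", "gray", "cyan", "magenta",
];
const DEFAULTS = {
  language: "en",
  font_name: "Montserrat",
  font_size: 100,
  font_weight: "bold",
  font_color: "white",
  highlight_color: "purple",
  stroke_width: 3,
  stroke_color: "black",
  background_color: "none",
  position: "bottom",
  y_offset: 75,
  words_per_subtitle: 3,
  enable_animation: true,
};
const ENUMS = {
  font_weight: ["normal", "bold", "black"],
  font_color: COLORS,
  highlight_color: COLORS,
  stroke_color: COLORS,
  background_color: [...COLORS, "none", "transparent"],
  position: ["top", "center", "bottom"],
};

// Hypothetical helper: merge overrides onto the defaults and reject
// values that are not listed in the schema's enums.
function buildInput(videoUrl, overrides = {}) {
  if (typeof videoUrl !== "string" || videoUrl.length === 0) {
    throw new TypeError("video_url is required");
  }
  const input = { video_url: videoUrl, ...DEFAULTS, ...overrides };
  for (const [field, allowed] of Object.entries(ENUMS)) {
    if (!allowed.includes(input[field])) {
      throw new RangeError(`${field} must be one of: ${allowed.join(", ")}`);
    }
  }
  return input;
}
```

For example, `buildInput(url, { highlight_color: "yellow", words_per_subtitle: 1 })` returns an object that can be passed as `input` to `fal.subscribe` or `fal.queue.submit`.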

Output

video File* required

The video with automatic subtitles

transcription string* required

Full transcription text

subtitle_count integer* required

Number of subtitle segments generated

{
  "video": {
    "file_size": 16789234,
    "file_name": "output.mp4",
    "content_type": "video/mp4",
    "url": "https://v3b.fal.media/files/b/monkey/HPBSoe-QsAxSIkDh7Zn76_output.mp4"
  },
  "transcription": ""
}
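
Before using a response, it can be worth checking that it has the three required output fields. The guard below is a hypothetical sketch based only on the field list above; `isAutoSubtitleOutput` is not part of the fal client.

```javascript
// Hypothetical guard: check that a response body has the required output
// fields listed above (video.url, transcription, subtitle_count).
function isAutoSubtitleOutput(data) {
  return (
    data !== null &&
    typeof data === "object" &&
    data.video !== null &&
    typeof data.video === "object" &&
    typeof data.video.url === "string" &&
    typeof data.transcription === "string" &&
    Number.isInteger(data.subtitle_count)
  );
}
```

This check is strict about `subtitle_count`, which the schema marks as required. The sample response above does not include that field, so relax the last condition if real responses turn out to omit it.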

Other types

AudioFile

url string* required

URL of the audio file

content_type string* required

Content type of the audio file

file_name string* required

Name of the audio file

file_size integer* required

Size of the audio file in bytes

Image

url string* required

The URL where the file can be downloaded from.

content_type string

The mime type of the file.

file_name string

The name of the file. It will be auto-generated if not provided.

file_size integer

The size of the file in bytes.

file_data string

File data

width integer

The width of the image in pixels.

height integer

The height of the image in pixels.

File

url string* required

The URL where the file can be downloaded from.

content_type string

The mime type of the file.

file_name string

The name of the file. It will be auto-generated if not provided.

file_size integer

The size of the file in bytes.

file_data string

File data

SubtitleSegment

start float* required

Start time in seconds (e.g., 0.0 for beginning, 5.5 for 5.5 seconds)

end float* required

End time in seconds (must be greater than start time)

text string* required

Subtitle text to display (supports multiple lines with \n)
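
The constraints on a SubtitleSegment can be checked with a few lines of code. The validator below is a hypothetical sketch that enforces only what the field descriptions state: numeric `start` and `end`, `end` greater than `start`, and a string `text`.

```javascript
// Hypothetical validator for the SubtitleSegment fields described above.
// Returns a list of problems; an empty list means the segment looks valid.
function validateSegment(segment) {
  const errors = [];
  if (!Number.isFinite(segment.start)) {
    errors.push("start must be a number of seconds");
  }
  if (!Number.isFinite(segment.end)) {
    errors.push("end must be a number of seconds");
  }
  if (Number.isFinite(segment.start) && Number.isFinite(segment.end) && segment.end <= segment.start) {
    errors.push("end must be greater than start");
  }
  if (typeof segment.text !== "string") {
    errors.push("text must be a string");
  }
  return errors;
}

console.log(validateSegment({ start: 0.0, end: 5.5, text: "Hello" }));
console.log(validateSegment({ start: 5.5, end: 5.5, text: "Hello" }));
```

Returning a list rather than throwing lets a caller report every problem in a segment at once.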