
fal-ai/ltx-2/audio-to-video

Generate video from audio using LTX-2
Inference
Commercial use
Partner

About

Generate a video from an audio file.

1. Calling the API#

Install the client#

The client provides a convenient way to interact with the model API.

npm install --save @fal-ai/client

Set up your API Key#

Set FAL_KEY as an environment variable in your runtime.

export FAL_KEY="YOUR_API_KEY"

Submit a request#

The client handles the submit protocol for you. It tracks the request status updates and returns the result once the request is completed.

import { fal } from "@fal-ai/client";

const result = await fal.subscribe("fal-ai/ltx-2/audio-to-video", {
  input: {
    audio_url: "https://v3b.fal.media/files/b/0a90de09/am7s1zXzVQL52FUC3xvXU_output.mp3"
  },
  logs: true,
  onQueueUpdate: (update) => {
    // Print each log message while the request is running.
    if (update.status === "IN_PROGRESS") {
      update.logs.forEach((log) => console.log(log.message));
    }
  },
});
console.log(result.data);
console.log(result.requestId);

2. Authentication#

The API uses an API Key for authentication. It is recommended you set the FAL_KEY environment variable in your runtime when possible.

API Key#

If your app runs in an environment where you cannot set environment variables, you can set the API key manually in the client configuration.

import { fal } from "@fal-ai/client";

fal.config({
  credentials: "YOUR_FAL_KEY"
});

3. Queue#

Submit a request#

The client API provides a convenient way to submit requests to the model.

import { fal } from "@fal-ai/client";

const { request_id } = await fal.queue.submit("fal-ai/ltx-2/audio-to-video", {
  input: {
    audio_url: "https://v3b.fal.media/files/b/0a90de09/am7s1zXzVQL52FUC3xvXU_output.mp3"
  },
  webhookUrl: "https://optional.webhook.url/for/results",
});

Fetch request status#

You can fetch the status of a request to check if it is completed or still in progress.

import { fal } from "@fal-ai/client";

const status = await fal.queue.status("fal-ai/ltx-2/audio-to-video", {
  requestId: "764cabcf-b745-4b3e-ae38-1200304cf45b",
  logs: true,
});

Get the result#

Once the request is completed, you can fetch the result. See the Output Schema for the expected result format.

import { fal } from "@fal-ai/client";

const result = await fal.queue.result("fal-ai/ltx-2/audio-to-video", {
  requestId: "764cabcf-b745-4b3e-ae38-1200304cf45b"
});
console.log(result.data);
console.log(result.requestId);
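If you prefer to poll instead of using a webhook, the status and result steps above can be combined into a small loop. The following is a minimal sketch, not part of the client: `waitForCompletion` is a hypothetical helper, and it assumes the status object exposes a `status` field that eventually becomes `"COMPLETED"`. The status function is passed in, so the helper does not depend on any particular client call.

```javascript
// Hypothetical helper: polls until the request reports completion.
// fetchStatus is any async function resolving to an object with a `status`
// field, for example: () => fal.queue.status(endpoint, { requestId })
// Assumption: the final status value is "COMPLETED".
async function waitForCompletion(fetchStatus, { intervalMs = 1000, maxAttempts = 120 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const { status } = await fetchStatus();
    if (status === "COMPLETED") {
      return attempt; // number of status checks that were needed
    }
    // Wait before checking again.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Request not completed after ${maxAttempts} status checks`);
}
```

Once the helper resolves, you can call `fal.queue.result` with the same `requestId` to fetch the output.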

4. Files#

Some attributes in the API accept file URLs as input. Whenever that's the case you can pass your own URL or a Base64 data URI.

Data URI (base64)#

You can pass a Base64 data URI as a file input. The API will handle the file decoding for you. Keep in mind that this option, although convenient, can slow down requests for large files.
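As an illustration, a data URI can be built from raw bytes in Node.js. This is a minimal sketch: `toDataUri` is a hypothetical helper, not part of the client, and the MIME type is whatever matches your file (for example `audio/mpeg` for an MP3).

```javascript
// Hypothetical helper: wrap raw bytes in a Base64 data URI.
function toDataUri(bytes, mimeType) {
  const base64 = Buffer.from(bytes).toString("base64");
  return `data:${mimeType};base64,${base64}`;
}

// A tiny text payload keeps the example readable; for audio you would pass
// the file's bytes and an audio MIME type instead.
const exampleUri = toDataUri(Buffer.from("Hello"), "text/plain");
console.log(exampleUri); // data:text/plain;base64,SGVsbG8=
```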

Hosted files (URL)#

You can also pass your own URLs as long as they are publicly accessible. Be aware that some hosts might block cross-site requests, rate-limit them, or treat them as bot traffic.

Uploading files#

We provide a convenient file storage that allows you to upload files and use them in your requests. You can upload files using the client API and use the returned URL in your requests.

import { fal } from "@fal-ai/client";

const file = new File(["Hello, World!"], "hello.txt", { type: "text/plain" });
const url = await fal.storage.upload(file);

Read more about file handling in our file upload guide.

5. Schema#

Input#

audio_url string* required

URL of the audio file to generate a video from. Duration must be between 2 and 20 seconds. Must be a publicly accessible URL or a Base64 data URI.

image_url string

URL of an image to use as the first frame of the video. If not provided, prompt is required.

prompt string

Text description of how the video should be generated. Required if image_url is not provided. When image_url is provided, this describes how the image should be animated.

guidance_scale float

Guidance scale for video generation. Higher values make the output more closely follow the prompt. Defaults to 5 for text-to-video, or 9 when providing an image.

{
  "audio_url": "https://v3b.fal.media/files/b/0a90de09/am7s1zXzVQL52FUC3xvXU_output.mp3",
  "image_url": "https://v3b.fal.media/files/b/0a90de1e/UY1yK7JVK3BJuJULcISI2_GnLc3bHv.png",
  "prompt": "An angry man speaking"
}
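The input rules above (audio_url is always required, and prompt is required when image_url is missing) can be checked before submitting a request. This is a minimal sketch: `buildAudioToVideoInput` is a hypothetical helper, not part of the client, and it leaves guidance_scale unset unless you pass one, so the API applies its own default.

```javascript
// Hypothetical helper: validate the documented rules and build the input object.
function buildAudioToVideoInput({ audioUrl, imageUrl, prompt, guidanceScale } = {}) {
  if (!audioUrl) {
    throw new Error("audio_url is required");
  }
  if (!imageUrl && !prompt) {
    throw new Error("prompt is required when image_url is not provided");
  }
  const input = { audio_url: audioUrl };
  if (imageUrl) input.image_url = imageUrl;
  if (prompt) input.prompt = prompt;
  // Only send guidance_scale when the caller sets it explicitly.
  if (guidanceScale !== undefined) input.guidance_scale = guidanceScale;
  return input;
}

const exampleInput = buildAudioToVideoInput({
  audioUrl: "https://v3b.fal.media/files/b/0a90de09/am7s1zXzVQL52FUC3xvXU_output.mp3",
  prompt: "An angry man speaking",
});
console.log(exampleInput);
```

The returned object can be passed as the `input` field of `fal.subscribe` or `fal.queue.submit`.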

Output#

video VideoFile* required

The generated video file

{
  "video": {
    "file_name": "mItaWijZCwrk4WRZ86I8d_lbFJoefY.mp4",
    "content_type": "video/mp4",
    "url": "https://v3b.fal.media/files/b/0a90dfb3/mItaWijZCwrk4WRZ86I8d_lbFJoefY.mp4"
  }
}
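To use the output, you typically only need the video URL. The sketch below assumes the client wraps the output under `result.data`, as in the `console.log(result.data)` calls earlier on this page; `getVideoUrl` is a hypothetical helper, not part of the client.

```javascript
// Hypothetical helper: read the video URL from a client result object.
function getVideoUrl(result) {
  const video = result && result.data && result.data.video;
  if (!video || typeof video.url !== "string") {
    throw new Error("Response does not contain video.url");
  }
  return video.url;
}

// Sample result shaped like the output example above.
const sampleResult = {
  data: {
    video: {
      file_name: "mItaWijZCwrk4WRZ86I8d_lbFJoefY.mp4",
      content_type: "video/mp4",
      url: "https://v3b.fal.media/files/b/0a90dfb3/mItaWijZCwrk4WRZ86I8d_lbFJoefY.mp4",
    },
  },
};
console.log(getVideoUrl(sampleResult));
```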

Other types#

LTXExtendVideoRequest#

video_url string* required

The URL of the video to extend

prompt string

Description of what should happen in the extended portion of the video.

duration float

Duration in seconds to extend the video. Maximum 20 seconds. Default value: 5

mode ModeEnum

Where to extend the video: 'end' extends at the end, 'start' extends at the beginning. Default value: "end"

Possible enum values: start, end

context float

Number of seconds from the input video to use as context for the extension (maximum 20 seconds). If not provided, defaults to maximize available context within the 505 frame limit.

LTXV20TextToVideoRequest#

prompt string* required

The prompt to generate the video from

duration DurationEnum

The duration of the generated video in seconds. Default value: "6"

Possible enum values: 6, 8, 10

resolution ResolutionEnum

The resolution of the generated video. Default value: "1080p"

Possible enum values: 1080p, 1440p, 2160p

fps FramesperSecondEnum

The frames per second of the generated video. Default value: "25"

Possible enum values: 25, 50

generate_audio boolean

Whether to generate audio for the generated video. Default value: true

LTXRetakeVideoResponse#

video VideoFile* required

The generated video file

LTXV23ExtendVideoRequest#

video_url string* required

The URL of the video to extend

prompt string

Description of what should happen in the extended portion of the video.

duration float

Duration in seconds to extend the video. Maximum 20 seconds. Default value: 5

mode ModeEnum

Where to extend the video: 'end' extends at the end, 'start' extends at the beginning. Default value: "end"

Possible enum values: start, end

context float

Number of seconds from the input video to use as context for the extension (maximum 20 seconds). If not provided, defaults to maximize available context within the 505 frame limit.

LTXV23ImageToVideoFastRequest#

image_url string* required

URL of the image to generate the video from. Must be a publicly accessible URL or a Base64 data URI. Supports PNG, JPEG, WebP, AVIF, and HEIF formats.

end_image_url string

The URL of the end image to use for the generated video. When provided, generates a transition video between start and end frames.

prompt string* required

The prompt to generate the video from

duration DurationEnum

The duration of the generated video in seconds. The fast model supports 6-20 seconds. Note: Durations longer than 10 seconds (12, 14, 16, 18, 20) are only supported with 25 FPS and 1080p resolution. Default value: "6"

Possible enum values: 6, 8, 10, 12, 14, 16, 18, 20

resolution ResolutionEnum

The resolution of the generated video. Default value: "1080p"

Possible enum values: 1080p, 1440p, 2160p

aspect_ratio AspectRatioEnum

The aspect ratio of the generated video. Default value: "auto"

Possible enum values: auto, 16:9, 9:16

fps FramesperSecondEnum

The frames per second of the generated video. Default value: "25"

Possible enum values: 24, 25, 48, 50

generate_audio boolean

Whether to generate audio for the generated video. Default value: true
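The duration note for the fast request types (durations longer than 10 seconds require 25 FPS and 1080p) can be checked on the client before a request is sent. This is a minimal sketch: `isSupportedFastDuration` is a hypothetical helper, it only checks the duration rule, and it does not validate the fps or resolution enum values themselves.

```javascript
// Hypothetical helper: check the documented duration rule for fast requests.
// Enum values may arrive as strings ("6", "25"), so they are converted to numbers.
function isSupportedFastDuration({ duration = 6, fps = 25, resolution = "1080p" } = {}) {
  const allowedDurations = [6, 8, 10, 12, 14, 16, 18, 20];
  const seconds = Number(duration);
  if (!allowedDurations.includes(seconds)) {
    return false;
  }
  if (seconds > 10) {
    // Longer durations are only supported with 25 FPS and 1080p.
    return Number(fps) === 25 && resolution === "1080p";
  }
  return true;
}

console.log(isSupportedFastDuration({ duration: "12", fps: "50" })); // false
console.log(isSupportedFastDuration({ duration: "20" })); // true
```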

LTXV20ImageToVideoFastRequest#

image_url string* required

URL of the image to generate the video from. Must be a publicly accessible URL or a Base64 data URI. Supports PNG, JPEG, WebP, AVIF, and HEIF formats.

prompt string* required

The prompt to generate the video from

duration DurationEnum

The duration of the generated video in seconds. The fast model supports 6-20 seconds. Note: Durations longer than 10 seconds (12, 14, 16, 18, 20) are only supported with 25 FPS and 1080p resolution. Default value: "6"

Possible enum values: 6, 8, 10, 12, 14, 16, 18, 20

resolution ResolutionEnum

The resolution of the generated video. Default value: "1080p"

Possible enum values: 1080p, 1440p, 2160p

fps FramesperSecondEnum

The frames per second of the generated video. Default value: "25"

Possible enum values: 25, 50

generate_audio boolean

Whether to generate audio for the generated video. Default value: true

LTXV20ImageToVideoRequest#

image_url string* required

URL of the image to generate the video from. Must be a publicly accessible URL or a Base64 data URI. Supports PNG, JPEG, WebP, AVIF, and HEIF formats.

prompt string* required

The prompt to generate the video from

duration DurationEnum

The duration of the generated video in seconds. Default value: "6"

Possible enum values: 6, 8, 10

resolution ResolutionEnum

The resolution of the generated video. Default value: "1080p"

Possible enum values: 1080p, 1440p, 2160p

fps FramesperSecondEnum

The frames per second of the generated video. Default value: "25"

Possible enum values: 25, 50

generate_audio boolean

Whether to generate audio for the generated video. Default value: true

LTXV23RetakeVideoRequest#

video_url string* required

The URL of the video to retake

prompt string* required

The prompt to retake the video with

start_time float

The start time of the video to retake in seconds

duration float

The duration of the video to retake in seconds. Default value: 5

retake_mode RetakeModeEnum

The retake mode to use for the retake. Default value: "replace_audio_and_video"

Possible enum values: replace_audio, replace_video, replace_audio_and_video

LTXV23TextToVideoFastResponse#

video VideoFile* required

The generated video file

LTXV23TextToVideoResponse#

video VideoFile* required

The generated video file

LTXRetakeVideoRequest#

video_url string* required

The URL of the video to retake

prompt string* required

The prompt to retake the video with

start_time float

The start time of the video to retake in seconds

duration float

The duration of the video to retake in seconds. Default value: 5

retake_mode RetakeModeEnum

The retake mode to use for the retake. Default value: "replace_audio_and_video"

Possible enum values: replace_audio, replace_video, replace_audio_and_video

LTXV23ImageToVideoResponse#

video VideoFile* required

The generated video file

LTXVImageToVideoResponse#

video VideoFile* required

The generated video file

LTXV23AudioToVideoResponse#

video VideoFile* required

The generated video file

LTXExtendVideoResponse#

video VideoFile* required

The extended video file

LTXV23TextToVideoRequest#

prompt string* required

The prompt to use for the generated video

duration DurationEnum

The duration of the generated video in seconds. Default value: "6"

Possible enum values: 6, 8, 10

resolution ResolutionEnum

The resolution of the generated video. Default value: "1080p"

Possible enum values: 1080p, 1440p, 2160p

aspect_ratio AspectRatioEnum

The aspect ratio of the generated video. Default value: "16:9"

Possible enum values: 16:9, 9:16

fps FramesperSecondEnum

The frames per second of the generated video. Default value: "25"

Possible enum values: 24, 25, 48, 50

generate_audio boolean

Whether to generate audio for the generated video. Default value: true

LTXV23RetakeVideoResponse#

video VideoFile* required

The generated video file

LTXV23ImageToVideoRequest#

image_url string* required

The URL of the start image to use for the generated video.

end_image_url string

The URL of the end image to use for the generated video. When provided, generates a transition video between start and end frames.

prompt string* required

The prompt to use for the generated video

duration DurationEnum

The duration of the generated video in seconds. Default value: "6"

Possible enum values: 6, 8, 10

resolution ResolutionEnum

The resolution of the generated video. Default value: "1080p"

Possible enum values: 1080p, 1440p, 2160p

aspect_ratio AspectRatioEnum

The aspect ratio of the generated video. If 'auto', the aspect ratio will be determined automatically based on the input image. Default value: "auto"

Possible enum values: auto, 16:9, 9:16

fps FramesperSecondEnum

The frames per second of the generated video. Default value: "25"

Possible enum values: 24, 25, 48, 50

generate_audio boolean

Whether to generate audio for the generated video. Default value: true

LTXV20TextToVideoFastRequest#

prompt string* required

The prompt to generate the video from

duration DurationEnum

The duration of the generated video in seconds. The fast model supports 6-20 seconds. Note: Durations longer than 10 seconds (12, 14, 16, 18, 20) are only supported with 25 FPS and 1080p resolution. Default value: "6"

Possible enum values: 6, 8, 10, 12, 14, 16, 18, 20

resolution ResolutionEnum

The resolution of the generated video. Default value: "1080p"

Possible enum values: 1080p, 1440p, 2160p

fps FramesperSecondEnum

The frames per second of the generated video. Default value: "25"

Possible enum values: 25, 50

generate_audio boolean

Whether to generate audio for the generated video. Default value: true

VideoFile#

url string* required

The URL where the file can be downloaded from.

content_type string

The mime type of the file.

file_name string

The name of the file. It will be auto-generated if not provided.

file_size integer

The size of the file in bytes.

width integer

The width of the video

height integer

The height of the video

fps float

The FPS of the video

duration float

The duration of the video

num_frames integer

The number of frames in the video

LTXV23AudioToVideoRequest#

audio_url string* required

URL of the audio file to generate a video from. Duration must be between 2 and 20 seconds. Must be a publicly accessible URL or a Base64 data URI.

image_url string

URL of an image to use as the first frame of the video. If not provided, prompt is required.

prompt string

Text description of how the video should be generated. Required if image_url is not provided. When image_url is provided, this describes how the image should be animated.

guidance_scale float

Guidance scale for video generation. Higher values make the output more closely follow the prompt. Defaults to 5 for text-to-video, or 9 when providing an image.

LTXV23ExtendVideoResponse#

video VideoFile* required

The extended video file

LTXV23ImageToVideoFastResponse#

video VideoFile* required

The generated video file

LTXVTextToVideoResponse#

video VideoFile* required

The generated video file

LTXV23TextToVideoFastRequest#

prompt string* required

The prompt to use for the generated video

duration DurationEnum

The duration of the generated video in seconds. The fast model supports 6-20 seconds. Note: Durations longer than 10 seconds (12, 14, 16, 18, 20) are only supported with 25 FPS and 1080p resolution. Default value: "6"

Possible enum values: 6, 8, 10, 12, 14, 16, 18, 20

resolution ResolutionEnum

The resolution of the generated video. Default value: "1080p"

Possible enum values: 1080p, 1440p, 2160p

aspect_ratio AspectRatioEnum

The aspect ratio of the generated video. Default value: "16:9"

Possible enum values: 16:9, 9:16

fps FramesperSecondEnum

The frames per second of the generated video. Default value: "25"

Possible enum values: 24, 25, 48, 50

generate_audio boolean

Whether to generate audio for the generated video. Default value: true
