Image to Video APIs

Explore fal’s Collection Of The Best Image-to-Video APIs

fal is a developer-friendly, one-stop shop for AI image-to-video models. Every image-to-video model on fal runs through the same SDK pattern, so once you’ve integrated one, switching between Seedance 2.0, Kling 3.0 Pro, and Veo 3.1 comes down to changing the endpoint string and any model-specific input fields.

How do I turn an image into a video on fal?

Image-to-video endpoints take an image URL plus a text prompt describing the motion, and return a URL to a generated video file. After installing `@fal-ai/client` and setting your `FAL_KEY`, it looks like this:

```js
import { fal } from "@fal-ai/client";

const result = await fal.subscribe("bytedance/seedance-2.0/image-to-video", {
  input: {
    prompt: "Slow cinematic push-in with the subject's hair moving gently in the wind",
    image_url: "https://your-host.com/photo.jpg",
    resolution: "720p"
  }
});

console.log(result.data.video.url);
```

The same call shape works across Seedance 2.0, Kling 3.0 Pro, Veo 3.1, and the rest of the catalog. You swap the endpoint string and adjust the input fields each model expects. For example, Seedance 2.0 takes `image_url`, while Kling 3.0 Pro takes `start_image_url`.
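As a minimal sketch of that field-name difference, the helper below maps a model label to the image field it expects and builds the `input` object accordingly. The labels `seedance-2.0` and `kling-3.0-pro` and the `buildInput` helper are hypothetical names used only for illustration; they are not fal endpoint IDs or part of the client.

```js
// Local labels for this sketch only; they are NOT fal endpoint IDs.
const IMAGE_FIELD = {
  "seedance-2.0": "image_url",
  "kling-3.0-pro": "start_image_url",
};

// Build the `input` object for a model, using the image field it expects.
// Any extra options (for example `resolution`) are passed through unchanged.
function buildInput(model, { prompt, imageUrl, ...rest }) {
  const field = IMAGE_FIELD[model];
  if (!field) {
    throw new Error(`No image field mapping for model: ${model}`);
  }
  return { prompt, [field]: imageUrl, ...rest };
}

const shot = {
  prompt: "Slow cinematic push-in",
  imageUrl: "https://your-host.com/photo.jpg", // placeholder URL
};

console.log(buildInput("seedance-2.0", shot));
console.log(buildInput("kling-3.0-pro", shot));
```

The resulting object would be passed as `input` to `fal.subscribe`, with the endpoint string chosen separately for each model.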

Which models generate image-to-video with native audio?

Several image-to-video models on fal, including Seedance 2.0, Kling 3.0 Pro, and Veo 3.1, can generate synchronized audio alongside the video itself.

Seedance 2.0 generates audio and video together in a single pass, with the same rate whether `generate_audio` is on or off. Output covers music, lip-synced dialogue, and ambient sound.
Kling 3.0 Pro produces native audio when `generate_audio` is enabled, with support for multiple speakers and English and Chinese voice output. Audio adds 50% to the per-second cost, moving from $0.112 to $0.168.
Veo 3.1 supports synchronized audio, charged at $0.40 per second with audio versus $0.20 per second without audio at 720p or 1080p.

For projects where audio is part of the delivery, the cost difference can be meaningful. Enabling audio on Veo 3.1 doubles the rate, while Seedance bills the same either way.
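To make the difference concrete, here is a rough sketch that computes the extra cost of enabling audio for a clip of a given length, using only the per-second rates quoted above. The `audioPremium` helper is a hypothetical name, and the 10-second clip length is just an example value.

```js
// Per-second rates in USD, copied from the figures quoted above.
const AUDIO_RATES = {
  "Kling 3.0 Pro": { off: 0.112, on: 0.168 },
  "Veo 3.1": { off: 0.20, on: 0.40 },
};

// Extra cost of turning audio on for a clip of `seconds` length,
// rounded to the nearest cent.
function audioPremium(model, seconds) {
  const rates = AUDIO_RATES[model];
  if (!rates) {
    throw new Error(`No audio rates recorded for model: ${model}`);
  }
  return Math.round((rates.on - rates.off) * seconds * 100) / 100;
}

for (const model of Object.keys(AUDIO_RATES)) {
  console.log(`${model}: +$${audioPremium(model, 10).toFixed(2)} for 10 seconds`);
}
```

Seedance 2.0 is left out of the table because its rate is the same with audio on or off, so its premium is zero.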

Which image-to-video models support start and end frame control?

Several models on fal accept an end frame alongside the start image, animating the transition between two specific points.

Seedance 2.0 takes both `image_url` and `end_image_url`. When both are provided, the model generates motion that transitions from the first frame to the second.
Veo 3.1 has dedicated first-last-frame endpoints, including `veo3.1/first-last-frame-to-video` and the fast variant, built for this workflow.
Kling 3.0 Pro accepts `end_image_url` alongside `start_image_url`, with the same start-to-end transition behavior as Seedance 2.0.
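A small sketch of how the optional end frame might be added to a request input. The `withEndFrame` helper is a hypothetical name, the URLs are placeholders, and the example uses the Seedance 2.0 field names mentioned above.

```js
// Return a copy of `input` with `end_image_url` set only when an end frame
// is supplied, so the same code path serves both single-image and
// start-to-end requests.
function withEndFrame(input, endImageUrl) {
  return endImageUrl ? { ...input, end_image_url: endImageUrl } : { ...input };
}

const base = {
  prompt: "The camera orbits the building as day turns to night",
  image_url: "https://your-host.com/start.jpg", // placeholder URL
};

console.log(withEndFrame(base));
console.log(withEndFrame(base, "https://your-host.com/end.jpg"));
```

Returning a copy rather than mutating `base` keeps the start-frame-only input reusable across requests.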

For longer narratives, Kling 3.0 Pro’s `multi_prompt` feature lets you define multiple shots in sequence with distinct prompts and durations.

How are image-to-video models priced on fal?

Pricing on fal generally scales by the second across image-to-video models, with resolution, audio, or both affecting the rate depending on the model.

Model	Price
Kling 2.5 Turbo Pro	$0.35 for 5 seconds, then $0.07 / additional second
Kling 3.0 Pro without audio	$0.112 / second
Kling 3.0 Pro with audio	$0.168 / second
Kling 3.0 Pro with voice control	$0.196 / second
Seedance 2.0 Fast	$0.2419 / second at 720p, audio included
Seedance 2.0 Standard	$0.3024 / second at 720p, audio included
Veo 3.1 without audio	$0.20 / second at 720p or 1080p
Veo 3.1 with audio	$0.40 / second at 720p or 1080p

As a worked example, a 5-second image-to-video clip costs roughly:

$0.35 on Kling 2.5 Turbo Pro
$0.56 on Kling 3.0 Pro without audio
$0.84 on Kling 3.0 Pro with audio
$1.21 on Seedance 2.0 Fast at 720p
$1.51 on Seedance 2.0 Standard at 720p
$1.00 on Veo 3.1 without audio
$2.00 on Veo 3.1 with audio
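The arithmetic behind these figures can be sketched as a small estimator. The `PRICING` table simply restates the rates listed above, `estimateCost` is a hypothetical helper, and treating the Kling 2.5 Turbo Pro price as a flat $0.35 for any clip up to 5 seconds is an assumption.

```js
// Rates in USD, restated from the pricing table above.
const PRICING = {
  // Assumption: flat $0.35 covers any clip up to 5 seconds.
  "Kling 2.5 Turbo Pro": { base: 0.35, includedSeconds: 5, perExtraSecond: 0.07 },
  "Kling 3.0 Pro without audio": { perSecond: 0.112 },
  "Kling 3.0 Pro with audio": { perSecond: 0.168 },
  "Seedance 2.0 Fast (720p)": { perSecond: 0.2419 },
  "Seedance 2.0 Standard (720p)": { perSecond: 0.3024 },
  "Veo 3.1 without audio": { perSecond: 0.20 },
  "Veo 3.1 with audio": { perSecond: 0.40 },
};

// Estimate the cost of one clip, rounded to the nearest cent.
function estimateCost(model, seconds) {
  const p = PRICING[model];
  if (!p) {
    throw new Error(`No pricing recorded for model: ${model}`);
  }
  const raw =
    p.perSecond !== undefined
      ? p.perSecond * seconds
      : p.base + Math.max(0, seconds - p.includedSeconds) * p.perExtraSecond;
  return Math.round(raw * 100) / 100;
}

for (const model of Object.keys(PRICING)) {
  console.log(`${model}: $${estimateCost(model, 5).toFixed(2)} for 5 seconds`);
}
```

Changing the second argument gives estimates for other clip lengths, though actual billing is determined by fal, so treat the output as a rough guide.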

You pay only for what you generate, and because every model shares the same call shape, you can compare motion quality, audio behavior, and frame-control options without rewriting your integration.

Quick Start

Install the client

```bash
npm install --save @fal-ai/client
```

Set your API key

```bash
export FAL_KEY="YOUR_API_KEY"
```

Call a model

```js
import { fal } from "@fal-ai/client";

const result = await fal.subscribe("bytedance/seedance-2.0/image-to-video", {
  input: {
    prompt: "Slow cinematic push-in with the subject's hair moving gently in the wind",
    image_url: "https://your-host.com/photo.jpg",
    resolution: "720p"
  }
});

console.log(result.data.video.url);
```

The same auth, billing, and queue logic carry across every image-to-video endpoint, so you can compare models side by side without rewriting integration code.

For longer generations, higher-resolution outputs, or production workflows, submit to the queue and rely on webhooks instead of blocking on the result.
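A minimal sketch of the queue-plus-webhook approach, assuming the client's `fal.queue.submit` accepts a `webhookUrl` option and resolves with a `request_id`; check the client reference before relying on either detail. The `submitWithWebhook` helper and the webhook route are hypothetical, and the client is passed in as an argument so the helper can be exercised without network access.

```js
// `client` is expected to be the `fal` object from "@fal-ai/client".
// Submits the job to the queue and returns its request ID immediately,
// leaving delivery of the finished video to the webhook.
async function submitWithWebhook(client, endpoint, input, webhookUrl) {
  const { request_id } = await client.queue.submit(endpoint, {
    input,
    webhookUrl,
  });
  return request_id;
}

// Usage (assumes FAL_KEY is set and the webhook URL is publicly reachable):
//
//   import { fal } from "@fal-ai/client";
//
//   const requestId = await submitWithWebhook(
//     fal,
//     "bytedance/seedance-2.0/image-to-video",
//     {
//       prompt: "Slow cinematic push-in",
//       image_url: "https://your-host.com/photo.jpg",
//     },
//     "https://your-host.com/api/fal-webhook" // hypothetical route
//   );
```

Storing the returned request ID alongside your own job record gives the webhook handler something to match the incoming result against.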