bytedance/seedance-2.0/fast/image-to-video

ByteDance's most advanced image-to-video model, fast tier. Lower latency and cost with synchronized audio, start and end frame control, and motion prompts.
Inference · Commercial use · Partner

Seedance 2.0: Fast Tier Image-to-Video API

ByteDance's most advanced image-to-video model, available on fal as `bytedance/seedance-2.0/fast/image-to-video`.


Overview

Provide a starting image URL and a text prompt describing the desired motion. The model preserves the visual content of your image and animates it — with cinematic camera control, realistic physics, and synchronized audio all included.

Key capabilities:

  • Native audio generation: music, SFX, and dialogue at no extra cost
  • Director-level camera control: dolly zooms, tracking shots, POV switches, rack focuses
  • Realistic physics: weight, collisions, fabric, and character motion
  • Multi-shot cuts possible within a single generation, up to 15 seconds
  • Start-and-end frame control: provide both a starting and ending image and the model transitions between them
  • Cinematic output at 720p

Inputs

| Parameter | Required | Type | Description |
|---|---|---|---|
| `prompt` | Yes | string | Text describing the desired motion and action |
| `image_url` | Yes | string | Starting frame to animate. JPEG, PNG, WebP, max 30 MB |
| `end_image_url` | No | string | Optional ending frame. When provided, the video transitions from the start image to the end image |
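Before uploading, you can pre-check a start frame against the format and size constraints above. A minimal sketch — the helper name and magic-byte checks are our own, not part of the fal SDK:

```python
MAX_BYTES = 30 * 1024 * 1024  # 30 MB upload limit

def validate_start_frame(data: bytes) -> str:
    """Return the MIME type if data looks like a valid start frame."""
    if len(data) > MAX_BYTES:
        raise ValueError("image exceeds the 30 MB limit")
    if data.startswith(b"\xff\xd8\xff"):               # JPEG magic bytes
        return "image/jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):          # PNG magic bytes
        return "image/png"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":  # WebP RIFF container
        return "image/webp"
    raise ValueError("unsupported format: expected JPEG, PNG, or WebP")
```

Catching these locally avoids a round trip that would fail server-side anyway.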

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `resolution` | enum | `720p` | `480p` (faster/cheaper) or `720p` |
| `duration` | enum | `auto` | `auto` or any integer from `4` to `15` seconds |
| `aspect_ratio` | enum | `auto` | `auto` (inferred from input image), `21:9`, `16:9`, `4:3`, `1:1`, `3:4`, `9:16` |
| `generate_audio` | boolean | `true` | Synchronized audio: SFX, ambient sound, lip-synced speech |
| `seed` | integer | — | Fix for reproducibility (minor variation may still occur) |
| `end_user_id` | string | — | Optional identifier for the end user |
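The enum constraints above are easy to trip over (for example, `duration` takes `"auto"` or an integer from 4 to 15). A small client-side guard, sketched here as a hypothetical helper rather than anything shipped in the fal SDK:

```python
VALID_ASPECT_RATIOS = {"auto", "21:9", "16:9", "4:3", "1:1", "3:4", "9:16"}

def build_arguments(prompt, image_url, duration="auto", aspect_ratio="auto",
                    resolution="720p", generate_audio=True):
    """Assemble a request payload, rejecting out-of-range values early."""
    if duration != "auto" and not (4 <= int(duration) <= 15):
        raise ValueError("duration must be 'auto' or an integer from 4 to 15")
    if aspect_ratio not in VALID_ASPECT_RATIOS:
        raise ValueError(f"invalid aspect_ratio: {aspect_ratio}")
    if resolution not in {"480p", "720p"}:
        raise ValueError(f"invalid resolution: {resolution}")
    return {
        "prompt": prompt,
        "image_url": image_url,
        "duration": str(duration),
        "aspect_ratio": aspect_ratio,
        "resolution": resolution,
        "generate_audio": generate_audio,
    }
```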

Pricing

Billed per token of generated output. At 720p the token rates work out to the per-second prices below:

| Tier | Per 1,000 tokens | Per second of 720p output |
|---|---|---|
| Fast | $0.0112 | $0.2419 |
| Standard | $0.014 | $0.3024 |

Token formula: `tokens = (height × width × duration × 24) / 1024`

At 720p (1280 × 720) and 24 fps, that comes to 21,600 tokens per second. A 10-second fast clip costs approximately $2.42, versus ~$3.03 on standard.
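The token formula makes cost easy to estimate up front. A quick sketch, assuming 720p means 1280 × 720 (which matches the published per-second rates):

```python
RATE_PER_1K = {"fast": 0.0112, "standard": 0.014}  # USD per 1,000 tokens

def estimate_tokens(width: int, height: int, duration_sec: float, fps: int = 24) -> float:
    """tokens = (height * width * duration * fps) / 1024"""
    return height * width * duration_sec * fps / 1024

def estimate_cost(width: int, height: int, duration_sec: float, tier: str = "fast") -> float:
    """Estimated USD cost for a clip of the given size and length."""
    return estimate_tokens(width, height, duration_sec) / 1000 * RATE_PER_1K[tier]

# 10 seconds of 720p on the fast tier
print(round(estimate_cost(1280, 720, 10), 2))  # → 2.42
```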


Quick Start

Python

```bash
pip install fal-client
export FAL_KEY="YOUR_API_KEY"
```

```python
import fal_client

def on_queue_update(update):
    # Stream progress logs while the request runs
    if isinstance(update, fal_client.InProgress):
        for log in update.logs:
            print(log["message"])

result = fal_client.subscribe(
    "bytedance/seedance-2.0/fast/image-to-video",
    arguments={
        "prompt": "The cat slowly turns its head and blinks, fur ruffling in a gentle breeze.",
        "image_url": "https://your-host.com/cat.jpg",
        "resolution": "720p",
        "duration": "auto",
        "aspect_ratio": "auto",
        "generate_audio": True,
    },
    with_logs=True,
    on_queue_update=on_queue_update,
)

print(result["video"]["url"])
```

With start and end frames:

```python
result = fal_client.subscribe(
    "bytedance/seedance-2.0/fast/image-to-video",
    arguments={
        "prompt": "The sun sets behind the mountains, sky shifting from gold to deep purple.",
        "image_url": "https://your-host.com/golden-hour.jpg",
        "end_image_url": "https://your-host.com/twilight.jpg",
        "duration": "8",
        "aspect_ratio": "16:9",
    },
)
```
JavaScript / Node.js

```bash
npm install @fal-ai/client
export FAL_KEY="YOUR_API_KEY"
```

```js
import { fal } from "@fal-ai/client";

const result = await fal.subscribe("bytedance/seedance-2.0/fast/image-to-video", {
  input: {
    prompt: "The cat slowly turns its head and blinks, fur ruffling in a gentle breeze.",
    image_url: "https://your-host.com/cat.jpg",
    resolution: "720p",
    duration: "auto",
    aspect_ratio: "auto",
    generate_audio: true,
  },
  logs: true,
  onQueueUpdate: (update) => {
    if (update.status === "IN_PROGRESS") {
      update.logs.map((log) => log.message).forEach(console.log);
    }
  },
});

console.log(result.data.video.url);
```

Output

```json
{
  "video": {
    "url": "https://...",
    "content_type": "video/mp4",
    "file_name": "output.mp4",
    "file_size": 4823041
  },
  "seed": 42
}
```

Async / Queue Usage

For longer generations, submit to the queue and poll:

```python
handler = fal_client.submit(
    "bytedance/seedance-2.0/fast/image-to-video",
    arguments={...},
    webhook_url="https://your-server.com/webhook",
)

request_id = handler.request_id
status = fal_client.status("bytedance/seedance-2.0/fast/image-to-video", request_id, with_logs=True)
result = fal_client.result("bytedance/seedance-2.0/fast/image-to-video", request_id)
```
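If you poll instead of relying on the webhook, wrap the status check in a loop with a timeout. A generic sketch — this helper is ours, not part of the SDK; you would plug `fal_client.status(...)` in as `get_status` and check for `fal_client.Completed` in `is_done`:

```python
import time

def poll_until_done(get_status, is_done, interval=2.0, timeout=600.0):
    """Call get_status() every `interval` seconds until is_done(status) is true."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if is_done(status):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("generation did not finish within the timeout")
        time.sleep(interval)
```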

Fast vs. Standard Tier

The fast tier uses the exact same schema and parameters as `bytedance/seedance-2.0/image-to-video`.

| | Fast | Standard |
|---|---|---|
| Endpoint | `bytedance/seedance-2.0/fast/image-to-video` | `bytedance/seedance-2.0/image-to-video` |
| Cost (10 sec) | ~$2.42 | ~$3.03 |
| Latency | Lower | Higher |
| Output quality | Same | Same |

Compared to Reference-to-Video

| | Image to Video | Reference to Video |
|---|---|---|
| Starting image | 1 (required) | Up to 9 (optional) |
| Ending image | 1 (optional) | Not supported |
| Reference videos | Not supported | Up to 3 |
| Reference audio | Not supported | Up to 3 |
| Use case | Animate a single image | Multi-reference, multi-modal generation |

Use image-to-video when you have one image to animate. Use reference-to-video when you need multi-modal inputs or want to reference multiple visual assets in a single prompt.


Availability

  • April 2, 2026: Launched with geographic and enterprise-only restrictions
  • April 9, 2026: All restrictions lifted, fully open with no geographic or use-case limitations