bytedance/seedance-2.0/fast/reference-to-video

ByteDance's most advanced reference-to-video model, fast tier. It offers lower latency and cost than the standard tier and accepts up to 9 images, 3 videos, and 3 audio clips as reference inputs.

Input

Prompt*

A hyper-realistic UGC-style medium close-up of a woman in her late 20s standing in her sunlit bedroom in front of a slightly messy bed with white linen sheets, filmed vertically as if she's holding her phone at arm's length for a TikTok or Instagram Reel; she has shoulder-length honey blonde waves, light freckles, natural glowy skin with minimal makeup, wearing a cream ribbed tank top and high-waisted jeans, holding up a compact mirrorless camera on [Image1] in one hand close to the lens so it's clearly visible, her expression bright and genuine — not overly polished — as she says directly to the camera: "Okay I never do this, but I genuinely have to talk about this camera — I've had it for three weeks and it has not left my bag, the photos look like film straight out of it, no editing, and it literally fits in my jacket pocket"; she then turns the camera around to briefly show its compact size against her hand, the background featuring a wooden dresser with scattered jewelry, a small vase of dried flowers, soft morning light pouring through sheer white curtains, a framed poster slightly crooked on the wall, and a houseplant in the corner, all captured with the slightly wider phone-lens distortion, natural skin texture including pores, subtle handheld shake, ambient room tone, and unfiltered warm color grading that feels like an authentic creator review rather than a produced advertisement.

Result

{
  "video": {
    "url": "https://v3b.fal.media/files/b/0a959ac1/vVRE687-Tmi0f1mqmbpSz_video.mp4",
    "content_type": "video/mp4",
    "file_name": "video.mp4",
    "file_size": 4985966
  },
  "seed": 579176660
}

Each second of generated 720p video costs $0.2419. Billing is token-based at $0.0112 per 1,000 tokens, where the number of tokens is (height of output video * width of output video * (input video duration + output video duration) * 24) / 1024. When video inputs are provided, the price is multiplied by 0.6, which at 720p works out to $0.14515 per second.

Run Seedance 2.0 AI Fast Reference To Video API on fal

ByteDance's most advanced video generation model, available on fal as `bytedance/seedance-2.0/fast/reference-to-video`.

Overview

Seedance 2.0 is a multi-modal video generation model. It accepts reference images, videos, and audio clips alongside a text prompt, then generates 480p or 720p video with synchronized audio.

Key capabilities:

Native audio generation: music, dialogue, and sound effects rendered alongside the video
Director-level camera control: dolly zooms, rack focuses, tracking shots, POV switches
Realistic physics: weight, collisions, fabric, and character motion
Multi-shot editing: a single generation can include natural cuts, up to 15 seconds
Cinematic output at 720p

Inputs

Modality	Limit	Formats	Notes
Text prompt	1	—	Reference uploaded assets as `@Image1`, `@Video1`, `@Audio1`, etc.
Images	Up to 9	JPEG, PNG, WebP	Max 30 MB each
Videos	Up to 3	MP4, MOV	Combined duration 2–15 s, total under 50 MB, 480p–720p resolution
Audio	Up to 3	MP3, WAV	Combined duration ≤ 15 s, max 15 MB each; requires at least one image or video

Total files across all modalities must not exceed 12.
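The count limits above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the fal client: `check_reference_inputs` is a hypothetical helper, and it assumes that `@Image1`, `@Video1`, and `@Audio1` are 1-indexed references into the corresponding URL lists. It does not check file sizes, durations, or formats, since those require fetching the files.

```python
import re

# Limits copied from the Inputs table.
MAX_IMAGES, MAX_VIDEOS, MAX_AUDIO, MAX_TOTAL = 9, 3, 3, 12


def check_reference_inputs(prompt, image_urls=(), video_urls=(), audio_urls=()):
    """Return a list of problems; an empty list means the counts look fine."""
    problems = []
    counts = {"Image": len(image_urls), "Video": len(video_urls), "Audio": len(audio_urls)}
    limits = {"Image": MAX_IMAGES, "Video": MAX_VIDEOS, "Audio": MAX_AUDIO}

    # Per-modality limits.
    for kind, count in counts.items():
        if count > limits[kind]:
            problems.append(f"too many {kind.lower()} inputs: {count} > {limits[kind]}")

    # Combined limit across all modalities.
    total = sum(counts.values())
    if total > MAX_TOTAL:
        problems.append(f"too many files in total: {total} > {MAX_TOTAL}")

    # Audio requires at least one image or video.
    if counts["Audio"] and not (counts["Image"] or counts["Video"]):
        problems.append("audio inputs require at least one image or video")

    # Assumption: @KindN is a 1-indexed reference into the matching URL list.
    for kind, index in re.findall(r"@(Image|Video|Audio)(\d+)", prompt):
        if not 1 <= int(index) <= counts[kind]:
            problems.append(f"prompt references @{kind}{index}, but only {counts[kind]} provided")

    return problems
```

Returning a list rather than raising on the first failure lets a caller report every problem at once.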

Pricing

Billed per second of generated output:

Condition	Rate
Standard (720p, fast tier)	$0.2419 / sec
With video input provided	~$0.1452 / sec (0.6× multiplier)
Token-based billing	$0.0112 / 1,000 tokens

Token formula: `tokens = (height of output video * width of output video * (input video duration + output video duration) * 24) / 1024`
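As a sanity check, the token formula can be evaluated directly. The sketch below uses the $0.0112 per 1,000 tokens rate and the 0.6 multiplier from the pricing note. `estimate_cost` is a hypothetical helper, and the 1280x720 frame size for a 16:9 720p output is an assumption, since the source does not list pixel dimensions.

```python
PRICE_PER_1K_TOKENS = 0.0112   # USD, fast tier, from the pricing note
VIDEO_INPUT_MULTIPLIER = 0.6   # applied when video inputs are provided


def estimate_cost(width, height, output_seconds, input_video_seconds=0.0):
    """Estimate tokens and USD cost for one request using the documented formula."""
    tokens = (height * width * (input_video_seconds + output_seconds) * 24) / 1024
    cost = tokens / 1000 * PRICE_PER_1K_TOKENS
    if input_video_seconds > 0:
        cost *= VIDEO_INPUT_MULTIPLIER
    return tokens, cost


# Assumption: a 16:9 "720p" output is 1280x720 pixels.
tokens, cost = estimate_cost(1280, 720, output_seconds=1)
print(f"{tokens:.0f} tokens, ${cost:.4f}")  # 21600 tokens, $0.2419
```

Under that assumption, one second of output is 21,600 tokens, which reproduces the listed $0.2419 per second rate.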

Parameters

Parameter	Type	Default	Description
`prompt`	string	—	Text description of the video to generate
`image_urls`	list<string>	—	Reference image URLs
`video_urls`	list<string>	—	Reference video URLs
`audio_urls`	list<string>	—	Reference audio URLs
`resolution`	enum	`720p`	`480p` (faster/cheaper) or `720p`
`duration`	enum	`auto`	`auto` or any integer from `4` to `15` seconds
`aspect_ratio`	enum	`auto`	`auto`, `21:9`, `16:9`, `4:3`, `1:1`, `3:4`, `9:16`
`generate_audio`	boolean	`true`	Generate synchronized audio (SFX, ambient, lip-sync)
`seed`	integer	—	Fix seed for reproducibility (minor variation may still occur)
`end_user_id`	string	—	Optional identifier for the end user
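The enum constraints in the table can be enforced before a request is submitted. This is a rough sketch rather than anything provided by the client library: `build_arguments` is a hypothetical helper, and sending numeric durations as strings is an assumption based on `duration` being listed as an enum and on the `"auto"` value used in the Quick Start.

```python
RESOLUTIONS = ("480p", "720p")
ASPECT_RATIOS = ("auto", "21:9", "16:9", "4:3", "1:1", "3:4", "9:16")


def build_arguments(prompt, image_urls=None, video_urls=None, audio_urls=None,
                    resolution="720p", duration="auto", aspect_ratio="auto",
                    generate_audio=True, seed=None):
    """Check the documented enum values and return an arguments dict."""
    if resolution not in RESOLUTIONS:
        raise ValueError(f"resolution must be one of {RESOLUTIONS}")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {ASPECT_RATIOS}")
    if duration != "auto":
        if not str(duration).isdigit() or not 4 <= int(duration) <= 15:
            raise ValueError("duration must be 'auto' or an integer from 4 to 15")
        # Assumption: enum values are sent as strings, like "auto" in the Quick Start.
        duration = str(duration)

    arguments = {
        "prompt": prompt,
        "resolution": resolution,
        "duration": duration,
        "aspect_ratio": aspect_ratio,
        "generate_audio": generate_audio,
    }
    # Only include optional fields that were actually supplied.
    optional = {"image_urls": image_urls, "video_urls": video_urls,
                "audio_urls": audio_urls, "seed": seed}
    arguments.update({k: v for k, v in optional.items() if v is not None})
    return arguments
```

The returned dict has the shape expected by the `arguments` parameter in the Quick Start call.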

Quick Start

Python

bash
pip install fal-client
export FAL_KEY="YOUR_API_KEY"

python
import fal_client


def on_queue_update(update):
    # Print log messages while the request is in progress.
    if isinstance(update, fal_client.InProgress):
        for log in update.logs or []:
            print(log["message"])


result = fal_client.subscribe(
    "bytedance/seedance-2.0/fast/reference-to-video",
    arguments={
        "prompt": "A surfer rides a massive wave at golden hour. @Image1 sets the scene.",
        "image_urls": ["https://your-host.com/beach.jpg"],
        "resolution": "720p",
        "duration": "auto",
        "aspect_ratio": "16:9",
        "generate_audio": True,
    },
    with_logs=True,
    on_queue_update=on_queue_update,
)

# The response contains the generated video's URL and the seed that was used.
print(result["video"]["url"])

JavaScript / Node.js

bash
npm install @fal-ai/client
export FAL_KEY="YOUR_API_KEY"

js
import { fal } from "@fal-ai/client";

const result = await fal.subscribe("bytedance/seedance-2.0/fast/reference-to-video", {
  input: {
    prompt: "A surfer rides a massive wave at golden hour. @Image1 sets the scene.",
    image_urls: ["https://your-host.com/beach.jpg"],
    resolution: "720p",
    duration: "auto",
    aspect_ratio: "16:9",
    generate_audio: true,
  },
  logs: true,
  onQueueUpdate: (update) => {
    if (update.status === "IN_PROGRESS") {
      // Log only the message; passing console.log to forEach would also print the index and the array.
      update.logs.forEach((log) => console.log(log.message));
    }
  },
});

console.log(result.data.video.url);

Output

json
{
  "video": {
    "url": "https://...",
    "content_type": "video/mp4",
    "file_name": "output.mp4",
    "file_size": 4823041
  },
  "seed": 42
}
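A response of this shape can be unpacked and the file saved locally with the standard library alone. The sketch below is illustrative: `describe_output` and `download_video` are hypothetical helper names, and the fallback file name is an assumption for responses that omit `file_name`.

```python
import os
import urllib.request


def describe_output(result):
    """Pull the commonly used fields out of a response shaped like the example."""
    video = result["video"]
    return {
        "url": video["url"],
        "file_name": video.get("file_name", "video.mp4"),  # assumed fallback name
        "size_mb": round(video.get("file_size", 0) / 1_000_000, 2),
        "seed": result.get("seed"),
    }


def download_video(result, directory="."):
    """Download the generated file into `directory` and return the local path."""
    info = describe_output(result)
    path = os.path.join(directory, info["file_name"])
    urllib.request.urlretrieve(info["url"], path)
    return path
```

Storing the returned `seed` alongside the file makes it possible to rerun the same request later with the `seed` parameter.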

Async / Queue Usage

For longer generations, submit the request to the queue, then either poll its status or pass a webhook URL to be notified when it finishes:

python
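import fal_client

# Submit without blocking. The webhook URL is optional.
handler = fal_client.submit(
    "bytedance/seedance-2.0/fast/reference-to-video",
    arguments={...},  # same arguments as in the Quick Start example
    webhook_url="https://your-server.com/webhook",
)

# Keep the request ID so the request can be looked up later.
request_id = handler.request_id
# Check the current queue status, including logs.
status = fal_client.status("bytedance/seedance-2.0/fast/reference-to-video", request_id, with_logs=True)
# Fetch the result of the request.
result = fal_client.result("bytedance/seedance-2.0/fast/reference-to-video", request_id)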
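The single status call above can be wrapped in a polling loop. The helper below is a generic sketch that takes the status call as a function, so it runs without network access. `wait_for_completion` is a hypothetical name, and the interval and timeout defaults are arbitrary choices, not values recommended by fal.

```python
import time


def wait_for_completion(get_status, is_done, interval=2.0, timeout=600.0, sleep=time.sleep):
    """Call get_status() every `interval` seconds until is_done(status) is true."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if is_done(status):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("request did not complete in time")
        sleep(interval)


# Possible wiring with fal_client (not executed here):
# import fal_client
# endpoint = "bytedance/seedance-2.0/fast/reference-to-video"
# wait_for_completion(
#     lambda: fal_client.status(endpoint, request_id, with_logs=True),
#     lambda s: isinstance(s, fal_client.Completed),
# )
# result = fal_client.result(endpoint, request_id)
```

Passing the status call in as a function keeps the loop testable with a fake status source and leaves the choice of client to the caller.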

Fast vs. Standard Tier

The fast tier uses the same schema and parameters as the standard endpoint. It has lower latency and lower cost, with the same capabilities.

	Fast	Standard
Endpoint suffix	`.../fast/reference-to-video`	`.../reference-to-video`
Latency	Lower	Higher
Cost	Lower	Higher
Output quality	Same	Same
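Because both tiers share one schema, switching between them only requires a different endpoint ID. The sketch below is an illustration: `endpoint_for` is a hypothetical helper, and the full standard-tier ID is an assumption inferred from the suffixes in the table, so confirm it against the model listing before relying on it.

```python
BASE = "bytedance/seedance-2.0"


def endpoint_for(tier="fast"):
    """Return the endpoint ID for the given tier."""
    if tier == "fast":
        return f"{BASE}/fast/reference-to-video"
    if tier == "standard":
        # Assumption: the standard ID is the fast ID without the "/fast" segment.
        return f"{BASE}/reference-to-video"
    raise ValueError("tier must be 'fast' or 'standard'")
```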