bytedance/seedance-2.0/reference-to-video
Each second of generated 720p video costs $0.3024. Requests are billed at $0.014 per 1,000 tokens, where the token count is (output height × output width × (input duration + output duration) × 24) / 1024. When video inputs are provided, the price is multiplied by 0.6, which brings 720p output to $0.1814 per second.
Run Seedance 2.0 AI Reference To Video API on fal
ByteDance's most advanced reference-to-video model, available on fal as `bytedance/seedance-2.0/reference-to-video`.
Overview
The most flexible Seedance 2.0 endpoint: provide a text prompt alongside up to 12 reference files spanning images, videos, and audio clips, and the model weaves them into a single cinematic output. Reference assets in your prompt using `@Image1`, `@Video1`, `@Audio1`, etc.
Key capabilities:
- Native audio generation: music, SFX, and lip-synced dialogue, all in a single pass at no extra cost
- Director-level camera control via prompt: dolly zooms, tracking shots, POV switches, handheld movement
- Realistic physics: fluid dynamics, cloth, character motion
- Multi-shot editing: natural cuts within a single generation, up to 15 seconds
- Output up to 1080p
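The `@Image1`, `@Video1`, `@Audio1` tags mentioned in the overview can be checked before a request is sent. The following is a minimal sketch, not part of the fal client: `find_unresolved_tags` is a hypothetical helper, and it assumes that tags are 1-based and map to the entries of each URL list in order, which the page does not state explicitly.

```python
import re

# Matches tags such as @Image1, @Video2, @Audio3.
TAG_PATTERN = re.compile(r"@(Image|Video|Audio)(\d+)")


def find_unresolved_tags(prompt, image_urls=(), video_urls=(), audio_urls=()):
    """Return tags in the prompt that have no matching entry in the URL lists.

    Assumption: @Image1 refers to the first entry of image_urls, and so on.
    """
    counts = {
        "Image": len(image_urls),
        "Video": len(video_urls),
        "Audio": len(audio_urls),
    }
    unresolved = []
    for kind, index in TAG_PATTERN.findall(prompt):
        # A tag is valid only if its index falls inside the matching list.
        if not 1 <= int(index) <= counts[kind]:
            unresolved.append(f"@{kind}{index}")
    return unresolved
```

An empty return value means every tag in the prompt has a corresponding file under this assumed mapping.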
Inputs
| Modality | Limit | Formats | Constraints |
|---|---|---|---|
| Images | Up to 9 | JPEG, PNG, WebP | Max 30 MB each |
| Videos | Up to 3 | MP4, MOV | Combined duration 2–15 s, total under 50 MB, 480p–720p resolution per video |
| Audio | Up to 3 | MP3, WAV | Combined duration ≤ 15 s, max 15 MB each; requires at least one image or video |
Total files across all modalities must not exceed 12.
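The count limits in the table above can be sketched as a local pre-flight check. This is an illustrative helper (`check_reference_counts` is a hypothetical name, not a fal API), and it only covers file counts and the audio dependency; file sizes, formats, durations, and resolutions would still need to be checked separately.

```python
MAX_IMAGES = 9
MAX_VIDEOS = 3
MAX_AUDIO = 3
MAX_TOTAL_FILES = 12


def check_reference_counts(image_urls=(), video_urls=(), audio_urls=()):
    """Return a list of problems; an empty list means the counts look valid."""
    problems = []
    if len(image_urls) > MAX_IMAGES:
        problems.append(f"too many images: {len(image_urls)} > {MAX_IMAGES}")
    if len(video_urls) > MAX_VIDEOS:
        problems.append(f"too many videos: {len(video_urls)} > {MAX_VIDEOS}")
    if len(audio_urls) > MAX_AUDIO:
        problems.append(f"too many audio clips: {len(audio_urls)} > {MAX_AUDIO}")

    total = len(image_urls) + len(video_urls) + len(audio_urls)
    if total > MAX_TOTAL_FILES:
        problems.append(f"too many files in total: {total} > {MAX_TOTAL_FILES}")

    # Audio references are only accepted alongside an image or a video.
    if audio_urls and not (image_urls or video_urls):
        problems.append("audio requires at least one image or video")
    return problems
```

Note that the per-modality maximums add up to 15, so a request can satisfy each individual limit and still exceed the overall cap of 12.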
Parameters
| Parameter | Type | Default | Options |
|---|---|---|---|
| `prompt` | string | — | Text description; reference assets as `@Image1`, `@Video1`, `@Audio1`, etc. |
| `image_urls` | `list<string>` | — | Reference image URLs |
| `video_urls` | `list<string>` | — | Reference video URLs |
| `audio_urls` | `list<string>` | — | Reference audio URLs |
| `resolution` | enum | `720p` | `480p`, `720p`, `1080p` |
| `duration` | enum | `auto` | `auto` or any integer from `4` to `15` seconds |
| `aspect_ratio` | enum | `auto` | `auto`, `21:9`, `16:9`, `4:3`, `1:1`, `3:4`, `9:16` |
| `generate_audio` | boolean | `true` | Synchronized audio: SFX, ambient sound, lip-synced speech. Same price either way. |
| `seed` | integer | — | Fix for reproducibility (minor variation may still occur) |
| `end_user_id` | string | — | Optional identifier for the end user |
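The enum constraints in the parameter table can be enforced client-side before calling the API. The sketch below is one possible approach; `build_arguments` and `normalize_duration` are hypothetical helpers, and sending `duration` as a string is an assumption based on how the examples on this page pass it.

```python
RESOLUTIONS = {"480p", "720p", "1080p"}
ASPECT_RATIOS = {"auto", "21:9", "16:9", "4:3", "1:1", "3:4", "9:16"}


def normalize_duration(duration):
    """Return "auto" or a string integer between 4 and 15."""
    if duration == "auto":
        return "auto"
    seconds = int(duration)  # raises ValueError for non-numeric strings
    if not 4 <= seconds <= 15:
        raise ValueError(f"duration must be 'auto' or 4-15 seconds, got {seconds}")
    return str(seconds)


def build_arguments(prompt, resolution="720p", duration="auto",
                    aspect_ratio="auto", **extra):
    """Validate the enum parameters and return an arguments dict."""
    if resolution not in RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution}")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {aspect_ratio}")
    return {
        "prompt": prompt,
        "resolution": resolution,
        "duration": normalize_duration(duration),
        "aspect_ratio": aspect_ratio,
        **extra,  # e.g. image_urls, video_urls, audio_urls, seed
    }
```

The defaults mirror the table (`720p`, `auto`, `auto`), so calling `build_arguments("a prompt")` yields the same settings the endpoint would apply on its own.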
Pricing
Billed per second of generated 720p output:
| Condition | Rate | 10-sec clip |
|---|---|---|
| Standard, no video input | $0.3024 / sec | ~$3.02 |
| Standard, with video input | $0.1814 / sec (0.6× multiplier) | ~$1.81 |
| Fast tier, no video input | $0.2419 / sec | ~$2.42 |
| Fast tier, with video input | $0.1452 / sec (0.6× multiplier) | ~$1.45 |
| Token-based billing | $0.014 / 1,000 tokens | — |
Token formula (note: includes both input and output duration):
`tokens = (height × width × (input_duration + output_duration) × 24) / 1024`
Audio generation is included at no extra cost regardless of the `generate_audio` setting.
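As a worked example of the token formula, the sketch below reproduces the standard-tier 720p rate from the table. It assumes that 720p means a 1280 × 720 frame (true for 16:9; other aspect ratios will differ), and the function names and the `has_video_input` flag are illustrative rather than part of any fal API.

```python
PRICE_PER_1000_TOKENS = 0.014   # standard tier, from the pricing table
VIDEO_INPUT_MULTIPLIER = 0.6    # applied when video inputs are provided


def estimate_tokens(width, height, output_seconds, input_seconds=0.0):
    """Token count per the formula: h * w * (input + output duration) * 24 / 1024."""
    return (height * width * (input_seconds + output_seconds) * 24) / 1024


def estimate_cost(width, height, output_seconds, input_seconds=0.0,
                  has_video_input=False):
    """Estimated standard-tier cost in USD under the stated assumptions."""
    tokens = estimate_tokens(width, height, output_seconds, input_seconds)
    cost = tokens / 1000 * PRICE_PER_1000_TOKENS
    if has_video_input:
        cost *= VIDEO_INPUT_MULTIPLIER
    return cost


# 10 seconds of 1280x720 output with no video input: 216,000 tokens.
print(round(estimate_cost(1280, 720, 10), 4))  # 3.024
```

One second of 1280 × 720 output comes to 21,600 tokens, which at $0.014 per 1,000 tokens gives the $0.3024 per-second rate listed above. Because the formula adds input duration to output duration, requests with video references may cost more than the per-second output rate alone suggests.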
Quick Start
Python
```bash
pip install fal-client
export FAL_KEY="YOUR_API_KEY"
```
```python
import fal_client


def on_queue_update(update):
    # Print log lines while the request is in progress.
    if isinstance(update, fal_client.InProgress):
        for log in update.logs:
            print(log["message"])


result = fal_client.subscribe(
    "bytedance/seedance-2.0/reference-to-video",
    arguments={
        "prompt": "@Image1 shows a product on a marble surface. Slow dolly in with dramatic side lighting, dust particles floating in the air.",
        "image_urls": ["https://your-host.com/product.jpg"],
        "resolution": "1080p",
        "duration": "8",
        "aspect_ratio": "16:9",
        "generate_audio": True,
    },
    with_logs=True,
    on_queue_update=on_queue_update,
)
print(result["video"]["url"])
```
Multi-modal example with images, video, and audio:
```python
import fal_client

result = fal_client.subscribe(
    "bytedance/seedance-2.0/reference-to-video",
    arguments={
        "prompt": "Recreate the scene from @Video1 but replace the background with the environment from @Image1. Use @Audio1 as the soundtrack.",
        "image_urls": ["https://your-host.com/background.jpg"],
        "video_urls": ["https://your-host.com/scene.mp4"],
        "audio_urls": ["https://your-host.com/soundtrack.mp3"],
        "resolution": "720p",
        "duration": "auto",
        "aspect_ratio": "auto",
    },
)
```
JavaScript / Node.js
```bash
npm install @fal-ai/client
export FAL_KEY="YOUR_API_KEY"
```
```js
import { fal } from "@fal-ai/client";

const result = await fal.subscribe("bytedance/seedance-2.0/reference-to-video", {
  input: {
    prompt: "@Image1 shows a product on a marble surface. Slow dolly in with dramatic side lighting, dust particles floating in the air.",
    image_urls: ["https://your-host.com/product.jpg"],
    resolution: "1080p",
    duration: "8",
    aspect_ratio: "16:9",
    generate_audio: true,
  },
  logs: true,
  onQueueUpdate: (update) => {
    // Print log messages while the request is in progress.
    if (update.status === "IN_PROGRESS") {
      update.logs.forEach((log) => console.log(log.message));
    }
  },
});
console.log(result.data.video.url);
```
Output
```json
{
  "video": {
    "url": "https://...",
    "content_type": "video/mp4",
    "file_name": "output.mp4",
    "file_size": 4823041
  },
  "seed": 42
}
```
Async / Queue Usage
```python
import fal_client

ENDPOINT = "bytedance/seedance-2.0/reference-to-video"

# Submit the request to the queue without waiting for it to finish.
handler = fal_client.submit(
    ENDPOINT,
    arguments={...},  # placeholder: same fields as in the Quick Start examples
    webhook_url="https://your-server.com/webhook",
)
request_id = handler.request_id

# Check progress by request ID, then fetch the finished result.
status = fal_client.status(ENDPOINT, request_id, with_logs=True)
result = fal_client.result(ENDPOINT, request_id)
```
Standard vs. Fast Tier
The two tiers share an identical schema and differ in maximum resolution, cost, and latency. Use the fast tier unless you need 1080p.
| | Standard | Fast |
|---|---|---|
| Endpoint | `bytedance/seedance-2.0/reference-to-video` | `bytedance/seedance-2.0/fast/reference-to-video` |
| Max resolution | 1080p | 720p |
| Cost (10 sec, 720p, no video input) | ~$3.02 | ~$2.42 |
| Cost (10 sec, 720p, with video input) | ~$1.81 | ~$1.45 |
| Latency | Higher | Lower |
| Schema | Identical | Identical |
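The tier guidance above can be expressed as a small selection helper. This is only a sketch of that rule of thumb: `pick_endpoint` is a hypothetical function, and it considers nothing beyond the requested resolution.

```python
STANDARD_ENDPOINT = "bytedance/seedance-2.0/reference-to-video"
FAST_ENDPOINT = "bytedance/seedance-2.0/fast/reference-to-video"


def pick_endpoint(resolution="720p"):
    """Prefer the fast tier; use the standard tier only when 1080p is requested."""
    if resolution == "1080p":
        return STANDARD_ENDPOINT
    return FAST_ENDPOINT
```

Because both tiers accept the same schema, the same arguments can be passed to whichever endpoint this returns.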
Enterprise Variant
A separate enterprise endpoint adds face-input support for consistent character identity across generations:
`fal-ai/seedance-2/enterprise/reference-to-video`
Availability
- April 2, 2026: Launched with enterprise-only, geo-restricted access
- April 9, 2026: All restrictions lifted, fully open with no geographic or use-case limitations

