Bytedance Image to Video

fal-ai/bytedance/seedance/v1.5/pro/image-to-video
Generate videos with audio using Seedance 1.5 (supports start and end frames)
Inference
Commercial use
Partner

Each 720p, 5-second video with audio costs roughly $0.26. For other resolutions, video with audio costs $2.4 per 1 million video tokens; without audio, the price is $1.2 per million tokens. Tokens are computed as tokens(video) = (height × width × FPS × duration) / 1024.
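The token formula can be checked with a short calculation. The sketch below is a Python estimator that assumes an output frame rate of 24 FPS and a 1280×720 frame for 16:9 720p; neither value is stated on this page. Under those assumptions a 5-second 720p clip is 108,000 tokens, which at $2.4 per million tokens gives $0.2592, consistent with the quoted figure of roughly $0.26.

```python
# Per-million-token prices quoted on this page (USD).
PRICE_PER_MILLION_WITH_AUDIO = 2.4
PRICE_PER_MILLION_WITHOUT_AUDIO = 1.2


def video_tokens(height: int, width: int, fps: int, duration_s: float) -> float:
    """tokens(video) = (height * width * FPS * duration) / 1024"""
    return height * width * fps * duration_s / 1024


def estimate_cost_usd(
    height: int, width: int, fps: int, duration_s: float, with_audio: bool = True
) -> float:
    """Estimated price in USD for one clip, using the per-million-token rates."""
    rate = PRICE_PER_MILLION_WITH_AUDIO if with_audio else PRICE_PER_MILLION_WITHOUT_AUDIO
    return video_tokens(height, width, fps, duration_s) / 1_000_000 * rate


if __name__ == "__main__":
    # Assumption: 720p at 16:9 is 1280x720 and the output runs at 24 FPS.
    print(video_tokens(720, 1280, 24, 5))                     # 108000.0
    print(round(estimate_cost_usd(720, 1280, 24, 5), 4))      # 0.2592
    print(round(estimate_cost_usd(720, 1280, 24, 5, False), 4))  # 0.1296
```

Billed token counts depend on the actual output dimensions and frame rate, so treat the result as an estimate.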

Seedance 1.5 Pro Image-to-Video

Animate any image into a cinematic video with synchronized audio. Upload a start frame and, optionally, an end frame, and Seedance 1.5 Pro generates the motion, camera movement, dialogue, and sound design in between.


Use Cases

| Use case | Why Seedance 1.5 Pro fits |
|---|---|
| Photo animation | Breathe life into a still portrait or product shot with realistic motion and ambient sound. |
| Character animation | Turn concept art or a single character frame into a speaking, emoting performance with lip-sync. |
| Product reveals | Start on a hero shot and end on packaging; the model animates the transition with cinematic flair. |
| Scene transitions | Define start and end compositions for precise A-to-B shots, useful for ads, trailers, or music videos. |
| Storyboard-to-video | Convert illustrated storyboard frames into rough-cut motion tests with matching audio. |
| Social content | Animate memes, portraits, or fan art into shareable clips with sound. |
| Virtual avatars | Animate a single headshot into a talking-head video with natural speech and lip-sync. |

Key Features

| Feature | Description |
|---|---|
| Start frame conditioning | Upload an image to set the opening composition, lighting, subject, and style. |
| End frame conditioning | Optionally upload a second image to define where the shot lands; the model generates the motion path between them. |
| Native audio generation | Dialogue, sound effects, and ambient audio are rendered alongside the video. Lip movements stay locked to speech. |
| Cinematic camera work | Pan, tilt, zoom, dolly, orbit, and tracking shots: describe the move in your prompt. |
| Character consistency | The subject from your start frame (face, clothing, and expression) stays stable throughout. |
| High resolution | Output up to 1080p with smooth temporal consistency. |

Controls

| Parameter | Options | Notes |
|---|---|---|
| `prompt` | Text (required) | Describe action, dialogue, camera, and sound |
| `image_url` | URL (required) | Start frame; sets the opening composition |
| `end_image_url` | URL (optional) | End frame; defines the closing composition |
| `aspect_ratio` | `21:9` · `16:9` · `4:3` · `1:1` · `3:4` · `9:16` | Default: `16:9` |
| `resolution` | `480p` · `720p` | `480p` for faster iteration; `720p` for final output |
| `duration` | `4` to `12` seconds | Default: `5` |
| `generate_audio` | `true` / `false` | Default: `true`; set `false` for silent video |
| `camera_fixed` | `true` / `false` | Lock the camera in place (tripod shot) |
| `seed` | Integer | Set a value for reproducibility; use `-1` for random |
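The Controls table can be encoded as a small request builder that rejects invalid values before anything is sent. This is a minimal sketch with a hypothetical helper (`build_arguments` is not part of any fal SDK). It assumes the parameter names in the table are the exact JSON field names the endpoint accepts, and it omits `end_image_url`, `resolution`, `camera_fixed`, and `seed` unless you set them, since the page does not state defaults for all of them.

```python
from typing import Any, Optional

# Allowed values copied from the Controls table.
ASPECT_RATIOS = {"21:9", "16:9", "4:3", "1:1", "3:4", "9:16"}
RESOLUTIONS = {"480p", "720p"}
MIN_DURATION_S, MAX_DURATION_S = 4, 12


def build_arguments(
    prompt: str,
    image_url: str,
    *,
    end_image_url: Optional[str] = None,
    aspect_ratio: str = "16:9",
    resolution: Optional[str] = None,
    duration: int = 5,
    generate_audio: bool = True,
    camera_fixed: Optional[bool] = None,
    seed: Optional[int] = None,
) -> dict[str, Any]:
    """Validate inputs against the Controls table and return a request dict.

    Hypothetical helper: assumes the table's parameter names are the exact
    JSON field names accepted by the endpoint.
    """
    if not prompt.strip():
        raise ValueError("prompt is required")
    if not image_url:
        raise ValueError("image_url (start frame) is required")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect_ratio: {aspect_ratio}")
    if resolution is not None and resolution not in RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution}")
    if not MIN_DURATION_S <= duration <= MAX_DURATION_S:
        raise ValueError(f"duration must be {MIN_DURATION_S}-{MAX_DURATION_S} seconds")

    args: dict[str, Any] = {
        "prompt": prompt,
        "image_url": image_url,
        "aspect_ratio": aspect_ratio,
        # Assumption: duration is sent as an integer number of seconds.
        "duration": duration,
        "generate_audio": generate_audio,
    }
    # Only include optional fields the caller actually set.
    optional = {
        "end_image_url": end_image_url,
        "resolution": resolution,
        "camera_fixed": camera_fixed,
        "seed": seed,
    }
    args.update({k: v for k, v in optional.items() if v is not None})
    return args
```

For example, `build_arguments("Slow push-in as she smiles", "https://example.com/start.png", resolution="480p", seed=42)` returns a dict suitable as a request body; the URL is a placeholder. Validating locally is cheaper than discovering a bad value after a queued request fails.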

Start Frame / End Frame

This is the core differentiator from text-to-video. You control the opening and closing compositions directly.

| Frame | What it does |
|---|---|
| Start frame (`image_url`) | Required. Sets the initial subject, pose, lighting, color grade, and environment. The model animates forward from here. |
| End frame (`end_image_url`) | Optional. Defines the final composition. The model generates a motion path that lands precisely on this frame. |

Tips:

  • Use the same subject in both frames for smooth transitions.
  • Match aspect ratio and style between start and end frames.
  • Motion is generated in latent space — not interpolated — so physics and camera movement feel natural.
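Since the start and end frames should share an aspect ratio, it can help to compare their pixel dimensions before submitting and to pick the closest supported `aspect_ratio` value. A minimal sketch; `nearest_aspect_ratio` and `frames_match` are hypothetical helper names, and the 1% tolerance is an assumption.

```python
import math

# Supported aspect_ratio values (from the Controls table) as width / height.
SUPPORTED_RATIOS = {
    "21:9": 21 / 9,
    "16:9": 16 / 9,
    "4:3": 4 / 3,
    "1:1": 1.0,
    "3:4": 3 / 4,
    "9:16": 9 / 16,
}


def nearest_aspect_ratio(width: int, height: int) -> str:
    """Return the supported aspect_ratio value closest to the image's shape."""
    ratio = width / height
    return min(SUPPORTED_RATIOS, key=lambda k: abs(SUPPORTED_RATIOS[k] - ratio))


def frames_match(
    start_size: tuple[int, int], end_size: tuple[int, int], rel_tol: float = 0.01
) -> bool:
    """True if both frames have the same aspect ratio within rel_tol (assumed 1%)."""
    (w1, h1), (w2, h2) = start_size, end_size
    return math.isclose(w1 / h1, w2 / h2, rel_tol=rel_tol)
```

For instance, a 1920×1080 start frame and a 1280×720 end frame match and map to `16:9`, while pairing 1920×1080 with a 1024×1024 square image does not match.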

Prompting Tips

Your prompt guides what happens between the frames:

| Element | Example |
|---|---|
| Action | "She turns to face the camera and smiles" |
| Dialogue | `"I've been waiting for this moment."` (use quotes) |
| Camera | "Slow push-in ending on a close-up" |
| Audio/Foley | "Soft piano, room reverb, fabric rustling" |

More tips:

  • The start frame already defines the scene — focus your prompt on motion and sound.
  • For talking heads, put the dialogue in quotes and describe the emotion: `"I can't believe it," voice breaking with emotion`.
  • Use `camera_fixed: true` if you want the subject to move but the frame to stay locked.
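When generating many clips from a template, the prompt elements can be assembled programmatically. The helper below is hypothetical: it joins action, quoted dialogue (with an optional emotion cue), camera, and audio into sentences. This ordering is one reasonable choice, not a format the model is documented to require.

```python
from typing import Optional


def _sentence(text: str) -> str:
    """Trim whitespace and make sure the fragment ends with punctuation."""
    text = text.strip()
    return text if text.endswith((".", "!", "?")) else text + "."


def compose_prompt(
    action: str,
    dialogue: Optional[str] = None,
    emotion: Optional[str] = None,
    camera: Optional[str] = None,
    audio: Optional[str] = None,
) -> str:
    """Join action, quoted dialogue, camera, and audio cues into one prompt."""
    parts = [_sentence(action)]
    if dialogue:
        quoted = f'"{dialogue.strip()}"'
        # Attach the delivery cue after the quoted line, as in the talking-head tip.
        parts.append(f"{quoted}, {_sentence(emotion)}" if emotion else quoted)
    if camera:
        parts.append(_sentence(camera))
    if audio:
        parts.append(_sentence(audio))
    return " ".join(parts)
```

With the examples from the table, `compose_prompt("She turns to face the camera and smiles", dialogue="I've been waiting for this moment.", camera="Slow push-in ending on a close-up", audio="Soft piano, room reverb, fabric rustling")` produces one prompt string that covers motion, speech, camera, and sound.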

Specs

| Spec | Value |
|---|---|
| Max duration | 12 seconds |
| Max resolution | 1080p |
| Audio | Mixed dialogue + foley + score, 48 kHz AAC |
| Output format | MP4 (H.264) |

API

Available on fal.ai as `fal-ai/bytedance/seedance/v1.5/pro/image-to-video`.
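A minimal Python sketch of calling the endpoint with the `fal-client` package's `subscribe` helper, assuming the package is installed and your API key is in the `FAL_KEY` environment variable. The result shape (`result["video"]["url"]`) is an assumption based on other fal video endpoints; confirm it against this endpoint's API schema. The image URLs are placeholders.

```python
import os
from typing import Any

# Endpoint ID as listed at the top of this page.
ENDPOINT = "fal-ai/bytedance/seedance/v1.5/pro/image-to-video"


def generate_video(arguments: dict[str, Any]) -> str:
    """Submit a request, print model logs, and return the output video URL."""
    # Fail early with a clear message if no API key is configured.
    if not os.environ.get("FAL_KEY"):
        raise RuntimeError("Set FAL_KEY to your fal.ai API key")

    # Imported here so the module loads even without fal-client installed.
    import fal_client

    def on_queue_update(update: Any) -> None:
        # Print log lines while the job is in progress.
        if isinstance(update, fal_client.InProgress):
            for log in update.logs:
                print(log["message"])

    result = fal_client.subscribe(
        ENDPOINT,
        arguments=arguments,
        with_logs=True,
        on_queue_update=on_queue_update,
    )
    # Assumption: output follows the {"video": {"url": ...}} shape used by
    # other fal video endpoints; verify against this endpoint's schema.
    return result["video"]["url"]


if __name__ == "__main__":
    url = generate_video(
        {
            "prompt": 'She turns to face the camera and smiles. "I\'ve been waiting for this moment."',
            "image_url": "https://example.com/start.png",  # placeholder start frame
            "end_image_url": "https://example.com/end.png",  # placeholder, optional
            "duration": 5,
            "generate_audio": True,
        }
    )
    print(url)
```

`subscribe` blocks until the queued job finishes; for long jobs or batch pipelines, a submit-then-poll pattern may fit better.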