Seedance 1.5 vs Seedance v1: What's new?

Choosing Between ByteDance's Video Models

ByteDance's Seedance models represent two distinct approaches to AI video generation. Seedance 1.5 Pro introduces joint audio-video generation, producing synchronized sound alongside visuals from a single text prompt. The original Seedance v1 focuses exclusively on visual output, but supports higher 1080p resolution. The decision between them hinges on whether your workflow prioritizes integrated audio or maximum visual fidelity.

The architectural differences between these models extend beyond feature sets. Seedance 1.5 employs a dual-branch diffusion transformer that renders video and audio in the same latent space, enabling tight lip-sync and natural foley without post-production work. This multimodal approach builds on research demonstrating that joint audio-video training improves both semantic alignment and temporal synchronization compared to cascaded generation pipelines.¹

Seedance v1, by contrast, generates video only: it reaches 1080p output but has no audio capability.

Core Capabilities

Seedance v1 established ByteDance's presence in multi-shot video generation with support for both text-to-video and image-to-video workflows. The architecture employs decoupled spatial and temporal layers with an interleaved multimodal positional encoding scheme, enabling native multi-shot generation and consistent subject representation across temporal-spatial transformations. At 1080p output resolution, v1 remains the higher-fidelity option for purely visual applications.

Seedance 1.5 Pro represents a fundamental architectural shift rather than an incremental update. This is ByteDance's first joint audio-video model, processing complex prompts that describe both visual elements and audio cues simultaneously. The model interprets dialogue, environmental sounds, and musical elements alongside visual descriptions. According to fal's documentation, it uses a dual-branch diffusion transformer to render video and audio in the same latent space, producing tight lip-sync and natural foley without additional post-production steps.

Technical Specifications

Seedance v1 on fal supports six aspect ratios (21:9 through 9:16) with 1080p output as its primary advantage. Both models share identical aspect ratio options.

Seedance 1.5 Pro exposes the following parameters:

Resolution options: 480p for faster generation, 720p for balanced quality
Aspect ratios: Six options spanning 21:9 ultra-wide to 9:16 vertical, covering cinematic formats through mobile-optimized outputs
Duration control: Variable length from 4 to 12 seconds, enabling precise cost management
Camera controls: Optional fixed camera position for static shot compositions
Audio toggle: Enable or disable audio generation based on workflow requirements
Safety checker: Configurable content moderation
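
The parameters listed above can be collected into a small validated request object. This is a minimal sketch, not the fal schema: apart from generate_audio, which the migration notes mention, the field names (resolution, duration, aspect_ratio, camera_fixed, enable_safety_checker) and the defaults for duration and the safety checker are assumptions for illustration and should be checked against the fal documentation. The aspect ratio is passed through unvalidated because only the two extremes (21:9 and 9:16) are named here.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

ALLOWED_RESOLUTIONS = ("480p", "720p")  # the two modes listed above
MIN_DURATION_S, MAX_DURATION_S = 4, 12  # duration range listed above


@dataclass
class Seedance15Request:
    """Hypothetical container for Seedance 1.5 Pro parameters."""

    prompt: str
    resolution: str = "720p"
    duration: int = 5                    # seconds; the default is an assumption
    aspect_ratio: Optional[str] = None   # e.g. "21:9" or "9:16"; not validated
    camera_fixed: bool = False           # assumed name for the fixed-camera option
    generate_audio: bool = True          # defaults to true per the migration notes
    enable_safety_checker: bool = True   # assumed name and default

    def __post_init__(self) -> None:
        # Reject values outside the ranges described in the article.
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if self.resolution not in ALLOWED_RESOLUTIONS:
            raise ValueError(f"resolution must be one of {ALLOWED_RESOLUTIONS}")
        if not MIN_DURATION_S <= self.duration <= MAX_DURATION_S:
            raise ValueError(
                f"duration must be between {MIN_DURATION_S} and {MAX_DURATION_S} seconds"
            )

    def to_arguments(self) -> Dict[str, Any]:
        """Build a plain dict, omitting aspect_ratio when it was not set."""
        args: Dict[str, Any] = {
            "prompt": self.prompt,
            "resolution": self.resolution,
            "duration": self.duration,
            "camera_fixed": self.camera_fixed,
            "generate_audio": self.generate_audio,
            "enable_safety_checker": self.enable_safety_checker,
        }
        if self.aspect_ratio is not None:
            args["aspect_ratio"] = self.aspect_ratio
        return args
```

Validating locally means an out-of-range value such as a 1080p resolution or a 2-second duration is caught before a request is sent and billed.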

The prompt structure for Seedance 1.5 accommodates audio descriptions directly. The key difference from v1 appears in the prompt itself:

// Seedance v1 prompt (visual only)
"Courtroom scene, defense attorney giving closing argument, jury watching intently"

// Seedance 1.5 prompt (visual + audio)
"Defense attorney declaring 'Ladies and gentlemen, reasonable doubt is the foundation of justice itself', footsteps on marble, jury shifting, courtroom drama"

The model interprets both the visual scene and acoustic landscape from this single input.

Performance and Speed Comparison

The models differ in resolution ceiling, audio support, duration range, and price, all of which affect development workflows and production economics.

Specification	Seedance v1	Seedance 1.5 Pro
Maximum Resolution	1080p	720p (480p also selectable)
Audio Generation	No	Yes (synchronized)
Duration Range	2-12 seconds	4-12 seconds
Aspect Ratio Options	Six (21:9 to 9:16)	Six (21:9 to 9:16)
Pricing (5s video)	~$0.62 (1080p)	~$0.26 (720p with audio)
Video Extension	No	Yes
End-Frame Conditioning	No	Yes

Seedance v1 delivers predictable generation times without audio processing overhead. A 5-second 1080p video generates in approximately 41 seconds on an NVIDIA L20 GPU. For projects requiring 1080p output, this remains the only option between the two models.

Seedance 1.5 offers two resolution modes. The 480p mode prioritizes speed and suits rapid prototyping and preview generation; the 720p mode balances quality and generation time for production use. Because ByteDance architected this as a joint model rather than separate video and audio pipelines, enabling audio does not simply double generation time: both modalities are generated in the same pass.

Pricing Structure

Cost efficiency varies based on parameter selection and workflow structure.

Seedance v1 pricing depends on resolution and duration. Each 1080p 5-second video costs approximately $0.62. Other resolutions are priced at $2.5 per million video tokens, where the token count is (height x width x FPS x duration) / 1024.

Seedance 1.5 Pro pricing reflects the integrated audio capability. Each 720p 5-second video with audio costs approximately $0.26. For other resolutions, pricing is $2.4 per million video tokens with audio enabled and $1.2 per million tokens without audio. Developers who want 1.5's advanced features (video extension, end-frame conditioning) without the audio cost can disable audio generation and pay the lower rate. When factoring in the cost of separate audio generation, synchronization, and post-processing that would otherwise require additional API calls and processing time, the combined output becomes economically attractive.
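
The token formula and per-million rates above can be turned into a quick cost estimate. The sketch below assumes 24 FPS and 16:9 pixel dimensions (1920x1080 for 1080p, 1280x720 for 720p); neither is stated in the pricing text, so treat the results as approximations. The function and dictionary names are illustrative.

```python
# Per-million-token rates quoted in the pricing section (USD).
RATE_PER_MILLION_TOKENS = {
    "v1": 2.5,
    "v1.5_audio": 2.4,
    "v1.5_no_audio": 1.2,
}

ASSUMED_FPS = 24  # assumption: the frame rate is not stated in the article


def video_tokens(width: int, height: int, duration_s: float, fps: int = ASSUMED_FPS) -> float:
    """Token count as (height x width x FPS x duration) / 1024."""
    return (height * width * fps * duration_s) / 1024


def estimate_cost_usd(model: str, width: int, height: int, duration_s: float,
                      fps: int = ASSUMED_FPS) -> float:
    """Estimated price: tokens, scaled to millions, times the model's rate."""
    rate = RATE_PER_MILLION_TOKENS[model]
    return video_tokens(width, height, duration_s, fps) / 1_000_000 * rate


if __name__ == "__main__":
    # 5-second clips at the resolutions discussed in the article.
    print(f"v1, 1080p:            ${estimate_cost_usd('v1', 1920, 1080, 5):.2f}")
    print(f"1.5 Pro, 720p audio:  ${estimate_cost_usd('v1.5_audio', 1280, 720, 5):.2f}")
    print(f"1.5 Pro, 720p silent: ${estimate_cost_usd('v1.5_no_audio', 1280, 720, 5):.2f}")
```

Under these assumptions a 5-second 1080p v1 clip is 243,000 tokens (about $0.61, close to the quoted ~$0.62), and a 5-second 720p clip on 1.5 Pro is 108,000 tokens (about $0.26 with audio, about $0.13 without). The small gap on the v1 figure suggests the actual frame rate or rounding differs slightly from what is assumed here.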

The 480p option in Seedance 1.5 provides a budget-friendly entry point for:

Rapid concept testing and creative exploration
Social media content optimized for mobile viewing
High-volume generation scenarios with flexible resolution requirements
Development and testing phases before final production

Output Quality Characteristics

Seedance v1 produces videos with smooth motion, rich detail, and naturalistic color grading. The model maintains temporal coherence across frames, avoiding jittery or morphing artifacts. For image-to-video workflows, source image consistency remains high, with generated motion extending naturally from the starting frame.

Seedance 1.5 Pro maintains these visual quality standards while adding contextually appropriate audio. The synchronized audio generation produces spatially consistent sound that matches visual timing and scene characteristics across four categories: dialogue and speech with appropriate emotional tone, sound effects synchronized with visual elements, ambient environmental audio, and musical accompaniment when prompted.

Advanced Capabilities

Seedance 1.5 Pro extends beyond standard text-to-video generation with capabilities unavailable in v1.

Image-to-Video with Audio allows you to upload a start frame and optionally an end frame. Seedance 1.5 Pro generates the motion, camera movement, dialogue, and sound design in between.

Video Extension enables extending existing video clips while preserving motion continuity, subject identity, and scene coherence. Your prompt guides subsequent action with optional audio generation for extended segments. For additional video extension options, consider LTX Video-0.9.7 13B or Pixverse.

Use Case Recommendations

Choose Seedance v1 when:

maximum 1080p resolution is required
animating existing assets where source consistency matters
audio will be added separately in post-production

For alternative image-to-video approaches, Pixverse Image to Video offers comparable capabilities.

Choose Seedance 1.5 Pro when:

integrated audio matters for your application (social media, advertising, educational videos)
complex multi-dimensional prompts describe your creative vision with specific dialogue and sounds
you need video extension or end-frame conditioning for creative control unavailable in v1

Migration Considerations

Migrating from Seedance v1 to 1.5 requires updating the endpoint to fal-ai/bytedance/seedance/v1.5/pro/text-to-video. Existing authentication and client library code remains compatible. Review the fal documentation for implementation details.

Existing v1 prompts work unchanged with 1.5; adding audio descriptions (dialogue, ambient sound, music) is what unlocks the new capabilities. The generate_audio parameter defaults to true, so set it to false explicitly for video-only output matching v1 behavior. On resolution, 1.5 tops out at 720p, which may be adequate for mobile and social formats but is not a substitute where 1080p is a hard requirement.
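
The migration steps can be sketched as a small helper that takes the arguments of an existing v1 request and returns the 1.5 endpoint with adjusted arguments. The endpoint string and the generate_audio parameter come from the text above; the helper name, the shape of the v1 arguments, and the resolution key are assumptions for illustration.

```python
from typing import Any, Dict, Tuple

# Endpoint named in the migration notes above.
V15_TEXT_TO_VIDEO = "fal-ai/bytedance/seedance/v1.5/pro/text-to-video"


def migrate_v1_request(
    v1_arguments: Dict[str, Any],
    keep_video_only: bool = True,
) -> Tuple[str, Dict[str, Any]]:
    """Return (endpoint, arguments) for Seedance 1.5 Pro from v1-style arguments."""
    arguments = dict(v1_arguments)  # copy so the caller's dict is not mutated

    # generate_audio defaults to true on 1.5, so video-only output must opt out.
    arguments["generate_audio"] = not keep_video_only

    # 1.5 Pro tops out at 720p; the "resolution" key is an assumed name.
    if arguments.get("resolution") == "1080p":
        arguments["resolution"] = "720p"

    return V15_TEXT_TO_VIDEO, arguments


if __name__ == "__main__":
    endpoint, arguments = migrate_v1_request(
        {"prompt": "Courtroom scene, defense attorney giving closing argument",
         "resolution": "1080p"}
    )
    print(endpoint)
    print(arguments)
```

The returned pair can then be passed to whatever client call the existing v1 integration already uses, for example `fal_client.subscribe(endpoint, arguments=arguments)` with the Python client, assuming its current interface.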

Decision Framework

Does your output require 1080p resolution? If yes, Seedance v1 is your only option. If 720p suffices, move on to the next question.

Does synchronized audio add value? If your workflow includes professional audio production or the content is purely visual, v1 makes sense. If integrated audio saves time or enables new creative directions, 1.5 delivers clear advantages. For audio enhancement, DeepFilterNet 3 can clean up generated audio in post-processing.

Do you need video extension or end-frame conditioning? Seedance 1.5 Pro offers these capabilities while v1 does not.
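
The three questions above can be written as a short function that applies them in the same order. This is only an illustration of the framework as stated; the function name, argument names, and return strings are made up for the example.

```python
def choose_seedance_model(
    needs_1080p: bool,
    wants_synced_audio: bool,
    needs_extension_or_end_frame: bool,
) -> str:
    """Apply the decision framework's questions in order."""
    # Question 1: 1080p is only available on v1.
    if needs_1080p:
        return "Seedance v1"
    # Question 2: integrated, synchronized audio is only available on 1.5 Pro.
    if wants_synced_audio:
        return "Seedance 1.5 Pro"
    # Question 3: video extension and end-frame conditioning are 1.5 Pro only.
    if needs_extension_or_end_frame:
        return "Seedance 1.5 Pro"
    # Purely visual output with no 1.5-specific features.
    return "Seedance v1"


if __name__ == "__main__":
    print(choose_seedance_model(False, True, False))
    print(choose_seedance_model(True, True, True))
```

Because the 1080p question comes first, a project that needs both 1080p and video extension is still routed to v1, which mirrors the framework as written; such a project would have to give up one of the two requirements.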
