Vace: Professional Image-to-Video AI Generator

Wan Vace | [image-to-video]

Vace delivers controllable video-to-video transformation at $0.20 per video, combining source footage, masks, and reference images for precise edits. It trades flexibility for specificity: this is not a general text-to-video model but a surgical editing tool that maintains spatial consistency across 81-240 frames. It is built for creators who need frame-accurate control over existing footage rather than generation from scratch.

Use Cases: Video Reframing & Composition | Masked Region Editing | Multi-Reference Style Transfer

Performance

At $0.20 per video, Vace is a specialized editing endpoint rather than a volume generation tool. It costs 5-10x more than general video models, but it is purpose-built for controllable transformations where maintaining source fidelity matters.

Metric	Result	Context
Frame Range	81-240 frames	Configurable output length; 81 frames for reference-only, up to 241 with source video
Resolution	Up to 720p	480p, 580p, 720p options with 16:9 or 9:16 aspect ratios
Frame Rate	5-24 fps	Adjustable for different motion requirements
Cost per Video	$0.20	5 generations per $1.00 on fal
Inference Steps	2-40 steps	Default 30; higher values improve quality at the cost of speed
Related Endpoints	Long Reframe, Video Edit	Specialized variants for extended reframing and targeted editing workflows
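
The limits in the table can be checked before a request is sent. The following is a minimal sketch, not the endpoint's own validation: the `VaceSettings` class, its field names, and the default frame count and frame rate are assumptions made for illustration, while the numeric ranges and the default of 30 steps come from the table above.

```python
from dataclasses import dataclass

# Limits copied from the Performance table; adjust if the endpoint changes.
FRAME_RANGE = (81, 240)
FPS_RANGE = (5, 24)
STEP_RANGE = (2, 40)
RESOLUTIONS = ("480p", "580p", "720p")
ASPECT_RATIOS = ("16:9", "9:16")


@dataclass(frozen=True)
class VaceSettings:
    """Hypothetical settings holder; field names are illustrative, not the API's."""
    num_frames: int = 81            # illustrative default (lower bound of the range)
    frames_per_second: int = 24     # illustrative default (upper bound of the range)
    num_inference_steps: int = 30   # default stated in the table
    resolution: str = "720p"
    aspect_ratio: str = "16:9"


def validate(settings: VaceSettings) -> list[str]:
    """Return human-readable problems; an empty list means every value is in range."""
    problems = []
    ranged = (
        ("num_frames", settings.num_frames, FRAME_RANGE),
        ("frames_per_second", settings.frames_per_second, FPS_RANGE),
        ("num_inference_steps", settings.num_inference_steps, STEP_RANGE),
    )
    for name, value, (low, high) in ranged:
        if not low <= value <= high:
            problems.append(f"{name}={value} is outside {low}-{high}")
    if settings.resolution not in RESOLUTIONS:
        problems.append(f"resolution={settings.resolution!r} is not one of {RESOLUTIONS}")
    if settings.aspect_ratio not in ASPECT_RATIOS:
        problems.append(f"aspect_ratio={settings.aspect_ratio!r} is not one of {ASPECT_RATIOS}")
    return problems
```

Calling `validate(VaceSettings(num_frames=300))` returns a one-item list describing the out-of-range frame count, so a failed check costs nothing, whereas a rejected or wasted generation may cost $0.20.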

Precision Control Through Multi-Input Architecture

Vace breaks from standard text-to-video models by requiring explicit visual guidance through source video, mask definition, and optional reference images that work together to constrain the generation space. Where most video models interpret prompts loosely, Vace uses depth or inpainting tasks to preserve spatial relationships while transforming content.

What this means for you:

Surgical editing precision: Mask-based inpainting lets you replace specific regions (background, foreground elements) while maintaining untouched areas frame-by-frame, critical for professional editing workflows where partial regeneration beats full rewrites.
Multi-reference consistency: Multiple reference images guide style, character appearance, or environmental details across the entire sequence, solving the consistency problem that plagues standard video generation.
Depth-aware transformations: Depth task mode preserves spatial structure from source video while applying prompt-driven changes, maintaining camera motion and scene geometry through the edit.
Extended sequence support: 240-frame capability (10 seconds at 24 fps) handles longer clips than most video-to-video models, with frame rate control from 5-24 fps for motion tuning.
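
How the inputs described above might combine into a single request can be sketched as follows. This is an illustration under stated assumptions rather than the documented schema: the function name and every dictionary key (`prompt`, `task`, `video_url`, `mask_url`, `ref_image_urls`) are hypothetical and should be checked against the API documentation, and the rule that inpainting requires a mask is inferred from the description of mask-based inpainting.

```python
from typing import Optional, Sequence

# Task modes named in this page; the string spellings are an assumption.
TASKS = ("depth", "inpainting")


def build_vace_arguments(
    prompt: str,
    task: str,
    video_url: str,
    mask_url: Optional[str] = None,
    ref_image_urls: Sequence[str] = (),
) -> dict:
    """Assemble a request payload from source video, optional mask, and references.

    All key names are hypothetical placeholders for the real API fields.
    """
    if task not in TASKS:
        raise ValueError(f"task must be one of {TASKS}, got {task!r}")
    # Assumption: inpainting is mask-driven, so a mask must be supplied.
    if task == "inpainting" and mask_url is None:
        raise ValueError("inpainting needs a mask video or mask image URL")

    arguments = {"prompt": prompt, "task": task, "video_url": video_url}
    if mask_url is not None:
        arguments["mask_url"] = mask_url
    if ref_image_urls:
        # Copy into a list so the payload is JSON-serializable.
        arguments["ref_image_urls"] = list(ref_image_urls)
    return arguments
```

A depth-mode edit would pass only the prompt, task, and source video, while an inpainting edit adds the mask and, optionally, one or more reference images to hold style or character appearance steady across the sequence.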

Technical Specifications

Spec	Details
Architecture	Vace
Input Formats	Video (source), Video/Image (mask), Image (reference) via URL
Output Formats	MP4 video
Task Modes	Depth preservation, Inpainting
Max Duration	240 frames (10 seconds at 24fps)
License	Commercial use enabled
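
The duration and cost figures follow from simple arithmetic, sketched below. The helper names are made up for this example; the price, frame counts, and frame rates are the ones quoted on this page.

```python
PRICE_PER_VIDEO_USD = 0.20  # price quoted on this page


def clip_seconds(num_frames: int, fps: int) -> float:
    """Playback length of a clip: frames divided by frames per second."""
    return num_frames / fps


def videos_for_budget(budget_usd: float, price_usd: float = PRICE_PER_VIDEO_USD) -> int:
    """Whole videos a budget covers, computed in cents to avoid float rounding."""
    budget_cents = round(budget_usd * 100)
    price_cents = round(price_usd * 100)
    return budget_cents // price_cents


print(clip_seconds(240, 24))    # 10.0 seconds at the maximum frame rate
print(clip_seconds(240, 5))     # 48.0 seconds at the minimum frame rate
print(clip_seconds(81, 24))     # 3.375 seconds for the shortest clip
print(videos_for_budget(1.00))  # 5
```

The same 240 frames therefore play for 10 seconds at 24 fps or 48 seconds at 5 fps; the frame rate changes playback speed and perceived motion, not the number of generated frames.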


How It Stacks Up

MiniMax Video 01 Live – Vace trades generation speed for edit precision at 4x the cost ($0.20 vs $0.05 per video). MiniMax excels at text-to-video creation from scratch with faster turnaround, ideal for concept exploration and rapid iteration workflows.

sync.so Lipsync 1.9.0 – Vace handles full-frame video transformation with spatial control through masking, while sync.so specializes in facial animation synchronization at comparable pricing. Choose sync.so for dialogue-driven content, Vace for compositional edits and environmental changes.

Wan 2.1 VACE Long Reframe – The Long Reframe variant extends Vace's capabilities for aspect ratio transformations and extended sequences, sharing the same base architecture and $0.20 price. Long Reframe is optimized for recomposition workflows where maintaining subject framing across format changes matters more than content replacement.

fal-ai/wan-vace
