Vace Video to Video
Wan Vace | [image-to-video]
Vace delivers controllable video-to-video transformation at $0.20 per video, combining source video, masks, and reference images for precise edits. It trades flexibility for specificity: this is not a general text-to-video model but a surgical editing tool that maintains spatial consistency across 81-240 frames. It is built for creators who need frame-accurate control over existing footage rather than generation from scratch.
Use Cases: Video Reframing & Composition | Masked Region Editing | Multi-Reference Style Transfer
Performance
At $0.20 per video, Vace is a specialized editing endpoint rather than a volume generation tool. It costs 5-10x more than general video models but is purpose-built for controllable transformations where maintaining source fidelity matters.
| Metric | Result | Context |
|---|---|---|
| Frame Range | 81-240 frames | Configurable output length; 81 frames for reference-only, up to 241 with source video |
| Resolution | Up to 720p | 480p, 580p, 720p options with 16:9 or 9:16 aspect ratios |
| Frame Rate | 5-24 fps | Adjustable for different motion requirements |
| Cost per Video | $0.20 | 5 generations per $1.00 on fal |
| Inference Steps | 2-40 steps | Default 30; higher values improve quality at speed cost |
| Related Endpoints | Long Reframe, Video Edit | Specialized variants for extended reframing and targeted editing workflows |
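The frame-count and frame-rate ranges in the table determine clip length, and the flat price determines how far a budget goes. The following is a minimal sketch of that arithmetic using the figures from the table; the function names are illustrative and not part of any fal SDK.

```python
from fractions import Fraction

# $0.20 per video, taken from the table above.
COST_PER_VIDEO = Fraction(20, 100)

def clip_seconds(num_frames: int, fps: int) -> float:
    """Playback length in seconds for num_frames shown at fps."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return num_frames / fps

def videos_for_budget(budget_dollars: str) -> int:
    """Whole number of videos a budget covers at the flat per-video price.

    The budget is passed as a decimal string so the division is exact.
    """
    return int(Fraction(budget_dollars) // COST_PER_VIDEO)

print(clip_seconds(81, 24))       # 3.375
print(clip_seconds(240, 24))      # 10.0
print(clip_seconds(81, 5))        # 16.2
print(videos_for_budget("1.00"))  # 5
```

Lowering the frame rate stretches the same frame count over a longer playback time, which is why 81 frames can span anywhere from about 3.4 to 16.2 seconds.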
Precision Control Through Multi-Input Architecture
Vace breaks from standard text-to-video models by requiring explicit visual guidance through source video, mask definition, and optional reference images that work together to constrain the generation space. Where most video models interpret prompts loosely, Vace uses depth or inpainting tasks to preserve spatial relationships while transforming content.
What this means for you:
- Surgical editing precision: Mask-based inpainting lets you replace specific regions (background, foreground elements) while leaving untouched areas intact frame by frame. This is critical for professional editing workflows where partial regeneration beats full rewrites.
- Multi-reference consistency: Multiple reference images guide style, character appearance, or environmental details across the entire sequence, solving the consistency problem that plagues standard video generation.
- Depth-aware transformations: Depth task mode preserves spatial structure from the source video while applying prompt-driven changes, maintaining camera motion and scene geometry through the edit.
- Extended sequence support: 240-frame capability (10 seconds at 24 fps) handles longer clips than most video-to-video models, with frame rate control from 5 to 24 fps for motion tuning.
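Conceptually, masked editing keeps source pixels wherever the mask is off and takes newly generated pixels wherever it is on. The sketch below illustrates that compositing rule on tiny grayscale frames represented as nested lists. It is a toy illustration of the idea, not Vace's actual implementation; the function name and the convention that 1 marks the region to regenerate are assumptions.

```python
from typing import List

Frame = List[List[int]]  # rows of grayscale pixel values

def composite(source: Frame, generated: Frame, mask: Frame) -> Frame:
    """Take generated pixels where mask == 1, source pixels elsewhere.

    Assumed convention (not confirmed by the Vace docs): 1 = regenerate.
    """
    return [
        [g if m == 1 else s for s, g, m in zip(src_row, gen_row, mask_row)]
        for src_row, gen_row, mask_row in zip(source, generated, mask)
    ]

source    = [[10, 10, 10], [10, 10, 10]]
generated = [[99, 99, 99], [99, 99, 99]]
mask      = [[0, 1, 1], [0, 0, 1]]

edited = composite(source, generated, mask)
print(edited)  # [[10, 99, 99], [10, 10, 99]]
```

Applying the same rule to every frame of a clip is what keeps unmasked regions identical to the source throughout the sequence.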
Technical Specifications
| Spec | Details |
|---|---|
| Architecture | Vace |
| Input Formats | Video (source), Video/Image (mask), Image (reference) via URL |
| Output Formats | MP4 video |
| Task Modes | Depth preservation, Inpainting |
| Max Duration | 240 frames (10 seconds at 24fps) |
| License | Commercial use enabled |
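The limits listed in the tables on this page can be checked locally before a request is sent. The sketch below is one way to do that, under stated assumptions: every field name, the default frame rate of 16, and the `VaceRequest` class itself are hypothetical and chosen for illustration, not taken from the fal API schema. The upper frame bound uses the 240 figure from the Max Duration row.

```python
from dataclasses import dataclass, asdict

# Limits copied from the tables in this document.
FRAME_RANGE = (81, 240)
FPS_RANGE = (5, 24)
STEP_RANGE = (2, 40)
RESOLUTIONS = {"480p", "580p", "720p"}
ASPECT_RATIOS = {"16:9", "9:16"}
TASKS = {"depth", "inpainting"}

@dataclass
class VaceRequest:
    # All field names are hypothetical; check the real API schema before use.
    prompt: str
    video_url: str
    task: str = "depth"
    num_frames: int = 81
    frames_per_second: int = 16      # assumed default, not documented here
    num_inference_steps: int = 30    # default stated in the Performance table
    resolution: str = "720p"
    aspect_ratio: str = "16:9"

    def validate(self) -> dict:
        """Return the request as a dict, or raise ValueError on a bad value."""
        checks = [
            (self.task in TASKS, "task"),
            (FRAME_RANGE[0] <= self.num_frames <= FRAME_RANGE[1], "num_frames"),
            (FPS_RANGE[0] <= self.frames_per_second <= FPS_RANGE[1],
             "frames_per_second"),
            (STEP_RANGE[0] <= self.num_inference_steps <= STEP_RANGE[1],
             "num_inference_steps"),
            (self.resolution in RESOLUTIONS, "resolution"),
            (self.aspect_ratio in ASPECT_RATIOS, "aspect_ratio"),
        ]
        bad = [name for ok, name in checks if not ok]
        if bad:
            raise ValueError(f"invalid value for: {', '.join(bad)}")
        return asdict(self)

payload = VaceRequest(
    prompt="replace the background with a snowy forest",
    video_url="https://example.com/source.mp4",  # placeholder URL
    task="inpainting",
    num_frames=120,
).validate()
print(payload["num_frames"], payload["task"])  # 120 inpainting
```

Validating on the client side avoids paying for a request that the endpoint would reject or silently clamp.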
How It Stacks Up
MiniMax Video 01 Live – Vace trades generation speed for edit precision at 4x the cost ($0.20 vs $0.05 per video). MiniMax excels at text-to-video creation from scratch with faster turnaround, ideal for concept exploration and rapid iteration workflows.
sync.so Lipsync 1.9.0 – Vace handles full-frame video transformation with spatial control through masking, while sync.so specializes in facial animation synchronization at comparable pricing. Choose sync.so for dialogue-driven content, Vace for compositional edits and environmental changes.
Wan 2.1 VACE Long Reframe – The Long Reframe variant extends Vace's capabilities for aspect ratio transformations and extended sequences, sharing the same base architecture and $0.20 price. Long Reframe is optimized for recomposition workflows where maintaining subject framing across format changes matters more than content replacement.