Vace Video to Video

fal-ai/wan-vace
Vace is a video generation model that uses a source video, mask, and reference images to create prompted videos with controllable sources.

Inference | Commercial use

Wan Vace | [video-to-video]

Vace delivers controllable video-to-video transformation at $0.20 per video, combining source footage, masking, and reference images for precise edits. It trades flexibility for specificity: this isn't a general text-to-video model but a surgical editing tool that maintains spatial consistency across 81-240 frames. It's built for creators who need frame-accurate control over existing footage rather than generation from scratch.

Use Cases: Video Reframing & Composition | Masked Region Editing | Multi-Reference Style Transfer


Performance

At $0.20 per video, Vace is a specialized editing endpoint rather than a volume generation tool: it costs 5-10x more than general video models but is purpose-built for controllable transformations where maintaining source fidelity matters.

| Metric | Result | Context |
| --- | --- | --- |
| Frame Range | 81-240 frames | Configurable output length; 81 frames for reference-only, up to 240 with a source video |
| Resolution | Up to 720p | 480p, 580p, and 720p options with 16:9 or 9:16 aspect ratios |
| Frame Rate | 5-24 fps | Adjustable for different motion requirements |
| Cost per Video | $0.20 | 5 generations per $1.00 on fal |
| Inference Steps | 2-40 steps | Default 30; higher values improve quality at a speed cost |
| Related Endpoints | Long Reframe, Video Edit | Specialized variants for extended reframing and targeted editing workflows |
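As a quick sanity check on the numbers above, output duration is just the frame count divided by the frame rate, and the flat per-video fee means longer clips cost less per output second. A minimal sketch (the helper name and range checks are illustrative, not part of the API):

```python
def clip_duration_seconds(num_frames: int, fps: int) -> float:
    """Output length implied by a frame count and frame rate."""
    if not 81 <= num_frames <= 240:
        raise ValueError("Vace outputs 81-240 frames")
    if not 5 <= fps <= 24:
        raise ValueError("frame rate is adjustable from 5-24 fps")
    return num_frames / fps

COST_PER_VIDEO = 0.20  # flat fee per generation on fal

# Maximum-length clip: 240 frames at 24 fps is exactly 10 seconds.
longest = clip_duration_seconds(240, 24)            # 10.0 s
# Lowering the frame rate stretches the same frame budget over more time.
slow_rate = clip_duration_seconds(240, 5)           # 48.0 s
# Flat pricing: cost per output second falls as clips get longer.
cost_per_second = COST_PER_VIDEO / longest          # $0.02 per second
```

In other words, dropping from 24 fps to 5 fps turns the same 240-frame budget from a 10-second clip into a 48-second one, at the same $0.20 price.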

Precision Control Through Multi-Input Architecture

Vace breaks from standard text-to-video models by requiring explicit visual guidance through source video, mask definition, and optional reference images that work together to constrain the generation space. Where most video models interpret prompts loosely, Vace uses depth or inpainting tasks to preserve spatial relationships while transforming content.

What this means for you:

  • Surgical editing precision: Mask-based inpainting lets you replace specific regions (background, foreground elements) while maintaining untouched areas frame-by-frame, critical for professional editing workflows where partial regeneration beats full rewrites.

  • Multi-reference consistency: Multiple reference images guide style, character appearance, or environmental details across the entire sequence, solving the consistency problem that plagues standard video generation.

  • Depth-aware transformations: Depth task mode preserves spatial structure from source video while applying prompt-driven changes, maintaining camera motion and scene geometry through the edit.

  • Extended sequence support: 240-frame capability (10 seconds at 24fps) handles longer clips than most video-to-video models, with frame-rate control from 5-24fps for motion tuning.


Technical Specifications

| Spec | Details |
| --- | --- |
| Architecture | Vace |
| Input Formats | Video (source), Video/Image (mask), Image (reference) via URL |
| Output Formats | MP4 video |
| Task Modes | Depth preservation, Inpainting |
| Max Duration | 240 frames (10 seconds at 24fps) |
| License | Commercial use enabled |

API Documentation | Quickstart Guide | Enterprise Pricing
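A minimal invocation sketch using fal's Python client (`pip install fal-client`). The `subscribe` call is real, but the argument names and the response shape below are assumptions drawn from this page rather than the verified schema:

```python
# Assumed parameter names; confirm against the endpoint's API documentation.
ARGUMENTS = {
    "prompt": "swap the background for a rainy neon street",
    "video_url": "https://example.com/source.mp4",
    "mask_video_url": "https://example.com/mask.mp4",
    "task": "inpainting",       # or "depth" to preserve scene geometry
    "num_frames": 81,           # 81-240 supported
    "frames_per_second": 16,    # adjustable 5-24 fps
    "resolution": "720p",       # 480p, 580p, or 720p
    "num_inference_steps": 30,  # default 30; 2-40 supported
}

if __name__ == "__main__":
    import fal_client  # requires a FAL_KEY in the environment

    result = fal_client.subscribe("fal-ai/wan-vace", arguments=ARGUMENTS)
    print(result["video"]["url"])  # assumed response shape: MP4 URL
```

Each run of this request costs the flat $0.20 fee regardless of the frame count or resolution chosen.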


How It Stacks Up

MiniMax Video 01 Live – Vace trades generation speed for edit precision at 4x the cost ($0.20 vs $0.05 per video). MiniMax excels at text-to-video creation from scratch with faster turnaround, ideal for concept exploration and rapid iteration workflows.

sync.so Lipsync 1.9.0 – Vace handles full-frame video transformation with spatial control through masking, while sync.so specializes in facial animation synchronization at comparable pricing. Choose sync.so for dialogue-driven content, Vace for compositional edits and environmental changes.

Wan 2.1 VACE Long Reframe – The Long Reframe variant extends Vace's capabilities for aspect ratio transformations and extended sequences, sharing the same $0.20 base architecture. Long Reframe optimizes specifically for recomposition workflows where maintaining subject framing across format changes matters more than content replacement.