
fal-ai/vidu/q2/reference-to-video

Use the latest Vidu Q2 models, which offer much better quality and control over your videos.

The model costs $0.10 per 360p video, $0.20 per 520p video, and $0.30 per 720p video. A 1080p video costs $0.20 plus $0.10 per second of video.

Vidu Q2 | [image-to-video]

Vidu Q2 delivers reference-consistent video generation from up to 7 input images, priced at $0.10-$0.30 per video for 360p to 720p output (1080p is billed at $0.20 plus $0.10 per second). Trading single-image generation for multi-reference character consistency, the model maintains subject appearance across complex motion sequences. It is built for creators who need character continuity without manual keyframing.

Use Cases: Character Animation | Product Demos | Marketing Content


Performance

At $0.10 per 360p video (increasing with resolution and, at 1080p, with duration), Vidu Q2 is positioned as a cost-effective multi-reference video generator, claimed to be 10x more economical than frame-by-frame editing workflows.

Metric | Result | Context
Reference Images | Up to 7 images | Multi-angle subject consistency vs single-image competitors
Resolution Range | 360p to 1080p | Variable pricing: $0.10 (360p), $0.20 (520p), $0.30 (720p), $0.20 + $0.10/sec (1080p)
Duration Options | 1-8 seconds | Configurable in 1-second increments
Cost per Video | $0.10-$0.30+ | Base cost varies by resolution; 1080p adds $0.10 per second
Aspect Ratios | 16:9, 9:16, 1:1 | Social-optimized formats included
Related Endpoints | Vidu Q1 Reference, Vidu Q2 Pro | Q1 predecessor and Pro variant for enhanced quality
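
To make the pricing above concrete, here is a minimal Python sketch of a cost estimator. It only encodes the figures quoted on this page; the function name `estimate_cost` and the constant names are illustrative and not part of any fal API, and the 1-8 second bounds come from the duration row of the table.

```python
from decimal import Decimal

# Flat per-video prices quoted on this page (USD).
FLAT_PRICES = {
    "360p": Decimal("0.10"),
    "520p": Decimal("0.20"),
    "720p": Decimal("0.30"),
}
# 1080p is quoted as a base fee plus a per-second fee.
BASE_1080P = Decimal("0.20")
PER_SECOND_1080P = Decimal("0.10")


def estimate_cost(resolution: str, duration_seconds: int) -> Decimal:
    """Estimate the cost of one video in USD (hypothetical helper)."""
    if not 1 <= duration_seconds <= 8:
        raise ValueError("duration must be between 1 and 8 seconds")
    if resolution in FLAT_PRICES:
        # Flat price: duration does not change the cost.
        return FLAT_PRICES[resolution]
    if resolution == "1080p":
        return BASE_1080P + PER_SECOND_1080P * duration_seconds
    raise ValueError(f"unsupported resolution: {resolution}")


if __name__ == "__main__":
    for res, secs in [("360p", 8), ("720p", 4), ("1080p", 4), ("1080p", 8)]:
        print(res, secs, "s ->", estimate_cost(res, secs), "USD")
```

Under these figures a 1080p video is always more expensive than a 720p one, since even a 1-second clip costs $0.20 + $0.10 = $0.30 and every extra second adds $0.10. `Decimal` is used so that the sums stay exact.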

Multi-Reference Subject Consistency

Unlike single-image video models, Vidu Q2 processes up to 7 reference images simultaneously to maintain character or product appearance across generated motion. The architecture extracts consistent visual features from multiple angles before synthesis.

What this means for you:

  • Character Continuity: Reference the same subject from front, side, and back views. The model maintains identity through 180° camera moves without manual intervention

  • Controllable Motion Amplitude: Specify "small," "medium," or "large" movement ranges to match scene requirements, from subtle product rotations to full character actions

  • Production-Ready Formats: Output 16:9 (landscape), 9:16 (vertical social), or 1:1 (square) aspect ratios without post-production cropping

  • Optional Audio Enhancement: 4-second videos support background music generation for social media deliverables


Technical Specifications

Spec | Details
Architecture | Vidu Q2 Reference-to-Video
Input Formats | Image URLs (up to 7), text prompts (max 3000 characters)
Output Formats | MP4 video
Resolution Options | 360p, 520p, 720p, 1080p
License | Commercial use permitted via fal partnership
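
The limits listed on this page can be checked client-side before a request is sent. The sketch below is one way to do that in Python. Every field name in the returned dictionary (`prompt`, `reference_image_urls`, `aspect_ratio`, `resolution`, `duration`, `movement_amplitude`, `bgm`) is a hypothetical placeholder, not the endpoint's confirmed schema, so check the API documentation for the real parameter names. The background-music rule assumes music is available only for 4-second videos, which is how the bullet above reads.

```python
# Limits taken from this page; all names are illustrative assumptions.
MAX_REFERENCE_IMAGES = 7
MAX_PROMPT_CHARS = 3000
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16", "1:1"}
ALLOWED_RESOLUTIONS = {"360p", "520p", "720p", "1080p"}
ALLOWED_AMPLITUDES = {"small", "medium", "large"}


def build_arguments(prompt, image_urls, *, aspect_ratio="16:9",
                    resolution="720p", duration=4,
                    movement_amplitude="medium", background_music=False):
    """Validate inputs and return a request dict (hypothetical schema)."""
    if not 1 <= len(image_urls) <= MAX_REFERENCE_IMAGES:
        raise ValueError("provide between 1 and 7 reference images")
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError("prompt exceeds 3000 characters")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {aspect_ratio}")
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution}")
    if not 1 <= duration <= 8:
        raise ValueError("duration must be between 1 and 8 seconds")
    if movement_amplitude not in ALLOWED_AMPLITUDES:
        raise ValueError(f"unsupported amplitude: {movement_amplitude}")
    # Assumption: background music applies to 4-second videos only.
    if background_music and duration != 4:
        raise ValueError("background music assumed to need a 4-second video")
    return {
        "prompt": prompt,
        "reference_image_urls": list(image_urls),
        "aspect_ratio": aspect_ratio,
        "resolution": resolution,
        "duration": duration,
        "movement_amplitude": movement_amplitude,
        "bgm": background_music,
    }


if __name__ == "__main__":
    args = build_arguments(
        "The character turns and waves at the camera",
        ["https://example.com/front.png", "https://example.com/side.png"],
        aspect_ratio="9:16",
        movement_amplitude="small",
    )
    print(args)
```

Once the field names have been mapped to the real schema, the resulting dictionary could be passed as the request arguments for `fal-ai/vidu/q2/reference-to-video`. Validating locally avoids paying for a round trip on inputs the page already rules out.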



How It Stacks Up

Vidu Q1 Reference-to-Video ($0.10-$0.30 vs $0.10-$0.30) – Vidu Q2 delivers enhanced quality and control over the Q1 predecessor at identical pricing. Q1 remains available for workflows prioritizing established generation patterns over the Q2 architecture improvements.

Vidu Q2 Pro ($0.20-$0.40+ vs $0.10-$0.30) – Vidu Q2 Reference-to-Video trades maximum quality for 2x cost efficiency at base resolutions. Q2 Pro prioritizes visual fidelity and motion smoothness for high-end production deliverables.

Fabric 1.0 Image to Video – Vidu Q2 emphasizes multi-reference consistency through up to 7 input images, while Fabric 1.0 focuses on single-image animation. The multi-reference approach suits character-driven content requiring appearance stability across complex motion.

MuseTalk Image to Video – Vidu Q2 handles full-body character animation and product visualization, while MuseTalk specializes in facial animation and lip-sync for talking head content. Choose based on whether your workflow needs character motion or dialogue synchronization.