fal-ai/vidu/q2/reference-to-video
The model costs $0.10 per 360p video, $0.20 per 520p video, and $0.30 per 720p video. A 1080p video costs $0.20 plus $0.10 per second of video.
Vidu Q2 | [image-to-video]
Vidu Q2 delivers reference-consistent video generation from up to 7 input images, priced at $0.10-$0.30 per video for 360p to 720p output (1080p is billed at $0.20 plus $0.10 per second). Trading single-image generation for multi-reference character consistency, the model maintains subject appearance across complex motion sequences. It is built for creators who need character continuity without manual keyframing.
Use Cases: Character Animation | Product Demos | Marketing Content
Performance
At $0.10 per 360p video (rising with resolution and, at 1080p, with duration), Vidu Q2 is positioned as a cost-effective multi-reference video generator, described as 10x more economical than frame-by-frame editing workflows.
| Metric | Result | Context |
|---|---|---|
| Reference Images | Up to 7 images | Multi-angle subject consistency vs single-image competitors |
| Resolution Range | 360p to 1080p | Variable pricing: $0.10 (360p), $0.20 (520p), $0.30 (720p), $0.20 + $0.10/sec (1080p) |
| Duration Options | 1-8 seconds | Configurable in 1-second increments |
| Cost per Video | $0.10-$0.30+ | Base cost varies by resolution; 1080p adds $0.10 per second |
| Aspect Ratios | 16:9, 9:16, 1:1 | Social-optimized formats included |
| Related Endpoints | Vidu Q1 Reference, Vidu Q2 Pro | Q1 predecessor and Pro variant for enhanced quality |
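
The pricing rules in the table above can be turned into a small cost estimator. The following is a minimal sketch, not an official billing formula: the function name `estimate_cost_cents` and the constants are hypothetical, and it assumes the 1080p per-second charge applies to every second of the requested duration. Amounts are kept in integer cents to avoid floating-point rounding.

```python
# Prices in US cents, taken from the pricing note on this page.
# All names below are illustrative, not part of any fal API.
FLAT_PRICE_CENTS = {"360p": 10, "520p": 20, "720p": 30}
BASE_1080P_CENTS = 20
PER_SECOND_1080P_CENTS = 10


def estimate_cost_cents(resolution: str, duration_seconds: int) -> int:
    """Estimate the cost of one video in US cents."""
    # The page lists durations of 1-8 seconds in 1-second increments.
    if not 1 <= duration_seconds <= 8:
        raise ValueError("duration must be between 1 and 8 seconds")
    # 360p, 520p, and 720p are listed as flat per-video prices.
    if resolution in FLAT_PRICE_CENTS:
        return FLAT_PRICE_CENTS[resolution]
    # 1080p is a base fee plus a per-second charge (assumed to cover
    # every second of the clip).
    if resolution == "1080p":
        return BASE_1080P_CENTS + PER_SECOND_1080P_CENTS * duration_seconds
    raise ValueError(f"unsupported resolution: {resolution}")


if __name__ == "__main__":
    for resolution, seconds in [("360p", 4), ("720p", 8), ("1080p", 5)]:
        cents = estimate_cost_cents(resolution, seconds)
        print(f"{resolution}, {seconds}s: ${cents / 100:.2f}")
```

Under these assumptions, a 5-second 1080p video comes to $0.70, more than double the flat 720p price, so resolution choice matters most for longer clips.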
Multi-Reference Subject Consistency
Unlike single-image video models, Vidu Q2 processes up to 7 reference images simultaneously to maintain character or product appearance across generated motion. The architecture extracts consistent visual features from multiple angles before synthesis.
What this means for you:
- Character Continuity: Reference the same subject from front, side, and back views. The model maintains identity through 180° camera moves without manual intervention.
- Controllable Motion Amplitude: Specify "small," "medium," or "large" movement ranges to match scene requirements, from subtle product rotations to full character actions.
- Production-Ready Formats: Output 16:9 (landscape), 9:16 (vertical social), or 1:1 (square) aspect ratios without post-production cropping.
- Optional Audio Enhancement: 4-second videos support background music generation for social media deliverables.
Technical Specifications
| Spec | Details |
|---|---|
| Architecture | Vidu Q2 Reference-to-Video |
| Input Formats | Image URLs (up to 7), text prompts (max 3000 characters) |
| Output Formats | MP4 video |
| Resolution Options | 360p, 520p, 720p, 1080p |
| License | Commercial use permitted via fal partnership |
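
The limits listed on this page (up to 7 reference images, prompts of at most 3000 characters, 1-8 second durations, three aspect ratios, three motion amplitudes, background music on 4-second videos) can be checked client-side before a request is sent. The sketch below only builds and validates a plain dictionary. The field names (`reference_image_urls`, `movement_amplitude`, `bgm`, and so on) are assumptions chosen for illustration and may not match the actual API schema, so check the endpoint's API documentation before relying on them.

```python
from typing import Any

# Limits as stated on this page; field names below are hypothetical.
MAX_REFERENCE_IMAGES = 7
MAX_PROMPT_CHARS = 3000
RESOLUTIONS = {"360p", "520p", "720p", "1080p"}
ASPECT_RATIOS = {"16:9", "9:16", "1:1"}
MOVEMENT_AMPLITUDES = {"small", "medium", "large"}


def build_request(
    prompt: str,
    reference_image_urls: list[str],
    resolution: str = "720p",
    aspect_ratio: str = "16:9",
    duration: int = 4,
    movement_amplitude: str = "medium",
    bgm: bool = False,
) -> dict[str, Any]:
    """Validate inputs against the documented limits and return a payload."""
    if not 1 <= len(reference_image_urls) <= MAX_REFERENCE_IMAGES:
        raise ValueError("provide between 1 and 7 reference images")
    if not prompt or len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError("prompt must be 1 to 3000 characters long")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution}")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {aspect_ratio}")
    if not 1 <= duration <= 8:
        raise ValueError("duration must be between 1 and 8 seconds")
    if movement_amplitude not in MOVEMENT_AMPLITUDES:
        raise ValueError(f"unsupported movement amplitude: {movement_amplitude}")
    # The page states that background music is supported on 4-second videos.
    if bgm and duration != 4:
        raise ValueError("background music requires a 4-second video")
    return {
        "prompt": prompt,
        "reference_image_urls": list(reference_image_urls),
        "resolution": resolution,
        "aspect_ratio": aspect_ratio,
        "duration": duration,
        "movement_amplitude": movement_amplitude,
        "bgm": bgm,
    }


if __name__ == "__main__":
    payload = build_request(
        prompt="The character turns toward the camera and waves",
        reference_image_urls=[
            "https://example.com/front.png",  # placeholder URLs
            "https://example.com/side.png",
        ],
        aspect_ratio="9:16",
        bgm=True,
    )
    print(payload)
```

Validating locally keeps obviously invalid requests from being submitted; the resulting dictionary would then be passed as the request arguments to whichever client you use to call the `fal-ai/vidu/q2/reference-to-video` endpoint.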
How It Stacks Up
Vidu Q1 Reference-to-Video ($0.10-$0.30 vs $0.10-$0.30) – Vidu Q2 delivers enhanced quality and control over the Q1 predecessor at identical pricing. Q1 remains available for workflows prioritizing established generation patterns over the Q2 architecture improvements.
Vidu Q2 Pro ($0.20-$0.40+ vs $0.10-$0.30) – Vidu Q2 Reference-to-Video trades maximum quality for 2x cost efficiency at base resolutions. Q2 Pro prioritizes visual fidelity and motion smoothness for high-end production deliverables.
Fabric 1.0 Image to Video – Vidu Q2 emphasizes multi-reference consistency through up to 7 input images, while Fabric 1.0 focuses on single-image animation. The multi-reference approach suits character-driven content requiring appearance stability across complex motion.
MuseTalk Image to Video – Vidu Q2 handles full-body character animation and product visualization, while MuseTalk specializes in facial animation and lip-sync for talking head content. Choose based on whether your workflow needs character motion or dialogue synchronization.


