Vidu Image to Image
Vidu Q2 | [image-to-image]
Vidu's Q2 reference-to-image model processes up to 3 reference images simultaneously for character consistency at $0.10 per generation. Trading single-image workflows for multi-reference processing, it maintains subject appearance across variations while handling flexible scene composition. Built for character design iteration, product visualization with brand consistency, and creative workflows requiring visual continuity.
Use Cases: Character Design | Product Visualization | Brand Asset Creation
Performance That Scales
At $0.10 per image (10 generations per $1.00 on fal), Vidu trades speed for multi-reference processing capability. That is approximately 2.5x the cost of single-image workflows, but it eliminates manual compositing time.
| Metric | Result | Context |
|---|---|---|
| Reference Images | Up to 3 simultaneous | Maintains subject consistency across multiple source photos |
| Prompt Length | 1,500 characters max | Extended context for detailed scene descriptions |
| Cost per Image | $0.10 | 10 generations per $1.00 on fal |
| Aspect Ratios | 16:9, 9:16, 1:1 | Native support without post-processing |
| Related Endpoints | Vidu Image to Image | Standard single-reference variant for simpler workflows |
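The pricing figures above can be checked with a few lines of arithmetic. This is a minimal sketch: the two prices are the ones quoted on this page ($0.10 here, $0.039 for the standard Vidu Image to Image endpoint), while the helper names `generations_for_budget` and `cost_ratio` are hypothetical and exist only for illustration.

```python
from decimal import Decimal

# Prices as quoted on this page (USD per generated image).
PRICE_MULTI_REF = Decimal("0.10")    # this endpoint
PRICE_SINGLE_REF = Decimal("0.039")  # standard Vidu Image to Image


def generations_for_budget(budget_usd: Decimal, price_per_image: Decimal) -> int:
    """Whole number of generations a budget covers at a given per-image price."""
    return int(budget_usd // price_per_image)


def cost_ratio(price_a: Decimal, price_b: Decimal) -> Decimal:
    """How many times more expensive price_a is than price_b, to two decimals."""
    return (price_a / price_b).quantize(Decimal("0.01"))


print(generations_for_budget(Decimal("1.00"), PRICE_MULTI_REF))   # 10
print(generations_for_budget(Decimal("1.00"), PRICE_SINGLE_REF))  # 25
print(cost_ratio(PRICE_MULTI_REF, PRICE_SINGLE_REF))              # 2.56
```

The ratio works out to about 2.56, which is where the "approximately 2.5x" figure comes from.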
Multi-Reference Consistency Without Manual Editing
Standard image generation models process single prompts or images, forcing designers to manually composite results or accept inconsistent outputs. Vidu's reference-to-image architecture accepts up to 3 reference images simultaneously, combining them with natural language prompts to generate new images while preserving subject characteristics.
What this means for you:
- Character consistency: Generate multiple scenes with the same character without manual editing or style transfer workflows
- Flexible aspect ratios: Output in 16:9, 9:16, or 1:1 formats directly from the API, eliminating post-generation cropping
- Prompt-driven variation: Use up to 1,500 characters to describe scene changes while maintaining reference subject appearance
- Reproducible output: Set a seed parameter to repeat a generation and iterate from a known result
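The limits listed above (up to 3 reference images, 1,500 prompt characters, three aspect ratios, an optional seed) can be enforced client-side before a request is sent. The sketch below only builds and validates a payload; it makes no network call. The field names `prompt`, `reference_image_urls`, `aspect_ratio`, and `seed` are assumptions chosen for illustration and may not match the actual API schema, so check the API documentation before relying on them.

```python
from typing import Optional, Sequence

# Limits taken from this page.
MAX_REFERENCE_IMAGES = 3
MAX_PROMPT_CHARS = 1500
ASPECT_RATIOS = ("16:9", "9:16", "1:1")


def build_request(
    prompt: str,
    reference_image_urls: Sequence[str],
    aspect_ratio: str = "16:9",
    seed: Optional[int] = None,
) -> dict:
    """Validate inputs against the documented limits and return a payload dict.

    The key names in the returned dict are hypothetical, not confirmed API fields.
    """
    if not 1 <= len(reference_image_urls) <= MAX_REFERENCE_IMAGES:
        raise ValueError(f"expected 1 to {MAX_REFERENCE_IMAGES} reference images")
    if not prompt or len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(f"prompt must be 1 to {MAX_PROMPT_CHARS} characters")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {ASPECT_RATIOS}")

    payload = {
        "prompt": prompt,
        "reference_image_urls": list(reference_image_urls),
        "aspect_ratio": aspect_ratio,
    }
    if seed is not None:
        # Reusing the same seed is what makes an iteration repeatable.
        payload["seed"] = seed
    return payload


request = build_request(
    prompt="The same character standing on a rainy street at night",
    reference_image_urls=[
        "https://example.com/front.png",  # placeholder URLs
        "https://example.com/side.png",
    ],
    aspect_ratio="9:16",
    seed=42,
)
```

Keeping the seed and reference images fixed while changing only the prompt is one way to iterate on scenes for the same character.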
Technical Specifications
| Spec | Details |
|---|---|
| Architecture | Vidu Reference-to-Image |
| Input Formats | Multiple image URLs + text prompt |
| Output Formats | PNG, WebP, JPG |
| Max Prompt Length | 1,500 characters |
| License | Commercial use via fal partnership |
How It Stacks Up
Vidu Image to Image ($0.039) – Vidu [image-to-image] processes up to 3 reference images simultaneously for character consistency at roughly 2.5x the cost ($0.10 vs $0.039). The standard Vidu endpoint prioritizes speed and cost efficiency for single-reference workflows where multi-image consistency isn't required.
FASHN Virtual Try-On V1.5 ($0.05) – Vidu [image-to-image] handles general creative workflows with flexible subject types and scene composition. FASHN costs half as much ($0.05 vs $0.10) and specializes in garment visualization on human models, with precise clothing fit and drape simulation for e-commerce applications.



