Vidu Image to Image

fal-ai/vidu/q2/reference-to-image
Vidu Reference-to-Image creates images by combining reference images with a prompt.
Inference
Commercial use
Partner


Your request will cost $0.1 per image.


Vidu Q2 | [image-to-image]

Vidu's Q2 reference-to-image model processes up to 3 reference images simultaneously for character consistency at $0.10 per generation. Trading single-image workflows for multi-reference processing, it maintains subject appearance across variations while handling flexible scene composition. Built for character design iteration, product visualization with brand consistency, and creative workflows requiring visual continuity.

Use Cases: Character Design | Product Visualization | Brand Asset Creation


Performance That Scales

At $0.10 per image (10 generations per $1.00 on fal), Vidu trades speed for multi-reference processing capability. That is approximately 2.5x the cost of single-image workflows, but it eliminates manual compositing time.

Metric | Result | Context
Reference Images | Up to 3 simultaneous | Maintains subject consistency across multiple source photos
Prompt Length | 1,500 characters max | Extended context for detailed scene descriptions
Cost per Image | $0.10 | 10 generations per $1.00 on fal
Aspect Ratios | 16:9, 9:16, 1:1 | Native support without post-processing
Related Endpoints | Vidu Image to Image | Standard single-reference variant for simpler workflows
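
The pricing figures above can be checked with a short calculation. This is a minimal sketch that uses only the two per-image prices quoted on this page ($0.10 for this endpoint, $0.039 for the single-reference Vidu Image to Image endpoint); the function names are illustrative and not part of any fal API.

```python
from decimal import Decimal

# Per-image prices in USD, as quoted on this page.
VIDU_REFERENCE_PRICE = Decimal("0.10")      # reference-to-image (this endpoint)
VIDU_SINGLE_IMAGE_PRICE = Decimal("0.039")  # single-reference Vidu Image to Image


def images_per_budget(budget_usd: Decimal, price_per_image: Decimal) -> int:
    """Whole number of images a budget covers at a flat per-image price."""
    return int(budget_usd // price_per_image)


def cost_ratio(price_a: Decimal, price_b: Decimal) -> Decimal:
    """How many times more expensive price_a is than price_b, to 2 decimals."""
    return (price_a / price_b).quantize(Decimal("0.01"))


print(images_per_budget(Decimal("1.00"), VIDU_REFERENCE_PRICE))   # 10
print(cost_ratio(VIDU_REFERENCE_PRICE, VIDU_SINGLE_IMAGE_PRICE))  # 2.56
```

Decimal arithmetic is used so that the per-dollar count is not thrown off by binary floating-point rounding. The ratio of 2.56 is what the page rounds to "approximately 2.5x".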

Multi-Reference Consistency Without Manual Editing

Standard image generation models process single prompts or images, forcing designers to manually composite results or accept inconsistent outputs. Vidu's reference-to-image architecture accepts up to 3 reference images simultaneously, combining them with natural language prompts to generate new images while preserving subject characteristics.

What this means for you:

  • Character consistency: Generate multiple scenes with the same character without manual editing or style transfer workflows
  • Flexible aspect ratios: Output in 16:9, 9:16, or 1:1 formats directly from the API, eliminating post-generation cropping
  • Prompt-driven variation: Use up to 1,500 characters to describe scene changes while maintaining reference subject appearance
  • Deterministic output: Control generation results with seed parameters for reproducible iterations
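
The limits listed above (up to 3 reference images, 1,500 prompt characters, three aspect ratios, optional seed) can be enforced client-side before a request is sent. The sketch below only builds and validates a request dictionary. The field names `prompt`, `reference_image_urls`, `aspect_ratio`, and `seed` are assumptions for illustration and should be checked against the endpoint's API documentation; the example URLs are placeholders.

```python
from typing import Optional, Sequence

# Limits stated on this page.
MAX_REFERENCE_IMAGES = 3
MAX_PROMPT_CHARS = 1500
ASPECT_RATIOS = ("16:9", "9:16", "1:1")


def build_request(
    prompt: str,
    reference_image_urls: Sequence[str],
    aspect_ratio: str = "16:9",
    seed: Optional[int] = None,
) -> dict:
    """Validate inputs against the documented limits and return a request dict.

    The dictionary keys are assumed field names, not confirmed by this page.
    """
    if not 1 <= len(reference_image_urls) <= MAX_REFERENCE_IMAGES:
        raise ValueError(f"expected 1 to {MAX_REFERENCE_IMAGES} reference images")
    if not prompt or len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(f"prompt must be 1 to {MAX_PROMPT_CHARS} characters")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {ASPECT_RATIOS}")

    arguments = {
        "prompt": prompt,
        "reference_image_urls": list(reference_image_urls),
        "aspect_ratio": aspect_ratio,
    }
    if seed is not None:
        # Reusing the same seed is what makes iterations reproducible.
        arguments["seed"] = seed
    return arguments


arguments = build_request(
    prompt="The same character standing in a rainy neon-lit street",
    reference_image_urls=[
        "https://example.com/character-front.png",  # placeholder URL
        "https://example.com/character-side.png",   # placeholder URL
    ],
    aspect_ratio="9:16",
    seed=42,
)
print(arguments)

# Sending the request would look roughly like this with the fal Python client
# (requires the fal-client package and a FAL_KEY; not executed here):
#   import fal_client
#   result = fal_client.subscribe(
#       "fal-ai/vidu/q2/reference-to-image", arguments=arguments
#   )
```

Validating locally avoids spending a request on input the endpoint would likely reject, and keeping the seed optional lets the same helper serve both exploratory and reproducible runs.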

Technical Specifications

Spec | Details
Architecture | Vidu Reference-to-Image
Input Formats | Multiple image URLs + text prompt
Output Formats | PNG, WebP, JPG
Max Prompt Length | 1,500 characters
License | Commercial use via fal partnership



How It Stacks Up

Vidu Image to Image ($0.039) – Vidu Q2 [image-to-image] processes up to 3 reference images simultaneously for character consistency at roughly 2.5x the cost ($0.10 vs $0.039). The standard Vidu endpoint prioritizes speed and cost efficiency for single-reference workflows where multi-image consistency isn't required.

FASHN Virtual Try-On V1.5 ($0.05) – Vidu Q2 [image-to-image] handles general creative workflows with flexible subject types and scene composition. FASHN, at half the cost ($0.05 vs $0.10), specializes in garment visualization on human models, with precise clothing fit and drape simulation for e-commerce applications.