Wan 2.6 delivers enhanced multi-modal video generation capabilities with three distinct pathways optimized for different use cases, from text-to-video creation to reference-based character consistency.
What Changed in Wan 2.6
The Wan 2.6 model on fal introduces multi-modal video generation with three specialized endpoints: text-to-video (T2V), reference-to-video (R2V), and image-to-video (I2V). Each pathway handles distinct production requirements, from generating videos from text descriptions to maintaining character consistency across multiple shots. The model supports resolutions up to 1080p, durations from 5 to 15 seconds depending on the mode, and native audio integration.
This guide covers implementation patterns, API specifications, and production considerations for developers building video generation into applications.
Core Capabilities and Constraints
Wan 2.6 provides three generation pathways with different technical specifications:
Text-to-Video (T2V)
Creates video from text prompts with the following parameters:
- Resolutions: 720p, 1080p (no 480p support)
- Aspect Ratios: 16:9, 9:16, 1:1, 4:3, 3:4
- Duration: 5, 10, or 15 seconds
- Multi-shot: Intelligent scene segmentation for narrative content
- Audio: Optional background audio integration
T2V works for storyboarding, concept visualization, and creating video content from scratch. The multi-shot capability segments longer prompts into coherent scenes rather than generating a single continuous shot.
Reference-to-Video (R2V)
Maintains subject consistency from reference videos:
- Resolutions: 720p, 1080p
- Aspect Ratios: 16:9, 9:16, 1:1, 4:3, 3:4
- Duration: 5 or 10 seconds (15 seconds not supported)
- References: 1 to 3 videos, tagged as @Video1, @Video2, @Video3
R2V preserves visual characteristics of people, animals, or objects across generated videos. This enables character continuity in multi-video projects but requires clean, well-lit reference footage.
Image-to-Video (I2V)
Animates static images into video sequences:
- Resolutions: 480p, 720p, 1080p
- Duration: 5, 10, or 15 seconds
- Image Requirements: 360px minimum dimensions, 2000px maximum, 100MB file size limit
- Audio: Optional background audio
- Multi-shot: Defaults to false (single continuous shot)
I2V animates product showcases, illustrations, or existing imagery. Source image quality directly impacts output fidelity; higher-resolution source images are recommended for production use.
API Setup and Authentication
Install the fal client library:
pip install fal-client
Initialize with your API key:
import fal_client
fal = fal_client.SyncClient(key="your_api_key")
All three endpoints require authentication via the fal API key in request headers. See the fal.ai documentation for complete authentication details.
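The key can also be supplied through the FAL_KEY environment variable, which fal_client reads automatically when no explicit key is passed; this keeps credentials out of source code. A minimal sketch using the module-level helpers (the prompt is a placeholder):
import os
import fal_client

# fal_client resolves the key from FAL_KEY when no explicit key is given;
# in practice, export FAL_KEY in the shell rather than setting it in code
os.environ["FAL_KEY"] = "your_api_key"

result = fal_client.subscribe("wan/v2.6/text-to-video", {
    "prompt": "A quiet forest at dawn",
    "duration": "5"
})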
Text-to-Video Implementation
Basic T2V request structure:
result = fal.subscribe("wan/v2.6/text-to-video", {
"prompt": "A serene mountain landscape with flowing rivers and changing seasons",
"duration": "10",
"resolution": "1080p",
"aspect_ratio": "16:9"
})
video_url = result["video"]["url"]
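The response returns a hosted URL for the rendered file. A small follow-up sketch (assuming the requests package is installed) downloads it for local storage:
import requests

# Fetch the generated video from the URL returned above and save it locally
response = requests.get(video_url, timeout=120)
response.raise_for_status()
with open("generated_video.mp4", "wb") as f:
    f.write(response.content)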
Prompt Structure for Multi-Shot Videos
For coherent multi-shot sequences, use temporal markers:
prompt = """
Overall scene description.
Shot 1 [0-3s] First scene details, camera movement, lighting.
Shot 2 [3-6s] Second scene details, transition elements.
Shot 3 [6-10s] Final scene details, resolution.
"""
Include visual specifics: lighting conditions, camera angles, movement direction, and style references. The enable_prompt_expansion parameter uses an LLM to enhance short prompts, adding context and detail automatically.
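To keep multi-shot prompts consistent across requests, they can be assembled from structured data. The helper below is purely illustrative; format_shots and the sample shot list are not part of the API:
def format_shots(overall, shots):
    # Compose a prompt with temporal markers like "Shot 1 [0-3s] ..."
    lines = [overall]
    for i, (start, end, details) in enumerate(shots, start=1):
        lines.append(f"Shot {i} [{start}-{end}s] {details}")
    return "\n".join(lines)

prompt = format_shots(
    "A rainy night market, neon reflections on wet pavement.",
    [
        (0, 3, "Wide establishing shot, slow push-in, cool blue lighting."),
        (3, 6, "Tracking shot past food stalls, steam drifting into lamplight."),
        (6, 10, "Close-up on a vendor handing over a paper bowl, warm glow."),
    ],
)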
Reference-to-Video for Character Consistency
R2V maintains subject identity across generated videos:
result = fal.subscribe("wan/v2.6/reference-to-video", {
"prompt": "Dance battle between @Video1 and @Video2 in a futuristic neon cityscape",
"video_urls": [
"https://example.com/reference_video1.mp4",
"https://example.com/reference_video2.mp4"
],
"duration": "10",
"resolution": "1080p"
})
R2V Requirements
- Reference videos must be publicly accessible URLs
- Subjects should be clearly visible and well-lit
- Use 1 to 3 references maximum
- Tag references consistently in prompts (@Video1, @Video2, @Video3)
The model extracts subject characteristics from reference videos and applies them to the generated content. Performance degrades with poor lighting, occlusion, or low-resolution references.
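Because references must be reachable by URL, local clips need to be hosted before the request. fal_client ships an upload helper that returns a hosted URL; a minimal sketch (the file paths are placeholders):
import fal_client

# Upload local reference clips to fal storage; each call returns a public URL
video_urls = [
    fal_client.upload_file("reference_video1.mp4"),
    fal_client.upload_file("reference_video2.mp4"),
]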
Image-to-Video Animation
The Wan 2.6 image-to-video endpoint animates static images:
result = fal.subscribe("wan/v2.6/image-to-video", {
"prompt": "The car drives along a coastal highway at sunset",
"image_url": "https://example.com/car_image.jpg",
"duration": "5",
"resolution": "1080p"
})
I2V Optimization
- Start with high-resolution source images (minimum 360px, maximum 2000px per dimension)
- Describe physically plausible motion for the image content
- Consider composition; leave space for motion paths
- Multi-shot defaults to false; enable explicitly if needed
I2V performs best when the prompt describes motion that naturally extends from the static image rather than introducing entirely new elements.
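Since the endpoint enforces the input limits listed above (360px to 2000px per dimension, 100MB), a pre-flight check avoids wasted requests. A minimal sketch using Pillow; the helper name is illustrative:
import os
from PIL import Image

def check_i2v_image(path):
    # Validate a source image against the documented I2V input limits
    if os.path.getsize(path) > 100 * 1024 * 1024:
        raise ValueError("Image exceeds the 100MB file size limit")
    with Image.open(path) as img:
        width, height = img.size
    if min(width, height) < 360 or max(width, height) > 2000:
        raise ValueError(f"Dimensions {width}x{height} are outside the 360-2000px range")
    return width, height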
Advanced Configuration
Audio Integration
Add background audio to generated videos:
result = fal.subscribe("wan/v2.6/text-to-video", {
"prompt": "Orchestra performing in a grand concert hall",
"audio_url": "https://example.com/orchestral_music.mp3",
"duration": "15"
})
Audio handling works as follows (a validation sketch follows this list):
- If audio exceeds video duration, it's truncated
- If audio is shorter, remaining video is silent
- Supported formats: WAV, MP3
- Duration: 3 to 30 seconds
- File size: up to 15MB
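These constraints can be checked locally before submitting. The sketch below validates format and size for any file, and duration for WAV files via the standard library (MP3 duration would need a third-party library such as mutagen):
import os
import wave

def check_audio(path):
    # Validate background audio against the documented limits: WAV/MP3, <=15MB, 3-30s
    if not path.lower().endswith((".wav", ".mp3")):
        raise ValueError("Only WAV and MP3 files are supported")
    if os.path.getsize(path) > 15 * 1024 * 1024:
        raise ValueError("Audio exceeds the 15MB file size limit")
    if path.lower().endswith(".wav"):
        with wave.open(path, "rb") as w:
            duration = w.getnframes() / w.getframerate()
        if not 3 <= duration <= 30:
            raise ValueError(f"Duration {duration:.1f}s is outside the 3-30s range")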
Prompt Expansion
The enable_prompt_expansion parameter enhances prompts via LLM:
result = fal.subscribe("wan/v2.6/text-to-video", {
    "prompt": "Car racing through desert",
    "enable_prompt_expansion": True
})
expanded_prompt = result["actual_prompt"]
This adds cinematographic detail and context; it increases processing time slightly but improves output quality for brief prompts.
Reproducible Generation
Specify a seed for consistent outputs:
result = fal.subscribe("wan/v2.6/text-to-video", {
    "prompt": "Futuristic cityscape with flying vehicles",
    "seed": 42
})
Identical parameters with the same seed produce deterministic results, useful for A/B testing or iterative refinement.
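One practical pattern (the loop below is illustrative, not an API feature) is to sweep a few seeds, review the outputs, and pin the seed that produced the best result for later refinement:
# Generate variations with different seeds, then reuse the best-performing one
candidates = {}
for seed in (1, 7, 42, 99):
    result = fal.subscribe("wan/v2.6/text-to-video", {
        "prompt": "Futuristic cityscape with flying vehicles",
        "seed": seed
    })
    candidates[seed] = result["video"]["url"]

for seed, url in candidates.items():
    print(seed, url)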
Production Patterns
Error Handling
Implement error handling for API failures:
try:
    result = fal.subscribe("wan/v2.6/text-to-video", {...})
except fal_client.ServerError as e:
    # Handle 5xx errors
    print(f"Server error: {e}")
except fal_client.ClientError as e:
    # Handle 4xx errors
    print(f"Client error: {e}")
Common failure modes include content moderation rejections, invalid parameters, and timeout errors for complex generations.
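Transient server errors and timeouts are usually worth retrying; moderation rejections and invalid parameters are not, since they will fail again with the same input. A minimal backoff wrapper, kept deliberately generic so it does not depend on specific exception classes (tune attempts and delays for your workload):
import time

def generate_with_retry(endpoint, payload, max_attempts=3):
    # Retry with exponential backoff; re-raise after the final attempt
    for attempt in range(1, max_attempts + 1):
        try:
            return fal.subscribe(endpoint, payload)
        except Exception as exc:
            if attempt == max_attempts:
                raise
            wait = 2 ** attempt
            print(f"Attempt {attempt} failed ({exc}); retrying in {wait}s")
            time.sleep(wait)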
Progress Monitoring
Track generation progress for longer videos using the Queue API:
def on_progress(update):
    # Queue updates arrive as status objects; InProgress carries the latest log entries
    if isinstance(update, fal_client.InProgress):
        for log in update.logs or []:
            print(log["message"])

result = fal.subscribe(
    "wan/v2.6/text-to-video",
    {...},
    with_logs=True,
    on_queue_update=on_progress
)
Generation times vary by resolution, duration, and current system load.
Scaling Considerations
fal infrastructure provides sub-second cold start times, automatic concurrent request scaling, and multi-region availability.[^1] This handles traffic spikes without manual intervention.
Troubleshooting
Content Moderation
If generation fails due to content moderation:
- Review prompt for potentially unsafe content
- Verify enable_safety_checker is set appropriately (defaults to true)
- Use more specific language that clearly defines the intended output
Performance Optimization
To reduce generation time:
- Use 720p instead of 1080p for faster processing
- Start with 5-second durations and extend as needed
- Run concurrent API calls for multiple variations rather than sequential requests, as sketched below
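A simple way to parallelize variations from a single process: threads are sufficient here because each call spends most of its time waiting on the API (the prompts below are placeholders):
from concurrent.futures import ThreadPoolExecutor

prompts = [
    "Car racing through desert at dawn",
    "Car racing through desert at dusk",
    "Car racing through desert in a sandstorm",
]

def generate(prompt):
    # Each worker issues its own request; results return as the calls complete
    return fal.subscribe("wan/v2.6/text-to-video", {"prompt": prompt, "duration": "5"})

with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(generate, prompts))

urls = [r["video"]["url"] for r in results]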
Capability Comparison Table
| Feature | T2V | R2V | I2V |
|---|---|---|---|
| 480p Support | No | No | Yes |
| 720p Support | Yes | Yes | Yes |
| 1080p Support | Yes | Yes | Yes |
| 15s Duration | Yes | No | Yes |
| Multi-shot Default | True | True | False |
| Reference Videos | No | 1-3 | No |
| Source Image | No | No | Required |
Implementation Priorities
For production deployment:
- Start with T2V for prototyping and understanding prompt patterns
- Add R2V if character consistency across videos is required
- Integrate I2V for animating existing assets or product imagery
- Implement error handling before scaling beyond development
- Monitor which resolutions and durations are used most, since these choices drive generation time and cost
Wan 2.6 on fal provides the infrastructure for production video generation with minimal development overhead. The three pathways address distinct use cases; choosing the appropriate mode for each application requirement determines both output quality and operational costs.
References
[^1]: fal.ai. "What's the Best Way to Test a Generative AI Model?" fal.ai/learn, 2025. https://fal.ai/learn/devs/best-way-to-test-generative-ai-models