fal-ai/wan/v2.2-14b/speech-to-video

Wan-S2V is a video model that generates high-quality video from a static image and an audio track, with realistic facial expressions, body movements, and professional camera work for film and television applications.
Inference
Commercial use

Generation takes approximately 5 minutes.

A request costs $0.20 per video second at 720p, $0.15 per video second at 580p, and $0.10 per video second at 480p. Video seconds are calculated at 16 frames per second.
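
The pricing rule can be worked through as a small calculation. The sketch below is illustrative only: `estimate_cost` is a hypothetical helper, not part of any fal API, and it assumes fractional video seconds are billed proportionally, which the pricing note does not state.

```python
# Per-second rates in USD, taken from the pricing note.
RATE_PER_SECOND = {"720p": 0.20, "580p": 0.15, "480p": 0.10}

# Video seconds are counted at 16 frames per second.
BILLING_FPS = 16


def estimate_cost(num_frames: int, resolution: str) -> float:
    """Estimate the USD cost of a generated video (hypothetical helper).

    Assumption: partial seconds are billed proportionally rather than
    rounded up or down.
    """
    if num_frames < 0:
        raise ValueError("num_frames must be non-negative")
    if resolution not in RATE_PER_SECOND:
        raise ValueError(f"unknown resolution: {resolution!r}")
    video_seconds = num_frames / BILLING_FPS
    # Round to 4 decimal places to hide floating-point noise.
    return round(video_seconds * RATE_PER_SECOND[resolution], 4)


if __name__ == "__main__":
    # 80 frames at 16 fps is 5 video seconds.
    for res in ("720p", "580p", "480p"):
        print(res, estimate_cost(80, res))
```

For instance, 80 frames correspond to 5 video seconds, so under these assumptions a 720p request would come to 5 × $0.20 = $1.00.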
