sync.so -- lipsync 1.9.0-beta Video to Video

fal-ai/sync-lipsync
Generate realistic lipsync animations from audio using advanced algorithms for high-quality synchronization.

lipsync 1.9.0 (beta) | [video-to-video]

Sync.so's lipsync 1.9.0-beta delivers realistic lip synchronization at $0.70 per minute of video processed. The model accepts common video and audio formats and handles duration mismatches through five sync modes (cut_off, loop, bounce, silence, and remap), addressing the audio-video length conflicts that can break automated lipsync workflows.

Use Cases: Video Dubbing & Localization | Content Creator Audio Replacement | Marketing Video Personalization


Performance

At $0.70 per minute, lipsync 1.9.0-beta is positioned as a specialized video-to-video model whose flexible duration handling avoids the audio cutoff issues common in fixed-length processors.

Metric | Result | Context
Sync Modes | 5 duration handling options | cut_off, loop, bounce, silence, remap address audio/video length mismatches
Cost per Minute | $0.70 | About 1.43 minutes per $1.00 on fal
Input Formats | Video: MP4, MOV, WebM, M4V, GIF / Audio: MP3, OGG, WAV, M4A, AAC | Broad format support for production workflows
Model Versions | 3 available (1.7.1, 1.8.0, 1.9.0-beta) | Beta version with latest algorithm improvements
Related Endpoints | Sync Lipsync 2.0, Sync Lipsync Pro | Next-gen and performance-optimized variants
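
The pricing figures above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes billing is pro-rated by the second, which the page does not state, and the function names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

# Price taken from the table above.
PRICE_PER_MINUTE = Decimal("0.70")
CENT = Decimal("0.01")


def estimate_cost(duration_seconds: float) -> Decimal:
    """Estimate the cost in USD for a clip of the given length.

    Assumption: billing is pro-rated by the second. The real billing
    granularity may differ (for example, rounding up to a whole minute).
    """
    minutes = Decimal(str(duration_seconds)) / Decimal(60)
    return (minutes * PRICE_PER_MINUTE).quantize(CENT, rounding=ROUND_HALF_UP)


def minutes_per_dollar() -> Decimal:
    """Minutes of video processed per $1.00, rounded to two decimals."""
    return (Decimal(1) / PRICE_PER_MINUTE).quantize(CENT, rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    print(estimate_cost(90))       # cost of a 90-second clip
    print(minutes_per_dollar())    # minutes per dollar
```

For batch dubbing, summing `estimate_cost` over the clip durations gives a rough budget before any job is submitted.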

Duration-Flexible Lipsync Processing

Lipsync tools that require matching audio and video lengths either fail or force manual pre-editing when the durations differ. The lipsync 1.9.0-beta model instead offers five sync modes for duration conflicts: cut_off truncates excess content, loop repeats the video to match longer audio, bounce plays the video forward and then backward (palindrome playback), silence extends with frozen frames, and remap time-stretches the video. Each mode suits a different production scenario.
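
To make the difference between the modes concrete, here is a small sketch that maps output frames to source frames for each mode. It takes the one-line descriptions above literally and is not the model's actual implementation; `map_frames` and its parameters are hypothetical names, and the real service may resolve edge cases differently.

```python
def map_frames(video_frames: int, target_frames: int, mode: str) -> list[int]:
    """Return, for each output frame, the index of the source frame shown.

    video_frames:  number of frames in the input video
    target_frames: number of frames needed to cover the audio
    """
    if video_frames < 1 or target_frames < 0:
        raise ValueError("video_frames must be >= 1 and target_frames >= 0")
    n = video_frames

    if mode == "cut_off":
        # Truncate: keep only as many frames as both inputs can cover.
        return list(range(min(n, target_frames)))
    if mode == "loop":
        # Restart the video from frame 0 whenever it runs out.
        return [i % n for i in range(target_frames)]
    if mode == "bounce":
        # Palindrome playback: forward, then backward, without
        # repeating the first or last frame at the turning points.
        if n == 1:
            return [0] * target_frames
        period = 2 * (n - 1)
        positions = (i % period for i in range(target_frames))
        return [k if k < n else period - k for k in positions]
    if mode == "silence":
        # Per the description above: hold the last frame once the video ends.
        return [min(i, n - 1) for i in range(target_frames)]
    if mode == "remap":
        # Time-stretch: spread the source frames evenly over the target length.
        return [i * n // target_frames for i in range(target_frames)]
    raise ValueError(f"unknown sync mode: {mode!r}")


if __name__ == "__main__":
    for mode in ("cut_off", "loop", "bounce", "silence", "remap"):
        print(mode, map_frames(3, 7, mode))
```

Running it with a 3-frame video and a 7-frame target shows at a glance which modes lengthen the video and which only trim it.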

What this means for you:

  • No pre-processing required: Submit mismatched audio-video pairs directly through the fal API. The model handles duration conflicts through your chosen sync mode rather than rejecting inputs or producing broken output.

  • Production format flexibility: Accepts MP4, MOV, WebM, M4V, GIF for video and MP3, OGG, WAV, M4A, AAC for audio, eliminating format conversion steps in multi-source workflows

  • Version control for quality: Access to three model versions (1.7.1, 1.8.0, 1.9.0-beta) lets you balance stability against the latest algorithm improvements based on project requirements

  • Predictable per-minute pricing: $0.70 per minute of video processed provides straightforward cost estimation for batch dubbing or localization projects regardless of audio length
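
The points above can be sketched as a request against the fal-ai/sync-lipsync endpoint. This is a minimal sketch, not a confirmed integration: `fal_client.subscribe` comes from the `fal-client` Python package, but the argument names (`model`, `video_url`, `audio_url`, `sync_mode`) and the `lipsync-` prefix on the version string are assumptions that should be checked against the endpoint's API documentation. The helper names and example URLs are hypothetical.

```python
SYNC_MODES = {"cut_off", "loop", "bounce", "silence", "remap"}
MODEL_VERSIONS = {"1.7.1", "1.8.0", "1.9.0-beta"}


def build_arguments(video_url: str, audio_url: str,
                    sync_mode: str = "cut_off",
                    version: str = "1.9.0-beta") -> dict:
    """Validate the options locally and build the request payload.

    Field names and the "lipsync-" version prefix are assumptions.
    """
    if sync_mode not in SYNC_MODES:
        raise ValueError(f"sync_mode must be one of {sorted(SYNC_MODES)}")
    if version not in MODEL_VERSIONS:
        raise ValueError(f"version must be one of {sorted(MODEL_VERSIONS)}")
    return {
        "model": f"lipsync-{version}",
        "video_url": video_url,
        "audio_url": audio_url,
        "sync_mode": sync_mode,
    }


def run_lipsync(arguments: dict):
    """Submit the job and block until it finishes.

    Requires `pip install fal-client` and a FAL_KEY environment variable.
    """
    import fal_client  # imported lazily so the payload helper works without it
    return fal_client.subscribe("fal-ai/sync-lipsync", arguments=arguments)


if __name__ == "__main__":
    args = build_arguments(
        "https://example.com/speaker.mp4",   # placeholder URL
        "https://example.com/dub_track.wav",  # placeholder URL
        sync_mode="loop",
    )
    print(args)
    # result = run_lipsync(args)  # uncomment once credentials are configured
```

Validating the sync mode and version before submitting avoids paying for a round trip that the service would reject anyway.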


Technical Specifications

Spec | Details
Architecture | Sync.so lipsync 1.9.0-beta
Input Formats | Video: MP4, MOV, WebM, M4V, GIF / Audio: MP3, OGG, WAV, M4A, AAC
Output Formats | MP4 video with synchronized lip animation
Sync Modes | cut_off, loop, bounce, silence, remap
License | Commercial use via fal partnership
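
In a multi-source workflow it can help to check inputs against the format list above before uploading anything. The sketch below does this by file extension only, which is an approximation: it does not inspect the container or codec, and `classify_input` is a hypothetical helper name.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extensions derived from the Input Formats row above.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".webm", ".m4v", ".gif"}
AUDIO_EXTENSIONS = {".mp3", ".ogg", ".wav", ".m4a", ".aac"}


def classify_input(url_or_path: str) -> str:
    """Return "video", "audio", or "unsupported" based on the extension."""
    # urlparse drops any query string, so signed URLs are handled too.
    path = urlparse(url_or_path).path
    suffix = PurePosixPath(path).suffix.lower()
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    return "unsupported"


if __name__ == "__main__":
    for name in ("clip.MOV", "https://example.com/a/voice.wav?token=1", "notes.txt"):
        print(name, "->", classify_input(name))
```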



How It Stacks Up

Sync Lipsync 2.0 Video to Video – lipsync 1.9.0-beta is the earlier, beta-stage model at comparable pricing. Sync Lipsync 2.0 is the production-ready successor with enhanced facial tracking, suited to workflows that need the latest algorithm improvements.

Sync Lipsync Pro – lipsync 1.9.0-beta prioritizes flexible duration handling through five sync modes for mismatched content. Sync Lipsync Pro emphasizes processing speed and throughput optimization for high-volume batch operations where audio-video lengths are pre-aligned.

MiniMax Video 01 Live – lipsync 1.9.0-beta focuses on audio-driven facial animation for existing video content. MiniMax Video 01 Live generates complete video from text prompts, serving text-to-video creation rather than audio synchronization workflows.