LatentSync Video to Video

fal-ai/latentsync
LatentSync is a video-to-video model that generates lip sync animations from audio, using an audio-conditioned latent diffusion model for high-quality synchronization.
Inference
Commercial use

Your request will cost $0.20 for videos up to 40 seconds. For longer videos, you will be charged $0.005 per second of output video.

LatentSync - Advanced AI Lip Sync Animation

LatentSync is a video-to-video model that generates high-quality lip sync animations from an audio track. It suits applications that need realistic synchronization between video and audio content.

Overview

LatentSync delivers professional-grade lip synchronization through an end-to-end framework based on audio-conditioned latent diffusion models. Created by ByteDance, this model excels at creating natural, smooth lip-sync effects without intermediate representations, supporting both real-life and anime character video processing.

Key Benefits

Transform your videos with LatentSync's powerful capabilities:

Realistic Synchronization

  • High-quality lip sync animations with natural mouth movements
  • Temporal consistency through TREPA (Temporal REPresentation Alignment)
  • Support for both real-life and animated characters

Developer Experience

  • Simple REST API with comprehensive SDKs
  • Straightforward video + audio input workflow
  • Detailed documentation and examples

Enterprise Ready

  • Production-grade reliability
  • Flexible pricing for videos of different lengths
  • Professional support available

Getting Started

Getting up and running with LatentSync takes just a few minutes. Here's how:

  1. Install the SDK for your platform:

JavaScript/TypeScript:

bash
npm install --save @fal-ai/client

Python:

bash
pip install fal-client

  2. Configure your credentials:

javascript
import { fal } from "@fal-ai/client";

fal.config({
  credentials: "YOUR_FAL_KEY_HERE"
});

  3. Make your first API call:

javascript
const result = await fal.subscribe("fal-ai/latentsync", {
  input: {
    video_url: "https://example.com/your-video.mp4",
    audio_url: "https://example.com/your-audio.mp3"
  }
});

// The response payload is nested under `data`
console.log(result.data.video.url);

Implementation Guide

LatentSync works with two primary inputs:

Video Input

  • Supported formats: MP4, MOV, WebM, M4V, GIF
  • Upload your source video containing the face/character to be synchronized

Audio Input

  • Supported formats: MP3, OGG, WAV, M4A, AAC
  • The audio file that will drive the lip synchronization
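
As a rough illustration of the format lists above, the following sketch checks file extensions before a request is sent. The helper names `extensionOf` and `checkInputFormats` are hypothetical and not part of the fal client, and an extension check is only a heuristic: a URL without an extension may still point to a valid file, and the service makes the final decision.

```javascript
// Supported extensions, copied from the format lists above.
const VIDEO_EXTENSIONS = ["mp4", "mov", "webm", "m4v", "gif"];
const AUDIO_EXTENSIONS = ["mp3", "ogg", "wav", "m4a", "aac"];

// Hypothetical helper: returns the lowercase file extension of a URL's path,
// or "" when the path has none. Query strings and fragments are ignored.
function extensionOf(url) {
  const pathname = new URL(url).pathname;
  const lastSegment = pathname.split("/").pop();
  const dot = lastSegment.lastIndexOf(".");
  return dot === -1 ? "" : lastSegment.slice(dot + 1).toLowerCase();
}

// Hypothetical helper: returns a list of problems found in the two inputs.
// An empty list means both extensions are in the supported sets.
function checkInputFormats(videoUrl, audioUrl) {
  const problems = [];
  if (!VIDEO_EXTENSIONS.includes(extensionOf(videoUrl))) {
    problems.push(`Unsupported video format: ${videoUrl}`);
  }
  if (!AUDIO_EXTENSIONS.includes(extensionOf(audioUrl))) {
    problems.push(`Unsupported audio format: ${audioUrl}`);
  }
  return problems;
}
```

Calling `checkInputFormats(videoUrl, audioUrl)` before `fal.subscribe` lets an application reject obviously unsupported files without spending a request.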

Error Handling

Always implement proper error handling:

javascript
// Assumes `fal` is imported and configured as shown in Getting Started
try {
  const result = await fal.subscribe("fal-ai/latentsync", {
    input: {
      video_url: "your-video-url",
      audio_url: "your-audio-url"
    }
  });
  console.log(result.data.video.url);
} catch (error) {
  console.error("Lip sync generation failed:", error.message);
  // Implement appropriate fallback behavior
}
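
One possible shape for the fallback behavior mentioned in the comment above is a retry policy. The sketch below is a minimal example under stated assumptions: it assumes the thrown error exposes a numeric HTTP `status` property, which the source does not confirm, and the helper names `isRetryable` and `backoffDelayMs` are hypothetical.

```javascript
// Hypothetical helper: decides whether a failed request is worth retrying.
// Assumes the error carries a numeric HTTP `status`. Errors without one
// (for example network failures) are treated as retryable.
function isRetryable(error) {
  const status = error && error.status;
  if (typeof status !== "number") return true;
  if (status === 429) return true; // rate limited
  return status >= 500 && status < 600; // server-side failure
}

// Hypothetical helper: exponential backoff delay in milliseconds,
// doubling per attempt (attempt 0 -> baseMs) and capped at maxMs.
function backoffDelayMs(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}
```

Inside the `catch` block, an application could call `isRetryable(error)` and, if it returns `true`, wait `backoffDelayMs(attempt)` before trying again. Client errors such as invalid input are not retried, since the same request would likely fail the same way.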

API Parameters

  • `video_url` (required): URL of the input video
  • `audio_url` (required): URL of the audio file for lip synchronization

Additional settings can be customized through the control panel when available.
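
To make the two required parameters concrete, here is a small sketch that assembles the `input` object and fails early when a value is missing. The function name `buildLatentSyncInput` is hypothetical, and restricting inputs to `http(s)` URLs is an assumption made for this example, so relax it if your workflow uses other schemes.

```javascript
// Hypothetical helper: builds the `input` object for "fal-ai/latentsync"
// from the two required parameters.
// Assumption: both values must be http(s) URLs.
function buildLatentSyncInput({ videoUrl, audioUrl }) {
  const fields = [
    ["video_url", videoUrl],
    ["audio_url", audioUrl],
  ];
  for (const [name, value] of fields) {
    if (typeof value !== "string" || value.length === 0) {
      throw new TypeError(`${name} is required`);
    }
    const protocol = new URL(value).protocol; // throws on malformed URLs
    if (protocol !== "http:" && protocol !== "https:") {
      throw new TypeError(`${name} must be an http(s) URL`);
    }
  }
  return { video_url: videoUrl, audio_url: audioUrl };
}
```

The returned object can be passed as `input` to `fal.subscribe` or `fal.queue.submit`.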

Technical Specifications

Architecture

  • End-to-end lip sync framework based on audio-conditioned latent diffusion models
  • Uses Whisper model to convert speech into audio embeddings
  • Integrates embeddings into U-Net through cross-attention layers
  • TREPA technology for enhanced temporal consistency
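
The cross-attention step listed above can be illustrated with a toy example. This is not LatentSync's implementation: it is a single-head sketch over plain arrays, with the learned projection matrices omitted (treated as identity) and the audio vectors serving as both keys and values. All function names are hypothetical.

```javascript
// Dot product of two equal-length vectors.
function dot(a, b) {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

// Numerically stable softmax: subtract the maximum before exponentiating.
function softmax(scores) {
  const max = Math.max(...scores);
  const exps = scores.map((s) => Math.exp(s - max));
  const total = exps.reduce((sum, e) => sum + e, 0);
  return exps.map((e) => e / total);
}

// Toy single-head cross-attention.
// queries: one vector per visual token; audio: one vector per audio embedding.
// Each output row is a weighted average of the audio vectors, weighted by
// the scaled dot-product similarity between the query and each audio vector.
function crossAttention(queries, audio) {
  const scale = 1 / Math.sqrt(audio[0].length);
  return queries.map((q) => {
    const weights = softmax(audio.map((k) => dot(q, k) * scale));
    return audio[0].map((_, d) =>
      weights.reduce((sum, w, j) => sum + w * audio[j][d], 0)
    );
  });
}
```

Each visual token ends up with a mixture of audio information, which is the general mechanism by which audio can condition a denoising network.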

Performance

  • Processing time varies based on video length
  • Maintains high-resolution video quality
  • Smooth temporal consistency without frame discrepancies

Use Cases

LatentSync excels in various applications:

  • Film & Video Dubbing: Create perfect lip sync for dubbed content
  • Virtual Avatars: Animate digital characters with realistic speech
  • Gaming: Sync NPC dialogue for immersive experiences
  • Education: Create language learning content with accurate pronunciation visuals
  • Advertising: Generate lip-synced content for virtual spokespersons

Pricing and Usage

Transparent, duration-based pricing:

  • Up to 40 seconds: $0.20 per video
  • Longer videos: $0.005 per second of output video

View detailed pricing or contact sales for enterprise solutions.
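
The two pricing tiers can be turned into a quick estimate. The sketch below assumes that a video longer than 40 seconds is billed at $0.005 for every second of its full duration, which matches the flat price exactly at 40 seconds. That reading is an interpretation of the listed prices, and the function name `estimateCostUsd` is hypothetical.

```javascript
// Prices in thousandths of a dollar to avoid floating-point drift.
const FLAT_PRICE_MILLS = 200; // $0.20 for videos up to 40 seconds
const PER_SECOND_MILLS = 5; // $0.005 per second of output video
const FLAT_LIMIT_SECONDS = 40;

// Hypothetical helper: estimated cost in US dollars for one request.
// Assumption: longer videos are billed per second over their full duration.
function estimateCostUsd(outputSeconds) {
  if (!Number.isFinite(outputSeconds) || outputSeconds <= 0) {
    throw new RangeError("outputSeconds must be a positive number");
  }
  const mills =
    outputSeconds <= FLAT_LIMIT_SECONDS
      ? FLAT_PRICE_MILLS
      : Math.round(outputSeconds * PER_SECOND_MILLS);
  return mills / 1000;
}
```

Under this assumption, a 60-second video costs 60 × $0.005 = $0.30, and any video of 40 seconds or less costs $0.20.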

Queue Management

For asynchronous processing:

javascript
// Submit request
const { request_id } = await fal.queue.submit("fal-ai/latentsync", {
  input: {
    video_url: "your-video-url",
    audio_url: "your-audio-url"
  }
});

// Check status
const status = await fal.queue.status("fal-ai/latentsync", {
  requestId: request_id
});

// Get result
const result = await fal.queue.result("fal-ai/latentsync", {
  requestId: request_id
});
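
The three calls above can be combined into a polling loop. The sketch below is a minimal example: `waitForResult` is a hypothetical helper, the `queue` argument is any object with `status` and `result` methods shaped like `fal.queue` in the snippet above, and the `"COMPLETED"` status value is an assumption about the status payload.

```javascript
// Hypothetical helper: polls the queue until the request completes, then
// fetches the result. Throws if the request is still pending after
// `maxAttempts` status checks.
async function waitForResult(queue, endpoint, requestId, options = {}) {
  const { intervalMs = 2000, maxAttempts = 150 } = options;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const { status } = await queue.status(endpoint, { requestId });
    // Assumption: a finished request reports status "COMPLETED".
    if (status === "COMPLETED") {
      return queue.result(endpoint, { requestId });
    }
    // Wait before the next status check.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(
    `Request ${requestId} did not complete after ${maxAttempts} checks`
  );
}
```

With the client from the earlier snippets, usage would look like `const result = await waitForResult(fal.queue, "fal-ai/latentsync", request_id);`. Passing the queue as a parameter keeps the helper easy to test with a stub.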

Support and Resources

We're here to help you succeed with LatentSync.

About LatentSync

LatentSync diverges from previous diffusion-based lip sync methods by using Stable Diffusion to model complex audio-visual correlations directly. The model is fully open source, so researchers and developers can reproduce and improve on it.

Ready to Create Perfect Lip Sync?

Sign in at fal.ai/login and start creating realistic lip sync animations with LatentSync today.