Endpoint: POST https://fal.run/fal-ai/kling-video/ai-avatar/v2/pro
Endpoint ID: fal-ai/kling-video/ai-avatar/v2/pro

Quick Start

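import fal_client  # reads API credentials from the FAL_KEY environment variable

def on_queue_update(update):
    # Print log lines streamed while the request is in progress.
    if isinstance(update, fal_client.InProgress):
        for log in update.logs:
            print(log["message"])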

result = fal_client.subscribe(
    "fal-ai/kling-video/ai-avatar/v2/pro",
    arguments={
        "image_url": "https://storage.googleapis.com/falserverless/example_inputs/kling_ai_avatar_input.jpg",
        "audio_url": "https://v3.fal.media/files/rabbit/9_0ZG_geiWjZOmn9yscO6_output.mp3"
    },
    with_logs=True,
    on_queue_update=on_queue_update,
)
print(result)

Input Schema

image_url (string, required): The URL of the image to use as your avatar.
audio_url (string, required): The URL of the audio file.
prompt (string, optional, default "."): The prompt to use for the video generation.
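The schema above can be mirrored client-side before a request is sent. The sketch below is a hypothetical helper (`build_arguments` is not part of fal_client) that enforces the two required fields and fills in the documented "." default for `prompt`. Accepting only http(s) URLs is a simplification made by this sketch; the endpoint itself may accept other kinds of file references.

```python
from typing import Optional
from urllib.parse import urlparse


def _require_url(name: str, value: str) -> str:
    """Reject empty or non-http(s) values for a required URL field."""
    parsed = urlparse(value or "")
    # Assumption of this sketch: only plain http(s) URLs are passed through.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"{name} must be an http(s) URL, got {value!r}")
    return value


def build_arguments(image_url: str, audio_url: str, prompt: Optional[str] = None) -> dict:
    """Assemble a request body matching the documented input schema."""
    args = {
        "image_url": _require_url("image_url", image_url),
        "audio_url": _require_url("audio_url", audio_url),
    }
    # The schema documents "." as the default prompt; sending it explicitly
    # keeps the request identical to the input example on this page.
    args["prompt"] = prompt if prompt else "."
    return args
```

The resulting dict can be passed as `arguments=` to `fal_client.subscribe`, exactly as in the Quick Start.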

Output Schema

video (File, required): The generated video.
duration (float, required): Duration of the output video in seconds.
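A small, hypothetical parser for the output schema. It assumes, as the output example shows, that the `File` object exposes its location under a `url` key. Because the example response omits `duration` even though the schema marks it required, the sketch treats `duration` as optional instead of failing.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class AvatarResult:
    video_url: str
    # Seconds. The schema marks this required, but the example omits it.
    duration: Optional[float]


def parse_result(result: Mapping[str, Any]) -> AvatarResult:
    """Extract the fields named in the output schema from a response dict."""
    video = result.get("video")
    if not isinstance(video, Mapping) or "url" not in video:
        raise ValueError("response has no video.url field")
    raw_duration = result.get("duration")
    duration = float(raw_duration) if raw_duration is not None else None
    return AvatarResult(video_url=video["url"], duration=duration)
```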

Input Example

{
  "image_url": "https://storage.googleapis.com/falserverless/example_inputs/kling_ai_avatar_input.jpg",
  "audio_url": "https://v3.fal.media/files/rabbit/9_0ZG_geiWjZOmn9yscO6_output.mp3",
  "prompt": "."
}

Output Example

{
  "video": {
    "url": "https://v3.fal.media/files/penguin/ln3x7H1p1jL0Pwo7675NI_output.mp4"
  }
}
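To keep the result, fetch the MP4 from `video.url`. This is a minimal standard-library sketch (`download_video` is a hypothetical helper name); it assumes the returned URL can be fetched directly without extra authentication.

```python
import shutil
import urllib.request
from pathlib import Path


def download_video(result: dict, dest: str) -> Path:
    """Save the MP4 referenced by result["video"]["url"] to a local path."""
    url = result["video"]["url"]
    target = Path(dest)
    target.parent.mkdir(parents=True, exist_ok=True)
    # Stream to disk in chunks rather than holding the whole video in memory.
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

For example, `download_video(result, "out/avatar.mp4")` after the Quick Start call.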
Kuaishou’s Kling AI Avatar v2 Pro turns a static image and an audio track into a synchronized talking-avatar video, priced at $0.115 per second of output. It is the premium tier of Kling AI Avatar v2: it costs about twice as much as Standard in exchange for higher-quality lip sync and motion, and it handles realistic humans, animals, cartoons, and stylized characters without manual rigging. It is aimed at content creators who need broadcast-quality avatar videos without the technical overhead of a traditional animation pipeline.

Built for: Marketing video production | Social media content | Character animation | Educational content | Podcast visualization

Audio-Driven Animation Without the Complexity

Kling AI Avatar v2 Pro uses audio-synchronized motion generation to animate any portrait or character image. Unlike traditional animation workflows that require rigging, keyframing, and manual lip-sync adjustment, the model drives facial movements and expressions directly from the audio. What this means for you:
  • Simple dual-input API: Provide one portrait or character image (JPG, PNG, WebP, GIF, AVIF) and one audio file (MP3, OGG, WAV, M4A, AAC), each as a URL, to generate a synchronized avatar video
  • Natural lip synchronization: Audio-driven facial animation matches speech patterns without manual keyframe adjustment
  • Multi-character support: Works across realistic humans, animals, cartoon styles, and stylized characters from the same endpoint
  • Production-ready output: Generate avatar videos suitable for commercial use at broadcast quality standards
  • Optional prompt refinement: Include text prompts to guide subtle aspects of the animation beyond audio synchronization
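If the image and audio live on disk rather than at public URLs, they have to be uploaded first. The sketch below assumes `fal_client.upload_file`, which uploads a local file to fal storage and returns its URL, and credentials supplied through the `FAL_KEY` environment variable. The extension check is a client-side convenience built from the format lists on this page; the endpoint's own validation may differ. `check_format` and `animate_local_files` are hypothetical names.

```python
from pathlib import Path

# Extensions taken from the supported-format lists on this page.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp", ".gif", ".avif"}
AUDIO_EXTS = {".mp3", ".ogg", ".wav", ".m4a", ".aac"}


def check_format(path: str, allowed: set) -> str:
    """Raise ValueError if the file extension is not in the allowed set."""
    ext = Path(path).suffix.lower()
    if ext not in allowed:
        raise ValueError(f"{path}: unsupported extension {ext or '(none)'}")
    return path


def animate_local_files(image_path: str, audio_path: str, prompt: str = ".") -> dict:
    """Upload two local files to fal storage, then call the Pro endpoint."""
    import fal_client  # imported lazily so check_format works without it

    check_format(image_path, IMAGE_EXTS)
    check_format(audio_path, AUDIO_EXTS)
    image_url = fal_client.upload_file(image_path)
    audio_url = fal_client.upload_file(audio_path)
    return fal_client.subscribe(
        "fal-ai/kling-video/ai-avatar/v2/pro",
        arguments={"image_url": image_url, "audio_url": audio_url, "prompt": prompt},
    )
```

Checking extensions before uploading avoids spending an upload and a queued request on a file the endpoint is likely to reject.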

Performance That Scales

Pricing scales linearly with output duration, making cost predictable for batch production workflows.
Metric | Result | Context
Cost per second | $0.115 | About 8.7 seconds of video per $1.00 on fal
Cost per minute | $6.90 | Predictable scaling for longer content
Standard tier cost | $0.0562/second | Kling AI Avatar v2 Standard costs about 49% of the Pro rate (roughly 51% savings)
Output duration | Matches audio length | Video length is set by the audio file's duration
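Because output length matches audio length, cost can be estimated before submitting. The sketch below reads the length of a PCM WAV file with Python's `wave` module and multiplies it by the per-second rates in the table; other formats (MP3, M4A, and so on) would need a third-party decoder. It assumes billing is exactly linear, with no rounding or minimum charge, which this page does not specify.

```python
import wave

PRO_RATE = 0.115        # USD per output second, Pro (table above)
STANDARD_RATE = 0.0562  # USD per output second, Standard (table above)


def wav_duration_seconds(path: str) -> float:
    """Length of a PCM WAV file in seconds (frames / frame rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def estimate_cost(seconds: float, rate: float = PRO_RATE) -> float:
    """Linear pricing: output seconds times the per-second rate, in USD."""
    return seconds * rate
```

For example, a 60-second WAV comes to about $6.90 on Pro and about $3.37 on Standard.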

Technical Specifications

Spec | Details
Architecture | Kling AI Avatar v2 Pro
Image formats | JPG, JPEG, PNG, WebP, GIF, AVIF
Audio formats | MP3, OGG, WAV, M4A, AAC
Output format | MP4 video with synchronized audio
Generation type | Image-to-video with audio synchronization
License | Commercial use permitted (Partner)

How It Stacks Up

Kling AI Avatar v2 Standard: Kling AI Avatar v2 Pro delivers enhanced facial detail and more precise lip sync at $0.115/second versus Standard’s $0.0562/second. Choose Pro for professional productions where output quality justifies the roughly 2x cost premium, and Standard for high-volume workflows where cost efficiency matters more.

Kling 2.5 Turbo Pro Image-to-Video: Kling AI Avatar v2 Pro specializes in audio-synchronized avatar animation, with automatic lip sync and facial motion for talking-head content. Kling 2.5 Turbo Pro handles general image-to-video animation at $0.35 for 5 seconds ($0.07 per additional second) without audio synchronization, for broader motion-graphics and scene-animation workflows.

Kling 2.1 Master Image-to-Video: Kling AI Avatar v2 Pro constrains generation around the audio input for consistent character performance at $0.115/second. Kling 2.1 Master emphasizes maximum quality and cinematic motion at $1.40 for 5 seconds ($0.28 per additional second) for high-fidelity general video generation without audio synchronization.

Argil Avatars Audio-to-Video: Kling AI Avatar v2 Pro accepts a custom image for any character style at $0.115/second with premium lip-sync quality. Argil Avatars uses pre-trained avatar templates at $0.02/second, a 5.75x cost saving when a custom character appearance isn’t required.
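The price comparison above can be turned into a quick calculator. This sketch assumes the "flat price for 5 seconds plus a per-additional-second rate" pricing extends linearly to any length, which may not hold if those models only offer fixed clip lengths. It compares price only, not capability (the two Kling image-to-video models do no audio synchronization), and the model names are plain labels, not endpoint IDs.

```python
from typing import Callable, Dict


def per_second(rate: float) -> Callable[[float], float]:
    """Pricing that is a flat rate per output second."""
    return lambda seconds: seconds * rate


def base_plus_extra(base: float, included: float, extra: float) -> Callable[[float], float]:
    """Flat price for the first `included` seconds, then a per-second rate."""
    return lambda seconds: base + max(0.0, seconds - included) * extra


# Prices as quoted in the comparison above (USD).
MODELS: Dict[str, Callable[[float], float]] = {
    "Kling AI Avatar v2 Pro": per_second(0.115),
    "Kling AI Avatar v2 Standard": per_second(0.0562),
    "Kling 2.5 Turbo Pro Image-to-Video": base_plus_extra(0.35, 5, 0.07),
    "Kling 2.1 Master Image-to-Video": base_plus_extra(1.40, 5, 0.28),
    "Argil Avatars Audio-to-Video": per_second(0.02),
}


def compare(seconds: float) -> Dict[str, float]:
    """Estimated cost of `seconds` of output on each model, cheapest first."""
    costs = {name: price(seconds) for name, price in MODELS.items()}
    return dict(sorted(costs.items(), key=lambda item: item[1]))
```

For a 10-second clip this ranks Argil Avatars ($0.20), Standard ($0.562), Turbo Pro ($0.70), Pro ($1.15), and Master ($2.80).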