Endpoint:
POST https://fal.run/fal-ai/kling-video/v2.6/pro/image-to-video
Endpoint ID: fal-ai/kling-video/v2.6/pro/image-to-video
Quick Start
Input Schema
URL of the image to be used for the video
The duration of the generated video in seconds. Default value: "5". Possible values: 5, 10.
Default value: "blur, distort, and low quality"
Whether to generate native audio for the video. Supports Chinese and English voice output. Other languages are automatically translated to English. For English speech, use lowercase letters; for acronyms or proper nouns, use uppercase. Default value: true
Optional voice IDs for video generation. Reference voices in your prompt with <<<voice_1>>> and <<<voice_2>>> (maximum 2 voices per task). Get voice IDs from the Kling video create-voice endpoint: https://fal.ai/models/fal-ai/kling-video/create-voice
URL of the image to be used for the end of the video
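The fields described above can be assembled into a request body roughly as follows. This is a minimal sketch, not the official client: the schema listing gives descriptions but no field names, so every key used here (`prompt`, `image_url`, `duration`, `negative_prompt`, `generate_audio`, `voice_ids`, `end_image_url`) is an assumed placeholder, as is the `build_payload` helper and the idea that `<<<voice_N>>>` refers to the N-th supplied voice ID. Check the live schema before relying on any of them. The code only builds and prints the JSON body; it does not send a request.

```python
import json
import re

# Endpoint taken from the page; the body below would be POSTed to it.
ENDPOINT = "https://fal.run/fal-ai/kling-video/v2.6/pro/image-to-video"

# Matches voice references such as <<<voice_1>>> inside a prompt.
VOICE_REF = re.compile(r"<<<voice_(\d+)>>>")


def build_payload(prompt, image_url, duration="5", generate_audio=True,
                  negative_prompt="blur, distort, and low quality",
                  voice_ids=None, end_image_url=None):
    """Assemble a request body. All key names are assumptions."""
    if duration not in ("5", "10"):
        raise ValueError("duration must be '5' or '10'")
    voice_ids = list(voice_ids or [])
    if len(voice_ids) > 2:
        raise ValueError("at most 2 voices per task")
    # Assumed convention: <<<voice_N>>> points at the N-th supplied voice ID.
    referenced = {int(n) for n in VOICE_REF.findall(prompt)}
    if any(n < 1 or n > len(voice_ids) for n in referenced):
        raise ValueError("prompt references a voice that was not supplied")
    payload = {
        "prompt": prompt,
        "image_url": image_url,
        "duration": duration,
        "negative_prompt": negative_prompt,
        "generate_audio": generate_audio,
    }
    # Optional fields are only included when the caller provides them.
    if voice_ids:
        payload["voice_ids"] = voice_ids
    if end_image_url:
        payload["end_image_url"] = end_image_url
    return payload


body = build_payload(
    "<<<voice_1>>> says 'My people, here I am!'",
    "https://example.com/king.png",      # placeholder image URL
    voice_ids=["voice-id-placeholder"],  # placeholder voice ID
)
print(f"POST {ENDPOINT}")
print(json.dumps(body, indent=2))
```

Validating the duration and voice count locally catches the two constraints the schema states explicitly before a billable request is made.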
Output Schema
The generated video
Input Example
Output Example
Native Audio Generation Meets Fluid Motion
Kling 2.6 Pro’s architecture integrates speech synthesis directly into the video generation pipeline, supporting Chinese and English voice output with automatic translation for other languages. This contrasts with standard image-to-video models that require separate audio workflows and manual synchronization.
What this means for you:
- Synchronized audio-visual output: Generate videos with native speech that matches lip movements and scene timing, eliminating post-production audio alignment work
- Flexible duration control: Choose between 5-second or 10-second outputs based on content requirements and budget constraints
- Single-image animation: Transform static images into fluid video sequences with cinematic motion quality and scene continuity
- Prompt-driven speech: Embed dialogue directly in prompts (e.g., “A king walks slowly and says ‘My people, here I am!’”) for automatic voice generation with proper capitalization handling for English pronunciation
Performance That Scales
Kling 2.6 Pro prioritizes output quality and audio integration over generation speed, positioning it as a production-focused solution rather than a rapid-iteration tool.
| Metric | Result | Context |
|---|---|---|
| Duration Options | 5s or 10s | Configurable via API parameter |
| Cost per Second | $0.14 (with audio) | 5s video with audio = $0.70 total |
| Audio Languages | Chinese, English (native) + auto-translation | Uppercase for acronyms/proper nouns in English |
| Input Format | Single image URL | Accepts jpg, jpeg, png, webp, gif, avif |
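The cost row in the table is simple arithmetic: the per-second rate multiplied by the chosen duration. A small sketch of that calculation follows. The `cost_with_audio` helper is hypothetical, and it covers only the with-audio rate because that is the only rate the table lists.

```python
from decimal import Decimal

# USD per second with audio, as listed in the table.
RATE_WITH_AUDIO = Decimal("0.14")


def cost_with_audio(duration_seconds):
    """Return the estimated USD cost of a video generated with audio."""
    if duration_seconds not in (5, 10):
        raise ValueError("duration must be 5 or 10 seconds")
    return RATE_WITH_AUDIO * duration_seconds


print(cost_with_audio(5))   # 0.70
print(cost_with_audio(10))  # 1.40
```

`Decimal` is used instead of floats so the totals print as exact currency amounts.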
Technical Specifications
| Spec | Details |
|---|---|
| Architecture | Kling 2.6 Pro |
| Input Formats | Image URL (jpg, jpeg, png, webp, gif, avif) |
| Output Formats | MP4 video with optional audio track |
| Duration Control | 5 or 10 seconds (configurable) |
| License | Commercial use permitted (Partner) |
How It Stacks Up
- Kling Video Image to Video (v2.5-turbo): Kling 2.6 Pro trades generation speed for native audio synthesis and enhanced motion quality, making it ideal for production workflows requiring integrated speech output. The v2.5-turbo variant prioritizes faster iteration cycles for teams testing concepts without audio requirements.
- Kling 1.6 Image to Video: Kling 2.6 Pro offers native audio generation and refined motion fidelity compared to the 1.6 baseline, positioning it as the premium tier for broadcast-quality output. Version 1.6 remains viable for projects where audio integration isn’t critical.
- Kling 2.0 Master Image to Video: Kling 2.6 Pro extends the 2.0 architecture with improved speech synthesis capabilities and motion coherence. The 2.0 Master variant serves workflows requiring the previous generation’s specific characteristics or pricing structure.
- Kling 2.1 (standard) Image to Video: Kling 2.6 Pro delivers enhanced audio quality and cinematic motion compared to the 2.1 standard tier. The 2.1 standard remains cost-effective for projects where Pro-level audio fidelity isn’t essential.
Related
- Kling Video v2.6 Motion Control [Standard] — Video Generation
- Kling Video v2.6 Text to Video — Video Generation
- Kling Video v2.6 Image to Video — Video Generation
Limitations
- duration restricted to: 5, 10
- aspect_ratio restricted to: 16:9, 9:16, 1:1
- cfg_scale range: 0 to 1
- character_orientation restricted to: image, video
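These limits can be checked client-side before submitting a request. The sketch below uses the four parameter names listed above; the `check_limits` helper is hypothetical, the `cfg_scale` bounds are assumed to be inclusive, and only parameters actually present in the input are checked.

```python
# Allowed values per parameter, copied from the limitations list.
ALLOWED = {
    "duration": {"5", "10"},
    "aspect_ratio": {"16:9", "9:16", "1:1"},
    "character_orientation": {"image", "video"},
}


def check_limits(params):
    """Return a list of human-readable violations (empty if none)."""
    problems = []
    for name, allowed in ALLOWED.items():
        # Compare as strings so both 5 and "5" are accepted for duration.
        if name in params and str(params[name]) not in allowed:
            problems.append(f"{name} must be one of {sorted(allowed)}")
    # Assumption: the 0 to 1 range for cfg_scale includes both endpoints.
    if "cfg_scale" in params and not 0 <= params["cfg_scale"] <= 1:
        problems.append("cfg_scale must be between 0 and 1")
    return problems


print(check_limits({"duration": 10, "cfg_scale": 0.5}))
print(check_limits({"duration": 7, "aspect_ratio": "4:3", "cfg_scale": 1.5}))
```

Returning a list of messages rather than raising on the first failure lets a caller report every violation at once.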