POST https://fal.run/fal-ai/kling-video/o1/reference-to-video
Endpoint ID: fal-ai/kling-video/o1/reference-to-video
Quick Start
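A minimal sketch of a synchronous call to the endpoint above, using only the Python standard library. It assumes fal's `Authorization: Key <FAL_KEY>` header scheme and the field names `prompt` and `image_urls`. This page does not spell out any of those. The names `duration` and `aspect_ratio` come from the Limitations section. `build_request` only constructs the request, so you can inspect it offline. `run` actually sends it and needs network access plus a `FAL_KEY` environment variable.

```python
import json
import os
import urllib.request

# Synchronous endpoint shown at the top of this page.
ENDPOINT = "https://fal.run/fal-ai/kling-video/o1/reference-to-video"


def build_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the endpoint.

    Assumption: fal expects an "Authorization: Key <FAL_KEY>" header.
    """
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Key {api_key}",
            "Content-Type": "application/json",
        },
    )


def run(payload: dict) -> dict:
    """Send the request and decode the JSON response (network + FAL_KEY required)."""
    request = build_request(payload, os.environ["FAL_KEY"])
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Hypothetical payload: "prompt" and "image_urls" are assumed field names.
payload = {
    "prompt": "Take @Image1 as the start frame. The camera slowly pushes in.",
    "image_urls": ["https://example.com/start.png"],
    "duration": "5",
    "aspect_ratio": "16:9",
}
request = build_request(payload, "YOUR_FAL_KEY")
print(request.get_method(), request.full_url)
```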
Multi-Reference Architecture for Consistent Generation
Kling O1 Reference uses a specialized reference-conditioning system that processes frontal images and multiple reference angles per element, then maintains their identity throughout generated video sequences. Unlike standard image-to-video models that treat input as a single keyframe, this architecture tracks multiple elements independently while preserving their visual characteristics across camera movements and scene changes.
What this means for you:
- Up to 7 simultaneous inputs: Combine tracked elements (characters/objects with frontal + reference angles), style reference images, and an optional start frame in a single generation. Reference them in prompts as @Element1, @Element2 for tracked objects or @Image1, @Image2 for style references and start frames.
- Element-level consistency: Each tracked element supports one frontal image plus multiple reference angles, so characters and objects keep their identity through complex camera movements and transitions.
- Flexible duration control: Generate sequences up to 10 seconds long ($1.12 for a 10-second video) at 16:9, 9:16, or 1:1 aspect ratios to suit each target platform.
- Prompt-driven scene control: Direct camera movements, lighting, and transitions through natural language while the model maintains element consistency. Specify “Take @Image1 as the start frame” to control the video’s opening frame.
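The @-reference rules above lend themselves to a quick client-side check before you submit a request. This sketch assumes two things. First, the start frame is passed as one of the @Image inputs, as "Take @Image1 as the start frame" suggests. Second, references are numbered from 1 in the order the inputs are supplied. `check_prompt_refs` is a hypothetical helper, not part of any fal SDK.

```python
import re

MAX_INPUTS = 7  # documented cap: elements + reference images + start frame

# Matches tokens such as "@Element1" or "@Image12", capturing kind and index.
_REF = re.compile(r"@(Element|Image)(\d+)")


def check_prompt_refs(prompt: str, n_elements: int, n_images: int) -> list[str]:
    """Return human-readable problems with the prompt's @-references.

    n_images counts every image addressable as @ImageN, including a start
    frame (assumption). An empty list means no problems were found.
    """
    problems = []
    total = n_elements + n_images
    if total > MAX_INPUTS:
        problems.append(f"{total} inputs exceeds the limit of {MAX_INPUTS}")
    limits = {"Element": n_elements, "Image": n_images}
    # Deduplicate so a token mentioned twice is reported only once.
    for kind, idx in sorted(set(_REF.findall(prompt))):
        index = int(idx)
        if not 1 <= index <= limits[kind]:
            problems.append(f"@{kind}{index} has no matching input")
    return problems


print(check_prompt_refs("Take @Image1 as the start frame. @Element2 waves.", 1, 1))
```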
Performance Scaling
Kling O1 Reference prioritizes multi-element consistency over generation speed, with pricing scaled to video duration.
| Metric | Result | Context |
|---|---|---|
| Cost per Video | $1.12 (10s) | Based on $0.112 per second rate |
| Duration Options | 5s or 10s | Fixed durations for consistent output quality |
| Maximum Inputs | 7 total | Combined: elements + reference images + start frame |
| Aspect Ratios | 16:9, 9:16, 1:1 | Platform-optimized formats for social, web, and mobile |
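The cost row is a flat per-second rate, so estimating a price takes a single multiplication. This sketch uses `Decimal` to avoid floating-point rounding. It assumes billing is strictly linear in the requested duration, with no minimum charge or rounding. The table implies this but does not state it outright.

```python
from decimal import Decimal

# USD per second of generated video, from the pricing table.
RATE_PER_SECOND = Decimal("0.112")


def estimate_cost(duration_s: int) -> Decimal:
    """Estimated USD price for one video, assuming purely linear billing."""
    return RATE_PER_SECOND * duration_s


for seconds in (5, 10):
    print(f"{seconds}s -> ${estimate_cost(seconds)}")
```

`estimate_cost(10)` gives `Decimal('1.120')`, which matches the $1.12 in the table. `estimate_cost(5)` gives `Decimal('0.560')`.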
Technical Specifications
| Spec | Details |
|---|---|
| Architecture | Kling O1 Reference Image to Video |
| Input Formats | Image URLs (JPG, JPEG, PNG, WebP, GIF, AVIF) + text prompts |
| Output Formats | MP4 video |
| Video Duration | 5 or 10 seconds |
| Aspect Ratios | 16:9, 9:16, 1:1 |
| Input Types | Elements (frontal + reference angles, tracked), Reference Images (style/appearance guides), Start Frame (optional first frame) |
| Prompt Syntax | @Element1, @Element2 for tracked objects; @Image1, @Image2 for references/start frame |
| License | Commercial use (Partner) |
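The "Input Types" row describes an element as a frontal image plus optional reference angles. This sketch shows one way to model that and to pre-screen URLs against the listed input formats. The attribute names `frontal_image_url` and `reference_image_urls` are hypothetical. The extension check is only a client-side heuristic, and the service may inspect the actual file content instead.

```python
import posixpath
from dataclasses import dataclass, field
from urllib.parse import urlparse

# Extensions listed under "Input Formats" in the specifications table.
ALLOWED_EXTS = {".jpg", ".jpeg", ".png", ".webp", ".gif", ".avif"}


@dataclass
class Element:
    """One tracked element: a frontal image plus optional extra angles.

    Attribute names are hypothetical; check the live schema for real keys.
    """
    frontal_image_url: str
    reference_image_urls: list[str] = field(default_factory=list)

    def urls(self) -> list[str]:
        return [self.frontal_image_url, *self.reference_image_urls]


def has_allowed_extension(url: str) -> bool:
    """True if the URL path (ignoring any query string) ends in a listed extension."""
    ext = posixpath.splitext(urlparse(url).path)[1].lower()
    return ext in ALLOWED_EXTS


def unsupported_urls(elements: list[Element], image_urls: list[str]) -> list[str]:
    """Collect every supplied URL whose extension is not in ALLOWED_EXTS."""
    all_urls = [u for e in elements for u in e.urls()] + list(image_urls)
    return [u for u in all_urls if not has_allowed_extension(u)]
```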
How It Stacks Up
Kling Video Image to Video (v2.5-turbo): Kling O1 Reference trades single-input simplicity for multi-element consistency. That makes it a better fit for narratives where several characters or objects must keep a stable identity across camera movements. Kling Video v2.5-turbo focuses on animating a single image for straightforward transformations where element tracking isn't required.
Kling 1.6 Image to Video: Kling O1 Reference adds element-level control through its reference system for complex multi-character scenes. Kling 1.6 is an established option for standard image-to-video workflows that don't need multiple references.
Related
- Kling O1 First Frame Last Frame to Video [Pro] — Video Generation
- Kling O1 Edit Video [Pro] — Video Generation
- Kling O1 Reference Video to Video [Pro] — Video Generation
- Kling O1 Reference Image to Video [Standard] — Video Generation
Capabilities
- Text prompt input
- Duration control
- Aspect ratio control
API Reference
Input Schema
Text prompt for the video. Use @Element1, @Element2 to reference elements and @Image1, @Image2 to reference images, numbered in the order they are supplied.
Additional reference images for style/appearance. Reference in prompt as @Image1, @Image2, etc. Maximum 7 total (elements + reference images + start image).
Elements (characters/objects) to include in the video. Reference in prompt as @Element1, @Element2, etc. Maximum 7 total (elements + reference images + start image).
Video duration in seconds. Default value: "5". Possible values: 3, 4, 5, 6, 7, 8, 9, 10.
The aspect ratio of the generated video frame. Default value: "16:9". Possible values: 16:9, 9:16, 1:1.
Output Schema
The generated video.
Input Example
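Here is a hypothetical input payload that combines one tracked element, a start frame, and a style reference. It is written as a Python dict and printed as JSON. This page confirms only `duration` and `aspect_ratio` as field names. The names `prompt`, `elements`, `image_urls`, `frontal_image_url`, and `reference_image_urls` are assumptions. The three inputs stay well under the 7-input cap.

```python
import json


def example_input() -> dict:
    """Hypothetical request body; most keys are assumed, not confirmed."""
    return {
        "prompt": (
            "Take @Image1 as the start frame. @Element1 walks toward the camera "
            "in the style of @Image2."
        ),
        # One tracked element: frontal image plus one extra angle.
        "elements": [
            {
                "frontal_image_url": "https://example.com/hero_front.png",
                "reference_image_urls": ["https://example.com/hero_side.png"],
            }
        ],
        # @Image1 (start frame) and @Image2 (style reference), in order.
        "image_urls": [
            "https://example.com/start_frame.png",
            "https://example.com/style.jpg",
        ],
        "duration": "5",
        "aspect_ratio": "16:9",
    }


print(json.dumps(example_input(), indent=2))
```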
Output Example
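The Output Schema only says the response contains "the generated video". This sketch assumes the video arrives as a `video` object whose `url` key points at the MP4. Adjust the keys if the live schema differs. `save_video` uses the standard library's `urllib.request.urlretrieve` to download the file.

```python
import urllib.request


def video_url(result: dict) -> str:
    """Return the MP4 URL from a response dict.

    Assumes the shape {"video": {"url": ...}}; raises ValueError otherwise.
    """
    try:
        return result["video"]["url"]
    except (KeyError, TypeError) as exc:
        raise ValueError(f"unexpected response shape: {result!r}") from exc


def save_video(result: dict, path: str) -> str:
    """Download the generated video to `path` and return the local path."""
    local_path, _headers = urllib.request.urlretrieve(video_url(result), path)
    return local_path
```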
Limitations
- duration is restricted to: 3, 4, 5, 6, 7, 8, 9, 10
- aspect_ratio is restricted to: 16:9, 9:16, 1:1
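`duration` and `aspect_ratio` have fixed value sets and defaults, so you can normalize a payload before sending it. This sketch assumes `duration` travels as a string, since its default is shown quoted as "5". `with_defaults` is a hypothetical helper.

```python
# Allowed values and defaults as listed in the Input Schema and Limitations.
ALLOWED_DURATIONS = {str(n) for n in range(3, 11)}  # "3" through "10"
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16", "1:1"}
DEFAULTS = {"duration": "5", "aspect_ratio": "16:9"}


def with_defaults(payload: dict) -> dict:
    """Return a copy of payload with defaults filled in and values validated."""
    merged = {**DEFAULTS, **payload}
    # Accept 7 as well as "7" (string encoding is an assumption).
    merged["duration"] = str(merged["duration"])
    if merged["duration"] not in ALLOWED_DURATIONS:
        raise ValueError(f"duration must be 3..10, got {merged['duration']!r}")
    if merged["aspect_ratio"] not in ALLOWED_ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect_ratio {merged['aspect_ratio']!r}")
    return merged
```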