Kling O1 Reference Image to Video [Pro] Image to Video
Kling O1: Reference Image to Video [image-to-video]
Kuaishou Technology's Kling O1 Reference transforms static images into consistent video sequences at $0.112 per second, supporting up to 7 simultaneous reference inputs. Trading single-input simplicity for multi-element consistency, it maintains stable character and object identity across complex compositions through specialized reference conditioning. Built for narrative sequences requiring character continuity, product demonstrations with consistent styling, and complex scene transitions where object identity must persist frame-to-frame.
Built for: Multi-character storytelling | Brand-consistent product videos | Complex scene transitions with stable elements
Multi-Reference Architecture for Consistent Generation
Kling O1 Reference uses a specialized reference-conditioning system that processes a frontal image and multiple reference angles for each element, then maintains each element's identity throughout the generated video. Unlike standard image-to-video models, which treat the input as a single keyframe, this architecture tracks multiple elements independently and preserves their visual characteristics across camera movements and scene changes.
What this means for you:
- Up to 7 simultaneous inputs: Combine tracked elements (characters/objects with frontal + reference angles), style reference images, and an optional start frame in a single generation. Reference them in prompts as @Element1, @Element2 for tracked objects or @Image1, @Image2 for style references and start frames.
- Element-level consistency: Each tracked element supports one frontal image plus multiple reference angles, so characters and objects keep their identity through complex camera movements and transitions.
- Flexible duration control: Generate 5-second ($0.56) or 10-second ($1.12) sequences in 16:9, 9:16, or 1:1 aspect ratios to match widescreen, vertical, or square platform formats.
- Prompt-driven scene control: Direct camera movements, lighting, and transitions through natural language while the model maintains element consistency. Specify "Take @Image1 as the start frame" to control the video's opening frame.
Example input structure: 2 tracked elements (character + object) + 2 style references + 1 start frame = 5 of 7 available inputs
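The input budget above can be modeled with a small data structure. The sketch below is a hypothetical client-side helper, not part of any official SDK: the names `Element`, `GenerationInputs`, `frontal_url`, and `reference_urls` are assumptions. It also assumes, as the example structure implies, that each tracked element occupies one input slot no matter how many reference angles it carries.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAX_INPUTS = 7  # Total input limit stated for Kling O1 Reference


@dataclass
class Element:
    """A tracked character or object (hypothetical structure)."""
    frontal_url: str
    # Extra angles of the same element; assumed NOT to count
    # separately toward the 7-input limit.
    reference_urls: List[str] = field(default_factory=list)


@dataclass
class GenerationInputs:
    """Hypothetical container for one generation's inputs."""
    elements: List[Element] = field(default_factory=list)
    style_images: List[str] = field(default_factory=list)
    start_frame: Optional[str] = None

    def input_count(self) -> int:
        # One slot per element, one per style image, one for the optional start frame.
        return len(self.elements) + len(self.style_images) + (1 if self.start_frame else 0)

    def validate(self) -> None:
        count = self.input_count()
        if count > MAX_INPUTS:
            raise ValueError(f"{count} inputs exceeds the limit of {MAX_INPUTS}")


# The example from the text: 2 elements + 2 style references + 1 start frame.
inputs = GenerationInputs(
    elements=[
        Element("https://example.com/hero_front.png", ["https://example.com/hero_side.png"]),
        Element("https://example.com/mug_front.png"),
    ],
    style_images=["https://example.com/style_a.jpg", "https://example.com/style_b.jpg"],
    start_frame="https://example.com/opening.png",
)
inputs.validate()
print(f"{inputs.input_count()} of {MAX_INPUTS} inputs used")  # 5 of 7 inputs used
```

Counting elements as single slots keeps the check consistent with the stated example; if the service counts reference angles differently, only `input_count` would need to change.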
Performance Scaling
Kling O1 Reference prioritizes multi-element consistency over generation speed, with pricing scaled to video duration.
| Metric | Result | Context |
|---|---|---|
| Cost per Video | $0.56 (5s) or $1.12 (10s) | Based on $0.112 per second rate |
| Duration Options | 5s or 10s | Fixed durations for consistent output quality |
| Maximum Inputs | 7 total | Combined: elements + reference images + start frame |
| Aspect Ratios | 16:9, 9:16, 1:1 | Platform-optimized formats for social, web, and mobile |
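The pricing in the table is the per-second rate multiplied by the duration. A minimal sketch follows, using `Decimal` to avoid floating-point rounding in currency values. The function name `estimate_cost` and the client-side validation are illustrative assumptions. The allowed values come from the table, and because pricing is stated per second only, aspect ratio is assumed not to affect cost.

```python
from decimal import Decimal

RATE_PER_SECOND = Decimal("0.112")  # USD per second of output video
ALLOWED_DURATIONS = {5, 10}          # seconds; fixed options only
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16", "1:1"}


def estimate_cost(duration_s: int, aspect_ratio: str = "16:9") -> Decimal:
    """Return the estimated USD cost of one generation (hypothetical helper)."""
    if duration_s not in ALLOWED_DURATIONS:
        raise ValueError(f"duration must be one of {sorted(ALLOWED_DURATIONS)}, got {duration_s}")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {aspect_ratio}")
    # Assumption: aspect ratio is validated but does not change the price.
    return (RATE_PER_SECOND * duration_s).quantize(Decimal("0.01"))


for seconds in sorted(ALLOWED_DURATIONS):
    print(f"{seconds}s -> ${estimate_cost(seconds)}")
# 5s -> $0.56
# 10s -> $1.12
```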
Technical Specifications
| Spec | Details |
|---|---|
| Architecture | Kling O1 Reference Image to Video |
| Input Formats | Image URLs (JPG, JPEG, PNG, WebP, GIF, AVIF) + text prompts |
| Output Formats | MP4 video |
| Video Duration | 5 or 10 seconds |
| Aspect Ratios | 16:9, 9:16, 1:1 |
| Input Types | Elements (frontal + reference angles, tracked), Reference Images (style/appearance guides), Start Frame (optional first frame) |
| Prompt Syntax | @Element1, @Element2 for tracked objects; @Image1, @Image2 for references/start frame |
| License | Commercial use (Partner) |
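Because prompts address inputs by token, a client can catch dangling references before submitting a job. The sketch below is hypothetical. It assumes `@ElementN` is a 1-based index into the tracked elements and `@ImageN` is a 1-based index into a single combined list of style references and the start frame; the actual ordering rules are not specified on this page. It also checks image URLs against the accepted file types listed above.

```python
import re
from pathlib import PurePosixPath
from typing import List
from urllib.parse import urlparse

ACCEPTED_EXTENSIONS = {"jpg", "jpeg", "png", "webp", "gif", "avif"}
TOKEN_PATTERN = re.compile(r"@(Element|Image)(\d+)")


def check_image_url(url: str) -> None:
    """Reject URLs whose path does not end in an accepted image extension."""
    suffix = PurePosixPath(urlparse(url).path).suffix  # e.g. ".PNG"
    if suffix.lower().lstrip(".") not in ACCEPTED_EXTENSIONS:
        raise ValueError(f"unsupported file type for {url!r}")


def find_dangling_references(prompt: str, num_elements: int, num_images: int) -> List[str]:
    """Return prompt tokens that point past the supplied inputs.

    Assumption (hypothetical): @ElementN and @ImageN are 1-based indices
    into the element list and the combined image list, respectively.
    """
    limits = {"Element": num_elements, "Image": num_images}
    dangling: List[str] = []
    for kind, index in TOKEN_PATTERN.findall(prompt):
        token = f"@{kind}{index}"
        if not 1 <= int(index) <= limits[kind] and token not in dangling:
            dangling.append(token)
    return dangling


prompt = (
    "Take @Image1 as the start frame. @Element1 hands @Element2 to the camera "
    "in the style of @Image2, then @Element3 walks in."
)
for url in ["https://example.com/hero.PNG", "https://example.com/mug.webp"]:
    check_image_url(url)
print(find_dangling_references(prompt, num_elements=2, num_images=2))  # ['@Element3']
```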
How It Stacks Up
Kling Video Image to Video (v2.5-turbo) - Kling O1 Reference trades single-input simplicity for multi-element consistency, making it ideal for narratives requiring multiple characters or objects with stable identity across camera movements. Kling Video v2.5-turbo prioritizes single-image animation for straightforward transformations where element tracking isn't required.
Kling 1.6 Image to Video - Kling O1 Reference offers element-level control through its reference system for complex multi-character scenes. Kling 1.6 suits standard image-to-video workflows that do not need multi-reference capabilities.