Kling O1 Reference Image to Video [Pro]

fal-ai/kling-video/o1/reference-to-video
Transform images, elements, and text into consistent, high-quality video scenes, ensuring stable character identity, object details, and environments.

Kling O1: Reference Image to Video [image-to-video]

Kuaishou Technology's Kling O1 Reference turns static images into consistent video sequences at $0.112 per second and accepts up to 7 reference inputs per generation. It trades single-input simplicity for multi-element consistency: reference conditioning keeps character and object identity stable across complex compositions. It is built for narrative sequences that need character continuity, product demonstrations with consistent styling, and complex scene transitions where object identity must persist from frame to frame.

Built for: Multi-character storytelling | Brand-consistent product videos | Complex scene transitions with stable elements


Multi-Reference Architecture for Consistent Generation

Kling O1 Reference uses a specialized reference-conditioning system that processes frontal images and multiple reference angles per element, then maintains their identity throughout generated video sequences. Unlike standard image-to-video models that treat input as a single keyframe, this architecture tracks multiple elements independently while preserving their visual characteristics across camera movements and scene changes.

What this means for you:

  • Up to 7 simultaneous inputs: Combine tracked elements (characters/objects with frontal + reference angles), style reference images, and an optional start frame in a single generation. Reference them in prompts as @Element1, @Element2 for tracked objects or @Image1, @Image2 for style references and start frames.
  • Element-level consistency: Each tracked element takes one frontal image plus multiple reference angles, so characters and objects keep their identity through complex camera movements and transitions.
  • Fixed duration options: Generate 5-second ($0.56) or 10-second ($1.12) sequences at 16:9, 9:16, or 1:1 aspect ratios to match the target platform.
  • Prompt-driven scene control: Direct camera movements, lighting, and transitions through natural language while the model maintains element consistency. Specify "Take @Image1 as the start frame" to control the video's opening frame.

Example input structure: 2 tracked elements (character + object) + 2 style references + 1 start frame = 5 of 7 available inputs
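The input structure above can be sketched as a small payload builder. This is an illustrative sketch, not the documented request schema: the field names (`elements`, `frontal_image_url`, `reference_image_urls`, `image_urls`, `duration`, `aspect_ratio`), the `Element` class, and the `build_request` helper are assumptions, and the URLs are placeholders. Inputs are counted the way the example above counts them (one per tracked element, one per style reference or start frame).

```python
from dataclasses import dataclass, field
from typing import List

MAX_INPUTS = 7  # combined cap on elements + reference images + start frame


@dataclass
class Element:
    """Hypothetical container for a tracked character or object."""
    frontal_image_url: str
    reference_image_urls: List[str] = field(default_factory=list)


def build_request(prompt, elements, image_urls, duration=5, aspect_ratio="16:9"):
    """Assemble a request payload. All field names are assumed, not confirmed."""
    if duration not in (5, 10):
        raise ValueError("duration must be 5 or 10 seconds")
    if aspect_ratio not in ("16:9", "9:16", "1:1"):
        raise ValueError("aspect_ratio must be 16:9, 9:16, or 1:1")
    # Each element counts once; each style reference or start frame counts once.
    total = len(elements) + len(image_urls)
    if total > MAX_INPUTS:
        raise ValueError(f"{total} inputs exceed the limit of {MAX_INPUTS}")
    return {
        "prompt": prompt,
        "elements": [
            {
                "frontal_image_url": e.frontal_image_url,
                "reference_image_urls": list(e.reference_image_urls),
            }
            for e in elements
        ],
        "image_urls": list(image_urls),
        "duration": duration,
        "aspect_ratio": aspect_ratio,
    }


# 2 tracked elements + 2 style references + 1 start frame = 5 of 7 inputs.
payload = build_request(
    prompt="Take @Image1 as the start frame. @Element1 hands @Element2 to the camera.",
    elements=[
        Element("https://example.com/character_front.png",
                ["https://example.com/character_side.png"]),
        Element("https://example.com/product_front.png"),
    ],
    image_urls=[
        "https://example.com/start_frame.png",
        "https://example.com/style_a.png",
        "https://example.com/style_b.png",
    ],
)
```

Validating the input count locally surfaces an over-limit request before any paid generation is attempted.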


Performance Scaling

Kling O1 Reference prioritizes multi-element consistency over generation speed, with pricing scaled to video duration.

Metric | Result | Context
Cost per Video | $0.56 (5s) or $1.12 (10s) | Based on $0.112 per second rate
Duration Options | 5s or 10s | Fixed durations for consistent output quality
Maximum Inputs | 7 total | Combined: elements + reference images + start frame
Aspect Ratios | 16:9, 9:16, 1:1 | Platform-optimized formats for social, web, and mobile
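The per-video costs follow directly from the per-second rate. A minimal sketch of that arithmetic, assuming cost scales linearly with duration and with the number of videos (the `estimate_cost` helper is hypothetical, for budgeting only):

```python
from decimal import Decimal

RATE_PER_SECOND = Decimal("0.112")  # USD per second of generated video
ALLOWED_DURATIONS = (5, 10)         # the only durations listed


def estimate_cost(duration_seconds: int, videos: int = 1) -> Decimal:
    """Return the estimated cost in USD for a batch of equal-length videos."""
    if duration_seconds not in ALLOWED_DURATIONS:
        raise ValueError("duration must be 5 or 10 seconds")
    if videos < 1:
        raise ValueError("videos must be at least 1")
    # Decimal avoids binary floating-point rounding in currency math.
    return RATE_PER_SECOND * duration_seconds * videos


five_second = estimate_cost(5)    # 0.112 * 5  = 0.56
ten_second = estimate_cost(10)    # 0.112 * 10 = 1.12
```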

Technical Specifications

Spec | Details
Architecture | Kling O1 Reference Image to Video
Input Formats | Image URLs (JPG, JPEG, PNG, WebP, GIF, AVIF) + text prompts
Output Formats | MP4 video
Video Duration | 5 or 10 seconds
Aspect Ratios | 16:9, 9:16, 1:1
Input Types | Elements (frontal + reference angles, tracked), Reference Images (style/appearance guides), Start Frame (optional first frame)
Prompt Syntax | @Element1, @Element2 for tracked objects; @Image1, @Image2 for references/start frame
License | Commercial use (Partner)
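Because prompts refer to inputs by name, it can help to check a prompt against the inputs actually supplied before submitting. The sketch below assumes references are numbered from 1 in the order the inputs are provided. That numbering rule and the `unresolved_references` helper are assumptions for illustration, not documented behavior.

```python
import re

# Matches the @ElementN and @ImageN tokens from the prompt syntax.
REFERENCE_PATTERN = re.compile(r"@(Element|Image)(\d+)")


def unresolved_references(prompt: str, num_elements: int, num_images: int) -> list:
    """Return prompt references that point past the supplied inputs."""
    limits = {"Element": num_elements, "Image": num_images}
    missing = []
    for kind, index in REFERENCE_PATTERN.findall(prompt):
        # Assumed 1-based numbering: @Element1 is the first element supplied.
        if not 1 <= int(index) <= limits[kind]:
            missing.append(f"@{kind}{index}")
    return missing


prompt = "Take @Image1 as the start frame. @Element1 walks toward @Element3."
problems = unresolved_references(prompt, num_elements=2, num_images=1)
```

Here `problems` lists `@Element3`, because only two elements were supplied.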



How It Stacks Up

Kling Video Image to Video (v2.5-turbo): Kling O1 Reference trades single-input simplicity for multi-element consistency, which suits narratives that need multiple characters or objects to keep a stable identity across camera movements. Kling Video v2.5-turbo focuses on single-image animation for straightforward transformations where element tracking isn't required.

Kling 1.6 Image to Video: Kling O1 Reference adds element-level control through its reference system for complex multi-character scenes. Kling 1.6 covers standard image-to-video workflows and has no multi-reference capability.