fal-ai/kling-video/o1/video-to-video/reference

Kling O1 Omni generates new shots guided by an input reference video, preserving cinematic language such as motion and camera style to produce seamless scene continuity.

Kling O1: Reference Video to Video [video generation]

Kuaishou's Kling O1 Omni delivers reference-guided video generation at $0.168 per second, preserving cinematic language across scene transitions. Trading general-purpose video generation for shot continuity control, the model maintains motion patterns and camera style from source footage. It is built for filmmakers extending existing sequences, content creators maintaining visual consistency across cuts, and production teams generating coherent multi-shot narratives.

Built for: Sequential shot generation | Cinematic style preservation | Reference-guided video continuity

Reference-Driven Continuity Architecture

Kling O1 Omni approaches video-to-video generation through reference preservation rather than style transfer alone. Where standard video generation models interpret prompts independently, this architecture anchors new frames to input video characteristics. Camera movement, motion dynamics, and visual language remain consistent across generated shots.

What this means for you:

Cinematic continuity: Generate follow-up shots that maintain the motion style and camera language of your reference video, not just its visual aesthetics. Use @Video1 in prompts to reference your input video: "Based on @Video1, generate the next shot. Keep the style of the video."
Multi-modal input: Combine reference video with up to 4 additional elements (character images with frontal_image_url + reference_image_urls arrays) and style reference images for precise control over scene composition
Flexible duration: Output 5-second ($0.84) or 10-second ($1.68) clips at resolutions from 720px to 2160px. Output duration is independent of the input video's duration, which must be 3-10 seconds
Automatic aspect ratio: Default "auto" mode detects optimal aspect ratio from input video, or manually select 16:9, 9:16, or 1:1
Audio preservation: Option to retain original audio from reference video via keep_audio parameter, maintaining soundtrack continuity across generated sequences
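
The inputs listed above could be assembled into a request payload along the following lines. This is a minimal sketch, not the documented client code: frontal_image_url, reference_image_urls, and keep_audio are named on this page, while the keys prompt, video_url, image_urls, elements, duration, and aspect_ratio are assumptions inferred from the field labels, and build_arguments is a hypothetical helper. The 5-10 second duration check follows the figures quoted on this page.

```python
from typing import Any, Dict, List, Optional

ALLOWED_ASPECT_RATIOS = {"auto", "16:9", "9:16", "1:1"}
MAX_ELEMENTS = 4  # "up to 4 additional elements", per this page


def build_arguments(
    prompt: str,
    video_url: str,
    elements: Optional[List[Dict[str, Any]]] = None,
    image_urls: Optional[List[str]] = None,
    duration: int = 5,
    aspect_ratio: str = "auto",
    keep_audio: bool = False,
) -> Dict[str, Any]:
    """Build a request payload. Key names other than frontal_image_url,
    reference_image_urls, and keep_audio are assumed, not confirmed."""
    elements = elements or []
    if len(elements) > MAX_ELEMENTS:
        raise ValueError(f"at most {MAX_ELEMENTS} elements are supported")
    for position, element in enumerate(elements, start=1):
        # The frontal image is the required field of each element.
        if not element.get("frontal_image_url"):
            raise ValueError(f"Element {position} is missing frontal_image_url")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {aspect_ratio}")
    if not 5 <= duration <= 10:
        raise ValueError("output duration must be between 5 and 10 seconds")

    arguments: Dict[str, Any] = {
        "prompt": prompt,
        "video_url": video_url,
        "duration": duration,
        "aspect_ratio": aspect_ratio,
        "keep_audio": keep_audio,
    }
    # Optional inputs are only included when supplied.
    if elements:
        arguments["elements"] = elements
    if image_urls:
        arguments["image_urls"] = image_urls
    return arguments


example = build_arguments(
    prompt="Based on @Video1, generate the next shot. Keep the style of the video.",
    video_url="https://example.com/reference.mp4",  # placeholder URL
    elements=[{
        "frontal_image_url": "https://example.com/front.png",
        "reference_image_urls": ["https://example.com/side.png"],
    }],
)
```

If the fal Python client is installed, a dictionary like this would typically be passed as the arguments of a call such as fal_client.subscribe("fal-ai/kling-video/o1/video-to-video/reference", arguments=example); check the endpoint schema for the exact key names and value types before relying on it.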

Performance That Scales

Kling O1 Omni is positioned as a specialized tool for narrative video work, trading speed for reference-guided precision.

Metric	Result	Context
Cost per Video	$0.84 (5s) or $1.68 (10s)	Based on $0.168 per second rate
Duration Options	5-10 seconds output	Independent of 3-10 second input duration
Resolution Range	720-2160px	Supports HD to 4K output with aspect ratio control
Reference Capacity	Video + 4 elements/images	Combine source video with character/object references
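
The per-video costs in the table follow directly from the per-second rate. As a small worked example, assuming billing is simply the rate multiplied by the output duration (estimate_cost is a hypothetical helper, not part of any fal SDK):

```python
from decimal import Decimal

RATE_PER_SECOND = Decimal("0.168")  # USD per output second, as quoted on this page


def estimate_cost(output_seconds: int) -> Decimal:
    """Estimate the cost of one clip, assuming cost = rate x output duration."""
    if output_seconds <= 0:
        raise ValueError("output_seconds must be positive")
    return RATE_PER_SECOND * output_seconds


for seconds in (5, 10):
    print(f"{seconds}s clip: ${estimate_cost(seconds):.2f}")
# 5s clip: $0.84
# 10s clip: $1.68
```

Decimal is used instead of float so that the results match the quoted prices exactly.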

Technical Specifications

Spec	Details
Architecture	Kling O1 Omni
Input Formats	Video: .mp4, .mov (3-10 seconds, max 200MB); Reference images: .jpg, .jpeg, .png, .webp, .gif, .avif
Output Formats	.mp4 video with optional audio
Aspect Ratios	auto (default), 16:9, 9:16, 1:1
Audio Handling	Optional audio preservation via keep_audio parameter (default: false)
Reference System	@Video1 for reference video, @Element1-4 for characters/objects (frontal + reference angles), @Image1-4 for style references
License	Commercial use via fal partner agreement
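
Because prompts refer to inputs by tag (@Video1, @Element1-4, @Image1-4), it can help to check a prompt against the inputs actually supplied before submitting a request. The sketch below is illustrative only: find_unresolved_tags is a hypothetical helper, and the per-type limits are taken from the Reference System row above rather than from a published schema.

```python
import re
from typing import Dict, List

TAG_PATTERN = re.compile(r"@(Video|Element|Image)(\d+)")
# Highest tag index per input type, as listed in the Reference System row.
MAX_INDEX = {"Video": 1, "Element": 4, "Image": 4}


def find_unresolved_tags(prompt: str, provided: Dict[str, int]) -> List[str]:
    """Return the tags in the prompt that point at inputs which were not supplied.

    provided maps an input type ("Video", "Element", "Image") to the number
    of inputs of that type included in the request.
    """
    unresolved = []
    for kind, index_text in TAG_PATTERN.findall(prompt):
        index = int(index_text)
        limit = min(provided.get(kind, 0), MAX_INDEX[kind])
        if index < 1 or index > limit:
            unresolved.append(f"@{kind}{index_text}")
    return unresolved


missing = find_unresolved_tags(
    "Based on @Video1, place @Element2 in the scene in the style of @Image1",
    {"Video": 1, "Element": 1, "Image": 1},
)
print(missing)  # ['@Element2']
```

Here @Element2 is flagged because only one element was supplied, while @Video1 and @Image1 resolve to provided inputs.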

How It Stacks Up

Sora 2 Video to Video - Kling O1 Omni prioritizes reference-based continuity for maintaining cinematic language across shots, ideal for sequential narrative work where motion style consistency matters. Sora 2 Video to Video emphasizes remix capabilities with broader creative transformation options for exploratory video generation workflows.

Wan Video to Video - Kling O1 Omni's multi-element reference system (video + 4 additional inputs) enables precise character and object control for narrative sequences. Wan Video to Video offers alternative approaches to video transformation with different architectural priorities for style and motion handling.

AnimateDiff Video to Video - Kling O1 Omni maintains camera language and motion dynamics from reference footage for professional continuity requirements. AnimateDiff Video to Video provides animation-focused video generation with distinct motion synthesis capabilities suited for stylized content creation.