Kling O1 Reference Video to Video [Pro]

fal-ai/kling-video/o1/video-to-video/reference
Kling O1 Omni generates new shots guided by an input reference video, preserving cinematic language such as motion and camera style to produce seamless scene continuity.


Kling O1: Reference Video to Video [video generation]

Kuaishou's Kling O1 Omni delivers reference-guided video generation at $0.168 per second, preserving cinematic language across scene transitions. Trading general-purpose video generation for shot continuity control, the model maintains motion patterns and camera style from source footage. It is built for filmmakers extending existing sequences, content creators maintaining visual consistency across cuts, and production teams generating coherent multi-shot narratives.

Built for: Sequential shot generation | Cinematic style preservation | Reference-guided video continuity


Reference-Driven Continuity Architecture

Kling O1 Omni approaches video-to-video generation through reference preservation rather than style transfer alone. Where standard video generation models interpret prompts independently, this architecture anchors new frames to input video characteristics. Camera movement, motion dynamics, and visual language remain consistent across generated shots.

What this means for you:

  • Cinematic continuity: Generate follow-up shots that maintain the motion style and camera language of your reference video, not just its visual aesthetics. Use @Video1 in prompts to reference your input video: "Based on @Video1, generate the next shot. Keep the style of the video."
  • Multi-modal input: Combine the reference video with up to 4 additional elements (characters or objects, each given as a frontal_image_url plus a reference_image_urls array) and style reference images for precise control over scene composition
  • Flexible duration: Output 5-second ($0.84) or 10-second ($1.68) clips at resolutions from 720px to 2160px. Output duration is independent of input duration, which must be 3-10 seconds
  • Automatic aspect ratio: Default "auto" mode detects optimal aspect ratio from input video, or manually select 16:9, 9:16, or 1:1
  • Audio preservation: Option to retain original audio from reference video via keep_audio parameter, maintaining soundtrack continuity across generated sequences
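
The options listed above can be sketched as a request payload. This is a minimal sketch, not the documented schema: only frontal_image_url, reference_image_urls, keep_audio, and the @Video1/@Element/@Image tags come from this page. The field names video_url, elements, image_urls, duration, and aspect_ratio, and the helper names build_arguments and submit, are assumptions made for illustration. The page does not say whether the limit of four applies per reference kind or to elements and images combined, so the sketch checks each kind separately.

```python
import re
from typing import Any, Dict, List, Optional

MODEL_ID = "fal-ai/kling-video/o1/video-to-video/reference"
ASPECT_RATIOS = {"auto", "16:9", "9:16", "1:1"}


def build_arguments(
    prompt: str,
    video_url: str,
    elements: Optional[List[Dict[str, Any]]] = None,
    image_urls: Optional[List[str]] = None,
    duration: int = 5,
    aspect_ratio: str = "auto",
    keep_audio: bool = False,
) -> Dict[str, Any]:
    """Assemble a request payload (field names are assumptions, see lead-in)."""
    elements = elements or []
    image_urls = image_urls or []
    if len(elements) > 4 or len(image_urls) > 4:
        raise ValueError("at most 4 elements and 4 style images are supported")
    if duration not in (5, 10):
        raise ValueError("duration must be 5 or 10 seconds")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {aspect_ratio}")
    for element in elements:
        if "frontal_image_url" not in element:
            raise ValueError("each element needs a frontal_image_url")

    # Every @ElementN / @ImageN tag in the prompt must point at a supplied input.
    for kind, index in re.findall(r"@(Element|Image)(\d+)", prompt):
        available = len(elements) if kind == "Element" else len(image_urls)
        if not 1 <= int(index) <= available:
            raise ValueError(f"@{kind}{index} has no matching input")

    arguments: Dict[str, Any] = {
        "prompt": prompt,
        "video_url": video_url,   # assumed field name for @Video1
        "duration": duration,     # assumed field name and type
        "aspect_ratio": aspect_ratio,
        "keep_audio": keep_audio,
    }
    if elements:
        arguments["elements"] = elements      # assumed field name
    if image_urls:
        arguments["image_urls"] = image_urls  # assumed field name
    return arguments


def submit(arguments: Dict[str, Any]) -> Any:
    """Send the payload with the fal Python client (needs network and credentials)."""
    import fal_client  # third-party package: pip install fal-client

    return fal_client.subscribe(MODEL_ID, arguments=arguments)
```

Calling build_arguments("Based on @Video1, generate the next shot. Keep the style of the video.", "https://example.com/shot.mp4") returns a dictionary that can be passed to submit. The URL is a placeholder, and the field names should be checked against the model's API schema before relying on them.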

Performance That Scales

Kling O1 Omni is positioned as a specialized tool for narrative video work, trading speed for reference-guided precision.

Metric | Result | Context
Cost per Video | $0.84 (5s) or $1.68 (10s) | Based on $0.168 per second rate
Duration Options | 5-10 seconds output | Independent of 3-10 second input duration
Resolution Range | 720-2160px | Supports HD to 4K output with aspect ratio control
Reference Capacity | Video + 4 elements/images | Combine source video with character/object references
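
The per-video costs follow directly from the per-second rate. As a small worked example, assuming billing is a plain multiplication of the $0.168 rate by output seconds with no minimums or rounding rules beyond cents (the function name estimate_cost is hypothetical):

```python
from decimal import Decimal

RATE_PER_SECOND = Decimal("0.168")  # USD per second of output, from this page


def estimate_cost(output_seconds: int) -> Decimal:
    """Estimated cost in USD for one clip, rounded to cents."""
    if output_seconds <= 0:
        raise ValueError("output_seconds must be positive")
    return (RATE_PER_SECOND * output_seconds).quantize(Decimal("0.01"))


print(estimate_cost(5))   # 0.84
print(estimate_cost(10))  # 1.68

# A three-shot sequence of 5 s, 10 s, and 5 s clips:
print(sum(estimate_cost(s) for s in (5, 10, 5)))  # 3.36
```

Decimal is used instead of float so that the cent amounts come out exactly as quoted.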

Technical Specifications

Spec | Details
Architecture | Kling O1 Omni
Input Formats | Video: .mp4, .mov (3-10 seconds, max 200MB); Reference images: .jpg, .jpeg, .png, .webp, .gif, .avif
Output Formats | .mp4 video with optional audio
Aspect Ratios | auto (default), 16:9, 9:16, 1:1
Audio Handling | Optional audio preservation via keep_audio parameter (default: false)
Reference System | @Video1 for reference video, @Element1-4 for characters/objects (frontal + reference angles), @Image1-4 for style references
License | Commercial use via fal partner agreement
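
The input limits above can be checked locally before uploading anything. The following is a rough pre-flight sketch under stated assumptions: the function names are hypothetical, "200MB" is interpreted as 200 × 1024 × 1024 bytes (the page does not say whether it means MB or MiB), and the caller is expected to supply the file size and duration obtained by other means.

```python
from pathlib import PurePath
from typing import List

VIDEO_EXTENSIONS = {".mp4", ".mov"}
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".gif", ".avif"}
MAX_VIDEO_BYTES = 200 * 1024 * 1024  # assumption: "200MB" read as MiB


def check_reference_video(
    filename: str, size_bytes: int, duration_seconds: float
) -> List[str]:
    """Return a list of problems; an empty list means the video looks acceptable."""
    problems = []
    if PurePath(filename).suffix.lower() not in VIDEO_EXTENSIONS:
        problems.append("video must be .mp4 or .mov")
    if size_bytes > MAX_VIDEO_BYTES:
        problems.append("video exceeds the 200MB limit")
    if not 3 <= duration_seconds <= 10:
        problems.append("video must be 3-10 seconds long")
    return problems


def check_reference_image(filename: str) -> List[str]:
    """Return a list of problems for a character or style reference image."""
    if PurePath(filename).suffix.lower() not in IMAGE_EXTENSIONS:
        return ["unsupported image format"]
    return []


print(check_reference_video("shot_01.MP4", 48_000_000, 6.5))  # []
print(check_reference_video("shot_02.avi", 48_000_000, 2.0))
```

The checks only look at the file name, size, and duration. They do not inspect codecs or resolution, which the service may also validate.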



How It Stacks Up

Sora 2 Video to Video - Kling O1 Omni prioritizes reference-based continuity for maintaining cinematic language across shots, ideal for sequential narrative work where motion style consistency matters. Sora 2 Video to Video emphasizes remix capabilities with broader creative transformation options for exploratory video generation workflows.

Wan Video to Video - Kling O1 Omni's multi-element reference system (video + 4 additional inputs) enables precise character and object control for narrative sequences. Wan Video to Video offers alternative approaches to video transformation with different architectural priorities for style and motion handling.

AnimateDiff Video to Video - Kling O1 Omni maintains camera language and motion dynamics from reference footage for professional continuity requirements. AnimateDiff Video to Video provides animation-focused video generation with distinct motion synthesis capabilities suited for stylized content creation.