Kling O1 Edit Video [Pro] Video to Video

fal-ai/kling-video/o1/video-to-video/edit
Edit an existing video using natural-language instructions, transforming subjects, settings, and style while retaining the original motion structure.
Inference
Commercial use
Partner


Kling O1 Edit: Video to Video [video editing]

Kuaishou's Kling O1 Edit delivers natural language video transformation at $0.168 per second, trading traditional masking workflows for prompt-based editing that preserves original motion structure. The model accepts up to 4 combined reference elements and images, enabling complex character swaps and environment transformations through simple text commands.

Built for: Character replacement in existing footage | Scene environment transformations | Style transfer while maintaining motion


Context-Aware Video Transformation Without Masking

Kling O1 Edit takes a fundamentally different approach from frame-by-frame editing tools: it understands the entire motion structure of your input video and applies transformations that respect camera angles, movement patterns, and spatial relationships. Where traditional video editing requires manual masking and frame-level adjustments, this model interprets natural-language instructions and applies them across the full video duration.

What this means for you:

  • Multi-reference editing: Combine up to 4 total elements and reference images in a single transformation, enabling complex character swaps with specific style references
  • Motion preservation: Original camera movements and subject motion remain intact while subjects, settings, and visual style transform according to your prompt
  • Natural language control: Direct the edit through conversational instructions rather than technical parameters. Example: "Replace the character with @Element1, maintaining the same movements and camera angles. Transform the landscape into @Image1"
  • Audio preservation: Choose to keep original audio from your source video or generate silent output through the keep_audio parameter
  • Element structure: Each element accepts one frontal image plus multiple reference angles (frontal_image_url + reference_image_urls array), giving the model comprehensive visual context for accurate transformations
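The element structure and keep_audio flag above can be sketched as a request payload. This is an illustrative sketch, not the official SDK: the helper name build_edit_payload and the video_url/prompt/elements field layout are assumptions, with only keep_audio, frontal_image_url, and reference_image_urls taken from the parameters described above.

```python
# Illustrative payload builder for the edit endpoint. The exact request
# schema is an assumption -- verify against the official fal API docs.

def build_edit_payload(video_url, prompt, elements, keep_audio=False):
    """Assemble arguments for fal-ai/kling-video/o1/video-to-video/edit."""
    if len(elements) > 4:  # combined reference limit stated above
        raise ValueError("At most 4 combined elements/reference images")
    return {
        "video_url": video_url,
        "prompt": prompt,
        "elements": [
            {
                # one frontal image plus optional extra reference angles
                "frontal_image_url": e["frontal_image_url"],
                "reference_image_urls": e.get("reference_image_urls", []),
            }
            for e in elements
        ],
        "keep_audio": keep_audio,  # default False: silent output
    }

payload = build_edit_payload(
    video_url="https://example.com/clip.mp4",
    prompt="Replace the character with @Element1, keeping the same movements",
    elements=[{"frontal_image_url": "https://example.com/front.png"}],
)

# The payload would then be submitted through the fal client, e.g.:
# import fal_client
# result = fal_client.subscribe(
#     "fal-ai/kling-video/o1/video-to-video/edit", arguments=payload)
```

Keeping payload assembly in a plain function makes the 4-reference cap easy to enforce before any paid request is sent.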

Performance That Scales

Kling O1 Edit's per-second pricing model reflects the computational complexity of motion-preserving video transformation, with costs scaling directly to your input video duration.

Metric | Result | Context
Cost per Video | $0.50-$1.68 | Based on 3-10 second input duration at $0.168/second
Input Duration | 3-10 seconds | Supports .mp4, .mov, .webm, .m4v, .gif up to 200MB
Resolution Range | 720-2160px | Accepts standard HD through 4K input resolutions
Reference Capacity | Up to 4 total | Combined limit for elements and style reference images
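The per-second pricing above reduces to simple arithmetic. A minimal estimator (the function name is mine) that also enforces the 3-10 second input range from the table:

```python
PRICE_PER_SECOND = 0.168  # USD per second of input video, per the pricing above

def estimate_cost(duration_seconds: float) -> float:
    """Estimate the edit cost for one clip; inputs must be 3-10 seconds."""
    if not 3 <= duration_seconds <= 10:
        raise ValueError("Kling O1 Edit accepts 3-10 second input videos")
    return round(duration_seconds * PRICE_PER_SECOND, 3)

print(estimate_cost(3))   # 0.504 -> the ~$0.50 floor in the table
print(estimate_cost(10))  # 1.68  -> the $1.68 ceiling
```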

Technical Specifications

Spec | Details
Architecture | Kling O1 Edit
Input Formats | Video (.mp4, .mov, .webm, .m4v, .gif); reference images (.jpg, .jpeg, .png, .webp, .gif, .avif)
Output Format | .mp4 video
Video Duration | 3-10 seconds (output matches input duration)
Audio Handling | Optional audio preservation via keep_audio parameter (default: false)
Prompt Syntax | @Element1, @Element2 for tracked elements; @Image1, @Image2 for style references
License | Commercial use via fal partnership
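Since prompts reference attached media as @Element1/@Image1 and the combined reference cap is 4, a quick pre-flight check can catch dangling references before a paid request. This regex-based helper is an illustrative sketch, not part of any fal SDK:

```python
import re

def check_prompt_references(prompt: str, n_elements: int, n_images: int) -> list[str]:
    """Return the @-references used in a prompt, validated against the
    number of elements and style images actually attached."""
    if n_elements + n_images > 4:  # combined cap from the specs above
        raise ValueError("At most 4 combined elements and reference images")
    refs = re.findall(r"@(Element|Image)(\d+)", prompt)
    for kind, idx in refs:
        limit = n_elements if kind == "Element" else n_images
        if not 1 <= int(idx) <= limit:
            raise ValueError(f"Prompt uses @{kind}{idx}, but only {limit} attached")
    return [f"@{kind}{idx}" for kind, idx in refs]

print(check_prompt_references(
    "Replace the character with @Element1. Transform the landscape into @Image1",
    n_elements=1, n_images=1))
# ['@Element1', '@Image1']
```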

How It Stacks Up

Sora 2 Video to Video - Kling O1 Edit prioritizes multi-reference element control with up to 4 combined inputs for complex character and environment transformations. Sora 2's remix capabilities emphasize broader creative reinterpretation and style transfer across longer video durations, ideal for narrative content that requires substantial visual reimagining.

Wan Video to Video - Kling O1 Edit's natural language interface eliminates technical parameter tuning, making it accessible for creators who want direct prompt-based control. Wan's video-to-video endpoint offers granular parameter control for users who need precise technical adjustments in their transformation workflows.

AnimateDiff Video to Video - Kling O1 Edit maintains original motion structure while transforming visual content, preserving the exact camera movements and subject actions from your source footage. AnimateDiff focuses on animation-style transformations and motion synthesis, serving creators building stylized or animated content from video references.