Kling Video 2.6 Motion Control Prompt Guide

Kling 2.6 Motion Control extracts choreography from reference videos and applies it to your character images. Prompts should describe context and environment rather than motion. Use Image Orientation for portrait animations with camera movement (max 10s), and Video Orientation for full-body performances (max 30s).

Last updated: 1/11/2026 · Edited by: Zachary Roth · Read time: 7 minutes

Rethinking Video Prompts for Motion Transfer

Motion transfer differs fundamentally from text-to-video generation. Rather than interpreting abstract motion descriptions, Kling 2.6 Motion Control uses a reference video as the movement blueprint. Your character image supplies visual identity, the reference video provides choreography, and your prompt establishes contextual guidance that helps the model blend these elements coherently.

This three-input architecture shifts what effective prompts must accomplish. Research on video diffusion transformers demonstrates that motion patterns can be extracted from reference videos through attention mechanisms and applied to newly synthesized content while preserving appearance from source images.1 When crafting prompts for motion control, you provide environmental framing, lighting conditions, and stylistic direction that help the model render your character within the transferred motion pattern.

Quick Start

import { fal } from "@fal-ai/client";

// Character image supplies identity, the reference video supplies motion,
// and the prompt supplies environmental and stylistic context.
const result = await fal.subscribe(
  "fal-ai/kling-video/v2.6/standard/motion-control",
  {
    input: {
      image_url: "https://example.com/character.png",
      video_url: "https://example.com/dance-reference.mp4",
      character_orientation: "video", // full-body mode, up to 30s
      prompt:
        "A hip-hop dancer performing in an urban environment, graffiti walls, golden hour lighting",
    },
  }
);

console.log(result.data.video.url);


API Parameters

The motion control endpoint accepts these primary parameters:

Parameter | Type | Required | Description
image_url | string | Yes | Character image URL. Subject should have clear body proportions, avoid occlusion, and occupy more than 5% of the image area
video_url | string | Yes | Reference video URL. Must contain a realistic character with a visible upper body or full body, including the head
character_orientation | enum | Yes | "image" (max 10s) or "video" (max 30s)
prompt | string | No | Environmental and stylistic context
keep_original_sound | boolean | No | Preserve the reference video's audio. Default: true

Input requirements: Images accept JPG, PNG, WEBP, GIF, AVIF formats. Videos accept MP4, MOV, WEBM, M4V, GIF formats.

Character Orientation Modes

The character_orientation parameter determines how the model interprets spatial information and constrains output duration.

Image Orientation preserves your character image's pose and facing direction while adopting movements from the reference video. This mode excels when camera movements are the primary creative goal: pans, tilts, and tracking shots perform well here. Maximum duration is 10 seconds.

Video Orientation transfers both motion and spatial orientation from the reference video. Body positioning, turns, and spatial relationships follow the reference video literally, making it appropriate for dance sequences, athletic movements, and elaborate choreography. Maximum duration extends to 30 seconds.
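
A minimal sketch contrasting the two modes (the URLs are placeholders; the endpoint and parameters follow the Quick Start and the table above):

import { fal } from "@fal-ai/client";

// Image Orientation: keeps the character image's facing direction.
// Suited to portrait animations driven by camera movement (max 10s).
const portrait = await fal.subscribe(
  "fal-ai/kling-video/v2.6/standard/motion-control",
  {
    input: {
      image_url: "https://example.com/portrait.png",
      video_url: "https://example.com/camera-pan-reference.mp4",
      character_orientation: "image",
      prompt: "A portrait subject in a softly lit studio",
    },
  }
);

// Video Orientation: follows the reference video's body positioning and
// turns literally. Suited to full-body choreography (max 30s).
const choreography = await fal.subscribe(
  "fal-ai/kling-video/v2.6/standard/motion-control",
  {
    input: {
      image_url: "https://example.com/dancer.png",
      video_url: "https://example.com/dance-reference.mp4",
      character_orientation: "video",
      keep_original_sound: false, // drop the reference audio track
      prompt: "A dancer on a spotlit stage",
    },
  }
);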

Structuring Effective Prompts

Since motion is already defined by your reference video, prompts function as scene-setting tools rather than motion descriptions. Focus on three elements (a combined example follows the lists below):

Character Identity Enhancement

Reinforce or modify character identity from your reference image, particularly when details are ambiguous or clothing must adapt to motion:

  • "A professional ballet dancer in elegant attire"
  • "An elderly man with distinguished gray hair and formal suit"
  • "A young athlete wearing modern sportswear"

Environmental Context

Establish where the action occurs and under what conditions:

  • "performing on a spotlit theater stage with dramatic shadows"
  • "in a sunlit park with soft afternoon light filtering through trees"
  • "inside a modern dance studio with mirrored walls"

Style Modifiers

Elevate production quality with stylistic guidance:

  • "cinematic lighting, professional photography, 4K quality"
  • "soft natural lighting, documentary style, authentic atmosphere"

Pricing and Tier Selection

Motion control is available in two tiers:

Tier | Endpoint | Cost | Best For
Standard | v2.6/standard/motion-control | $0.07/second | Portraits, simple animations, iteration
Pro | v2.6/pro/motion-control | $0.112/second | Complex choreography, production output

A 10-second Video Orientation generation costs $0.70 (Standard) or $1.12 (Pro). For iterative prompt development, use Standard tier, then switch to Pro for final renders if higher fidelity is required.
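
Cost scales linearly with duration, so a quick estimate before a batch run is straightforward (an illustrative helper using the rates from the table above):

// Per-second rates from the pricing table above (USD).
const RATES = { standard: 0.07, pro: 0.112 } as const;

const estimateCost = (seconds: number, tier: keyof typeof RATES): number =>
  seconds * RATES[tier];

estimateCost(10, "standard"); // ≈ $0.70
estimateCost(30, "pro");      // ≈ $3.36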

Common Failure Modes

Over-describing motion is the most frequent error. The reference video already defines movement. Prompts like "dancing energetically with spinning and jumping" are redundant. Focus on where and under what conditions rather than how the character moves.

Poor reference video quality undermines results regardless of prompt quality. The model requires reference videos with clear, unobstructed body positions. Inadequate lighting or partial occlusion cannot be compensated through prompt engineering.

Character-image incompatibility causes visual inconsistencies. If your reference video shows casual athletic movements but your character image wears formal attire, the model struggles to reconcile these elements. Bridge such gaps explicitly: "an elegant woman in a flowing gown adapted for movement."

Mismatched orientation mode produces suboptimal results. Using Image Orientation for complex dance routines limits motion transfer quality. Using Video Orientation for simple portrait animations may introduce unwanted character rotation.

Advanced Techniques

Layering descriptive elements produces richer results. Rather than "a dancer on stage," try "a graceful ballet dancer on a grand theater stage, soft pink lighting casting gentle shadows, audience seats visible in the darkness beyond."

Temporal consistency keywords help maintain visual stability. Include phrases like "consistent lighting," "steady camera," or "continuous motion" for smooth output without jarring transitions.

Style transfer through language enables aesthetic shifts without altering core motion. Adding "rendered in anime style" or "photorealistic with film grain" changes visual treatment while preserving choreography. Research on motion transfer demonstrates that decoupling appearance from motion enables applying movement patterns to arbitrary content while maintaining visual coherence.2
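
Because choreography is fixed by the reference video, you can sweep style language while holding the other inputs constant. A sketch of that pattern (the URLs and style strings are placeholders):

import { fal } from "@fal-ai/client";

const styles = [
  "rendered in anime style",
  "photorealistic with film grain",
  "soft natural lighting, documentary style",
];

// Same character and choreography, different visual treatments.
for (const style of styles) {
  const result = await fal.subscribe(
    "fal-ai/kling-video/v2.6/standard/motion-control",
    {
      input: {
        image_url: "https://example.com/character.png",
        video_url: "https://example.com/dance-reference.mp4",
        character_orientation: "video",
        prompt: `A dancer in a modern studio, ${style}`,
      },
    }
  );
  console.log(style, result.data.video.url);
}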

Iterative Workflow

Start each project by testing your reference video and character image combination with a minimal prompt. Something as simple as "a person performing" verifies that motion transfer functions correctly before investing time in detailed prompt crafting.

Once basic motion transfer appears satisfactory, incrementally add prompt elements. First add character identity details and generate. Then add environmental context and generate again. Finally layer in stylistic modifiers. This approach reveals which prompt elements actually improve results.
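
One way to keep that progression explicit is a list of prompt layers, generated one at a time and compared between steps (the prompts below are placeholders; generate each with the Quick Start call above):

// Progressive prompt layers: one generation per entry.
const layers = [
  "a person performing",                                   // 1. motion sanity check
  "A young hip-hop dancer in streetwear",                  // 2. + identity
  "A young hip-hop dancer in streetwear, graffiti-covered alley at dusk", // 3. + environment
  "A young hip-hop dancer in streetwear, graffiti-covered alley at dusk, cinematic lighting, 4K quality", // 4. + style
];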

Maintain a prompt library organized by use case. When you discover a prompt structure that works well for portrait videos, save it as a template. This library becomes increasingly valuable as you identify patterns across different scenarios.

Production Considerations

Motion control prompts require a different mindset than text-to-video generation. You orchestrate the fusion of your character's identity with captured motion within a context defined through language. The reference video handles choreography. Your job is establishing the world in which that motion occurs, the character who performs it, and the visual style that unifies the composition.

For production workflows, consider using webhooks for long-running requests rather than blocking on results. The fal client supports queue-based submission for integration into automated pipelines.
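
A minimal sketch of queue-based submission with a webhook (the webhook URL is a placeholder; consult the fal client documentation for exact response fields):

import { fal } from "@fal-ai/client";

// Submit without blocking; fal POSTs the result to the webhook when done.
const { request_id } = await fal.queue.submit(
  "fal-ai/kling-video/v2.6/standard/motion-control",
  {
    input: {
      image_url: "https://example.com/character.png",
      video_url: "https://example.com/dance-reference.mp4",
      character_orientation: "video",
      prompt: "A dancer on a spotlit stage",
    },
    webhookUrl: "https://your-app.example.com/fal-webhook", // placeholder
  }
);

// Or poll from a worker instead of relying on the webhook:
const status = await fal.queue.status(
  "fal-ai/kling-video/v2.6/standard/motion-control",
  { requestId: request_id, logs: true }
);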

References

  1. Pondaven, A., Siarohin, A., Tulyakov, S., Torr, P., & Pizzati, F. "Video Motion Transfer with Diffusion Transformers." CVPR 2025. https://arxiv.org/abs/2412.07776

  2. Siarohin, A., Lathuilière, S., Tulyakov, S., Ricci, E., & Sebe, N. "First Order Motion Model for Image Animation." NeurIPS 2019. https://aliaksandrsiarohin.github.io/first-order-model-website/

about the author
Zachary Roth
A generative media engineer with a focus on growth, Zach has deep expertise in building RAG architecture for complex content systems.
