Kling 2.6 Motion Control extracts choreography from reference videos and applies it to your character images. Prompts should describe context and environment rather than motion. Use Image Orientation for portrait animations with camera movement (max 10s), and Video Orientation for full-body performances (max 30s).
Rethinking Video Prompts for Motion Transfer
Motion transfer differs fundamentally from text-to-video generation. Rather than interpreting abstract motion descriptions, Kling 2.6 Motion Control uses a reference video as the movement blueprint. Your character image supplies visual identity, the reference video provides choreography, and your prompt establishes contextual guidance that helps the model blend these elements coherently.
This three-input architecture shifts what effective prompts must accomplish. Research on video diffusion transformers demonstrates that motion patterns can be extracted from reference videos through attention mechanisms and applied to newly synthesized content while preserving appearance from source images.[1] When crafting prompts for motion control, you provide environmental framing, lighting conditions, and stylistic direction that help the model render your character within the transferred motion pattern.
Quick Start
```typescript
import { fal } from "@fal-ai/client";

const result = await fal.subscribe(
  "fal-ai/kling-video/v2.6/standard/motion-control",
  {
    input: {
      image_url: "https://example.com/character.png",
      video_url: "https://example.com/dance-reference.mp4",
      character_orientation: "video",
      prompt:
        "A hip-hop dancer performing in an urban environment, graffiti walls, golden hour lighting",
    },
  }
);

console.log(result.data.video.url);
```
API Parameters
The motion control endpoint accepts these primary parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `image_url` | string | Yes | Character image URL. Subject should have clear body proportions, avoid occlusion, and occupy >5% of the image area |
| `video_url` | string | Yes | Reference video URL. Must contain a realistic character with a visible upper body or full body, including the head |
| `character_orientation` | enum | Yes | `"image"` (max 10s) or `"video"` (max 30s) |
| `prompt` | string | No | Environmental and stylistic context |
| `keep_original_sound` | boolean | No | Preserve the reference video's audio. Default: `true` |
Input requirements: Images accept JPG, PNG, WEBP, GIF, AVIF formats. Videos accept MP4, MOV, WEBM, M4V, GIF formats.
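For reference while integrating, the table above can be expressed as a TypeScript shape. The type and the validation helper below are illustrative only: the fal client does not export a `MotionControlInput` type, and these are minimal client-side sanity checks, not the API's own validation.

```typescript
// Illustrative shape for the parameters above; this type name is the
// article's own, not exported by the fal SDK.
type MotionControlInput = {
  image_url: string;                        // JPG, PNG, WEBP, GIF, or AVIF
  video_url: string;                        // MP4, MOV, WEBM, M4V, or GIF
  character_orientation: "image" | "video"; // "image" = max 10s, "video" = max 30s
  prompt?: string;                          // environmental and stylistic context
  keep_original_sound?: boolean;            // defaults to true
};

// Minimal client-side sanity checks before submitting a request.
function validateInput(input: MotionControlInput): string[] {
  const errors: string[] = [];
  if (!/^https?:\/\//.test(input.image_url)) errors.push("image_url must be an http(s) URL");
  if (!/^https?:\/\//.test(input.video_url)) errors.push("video_url must be an http(s) URL");
  return errors;
}
```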
Character Orientation Modes
The character_orientation parameter determines how the model interprets spatial information and constrains output duration.
Image Orientation preserves your reference image's pose and facing direction while adopting movements from the reference video. This mode excels when camera movements are the primary creative goal. Pans, tilts, and tracking shots perform well here. Maximum duration is 10 seconds.
Video Orientation transfers both motion and spatial orientation from the reference video. Body positioning, turns, and spatial relationships follow the reference video literally, making it appropriate for dance sequences, athletic movements, and elaborate choreography. Maximum duration extends to 30 seconds.
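As a rule of thumb, the mode choice follows from the clip's intent and length. The helper below is a hypothetical sketch of that decision; the function and option names are this article's own convention.

```typescript
// Hypothetical helper (not part of the fal SDK): choose an orientation
// mode per the constraints described above.
function chooseOrientation(opts: {
  fullBodyChoreography: boolean; // dance, athletics, turns
  referenceSeconds: number;      // duration of the reference video
}): "image" | "video" {
  if (opts.fullBodyChoreography) {
    if (opts.referenceSeconds > 30) {
      throw new Error("Video Orientation supports at most 30 seconds");
    }
    return "video"; // follows the reference's spatial orientation literally
  }
  if (opts.referenceSeconds > 10) {
    throw new Error("Image Orientation supports at most 10 seconds");
  }
  return "image"; // preserves the image's pose and facing direction
}
```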
Structuring Effective Prompts
Since motion is already defined by your reference video, prompts function as scene-setting tools rather than motion descriptions. Focus on three elements, combined into a single prompt in the sketch that follows the examples below:
Character Identity Enhancement
Reinforce or modify character identity from your reference image, particularly when details are ambiguous or clothing must adapt to motion:
- "A professional ballet dancer in elegant attire"
- "An elderly man with distinguished gray hair and formal suit"
- "A young athlete wearing modern sportswear"
Environmental Context
Establish where the action occurs and under what conditions:
- "performing on a spotlit theater stage with dramatic shadows"
- "in a sunlit park with soft afternoon light filtering through trees"
- "inside a modern dance studio with mirrored walls"
Style Modifiers
Elevate production quality with stylistic guidance:
- "cinematic lighting, professional photography, 4K quality"
- "soft natural lighting, documentary style, authentic atmosphere"
Pricing and Tier Selection
Motion control is available in two tiers:
| Tier | Endpoint | Cost | Best For |
|---|---|---|---|
| Standard | v2.6/standard/motion-control | $0.07/second | Portraits, simple animations, iteration |
| Pro | v2.6/pro/motion-control | $0.112/second | Complex choreography, production output |
A 10-second Video Orientation generation costs $0.70 (Standard) or $1.12 (Pro). For iterative prompt development, use Standard tier, then switch to Pro for final renders if higher fidelity is required.
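Budgeting is a straightforward rate-times-duration calculation. A small sketch using the per-second rates from the table above (floating-point output is approximate):

```typescript
// Per-second rates from the pricing table above.
const RATE_PER_SECOND = { standard: 0.07, pro: 0.112 } as const;

function estimateCost(
  tier: keyof typeof RATE_PER_SECOND,
  seconds: number
): number {
  return RATE_PER_SECOND[tier] * seconds;
}

estimateCost("standard", 10); // ≈ 0.70 USD
estimateCost("pro", 10);      // ≈ 1.12 USD
```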
Common Failure Modes
Over-describing motion is the most frequent error. The reference video already defines movement. Prompts like "dancing energetically with spinning and jumping" are redundant. Focus on where and under what conditions rather than how the character moves.
Poor reference video quality undermines results regardless of prompt quality. The model requires reference videos with clear, unobstructed body positions. Inadequate lighting or partial occlusion cannot be compensated through prompt engineering.
Character-image incompatibility causes visual inconsistencies. If your reference video shows casual athletic movements but your character image wears formal attire, the model struggles to reconcile these elements. Bridge such gaps explicitly: "an elegant woman in a flowing gown adapted for movement."
Mismatched orientation mode produces suboptimal results. Using Image Orientation for complex dance routines limits motion transfer quality. Using Video Orientation for simple portrait animations may introduce unwanted character rotation.
Advanced Techniques
Layering descriptive elements produces richer results. Rather than "a dancer on stage," try "a graceful ballet dancer on a grand theater stage, soft pink lighting casting gentle shadows, audience seats visible in the darkness beyond."
Temporal consistency keywords help maintain visual stability. Include phrases like "consistent lighting," "steady camera," or "continuous motion" for smooth output without jarring transitions.
Style transfer through language enables aesthetic shifts without altering core motion. Adding "rendered in anime style" or "photorealistic with film grain" changes visual treatment while preserving choreography. Research on motion transfer demonstrates that decoupling appearance from motion enables applying movement patterns to arbitrary content while maintaining visual coherence.[2]
Iterative Workflow
Start each project by testing your reference video and character image combination with a minimal prompt. Something as simple as "a person performing" verifies that motion transfer functions correctly before investing time in detailed prompt crafting.
Once basic motion transfer appears satisfactory, incrementally add prompt elements. First add character identity details and generate. Then add environmental context and generate again. Finally layer in stylistic modifiers. This approach reveals which prompt elements actually improve results.
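In code, the layered workflow amounts to one generation per prompt stage so each addition's effect can be compared in isolation. A sketch assuming the same endpoint and placeholder URLs as the Quick Start:

```typescript
import { fal } from "@fal-ai/client";

// Layer identity, environment, then style, generating at each stage.
const stages = [
  "a person performing",                                            // smoke test
  "a graceful ballet dancer performing",                            // + identity
  "a graceful ballet dancer performing on a spotlit theater stage", // + environment
  "a graceful ballet dancer performing on a spotlit theater stage, cinematic lighting, 4K quality", // + style
];

for (const prompt of stages) {
  const result = await fal.subscribe(
    "fal-ai/kling-video/v2.6/standard/motion-control",
    {
      input: {
        image_url: "https://example.com/character.png",
        video_url: "https://example.com/dance-reference.mp4",
        character_orientation: "video",
        prompt,
      },
    }
  );
  console.log(`${prompt} -> ${result.data.video.url}`);
}
```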
Maintain a prompt library organized by use case. When you discover a prompt structure that works well for portrait videos, save it as a template. This library becomes increasingly valuable as you identify patterns across different scenarios.
Production Considerations
Motion control prompts require a different mindset than text-to-video generation. You orchestrate the fusion of your character's identity with captured motion within a context defined through language. The reference video handles choreography. Your job is establishing the world in which that motion occurs, the character who performs it, and the visual style that unifies the composition.
For production workflows, consider using webhooks for long-running requests rather than blocking on results. The fal client supports queue-based submission for integration into automated pipelines.
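A minimal sketch of queue-based submission with the fal client, assuming placeholder URLs; consult the fal documentation for the exact webhook payload format, which is not shown here.

```typescript
import { fal } from "@fal-ai/client";

// Submit without blocking; fal POSTs the result to webhookUrl when the
// generation finishes.
const { request_id } = await fal.queue.submit(
  "fal-ai/kling-video/v2.6/standard/motion-control",
  {
    input: {
      image_url: "https://example.com/character.png",
      video_url: "https://example.com/dance-reference.mp4",
      character_orientation: "video",
    },
    webhookUrl: "https://example.com/fal-webhook", // hypothetical endpoint
  }
);

// Optional fallback: poll the queue if the webhook never arrives.
const status = await fal.queue.status(
  "fal-ai/kling-video/v2.6/standard/motion-control",
  { requestId: request_id, logs: true }
);
console.log(status.status);
```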
References
1. Pondaven, A., Siarohin, A., Tulyakov, S., Torr, P., & Pizzati, F. "Video Motion Transfer with Diffusion Transformers." CVPR 2025. https://arxiv.org/abs/2412.07776
2. Siarohin, A., Lathuilière, S., Tulyakov, S., Ricci, E., & Sebe, N. "First Order Motion Model for Image Animation." NeurIPS 2019. https://aliaksandrsiarohin.github.io/first-order-model-website/