Kandinsky5 Pro Image to Video Prompt Guide

The difference between mediocre AI-generated video and content that makes people stop scrolling comes down to your prompt: strategic prompt engineering transforms static images into compelling video experiences.

Last updated: 1/7/2026
Edited by: Brad Rose
Read time: 5 minutes

What Makes Image-to-Video Prompts Different

The gap between forgettable AI video and content that holds attention starts with your prompt. Kandinsky5 Pro Image to Video operates differently than text-to-video models. It analyzes your input image and uses your prompt to determine temporal evolution, not to generate content from scratch.

Built on a 19B parameter diffusion model with Flow Matching architecture, the model delivers high-quality video from static images with strong prompt adherence [1]. The technical capability exists. The constraint is instruction clarity.

Understanding how the model interprets prompts determines output quality. The system processes three distinct instruction types:

  • Camera movement: Cinematic terms like "camera slowly moves closer," "orbits clockwise," or "pans left to right" control virtual camera behavior [2]. The model maintains visual coherence while executing these movements.
  • Subject behavior: Instructions describing what subjects should do, such as "stands still, eyes forward," "turns toward viewer," or "hair flowing in wind," guide animation decisions.
  • Environmental dynamics: Lighting shifts, weather changes, or atmospheric elements inform temporal evolution beyond motion alone.

Prompt Structure That Produces Consistent Results

Effective Kandinsky5 Pro Image to Video prompts follow a three-part structure:

1. Subject State and Positioning. Describe the current state of your main subject to anchor the model's understanding: "The white dragon warrior stands still" or "The woman sits at the cafe table."

2. Action or Movement Description. Specify what should happen during the video duration. Be explicit about whether the subject moves, remains static, or transforms: "eyes full of determination and strength" or "slowly turns her head toward the window."

3. Camera Direction. State camera instructions explicitly. This is where most users lose quality. Instead of leaving camera movement implicit, define it: "The camera slowly moves closer or circles around the warrior, highlighting the powerful presence and heroic spirit of the character." [3]

Complete example:

"The vintage car sits parked on an empty desert highway at sunset. Dust swirls around the wheels as heat waves shimmer from the asphalt. The camera slowly pulls back revealing the vast landscape while maintaining focus on the vehicle's chrome details catching golden hour light."

Proven Prompt Examples by Use Case

Portrait Animation: "The portrait subject maintains a neutral expression while subtle micro-movements bring life to the eyes and slight head tilt. Soft lighting shifts across the face as the camera performs a gentle push-in, creating intimacy and connection with the viewer."

This works because it balances stillness with subtle motion, avoiding uncanny valley effects. The lighting direction provides atmospheric guidance.

Product Showcase: "The product remains centered and stationary as the camera orbits 180 degrees from left to right at eye level. Dramatic lighting highlights textures and materials while the background maintains soft focus throughout the rotation."

Clear camera path with specific degree measurement removes ambiguity. The background focus instruction prevents competing visual elements.

Landscape/Environment: "The mountain landscape stretches into the distance under a partly cloudy sky. Clouds move slowly across the frame from right to left while the camera executes a subtle crane up movement, revealing more of the valley below. Natural lighting shifts as cloud shadows pass over the terrain."

Multiple motion elements (clouds, camera, lighting) create depth without overwhelming the generation process.

Parameter Configuration for Different Scenarios

| Parameter | Default | When to Adjust | Recommended Range |
| --- | --- | --- | --- |
| Resolution | 512P | Use 1024P for final outputs only | 512P (prototyping), 1024P (production) |
| Inference Steps | 28 | Reduce for rapid iteration, increase for complex scenes | 15-20 (testing), 35-40 (detail refinement) |
| Acceleration | regular | Disable only when artifacts appear | regular (standard), none (complex images) |
| Duration | 5s | Not adjustable | 5s (fixed) |

Resolution Selection: Choose 512P for rapid prototyping and concept testing. Switch to 1024P for final outputs. Generation times vary based on queue depth and system load, but 512P provides cost advantages during experimentation.

Inference Steps: The default 28 steps balances quality and speed for most use cases. Reduce to 15-20 when generating multiple variations during exploration. Increase to 35-40 only when you need maximum detail in complex scenes with intricate motion [1].

Acceleration Settings: The "regular" mode balances quality and generation speed. Use "none" only when working with complex images where you've encountered artifacts with acceleration enabled.

Duration Constraints: The model supports only 5-second generation. Structure prompts to describe complete actions that fit this timeframe naturally. Think of capturing a moment rather than telling a story: "the subject turns and smiles" rather than "the subject walks across the room, sits down, and begins reading."
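In code, these recommendations reduce to a pair of input presets: one for rapid iteration, one for final renders. This is a sketch assuming the parameter names from the table above; the preset labels and image URL are placeholders, not part of the fal API.

import { fal } from "@fal-ai/client";

// Presets based on the guidance above; "prototyping" and "production" are
// illustrative labels. Duration is fixed at 5s and is not set here.
const presets = {
  prototyping: { resolution: "512P", num_inference_steps: 18 },  // fast, low-cost exploration
  production: { resolution: "1024P", num_inference_steps: 28 },  // raise steps to 35-40 only for intricate motion
} as const;

const result = await fal.subscribe("fal-ai/kandinsky5-pro/image-to-video", {
  input: {
    prompt:
      "The product remains centered and stationary as the camera orbits 180 degrees from left to right at eye level.",
    image_url: "https://example.com/product.jpg", // placeholder source image
    ...presets.prototyping, // switch to presets.production for the final render
  },
});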

Advanced Techniques for Production Quality

Layered Motion Description: Describe multiple simultaneous motion layers: "The subject's hair moves gently in the breeze while their eyes track left to right following an unseen object. The camera maintains a steady medium shot with subtle handheld movement for natural realism."

Negative Space Utilization: Include instructions about what should NOT happen: "The background remains static and unfocused while all motion concentrates on the foreground subject. No camera shake or sudden movements."

Temporal Pacing Language: Use modifiers that convey speed and timing: "gradually," "suddenly," "smoothly," "abruptly." These help the model understand motion velocity. "The camera gradually pushes in" versus "The camera suddenly zooms forward" produce distinctly different results.

Cinematic Reference Points: Film terminology improves results: "shallow depth of field," "rack focus," "dolly shot," "crane movement." The model's training data includes cinematic content, making it responsive to professional video terminology.

Common Mistakes That Degrade Output Quality

Overcomplicating Motion: Requesting too many simultaneous actions confuses the model and produces muddled results. If your prompt describes more than three distinct motion elements, simplify. Quality of motion beats quantity.

Vague Camera Instructions: "The camera moves around" fails compared to "The camera orbits clockwise at a 45-degree angle." Specificity in camera direction yields predictable, professional results.

Ignoring Source Image Content: Your prompt must align with what actually exists in your input image. If your image shows a close-up portrait, prompting for "camera pulls back to reveal a full body shot" will disappoint. The model cannot generate information absent from the source.

Contradictory Instructions: "The subject remains perfectly still while dancing energetically" creates confusion. Ensure your prompt elements support rather than contradict each other.

Neglecting Lighting Continuity: If your source image has specific lighting, acknowledge it in your prompt: "maintaining the dramatic side lighting from the source image" helps preserve visual consistency.

Error Handling and Troubleshooting

When working with the Kandinsky5 Pro Image to Video API, you'll encounter standard error patterns. The most common issues and their resolutions:

Authentication Errors: If you receive authentication failures, verify your FAL_KEY environment variable is set correctly. The API requires this key for all requests.

Image URL Issues: The model requires a valid image_url parameter. If generations fail, confirm your image is publicly accessible and returns a valid image content type. Data URIs (base64) are supported as an alternative to hosted URLs.

Parameter Validation Errors: The API enforces strict parameter ranges. num_inference_steps must be between 1-40, resolution accepts only "512P" or "1024P", and duration is fixed at "5s". Invalid values return immediate validation errors.

Rate Limiting: During high-traffic periods, requests may queue. Use the onQueueUpdate callback to monitor request status. The queue system handles backpressure automatically—you don't need retry logic for standard queuing.

Generation Failures: If a generation completes but produces unexpected results, check your prompt against the source image. Misalignment between prompt instructions and image content is the primary cause of poor outputs.
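A sketch of these checks in TypeScript, using the same @fal-ai/client calls as the example in the next section. Only fal.config and fal.subscribe are real client calls; the surrounding error-handling logic is one possible pattern, not a prescribed one.

import { fal } from "@fal-ai/client";

// Fail fast if the key is missing instead of waiting for an authentication error.
if (!process.env.FAL_KEY) {
  throw new Error("FAL_KEY is not set; the API requires it for every request.");
}
fal.config({ credentials: process.env.FAL_KEY });

try {
  const result = await fal.subscribe("fal-ai/kandinsky5-pro/image-to-video", {
    input: {
      prompt:
        "The mountain landscape stretches into the distance under a partly cloudy sky.",
      image_url: "https://example.com/landscape.jpg", // must be publicly accessible or a data URI
      resolution: "512P",        // only "512P" or "1024P" pass validation
      num_inference_steps: 28,   // values outside 1-40 are rejected
    },
  });
  console.log(result.data);
} catch (error) {
  // Authentication and validation failures surface here. Inspect the message
  // before retrying; standard queuing does not need retry logic.
  console.error("Generation failed:", error);
}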

Implementation Workflow on fal

Start with batch testing: generate multiple variations with slightly different prompts using the same source image. Document which prompt variations produce desired results, building a prompt library for future projects.

Use the queue management system to submit multiple requests simultaneously during exploration. This parallel processing approach means you can test five prompt variations concurrently.

Monitor results through the logging system to identify patterns. If certain camera movements consistently produce artifacts, adjust your approach before scaling production.

import { fal } from "@fal-ai/client";

// Submit the request and wait for the finished video. The client authenticates
// with the FAL_KEY environment variable.
const result = await fal.subscribe("fal-ai/kandinsky5-pro/image-to-video", {
  input: {
    // Prompt follows the subject / action / camera structure described earlier.
    prompt:
      "The portrait subject maintains a neutral expression while subtle micro-movements bring life to the eyes. The camera performs a gentle push-in, creating intimacy with the viewer.",
    image_url: "https://example.com/portrait.jpg",
    resolution: "512P",        // prototyping resolution; use "1024P" for final output
    num_inference_steps: 28,   // default balance of quality and speed
  },
  logs: true,
  // Stream queue updates and print progress messages while the request runs.
  onQueueUpdate: (update) => {
    if (update.status === "IN_PROGRESS") {
      update.logs.map((log) => log.message).forEach(console.log);
    }
  },
});
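Building on the example above, the batch-testing workflow can fan the same call out over several prompt variations with Promise.all. This is a sketch: the variation prompts and image URL are placeholders, and error handling is omitted for brevity.

import { fal } from "@fal-ai/client";

// Placeholder camera-movement variations against a single source image.
const variations = [
  "The camera slowly pushes in while the subject's eyes track left to right.",
  "The camera orbits clockwise at a 45-degree angle around the stationary subject.",
  "The camera holds a steady medium shot with subtle handheld movement.",
];

// Submit all variations concurrently and collect the results for comparison.
const results = await Promise.all(
  variations.map((prompt) =>
    fal.subscribe("fal-ai/kandinsky5-pro/image-to-video", {
      input: {
        prompt,
        image_url: "https://example.com/portrait.jpg",
        resolution: "512P", // cheaper while exploring; switch to 1024P for finals
      },
    })
  )
);

results.forEach((result, i) => {
  console.log(`Variation ${i + 1}:`, result.data);
});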

References

  1. Hugging Face. "Kandinsky 5.0 Video." huggingface.co, 2025. https://huggingface.co/docs/diffusers/main/en/api/pipelines/kandinsky5_video

  2. Reddit. "WAN I2V Camera Control." reddit.com, 2025. https://www.reddit.com/r/StableDiffusion/comments/1j33au5/wan_i2v_camera_control/

  3. fal.ai. "Kandinsky5 Pro Image to Video." fal.ai, 2025. https://fal.ai/models/fal-ai/kandinsky5-pro/image-to-video/api

About the author
Brad Rose
Brad is a content producer with a creative focus who covers and crafts stories spanning all of generative media.
