Kandinsky5 Pro Image to Video Developer Guide

Kandinsky5 Pro converts static images into 5-second videos at $0.20 (512P) or $0.60 (1024P) per clip. This guide covers Python and JavaScript implementations, prompt patterns for product and portrait animation, and production deployment with retry logic and caching strategies.

Animating Static Images with Kandinsky5 Pro

The Kandinsky5 Pro Image to Video API converts static images into fluid video sequences through a single API call. Built on the Kandinsky architecture developed by Sber AI, this model extends their latent diffusion framework to temporal generation, enabling text-guided animation of existing visual assets.1

For product teams building e-commerce animations, social content generators, or creative tools, this API eliminates manual animation work. A 5-second video at 512P costs $0.20; at 1024P, $0.60. The API accepts standard HTTP requests, processes images through diffusion-based temporal synthesis, and returns hosted MP4 URLs. fal's serverless infrastructure eliminates cold starts and handles cluster scaling automatically.

Quick Reference

| Specification | Value |
| --- | --- |
| Output format | MP4 (H.264) |
| Duration | 5 seconds (fixed) |
| Resolutions | 512P, 1024P |
| Cost (512P) | $0.04/second ($0.20 per video) |
| Cost (1024P) | $0.12/second ($0.60 per video) |
| Accepted image formats | JPG, JPEG, PNG, WebP, GIF, AVIF |

API Setup and Authentication

Create an account at fal and generate an API key from your dashboard. Store this key as an environment variable.

# Python
pip install fal-client
export FAL_KEY="your-api-key-here"

# JavaScript/TypeScript
npm install @fal-ai/serverless-client
export FAL_KEY="your-api-key-here"

Python Implementation

import fal_client

def generate_video(image_url, prompt, resolution="512P", num_steps=28):
    """Generate a 5-second clip from a source image; return the hosted URL on success."""
    try:
        # subscribe() blocks until the job completes and returns the result payload
        result = fal_client.subscribe(
            "fal-ai/kandinsky5-pro/image-to-video",
            arguments={
                "image_url": image_url,
                "prompt": prompt,
                "resolution": resolution,
                "duration": "5s",
                "num_inference_steps": num_steps,
                "acceleration": "regular"
            },
            with_logs=True
        )

        return {
            "success": True,
            "video_url": result["video"]["url"],
            "file_size": result["video"]["file_size"],
            "file_name": result["video"]["file_name"]
        }

    except Exception as e:
        return {"success": False, "error": str(e)}

The subscribe method handles the full request lifecycle: it submits the job, waits for generation to complete, and returns the result.
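
To call it, a minimal usage sketch (the image URL and prompt below are placeholders, and FAL_KEY is assumed to be set in the environment):

result = generate_video(
    image_url="https://example.com/product-shot.png",  # placeholder; use a hosted image
    prompt="The product rotates slowly on a white surface, camera holds steady",
)

if result["success"]:
    print(f"Video ready: {result['video_url']} ({result['file_size']} bytes)")
else:
    print(f"Generation failed: {result['error']}")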

JavaScript Integration

For Node.js or browser applications:

import * as fal from "@fal-ai/serverless-client";

async function generateVideo(imageUrl, prompt, options = {}) {
  const { resolution = "512P", numInferenceSteps = 28 } = options;

  const result = await fal.subscribe("fal-ai/kandinsky5-pro/image-to-video", {
    input: {
      image_url: imageUrl,
      prompt: prompt,
      resolution: resolution,
      duration: "5s",
      num_inference_steps: numInferenceSteps,
      acceleration: "regular",
    },
    logs: true,
    onQueueUpdate: (update) => {
      if (update.status === "IN_PROGRESS") {
        // update.logs is an array of log objects; print each message as it streams
        update.logs.map((log) => log.message).forEach(console.log);
      }
    },
  });

  return { videoUrl: result.video.url, fileSize: result.video.file_size };
}

The onQueueUpdate callback streams log messages as generation proceeds, which is useful for driving status indicators in a UI.

Response Schema

The API returns a JSON object with the following structure:

{
  "video": {
    "url": "https://v3b.fal.media/files/.../output.mp4",
    "file_size": 22253751,
    "file_name": "output.mp4",
    "content_type": "application/octet-stream"
  }
}

The url field contains a publicly accessible link to the generated MP4. URLs remain valid for a limited time; download or cache videos promptly for persistent storage.
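
Because the URLs expire, a small download sketch using only the standard library (the destination path is an arbitrary choice):

import urllib.request

def download_video(video_url, dest_path="output.mp4"):
    # Fetch the hosted MP4 to local disk before the URL expires
    urllib.request.urlretrieve(video_url, dest_path)
    return dest_path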

Image Input Requirements

The API accepts publicly accessible image URLs. Local file paths will not work. Upload images to cloud storage first (S3 pre-signed URLs, Cloudinary, or similar).

Supported formats: JPG, JPEG, PNG, WebP, GIF, AVIF. The input image serves as the first frame anchor, with the model generating motion that preserves the original composition. For optimal results, use images with clear subjects and sufficient resolution for your target output.
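
For local files, one option is the fal client's upload helper, which pushes the file to fal's storage and returns a hosted URL (a sketch, assuming fal_client.upload_file is available in your client version; any S3 or Cloudinary uploader works the same way):

import fal_client

# Upload a local image and receive a publicly accessible URL to pass as image_url
image_url = fal_client.upload_file("product-shot.png")  # local path is a placeholder
result = generate_video(image_url, "The product rotates slowly, camera holds steady")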

Request Parameters

| Parameter | Default | Options | Effect |
| --- | --- | --- | --- |
| resolution | 512P | 512P, 1024P | Output quality and cost |
| num_inference_steps | 28 | 1-40 | Quality vs. speed tradeoff |
| duration | 5s | 5s | Fixed duration |
| acceleration | regular | none, regular | Speed optimization |

num_inference_steps: The default of 28 balances quality and speed effectively. Reducing to 15-20 accelerates generation but may introduce artifacts. Values beyond 35 rarely improve output noticeably.

acceleration: Use "regular" for standard optimizations. Set to "none" for maximum quality when generation time is unconstrained.
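
As a sketch of how these two parameters combine in practice, two illustrative presets (the names and values are starting points, not official recommendations):

# Draft preset: fast, inexpensive iterations while testing prompts
DRAFT = {"resolution": "512P", "num_inference_steps": 20, "acceleration": "regular"}

# Final preset: slower, highest quality for the shipped asset
FINAL = {"resolution": "1024P", "num_inference_steps": 35, "acceleration": "none"}

def build_arguments(image_url, prompt, preset=DRAFT):
    # Merge the shared fields with whichever preset applies
    return {"image_url": image_url, "prompt": prompt, "duration": "5s", **preset}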

Prompt Engineering

Effective prompts specify three elements: subject action, camera behavior, and atmospheric details. Use explicit motion verbs and single-shot instructions rather than multiple scene changes.

Example prompt patterns:

  • Product rotation: "The product rotates slowly on a white surface, camera holds steady, soft studio lighting"
  • Portrait animation: "The subject breathes naturally, eyes blink occasionally, subtle wind moves hair, camera static"
  • Scene atmosphere: "Gentle waves ripple across the water surface, clouds drift slowly overhead, golden hour lighting"

For minimal motion, specify "subtle movement" or "minimal animation" explicitly. The model interprets unqualified prompts as requests for moderate motion.2
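
One way to operationalize these patterns is a small template library; the template names and placeholder fields below are illustrative, not part of the API:

# Hypothetical prompt templates; tune the wording against your own test images
PROMPT_TEMPLATES = {
    "product_rotation": "The {subject} rotates slowly on a white surface, "
                        "camera holds steady, soft studio lighting",
    "portrait": "The subject breathes naturally, eyes blink occasionally, "
                "subtle wind moves hair, camera static",
}

def build_prompt(pattern, **fields):
    return PROMPT_TEMPLATES[pattern].format(**fields)

# build_prompt("product_rotation", subject="sneaker")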

Error Handling

Implement retry logic with exponential backoff for production applications:

import time

def generate_with_retry(image_url, prompt, max_retries=3):
    for attempt in range(max_retries):
        result = generate_video(image_url, prompt)

        if result["success"]:
            return result

        if attempt < max_retries - 1:
            wait_time = 2 ** attempt  # exponential backoff: 1s, 2s, 4s
            time.sleep(wait_time)

    return {"success": False, "error": "Max retries exceeded"}

Common failure modes include invalid image URLs, unsupported formats, and rate limiting. Validate inputs before submission and implement client-side throttling for user-facing applications.
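
A lightweight pre-submission check along those lines (a heuristic sketch: it verifies the URL shape and extension against the documented format list, but note that some valid image URLs omit extensions):

from urllib.parse import urlparse

SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".gif", ".avif"}

def validate_image_url(image_url):
    # Reject non-HTTP(S) inputs such as local file paths
    parsed = urlparse(image_url)
    if parsed.scheme not in ("http", "https"):
        return False, "Image must be a publicly accessible HTTP(S) URL"

    # Extension check against the documented format list
    if not any(parsed.path.lower().endswith(ext) for ext in SUPPORTED_EXTENSIONS):
        return False, "Unsupported or missing file extension"

    return True, None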

Performance and Cost Optimization

Balance quality, speed, and cost through these approaches:

  • Resolution selection: Use 512P ($0.20/video) for mobile delivery, social posts, and previews. Reserve 1024P ($0.60/video) for desktop viewing and high-quality requirements.
  • Batch processing: For non-interactive workflows, submit multiple requests in parallel. The fal infrastructure handles concurrent requests efficiently.
  • Caching: Store video URLs for repeated image-prompt combinations. Use Redis or your database with appropriate TTL values; a sketch follows this list.
  • Prompt testing: Develop a library of proven prompt patterns during development to reduce iteration cycles in production.
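
A caching sketch keyed on the image-prompt-resolution combination, with a plain dict standing in for Redis (the hashing scheme is one reasonable choice, not a requirement):

import hashlib

_video_cache = {}  # swap for Redis with a TTL in production

def generate_cached(image_url, prompt, resolution="512P"):
    # Deterministic key for a repeated image + prompt + resolution combination
    raw = f"{image_url}|{prompt}|{resolution}"
    key = hashlib.sha256(raw.encode()).hexdigest()

    if key not in _video_cache:
        _video_cache[key] = generate_video(image_url, prompt, resolution)
    return _video_cache[key]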

Production Deployment Checklist

Before deploying:

  • Store API keys in environment variables or secret management systems
  • Implement error handling with user-friendly messages
  • Configure monitoring for success rates, latency, and error patterns
  • Test with diverse image types and prompts
  • Set appropriate timeouts based on expected generation times
  • Implement queue systems for high-volume applications (recommended above 100 requests/hour)
  • Enable video caching to reduce redundant API calls
  • Establish rate limiting policies aligned with your usage tier

Integration Patterns

Start with the code examples provided here, then layer optimization strategies as usage scales. The API supports both prototype experiments and production applications processing substantial daily volumes.

Combine image-to-video capability with other generative tools for complete workflows: generate product images with FLUX1.1 [pro], animate them with Kandinsky5 Pro, then deliver through your existing media pipeline. For comparison, explore Luma Dream Machine or Kling 2.1 to evaluate which model best fits your specific use case and budget constraints.

The 5-second duration is a model constraint, not a billing unit. For longer content, generate multiple clips and concatenate them in post-production, or evaluate alternative models that support extended durations.
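
For the concatenation step, a sketch using ffmpeg's concat demuxer (assumes ffmpeg is installed and that the clips share resolution and codec, which holds for same-resolution outputs from this API):

import subprocess
import tempfile

def concatenate_clips(clip_paths, output_path="combined.mp4"):
    # Write the file list in the format the concat demuxer expects
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        for path in clip_paths:
            f.write(f"file '{path}'\n")
        list_path = f.name

    # Stream-copy (-c copy) avoids re-encoding since the clips share codec settings
    subprocess.run(
        ["ffmpeg", "-f", "concat", "-safe", "0", "-i", list_path,
         "-c", "copy", output_path],
        check=True,
    )
    return output_path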

References

  1. Arkhipkin, V., Vasilev, V., Filatov, A., et al. "Kandinsky 3: Text-to-Image Synthesis for Multifunctional Generative Framework." arXiv:2410.21061, 2024. https://arxiv.org/abs/2410.21061

  2. Blattmann, A., Dockhorn, T., Kulal, S., et al. "Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets." arXiv:2311.15127, 2023. https://arxiv.org/abs/2311.15127

about the author
Zachary Roth
A generative media engineer with a focus on growth, Zach has deep expertise in building RAG architecture for complex content systems.
