LTX Video-0.9.7 13B Distilled Video to Video
LTX Video-0.9.7 13B Distilled [multiconditioning]
Transform your existing videos and images into stunning new visual compositions using advanced multi-conditioning AI that seamlessly blends prompts, images, and video inputs into cohesive generated content.
Overview
LTX Video-0.9.7 13B Distilled delivers powerful video-to-video transformation capabilities through a 13-billion parameter architecture optimized for multi-conditional generation. Built to handle complex creative workflows, this model excels at combining text prompts with image and video inputs to produce high-quality video outputs. Whether you're transforming existing footage, applying artistic styles, or creating hybrid compositions, the model delivers professional results without requiring specialized video editing expertise.
Key Capabilities
- Multi-conditional generation combining text, images, and video inputs
- Flexible resolution support with 480p and 720p output options
- Custom LoRA integration for specialized style and content control
- Production-ready output with configurable frame rates and compression
Popular Use Cases
Video Style Transfer and Artistic Reimagining
Transform existing video footage into entirely new artistic styles by combining source videos with style reference images and descriptive prompts. Apply kaleidoscopic effects, convert realistic footage into abstract compositions, or reimagine scenes through the lens of specific artistic movements like impressionism or pixel art aesthetics.
Content Variation and A/B Testing
Generate multiple variations of video content for marketing campaigns by modifying specific visual elements while maintaining core composition. Test different color palettes, backgrounds, or stylistic treatments across the same base footage to optimize engagement and identify the most effective creative direction.
Hybrid Media Composition
Create unique video content by blending multiple image references with video inputs and detailed text prompts. Construct complex visual narratives that seamlessly transition between different artistic styles, combine disparate visual elements into cohesive sequences, or generate videos that match specific brand guidelines and aesthetic requirements.
Getting Started
Getting up and running with LTX Video-0.9.7 13B Distilled takes just a few minutes. Here's how to begin:
- Get your API key at https://fal.ai/login to access the model
- Install the client library for your preferred programming language
- Make your first API call to generate a video from your inputs
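The three steps above can be sketched in Python with the fal_client library. The endpoint ID, argument names, and response shape shown here are assumptions based on this page, so check the API reference for the exact schema.

```python
# Minimal request sketch. Assumes: pip install fal-client and a FAL_KEY set
# in the environment. Endpoint ID and field names are illustrative
# assumptions, not the confirmed schema.
arguments = {
    "prompt": (
        "A vibrant, abstract composition featuring a person with "
        "outstretched arms, rendered in a kaleidoscope of colors"
    ),
    "video_url": "https://example.com/source.mp4",  # hypothetical source clip
    "resolution": "720p",
}

# Uncomment to run against the live API (requires credentials):
# import fal_client
# result = fal_client.subscribe(
#     "fal-ai/ltx-video-13b-distilled/multiconditioning",  # assumed endpoint ID
#     arguments=arguments,
# )
# print(result["video"]["url"])
```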
Technical Specifications
Model Architecture
- 13 billion parameter distilled model optimized for video generation
- Multi-conditional architecture supporting text, image, and video inputs
- Two-pass inference system for enhanced quality and detail refinement
- Custom LoRA support for specialized fine-tuning
Input Capabilities
- Resolution options: 480p and 720p output
- Aspect ratios: 9:16 (portrait), 1:1 (square), 16:9 (landscape), or automatic detection
- Frame count: Configurable up to 121 frames per generation
- Frame rate: Adjustable, default 30 fps
- Multiple image and video conditioning inputs with frame-level control
- Supported input formats: JPG, JPEG, PNG, WEBP, GIF, AVIF for images
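The resolution, aspect ratio, and frame options above might map onto request fields like the following. The field names are assumptions for illustration, not the confirmed schema.

```python
# Illustrative settings covering the input options listed above.
# Field names are assumptions; consult the API reference for exact names.
settings = {
    "resolution": "480p",    # "480p" or "720p"
    "aspect_ratio": "9:16",  # "9:16", "1:1", "16:9", or automatic detection
    "num_frames": 121,       # configurable up to 121 frames
    "frame_rate": 30,        # default 30 fps
}

# Simple sanity checks before sending a request:
assert settings["resolution"] in {"480p", "720p"}
assert 1 <= settings["num_frames"] <= 121
```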
Performance Features
- Configurable inference steps for quality-speed balance
- Built-in safety checker for content moderation
- Automatic prompt expansion using language models
- Constant rate factor (CRF) compression for optimized input processing (default: 35)
- Reverse video playback option for creative effects
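Taken together, the performance options above might appear in a request body as follows; these field names are hypothetical stand-ins, not the confirmed schema.

```python
# Hypothetical performance-related fields based on the features listed above.
performance = {
    "enable_safety_checker": True,   # built-in content moderation
    "expand_prompt": True,           # automatic prompt expansion via an LLM
    "constant_rate_factor": 35,      # CRF compression for inputs (default: 35)
    "reverse_video": False,          # reverse playback for creative effects
}
```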
Best Practices
Achieve optimal results with these proven approaches:
Layer Your Conditioning Inputs Strategically
When combining multiple images or videos, consider the temporal placement and strength of each input. Start with a base image at frame 0 with higher strength (0.7-0.9) to establish the foundational composition, then introduce additional conditioning at later frames with lower strength (0.3-0.6) to guide transitions. This approach creates smoother visual evolution rather than abrupt changes. For example, when transforming a portrait into an abstract style, use your source portrait at full strength initially, then introduce style references at mid-sequence frames with reduced strength to create gradual artistic transformation.
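The layering strategy above can be sketched as a list of conditioning entries. The field names (image_url, start_frame_num, strength) and URLs are assumptions for illustration.

```python
# Layered image conditioning per the guidance above: a strong base at
# frame 0, then a weaker style reference introduced mid-sequence.
# Field names are illustrative assumptions, not the confirmed schema.
image_conditions = [
    {"image_url": "https://example.com/portrait.png",
     "start_frame_num": 0,
     "strength": 0.85},   # 0.7-0.9 establishes the foundational composition
    {"image_url": "https://example.com/style-ref.png",
     "start_frame_num": 60,
     "strength": 0.45},   # 0.3-0.6 guides the transition
]

# Check that strengths fall in the recommended ranges:
base, style = image_conditions
assert 0.7 <= base["strength"] <= 0.9
assert 0.3 <= style["strength"] <= 0.6
```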
Craft Detailed, Compositional Prompts
Instead of generic descriptions like "make it artistic," construct prompts that describe the complete visual composition, color palette, artistic movement, and camera behavior. A prompt like "A vibrant, abstract composition featuring a person with outstretched arms, rendered in a kaleidoscope of colors against a deep, dark background, with intricate swirling patterns in hues of orange, yellow, blue, and green. The camera slowly zooms into the face" provides clear guidance for coherent generation across all frames.
Balance Quality and Efficiency with Inference Steps
The default two-pass system uses 8 steps per pass, which balances quality and generation time effectively. For faster iterations during creative exploration, reduce both passes to 4-6 steps; for final production renders requiring maximum detail, increase both to 10-12 steps. Two additional parameters (defaults: 1 and 5) control the division of labor between passes: the first pass handles broad composition while the second refines details.
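As a sketch, draft and final-render presets under the two-pass guidance might look like this. The parameter names are hypothetical stand-ins for the real step parameters in the API schema.

```python
# Hypothetical step presets for the two-pass system (default: 8 steps per pass).
# Parameter names are illustrative; check the API reference for the real ones.
draft = {
    "first_pass_steps": 5,    # 4-6 for fast creative exploration
    "second_pass_steps": 5,
}
final_render = {
    "first_pass_steps": 11,   # 10-12 for maximum detail
    "second_pass_steps": 11,
}
```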
Advanced Features
Custom LoRA Integration
Enhance your generations with specialized LoRA weights trained for specific styles, subjects, or artistic approaches. The model accepts multiple LoRA weights simultaneously, each with adjustable scale parameters to control their influence on the final output. This enables precise style control beyond what text prompts alone can achieve.
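Multiple LoRA weights with per-weight scales might be passed like this; the field names and URLs are assumptions for illustration.

```python
# Illustrative LoRA list: each entry carries its own scale to control its
# influence on the output. Field names and URLs are assumptions.
loras = [
    {"path": "https://example.com/pixel-art-lora.safetensors", "scale": 0.8},
    {"path": "https://example.com/film-grain-lora.safetensors", "scale": 0.4},
]
```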
Video Conditioning
Beyond image inputs, you can use existing video footage as conditioning to guide motion, composition, and temporal consistency. Specify the starting frame and strength for each video input to control how it influences different segments of your generated output.
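A video conditioning entry follows the same frame-and-strength pattern as image conditioning; the field names here are assumptions, not the confirmed schema.

```python
# Hypothetical video conditioning entry: the source clip guides motion and
# composition from frame 0 at moderate strength.
video_conditions = [
    {"video_url": "https://example.com/motion-ref.mp4",
     "start_frame_num": 0,
     "strength": 0.6},
]
```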
File Upload Support
For optimal performance with the fal.ai platform, you can upload your conditioning images and videos directly to fal.ai's storage system. This approach reduces latency and ensures reliable access during generation, particularly important for larger video files.
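With the fal_client Python library, uploading a local file and receiving a hosted URL looks roughly like the sketch below; fal_client.upload_file is part of the client, but verify the call against the current client documentation. The live call is commented out so the sketch stays runnable offline.

```python
# Upload a local conditioning asset to fal.ai storage, then reference the
# returned URL in your request (requires fal-client and a FAL_KEY).
local_path = "assets/style-ref.png"  # hypothetical local file

# import fal_client
# hosted_url = fal_client.upload_file(local_path)
# arguments["image_url"] = hosted_url
```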
API Reference
Pricing and Usage
This model costs $0.04 per video generation. For $1, you can generate approximately 25 videos, making it cost-effective for both experimentation and production workflows. The pricing applies regardless of resolution (480p or 720p), aspect ratio, or frame count within supported limits (up to 121 frames).
For high-volume projects or enterprise deployments, contact sales for custom pricing and dedicated support options.
View complete pricing details at https://fal.ai/pricing
Support and Resources
We're here to help you succeed with LTX Video-0.9.7 13B Distilled:
- Documentation: Explore comprehensive guides and tutorials at fal.ai documentation
- API Playground: Test the model interactively and experiment with parameters on the model playground page
- API Reference: Access detailed endpoint documentation and code examples at the API documentation
- Community Support: Connect with other developers on Discord and share your creations to discover new techniques and applications
Ready to transform your video content with multi-conditional AI generation? Sign up now at https://fal.ai and start creating with LTX Video-0.9.7 13B Distilled.