
LTX-2 Video Trainer User Guide


Train custom LTX-2 video generation models in 30 minutes via API. Upload 10-50 training videos demonstrating your target style or motion, configure LoRA rank and training steps, then apply the resulting weights to LTX-2 inference endpoints.

Last updated: 1/14/2026
Edited by: Zachary Roth
Read time: 6 minutes

Training Custom Video Styles

The LTX-2 Video Trainer enables developers to create custom video generation models without managing GPU infrastructure. The trainer produces LoRA (Low-Rank Adaptation) weights that teach the base LTX-2 model specific visual styles, motion patterns, or domain-specific effects. [1]

LTX-2 is Lightricks' 19-billion parameter Diffusion Transformer for audiovisual generation, with 14 billion parameters dedicated to video processing. [2] Rather than retraining this entire model, the trainer uses LoRA to inject trainable rank decomposition matrices while keeping base weights frozen. This reduces trainable parameters by orders of magnitude while preserving generation quality.
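To see why, consider a single 4096 x 4096 projection matrix (an illustrative size, not an actual LTX-2 layer shape): full fine-tuning updates every entry, while a rank-32 LoRA adapter trains only two thin matrices.

# Illustrative parameter count for one hypothetical 4096 x 4096 weight matrix.
d = 4096
rank = 32
full_params = d * d           # fine-tuning the full matrix: 16,777,216 parameters
lora_params = 2 * d * rank    # LoRA adapters A (d x r) and B (r x d): 262,144 parameters
print(f"reduction per layer: {full_params // lora_params}x")  # 64x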

The trainer teaches the model to recognize and reproduce specific visual characteristics associated with your trigger phrase. Training completes in 20-40 minutes depending on dataset size and parameter configuration, costing $0.0048 per step (approximately $9.60 for the default 2000 steps).

Dataset Preparation

Dataset quality determines output quality. Prepare your training data before writing any code.

Requirements:

  • 10-50 video files demonstrating the transformation you want to learn
  • Consistent format: use only videos (not mixed with images)
  • Supported formats: .mp4, .mov, .avi, .mkv
  • ZIP archive hosted at a publicly accessible URL

Quality criteria for effective training:

  • Videos should clearly demonstrate the target style or motion pattern
  • Maintain consistent quality across samples (similar resolution, lighting conditions)
  • Include variation in subjects while preserving the style you want to teach
  • Avoid corrupted files, extreme compression artifacts, or watermarks
  • Aim for 3-10 second clips that capture the essence of your desired output

Optional caption files (e.g., video001.txt paired with video001.mp4) provide additional guidance during training. For style transfers, captions describing the visual characteristics help the model associate your trigger phrase with specific attributes.
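Before uploading, it can help to validate and package the clips locally. The sketch below assumes a flat directory of video files with optional same-named .txt captions; it checks only file extensions and sample count, not codec integrity:

# Minimal sketch: validate a local dataset directory and zip it for upload.
# The directory layout is an assumption; the extension whitelist, the 10-50
# sample range, and the video+caption pairing come from the guide.
import zipfile
from pathlib import Path

ALLOWED = {".mp4", ".mov", ".avi", ".mkv"}

def package_dataset(source_dir: str, output_zip: str = "training-videos.zip") -> None:
    source = Path(source_dir)
    videos = [p for p in source.iterdir() if p.suffix.lower() in ALLOWED]

    if not 10 <= len(videos) <= 50:
        raise ValueError(f"Expected 10-50 videos, found {len(videos)}")

    with zipfile.ZipFile(output_zip, "w", zipfile.ZIP_DEFLATED) as archive:
        for video in videos:
            archive.write(video, arcname=video.name)
            caption = video.with_suffix(".txt")  # optional caption file
            if caption.exists():
                archive.write(caption, arcname=caption.name)

package_dataset("./training_clips")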

Two preprocessing parameters help with non-uniform datasets:

  • auto_scale_input: Enable when training videos have varying frame counts or rates; the trainer resamples to match target parameters
  • split_input_into_scenes: Enable for long training videos; splits content into separate scenes, multiplying your effective training samples
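Both flags would be supplied alongside the other training arguments; the snippet below is a sketch that assumes they are passed as top-level booleans in the request (verify the field names against the endpoint schema):

# Sketch only: enabling the preprocessing flags for a non-uniform dataset.
arguments = {
    "training_data_url": "https://your-storage.com/training-videos.zip",
    "trigger_phrase": "in mystyle",
    "auto_scale_input": True,          # resample clips with mixed frame counts or rates
    "split_input_into_scenes": True,   # split long videos into separate scenes
}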


Authentication and Setup

Generate an API key from your fal dashboard. Install the client library:

pip install fal-client

Configure authentication:

export FAL_KEY="your-api-key-here"

Training Request

Submit a training job with your dataset URL and configuration:

import fal_client

result = fal_client.subscribe(
    "fal-ai/ltx2-video-trainer",
    arguments={
        "training_data_url": "https://your-storage.com/training-videos.zip",
        "trigger_phrase": "in mystyle",
        "number_of_steps": 2000,
        "learning_rate": 0.0002,
        "rank": 32,
        "number_of_frames": 89,
        "frame_rate": 25,
        "resolution": "medium",
        "aspect_ratio": "1:1"
    },
    with_logs=True
)

The subscribe method blocks until training completes, streaming progress logs. For web applications requiring non-blocking execution, use fal_client.submit() and poll with fal_client.status().
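A sketch of that non-blocking pattern follows; the Completed status class and the fal_client.result() helper reflect the client's typical polling flow, but verify the exact names against the fal-client documentation:

# Non-blocking variant: submit the job, poll for status, then fetch the result.
# handle.request_id, fal_client.Completed, and fal_client.result() are assumed
# from the client's polling pattern; confirm against the library version you use.
import time
import fal_client

handle = fal_client.submit(
    "fal-ai/ltx2-video-trainer",
    arguments={
        "training_data_url": "https://your-storage.com/training-videos.zip",
        "trigger_phrase": "in mystyle",
    },
)

while True:
    status = fal_client.status("fal-ai/ltx2-video-trainer", handle.request_id, with_logs=True)
    if isinstance(status, fal_client.Completed):
        break
    time.sleep(30)  # training runs 20-40 minutes, so poll sparingly

result = fal_client.result("fal-ai/ltx2-video-trainer", handle.request_id)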

Parameter Reference

Parameter | Default | Options | Guidance
rank | 32 | 8, 16, 32, 64, 128 | Higher values capture more detail; 32 balances quality and training time
number_of_steps | 2000 | 100-20000 | Increase for complex transformations; decrease for simple style transfers
learning_rate | 0.0002 | 0.000001-1 | Reduce if you see artifacts; increase if the model fails to capture your style
number_of_frames | 89 | 9-121 | Must satisfy frames % 8 == 1; 89 captures ~3.5 seconds at 25 fps
resolution | medium | low, medium, high | Higher resolution increases training time significantly
aspect_ratio | 1:1 | 16:9, 1:1, 9:16 | Match your intended output format
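Because number_of_frames must satisfy frames % 8 == 1, a small helper (a convenience sketch, not part of the API) can snap a desired clip length to the nearest valid count at the 25 fps default:

# Snap a desired clip duration to a valid frame count.
# Valid counts are 8k + 1 in the range 9-121, per the frames % 8 == 1 constraint.
def valid_frame_count(seconds: float, fps: int = 25) -> int:
    target = round(seconds * fps)
    snapped = round((target - 1) / 8) * 8 + 1
    return max(9, min(121, snapped))

print(valid_frame_count(3.5))  # 89, about 3.56 seconds at 25 fps
print(valid_frame_count(5.0))  # 121, clamped to the maximum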

Trigger phrase: Choose a distinctive string unlikely to appear in normal prompts (e.g., "in mystyle", "BRANDNAME effect"). This phrase activates your LoRA during inference.

Training Output

Successful training returns:

{
  "lora_weights_url": "https://fal-cdn.com/files/...",
  "config_file_url": "https://fal-cdn.com/files/...",
  "training_metrics": {
    "final_loss": 0.0234,
    "steps_completed": 2000
  }
}

Final loss values between 0.01 and 0.05 indicate successful training. Values above 0.1 suggest the model struggled to learn your dataset; consider increasing steps, adjusting the learning rate, or improving dataset quality. Download the LoRA weights immediately, as CDN URLs are temporary.
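A minimal sketch of persisting the artifacts right away, using only the standard library; the local file names are arbitrary choices, and the response keys match the example above:

# Download the LoRA weights and config as soon as training returns,
# since the CDN URLs are temporary. Output file names are arbitrary.
import urllib.request
from pathlib import Path

def save_artifacts(result: dict, out_dir: str = "lora_output") -> None:
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    for key, filename in [("lora_weights_url", "lora_weights.safetensors"),
                          ("config_file_url", "config.json")]:
        urllib.request.urlretrieve(result[key], str(out / filename))

save_artifacts(result)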

Using Trained LoRA for Inference

Apply your trained LoRA to generate videos using the LTX-2 inference endpoints. Multiple endpoints support custom LoRAs:

  • Text-to-video: fal-ai/ltx-2-19b/text-to-video/lora
  • Image-to-video: fal-ai/ltx-2-19b/image-to-video/lora
  • Video-to-video: fal-ai/ltx-2-19b/video-to-video/lora

Pass your LoRA weights URL in the loras parameter with a scale value (0.0-2.0) controlling influence strength. Start at 1.0 and adjust based on results: lower values for subtle effects, higher for stronger style application.

Distilled variants (fal-ai/ltx-2-19b/distilled/...) offer faster inference at slightly reduced quality, useful for iterative testing before final renders. Include your trigger phrase in the prompt to activate the trained style.
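A sketch of applying the trained LoRA through the text-to-video endpoint; the shape of the loras entries (path and scale keys) is an assumption to verify against the endpoint schema:

# Sketch: generating with the trained LoRA. The loras payload shape is assumed;
# the endpoint name and trigger phrase come from the sections above.
import fal_client

video = fal_client.subscribe(
    "fal-ai/ltx-2-19b/text-to-video/lora",
    arguments={
        "prompt": "a city street at dusk, in mystyle",  # include the trigger phrase
        "loras": [
            {"path": "https://fal-cdn.com/files/...", "scale": 1.0}
        ],
    },
)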

Error Handling

Common failure modes and solutions:

  • Invalid dataset format: Mixed videos and images, unsupported codecs, or corrupted files. Validate your ZIP structure before submitting.
  • Memory exhaustion: Resolution or frame count exceeds available resources. Reduce resolution or decrease number_of_frames.
  • URL access issues: Training data URL not publicly accessible. Verify the URL works in an incognito browser window.
  • Parameter validation: Frame count not satisfying modulo constraint. Valid values: 9, 17, 25, 33, 41, 49, 57, 65, 73, 81, 89, 97, 105, 113, 121.
Wrap training calls in exception handling to surface these failures:

from fal_client.exceptions import FalClientException

try:
    result = fal_client.subscribe("fal-ai/ltx2-video-trainer", arguments={...})
except FalClientException as e:
    print(f"Training failed: {e}")

Production Deployment

For production systems, configure webhooks rather than polling:

handle = fal_client.submit(
    "fal-ai/ltx2-video-trainer",
    arguments={...},
    webhook_url="https://your-api.com/training-complete"
)

Store training metadata alongside LoRA weights for reproducibility and debugging. Download weights to your own infrastructure immediately after training completes; do not rely on temporary CDN URLs for long-term storage.
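One way to do that is to write the submitted arguments and returned metrics next to the downloaded weights (a sketch; the file layout is an arbitrary choice):

# Persist the training configuration and returned metrics alongside the weights
# so runs can be reproduced and debugged later. File layout is an arbitrary choice.
import json
from datetime import datetime, timezone
from pathlib import Path

def record_run(arguments: dict, result: dict, out_dir: str = "lora_output") -> None:
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    metadata = {
        "trained_at": datetime.now(timezone.utc).isoformat(),
        "arguments": arguments,
        "training_metrics": result.get("training_metrics", {}),
        "lora_weights_url": result.get("lora_weights_url"),
    }
    (out / "training_run.json").write_text(json.dumps(metadata, indent=2))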

Cost and Performance

Configuration | Training Time | Cost
1000 steps, low resolution | ~10 minutes | $4.80
2000 steps, medium resolution | ~25 minutes | $9.60
3000 steps, high resolution | ~40 minutes | $14.40

Actual training time varies based on queue position and dataset complexity. Start with lower step counts (1000-1500) for initial experiments, then scale up for production training runs once you validate your approach.
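These figures follow directly from the $0.0048-per-step rate quoted earlier:

# Cost check using the flat $0.0048-per-step rate.
COST_PER_STEP = 0.0048

for steps in (1000, 2000, 3000):
    print(f"{steps} steps: ${steps * COST_PER_STEP:.2f}")
# 1000 steps: $4.80
# 2000 steps: $9.60
# 3000 steps: $14.40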

Further Resources

The LTX-2 ecosystem includes temporal upscalers and audio synchronization capabilities. Monitor the fal documentation for new inference endpoints and optimization techniques. For complex video generation workflows, consider combining trained LoRAs with camera control LoRAs available through the inference endpoints.


References

  1. Hu, E.J., et al. "LoRA: Low-Rank Adaptation of Large Language Models." arXiv preprint arXiv:2106.09685, 2021. https://arxiv.org/abs/2106.09685

  2. HaCohen, Y., et al. "LTX-2: Efficient Joint Audio-Visual Foundation Model." arXiv preprint arXiv:2601.03233, 2025. https://arxiv.org/abs/2601.03233

About the author
Zachary Roth
A generative media engineer with a focus on growth, Zach has deep expertise in building RAG architecture for complex content systems.
