
LTX-2 Video Trainer User Guide


Train custom LTX-2 video generation models in 30 minutes via API. Upload 10-50 training videos demonstrating your target style or motion, configure LoRA rank and training steps, then apply the resulting weights to LTX-2 inference endpoints.

Last updated: 1/14/2026
Edited by: Zachary Roth
Read time: 6 minutes

Training Custom Video Styles

The LTX-2 Video Trainer enables developers to create custom video generation models without managing GPU infrastructure. The trainer produces LoRA (Low-Rank Adaptation) weights that teach the base LTX-2 model specific visual styles, motion patterns, or domain-specific effects. [1]

LTX-2 is Lightricks' 19-billion parameter Diffusion Transformer for audiovisual generation, with 14 billion parameters dedicated to video processing. [2] Rather than retraining this entire model, the trainer uses LoRA to inject trainable rank decomposition matrices while keeping base weights frozen. This reduces trainable parameters by orders of magnitude while preserving generation quality.
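To see why, consider a single 4096 x 4096 projection matrix (an illustrative size, not an actual LTX-2 layer shape): full fine-tuning updates every entry, while a rank-32 LoRA adapter trains only two thin matrices.

# Illustrative parameter count for one hypothetical 4096 x 4096 weight matrix.
d = 4096
rank = 32
full_params = d * d           # fine-tuning the full matrix: 16,777,216 parameters
lora_params = 2 * d * rank    # LoRA adapters A (d x r) and B (r x d): 262,144 parameters
print(f"reduction per layer: {full_params // lora_params}x")  # 64x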

The trainer teaches the model to recognize and reproduce specific visual characteristics associated with your trigger phrase. Training completes in 20-40 minutes depending on dataset size and parameter configuration, costing $0.0048 per step (approximately $9.60 for the default 2000 steps).

Dataset Preparation

Dataset quality determines output quality. Prepare your training data before writing any code.

Requirements:

  • 10-50 video files demonstrating the transformation you want to learn
  • Consistent format: use only videos (not mixed with images)
  • Supported formats: .mp4, .mov, .avi, .mkv
  • ZIP archive hosted at a publicly accessible URL

Quality criteria for effective training:

  • Videos should clearly demonstrate the target style or motion pattern
  • Maintain consistent quality across samples (similar resolution, lighting conditions)
  • Include variation in subjects while preserving the style you want to teach
  • Avoid corrupted files, extreme compression artifacts, or watermarks
  • Aim for 3-10 second clips that capture the essence of your desired output

Optional caption files (e.g., video001.txt paired with video001.mp4) provide additional guidance during training. For style transfers, captions describing the visual characteristics help the model associate your trigger phrase with specific attributes.
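Before uploading, it can help to validate and package the clips locally. The sketch below assumes a flat directory of video files with optional same-named .txt captions; it checks only file extensions and sample count, not codec integrity:

# Minimal sketch: validate a local dataset directory and zip it for upload.
# The directory layout is an assumption; the extension whitelist, the 10-50
# sample range, and the video+caption pairing come from the guide.
import zipfile
from pathlib import Path

ALLOWED = {".mp4", ".mov", ".avi", ".mkv"}

def package_dataset(source_dir: str, output_zip: str = "training-videos.zip") -> None:
    source = Path(source_dir)
    videos = [p for p in source.iterdir() if p.suffix.lower() in ALLOWED]

    if not 10 <= len(videos) <= 50:
        raise ValueError(f"Expected 10-50 videos, found {len(videos)}")

    with zipfile.ZipFile(output_zip, "w", zipfile.ZIP_DEFLATED) as archive:
        for video in videos:
            archive.write(video, arcname=video.name)
            caption = video.with_suffix(".txt")  # optional caption file
            if caption.exists():
                archive.write(caption, arcname=caption.name)

package_dataset("./training_clips")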

Two preprocessing parameters help with non-uniform datasets:

  • auto_scale_input: Enable when training videos have varying frame counts or rates; the trainer resamples to match target parameters
  • split_input_into_scenes: Enable for long training videos; splits content into separate scenes, multiplying your effective training samples
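Both flags would be supplied alongside the other training arguments; the snippet below is a sketch that assumes they are passed as top-level booleans in the request (verify the field names against the endpoint schema):

# Sketch only: enabling the preprocessing flags for a non-uniform dataset.
arguments = {
    "training_data_url": "https://your-storage.com/training-videos.zip",
    "trigger_phrase": "in mystyle",
    "auto_scale_input": True,          # resample clips with mixed frame counts or rates
    "split_input_into_scenes": True,   # split long videos into separate scenes
}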


Authentication and Setup

Generate an API key from your fal dashboard. Install the client library:

pip install fal-client

Configure authentication:

export FAL_KEY="your-api-key-here"

Training Request

Submit a training job with your dataset URL and configuration:

import fal_client

result = fal_client.subscribe(
    "fal-ai/ltx2-video-trainer",
    arguments={
        "training_data_url": "https://your-storage.com/training-videos.zip",
        "trigger_phrase": "in mystyle",
        "number_of_steps": 2000,
        "learning_rate": 0.0002,
        "rank": 32,
        "number_of_frames": 89,
        "frame_rate": 25,
        "resolution": "medium",
        "aspect_ratio": "1:1"
    },
    with_logs=True
)

The subscribe method blocks until training completes, streaming progress logs. For web applications requiring non-blocking execution, use fal_client.submit() and poll with fal_client.status().
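A sketch of that non-blocking pattern follows; the Completed status class and the fal_client.result() helper reflect the client's typical polling flow, but verify the exact names against the fal-client documentation:

# Non-blocking variant: submit the job, poll for status, then fetch the result.
# handle.request_id, fal_client.Completed, and fal_client.result() are assumed
# from the client's polling pattern; confirm against the library version you use.
import time
import fal_client

handle = fal_client.submit(
    "fal-ai/ltx2-video-trainer",
    arguments={
        "training_data_url": "https://your-storage.com/training-videos.zip",
        "trigger_phrase": "in mystyle",
    },
)

while True:
    status = fal_client.status("fal-ai/ltx2-video-trainer", handle.request_id, with_logs=True)
    if isinstance(status, fal_client.Completed):
        break
    time.sleep(30)  # training runs 20-40 minutes, so poll sparingly

result = fal_client.result("fal-ai/ltx2-video-trainer", handle.request_id)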

Parameter Reference

Parameter | Default | Options | Guidance
rank | 32 | 8, 16, 32, 64, 128 | Higher values capture more detail; 32 balances quality and training time
number_of_steps | 2000 | 100-20000 | Increase for complex transformations; decrease for simple style transfers
learning_rate | 0.0002 | 0.000001-1 | Reduce if you see artifacts; increase if the model fails to capture your style
number_of_frames | 89 | 9-121 | Must satisfy frames % 8 == 1; 89 captures ~3.5 seconds at 25 fps
resolution | medium | low, medium, high | Higher resolution increases training time significantly
aspect_ratio | 1:1 | 16:9, 1:1, 9:16 | Match your intended output format
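Because number_of_frames must satisfy frames % 8 == 1, a small helper (a convenience sketch, not part of the API) can snap a desired clip length to the nearest valid count at the 25 fps default:

# Snap a desired clip duration to a valid frame count.
# Valid counts are 8k + 1 in the range 9-121, per the frames % 8 == 1 constraint.
def valid_frame_count(seconds: float, fps: int = 25) -> int:
    target = round(seconds * fps)
    snapped = round((target - 1) / 8) * 8 + 1
    return max(9, min(121, snapped))

print(valid_frame_count(3.5))  # 89, about 3.56 seconds at 25 fps
print(valid_frame_count(5.0))  # 121, clamped to the maximum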

Trigger phrase: Choose a distinctive string unlikely to appear in normal prompts (e.g., "in mystyle", "BRANDNAME effect"). This phrase activates your LoRA during inference.

Training Output

Successful training returns:

{
  "lora_weights_url": "https://fal-cdn.com/files/...",
  "config_file_url": "https://fal-cdn.com/files/...",
  "training_metrics": {
    "final_loss": 0.0234,
    "steps_completed": 2000
  }
}

Final loss values between 0.01 and 0.05 indicate successful training. Values above 0.1 suggest the model struggled to learn your dataset; consider increasing steps, adjusting the learning rate, or improving dataset quality. Download the LoRA weights immediately, as CDN URLs are temporary.
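A minimal sketch of persisting the artifacts right away, using only the standard library; the local file names are arbitrary choices, and the response keys match the example above:

# Download the LoRA weights and config as soon as training returns,
# since the CDN URLs are temporary. Output file names are arbitrary.
import urllib.request
from pathlib import Path

def save_artifacts(result: dict, out_dir: str = "lora_output") -> None:
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    for key, filename in [("lora_weights_url", "lora_weights.safetensors"),
                          ("config_file_url", "config.json")]:
        urllib.request.urlretrieve(result[key], str(out / filename))

save_artifacts(result)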

Using Trained LoRA for Inference

Apply your trained LoRA to generate videos using the LTX-2 inference endpoints. Multiple endpoints support custom LoRAs:

  • Text-to-video: fal-ai/ltx-2-19b/text-to-video/lora
  • Image-to-video: fal-ai/ltx-2-19b/image-to-video/lora
  • Video-to-video: fal-ai/ltx-2-19b/video-to-video/lora

Pass your LoRA weights URL in the loras parameter with a scale value (0.0-2.0) controlling influence strength. Start at 1.0 and adjust based on results: lower values for subtle effects, higher for stronger style application.

Distilled variants (fal-ai/ltx-2-19b/distilled/...) offer faster inference at slightly reduced quality, useful for iterative testing before final renders. Include your trigger phrase in the prompt to activate the trained style.
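A sketch of applying the trained LoRA through the text-to-video endpoint; the shape of the loras entries (path and scale keys) is an assumption to verify against the endpoint schema:

# Sketch: generating with the trained LoRA. The loras payload shape is assumed;
# the endpoint name and trigger phrase come from the sections above.
import fal_client

video = fal_client.subscribe(
    "fal-ai/ltx-2-19b/text-to-video/lora",
    arguments={
        "prompt": "a city street at dusk, in mystyle",  # include the trigger phrase
        "loras": [
            {"path": "https://fal-cdn.com/files/...", "scale": 1.0}
        ],
    },
)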

Error Handling

Common failure modes and solutions:

  • Invalid dataset format: Mixed videos and images, unsupported codecs, or corrupted files. Validate your ZIP structure before submitting.
  • Memory exhaustion: Resolution or frame count exceeds available resources. Reduce resolution or decrease number_of_frames.
  • URL access issues: Training data URL not publicly accessible. Verify the URL works in an incognito browser window.
  • Parameter validation: Frame count not satisfying modulo constraint. Valid values: 9, 17, 25, 33, 41, 49, 57, 65, 73, 81, 89, 97, 105, 113, 121.
Wrap training calls in exception handling to surface these failures:

from fal_client.exceptions import FalClientException

try:
    result = fal_client.subscribe("fal-ai/ltx2-video-trainer", arguments={...})
except FalClientException as e:
    print(f"Training failed: {e}")

Production Deployment

For production systems, configure webhooks rather than polling:

handle = fal_client.submit(
    "fal-ai/ltx2-video-trainer",
    arguments={...},
    webhook_url="https://your-api.com/training-complete"
)

Store training metadata alongside LoRA weights for reproducibility and debugging. Download weights to your own infrastructure immediately after training completes; do not rely on temporary CDN URLs for long-term storage.
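One way to do that is to write the submitted arguments and returned metrics next to the downloaded weights (a sketch; the file layout is an arbitrary choice):

# Persist the training configuration and returned metrics alongside the weights
# so runs can be reproduced and debugged later. File layout is an arbitrary choice.
import json
from datetime import datetime, timezone
from pathlib import Path

def record_run(arguments: dict, result: dict, out_dir: str = "lora_output") -> None:
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    metadata = {
        "trained_at": datetime.now(timezone.utc).isoformat(),
        "arguments": arguments,
        "training_metrics": result.get("training_metrics", {}),
        "lora_weights_url": result.get("lora_weights_url"),
    }
    (out / "training_run.json").write_text(json.dumps(metadata, indent=2))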

Cost and Performance

Configuration | Training Time | Cost
1000 steps, low resolution | ~10 minutes | $4.80
2000 steps, medium resolution | ~25 minutes | $9.60
3000 steps, high resolution | ~40 minutes | $14.40

Actual training time varies based on queue position and dataset complexity. Start with lower step counts (1000-1500) for initial experiments, then scale up for production training runs once you validate your approach.
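These figures follow directly from the $0.0048-per-step rate quoted earlier:

# Cost check using the flat $0.0048-per-step rate.
COST_PER_STEP = 0.0048

for steps in (1000, 2000, 3000):
    print(f"{steps} steps: ${steps * COST_PER_STEP:.2f}")
# 1000 steps: $4.80
# 2000 steps: $9.60
# 3000 steps: $14.40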

Further Resources

The LTX-2 ecosystem includes temporal upscalers and audio synchronization capabilities. Monitor the fal documentation for new inference endpoints and optimization techniques. For complex video generation workflows, consider combining trained LoRAs with camera control LoRAs available through the inference endpoints.


References

  1. Hu, E.J., et al. "LoRA: Low-Rank Adaptation of Large Language Models." arXiv preprint arXiv:2106.09685, 2021. https://arxiv.org/abs/2106.09685

  2. HaCohen, Y., et al. "LTX-2: Efficient Joint Audio-Visual Foundation Model." arXiv preprint arXiv:2601.03233, 2025. https://arxiv.org/abs/2601.03233

About the author
Zachary Roth
A generative media engineer with a focus on growth, Zach has deep expertise in building RAG architecture for complex content systems.
