LTX Video 2.0 Pro Text to Video

fal-ai/ltx-2/text-to-video
Create high-fidelity video with audio from text with LTX-2 Pro.

Your request will cost $0.06 per second for 1080p, $0.12 per second for 1440p or $0.24 per second for 2160p.

LTX Video 2.0 Pro | [text-to-video]

Lightricks' LTXV-13B delivers high-fidelity video with synchronized audio at $0.06-$0.24 per second of output, depending on resolution. Its multiscale rendering is cited as processing 30x faster than comparable video generation systems. The model makes professional video production accessible on standard hardware, without enterprise GPU setups, and targets marketing teams and content creators who need 4K output at production scale.

Use Cases: Marketing content creation | Social media video production | Concept visualization and prototyping


Performance

LTX Video 2.0 Pro is positioned as a premium text-to-video solution with tiered pricing that scales with resolution. It is described as roughly 10x more cost-effective than enterprise alternatives once infrastructure requirements are factored in.

Metric | Result | Context
Processing Speed | 30x faster than competitors | Multiscale rendering processes structure before detail, versus traditional frame-by-frame diffusion
Cost per Second | $0.06 (1080p) / $0.12 (1440p) / $0.24 (2160p) | 16.7, 8.3, or 4.2 seconds per $1.00 respectively
Duration | 6, 8, or 10 seconds | Configurable in 2-second increments
Resolution Range | 1080p to 2160p (4K) | 16:9 aspect ratio at 25 or 50 fps
Audio Generation | Synchronized soundtrack | Automatic audio synthesis included by default
Related Endpoints | LTX Video 2.0 Fast | Speed-optimized variant for rapid iteration workflows
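
The per-second pricing above can be turned into a quick budget check. The following is a minimal sketch that uses only the three rates listed on this page. The function names `estimate_cost_cents` and `seconds_per_dollar` are hypothetical helpers, not part of any fal API, and amounts are kept in whole cents to avoid floating-point rounding surprises.

```python
# Per-second rates in cents, taken from the pricing stated on this page.
RATE_CENTS_PER_SECOND = {"1080p": 6, "1440p": 12, "2160p": 24}


def estimate_cost_cents(resolution: str, duration_seconds: int) -> int:
    """Return the estimated cost of one generation in whole cents."""
    if resolution not in RATE_CENTS_PER_SECOND:
        raise ValueError(f"unknown resolution: {resolution!r}")
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return RATE_CENTS_PER_SECOND[resolution] * duration_seconds


def seconds_per_dollar(resolution: str) -> float:
    """Return how many seconds of video one dollar buys, rounded to 0.1."""
    return round(100 / RATE_CENTS_PER_SECOND[resolution], 1)


if __name__ == "__main__":
    # Print the cost of a 10-second clip at each resolution tier.
    for res in RATE_CENTS_PER_SECOND:
        cents = estimate_cost_cents(res, 10)
        print(f"{res}: 10 s costs ${cents / 100:.2f}, "
              f"{seconds_per_dollar(res)} s per dollar")
```

At these rates a 10-second clip costs 60 cents at 1080p and 240 cents at 2160p, so iterating at 1080p and rendering only the final take in 4K keeps drafts at a quarter of the final cost.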

Cinematic Quality at Production Scale

LTXV-13B uses multiscale rendering to generate video progressively, starting with low-resolution structure and refining detail in passes rather than processing every frame at full resolution simultaneously. This architectural choice means you get 4K output without the typical GPU memory bottlenecks.
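
To make the coarse-to-fine idea concrete, here is a toy sketch of a resolution schedule. It is an illustration only and not the model's actual implementation: the number of passes and the halving factor between passes are assumptions, and `coarse_to_fine_schedule` and `pixel_fraction` are hypothetical names.

```python
def coarse_to_fine_schedule(width: int, height: int, passes: int = 3):
    """Return (width, height) per pass, from coarsest to full resolution.

    Assumption: each earlier pass halves both dimensions of the next one.
    """
    if passes < 1:
        raise ValueError("passes must be at least 1")
    schedule = []
    for level in range(passes - 1, -1, -1):
        scale = 2 ** level
        schedule.append((width // scale, height // scale))
    return schedule


def pixel_fraction(schedule):
    """Pixels per frame at each pass, as a fraction of the final pass."""
    final_w, final_h = schedule[-1]
    return [(w * h) / (final_w * final_h) for w, h in schedule]


if __name__ == "__main__":
    # A 2160p (3840x2160) target refined over three passes.
    schedule = coarse_to_fine_schedule(3840, 2160, passes=3)
    for (w, h), frac in zip(schedule, pixel_fraction(schedule)):
        print(f"{w}x{h}: {frac:.4f} of final pixels per frame")
```

Under these assumptions the first pass touches 1/16 of the final pixel count per frame, which illustrates why settling scene structure at low resolution is cheap compared with working at full resolution throughout.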

What this means for you:

  • Synchronized audio-visual output: Generate video with a matching soundtrack in a single inference, with no separate audio generation or manual syncing required

  • Resolution flexibility: Scale from 1080p for social content ($0.06/sec) to 4K for broadcast-quality output ($0.24/sec) based on actual delivery requirements

  • Extended duration control: Configure 6, 8, or 10-second sequences with frame rate options at 25 or 50 fps for different motion quality needs

  • Hardware accessibility: Run professional video generation on consumer GPUs through efficient multiscale processing instead of requiring enterprise infrastructure


Technical Specifications

Spec | Details
Architecture | LTXV-13B
Input Formats | Text prompts with cinematic descriptors
Output Formats | MP4 video with audio (video/mp4)
Resolution Options | 1080p, 1440p, 2160p at 16:9 aspect ratio
Frame Rate | 25 fps or 50 fps configurable
Duration Range | 6-10 seconds per generation
License | Commercial use permitted (Partner tier)
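
The constraints in the table above can be checked client-side before a request is sent. The sketch below validates settings and then submits them with `fal_client.subscribe` from the `fal-client` Python package, which expects a `FAL_KEY` environment variable. The argument keys (`prompt`, `resolution`, `duration`, `fps`) and their value formats are assumptions based on the settings described on this page, not confirmed field names, so check the endpoint's API schema before relying on them. `build_arguments` and `generate` are hypothetical helper names.

```python
ALLOWED_RESOLUTIONS = ("1080p", "1440p", "2160p")
ALLOWED_DURATIONS = (6, 8, 10)  # seconds, in 2-second increments
ALLOWED_FPS = (25, 50)


def build_arguments(prompt, resolution="1080p", duration=6, fps=25):
    """Validate settings against the spec table and return a request dict.

    The dictionary keys are assumed names, not confirmed API fields.
    """
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("prompt must be a non-empty string")
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {ALLOWED_RESOLUTIONS}")
    if duration not in ALLOWED_DURATIONS:
        raise ValueError(f"duration must be one of {ALLOWED_DURATIONS}")
    if fps not in ALLOWED_FPS:
        raise ValueError(f"fps must be one of {ALLOWED_FPS}")
    return {
        "prompt": prompt.strip(),
        "resolution": resolution,
        "duration": duration,
        "fps": fps,
    }


def generate(arguments):
    """Submit the request and block until the result is ready."""
    # Imported lazily so validation works without the package installed.
    import fal_client

    return fal_client.subscribe("fal-ai/ltx-2/text-to-video", arguments=arguments)
```

A call might look like `generate(build_arguments("A slow dolly shot through a neon-lit market at night", resolution="1440p", duration=8))`. Validating locally first avoids paying for, or waiting on, a request that the endpoint would reject.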


How It Stacks Up

LTX Video 2.0 Fast – LTX Video 2.0 Pro trades inference speed for maximum resolution and audio capabilities at higher cost per second. The Fast variant prioritizes rapid iteration for preview workflows where 4K output and synchronized audio aren't required.

Hunyuan Video V1.5 – LTX Video 2.0 Pro emphasizes hardware efficiency through multiscale rendering for consumer GPU deployment. Hunyuan Video V1.5 focuses on extended duration capabilities and different motion dynamics for narrative-driven video sequences.

Sora 2 (OpenAI) – LTX Video 2.0 Pro delivers 4K output with audio in 6-10 second sequences optimized for marketing and social content. Sora targets longer-form video up to 60 seconds with emphasis on physical world simulation for complex scene understanding.