LTX Video 2.0 Fast Text to Video

fal-ai/ltx-2/text-to-video/fast
Create high-fidelity video with audio from text with LTX-2 Fast

Pricing: $0.04 per second at 1080p, $0.08 per second at 1440p, and $0.16 per second at 2160p.

LTX Video 2.0 Fast | [text-to-video]

Lightricks' LTXV 2.0 Fast generates 6-20 second videos with synchronized audio, starting at $0.04 per second (1080p). The model is tuned for speed, with a claimed 30x speedup over traditional diffusion approaches through its multiscale rendering architecture. It is built for rapid prototyping workflows where iteration speed and cost efficiency matter more than maximum fidelity, although output up to 2160p is available at a higher per-second price.

Use Cases: Social media content creation | Advertising concept testing | Rapid video prototyping


Performance

At $0.04-$0.16 per second depending on resolution, LTXV 2.0 Fast is positioned as the cost-optimized endpoint in the LTX Video 2.0 family, favoring short render times and low cost for high-volume production environments.

Metric | Result | Context
Generation Speed | 30x faster than traditional diffusion | Multiscale rendering vs standard latent diffusion
Duration Range | 6-20 seconds | Durations over 10s require 25 FPS and 1080p
Cost per Second | $0.04 (1080p), $0.08 (1440p), $0.16 (2160p) | 25 seconds of video per $1.00 at 1080p; a 6s clip costs $0.24
Audio Generation | Native synchronized audio | Integrated audio synthesis, no post-processing
Related Endpoints | LTX Video 2.0 Pro | Pro variant for extended duration and higher fidelity
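
The per-second prices in the table reduce to simple arithmetic. The sketch below is illustrative only: the rate table is copied from the prices quoted on this page, and the names `RATE_PER_SECOND`, `clip_cost`, and `seconds_per_dollar` are made up for this example and are not part of any fal API.

```python
from decimal import Decimal

# Per-second rates quoted on this page, in USD.
# Assumption: these are still current; check live pricing before budgeting.
RATE_PER_SECOND = {
    "1080p": Decimal("0.04"),
    "1440p": Decimal("0.08"),
    "2160p": Decimal("0.16"),
}


def clip_cost(resolution: str, duration_s: int) -> Decimal:
    """Return the cost in USD of one clip at the given resolution and length."""
    return RATE_PER_SECOND[resolution] * duration_s


def seconds_per_dollar(resolution: str) -> Decimal:
    """Return how many seconds of video one US dollar buys at this resolution."""
    return Decimal("1") / RATE_PER_SECOND[resolution]
```

With these rates, `clip_cost("1080p", 6)` is $0.24 and `seconds_per_dollar("1080p")` is 25, so one dollar buys roughly four 6-second clips at 1080p. `Decimal` is used instead of `float` so that currency values stay exact.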

Speed-Optimized Video Synthesis with Audio

LTXV 2.0 Fast uses multiscale rendering to generate video frames at multiple resolutions simultaneously, then combines them during final output. This contrasts with traditional latent diffusion models that process each frame sequentially at full resolution.

What this means for you:

  • Synchronized Audio Generation: Native audio synthesis eliminates separate sound design workflows. Prompt "cinematic depth, western mood" and receive matched audio automatically

  • Flexible Duration Control: Generate 6-20 second clips at 25 or 50 FPS. Durations over 10 seconds are limited to 25 FPS at 1080p

  • Resolution Scaling: Choose 1080p, 1440p, or 2160p output with proportional cost scaling. 1440p doubles cost to $0.08/second, 4K quadruples to $0.16/second

  • 16:9 Native Output: Fixed landscape aspect ratio that fits standard players such as YouTube without cropping. Vertical formats such as Instagram Reels and TikTok still need reframing in post-production
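
The duration, frame-rate, and resolution rules above interact, so it can help to check a combination before submitting a job. The following is a minimal sketch under stated assumptions: `check_settings` is a hypothetical helper, not part of the service, and it accepts any whole number of seconds from 6 to 20 because this page does not say which step sizes are allowed.

```python
ALLOWED_RESOLUTIONS = {"1080p", "1440p", "2160p"}
ALLOWED_FPS = {25, 50}


def check_settings(duration_s: int, fps: int, resolution: str) -> list[str]:
    """Return a list of problems; an empty list means the combination looks valid."""
    problems = []
    if not 6 <= duration_s <= 20:
        problems.append("duration must be between 6 and 20 seconds")
    if fps not in ALLOWED_FPS:
        problems.append("fps must be 25 or 50")
    if resolution not in ALLOWED_RESOLUTIONS:
        problems.append("resolution must be 1080p, 1440p, or 2160p")
    # Rule stated on this page: clips longer than 10 s need 25 FPS and 1080p.
    if duration_s > 10 and (fps != 25 or resolution != "1080p"):
        problems.append("durations over 10 seconds require 25 FPS and 1080p")
    return problems
```

For example, `check_settings(8, 50, "2160p")` returns an empty list, while `check_settings(16, 50, "1080p")` reports that durations over 10 seconds require 25 FPS and 1080p.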


Technical Specifications

Spec | Details
Architecture | LTXV 2.0 Fast
Input Formats | Text prompts (natural language)
Output Formats | MP4 video with synchronized audio
Resolution Options | 1080p, 1440p, 2160p at 16:9 aspect ratio
Frame Rates | 25 FPS (all durations), 50 FPS (6-10s only)
Duration Range | 6-20 seconds (over 10s requires 25 FPS and 1080p)
License | Commercial use permitted
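
As a rough illustration of how these specifications might map onto a request, the sketch below assembles an arguments dictionary for the endpoint ID shown at the top of this page. The key names (`prompt`, `duration`, `resolution`, `fps`, `aspect_ratio`) are assumptions that this page does not confirm, and `build_arguments` and `generate` are hypothetical helpers. The submit step assumes the `fal-client` Python package and a configured `FAL_KEY`; consult the endpoint's API schema before relying on any of it.

```python
ENDPOINT_ID = "fal-ai/ltx-2/text-to-video/fast"  # endpoint ID listed on this page


def build_arguments(prompt: str, duration_s: int = 6,
                    resolution: str = "1080p", fps: int = 25) -> dict:
    """Assemble request arguments. Key names are assumed, not confirmed by this page."""
    return {
        "prompt": prompt,
        "duration": duration_s,    # assumed key name
        "resolution": resolution,  # assumed key name
        "fps": fps,                # assumed key name
        "aspect_ratio": "16:9",    # the only ratio this page lists
    }


def generate(prompt: str, **settings) -> dict:
    """Submit the request and wait for the result (needs fal-client and FAL_KEY)."""
    # Imported lazily so build_arguments can be used without the client installed.
    import fal_client
    return fal_client.subscribe(ENDPOINT_ID, arguments=build_arguments(prompt, **settings))
```

Keeping payload construction separate from submission makes it possible to inspect or log the arguments without spending credits.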


How It Stacks Up

LTX Video 2.0 Pro ($0.10/second at 1080p) – LTX Video 2.0 Fast trades extended duration capabilities and higher base fidelity for a 2.5x lower price at 1080p ($0.04 vs $0.10 per second). Pro supports longer clips beyond 20 seconds and maintains quality at extended durations, which suits narrative content where runtime flexibility justifies the higher cost.

Hunyuan Video V1.5 (see pricing) – LTXV 2.0 Fast prioritizes integrated audio generation and rapid iteration speed through multiscale rendering. Hunyuan emphasizes maximum visual fidelity and complex scene composition for workflows where render time is secondary to output quality.

LongCat Video (see pricing) – LTXV 2.0 Fast offers native 1080p and higher output with synchronized audio at competitive pricing. LongCat targets 720p workflows optimized for mobile-first content where file size and bandwidth efficiency outweigh resolution requirements.
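
The 2.5x figure in the LTX Video 2.0 Pro comparison follows directly from the two quoted 1080p rates. A small worked example, using integer cents to avoid floating-point rounding (the constant and function names are illustrative only):

```python
from fractions import Fraction

# Per-second 1080p rates quoted in the Pro comparison, in US cents.
FAST_CENTS_PER_S = 4
PRO_CENTS_PER_S = 10


def price_ratio() -> Fraction:
    """How many times more Pro costs than Fast per second at 1080p."""
    return Fraction(PRO_CENTS_PER_S, FAST_CENTS_PER_S)


def savings_cents(duration_s: int) -> int:
    """Cents saved by rendering one 1080p clip of this length on Fast instead of Pro."""
    return (PRO_CENTS_PER_S - FAST_CENTS_PER_S) * duration_s
```

Here `float(price_ratio())` is 2.5, and `savings_cents(10)` is 60, meaning a 10-second 1080p clip costs $0.40 on Fast versus $1.00 on Pro.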