Text to Video APIs
Explore fal's Collection of Text to Video Generation APIs
fal is a developer-friendly, one-stop shop for AI text-to-video models. Every major video generation model, from Veo 3.1 and Sora 2 to Kling 3.0 Pro and Seedance 2.0, runs through the same SDK and billing system. That means you can benchmark models, swap endpoints, and scale production without managing separate API contracts or rate limit tiers.
How do I generate video from a text prompt on fal?
A single `fal.subscribe` call takes the endpoint and prompt, returning the video URL once generation completes.
Install the client
```bash
npm install --save @fal-ai/client
```
Set your API key
```bash
export FAL_KEY="YOUR_API_KEY"
```
Generate a video
```js
import { fal } from "@fal-ai/client";

const result = await fal.subscribe("fal-ai/veo3.1", {
  input: {
    prompt: "A neon-lit alley in Tokyo at midnight, slow tracking shot"
  }
});

console.log(result.data.video.url);
```
The same call shape works for Seedance 2.0, Kling 3.0 Pro, Sora 2, and the rest of the catalog. Only the endpoint string changes.
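Because only the endpoint string changes, benchmarking several models can be reduced to a loop. The following is a minimal sketch rather than an official fal helper: `compareModels` is a hypothetical name, the client is passed in as a parameter so the snippet stays self-contained, and the response shape `data.video.url` is assumed to match the Veo 3.1 example above (other models may return a different shape). Endpoint IDs other than `fal-ai/veo3.1` should be copied from each model's own page.

```js
// Hypothetical helper: run one prompt against several endpoints via subscribe().
// `client` is expected to be the `fal` object exported by @fal-ai/client.
async function compareModels(client, endpoints, prompt) {
  const results = [];
  for (const endpoint of endpoints) {
    try {
      const result = await client.subscribe(endpoint, { input: { prompt } });
      // Assumption: same response shape as the Veo 3.1 example.
      results.push({ endpoint, url: result.data.video.url });
    } catch (error) {
      // Record the failure and continue with the next endpoint.
      results.push({ endpoint, error: String(error) });
    }
  }
  return results;
}
```

Calling `await compareModels(fal, ["fal-ai/veo3.1"], "A neon-lit alley in Tokyo at midnight")` would return one entry per endpoint, each holding either a video URL or an error message. The loop runs sequentially on purpose, which keeps the sketch simple at the cost of total wall-clock time.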
Video generations usually take 30 seconds to several minutes, so most production code submits via `fal.queue.submit` and waits on a webhook instead of holding the connection open.
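The queue-based flow might look like the sketch below. It relies on the client's `queue.submit`, `queue.status`, and `queue.result` methods as I understand them; the function names `submitVideoJob` and `fetchIfCompleted` are hypothetical, the client is injected as a parameter, and the webhook URL is a placeholder for your own endpoint. Check the current client documentation before relying on the exact option names.

```js
// Sketch: enqueue a generation and return immediately with the request ID.
// `client` is expected to be the `fal` object exported by @fal-ai/client.
async function submitVideoJob(client, endpoint, prompt, webhookUrl) {
  const { request_id } = await client.queue.submit(endpoint, {
    input: { prompt },
    webhookUrl, // fal posts the result here when generation finishes
  });
  return request_id;
}

// Fallback for when a webhook delivery is missed: check the status once
// and fetch the result only if the job has completed.
async function fetchIfCompleted(client, endpoint, requestId) {
  const status = await client.queue.status(endpoint, { requestId });
  if (status.status !== "COMPLETED") return null;
  const result = await client.queue.result(endpoint, { requestId });
  return result.data;
}
```

Storing the returned request ID alongside your own job record lets the webhook handler match incoming results to the request that produced them.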
Which fal video models produce cinematic-quality output?
Several models on this page are tuned for cinematic visuals with native audio.
- Veo 3.1 from Google delivers up to 4K resolution with synchronized audio, making it useful for high-end editorial work.
- Seedance 2.0 from ByteDance generates cinematic output with native audio and director-level camera control in a single pass.
- Sora 2 and Sora 2 Pro from OpenAI produce clips up to 20 seconds with synchronized audio at 720p or 1080p.
- Kling 3.0 Pro delivers cinematic visuals with fluid motion and native audio when `generate_audio` is enabled, plus multi-shot support for scene-level cuts.
- Kling 2.5 Turbo Pro is the motion-fluidity option in the Kling family, tuned for action-heavy scenes.
Which models suit short-form social vs longer narrative content?
For short social clips in 9:16 or 1:1, MiniMax Hailuo-02 Standard generates at 768p at one of the lower per-second rates on the page. Wan 2.7 and Pixverse v6 round out the catalog with fast generation that preserves scene fidelity. Veo 3.1 Fast and Veo 3 Fast trade some of the flagship Veo variants' fidelity for speed, which is useful when iteration matters more than maximum quality.
For longer narrative scenes, Sora 2 Pro generates clips up to 20 seconds with synchronized audio, the longest single-pass output on the page. Seedance 2.0 supports multi-shot editing within a single generation, useful when a scene needs cuts without post-production stitching. Kling 3.0 Pro and Standard also support multi-shot generation with native audio.
Which models support native audio and reference-based generation?
Most newer models on this page generate audio alongside video natively.
- Veo 3.1 generates synchronized dialogue and environmental sound, with a `generate_audio` toggle to skip audio for a lower per-second rate.
- Seedance 2.0 generates audio and video together in a single pass via its unified multimodal architecture.
- Sora 2 and Sora 2 Pro generate dialogue lip-sync and ambient audio from the same text prompt.
- Kling 3.0 Pro and Standard include native audio when `generate_audio` is enabled, with voice output in Chinese and English.
- For reference-based generation, Seedance 2.0's reference-to-video endpoint accepts up to 9 images, 3 video clips, and 3 audio files in one call, referenced in the prompt as `[Image1]`, `[Video1]`, `[Audio1]`.
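Before sending a reference-based request, it can help to check the prompt's tags against the files you plan to attach. The sketch below is illustrative only: `referenceTags` and `unknownTags` are hypothetical helpers, the limits come from the Seedance 2.0 description above, and the assumption that numbering continues as `[Image2]`, `[Image3]`, and so on is inferred from the `[Image1]` pattern rather than confirmed. It does not build the request payload, since the endpoint's input field names are not given here.

```js
// Limits taken from the text above: 9 images, 3 video clips, 3 audio files.
const REFERENCE_LIMITS = { Image: 9, Video: 3, Audio: 3 };

// Hypothetical helper: list the tags available for a set of reference counts,
// e.g. { Image: 2, Audio: 1 }. Assumes tags are numbered from 1.
function referenceTags(counts) {
  const tags = [];
  for (const [kind, limit] of Object.entries(REFERENCE_LIMITS)) {
    const n = counts[kind] ?? 0;
    if (!Number.isInteger(n) || n < 0 || n > limit) {
      throw new RangeError(`${kind} count must be an integer from 0 to ${limit}`);
    }
    for (let i = 1; i <= n; i++) tags.push(`[${kind}${i}]`);
  }
  return tags;
}

// Return tags used in the prompt that have no matching reference file.
function unknownTags(prompt, counts) {
  const allowed = new Set(referenceTags(counts));
  const used = prompt.match(/\[(?:Image|Video|Audio)\d+\]/g) ?? [];
  return used.filter((tag) => !allowed.has(tag));
}
```

An empty array from `unknownTags` means every tag in the prompt points at a supplied file; anything else is worth fixing before paying for a generation.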
How is video generation priced on fal?
Most video models on this page are priced per second of generated video, with rates scaling by resolution and whether audio is enabled.
| Model | Price |
|---|---|
| Kling 2.5 Turbo Pro | $0.07 / second |
| Seedance 2.0 (720p with audio) | $0.3034 / second |
| Veo 3.1 (720p/1080p with audio) | $0.40 / second |
With fal, there are no subscriptions and no minimum spend required. Credits are drawn down per generation.
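Per-second pricing makes budgeting a matter of multiplication: cost = rate × seconds × number of clips. The sketch below uses only the three rates from the table above; `estimateCost` is a hypothetical helper, and actual charges for other resolutions, audio settings, or models should be read from the relevant model page.

```js
// Per-second rates in USD, copied from the table above.
const RATE_PER_SECOND = {
  "Kling 2.5 Turbo Pro": 0.07,
  "Seedance 2.0 (720p with audio)": 0.3034,
  "Veo 3.1 (720p/1080p with audio)": 0.4,
};

// Hypothetical helper: estimated cost in USD for `clips` videos of `seconds` each.
function estimateCost(model, seconds, clips = 1) {
  const rate = RATE_PER_SECOND[model];
  if (rate === undefined) throw new Error(`No rate listed for ${model}`);
  // Round to 4 decimal places to hide floating-point noise.
  return Math.round(rate * seconds * clips * 1e4) / 1e4;
}
```

For example, an 8-second Veo 3.1 clip with audio comes to 8 × $0.40 = $3.20, while a 5-second Kling 2.5 Turbo Pro clip comes to 5 × $0.07 = $0.35.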