- Text To Video
- Pro
Endpoint:
POST https://fal.run/fal-ai/sora-2/text-to-video
Endpoint ID: fal-ai/sora-2/text-to-video
Quick Start
Input Schema
- The text prompt describing the video you want to generate.
- The resolution of the generated video. Default value: "720p"
- The aspect ratio of the generated video. Default value: "16:9". Possible values: 9:16, 16:9
- Duration of the generated video in seconds. Default value: "4". Possible values: 4, 8, 12, 16, 20
- Whether to delete the video after generation for privacy reasons. If true, the video cannot be used for remixing and will be permanently deleted. Default value: true
- The model to use for the generation. When the default model is selected, the latest snapshot of the model is used; otherwise, select a specific snapshot of the model. Default value: "sora-2". Possible values: sora-2, sora-2-2025-12-08, sora-2-2025-10-06
- If enabled, the prompt (and image for image-to-video) will be checked for known intellectual property references and the request will be blocked if any are detected.
- Up to two character IDs (from create-character) to use in the video. Refer to characters by name in the prompt. When set, only the OpenAI provider is used.
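As a rough illustration of how these inputs fit together, the Python sketch below assembles a request body and prepares a POST to the endpoint without sending it. The field names `resolution`, `aspect_ratio`, `duration`, and `model` are taken from the Limitations section of this page. The name `prompt`, the `Key` authorization scheme, and the helpers `build_payload` and `build_request` are assumptions made for illustration. The remaining options (video deletion, IP detection, character IDs) are left out because their field names are not shown here.

```python
import json
import urllib.request

# Endpoint URL as given at the top of this page.
ENDPOINT = "https://fal.run/fal-ai/sora-2/text-to-video"

# Defaults as listed in the Input Schema.
DEFAULTS = {
    "resolution": "720p",
    "aspect_ratio": "16:9",
    "duration": "4",  # quoted on this page; the API may also accept an integer
    "model": "sora-2",
}


def build_payload(prompt, **overrides):
    """Return a request body: schema defaults first, then caller overrides."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unsupported fields: {sorted(unknown)}")
    # "prompt" is an assumed field name for the text prompt.
    return {"prompt": prompt, **DEFAULTS, **overrides}


def build_request(payload, api_key):
    """Prepare (but do not send) the POST request."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme; check your API key instructions.
            "Authorization": f"Key {api_key}",
        },
        method="POST",
    )


payload = build_payload("A paper boat drifting down a rainy street", duration="8")
request = build_request(payload, api_key="YOUR_API_KEY")
```

Passing `request` to `urllib.request.urlopen` would submit the job. The sketch stops short of that so it can be run without credentials.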
Output Schema
The generated video
The ID of the generated video
Thumbnail image for the video
Spritesheet image for the video
Input Example
Output Example
Performance
At $0.50/second for 1080p ($0.30/second for 720p), Sora 2 Pro is a premium text-to-video solution, trading cost efficiency for industry-leading output length and native audio synthesis.

| Metric | Result | Context |
|---|---|---|
| Maximum Duration | Up to 25 seconds | 2.5x longer than most competing models (10s standard) |
| Resolution Options | 720p, 1080p | Two quality tiers with tiered pricing |
| Cost per Second | $0.30 (720p), $0.50 (1080p) | Premium pricing for audio-enabled, extended-duration output |
| Aspect Ratios | 9:16, 16:9 | Vertical and horizontal formats for social and cinematic use |
| Audio Synthesis | Native audio generation | Synchronized dialogue, ambient sound, and environmental audio |
| Related Endpoints | Sora 2 Video to Video, Sora 2 Text to Video | Pro vs Standard tiers and remix capabilities |
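Because pricing is per second of output, the cost of a clip is its duration multiplied by the rate for its resolution. The sketch below works that out in Python, assuming rates of $0.30/second for 720p and $0.50/second for 1080p as read from the pricing on this page. The helper `estimate_cost` is hypothetical and ignores any other charges that may apply.

```python
# Per-second rates in USD, as read from the pricing on this page (assumed).
RATE_PER_SECOND = {"720p": 0.30, "1080p": 0.50}


def estimate_cost(duration_seconds, resolution="720p"):
    """Return the estimated cost in USD for one generated clip."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    if resolution not in RATE_PER_SECOND:
        raise ValueError(f"no known rate for resolution {resolution!r}")
    return round(RATE_PER_SECOND[resolution] * duration_seconds, 2)


# Compare both tiers for a few of the supported durations.
for seconds in (4, 12, 20):
    print(seconds, estimate_cost(seconds, "720p"), estimate_cost(seconds, "1080p"))
```

Under these assumed rates, a 12-second clip would come to $3.60 at 720p and $6.00 at 1080p.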
Audio-First Video Generation
Sora 2 Pro breaks from traditional silent video generation by synthesizing audio alongside visual content. Dialogue lip-syncing, environmental sounds, and ambient audio emerge from the same text prompt that describes the scene.
What this means for you:
- Synchronized dialogue generation: Characters speak naturally with accurate lip-sync to match emotional tone and scene context, no separate audio track required
- Environmental audio integration: Ambient sounds (wind, traffic, footsteps) generate contextually based on visual elements described in your prompt
- Extended temporal coherence: 25-second maximum duration maintains visual and audio consistency across longer narrative arcs than standard 4-10 second models
- Flexible duration control: Generate 4, 8, or 12-second clips for rapid iteration, or push to 25 seconds for complete scene development
Technical Specifications
| Spec | Details |
|---|---|
| Architecture | Sora 2 Pro |
| Input Formats | Text prompts (natural language descriptions) |
| Output Formats | MP4 video with audio, optional thumbnail and spritesheet |
| Resolution Options | 720p ($0.30/s), 1080p ($0.50/s) |
| Duration Range | 4, 8, 12 seconds (standard), up to 25 seconds (extended) |
| Aspect Ratios | 9:16 (vertical), 16:9 (horizontal) |
| License | Commercial use via OpenAI API or fal credits |
How It Stacks Up
- Sora 2 Text to Video (Standard): Sora 2 Pro trades cost efficiency for extended duration and audio synthesis at premium pricing. Standard Sora 2 remains ideal for rapid prototyping and shorter clips where audio isn’t required.
- Hunyuan Video V1.5 Text to Video: Sora 2 Pro prioritizes audio integration and extended temporal coherence (up to 25s) for narrative-driven content. Hunyuan Video V1.5 emphasizes cost-effective generation for standard-length clips without audio requirements.
- LongCat Video Text to Video: Sora 2 Pro delivers native audio synthesis and dialogue lip-syncing for production-ready scenes. LongCat Video focuses on visual-only generation with competitive pricing for silent video workflows.
Related
- Sora 2 — Video Generation
Limitations
- aspect_ratio restricted to: 9:16, 16:9
- duration restricted to: 4, 8, 12, 16, 20
- model restricted to: sora-2, sora-2-2025-12-08, sora-2-2025-10-06
- resolution restricted to: 720p, 1080p, true_1080p
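These restrictions can be checked on the client before a request is submitted. The following is a minimal Python sketch, with the allowed values copied from the list above. The helper `find_violations` is hypothetical, and comparing values as strings is a simplifying assumption so that both `8` and `"8"` are accepted.

```python
# Allowed values copied from the Limitations list above.
ALLOWED = {
    "aspect_ratio": {"9:16", "16:9"},
    "duration": {"4", "8", "12", "16", "20"},
    "model": {"sora-2", "sora-2-2025-12-08", "sora-2-2025-10-06"},
    "resolution": {"720p", "1080p", "true_1080p"},
}


def find_violations(payload):
    """Return a list of readable problems; the list is empty if none are found."""
    problems = []
    for field, allowed in ALLOWED.items():
        if field not in payload:
            continue  # omitted fields are left to the documented defaults
        value = str(payload[field])  # accept 8 as well as "8"
        if value not in allowed:
            problems.append(f"{field}={value!r} not in {sorted(allowed)}")
    return problems


# A 10-second clip is not among the listed durations, so one problem is reported.
print(find_violations({"duration": 10, "resolution": "720p"}))
```

Returning a list rather than raising on the first problem lets a caller report every invalid field at once.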