Veo 3.1 (Text to Video) API on fal

Run Veo 3.1 AI Video Generation API on fal

Cinema-Quality Video. With Sound.

Veo 3.1 is now available on fal. It is Google DeepMind's flagship video generation model, with true 4K output, native audio, and a versatile set of generation modes for every workflow.

What's New in 3.1 vs Veo 3

Veo 3 (May 2025) introduced native audio generation, the feature that set it apart from other AI video models at launch. Veo 3.1 is a refinement of that foundation:

	Veo 3	Veo 3.1
Native audio	✅	✅
Dialogue & lip sync	✅	✅ Improved
A/V sync accuracy	Good	Better
Reference image adherence	Basic	Stronger
Multi-shot narrative control	Limited	Improved
Best for	Single cinematic shots	Dialogue scenes, multi-shot sequences

Bottom line: For one-off hero shots, Veo 3 and 3.1 are comparable. If you care about dialogue clarity, audio-visual sync, or building longer narratives via `extend-video`, 3.1 is the practical upgrade.

Key Features

4K Resolution

The first mainstream AI video model to support true 4K output. Generate at 720p, 1080p, or 4K with aspect ratios of 16:9 or 9:16 at 24 FPS. Every frame is sharp enough for professional delivery.

Native Audio

Generate rich audio alongside your video — natural dialogue with lip sync, ambient sound effects, and music — in multiple languages. Audio can be enabled or disabled per request. No post-production audio work needed.

Versatile Modes

Text-to-video, image-to-video, first/last frame interpolation, reference-based generation, and video extension. Standard and Fast tiers cover every mode except reference-to-video (Standard only), so you can pick the right speed-quality tradeoff.

Video Length & Extension

Each generation produces up to 8 seconds of video. Use the `extend-video` endpoint to chain extensions — up to 7 seconds per step, 20 steps maximum — enabling a total output of roughly 148 seconds (~2.5 minutes) from a single starting clip.
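The length arithmetic above can be checked with a few lines of Python. A minimal sketch: the 8-second base clip, 7-second step, and 20-step cap come from this page, while `max_total_seconds` and `extensions_needed` are hypothetical helpers that only do the math and never call the API.

```python
import math

BASE_CLIP_S = 8       # max length of the initial generation
EXTENSION_STEP_S = 7  # max seconds added per extend-video call
MAX_EXTENSIONS = 20   # max number of chained extend-video calls


def max_total_seconds() -> int:
    """Longest output reachable from one starting clip: 8 + 7 * 20 = 148 s."""
    return BASE_CLIP_S + EXTENSION_STEP_S * MAX_EXTENSIONS


def extensions_needed(target_s: float) -> int:
    """Smallest number of extend-video steps whose total length reaches target_s."""
    if target_s <= 0:
        raise ValueError("target must be positive")
    if target_s > max_total_seconds():
        raise ValueError(f"{target_s}s exceeds the {max_total_seconds()}s ceiling")
    if target_s <= BASE_CLIP_S:
        return 0  # the first generation alone is long enough
    return math.ceil((target_s - BASE_CLIP_S) / EXTENSION_STEP_S)


print(max_total_seconds())    # 148
print(extensions_needed(30))  # 4, since 8 + 4 * 7 = 36 s covers 30 s
```

Rounding up matters here: three extensions give only 29 seconds, so a 30-second target needs a fourth.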

SynthID Watermarking

All videos generated with Veo 3.1 are invisibly watermarked with SynthID, Google's AI content detection technology. Watermarks are embedded in the video data and survive most re-encoding. This is relevant for teams with content disclosure requirements or enterprise compliance workflows. The watermark cannot be disabled.

Endpoints

Mode	Endpoint	Price
Text to Video	`fal-ai/veo3.1`	From $0.20/s
Text to Video (Fast)	`fal-ai/veo3.1/fast`	From $0.10/s
Image to Video	`fal-ai/veo3.1/image-to-video`	From $0.20/s
Image to Video (Fast)	`fal-ai/veo3.1/fast/image-to-video`	From $0.10/s
First/Last Frame to Video	`fal-ai/veo3.1/first-last-frame-to-video`	From $0.20/s
First/Last Frame to Video (Fast)	`fal-ai/veo3.1/fast/first-last-frame-to-video`	From $0.10/s
Reference to Video	`fal-ai/veo3.1/reference-to-video`	From $0.20/s
Extend Video	`fal-ai/veo3.1/extend-video`	From $0.20/s
Extend Video (Fast)	`fal-ai/veo3.1/fast/extend-video`	From $0.10/s
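The endpoint IDs follow a regular pattern: Fast variants add `/fast` after `fal-ai/veo3.1`, and every mode except text-to-video appends its own suffix. A minimal sketch that encodes the table; `endpoint_for` is a hypothetical helper whose only source is the table on this page, not a live listing from the API.

```python
BASE = "fal-ai/veo3.1"

# Mode suffixes as listed in the endpoint table ("" means plain text-to-video)
MODES = {
    "text-to-video": "",
    "image-to-video": "image-to-video",
    "first-last-frame-to-video": "first-last-frame-to-video",
    "reference-to-video": "reference-to-video",
    "extend-video": "extend-video",
}

# The table lists no Fast variant for reference-to-video
NO_FAST_VARIANT = {"reference-to-video"}


def endpoint_for(mode: str, fast: bool = False) -> str:
    """Return the fal endpoint ID for a Veo 3.1 mode and tier."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if fast and mode in NO_FAST_VARIANT:
        raise ValueError(f"no Fast endpoint listed for {mode!r}")
    parts = [BASE]
    if fast:
        parts.append("fast")  # Fast variants insert /fast right after the model ID
    if MODES[mode]:
        parts.append(MODES[mode])
    return "/".join(parts)


print(endpoint_for("image-to-video", fast=True))  # fal-ai/veo3.1/fast/image-to-video
```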

Detailed Pricing

Standard tier

Resolution	Without Audio	With Audio
720p / 1080p	$0.20/s	$0.40/s
4K	$0.40/s	$0.60/s

Fast tier

Resolution	Without Audio	With Audio
720p / 1080p	$0.10/s	$0.15/s
4K	$0.30/s	$0.35/s

Example: A 5-second 1080p video with audio costs $2.00 on Standard or $0.75 on Fast.
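The arithmetic behind this example can be written as a small estimator. A sketch, assuming the list prices in the tables above and straight per-second multiplication; `estimate_cost` is a hypothetical helper, and actual invoices may differ, so check fal's pricing page before budgeting. `Decimal` avoids floating-point drift in currency math.

```python
from decimal import Decimal

# Per-second list prices in USD from the tables above: (tier, resolution class, audio) -> rate
PRICES = {
    ("standard", "hd", False): Decimal("0.20"),
    ("standard", "hd", True): Decimal("0.40"),
    ("standard", "4k", False): Decimal("0.40"),
    ("standard", "4k", True): Decimal("0.60"),
    ("fast", "hd", False): Decimal("0.10"),
    ("fast", "hd", True): Decimal("0.15"),
    ("fast", "4k", False): Decimal("0.30"),
    ("fast", "4k", True): Decimal("0.35"),
}


def estimate_cost(seconds: int, resolution: str, audio: bool, tier: str = "standard") -> Decimal:
    """Estimated price in USD; 720p and 1080p share one rate."""
    res_class = "4k" if resolution.lower() == "4k" else "hd"
    if res_class == "hd" and resolution not in ("720p", "1080p"):
        raise ValueError(f"unsupported resolution: {resolution!r}")
    return PRICES[(tier, res_class, audio)] * seconds


print(estimate_cost(5, "1080p", audio=True))               # 2.00
print(estimate_cost(5, "1080p", audio=True, tier="fast"))  # 0.75
```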

Veo 3.1 Model Tiers

The Veo 3.1 family has three tiers available across platforms:

Tier	Audio	Max Resolution	Best For
Standard	✅	4K	Production quality, cinematic output
Fast	✅	4K	Rapid iteration, prototyping
Light	❌	720p	High-volume, cost-sensitive workflows

Light is a budget tier available on Google's own platforms (not currently on fal.ai). It generates silent 720p clips at significantly lower cost, which suits ad variant testing, synthetic training data generation, or any workflow where volume matters more than peak quality.

Real-World Use Cases

Marketing & Advertising: eToro produced 15 fully AI-generated versions of a single ad, each localized into a different language. Canva uses Veo to let users generate marketing and social media videos directly from its platform.

Film & Pre-Production: Promise Studios uses Veo 3.1 for AI-powered storyboarding and previsualization. Razorfish took a campaign from script to near-cinematic video in a fraction of traditional production time.

Gaming & Interactive Media: Volley powers in-game cinematics and dynamically generated narrative assets with Veo 3.1. OpusClip generates promotional motion graphics at scale for SMBs.

Enterprise Video: Synthesia integrates Veo to generate contextually adaptive visuals alongside its AI avatars for personalized enterprise video content.

Prompting Best Practices

Veo 3.1 generates video and synchronized audio together, so it responds to prompts differently from image-to-video tools. Vague prompts produce vague results.

The Cinematic Formula


[Shot type] + [Subject] + [Action] + [Environment] + [Style/Mood] + [Audio cues]

Example:


Slow drone arc around a lone lighthouse at dusk, waves crashing against rocky cliffs,
golden-hour light, cinematic grain, 70mm lens feel, sound of distant foghorn and
breaking surf, melancholic tone.
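One way to keep every element of the formula in play is to fill it in as structured fields. A minimal sketch: `ShotSpec` is a hypothetical helper, not part of any fal SDK, and Veo 3.1 only ever sees the final string. Shot type and subject are joined with a space so they read as one phrase; the remaining slots are comma-separated and empty ones are skipped.

```python
from dataclasses import dataclass


@dataclass
class ShotSpec:
    """One field per element of the Cinematic Formula."""
    shot_type: str
    subject: str
    action: str
    environment: str
    style: str = ""
    audio: str = ""  # audio cues: ambient sound, music, dialogue, or silence

    def to_prompt(self) -> str:
        # "[Shot type] [Subject]" reads as one phrase, e.g. "Slow drone arc around a lighthouse"
        head = f"{self.shot_type.strip()} {self.subject.strip()}".strip()
        tail = [self.action, self.environment, self.style, self.audio]
        parts = [head] + [p.strip() for p in tail if p.strip()]
        return ", ".join(parts) + "."


spec = ShotSpec(
    shot_type="Slow drone arc around",
    subject="a lone lighthouse",
    action="waves crashing against rocky cliffs",
    environment="dusk on a rocky coast",
    style="golden-hour light, cinematic grain",
    audio="sound of distant foghorn",
)
print(spec.to_prompt())
```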

Tips

Be specific about camera movement — "slow zoom," "handheld follow," "locked-off wide shot" all produce meaningfully different results
Describe audio explicitly — don't rely on Veo to infer it; state whether you want ambient sound, music, dialogue, or silence
Keep dialogue short — write conversational lines that fit within an 8-second reading window; long speeches get cut off or rushed
Specify lighting and mood — "overcast," "golden hour," "neon-lit," "chiaroscuro" all guide the model toward cinematic intent
Use film grammar — terms like "rack focus," "dolly in," "establishing shot," and "match cut" are understood
For multi-shot sequences — use consistent descriptors (same character description, same environment lighting) across prompts when chaining clips via `extend-video`
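The dialogue and continuity tips can be sketched as two small helpers. This assumes a conversational pace of about 2.5 words per second (roughly 150 words per minute), a rule of thumb rather than a Veo 3.1 specification. `dialogue_fits` and `chained_prompts` are hypothetical names; each returned prompt would go to a separate request (the first generation, then each `extend-video` step).

```python
WORDS_PER_SECOND = 2.5  # assumed conversational pace (~150 words per minute)
CLIP_SECONDS = 8        # max length of one generation


def dialogue_fits(line: str, seconds: float = CLIP_SECONDS) -> bool:
    """Rough check that a spoken line can be delivered within one clip."""
    estimated_s = len(line.split()) / WORDS_PER_SECOND
    return estimated_s <= seconds


def chained_prompts(character: str, environment: str, actions: list[str]) -> list[str]:
    """Repeat the exact same character and environment wording in every shot of a chain."""
    return [f"{character} {action}, {environment}." for action in actions]


line = "without patience, one cannot fish, and without fish, one will die"
print(dialogue_fits(line))  # True: 11 words is about 4.4 s at the assumed pace

for prompt in chained_prompts(
    "A grey-bearded fisherman in a yellow raincoat",
    "on a wooden pier under overcast skies",
    ["mends a net", "turns to his apprentice and smiles"],
):
    print(prompt)
```

Passing the descriptors as fixed strings, rather than retyping them per shot, is what keeps the wording identical across the chain.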

Example Prompts

Cinematic motion & hyper-realistic lighting

The white Lamborghini Countach drifts sharply around a corner and slides into a perfect park on a sunlit city street, smoke and tire screech filling the air, camera panning fast with cinematic motion blur, dust particles and heat haze, dynamic reflections on the car, hyper-realistic lighting, upbeat and energetic vibe.

Natural dialogue & character interaction

The man puts the net down as he turns and speaks to his apprentice, saying 'without patience, one cannot fish, and without fish, one will die' and then he smiles.

Camera movement & ambient audio

The camera pans around the house, mysterious music playing.

Epic scale & aerial cinematography

Slow drone shot around the colosseum as the naval battle takes place.

How Veo 3.1 Compares

	Veo 3.1	Sora 2	Kling 2.x	Runway Gen-4
Max resolution	4K	1080p	4K (premium)	720p native
Native audio	✅	✅	❌	❌
Frame rate	24 FPS	24–30 FPS	30 FPS	24 FPS
Max clip length	8s (extendable to ~148s)	20s	Up to 3 min	16s
API access	✅ fal.ai, Vertex AI	Limited	✅	✅
Watermarking	SynthID	C2PA	Varies	Varies
Best at	Cinematic quality, audio, 4K	Human motion, physics realism	Volume, identity consistency	Creative control, fast iteration

Quick Start (Python)

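```python
import fal_client  # pip install fal-client; reads your API key from the FAL_KEY environment variable

# subscribe() queues the job and blocks until the video is ready,
# which suits multi-minute generations better than one synchronous request
result = fal_client.subscribe(
    "fal-ai/veo3.1",
    arguments={
        "prompt": "Cinematic drone shot over misty mountains",
        "resolution": "1080p",
        "generate_audio": True,  # assumed parameter name; confirm in the API documentation
    },
)

# The result is a plain dict, not an object with attributes
print(result["video"]["url"])  # URL of your generated video
```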

For Developers

fal.ai handles all infrastructure:

Serverless — scales to zero, scales to millions
Pay per second — no minimums
Python and JavaScript SDKs, plus REST API
No GPUs to manage
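Alongside the SDKs, the REST API can be called with nothing but the Python standard library. A sketch under stated assumptions: the `https://queue.fal.run/<endpoint>` URL and the `Authorization: Key <FAL_KEY>` header follow fal's queue API as commonly documented and should be confirmed against the API documentation. `build_submit_request` and `submit` are hypothetical helpers, and status polling and result retrieval are omitted.

```python
import json
import os
import urllib.request

# Assumption: fal's queue endpoint pattern and "Key" auth scheme; verify in the API docs
QUEUE_BASE = "https://queue.fal.run"


def build_submit_request(endpoint_id: str, arguments: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST that queues one generation job."""
    return urllib.request.Request(
        url=f"{QUEUE_BASE}/{endpoint_id}",
        data=json.dumps(arguments).encode("utf-8"),
        headers={
            "Authorization": f"Key {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit(endpoint_id: str, arguments: dict) -> dict:
    """Send the request; the JSON reply is expected to include a request_id to poll."""
    req = build_submit_request(endpoint_id, arguments, os.environ["FAL_KEY"])
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

Splitting request construction from sending keeps the URL and headers inspectable without a network call or a real key.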


FAQ

What can I create with Veo 3.1? Text-to-video, image-to-video, first/last frame interpolation, reference-based generation, and video extension. Supports 720p, 1080p, and 4K at 16:9 or 9:16. Videos up to 8 seconds per generation, extendable up to ~148 seconds total via `extend-video`.

What's the difference between Standard and Fast? Standard delivers higher visual and audio quality. Fast is optimized for speed and iteration at a lower per-second price. Fast variants exist for every mode except reference-to-video, which is Standard only.

How does native audio work? Veo 3.1 generates synchronized audio alongside video — dialogue with lip sync, sound effects, ambient noise, and music. Audio can be enabled or disabled per request, and supports natural conversations in multiple languages.

What resolutions does Veo 3.1 support? 720p, 1080p, and 4K. Veo 3.1 is the first mainstream AI video model with true 4K output. All resolutions are available in 16:9 and 9:16 at 24 FPS.

How much does Veo 3.1 cost on fal.ai? Pay-per-second with no minimums. Standard: $0.20/s (720p/1080p) or $0.40/s (4K) without audio; $0.40/s or $0.60/s with audio. Fast: $0.10/s (720p/1080p) or $0.30/s (4K) without audio; $0.15/s or $0.35/s with audio.

How do I get started with the API? Install the fal.ai SDK (Python or JavaScript), grab an API key from your dashboard, and make your first request with a single function call. The API is serverless, so there are no GPUs to manage. See the API documentation for all parameters.

Can I use Veo 3.1 for commercial projects? Yes. Videos generated through the fal.ai API can be used in commercial projects. Check fal.ai's terms of service for full details.

Are my videos watermarked? Yes. All Veo 3.1 outputs are invisibly watermarked with Google's SynthID technology. The watermark survives most re-encoding and cannot be disabled.

How do I make videos longer than 8 seconds? Use the `extend-video` endpoint to add up to 7 seconds per extension step, up to 20 times — enabling roughly 148 seconds of total output from one starting clip.

What content will Veo 3.1 refuse to generate? The model blocks graphic violence, explicit content, characters resembling real individuals or celebrities, and content that promotes harm.

How does Veo 3.1 compare to Veo 3? Veo 3.1 improves on dialogue clarity, audio-visual sync, and reference image adherence. For single cinematic shots, the difference is subtle. For dialogue-heavy scenes or multi-shot sequences, 3.1 is noticeably better.

Is fal.ai the only way to access Veo 3.1? No. Veo 3.1 is also available through Google AI Studio, Vertex AI, the Gemini app, and Google Flow. fal.ai offers the simplest pay-per-second API access without needing a Google Cloud account.
