fal-ai/veo3.1
For every second of video you generate, you will be charged $0.20 without audio or $0.40 with audio at 720p or 1080p. At 4K, you will be charged $0.40 per second without audio or $0.60 with audio. For example, a 5-second video at 1080p with audio on will cost $2.00.
Run Veo 3.1 AI Video Generation API on fal
Cinema-Quality Video. With Sound.
Veo 3.1 is now available on fal. Google DeepMind's flagship video generation model with true 4K output, native audio, and a versatile set of generation modes for every workflow.
What's New in 3.1 vs Veo 3
Veo 3 (May 2025) introduced native audio generation, the feature that set it apart from every other AI video model. Veo 3.1 is a refinement of that foundation:
| | Veo 3 | Veo 3.1 |
|---|---|---|
| Native audio | ✓ | ✓ |
| Dialogue & lip sync | ✓ | ✓ Improved |
| A/V sync accuracy | Good | Better |
| Reference image adherence | Basic | Stronger |
| Multi-shot narrative control | Limited | Improved |
| Best for | Single cinematic shots | Dialogue scenes, multi-shot sequences |
Bottom line: For one-off hero shots, Veo 3 and 3.1 are comparable. If you care about dialogue clarity, audio-visual sync, or building longer narratives via `extend-video`, 3.1 is the practical upgrade.
Key Features
4K Resolution
The first mainstream AI video model to support true 4K output. Generate at 720p, 1080p, or 4K with aspect ratios of 16:9 or 9:16 at 24 FPS. Every frame is sharp enough for professional delivery.
Native Audio
Generate rich audio alongside your video (natural dialogue with lip sync, ambient sound effects, and music) in multiple languages. Audio can be enabled or disabled per request. No post-production audio work needed.
Versatile Modes
Text-to-video, image-to-video, first/last frame interpolation, reference-based generation, and video extension. Standard and Fast tiers for every mode give you the right speed-quality tradeoff.
Video Length & Extension
Each generation produces up to 8 seconds of video. Use the `extend-video` endpoint to chain extensions (up to 7 seconds per step, 20 steps maximum), enabling a total output of up to 148 seconds (~2.5 minutes) from a single starting clip.
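The 148-second figure follows directly from the limits above: 8 seconds for the base clip plus 20 extensions of 7 seconds each. A minimal sketch of that arithmetic, with hypothetical helper names that are not part of any fal SDK:

```python
import math

BASE_CLIP_SECONDS = 8      # maximum length of a single generation
EXTENSION_SECONDS = 7      # maximum seconds added per extend-video step
MAX_EXTENSION_STEPS = 20   # maximum number of chained extensions


def max_total_seconds() -> int:
    """Longest output reachable from one base clip plus every allowed extension."""
    return BASE_CLIP_SECONDS + EXTENSION_SECONDS * MAX_EXTENSION_STEPS


def extension_steps_needed(target_seconds: float) -> int:
    """Minimum number of extend-video steps needed to reach target_seconds."""
    if target_seconds > max_total_seconds():
        raise ValueError(f"target exceeds the {max_total_seconds()}-second ceiling")
    remaining = max(0.0, target_seconds - BASE_CLIP_SECONDS)
    return math.ceil(remaining / EXTENSION_SECONDS)


print(max_total_seconds())         # 148
print(extension_steps_needed(30))  # 4
```

The step count assumes every generation and extension runs at its maximum length; shorter steps would need more extensions.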
SynthID Watermarking
All videos generated with Veo 3.1 are invisibly watermarked with SynthID, Google's AI content detection technology. Watermarks are embedded in the video data and survive most re-encoding. This is relevant for teams with content disclosure requirements or enterprise compliance workflows. The watermark cannot be disabled.
Endpoints
| Mode | Endpoint | Price |
|---|---|---|
| Text to Video | `fal-ai/veo3.1` | From $0.20/s |
| Text to Video (Fast) | `fal-ai/veo3.1/fast` | From $0.10/s |
| Image to Video | `fal-ai/veo3.1/image-to-video` | From $0.20/s |
| Image to Video (Fast) | `fal-ai/veo3.1/fast/image-to-video` | From $0.10/s |
| First/Last Frame to Video | `fal-ai/veo3.1/first-last-frame-to-video` | From $0.20/s |
| First/Last Frame to Video (Fast) | `fal-ai/veo3.1/fast/first-last-frame-to-video` | From $0.10/s |
| Reference to Video | `fal-ai/veo3.1/reference-to-video` | From $0.20/s |
| Extend Video | `fal-ai/veo3.1/extend-video` | From $0.20/s |
| Extend Video (Fast) | `fal-ai/veo3.1/fast/extend-video` | From $0.10/s |
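The endpoint IDs in the table follow a regular pattern: the base ID, an optional `fast` segment, then the mode. A small sketch of that pattern, assuming only the rows listed above (the helper name `endpoint_id` is hypothetical, and the table shows no Fast row for reference-to-video, so the sketch rejects that combination):

```python
BASE_ENDPOINT = "fal-ai/veo3.1"

# Path suffix for each mode; text-to-video uses the bare base endpoint.
MODE_SUFFIX = {
    "text-to-video": "",
    "image-to-video": "image-to-video",
    "first-last-frame-to-video": "first-last-frame-to-video",
    "reference-to-video": "reference-to-video",
    "extend-video": "extend-video",
}

# Modes for which the table lists no Fast variant.
MODES_WITHOUT_FAST = {"reference-to-video"}


def endpoint_id(mode: str, fast: bool = False) -> str:
    """Build the endpoint ID for a mode and tier, following the table's naming."""
    if mode not in MODE_SUFFIX:
        raise ValueError(f"unknown mode: {mode!r}")
    if fast and mode in MODES_WITHOUT_FAST:
        raise ValueError(f"no Fast endpoint is listed for {mode!r}")
    parts = [BASE_ENDPOINT]
    if fast:
        parts.append("fast")
    if MODE_SUFFIX[mode]:
        parts.append(MODE_SUFFIX[mode])
    return "/".join(parts)


print(endpoint_id("text-to-video"))              # fal-ai/veo3.1
print(endpoint_id("image-to-video", fast=True))  # fal-ai/veo3.1/fast/image-to-video
```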
Detailed Pricing
Standard tier
| Resolution | Without Audio | With Audio |
|---|---|---|
| 720p / 1080p | $0.20/s | $0.40/s |
| 4K | $0.40/s | $0.60/s |
Fast tier
| Resolution | Without Audio | With Audio |
|---|---|---|
| 720p / 1080p | $0.10/s | $0.15/s |
| 4K | $0.30/s | $0.35/s |
Example: A 5-second 1080p video with audio costs $2.00 on Standard or $0.75 on Fast.
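For budgeting, the two pricing tables can be turned into a lookup. The sketch below copies the per-second prices from the tables above and stores them in cents to avoid floating-point drift; the function name `estimate_cost_usd` and its parameters are hypothetical, not part of the fal API.

```python
# Prices in US cents per second: (without audio, with audio), from the tables above.
PRICE_CENTS_PER_SECOND = {
    "standard": {"hd": (20, 40), "4k": (40, 60)},
    "fast": {"hd": (10, 15), "4k": (30, 35)},
}


def estimate_cost_usd(seconds: float, resolution: str = "1080p",
                      audio: bool = True, tier: str = "standard") -> float:
    """Estimate the cost in USD of a clip of the given length."""
    resolution = resolution.lower()
    if resolution not in ("720p", "1080p", "4k"):
        raise ValueError(f"unsupported resolution: {resolution!r}")
    if tier not in PRICE_CENTS_PER_SECOND:
        raise ValueError(f"unknown tier: {tier!r}")
    band = "4k" if resolution == "4k" else "hd"  # 720p and 1080p share a price
    without_audio, with_audio = PRICE_CENTS_PER_SECOND[tier][band]
    cents = with_audio if audio else without_audio
    return seconds * cents / 100


print(estimate_cost_usd(5, "1080p", audio=True, tier="standard"))  # 2.0
print(estimate_cost_usd(5, "1080p", audio=True, tier="fast"))      # 0.75
```

The two printed values reproduce the worked example above ($2.00 on Standard, $0.75 on Fast).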
Veo 3.1 Model Tiers
The Veo 3.1 family has three tiers available across platforms:
| Tier | Audio | Max Resolution | Best For |
|---|---|---|---|
| Standard | ✓ | 4K | Production quality, cinematic output |
| Fast | ✓ | 4K | Rapid iteration, prototyping |
| Light | ✗ | 720p | High-volume, cost-sensitive workflows |
Light is a budget tier available on Google's own platforms (not currently on fal.ai). It generates silent 720p clips at significantly lower cost, which suits ad variant testing, synthetic training data generation, or any workflow where volume matters more than quality ceiling.
How to Access Veo 3.1
Veo 3.1 is available through multiple platforms:
- fal.ai (this page): pay-per-second API, no minimums, serverless
- Google AI Studio: browser-based prototyping
- Vertex AI: enterprise-grade API with Google Cloud billing
- Google Gemini app: consumer interface, subscription-based
- Google Flow: dedicated video creation tool for multi-shot narratives, character continuity, and longer-form storytelling
For developers who want pay-per-second access without Google Cloud setup, fal.ai is the fastest path to the API.
Real-World Use Cases
Marketing & Advertising
eToro produced 15 fully AI-generated versions of a single ad, each localized into a different language. Canva uses Veo to let users generate marketing and social media videos directly from its platform.
Film & Pre-Production
Promise Studios uses Veo 3.1 for AI-powered storyboarding and previsualization. Razorfish took a campaign from script to near-cinematic video in a fraction of traditional production time.
Gaming & Interactive Media
Volley powers in-game cinematics and dynamically generated narrative assets with Veo 3.1. OpusClip generates promotional motion graphics at scale for SMBs.
Enterprise Video
Synthesia integrates Veo to generate contextually adaptive visuals alongside its AI avatars for personalized enterprise video content.
Prompting Best Practices
Veo 3.1 was trained specifically on video content with native audio, which makes it respond differently from image-to-video tools. Vague prompts produce vague results.
The Cinematic Formula
[Shot type] + [Subject] + [Action] + [Environment] + [Style/Mood] + [Audio cues]
Example:
Slow drone arc around a lone lighthouse at dusk, waves crashing against rocky cliffs, golden-hour light, cinematic grain, 70mm lens feel, sound of distant foghorn and breaking surf, melancholic tone.
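One way to keep prompts in this shape is to fill the six slots separately and join them. The sketch below is only an illustration of the formula; the `ShotPrompt` class and its field names are hypothetical and nothing in Veo 3.1 requires this structure.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ShotPrompt:
    """One field per slot of the cinematic formula."""
    shot_type: str
    subject: str
    action: str
    environment: str
    style_mood: str
    audio_cues: str

    def render(self) -> str:
        """Join the non-empty slots, in formula order, into one prompt string."""
        slots = [self.shot_type, self.subject, self.action,
                 self.environment, self.style_mood, self.audio_cues]
        cleaned = [s.strip().rstrip(".,").strip() for s in slots]
        return ", ".join(c for c in cleaned if c) + "."


lighthouse = ShotPrompt(
    shot_type="Slow drone arc",
    subject="a lone lighthouse at dusk",
    action="waves crashing against rocky cliffs",
    environment="golden-hour light",
    style_mood="cinematic grain, 70mm lens feel, melancholic tone",
    audio_cues="sound of distant foghorn and breaking surf",
)
print(lighthouse.render())

# Reuse every descriptor except the shot type and action for a follow-up clip.
follow_up = replace(lighthouse, shot_type="Locked-off wide shot",
                    action="the lamp flickers on as fog rolls in")
print(follow_up.render())
```

Keeping the slots separate makes it easy to change one element between clips while the subject, lighting, and mood descriptors stay identical.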
Tips
- Be specific about camera movement: "slow zoom," "handheld follow," "locked-off wide shot" all produce meaningfully different results
- Describe audio explicitly: don't rely on Veo to infer it; state whether you want ambient sound, music, dialogue, or silence
- Keep dialogue short: write conversational lines that fit within an 8-second reading window; long speeches get cut off or rushed
- Specify lighting and mood: "overcast," "golden hour," "neon-lit," "chiaroscuro" all guide the model toward cinematic intent
- Use film grammar: terms like "rack focus," "dolly in," "establishing shot," and "match cut" are understood
- For multi-shot sequences: use consistent descriptors (same character description, same environment lighting) across prompts when chaining clips via `extend-video`
Example Prompts
Cinematic motion & hyper-realistic lighting
The white Lamborghini Countach drifts sharply around a corner and slides into a perfect park on a sunlit city street, smoke and tire screech filling the air, camera panning fast with cinematic motion blur, dust particles and heat haze, dynamic reflections on the car, hyper-realistic lighting, upbeat and energetic vibe.
Natural dialogue & character interaction
The man puts the net down as he turns and speaks to his apprentice, saying 'without patience, one cannot fish, and without fish, one will die' and then he smiles.
Camera movement & ambient audio
The camera pans around the house, mysterious music playing.
Epic scale & aerial cinematography
Slow drone shot around the colosseum as the naval battle takes place.
How Veo 3.1 Compares
| | Veo 3.1 | Sora 2 | Kling 2.x | Runway Gen-4 |
|---|---|---|---|---|
| Max resolution | 4K | 1080p | 4K (premium) | 720p native |
| Native audio | โ | โ | โ | โ |
| Frame rate | 24 FPS | 24-30 FPS | 30 FPS | 24 FPS |
| Max clip length | 8s (extendable to ~148s) | 20s | Up to 3 min | 16s |
| API access | โ fal.ai, Vertex AI | Limited | โ | โ |
| Watermarking | SynthID | C2PA | Varies | Varies |
| Best at | Cinematic quality, audio, 4K | Human motion, physics realism | Volume, identity consistency | Creative control, fast iteration |
Known Limitations
- 8-second base clip: complex scenes and long dialogue get cut off; plan around it with `extend-video`
- Garbled or rushed speech: dialogue exceeding ~20 words in 8 seconds often sounds unnatural; keep lines short
- Prompt misinterpretation: highly complex or multi-event scenes can miss intent; simplify and chain clips instead
- Audio inconsistency: generated music and ambient noise can vary between runs with the same prompt; iterate
- No real people or celebrities: the model will not generate characters resembling identifiable real individuals
- Single-scene per generation: each clip is one continuous shot, not a multi-scene edit; longer narratives require chaining
- SynthID watermark is permanent: all outputs are watermarked; this cannot be disabled
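The ~20-words-per-8-seconds guideline above can be checked before a prompt is submitted. A rough sketch, assuming the limit scales linearly with clip length (that scaling is an assumption made here, not a documented behavior) and using hypothetical helper names:

```python
WORDS_PER_8_SECONDS = 20  # rough ceiling for natural-sounding dialogue, per the list above


def word_count(line: str) -> int:
    """Count whitespace-separated words in a line of dialogue."""
    return len(line.split())


def max_words(clip_seconds: float = 8.0) -> int:
    """Word budget for a clip, assuming the 8-second guideline scales linearly."""
    return int(WORDS_PER_8_SECONDS * clip_seconds / 8)


def dialogue_fits(line: str, clip_seconds: float = 8.0) -> bool:
    """True if the line stays within the word budget for the clip length."""
    return word_count(line) <= max_words(clip_seconds)


line = "without patience, one cannot fish, and without fish, one will die"
print(word_count(line))     # 11
print(dialogue_fits(line))  # True
```

Word count is only a proxy for speaking time; lines with long words or pauses may still sound rushed.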
Content Policy
Veo 3.1 blocks requests involving:
- Graphic violence, gore, or warfare (unless clearly framed as fictional, e.g., "a period drama battle scene")
- Sexual or explicit content
- Characters resembling real, identifiable people or celebrities
- Content that promotes harm, harassment, or illegal activity
Outputs undergo safety evaluations and checks for memorized content to reduce privacy, copyright, and bias issues.
Quick Start (Python)
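```python
import fal_client

# The client reads your API key from the FAL_KEY environment variable.
result = fal_client.run(
    "fal-ai/veo3.1",
    arguments={
        "prompt": "Cinematic drone shot over misty mountains",
        "resolution": "1080p",
        # Audio toggle; confirm the exact parameter name in the API documentation.
        "generate_audio": True,
    },
)

# The response is a dict; the generated video's URL is under "video".
print(result["video"]["url"])
```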
For Developers
fal.ai handles all infrastructure:
- Serverless: scales to zero, scales to millions
- Pay per second: no minimums
- Python and JavaScript SDKs, plus REST API
- No GPUs to manage
API Documentation | Get API Key
FAQ
What can I create with Veo 3.1?
Text-to-video, image-to-video, first/last frame interpolation, reference-based generation, and video extension. Supports 720p, 1080p, and 4K at 16:9 or 9:16. Videos up to 8 seconds per generation, extendable up to ~148 seconds total via `extend-video`.
What's the difference between Standard and Fast?
Both tiers support all modes. Standard delivers higher visual and audio quality. Fast is optimized for speed and iteration. Both are available for every endpoint variant.
How does native audio work?
Veo 3.1 generates synchronized audio alongside video: dialogue with lip sync, sound effects, ambient noise, and music. Audio can be enabled or disabled per request, and supports natural conversations in multiple languages.
What resolutions does Veo 3.1 support?
720p, 1080p, and 4K. It is the first mainstream AI video model with true 4K output. Available in 16:9 and 9:16 formats at 24 FPS.
How much does Veo 3.1 cost on fal.ai?
Pay-per-second with no minimums. Standard: $0.20/s (720p/1080p) or $0.40/s (4K) without audio; $0.40/s or $0.60/s with audio. Fast: $0.10/s (720p/1080p) or $0.30/s (4K) without audio; $0.15/s or $0.35/s with audio.
How do I get started with the API?
Install the fal.ai SDK (Python or JavaScript), grab an API key from your dashboard, and make your first request in a few lines of code. The API is serverless, so there are no GPUs to manage. See the API documentation for all parameters.
Can I use Veo 3.1 for commercial projects?
Yes. Videos generated through the fal.ai API can be used in commercial projects. Check fal.ai's terms of service for full details.
Are my videos watermarked?
Yes. All Veo 3.1 outputs are invisibly watermarked with Google's SynthID technology. The watermark persists through re-encoding and cannot be disabled.
How do I make videos longer than 8 seconds?
Use the `extend-video` endpoint to add up to 7 seconds per extension step, up to 20 times, enabling up to 148 seconds of total output from one starting clip.
What content will Veo 3.1 refuse to generate?
The model blocks graphic violence, explicit content, characters resembling real individuals or celebrities, and content that promotes harm.
How does Veo 3.1 compare to Veo 3?
Veo 3.1 improves on dialogue clarity, audio-visual sync, and reference image adherence. For single cinematic shots, the difference is subtle. For dialogue-heavy scenes or multi-shot sequences, 3.1 is noticeably better.
Is fal.ai the only way to access Veo 3.1?
No. Veo 3.1 is also available through Google AI Studio, Vertex AI, the Gemini app, and Google Flow. fal.ai offers the simplest pay-per-second API access without needing a Google Cloud account.
Sources: fal.ai/models/fal-ai/veo3.1 · Google DeepMind · Google Cloud Blog