fal-ai/veo3.1

Veo 3.1 by Google, the most advanced AI video generation model in the world. With sound on!
Inference
Commercial use
Partner

For every second of video you generate, you will be charged $0.20 without audio or $0.40 with audio at 720p or 1080p. At 4K, you will be charged $0.40 per second without audio or $0.60 with audio. For example, a 5-second video at 1080p with audio on will cost $2.00.

Run Veo 3.1 AI Video Generation API on fal

Cinema-Quality Video. With Sound.

Veo 3.1 is now available on fal. It is Google DeepMind's flagship video generation model, with true 4K output, native audio, and a versatile set of generation modes for every workflow.


What's New in 3.1 vs Veo 3

Veo 3 (May 2025) introduced native audio generation, the feature that set it apart from every other AI video model. Veo 3.1 is a refinement of that foundation:

| Feature | Veo 3 | Veo 3.1 |
| --- | --- | --- |
| Native audio | ✅ | ✅ |
| Dialogue & lip sync | ✅ | ✅ Improved |
| A/V sync accuracy | Good | Better |
| Reference image adherence | Basic | Stronger |
| Multi-shot narrative control | Limited | Improved |
| Best for | Single cinematic shots | Dialogue scenes, multi-shot sequences |

Bottom line: For one-off hero shots, Veo 3 and 3.1 are comparable. If you care about dialogue clarity, audio-visual sync, or building longer narratives via `extend-video`, 3.1 is the practical upgrade.


Key Features

4K Resolution

The first mainstream AI video model to support true 4K output. Generate at 720p, 1080p, or 4K with aspect ratios of 16:9 or 9:16 at 24 FPS. Every frame is sharp enough for professional delivery.

Native Audio

Generate rich audio alongside your video (natural dialogue with lip sync, ambient sound effects, and music) in multiple languages. Audio can be enabled or disabled per request. No post-production audio work needed.

Versatile Modes

Text-to-video, image-to-video, first/last frame interpolation, reference-based generation, and video extension. Standard and Fast tiers for every mode give you the right speed-quality tradeoff.
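
The endpoint IDs listed in the Endpoints table on this page follow a regular pattern: a base ID, an optional `/fast` segment, and a mode suffix. The sketch below rebuilds those IDs from a mode name and a tier flag. The mode keys and the `endpoint_id` helper are hypothetical names chosen for illustration; only the resulting ID strings come from the table. The table lists no Fast variant for Reference to Video, so the helper rejects that combination rather than guessing an ID.

```python
# Base endpoint ID, as listed in the Endpoints table on this page.
BASE = "fal-ai/veo3.1"

# Mode name (hypothetical keys for this sketch) -> path suffix from the table.
MODE_SUFFIX = {
    "text-to-video": "",
    "image-to-video": "/image-to-video",
    "first-last-frame-to-video": "/first-last-frame-to-video",
    "reference-to-video": "/reference-to-video",
    "extend-video": "/extend-video",
}

# The table shows no Fast row for this mode.
MODES_WITHOUT_FAST = {"reference-to-video"}


def endpoint_id(mode: str, fast: bool = False) -> str:
    """Return the endpoint ID for a mode and tier, as listed in the table."""
    if mode not in MODE_SUFFIX:
        raise ValueError(f"unknown mode: {mode!r}")
    if fast and mode in MODES_WITHOUT_FAST:
        raise ValueError(f"no Fast endpoint is listed for {mode!r}")
    tier = "/fast" if fast else ""
    return f"{BASE}{tier}{MODE_SUFFIX[mode]}"


print(endpoint_id("text-to-video"))
print(endpoint_id("image-to-video", fast=True))
```

Keeping the mapping in one place makes it easy to switch a workflow between Standard and Fast by flipping a single flag.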

Video Length & Extension

Each generation produces up to 8 seconds of video. Use the `extend-video` endpoint to chain extensions (up to 7 seconds per step, 20 steps maximum), enabling a total output of roughly 148 seconds (~2.5 minutes) from a single starting clip.
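
The 148-second figure is the 8-second base clip plus 20 extensions of 7 seconds each. A minimal sketch of that arithmetic follows; the function names are hypothetical, and the three limits are taken from the paragraph above.

```python
BASE_CLIP_SECONDS = 8     # maximum length of one generation
EXTENSION_SECONDS = 7     # maximum length added per extend-video step
MAX_EXTENSION_STEPS = 20  # maximum number of chained steps


def max_total_seconds(steps: int = MAX_EXTENSION_STEPS) -> int:
    """Longest possible output after the given number of extension steps."""
    if not 0 <= steps <= MAX_EXTENSION_STEPS:
        raise ValueError(f"steps must be between 0 and {MAX_EXTENSION_STEPS}")
    return BASE_CLIP_SECONDS + EXTENSION_SECONDS * steps


def steps_needed(target_seconds: int) -> int:
    """Fewest extension steps needed to reach at least target_seconds."""
    if target_seconds <= BASE_CLIP_SECONDS:
        return 0
    extra = target_seconds - BASE_CLIP_SECONDS
    steps = -(-extra // EXTENSION_SECONDS)  # ceiling division
    if steps > MAX_EXTENSION_STEPS:
        raise ValueError(f"{target_seconds}s exceeds the {max_total_seconds()}s ceiling")
    return steps


print(max_total_seconds())  # 8 + 7 * 20 = 148
print(steps_needed(30))     # 22 extra seconds -> 4 steps
```

Because every step is billed per second, `steps_needed` is also a quick way to see how many extra requests a target runtime implies.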

SynthID Watermarking

All videos generated with Veo 3.1 are invisibly watermarked with SynthID, Google's AI content detection technology. Watermarks are embedded in the video data and survive most re-encoding. This is relevant for teams with content disclosure requirements or enterprise compliance workflows. The watermark cannot be disabled.


Endpoints

| Mode | Endpoint | Price |
| --- | --- | --- |
| Text to Video | `fal-ai/veo3.1` | From $0.20/s |
| Text to Video (Fast) | `fal-ai/veo3.1/fast` | From $0.10/s |
| Image to Video | `fal-ai/veo3.1/image-to-video` | From $0.20/s |
| Image to Video (Fast) | `fal-ai/veo3.1/fast/image-to-video` | From $0.10/s |
| First/Last Frame to Video | `fal-ai/veo3.1/first-last-frame-to-video` | From $0.20/s |
| First/Last Frame to Video (Fast) | `fal-ai/veo3.1/fast/first-last-frame-to-video` | From $0.10/s |
| Reference to Video | `fal-ai/veo3.1/reference-to-video` | From $0.20/s |
| Extend Video | `fal-ai/veo3.1/extend-video` | From $0.20/s |
| Extend Video (Fast) | `fal-ai/veo3.1/fast/extend-video` | From $0.10/s |

Detailed Pricing

Standard tier

| Resolution | Without Audio | With Audio |
| --- | --- | --- |
| 720p / 1080p | $0.20/s | $0.40/s |
| 4K | $0.40/s | $0.60/s |

Fast tier

| Resolution | Without Audio | With Audio |
| --- | --- | --- |
| 720p / 1080p | $0.10/s | $0.15/s |
| 4K | $0.30/s | $0.35/s |

Example: A 5-second 1080p video with audio costs $2.00 on Standard or $0.75 on Fast.
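
The per-second rates above can be turned into a small cost estimator. This is a sketch, not an official billing calculator: the `estimate_cost` helper and its argument names are hypothetical, and the rates are copied from the two pricing tables on this page.

```python
from decimal import Decimal

# USD per second: RATES[tier][resolution bucket][audio enabled]
RATES = {
    "standard": {
        "hd": {False: Decimal("0.20"), True: Decimal("0.40")},
        "4k": {False: Decimal("0.40"), True: Decimal("0.60")},
    },
    "fast": {
        "hd": {False: Decimal("0.10"), True: Decimal("0.15")},
        "4k": {False: Decimal("0.30"), True: Decimal("0.35")},
    },
}


def estimate_cost(seconds: int, resolution: str = "1080p",
                  audio: bool = True, tier: str = "standard") -> Decimal:
    """Estimate the price in USD of one generation."""
    if tier not in RATES:
        raise ValueError(f"unknown tier: {tier!r}")
    if resolution not in ("720p", "1080p", "4k"):
        raise ValueError(f"unknown resolution: {resolution!r}")
    # 720p and 1080p share one price row in the tables.
    bucket = "4k" if resolution == "4k" else "hd"
    return RATES[tier][bucket][audio] * seconds


print(estimate_cost(5, "1080p", audio=True, tier="standard"))  # 2.00
print(estimate_cost(5, "1080p", audio=True, tier="fast"))      # 0.75
```

`Decimal` is used instead of floats so that totals such as $0.15 x 5 come out as exact cent amounts.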


Veo 3.1 Model Tiers

The Veo 3.1 family has three tiers available across platforms:

| Tier | Audio | Max Resolution | Best For |
| --- | --- | --- | --- |
| Standard | ✅ | 4K | Production quality, cinematic output |
| Fast | ✅ | 4K | Rapid iteration, prototyping |
| Light | ❌ | 720p | High-volume, cost-sensitive workflows |

Light is a budget tier available on Google's own platforms (not currently on fal.ai). It generates silent 720p clips at significantly lower cost, which suits ad variant testing, synthetic training data generation, or any workflow where volume matters more than quality ceiling.


How to Access Veo 3.1

Veo 3.1 is available through multiple platforms:

  • fal.ai (this page): pay-per-second API, no minimums, serverless
  • Google AI Studio: browser-based prototyping
  • Vertex AI: enterprise-grade API with Google Cloud billing
  • Google Gemini app: consumer interface, subscription-based
  • Google Flow: dedicated video creation tool for multi-shot narratives, character continuity, and longer-form storytelling

For developers who want pay-per-second access without Google Cloud setup, fal.ai is the fastest path to the API.


Real-World Use Cases

Marketing & Advertising

eToro produced 15 fully AI-generated versions of a single ad, each localized into a different language. Canva uses Veo to let users generate marketing and social media videos directly from its platform.

Film & Pre-Production

Promise Studios uses Veo 3.1 for AI-powered storyboarding and previsualization. Razorfish took a campaign from script to near-cinematic video in a fraction of traditional production time.

Gaming & Interactive Media

Volley powers in-game cinematics and dynamically generated narrative assets with Veo 3.1. OpusClip generates promotional motion graphics at scale for SMBs.

Enterprise Video

Synthesia integrates Veo to generate contextually adaptive visuals alongside its AI avatars for personalized enterprise video content.


Prompting Best Practices

Veo 3.1 was trained specifically on video content with native audio, which makes it respond differently from image-to-video tools. Vague prompts produce vague results.

The Cinematic Formula

[Shot type] + [Subject] + [Action] + [Environment] + [Style/Mood] + [Audio cues]

Example:

Slow drone arc around a lone lighthouse at dusk, waves crashing against rocky cliffs,
golden-hour light, cinematic grain, 70mm lens feel, sound of distant foghorn and
breaking surf, melancholic tone.

Tips

  • Be specific about camera movement: "slow zoom," "handheld follow," and "locked-off wide shot" all produce meaningfully different results
  • Describe audio explicitly: don't rely on Veo to infer it; state whether you want ambient sound, music, dialogue, or silence
  • Keep dialogue short: write conversational lines that fit within an 8-second reading window; long speeches get cut off or rushed
  • Specify lighting and mood: "overcast," "golden hour," "neon-lit," and "chiaroscuro" all guide the model toward cinematic intent
  • Use film grammar: terms like "rack focus," "dolly in," "establishing shot," and "match cut" are understood
  • For multi-shot sequences: use consistent descriptors (same character description, same environment lighting) across prompts when chaining clips via `extend-video`
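
One way to apply the cinematic formula consistently, especially when chaining clips, is to fill its six slots explicitly and join them into a prompt string. The sketch below does that. The `build_prompt` helper and its slot names are hypothetical; they are a convenience for this page, not part of any fal or Google API.

```python
# Slot order follows the cinematic formula above.
SLOTS = ("shot_type", "subject", "action", "environment", "style_mood", "audio_cues")


def build_prompt(**slots: str) -> str:
    """Join the six formula slots into one comma-separated prompt."""
    unknown = sorted(set(slots) - set(SLOTS))
    if unknown:
        raise ValueError(f"unknown slots: {', '.join(unknown)}")
    missing = [name for name in SLOTS if not slots.get(name, "").strip()]
    if missing:
        raise ValueError(f"missing slots: {', '.join(missing)}")
    # Trim stray punctuation so the joined prompt reads cleanly.
    parts = [slots[name].strip().rstrip(".,") for name in SLOTS]
    return ", ".join(parts) + "."


prompt = build_prompt(
    shot_type="Slow drone arc",
    subject="a lone lighthouse",
    action="waves crashing against rocky cliffs",
    environment="dusk, golden-hour light",
    style_mood="cinematic grain, melancholic tone",
    audio_cues="sound of distant foghorn and breaking surf",
)
print(prompt)
```

Requiring every slot is a deliberate choice here: it forces an explicit decision about audio and lighting instead of leaving them for the model to infer.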

Example Prompts

Cinematic motion & hyper-realistic lighting

The white Lamborghini Countach drifts sharply around a corner and slides into a perfect park on a sunlit city street, smoke and tire screech filling the air, camera panning fast with cinematic motion blur, dust particles and heat haze, dynamic reflections on the car, hyper-realistic lighting, upbeat and energetic vibe.

Natural dialogue & character interaction

The man puts the net down as he turns and speaks to his apprentice, saying 'without patience, one cannot fish, and without fish, one will die' and then he smiles.

Camera movement & ambient audio

The camera pans around the house, mysterious music playing.

Epic scale & aerial cinematography

Slow drone shot around the colosseum as the naval battle takes place.


How Veo 3.1 Compares

| | Veo 3.1 | Sora 2 | Kling 2.x | Runway Gen-4 |
| --- | --- | --- | --- | --- |
| Max resolution | 4K | 1080p | 4K (premium) | 720p native |
| Native audio | ✅ | ✅ | ❌ | ❌ |
| Frame rate | 24 FPS | 24-30 FPS | 30 FPS | 24 FPS |
| Max clip length | 8s (extendable to ~148s) | 20s | Up to 3 min | 16s |
| API access | ✅ fal.ai, Vertex AI | Limited | ✅ | ✅ |
| Watermarking | SynthID | C2PA | Varies | Varies |
| Best at | Cinematic quality, audio, 4K | Human motion, physics realism | Volume, identity consistency | Creative control, fast iteration |

Known Limitations

  • 8-second base clip: complex scenes and long dialogue get cut off; plan around it with `extend-video`
  • Garbled or rushed speech: dialogue exceeding ~20 words in 8 seconds often sounds unnatural; keep lines short
  • Prompt misinterpretation: highly complex or multi-event scenes can miss intent; simplify and chain clips instead
  • Audio inconsistency: generated music and ambient noise can vary between runs with the same prompt; iterate
  • No real people or celebrities: the model will not generate characters resembling identifiable real individuals
  • Single-scene per generation: each clip is one continuous shot, not a multi-scene edit; longer narratives require chaining
  • SynthID watermark is permanent: all outputs are watermarked; this cannot be disabled
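
The ~20-word guideline for dialogue can be checked before a request is sent. The sketch below counts the words inside quoted spans of a prompt and compares the total to that threshold. It is a rough heuristic under stated assumptions: the helper names are hypothetical, dialogue is assumed to be written inside straight single or double quotes, and the 20-word limit is the approximate figure from the list above, not a hard API limit.

```python
import re

MAX_DIALOGUE_WORDS = 20  # approximate threshold from the limitations list

# A quoted span: an opening quote not preceded by a word character and a
# matching closing quote not followed by one, so apostrophes inside words
# such as "don't" are not treated as quote marks.
QUOTED = re.compile(r"""(?<!\w)(['"])(.+?)\1(?!\w)""")


def dialogue_word_count(prompt: str) -> int:
    """Total number of words inside quoted spans of the prompt."""
    return sum(len(m.group(2).split()) for m in QUOTED.finditer(prompt))


def dialogue_fits(prompt: str, limit: int = MAX_DIALOGUE_WORDS) -> bool:
    """True if the quoted dialogue stays within the word limit."""
    return dialogue_word_count(prompt) <= limit


example = (
    "The man puts the net down as he turns and speaks to his apprentice, "
    "saying 'without patience, one cannot fish, and without fish, one will die' "
    "and then he smiles."
)
print(dialogue_word_count(example))  # 11
print(dialogue_fits(example))        # True
```

A prompt that fails the check is a candidate for splitting across two clips with `extend-video` rather than shortening the line until it sounds clipped.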

Content Policy

Veo 3.1 blocks requests involving:

  • Graphic violence, gore, or warfare (unless clearly framed as fictional, e.g., "a period drama battle scene")
  • Sexual or explicit content
  • Characters resembling real, identifiable people or celebrities
  • Content that promotes harm, harassment, or illegal activity

Outputs undergo safety evaluations and checks for memorized content to reduce privacy, copyright, and bias issues.


Quick Start (Python)

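```python
import fal_client

# Requires the FAL_KEY environment variable to be set to your API key.
result = fal_client.run(
    "fal-ai/veo3.1",
    arguments={
        "prompt": "Cinematic drone shot over misty mountains",
        "resolution": "1080p",
        # Audio flag as named in fal's Veo input schema; confirm in the API docs.
        "generate_audio": True,
    },
)

# The Python client returns a plain dict, so index into it for the video URL.
print(result["video"]["url"])
```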

For Developers

fal.ai handles all infrastructure:

  • Serverless: scales to zero, scales to millions
  • Pay per second โ€” no minimums
  • Python and JavaScript SDKs, plus REST API
  • No GPUs to manage

→ API Documentation | Get API Key


FAQ

What can I create with Veo 3.1?

Text-to-video, image-to-video, first/last frame interpolation, reference-based generation, and video extension. Veo 3.1 supports 720p, 1080p, and 4K at 16:9 or 9:16. Videos run up to 8 seconds per generation and are extendable up to ~148 seconds total via `extend-video`.

What's the difference between Standard and Fast?

Both tiers support all modes. Standard delivers higher visual and audio quality. Fast is optimized for speed and iteration. Both are available for every endpoint variant.

How does native audio work?

Veo 3.1 generates synchronized audio alongside video: dialogue with lip sync, sound effects, ambient noise, and music. Audio can be enabled or disabled per request, and the model supports natural conversations in multiple languages.

What resolutions does Veo 3.1 support?

720p, 1080p, and 4K. It is the first mainstream AI video model with true 4K output. All resolutions are available in 16:9 and 9:16 formats at 24 FPS.

How much does Veo 3.1 cost on fal.ai?

Pay-per-second with no minimums. Standard: $0.20/s (720p/1080p) or $0.40/s (4K) without audio; $0.40/s or $0.60/s with audio. Fast: $0.10/s (720p/1080p) or $0.30/s (4K) without audio; $0.15/s or $0.35/s with audio.

How do I get started with the API?

Install the fal.ai SDK (Python or JavaScript), grab an API key from your dashboard, and make your first request in a few lines of code. The API is serverless, so there are no GPUs to manage. See the API documentation for all parameters.

Can I use Veo 3.1 for commercial projects?

Yes. Videos generated through the fal.ai API can be used in commercial projects. Check fal.ai's terms of service for full details.

Are my videos watermarked?

Yes. All Veo 3.1 outputs are invisibly watermarked with Google's SynthID technology. The watermark survives most re-encoding and cannot be disabled.

How do I make videos longer than 8 seconds?

Use the `extend-video` endpoint to add up to 7 seconds per extension step, up to 20 times, enabling roughly 148 seconds of total output from one starting clip.

What content will Veo 3.1 refuse to generate?

The model blocks graphic violence, explicit content, characters resembling real individuals or celebrities, and content that promotes harm.

How does Veo 3.1 compare to Veo 3?

Veo 3.1 improves on dialogue clarity, audio-visual sync, and reference image adherence. For single cinematic shots, the difference is subtle. For dialogue-heavy scenes or multi-shot sequences, 3.1 is noticeably better.

Is fal.ai the only way to access Veo 3.1?

No. Veo 3.1 is also available through Google AI Studio, Vertex AI, the Gemini app, and Google Flow. fal.ai offers the simplest pay-per-second API access without needing a Google Cloud account.


Sources: fal.ai/models/fal-ai/veo3.1 · Google DeepMind · Google Cloud Blog