
Kling 3.0: Generate Cinema, Not Clips


The Next Frontier: Kling 3.0

Long-Form Videos

Extended Cinematic Pacing

Generate 3 to 15 seconds of video natively, and chain multiple shots together with multi-shot storyboarding to build full scenes. Each shot can have its own prompt, so you can control pacing, transitions, and narrative flow across an entire sequence.

Visual Drift Killer

Reliable Subject Consistency

Element referencing lets you lock a character's appearance using a reference image, so they stay on-model across every shot. Multi-character coreference keeps 3 or more characters distinct in the same scene without blending faces or outfits.

Cinematic Motion

Physics-Driven Realism

Camera movements like dolly zooms, tracking shots, and rack focuses behave like real cinematography. Fabric drapes, hair moves, and liquids flow with natural weight. The result is footage that feels shot, not generated.


Examples

See what Kling 3.0 can create

Copy any prompt below and try it yourself in the playground.

Cinematic camera & expression shift

"Dolly zoom-in effect, with a lighting shift to blue, as the man's expression turns from worried to horrified"

Natural conversation & cinematography

"Close-up of a woman talking on a train, natural window light, handheld camera feel, shallow depth of field"

Dynamic focus shift & narrative tension

"The warrior turns around as the focus shifts to a monster standing opposite him. He draws his sword, ready to begin"

Epic scale & camera movement

"Aerial drone shot slowly revealing a massive futuristic city at sunrise, lens flare, ultra-wide angle"

For Developers

A few lines of code.
Cinematic video.

fal.ai handles the infrastructure: fast inference, auto-scaling, and a developer-friendly API. No GPUs to manage.

  • Serverless: scales to zero, scales to millions
  • Pay per second, no minimums
  • Python and JavaScript SDKs, plus REST API
import fal_client

result = fal_client.run(
    "fal-ai/kling-video/o3/pro/text-to-video",
    arguments={
        "prompt": "A knight wearing weathered armor, cinematic, dramatic lighting",
    },
)

# result["video"]["url"] → your generated video

FAQ

Common questions about Kling 3.0

What can I create with Kling 3.0?

Kling 3.0 supports text-to-video, image-to-video, start and end frame-to-video, element referencing (including video character reference), multi-shot storyboarding, and native audio generation. The Omni (O3) variant adds multi-image element building with voice input, video element referencing, and multi-character coreference for 3+ characters. Both Kling O3 and V3 output up to 1080p with flexible durations from 3 to 15 seconds.

What's the difference between Kling V3 and Kling O3?

Kling V3 (VIDEO 3.0) is the upgrade from VIDEO 2.6, adding multi-shot storyboarding, element referencing, multi-character coreference, multilingual audio (Chinese, English, Japanese, Korean, Spanish), and 15-second output. Kling O3 (VIDEO 3.0 Omni) is the upgrade from O1, adding native audio, multi-shot support, video element referencing with visual and audio capture, and voice control for elements. O3 is best for reference-heavy workflows with character consistency; V3 is best for prompt-driven cinematic generation. Both models come in Standard and Pro tiers. Pro offers higher quality output with longer inference times; Standard is faster and more cost-effective for iteration and prototyping.

How does multi-shot storyboarding work?

Kling 3.0 can automatically break your prompt into multiple shots with different camera angles and compositions. You can also take precise control at the shot level, specifying duration, shot size, perspective, narrative content, and camera movements for each shot. This lets you create structured, multi-shot narratives in a single generation rather than stitching clips together.
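To make shot-level control concrete, here is a minimal sketch of what a storyboard payload could look like. The field names (`shots` entries with `duration`, `shot_size`, `camera_movement`) are illustrative assumptions, not the confirmed API schema; check the endpoint's playground page for the actual parameters.

```python
# Hypothetical shot-level storyboard: each shot carries its own prompt,
# duration, framing, and camera movement. Field names are assumptions.
storyboard = [
    {
        "duration": 5,
        "shot_size": "wide",
        "camera_movement": "slow dolly-in",
        "prompt": "A warrior stands at the gates of a ruined city at dusk",
    },
    {
        "duration": 4,
        "shot_size": "close-up",
        "camera_movement": "rack focus",
        "prompt": "Focus shifts from the warrior's face to a monster behind him",
    },
    {
        "duration": 6,
        "shot_size": "medium",
        "camera_movement": "handheld tracking",
        "prompt": "The warrior draws his sword and charges",
    },
]

# Kling 3.0 generates 3 to 15 seconds natively, so the shots above
# fit in a single generation rather than being stitched afterward.
total_seconds = sum(shot["duration"] for shot in storyboard)
assert 3 <= total_seconds <= 15
```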

How does element referencing work?

You can upload images or even a 3-8 second video of a character, and the model will extract core character traits, appearance, and voice. This ensures consistent characters across multiple generations. O3 supports multi-image element building with voice as an additional input, so your characters maintain both visual and audio consistency.
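As a rough sketch, an element-referencing request pairs reference media with the prompt. The `elements` key and its fields below are hypothetical names for illustration only; the real parameter names live in the endpoint's API documentation.

```python
# Hypothetical request payload for element referencing. Field names are
# illustrative, not the confirmed schema; the example URLs are placeholders.
arguments = {
    "prompt": "The same knight walks through a rain-soaked market at night",
    "elements": [
        {
            "name": "knight",
            # Upload images (or, on O3, a 3-8 second clip) and the model
            # extracts core traits: appearance, and voice where supported.
            "reference_images": [
                "https://example.com/knight-front.png",
                "https://example.com/knight-side.png",
            ],
        },
    ],
}
```

Reusing the same element across requests is what keeps a character on-model from one generation to the next.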

What languages does native audio support?

Native audio supports Chinese, English, Japanese, Korean, and Spanish, including regional dialects and accents. You can have multi-character scenes where each character speaks in a different language or dialect. The audio engine handles sound effects, dialogue, and singing with natural lip synchronization.

How much does Kling 3.0 cost on fal.ai?

Pricing is pay-per-second with no minimums or subscriptions. Text-to-video starts at $0.168/s (Standard, audio off) and goes up to $0.392/s (V3 Pro with voice control). For example, a 5-second video on O3 Standard with audio costs $1.12, while a 5-second V3 Pro video with audio and voice control costs $1.96. Other modes like image-to-video and element referencing have their own rates. Check each endpoint's playground page for full details.
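Because pricing is per second with no minimums, cost is simply rate times duration. A quick check using the rates quoted above:

```python
def clip_cost(rate_per_second: float, seconds: float) -> float:
    """Pay-per-second pricing: no minimums, so cost is rate * duration,
    rounded to cents."""
    return round(rate_per_second * seconds, 2)

# 5-second V3 Pro clip with audio and voice control at $0.392/s
print(clip_cost(0.392, 5))  # → 1.96

# 5-second clip at the $0.168/s Standard, audio-off rate
print(clip_cost(0.168, 5))  # → 0.84
```

Other modes (image-to-video, element referencing) have their own per-second rates, so plug in the rate from the relevant endpoint's playground page.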

How do I get started with the API?

Install the fal.ai SDK (Python or JavaScript), grab an API key from your dashboard, and make your first request in three lines of code. The API is serverless, so no GPUs to manage, no infrastructure to set up. Check the API documentation for your chosen endpoint to see all available parameters.

Can I use Kling 3.0 for commercial projects?

Yes. Videos generated through the fal.ai API can be used in commercial projects. Check fal.ai's terms of service for full details on usage rights and licensing.

Ready to create?

Start generating cinematic AI video with Kling 3.0 on fal.ai.