fal-ai/kling-video/v3/4k/text-to-video

Kling's Native 4K is a video generation model that outputs professional-grade 4K video in a single step, with no post-production upscaling.

Input

Prompt

Tight close-up portrait of a 20-year-old Gen Z fashion model, dewy glass-skin complexion with a light dusting of faux freckles, glossy bitten-lip stain in a muted berry tone, fluffy laminated brows, and a single pearl-studded graphic liner flick in chrome silver across one eyelid. Her hair is slicked back into a sleek low bun with two soft face-framing tendrils, small silver butterfly clips catching the light. She wears oversized vintage-style chrome chandelier earrings and a sheer mesh high-neck top layered under a cropped leather moto jacket. The camera holds an extreme close-up on her face, then slowly arcs around her in a subtle 45-degree orbit as she tilts her chin down, cuts her eyes directly into the lens with a confident deadpan stare, and exhales softly. A gentle breeze lifts the loose strands of hair across her cheekbone. Lighting is soft and neon-tinged: cool lavender key light from the left blending into a warm peach rim light from the right, creating a duotone gradient across her skin. Background is an out-of-focus wash of deep magenta and teal bokeh. Cinematic editorial fashion aesthetic, shallow depth of field, 85mm lens compression, subtle film grain, photorealistic, 24fps, reminiscent of a modern i-D Magazine or Vogue Beauty film.

Each second of generated video is charged at $0.42, whether audio is on or off. For example, a 5-second video costs $2.10.

Run Kling Video V3 4K Text To Video API on fal

Kling's Native 4K is the world's first AI video model with native 4K output: cinema-grade visuals generated in a single step, with no post-production upscaling or third-party tools required. It is built for production-ready visuals, large-screen displays, high-definition playback, and professional workflows where clarity, detail, and cinematic texture are non-negotiable.

Pricing

Kling V3.0 in 4K mode is billed per second of generated video.

Configuration	Price per second
4K mode, without native audio generation	$0.42
4K mode, with native audio generation (without voice control)	$0.42

A 5-second clip at 4K therefore costs $2.10; a 10-second clip costs $4.20.
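The per-second billing above can be turned into a quick estimate. The following is a minimal Python sketch, assuming the $0.42 rate listed in the table; `estimate_cost` is a hypothetical helper and not part of any fal client.

```python
from decimal import Decimal

# Rate taken from the pricing table above; it is the same with or without audio.
PRICE_PER_SECOND_USD = Decimal("0.42")


def estimate_cost(duration_seconds: int) -> Decimal:
    """Return the estimated cost in USD for a clip of the given length."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return PRICE_PER_SECOND_USD * duration_seconds


print(estimate_cost(5))   # 2.10
print(estimate_cost(10))  # 4.20
```

`Decimal` is used instead of `float` so that the result keeps exact cents.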

Features

Kling V3 4K is the industry's first native 4K text-to-video model. It produces sharp, detail-rich footage that meets commercial 4K standards directly from a text prompt, with no upscaling step required. Every frame is rendered with sophisticated lighting, atmosphere, and exceptional clarity, so output is ready for high-end delivery without a post-production pipeline. The model also maintains stable reference consistency during 4K generation: element features, stylistic expression, color, lighting, and overall mood from reference content are faithfully preserved. Native audio is generated in Chinese and English (other languages are translated to English), durations run from 3 to 15 seconds, and multi-shot storytelling is available through the `multi_prompt` interface. To learn more, visit our Kling V3 page.

Default prompt template

Scene: [where this happens, time of day, background, environment]

Subject: [who or what is the main focus, action, motion]

Important details: [camera movement, lens, lighting, color grade, atmosphere, pacing]

Audio: [dialogue, ambient sound, music cues — if `generate_audio` is enabled]

Use case: [cinematic trailer / product spot / music video / editorial clip / concept reel]

Constraints: [no watermark / no logos / preserve subject identity / steady camera]
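One way to fill in the template programmatically is sketched below in Python. It assumes the labeled lines are simply joined with newlines into a single `prompt` string, which the template itself does not specify; `build_prompt` and `TEMPLATE_FIELDS` are hypothetical names used only for illustration.

```python
# Field order follows the default prompt template above.
TEMPLATE_FIELDS = [
    ("scene", "Scene"),
    ("subject", "Subject"),
    ("important_details", "Important details"),
    ("audio", "Audio"),
    ("use_case", "Use case"),
    ("constraints", "Constraints"),
]


def build_prompt(generate_audio: bool = True, **fields: str) -> str:
    """Join the filled-in template fields into one prompt, skipping empty ones."""
    lines = []
    for key, label in TEMPLATE_FIELDS:
        value = fields.get(key, "").strip()
        if not value:
            continue
        if key == "audio" and not generate_audio:
            continue  # the Audio line only applies when generate_audio is enabled
        lines.append(f"{label}: {value}")
    return "\n".join(lines)


prompt = build_prompt(
    scene="A rain-soaked Tokyo street at night",
    subject="A lone figure walking away under an umbrella",
    important_details="Slow dolly-in, warm streetlight, shallow depth of field",
    constraints="No watermark, steady camera",
)
print(prompt)
```

Fields that are left out are omitted from the prompt rather than sent as empty placeholders.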

Technical Specifications

Spec	Details
Architecture	Kling Video V3 (Native 4K)
Input Formats	Text prompt, or a list of prompts for multi-shot generation
Output Format	MP4 video via URL
Resolution	Native 4K, no post-processing upscale
Duration Range	3 to 15 seconds
Aspect Ratios	16:9, 9:16, 1:1
Audio	Native audio generation (Chinese / English)
License	Commercial use via fal Partner agreement

API Documentation

What's New in Kling V3 4K

Industry-First Native 4K

One-click export for professional-grade 4K video. Output goes straight from the model at commercial 4K resolution — no separate upscaling pipeline, no quality degradation from chained models, and no third-party tools.

Cinema-Grade Clarity

Ultra-clear visuals that faithfully capture every intricate detail. Sharpness, atmosphere, and lighting hit the bar for large-screen display and professional production workflows out of the box.

Greater Refinement

Richer color gradations and smoother transitions give footage a deeper sense of dimension. Fewer banding artifacts and cleaner highlight-to-shadow rolloff make the model suitable for cinematic grading.

More Realistic Subjects

Faithful skin textures, more natural facial expressions, and convincing material response (fabric, hair, metal, liquid). Useful where human subjects and close-ups are central to the shot.

Stable Reference Consistency

During 4K generation the model preserves element features, stylistic expression, color, lighting, and overall mood from reference content — striking a balance between high-quality output and visual consistency across shots.

Native Audio Generation

Audio is produced alongside the video in the same request. Supports Chinese and English speech with correct pronunciation; other languages are translated to English automatically. Use lowercase for conversational English and uppercase for acronyms and proper nouns.

Multi-Shot Composition

Pass a list of prompts via `multi_prompt` to build a sequenced clip with distinct shots and per-shot durations. Choose `customize` to control each shot yourself, or `intelligent` to let the model plan the cut.

Efficient Workflow

Skip the render → upscale → export loop. A single API call produces a delivery-ready 4K asset.

Quick Start

Install the client

bash
npm install --save @fal-ai/client

Set your API key

bash
export FAL_KEY="YOUR_API_KEY"

Text to video

javascript
import { fal } from "@fal-ai/client";

const result = await fal.subscribe("fal-ai/kling-video/v3/4k/text-to-video", {
  input: {
    prompt: "Close-up of glowing fireflies dancing in a dark forest at twilight. Soft bioluminescent particles float through the air. Shallow depth of field, bokeh lights in background. Magical atmosphere, gentle movement.",
    duration: "5",
    aspect_ratio: "16:9",
    generate_audio: true,
  },
  logs: true,
  onQueueUpdate: (update) => {
    if (update.status === "IN_PROGRESS") {
      update.logs.map((log) => log.message).forEach(console.log);
    }
  },
});

console.log(result.data.video.url);

Multi-shot generation

javascript
const result = await fal.subscribe("fal-ai/kling-video/v3/4k/text-to-video", {
  input: {
    multi_prompt: [
      { prompt: "Wide establishing shot of a Tokyo street at night, rain falling.", duration: "4" },
      { prompt: "Close-up of a neon sign reflected in a puddle.", duration: "3" },
      { prompt: "A lone figure walking away under an umbrella, warm streetlight.", duration: "5" },
    ],
    shot_type: "customize",
    aspect_ratio: "16:9",
    generate_audio: true,
  },
});
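The same multi-shot payload can be assembled from a plain list of shots. The Python sketch below builds the `input` dictionary and adds up the per-shot durations. It assumes the finished clip is as long as the sum of its shots, which is not stated above; `build_multi_shot_input` and `total_seconds` are hypothetical helpers.

```python
def build_multi_shot_input(shots, aspect_ratio="16:9", generate_audio=True):
    """Build an input payload from (prompt, seconds) pairs."""
    multi_prompt = [
        # Durations are passed as strings, matching the examples above.
        {"prompt": text, "duration": str(seconds)}
        for text, seconds in shots
    ]
    return {
        "multi_prompt": multi_prompt,
        "shot_type": "customize",
        "aspect_ratio": aspect_ratio,
        "generate_audio": generate_audio,
    }


def total_seconds(payload):
    """Sum the per-shot durations (assumed to equal the final clip length)."""
    return sum(int(shot["duration"]) for shot in payload["multi_prompt"])


payload = build_multi_shot_input([
    ("Wide establishing shot of a Tokyo street at night, rain falling.", 4),
    ("Close-up of a neon sign reflected in a puddle.", 3),
    ("A lone figure walking away under an umbrella, warm streetlight.", 5),
])
print(total_seconds(payload))  # 12
```

The resulting dictionary can be passed as `arguments` to `fal_client.subscribe`, as in the Python example later in this page.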

API Reference

Input

Parameter	Type	Default	Description
`prompt`	string	optional	Text prompt for video generation. Either `prompt` or `multi_prompt` must be provided, not both
`multi_prompt`	array	optional	List of per-shot prompts with individual durations for multi-shot generation
`duration`	enum	`"5"`	Video duration in seconds. One of `"3"`–`"15"`
`generate_audio`	boolean	`true`	Generate native audio alongside the video
`shot_type`	enum	`"customize"`	`customize` (user-defined shots) or `intelligent` (model-planned cut)
`aspect_ratio`	enum	`"16:9"`	`16:9`, `9:16`, or `1:1`
`negative_prompt`	string	`"blur, distort, and low quality"`	Attributes to avoid
`cfg_scale`	float	`0.5`	Prompt adherence strength, range `0`–`1`

Output

json
{
  "video": {
    "file_name": "output.mp4",
    "content_type": "video/mp4",
    "url": "https://v3b.fal.media/files/...",
    "file_size": 8062911
  }
}
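A response of this shape can be reduced to a one-line summary for logging. The Python sketch below works on the sample above and assumes `file_size` is a byte count, which the schema shown here does not state; `describe_video` is a hypothetical helper.

```python
import json

# The sample response from above, with the elided URL left as-is.
sample = json.loads("""
{
  "video": {
    "file_name": "output.mp4",
    "content_type": "video/mp4",
    "url": "https://v3b.fal.media/files/...",
    "file_size": 8062911
  }
}
""")


def describe_video(result: dict) -> str:
    """Summarize the video entry of a result as a single line."""
    video = result["video"]
    size_mb = video["file_size"] / 1_000_000  # assumes file_size is in bytes
    return (
        f'{video["file_name"]} ({video["content_type"]}, '
        f'{size_mb:.1f} MB): {video["url"]}'
    )


print(describe_video(sample))
```

In the JavaScript client the same object sits under `result.data`, as the Quick Start example shows.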

Use Cases

Commercial and advertising -- Delivery-ready 4K spots and product films without an external upscaling stage.

Cinematic and short-form content -- Trailers, concept reels, and stylized sequences with multi-shot composition via `multi_prompt`.

Large-screen and broadcast -- Content mastered for high-definition playback, cinema projection, and professional production pipelines.

Music and social video -- 9:16 and 1:1 formats for vertical platforms with native audio baked into the render.

Editorial and branded storytelling -- Multi-shot narratives where each beat has its own prompt and duration, with stable reference consistency across cuts.

Multilingual campaigns -- Native Chinese and English speech generation, with automatic translation for other source languages.

Long-Running Requests

Video generation is a long-running job. Use the Queue API to submit asynchronously and retrieve results via webhook or polling.

javascript
const { request_id } = await fal.queue.submit("fal-ai/kling-video/v3/4k/text-to-video", {
  input: { prompt: "..." },
  webhookUrl: "https://your-server.com/webhook",
});

const status = await fal.queue.status("fal-ai/kling-video/v3/4k/text-to-video", {
  requestId: request_id,
  logs: true,
});

const result = await fal.queue.result("fal-ai/kling-video/v3/4k/text-to-video", {
  requestId: request_id,
});
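When polling rather than using a webhook, the status call is typically repeated until the job finishes. The Python sketch below shows the loop in a client-agnostic form: `get_status` is any callable that returns the current status string. The status names `IN_QUEUE` and `COMPLETED` are assumptions (only `IN_PROGRESS` appears in the examples above), and `wait_for_completion` is a hypothetical helper that does not handle failure states.

```python
import time


def wait_for_completion(get_status, interval=2.0, timeout=600.0, sleep=time.sleep):
    """Call get_status() until it returns "COMPLETED" or the timeout elapses."""
    waited = 0.0
    while True:
        status = get_status()
        if status == "COMPLETED":
            return status
        if waited >= timeout:
            raise TimeoutError(f"still {status} after {waited:.0f}s")
        sleep(interval)
        waited += interval


# Usage with a stand-in status source; a real one would query the queue
# for the request_id returned at submission.
fake_statuses = iter(["IN_QUEUE", "IN_PROGRESS", "IN_PROGRESS", "COMPLETED"])
final = wait_for_completion(lambda: next(fake_statuses), interval=0)
print(final)  # COMPLETED
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays.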

Notes

Provide exactly one of `prompt` or `multi_prompt`, not both.
For English speech in `generate_audio`, use lowercase for regular words and uppercase for acronyms and proper nouns.
Non-English / non-Chinese audio prompts are translated to English automatically.
`cfg_scale` trades prompt adherence against motion freedom; lower values allow more creative variation.
When running client-side code, never expose your `FAL_KEY`. Use a server-side proxy instead.

cURL

bash
curl --request POST \
  --url https://fal.run/fal-ai/kling-video/v3/4k/text-to-video \
  --header "Authorization: Key $FAL_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "prompt": "Close-up of glowing fireflies dancing in a dark forest at twilight.",
    "duration": "5",
    "aspect_ratio": "16:9",
    "generate_audio": true
  }'

Python

python
import fal_client

def on_queue_update(update):
    if isinstance(update, fal_client.InProgress):
        for log in update.logs:
            print(log["message"])

result = fal_client.subscribe(
    "fal-ai/kling-video/v3/4k/text-to-video",
    arguments={
        "prompt": "Close-up of glowing fireflies dancing in a dark forest at twilight.",
        "duration": "5",
        "aspect_ratio": "16:9",
        "generate_audio": True,
    },
    with_logs=True,
    on_queue_update=on_queue_update,
)
print(result)
