fal-ai/kling-video/o3/4k/reference-to-video

Kling's Native 4K is a video generation model that outputs professional-grade 4K video directly in one step, eliminating the need for post-production upscaling.

For every second of video you generate, you will be charged $0.42, whether audio is on or off. For example, a 5s video costs $2.10.

Run Kling Video O3 4K Reference To Video API on fal

Kling's Native 4K is the world's first AI video model with native 4K output: cinema-grade visuals generated in a single step, with no post-production upscaling or third-party tools required. The O3 4K reference-to-video endpoint composes a clip from multiple references (characters, objects, and style images) addressed directly in the prompt via `@Element1`, `@Image1`, and so on. It is built for combining specific characters, props, and style references into a single 4K clip, which makes it well suited to storyboarding with known subjects, brand-consistent spots, multi-character scenes, and look-development work.


Pricing

Kling O3 (V3-Omni) in 4K mode is billed per second of generated video.

| Configuration | Price per second |
|---|---|
| 4K mode, without video input, without native audio generation | $0.42 |
| 4K mode, without video input, with native audio generation | $0.42 |

A 5-second clip at 4K therefore costs $2.10; a 10-second clip costs $4.20.

Reference inputs passed via `elements` as image sets (`frontal_image_url` + `reference_image_urls`) or via `image_urls` do not count as a "video input" for billing purposes. Video-based reference elements (`video_url` inside `elements`) may be priced under a separate tier — confirm on fal.ai pricing before a production rollout.
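Since billing is a flat per-second rate, cost estimation is simple arithmetic. A minimal sketch (the rate constant comes from the table above; the math is done in integer cents to avoid floating-point drift):

```javascript
// Flat per-second rate for O3 4K reference-to-video, from the pricing table above.
const CENTS_PER_SECOND = 42;

// Estimate the charge in dollars for a clip of the given length.
// Audio on or off does not change the rate.
function estimateCostUsd(durationSeconds) {
  return (durationSeconds * CENTS_PER_SECOND) / 100;
}

console.log(estimateCostUsd(5));  // 2.1
console.log(estimateCostUsd(10)); // 4.2
```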


Features

Kling O3 4K Reference-to-Video composes a cinema-grade 4K clip from a set of references rather than a single source frame. Pass `elements` for characters or objects (each a frontal+reference image set or a reference video) and `image_urls` for style and appearance references, then address them in the prompt as `@Element1`, `@Element2`, `@Image1`, `@Image2`, and so on. Combined, `elements` and `image_urls` may total up to 7 references. You can optionally anchor the clip with `start_image_url` and `end_image_url`. Durations run from 3 to 15 seconds, aspect ratios cover 16:9, 9:16, and 1:1, and audio is opt-in via `generate_audio`. Reference consistency is maintained throughout 4K generation, so subjects and style stay faithful across the entire clip. To learn more, visit our Kling O3 reference-to-video page.


Default prompt template

Scene: [where this happens, time of day, background, environment, style cues]

Subjects: [@Element1, @Element2, ... — who enters, what they do, how they interact]

Style references: [@Image1, @Image2, ... — palette, lighting style, art direction to follow]

Camera: [static / follow / push / pull / pan / framing choices]

Important details: [pacing, atmosphere, effects, material response]

Audio: [dialogue, ambient sound, music cues — if `generate_audio` is enabled]

Constraints: [preserve element identity / preserve style / no watermark / no logos]
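As an illustration, here is the template filled in with hypothetical values (the scene, subjects, and references are invented for the example) and assembled into a single prompt string:

```javascript
// Each line mirrors one section of the default prompt template above.
// All concrete values here are hypothetical examples.
const prompt = [
  "Scene: rain-soaked neon alley at night, shallow depth of field, cinematic realism",
  "Subjects: @Element1 walks toward camera, @Element2 follows and hands over a package",
  "Style references: @Image1 for the teal-and-orange palette, @Image2 for lighting style",
  "Camera: slow push-in, low angle, handheld sway",
  "Important details: steady pacing, heavy atmosphere, rain reacting to neon light",
  "Audio: distant traffic, soft rain ambience",
  "Constraints: preserve element identity, no watermark, no logos",
].join("\n");

console.log(prompt);
```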


Technical Specifications

| Spec | Details |
|---|---|
| Architecture | Kling Video O3 (Native 4K) |
| Input Formats | Text prompt or multi-shot prompt list, up to 7 combined references (elements + style images), optional start and end frame images |
| Output Format | MP4 video via URL |
| Resolution | Native 4K, no post-processing upscale |
| Duration Range | 3 to 15 seconds |
| Aspect Ratios | 16:9, 9:16, 1:1 |
| Audio | Optional native audio generation |
| License | Commercial use via fal Partner agreement |

API Documentation


What's New in Kling O3 4K Reference-to-Video

Industry-First Native 4K

One-click export for professional-grade 4K video. Output goes straight from the model at commercial 4K resolution — no separate upscaling pipeline, no quality degradation from chained models, and no third-party tools.

Multi-Reference Composition

Combine up to 7 references — any mix of `elements` (characters/objects) and `image_urls` (style/appearance). Each is addressable by position in the prompt: `@Element1`, `@Element2`, `@Image1`, `@Image2`, and so on. Useful when a scene needs specific characters, props, and a specific look at the same time.

Character and Object Elements

`elements` accept either an image set (frontal + reference images) or an entire reference video. The model extracts identity, silhouette, wardrobe, and styling from these references and keeps them consistent across the generated clip.

Style and Appearance References

`image_urls` drive palette, lighting, material feel, and overall art direction without acting as a specific subject. Pair with `@Image1` references in the prompt to steer the look.

Optional Start and End Frames

`start_image_url` and `end_image_url` anchor the clip's first and last frame when you need a specific opening or closing state. Both are optional — use them for transitions, reveals, and match cuts.

Cinema-Grade Clarity and Refinement

Ultra-clear visuals, richer color gradations, and smoother transitions. Sharpness, atmosphere, and lighting hit the bar for large-screen display and professional production workflows out of the box.

Stable Reference Consistency

Throughout 4K generation, element features, stylistic expression, color, lighting, and overall mood remain faithful to the provided references — key when a scene must hold a specific look or subject identity across shots.

Multi-Shot Composition

Pass a list of prompts via `multi_prompt` to build a sequenced clip with distinct shots. `shot_type` controls whether cuts are user-defined (`customize`) or planned by the model.

Opt-In Native Audio

`generate_audio` defaults to `false` — turn it on when you want speech or ambient sound rendered with the video. Supports Chinese and English; other languages are translated to English automatically.


Quick Start

Install the client
bash
npm install --save @fal-ai/client
Set your API key
bash
export FAL_KEY="YOUR_API_KEY"
Reference to video
javascript
import { fal } from "@fal-ai/client";

const result = await fal.subscribe("fal-ai/kling-video/o3/4k/reference-to-video", {
  input: {
    prompt: "@Element1 and @Element2 enter the scene from two sides. The elephant starts to play with the ball.",
    elements: [
      { video_url: "..." },
      {
        frontal_image_url: "...",
        reference_image_urls: ["..."],
      },
    ],
    duration: "8",
    aspect_ratio: "16:9",
  },
  logs: true,
  onQueueUpdate: (update) => {
    if (update.status === "IN_PROGRESS") {
      update.logs.map((log) => log.message).forEach(console.log);
    }
  },
});

console.log(result.data.video.url);
Elements + style references
javascript
const result = await fal.subscribe("fal-ai/kling-video/o3/4k/reference-to-video", {
  input: {
    prompt: "@Element1 walks through the scene in the palette and mood of @Image1 and @Image2.",
    elements: [
      {
        frontal_image_url: "...",
        reference_image_urls: ["..."],
      },
    ],
    image_urls: [
      "...",
      "...",
    ],
    duration: "6",
    aspect_ratio: "16:9",
  },
});
Start + end frame with references
javascript
const result = await fal.subscribe("fal-ai/kling-video/o3/4k/reference-to-video", {
  input: {
    prompt: "@Element1 transitions from the opening state to the closing state, cinematic camera.",
    start_image_url: "...",
    end_image_url: "...",
    elements: [
      {
        frontal_image_url: "...",
        reference_image_urls: ["..."],
      },
    ],
    duration: "8",
    aspect_ratio: "16:9",
  },
});
Multi-shot with references
javascript
const result = await fal.subscribe("fal-ai/kling-video/o3/4k/reference-to-video", {
  input: {
    multi_prompt: [
      { prompt: "@Element1 enters from the left, styled like @Image1.", duration: "3" },
      { prompt: "@Element2 enters from the right and meets @Element1.", duration: "3" },
      { prompt: "They walk forward together into the distance.", duration: "4" },
    ],
    shot_type: "customize",
    elements: [
      { video_url: "..." },
      { video_url: "..." },
    ],
    image_urls: ["..."],
    aspect_ratio: "16:9",
    generate_audio: true,
  },
});

API Reference

Input
| Parameter | Type | Default | Description |
|---|---|---|---|
| `prompt` | string | optional | Text prompt for video generation. Provide either `prompt` or `multi_prompt`, not both |
| `multi_prompt` | array | optional | List of per-shot prompts for multi-shot generation |
| `elements` | array | optional | Characters/objects. Each entry is either an image set (`frontal_image_url` + `reference_image_urls`) or a `video_url`. Reference in the prompt as `@Element1`, `@Element2`, etc. |
| `image_urls` | array | optional | Style/appearance reference images. Reference in the prompt as `@Image1`, `@Image2`, etc. |
| `start_image_url` | string | optional | Image used as the first frame |
| `end_image_url` | string | optional | Image used as the last frame |
| `duration` | enum | `"5"` | Video duration in seconds, from `"3"` to `"15"` |
| `aspect_ratio` | enum | `"16:9"` | One of `16:9`, `9:16`, or `1:1` |
| `generate_audio` | boolean | `false` | Generate native audio alongside the video |
| `shot_type` | string | `"customize"` | Multi-shot mode, used with `multi_prompt` |

Combined reference limit: `elements.length + image_urls.length ≤ 7`.
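The mutual-exclusion and reference-count rules above can be checked client-side before submitting. A hypothetical pre-flight validator (not part of the fal client; the function and its error strings are invented for illustration):

```javascript
// Validate an input payload against the documented constraints:
// exactly one of prompt/multi_prompt, and at most 7 combined references.
function validateInput(input) {
  const errors = [];
  const hasPrompt = typeof input.prompt === "string";
  const hasMulti = Array.isArray(input.multi_prompt);
  if (hasPrompt === hasMulti) {
    errors.push("Provide exactly one of `prompt` or `multi_prompt`.");
  }
  const refCount =
    (input.elements?.length ?? 0) + (input.image_urls?.length ?? 0);
  if (refCount > 7) {
    errors.push(`Combined references exceed 7 (got ${refCount}).`);
  }
  return errors;
}
```

Running it before `fal.subscribe` turns a rejected request into an immediate local error.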

Element structure

Each entry in `elements` takes one of two shapes:

json
{
  "frontal_image_url": "https://.../subject_front.png",
  "reference_image_urls": [
    "https://.../subject_back.png",
    "https://.../subject_side.png"
  ]
}

or

json
{
  "video_url": "https://.../reference_clip.mp4"
}

Elements are referenced positionally in prompts as `@Element1`, `@Element2`, etc. Style references in `image_urls` are addressed as `@Image1`, `@Image2`, etc.
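Because handles are purely positional, a small helper can derive them from the input arrays and keep prompts and payloads in sync. This is illustrative only; the API itself infers handles from array order:

```javascript
// Map reference arrays to their positional prompt handles.
function referenceHandles({ elements = [], image_urls = [] } = {}) {
  return {
    elements: elements.map((_, i) => `@Element${i + 1}`),
    images: image_urls.map((_, i) => `@Image${i + 1}`),
  };
}

const handles = referenceHandles({
  elements: [{ video_url: "..." }, { frontal_image_url: "..." }],
  image_urls: ["..."],
});
console.log(handles.elements); // ["@Element1", "@Element2"]
console.log(handles.images);   // ["@Image1"]
```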

Output
json
{
  "video": {
    "file_name": "output.mp4",
    "content_type": "video/mp4",
    "url": "https://v3b.fal.media/files/...",
    "file_size": 18468404
  }
}
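The response is a plain JSON object, so downstream code can pull out the URL and a human-readable size directly. A small sketch against a payload shaped like the sample above (a hypothetical URL stands in for the truncated file path):

```javascript
// Summarize the output object: URL plus file size in mebibytes.
function summarizeOutput(output) {
  const { url, file_size } = output.video;
  const sizeMiB = (file_size / (1024 * 1024)).toFixed(1);
  return `${url} (${sizeMiB} MiB)`;
}

const sample = {
  video: {
    file_name: "output.mp4",
    content_type: "video/mp4",
    url: "https://v3b.fal.media/files/example.mp4",
    file_size: 18468404,
  },
};
console.log(summarizeOutput(sample)); // https://v3b.fal.media/files/example.mp4 (17.6 MiB)
```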

Use Cases

Brand-consistent spots: Combine a specific character, a specific product, and a specific art-direction reference in one 4K clip.

Storyboarding with known subjects: Drop in character and prop references and iterate on blocking, staging, and camera language.

Multi-character scenes: Address each subject explicitly with `@Element1`, `@Element2` so each keeps its identity across the shot.

Look-development: Lock a palette or lighting style with `image_urls` references and explore motion within that look.

Transition shots with references: Use `start_image_url` + `end_image_url` to drive a specific transition while keeping referenced subjects consistent.

Multi-shot reference reels: Build sequenced clips where the same characters and style references persist across beats via `multi_prompt`.


Long-Running Requests

Video generation is a long-running job. Use the Queue API to submit asynchronously and retrieve results via webhook or polling.

javascript
const { request_id } = await fal.queue.submit("fal-ai/kling-video/o3/4k/reference-to-video", {
  input: { prompt: "...", elements: [/* ... */] },
  webhookUrl: "https://your-server.com/webhook",
});

const status = await fal.queue.status("fal-ai/kling-video/o3/4k/reference-to-video", {
  requestId: request_id,
  logs: true,
});

const result = await fal.queue.result("fal-ai/kling-video/o3/4k/reference-to-video", {
  requestId: request_id,
});
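When no webhook is available, the status call can be wrapped in a simple polling loop. A generic sketch with an injectable fetcher; the status value and timing defaults are assumptions, so tune them for your workload:

```javascript
// Poll an async status function until a completion predicate holds,
// waiting `intervalMs` between attempts and giving up after `maxAttempts`.
async function pollUntil(fetchStatus, isDone, { intervalMs = 3000, maxAttempts = 100 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await fetchStatus();
    if (isDone(status)) return status;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("Polling timed out");
}

// Usage against the queue API (assumes the status object exposes a `status` field):
// const final = await pollUntil(
//   () => fal.queue.status("fal-ai/kling-video/o3/4k/reference-to-video", { requestId: request_id }),
//   (s) => s.status === "COMPLETED",
// );
```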

File Inputs

The endpoint accepts publicly reachable image and video URLs for `elements`, `image_urls`, `start_image_url`, and `end_image_url`. For files that are not publicly accessible, upload them first using the fal storage API:

javascript
import { fal } from "@fal-ai/client";

const file = new File([imageBuffer], "reference.png", { type: "image/png" });
const url = await fal.storage.upload(file);

// Use the returned URL in elements, image_urls, start_image_url, or end_image_url

Notes

  • Provide exactly one of `prompt` or `multi_prompt` — not both
  • Combined references are capped at 7: `elements.length + image_urls.length ≤ 7`
  • References are positional — the first entry in `elements` is `@Element1`, the first in `image_urls` is `@Image1`, and so on
  • `elements` drive subject identity (characters/objects); `image_urls` drive style and look — mixing both is supported
  • `generate_audio` is off by default — set it to `true` to enable speech and ambient sound
  • For English speech, use lowercase for regular words and uppercase for acronyms and proper nouns
  • Non-English / non-Chinese audio prompts are translated to English automatically
  • When running client-side code, never expose your `FAL_KEY`. Use a server-side proxy instead

cURL

bash
curl --request POST \
  --url https://fal.run/fal-ai/kling-video/o3/4k/reference-to-video \
  --header "Authorization: Key $FAL_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "prompt": "@Element1 and @Element2 enter the scene from two sides. The elephant starts to play with the ball.",
    "duration": "8",
    "aspect_ratio": "16:9"
  }'

Python

python
import fal_client

def on_queue_update(update):
    if isinstance(update, fal_client.InProgress):
        for log in update.logs:
            print(log["message"])

result = fal_client.subscribe(
    "fal-ai/kling-video/o3/4k/reference-to-video",
    arguments={
        "prompt": "@Element1 and @Element2 enter the scene from two sides. The elephant starts to play with the ball.",
        "duration": "8",
        "aspect_ratio": "16:9",
    },
    with_logs=True,
    on_queue_update=on_queue_update,
)
print(result)