fal-ai/kling-video/o3/4k/reference-to-video
For every second of video you generated, you will be charged $0.42 regardless of whether audio is on or off. For example, a 5s video will cost $2.10.
Run Kling Video O3 4K Reference To Video API on fal
Kling's Native 4K is the world's first AI video model with native 4K output — cinema-grade visuals generated in a single step, with no post-production upscaling or third-party tools required. The O3 4K reference-to-video endpoint composes a clip from multiple references — characters, objects, and style images — addressed directly in the prompt via `@Element1`, `@Image1`, and so on.
Built for: Combining specific characters, props, and style references into a single 4K clip — ideal for storyboarding with known subjects, brand-consistent spots, multi-character scenes, and look-development work.
Pricing
Kling V3-Omni in 4K mode is billed per second of generated video.
| Configuration | Price per second |
|---|---|
| 4K mode, without video input, without native audio generation | $0.42 |
| 4K mode, without video input, with native audio generation | $0.42 |
A 5-second clip at 4K therefore costs $2.10; a 10-second clip costs $4.20.
Reference inputs passed via `elements` as image sets (`frontal_image_url` + `reference_image_urls`) or via `image_urls` do not count as a "video input" for billing purposes. Video-based reference elements (`video_url` inside `elements`) may be priced under a separate tier; confirm on fal.ai pricing before a production rollout.
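Because billing is per second, the cost of a clip can be estimated before a job is submitted. The following is a minimal sketch that assumes the $0.42 rate listed above still applies and that no separate video-input tier is involved; `estimateCostUsd` is a hypothetical helper, not part of the fal client.

```javascript
// Hypothetical helper: estimate the charge for a clip at the listed 4K rate.
// The rate is copied from the pricing table on this page and may change.
const PRICE_CENTS_PER_SECOND = 42;

function estimateCostUsd(durationSeconds) {
  // The API takes duration as a string such as "8", so accept both forms.
  const seconds = Number(durationSeconds);
  if (!Number.isInteger(seconds) || seconds < 3 || seconds > 15) {
    throw new RangeError("duration must be a whole number of seconds from 3 to 15");
  }
  // Multiply in integer cents first to avoid floating-point drift.
  return (PRICE_CENTS_PER_SECOND * seconds) / 100;
}

console.log(estimateCostUsd("5").toFixed(2)); // 2.10
console.log(estimateCostUsd(10).toFixed(2)); // 4.20
```

The two printed values match the 5-second and 10-second figures quoted in this section.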
Features
Kling O3 4K Reference-to-Video composes a cinema-grade 4K clip from a set of references rather than a single source frame. Pass `elements` for characters or objects (each a frontal + reference image set or a reference video) and `image_urls` for style and appearance references, then address them in the prompt as `@Element1`, `@Element2`, `@Image1`, `@Image2`, and so on. Together, `elements` and `image_urls` may total up to 7 references. Optionally anchor the clip with `start_image_url` and `end_image_url`. Durations run from 3 to 15 seconds, aspect ratios cover 16:9, 9:16, and 1:1, and audio is opt-in via `generate_audio`. Reference consistency is maintained throughout 4K generation, so subjects and style stay faithful across the entire clip. To learn more, visit our Kling O3 reference-to-video page.
Default prompt template
Scene: [where this happens, time of day, background, environment, style cues]
Subjects: [@Element1, @Element2, ... — who enters, what they do, how they interact]
Style references: [@Image1, @Image2, ... — palette, lighting style, art direction to follow]
Camera: [static / follow / push / pull / pan / framing choices]
Important details: [pacing, atmosphere, effects, material response]
Audio: [dialogue, ambient sound, music cues (if `generate_audio` is enabled)]
Constraints: [preserve element identity / preserve style / no watermark / no logos]
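The template can be filled programmatically when prompts are generated from structured data. This is a minimal sketch under stated assumptions: `buildPrompt` and its field names are hypothetical, and whether the model benefits from labeled, newline-separated sections is not confirmed by this page.

```javascript
// Hypothetical helper: fill the prompt template from a plain object.
// Section labels mirror the template; missing or empty sections are skipped.
const SECTIONS = [
  ["scene", "Scene"],
  ["subjects", "Subjects"],
  ["style", "Style references"],
  ["camera", "Camera"],
  ["details", "Important details"],
  ["audio", "Audio"],
  ["constraints", "Constraints"],
];

function buildPrompt(fields) {
  return SECTIONS
    .filter(([key]) => typeof fields[key] === "string" && fields[key].trim() !== "")
    .map(([key, label]) => `${label}: ${fields[key].trim()}`)
    .join("\n");
}

// Example values are illustrative only.
const prompt = buildPrompt({
  scene: "Sunlit plaza at dusk",
  subjects: "@Element1 walks toward @Element2",
  style: "Follow the palette of @Image1",
  camera: "Slow push-in",
});
console.log(prompt);
```

The resulting string can be passed as `prompt`; the audio section only makes sense when `generate_audio` is enabled.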
Technical Specifications
| Spec | Details |
|---|---|
| Architecture | Kling Video O3 (Native 4K) |
| Input Formats | Text prompt or multi-shot prompt list, up to 7 combined references (elements + style images), optional start and end frame images |
| Output Format | MP4 video via URL |
| Resolution | Native 4K, no post-processing upscale |
| Duration Range | 3 to 15 seconds |
| Aspect Ratios | 16:9, 9:16, 1:1 |
| Audio | Optional native audio generation |
| License | Commercial use via fal Partner agreement |
What's New in Kling O3 4K Reference-to-Video
Industry-First Native 4K
One-click export for professional-grade 4K video. Output goes straight from the model at commercial 4K resolution — no separate upscaling pipeline, no quality degradation from chained models, and no third-party tools.
Multi-Reference Composition
Combine up to 7 references — any mix of `elements` (characters/objects) and `image_urls` (style/appearance). Each is addressable by position in the prompt: `@Element1`, `@Element2`, `@Image1`, `@Image2`, and so on. Useful when a scene needs specific characters, props, and a specific look at the same time.
Character and Object Elements
`elements` accept either an image set (frontal + reference images) or an entire reference video. The model extracts identity, silhouette, wardrobe, and styling from these references and keeps them consistent across the generated clip.
Style and Appearance References
`image_urls` drive palette, lighting, material feel, and overall art direction without acting as a specific subject. Pair with `@Image1` references in the prompt to steer the look.
Optional Start and End Frames
`start_image_url` and `end_image_url` anchor the clip's first and last frame when you need a specific opening or closing state. Both are optional — use them for transitions, reveals, and match cuts.
Cinema-Grade Clarity and Refinement
Ultra-clear visuals, richer color gradations, and smoother transitions. Sharpness, atmosphere, and lighting hit the bar for large-screen display and professional production workflows out of the box.
Stable Reference Consistency
Throughout 4K generation, element features, stylistic expression, color, lighting, and overall mood remain faithful to the provided references — key when a scene must hold a specific look or subject identity across shots.
Multi-Shot Composition
Pass a list of prompts via `multi_prompt` to build a sequenced clip with distinct shots. `shot_type` controls whether cuts are user-defined (`customize`) or planned by the model.
Opt-In Native Audio
`generate_audio` defaults to `false` — turn it on when you want speech or ambient sound rendered with the video. Supports Chinese and English; other languages are translated to English automatically.
Quick Start
Install the client
```bash
npm install --save @fal-ai/client
```
Set your API key
```bash
export FAL_KEY="YOUR_API_KEY"
```
Reference to video
```javascript
import { fal } from "@fal-ai/client";

const result = await fal.subscribe("fal-ai/kling-video/o3/4k/reference-to-video", {
  input: {
    prompt: "@Element1 and @Element2 enter the scene from two sides. The elephant starts to play with the ball.",
    // @Element1 is the first entry, @Element2 the second
    elements: [
      { video_url: "..." },
      {
        frontal_image_url: "...",
        reference_image_urls: ["..."],
      },
    ],
    duration: "8",
    aspect_ratio: "16:9",
  },
  logs: true,
  onQueueUpdate: (update) => {
    if (update.status === "IN_PROGRESS") {
      // Print each log message (wrapping console.log avoids printing the index and array too)
      update.logs.forEach((log) => console.log(log.message));
    }
  },
});

console.log(result.data.video.url);
```
Elements + style references
```javascript
const result = await fal.subscribe("fal-ai/kling-video/o3/4k/reference-to-video", {
  input: {
    prompt: "@Element1 walks through the scene in the palette and mood of @Image1 and @Image2.",
    elements: [
      {
        frontal_image_url: "...",
        reference_image_urls: ["..."],
      },
    ],
    image_urls: [
      "...",
      "...",
    ],
    duration: "6",
    aspect_ratio: "16:9",
  },
});
```
Start + end frame with references
```javascript
const result = await fal.subscribe("fal-ai/kling-video/o3/4k/reference-to-video", {
  input: {
    prompt: "@Element1 transitions from the opening state to the closing state, cinematic camera.",
    start_image_url: "...",
    end_image_url: "...",
    elements: [
      {
        frontal_image_url: "...",
        reference_image_urls: ["..."],
      },
    ],
    duration: "8",
    aspect_ratio: "16:9",
  },
});
```
Multi-shot with references
```javascript
const result = await fal.subscribe("fal-ai/kling-video/o3/4k/reference-to-video", {
  input: {
    multi_prompt: [
      { prompt: "@Element1 enters from the left, styled like @Image1.", duration: "3" },
      { prompt: "@Element2 enters from the right and meets @Element1.", duration: "3" },
      { prompt: "They walk forward together into the distance.", duration: "4" },
    ],
    shot_type: "customize",
    elements: [
      { video_url: "..." },
      { video_url: "..." },
    ],
    image_urls: ["..."],
    aspect_ratio: "16:9",
    generate_audio: true,
  },
});
```
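In the multi-shot example, each shot carries its own `duration` and no top-level `duration` is set. A minimal sketch for totalling the shots before submission follows. It assumes the clip length is the sum of the per-shot durations and that the 3 to 15 second range applies to that sum; this page states neither explicitly. `totalShotSeconds` is a hypothetical helper.

```javascript
// Hypothetical helper: add up per-shot durations from a multi_prompt list.
// Assumption: the final clip length equals the sum of the shot durations.
function totalShotSeconds(multiPrompt) {
  return multiPrompt.reduce((sum, shot) => sum + Number(shot.duration), 0);
}

const shots = [
  { prompt: "@Element1 enters from the left, styled like @Image1.", duration: "3" },
  { prompt: "@Element2 enters from the right and meets @Element1.", duration: "3" },
  { prompt: "They walk forward together into the distance.", duration: "4" },
];

const total = totalShotSeconds(shots);
console.log(total); // 10
// Assumed range check; confirm the real limit against the API schema.
console.log(total >= 3 && total <= 15); // true
```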
API Reference
Input
| Parameter | Type | Default | Description |
|---|---|---|---|
| `prompt` | string | optional | Text prompt for video generation. Exactly one of `prompt` or `multi_prompt` must be provided, not both |
| `multi_prompt` | array | optional | List of per-shot prompts for multi-shot generation |
| `elements` | array | optional | Characters/objects. Each entry is either an image set (`frontal_image_url` + `reference_image_urls`) or a `video_url`. Reference in prompt as `@Element1`, `@Element2`, etc. |
| `image_urls` | array | optional | Style/appearance reference images. Reference in prompt as `@Image1`, `@Image2`, etc. |
| `start_image_url` | string | optional | Image used as the first frame |
| `end_image_url` | string | optional | Image used as the last frame |
| `duration` | enum | `"5"` | Video duration in seconds. One of `"3"` through `"15"` |
| `aspect_ratio` | enum | `"16:9"` | `16:9`, `9:16`, or `1:1` |
| `generate_audio` | boolean | `false` | Generate native audio alongside the video |
| `shot_type` | string | `"customize"` | Multi-shot mode, used with `multi_prompt` |
Combined reference limit: `elements.length + image_urls.length ≤ 7`.
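The constraints in the parameter table can be checked client-side before a paid request is sent. This is a minimal sketch based only on the rules listed on this page; `validateInput` is a hypothetical helper, not part of the fal client, and the server remains the authority on what is accepted.

```javascript
// Hypothetical pre-flight check derived from the parameter table.
const ASPECT_RATIOS = ["16:9", "9:16", "1:1"];
const MAX_REFERENCES = 7;

function validateInput(input) {
  const errors = [];

  // Exactly one of prompt / multi_prompt must be present.
  const hasPrompt = typeof input.prompt === "string" && input.prompt.length > 0;
  const hasMultiPrompt = Array.isArray(input.multi_prompt) && input.multi_prompt.length > 0;
  if (hasPrompt === hasMultiPrompt) {
    errors.push("provide exactly one of prompt or multi_prompt");
  }

  // elements and image_urls share one budget of 7 references.
  const referenceCount = (input.elements ?? []).length + (input.image_urls ?? []).length;
  if (referenceCount > MAX_REFERENCES) {
    errors.push(`too many references: ${referenceCount} > ${MAX_REFERENCES}`);
  }

  // duration is an enum of whole seconds from "3" to "15".
  if (input.duration !== undefined) {
    const seconds = Number(input.duration);
    if (!Number.isInteger(seconds) || seconds < 3 || seconds > 15) {
      errors.push("duration must be a whole number of seconds from 3 to 15");
    }
  }

  if (input.aspect_ratio !== undefined && !ASPECT_RATIOS.includes(input.aspect_ratio)) {
    errors.push(`aspect_ratio must be one of ${ASPECT_RATIOS.join(", ")}`);
  }

  return errors; // an empty array means no problems were found
}

console.log(validateInput({ prompt: "@Element1 waves.", elements: [{ video_url: "..." }], duration: "8" }));
```

Returning a list of messages rather than throwing on the first problem lets a caller report every issue in one pass.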
Element structure
Each entry in `elements` takes one of two shapes:
```json
{
  "frontal_image_url": "https://.../subject_front.png",
  "reference_image_urls": [
    "https://.../subject_back.png",
    "https://.../subject_side.png"
  ]
}
```

or

```json
{
  "video_url": "https://.../reference_clip.mp4"
}
```
Elements are referenced positionally in prompts as `@Element1`, `@Element2`, etc. Style references in `image_urls` are addressed as `@Image1`, `@Image2`, etc.
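Because references are positional, a prompt can mention an `@ElementN` or `@ImageN` that has no matching entry. The sketch below scans a prompt for such tokens. It assumes numbering starts at 1 and that tokens take exactly the `@Element<number>` / `@Image<number>` form shown on this page; `findDanglingReferences` is a hypothetical helper.

```javascript
// Hypothetical helper: list prompt tokens that point past the supplied references.
function findDanglingReferences(prompt, elementCount, imageCount) {
  const limits = { Element: elementCount, Image: imageCount };
  const dangling = [];
  for (const match of prompt.matchAll(/@(Element|Image)(\d+)/g)) {
    const [token, kind, indexText] = match;
    const index = Number(indexText);
    // Valid indices run from 1 to the number of entries of that kind.
    if (index < 1 || index > limits[kind]) {
      dangling.push(token);
    }
  }
  return dangling;
}

// Two elements and one style image supplied; @Element3 has no matching entry.
const missing = findDanglingReferences(
  "@Element1 meets @Element3, lit like @Image1",
  2,
  1,
);
console.log(missing.join(", ")); // @Element3
```

How the endpoint itself treats an unmatched token is not documented here, so a check like this is best run before submission.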
Output
```json
{
  "video": {
    "file_name": "output.mp4",
    "content_type": "video/mp4",
    "url": "https://v3b.fal.media/files/...",
    "file_size": 18468404
  }
}
```
Use Cases
- Brand-consistent spots -- Combine a specific character, a specific product, and a specific art-direction reference in one 4K clip.
- Storyboarding with known subjects -- Drop in character and prop references and iterate on blocking, staging, and camera language.
- Multi-character scenes -- Address each subject explicitly with `@Element1`, `@Element2` so each keeps its identity across the shot.
- Look-development -- Lock a palette or lighting style with `image_urls` references and explore motion within that look.
- Transition shots with references -- Use `start_image_url` + `end_image_url` to drive a specific transition while keeping referenced subjects consistent.
- Multi-shot reference reels -- Build sequenced clips where the same characters and style references persist across beats via `multi_prompt`.
Long-Running Requests
Video generation is a long-running job. Use the Queue API to submit asynchronously and retrieve results via webhook or polling.
```javascript
// Submit the request; the webhook is called when the job finishes
const { request_id } = await fal.queue.submit("fal-ai/kling-video/o3/4k/reference-to-video", {
  input: { prompt: "...", elements: [/* ... */] },
  webhookUrl: "https://your-server.com/webhook",
});

// Check progress (when polling instead of using a webhook, repeat until the job is done)
const status = await fal.queue.status("fal-ai/kling-video/o3/4k/reference-to-video", {
  requestId: request_id,
  logs: true,
});

// Fetch the result once the request has completed
const result = await fal.queue.result("fal-ai/kling-video/o3/4k/reference-to-video", {
  requestId: request_id,
});
```
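When polling instead of using a webhook, the status call needs to be repeated until the job finishes. The following is a minimal sketch of such a loop. `pollUntilCompleted` is a hypothetical helper, the interval and attempt limits are arbitrary, and it assumes the status object exposes a `status` field that becomes `"COMPLETED"`; failure states are not handled and should be checked against the Queue API documentation.

```javascript
// Hypothetical polling helper. `getStatus` is any async function that resolves
// to an object with a `status` field, for example a wrapper around fal.queue.status.
async function pollUntilCompleted(getStatus, { intervalMs = 5000, maxAttempts = 120 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const update = await getStatus();
    if (update.status === "COMPLETED") {
      return update;
    }
    // Wait before asking again so the queue is not hammered.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`request not completed after ${maxAttempts} attempts`);
}

// Usage sketch (requires the fal client and a submitted request_id):
// await pollUntilCompleted(() =>
//   fal.queue.status("fal-ai/kling-video/o3/4k/reference-to-video", { requestId: request_id }),
// );
```

Injecting `getStatus` keeps the loop independent of the client, so it can be exercised with a stub before it is pointed at the real queue.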
File Inputs
The endpoint accepts publicly reachable image and video URLs for `elements`, `image_urls`, `start_image_url`, and `end_image_url`. For files that are not publicly accessible, upload them first using the fal storage API:
```javascript
import { fal } from "@fal-ai/client";

// imageBuffer is assumed to already hold the image bytes (e.g. an ArrayBuffer or Uint8Array)
const file = new File([imageBuffer], "reference.png", { type: "image/png" });
const url = await fal.storage.upload(file);
// Use the returned URL in elements, image_urls, start_image_url, or end_image_url
```
Notes
- Provide exactly one of `prompt` or `multi_prompt`, not both
- Combined references are capped at 7: `elements.length + image_urls.length ≤ 7`
- References are positional: the first entry in `elements` is `@Element1`, the first in `image_urls` is `@Image1`, and so on
- `elements` drive subject identity (characters/objects); `image_urls` drive style and look; mixing both is supported
- `generate_audio` is off by default; set it to `true` to enable speech and ambient sound
- For English speech, use lowercase for regular words and uppercase for acronyms and proper nouns
- Non-English / non-Chinese audio prompts are translated to English automatically
- When running client-side code, never expose your `FAL_KEY`. Use a server-side proxy instead
cURL
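```bash
# The prompt refers to @Element1 and @Element2, so two elements are supplied
curl --request POST \
  --url https://fal.run/fal-ai/kling-video/o3/4k/reference-to-video \
  --header "Authorization: Key $FAL_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "prompt": "@Element1 and @Element2 enter the scene from two sides. The elephant starts to play with the ball.",
    "elements": [
      { "video_url": "..." },
      { "frontal_image_url": "...", "reference_image_urls": ["..."] }
    ],
    "duration": "8",
    "aspect_ratio": "16:9"
  }'
```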
Python
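```python
import fal_client


def on_queue_update(update):
    # Print log messages while the request is in progress
    if isinstance(update, fal_client.InProgress):
        for log in update.logs:
            print(log["message"])


result = fal_client.subscribe(
    "fal-ai/kling-video/o3/4k/reference-to-video",
    arguments={
        "prompt": "@Element1 and @Element2 enter the scene from two sides. The elephant starts to play with the ball.",
        # The prompt refers to @Element1 and @Element2, so two elements are supplied
        "elements": [
            {"video_url": "..."},
            {"frontal_image_url": "...", "reference_image_urls": ["..."]},
        ],
        "duration": "8",
        "aspect_ratio": "16:9",
    },
    with_logs=True,
    on_queue_update=on_queue_update,
)
print(result)
```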
