nvidia/cosmos-3-super/image-to-video

Cosmos3 is a collection of omnimodal world models that generate dynamic, high-quality video, images, audio, and action commands from combinations of text, image, video, and action-trajectory inputs.

Inference

Commercial use

Input

Prompt*

Image URL*

Hint: Provide a URL, drag and drop image files from your computer or from web pages, or paste from the clipboard (Ctrl/Cmd+V). Accepted file types: jpg, jpeg, png, webp, gif, avif.
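The two required inputs can be checked on the client before a request is sent. The sketch below is a minimal illustration, not the service's documented schema: the payload keys `prompt` and `image_url` are assumptions inferred from the field labels above, and `build_input` is a hypothetical helper. It only checks the file extension in the URL path against the accepted types listed in the hint.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Accepted file types, as listed in the hint above.
ACCEPTED_EXTENSIONS = {"jpg", "jpeg", "png", "webp", "gif", "avif"}


def build_input(prompt: str, image_url: str) -> dict:
    """Build a request payload (hypothetical helper; key names are assumptions)."""
    if not prompt.strip():
        raise ValueError("prompt is required")
    # Inspect only the URL path so a query string does not hide the extension.
    suffix = PurePosixPath(urlparse(image_url).path).suffix.lower().lstrip(".")
    if suffix not in ACCEPTED_EXTENSIONS:
        raise ValueError(f"unsupported image type: {suffix or 'none'}")
    return {"prompt": prompt, "image_url": image_url}
```

A URL without a file extension may still point to a valid image, so a check like this is a convenience rather than a guarantee of what the service accepts.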

Additional Settings

Customize your input with more control.

Result

{
  "video": {
    "url": "https://v3b.fal.media/files/b/0a9c9ce3/_we4xHLsW24J0vFf1Pg17_l6PSMqTD.mp4",
    "content_type": "video/mp4",
    "file_name": "_we4xHLsW24J0vFf1Pg17_l6PSMqTD.mp4",
    "file_size": 8798798,
    "width": 832,
    "height": 480,
    "fps": 24,
    "duration": 7.875,
    "num_frames": 189
  },
  "seed": 611163708
}
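
The fields in this result are internally consistent: 189 frames at 24 fps gives the reported duration of 7.875 seconds. A minimal Python sketch of reading the response, assuming it arrives as the JSON shown above (`summarize` is a hypothetical helper, not part of any client library):

```python
import json

# Example response copied from the result above.
RESULT_JSON = """
{
  "video": {
    "url": "https://v3b.fal.media/files/b/0a9c9ce3/_we4xHLsW24J0vFf1Pg17_l6PSMqTD.mp4",
    "content_type": "video/mp4",
    "file_name": "_we4xHLsW24J0vFf1Pg17_l6PSMqTD.mp4",
    "file_size": 8798798,
    "width": 832,
    "height": 480,
    "fps": 24,
    "duration": 7.875,
    "num_frames": 189
  },
  "seed": 611163708
}
"""


def summarize(result: dict) -> dict:
    """Collect the fields a caller usually needs from the response."""
    video = result["video"]
    return {
        "url": video["url"],
        "resolution": f'{video["width"]}x{video["height"]}',
        # Duration recomputed from frame count and frame rate.
        "duration_from_frames": video["num_frames"] / video["fps"],
        "reported_duration": video["duration"],
        "seed": result["seed"],
    }


summary = summarize(json.loads(RESULT_JSON))
print(summary["resolution"], summary["duration_from_frames"])
```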

Your request will cost $0.05 per second of generated video, rounded up. Agentic generation is billed for each candidate video generated.
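
As a rough illustration of the pricing rule, the sketch below estimates the cost of a request. It assumes that "rounded up" means the duration is rounded up to the next whole second before billing, and that each candidate video in agentic generation is billed at the same rate; both are interpretations of the sentence above, and `estimate_cost_usd` and its `candidates` parameter are hypothetical.

```python
import math

PRICE_CENTS_PER_SECOND = 5  # $0.05 per second of generated video


def estimate_cost_usd(duration_seconds: float, candidates: int = 1) -> float:
    """Estimate cost in USD, assuming the duration is rounded up to whole seconds."""
    billed_seconds = math.ceil(duration_seconds)
    # Work in integer cents to avoid floating-point drift.
    return billed_seconds * PRICE_CENTS_PER_SECOND * candidates / 100


# The 7.875-second example video would be billed as 8 seconds.
print(estimate_cost_usd(7.875))
```

Under these assumptions, the example video costs $0.40, and an agentic run that produces three such candidates would cost $1.20.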
