nvidia/cosmos-3-super/image-to-video
Cosmos3 is a collection of omnimodal world models that generate dynamic, high-quality video, images, audio, and action commands from combinations of text, image, video, and action-trajectory inputs.
Inference
Commercial use
Input
Provide the input image by uploading a file, pasting from the clipboard, or supplying a URL. Accepted file types: jpg, jpeg, png, webp, gif, avif.

Your request will cost $0.05 per second of generated video, rounded up. Agentic generation is billed for each candidate video generated.
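The pricing rule can be sketched as a small estimator. This is an illustration only, not an official billing formula: it assumes "rounded up" means the video duration is rounded up to the next whole second before the per-second rate is applied, and that an agentic run multiplies the cost by the number of candidate videos. The names `estimate_cost`, `duration_seconds`, and `candidates` are hypothetical and do not come from the service.

```python
import math
from decimal import Decimal

# Rate stated on the page: $0.05 per second of generated video.
PRICE_PER_SECOND = Decimal("0.05")


def estimate_cost(duration_seconds: float, candidates: int = 1) -> Decimal:
    """Estimate the charge in USD for one request.

    Assumptions (not confirmed by the page):
    - the duration is rounded up to the next whole second;
    - agentic generation bills every candidate video at the same rate.
    """
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    if candidates < 1:
        raise ValueError("candidates must be at least 1")

    billed_seconds = math.ceil(duration_seconds)
    return PRICE_PER_SECOND * billed_seconds * candidates


if __name__ == "__main__":
    # A single 4.2-second video is billed as 5 seconds.
    print(estimate_cost(4.2))       # 0.25
    # An agentic run that produces 3 candidate videos of 5 seconds each.
    print(estimate_cost(5, 3))      # 0.75
```

`Decimal` is used instead of floating point so that the estimate does not pick up binary rounding errors on currency amounts.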