nvidia/cosmos-3-super/text-to-image
Cosmos3 is a collection of omnimodal world models that generate dynamic, high-quality video, images, audio, and action commands from combinations of text, image, video, and action-trajectory inputs.
Inference
Commercial use
Your request will cost $0.04 per generated image. Prompt expansion adds $0.02 per request when enabled. Agentic generation bills for every candidate image generated during selection, not just the final returned image.
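The pricing rules above can be turned into a rough cost estimate. The following is a minimal sketch, not an official billing calculator: the function and parameter names are hypothetical, and it assumes that agentic candidate images are billed at the same $0.04 rate as any other generated image and that the candidate count includes the image finally returned. Amounts are kept in integer cents to avoid floating-point rounding.

```python
# Rates taken from the pricing note; everything else here is illustrative.
PRICE_PER_IMAGE_CENTS = 4    # $0.04 per generated image
PROMPT_EXPANSION_CENTS = 2   # $0.02 per request when prompt expansion is enabled


def estimate_cost_cents(images_returned, prompt_expansion=False, agentic_candidates=None):
    """Estimate the cost of one request in cents.

    agentic_candidates is a hypothetical parameter: the total number of
    images generated during agentic selection, including the returned ones.
    Assumption: each candidate is billed at the normal per-image rate.
    """
    if images_returned < 1:
        raise ValueError("images_returned must be at least 1")

    billed_images = images_returned
    if agentic_candidates is not None:
        if agentic_candidates < images_returned:
            raise ValueError("agentic_candidates cannot be less than images_returned")
        # Agentic generation bills every candidate, not just the final image.
        billed_images = agentic_candidates

    total = billed_images * PRICE_PER_IMAGE_CENTS
    if prompt_expansion:
        # Prompt expansion is charged once per request, not per image.
        total += PROMPT_EXPANSION_CENTS
    return total


def format_usd(cents):
    """Render integer cents as a dollar string, e.g. 18 -> '$0.18'."""
    return f"${cents // 100}.{cents % 100:02d}"


if __name__ == "__main__":
    print(format_usd(estimate_cost_cents(1)))                          # $0.04
    print(format_usd(estimate_cost_cents(2, prompt_expansion=True)))   # $0.10
    print(format_usd(estimate_cost_cents(1, prompt_expansion=True,
                                         agentic_candidates=4)))       # $0.18
```

The last call shows why agentic generation can cost noticeably more than the number of returned images suggests: one returned image selected from four candidates is billed as four images plus the prompt expansion fee.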
