Question 1

What is Cosmos 3 Super?

Accepted Answer

Cosmos 3 is a family of omnimodal world models that generate dynamic, high-quality video, images, audio, and action commands from combinations of text, image, video, and action-trajectory inputs. Cosmos 3 Super is NVIDIA's 64B model in the family, and its Text-to-Image and Image-to-Video fine-tuned variants rank #1 among open-weights models on the Artificial Analysis leaderboards.

Question 2

What can I do with Cosmos 3 Super on fal?

Accepted Answer

Two endpoints are available on fal: text-to-image, which generates images from a prompt, and image-to-video, which animates a reference image into video with coherent, physically grounded motion. Both run through fal's serverless API with no GPUs to manage.

Question 3

How does Cosmos 3 Super rank on the Artificial Analysis leaderboards?

Accepted Answer

Cosmos 3 Super's fine-tuned variants rank #1 among open-weights models on two Artificial Analysis leaderboards: Text-to-Image (https://artificialanalysis.ai/image/leaderboard/text-to-image?audio-output=false&open-weights=true) and Image-to-Video, No Audio (https://artificialanalysis.ai/video/leaderboard/image-to-video?audio-output=false&open-weights=true).

Question 4

How does the Cosmos 3 architecture work?

Accepted Answer

Cosmos 3 uses a single Mixture-of-Transformers that pairs an autoregressive reasoner with a diffusion generator. The reasoner interprets and plans, while the generator produces pixels. This unified design lets one model span language, image, video, audio, and action for Physical AI.

Question 5

What are the Cosmos 3 variants?

Accepted Answer

The Cosmos 3 family comes in two base sizes: Nano (16B, an 8B reasoner tower plus an 8B generator tower) and Super (64B, a 32B reasoner tower plus a 32B generator tower). The Super model also has Text-to-Image and Image-to-Video fine-tuned variants, which are the versions ranked on the Artificial Analysis Arena leaderboards.

Question 6

How much does Cosmos 3 Super cost on fal?

Accepted Answer

Pricing is pay-per-use with no minimums or subscriptions. Text-to-image costs $0.04 per generated image, with optional prompt expansion adding $0.02 per request. Image-to-video is billed at $0.05 per second of generated video, rounded up.
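As a rough illustration of the arithmetic, the sketch below estimates request costs from the prices quoted above. The function names are made up for this example, and it assumes that "rounded up" means the video duration is rounded up to the next whole second and that prompt expansion is charged once per request; check the pricing page if the exact rounding matters to you.

```python
import math

# Prices from the answer above, held as integer cents to avoid float drift.
IMAGE_CENTS = 4              # $0.04 per generated image
PROMPT_EXPANSION_CENTS = 2   # $0.02 per request when prompt expansion is on
VIDEO_CENTS_PER_SECOND = 5   # $0.05 per second of generated video


def text_to_image_cost_cents(num_images: int, prompt_expansion: bool = False) -> int:
    """Estimated cost of one text-to-image request, in cents."""
    if num_images < 1:
        raise ValueError("num_images must be at least 1")
    expansion = PROMPT_EXPANSION_CENTS if prompt_expansion else 0
    return num_images * IMAGE_CENTS + expansion


def image_to_video_cost_cents(duration_seconds: float) -> int:
    """Estimated cost of one image-to-video request, in cents.

    Assumption: the duration is rounded up to the next whole second
    before the per-second price is applied.
    """
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return math.ceil(duration_seconds) * VIDEO_CENTS_PER_SECOND
```

Under these assumptions, four images with prompt expansion come to 4 × $0.04 + $0.02 = $0.18, and a 4.2-second clip is billed as 5 seconds, or $0.25.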

Question 7

Why does Cosmos 3 use structured JSON prompts?

Accepted Answer

Cosmos 3 generators take structured JSON prompts rather than plain text, and prompting in that format is what reproduces the leaderboard-topping results. Prompt upsampling converts a simple prompt into the structured form. Upsampling can be handled by an external agentic harness or by the model's own reasoner branch, so Cosmos 3 can also run self-contained.
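To give a feel for what "structured" means here, the sketch below serializes a few scene fields into a JSON prompt string. It is only an illustration: the keys `subject`, `setting`, and `motion` and the function name are hypothetical placeholders, not the documented Cosmos 3 prompt schema, which you should take from the model's documentation.

```python
import json


def to_structured_prompt(subject: str, setting: str, motion: str = "") -> str:
    """Serialize scene fields into a JSON prompt string.

    The keys used here are hypothetical placeholders for illustration,
    not the real Cosmos 3 schema.
    """
    fields = {"subject": subject, "setting": setting}
    if motion:
        # Only include motion when the caller describes it.
        fields["motion"] = motion
    return json.dumps(fields, ensure_ascii=False)
```

In practice the upsampling step, not hand-written code like this, would produce the structured prompt from a short plain-text description.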

Question 8

How do I get started with the API?

Accepted Answer

Install the fal SDK (Python or JavaScript), grab an API key from your dashboard at https://fal.ai/dashboard/keys, and make your first request in a few lines of code. The API is serverless, so there are no GPUs to manage and no infrastructure to set up. Check the API documentation for all available parameters.
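A first request with the Python SDK might look roughly like the sketch below. It assumes the client is installed with `pip install fal-client` and that your key is exported as the `FAL_KEY` environment variable. The endpoint ID is a placeholder to replace with the one shown on the model's API page, and `prompt` is the only argument assumed here; the helper names are invented for this example.

```python
import os

# Placeholder: copy the real endpoint ID from the model's API page on fal.
TEXT_TO_IMAGE_ENDPOINT = "<cosmos-3-super-text-to-image-endpoint-id>"


def build_arguments(prompt: str) -> dict:
    """Build the request arguments; only `prompt` is assumed in this sketch."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    return {"prompt": prompt}


def generate_image(prompt: str, endpoint: str = TEXT_TO_IMAGE_ENDPOINT) -> dict:
    """Submit one request and block until the result is ready."""
    if "FAL_KEY" not in os.environ:
        raise RuntimeError("Set FAL_KEY to the API key from your fal dashboard")
    # Imported lazily so the helper above works without the SDK installed.
    import fal_client

    return fal_client.subscribe(endpoint, arguments=build_arguments(prompt))
```

Calling `generate_image("a glass sphere rolling down a wooden ramp")` returns the response payload as a dictionary; its exact fields depend on the endpoint, so inspect it or consult the API documentation before relying on a particular key.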

Question 9

Can I use Cosmos 3 Super for commercial projects?

Accepted Answer

Yes. Content generated through the fal API can be used in commercial projects. Check fal's terms of service at https://fal.ai/legal/terms-of-service for full details on usage rights and licensing.

Cosmos 3 Super: Omnimodal Generation Across Image, Video, and Action

What Makes NVIDIA Cosmos 3 Super Different

Top Open-Weights Quality, Text to Image

Top Open-Weights Quality, Image to Video

One Model Across Image, Video, Audio, and Action

Text-to-image and image-to-video in one API

See what Cosmos 3 Super can create

Text-to-image: physically grounded scene

Image-to-video: natural motion and physics

Text-to-image: intricate structure and detail

Image-to-video: parts assembling in mid-air

How to access the Cosmos 3 Super API

Common questions about Cosmos 3 Super
