NVIDIA Nemotron™ 3 Nano Omni
Handles everything the agent needs to see and hear.
Nemotron 3 Nano Omni is an open, efficient multimodal foundation model that sees, hears, and reads across text, images, video, and audio. Built to power sub-agents in enterprise agent systems, it offers up to 256K context and up to 9× higher throughput than stitched-together perception pipelines.
Start building with the Nemotron 3 Nano Omni API
One model, four endpoints. Reason over text, images, audio, or video with the same unified multimodal architecture.

Open, efficient reasoning model from NVIDIA. 30B A3B hybrid Transformer-Mamba MoE, built for enterprise agentic workflows - accepts a text prompt and returns text.

Video reasoning variant of NVIDIA's Nemotron 3 Nano Omni. 30B A3B hybrid Transformer-Mamba MoE - accepts video plus a prompt and returns text.

Vision reasoning variant of NVIDIA's Nemotron 3 Nano Omni. 30B A3B hybrid Transformer-Mamba MoE - accepts an image plus a prompt and returns text.

Audio reasoning variant of NVIDIA's Nemotron 3 Nano Omni. 30B A3B hybrid Transformer-Mamba MoE - accepts audio plus a prompt and returns text.
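
The call shape is identical across the four endpoints. Here is a sketch of the mapping; only the base endpoint ID appears in this page's example, so the modality-specific endpoint IDs and media field names below are assumptions for illustration (check each endpoint's playground page for exact values):

import { fal } from "@fal-ai/client";

// Endpoint IDs other than the base one, and the *_url field names,
// are hypothetical placeholders; verify them in the fal playground.
const calls = {
  text:  ["nvidia/nemotron-3-nano-omni",       { prompt: "..." }],
  image: ["nvidia/nemotron-3-nano-omni/image", { prompt: "...", image_url: "https://..." }],
  audio: ["nvidia/nemotron-3-nano-omni/audio", { prompt: "...", audio_url: "https://..." }],
  video: ["nvidia/nemotron-3-nano-omni/video", { prompt: "...", video_url: "https://..." }],
};

// Same client, same call shape, any modality.
const [endpointId, input] = calls.video;
const { data } = await fal.subscribe(endpointId, { input });
console.log(data);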
How to get access to the Nemotron 3 Nano Omni API
The client API handles the async submit protocol, streams status updates, and returns the final response when the request is complete. Pick a modality below to see a working example.
import { fal } from "@fal-ai/client";

const result = await fal.subscribe("nvidia/nemotron-3-nano-omni", {
  input: {
    prompt: "Summarize the key capabilities of a multimodal agent.",
  },
  logs: true,
  // Stream queue status and log messages while the request runs.
  onQueueUpdate: (update) => {
    if (update.status === "IN_PROGRESS") {
      update.logs.map((log) => log.message).forEach(console.log);
    }
  },
});

console.log(result.data);
console.log(result.requestId);
One Model, Every Modality
Sustained reasoning across long inputs
Built for long-running agent workflows where context must persist across video timelines, multi-document inputs, and ongoing conversations. Up to 256K tokens support continuous reasoning without brittle chunking strategies.
Agent-grade efficiency on every modality
A unified multimodal MoE architecture collapses separate vision and speech stacks into one model, with Efficient Video Sampling (EVS) letting agents process longer videos in the same pass. Up to 9× higher throughput for video reasoning versus stitched pipelines.
Run anywhere on NVIDIA's open ecosystem
Open weights for deployment with full data control, open-source post-training, open synthetic datasets, and open recipes for customization. Available on Hugging Face, supported across leading inference platforms, and packaged as NVIDIA NIM.
What developers can build with Nemotron 3 Nano Omni
Omni unlocks a new class of multimodal agents. A few of the patterns teams are already building on fal.
Screen-Aware Agents
Omni powers the perception loop for agents navigating GUIs: reading screens, understanding UI state over time, and validating outcomes while execution agents handle the actions. This collapses vision and reasoning into a single loop instead of splitting perception across separate pipelines.
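
A minimal sketch of that perception step, assuming an image-accepting variant endpoint and an image_url input field (both hypothetical names; check the endpoint's playground page for the exact ones):

import { fal } from "@fal-ai/client";

// Endpoint ID and image_url field name are assumed for illustration.
const { data } = await fal.subscribe("nvidia/nemotron-3-nano-omni/image", {
  input: {
    image_url: "https://example.com/checkout-screenshot.png",
    prompt:
      "Describe the current UI state. Did the 'Place order' click succeed? " +
      "Answer YES or NO, then one sentence of evidence.",
  },
});
console.log(data); // a text verdict the execution agent can branch on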
Reason Across PDFs, Charts, and Tables
Process PDFs, slide decks, financial tables, and screenshots in a single pass. Pull structured answers out of mixed-layout documents without pre-parsing or per-format pipelines.
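
Structured extraction is mostly prompting. A sketch that asks for JSON and parses it defensively (the endpoint ID and image_url field are assumptions, and JSON adherence depends on the model following the prompt):

import { fal } from "@fal-ai/client";

const { data } = await fal.subscribe("nvidia/nemotron-3-nano-omni/image", {
  input: {
    image_url: "https://example.com/q3-revenue-table.png",
    prompt:
      'Extract each row as JSON: [{"line_item": string, "q3_usd": number}]. ' +
      "Reply with the JSON array only.",
  },
});

// Assumes result.data is the model's raw text reply; parse defensively
// since format adherence is prompt-dependent.
let rows = null;
try {
  rows = JSON.parse(data);
} catch {
  console.warn("Model reply was not valid JSON:", data);
}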
Analyze Calls, Clips, and Live Feeds
For customer service, research, and monitoring workflows, Omni maintains continuous audio-video context, tying what was said, shown, and documented into a single reasoning stream instead of disconnected summaries.
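
The same pattern applies to audio. A sketch assuming an audio variant endpoint and an audio_url field (hypothetical names):

import { fal } from "@fal-ai/client";

// Endpoint ID and audio_url field name are assumed for illustration.
const { data } = await fal.subscribe("nvidia/nemotron-3-nano-omni/audio", {
  input: {
    audio_url: "https://example.com/support-call.mp3",
    prompt:
      "List the customer's complaints, the agent's commitments, and the " +
      "overall sentiment, in that order.",
  },
});
console.log(data);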
Temporal Context Without the Overhead
Summarize, segment, and reason over long video inputs—without the complexity of multi-stage pipelines. Ideal for monitoring, archiving, and in-product video search.
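
One call replaces the sample-transcribe-summarize pipeline. A sketch assuming a video variant endpoint and a video_url field (hypothetical names; EVS sampling is part of the model, so the client just sends the clip):

import { fal } from "@fal-ai/client";

// Endpoint ID and video_url field name are assumed for illustration.
const { data } = await fal.subscribe("nvidia/nemotron-3-nano-omni/video", {
  input: {
    video_url: "https://example.com/incident-recording.mp4",
    prompt:
      "Split this recording into titled chapters with start timestamps, " +
      "then give a one-sentence summary of each chapter.",
  },
});
console.log(data);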
Unified Retrieval Across Modalities
Build retrieval systems that answer questions over mixed corpora: transcripts alongside slides, user guides alongside screen captures, without forcing each input type through a separate model.
Replace Fragmented Model Stacks
Consolidate OCR, ASR, vision, and reasoning services into a single endpoint. Fewer hops, fewer failure modes, lower total cost of ownership for production agent systems.
Nemotron 3 Nano Omni API Integration Steps
Get up and running in minutes. No GPUs to manage, no infrastructure to set up.
1. Install the client
Pick your package manager. For Python, use pip.
npm install --save @fal-ai/client
pip install fal-client
2. Create an account on fal
Sign up to get access to the dashboard and your API keys.
3. Get your API key
Locate your API credentials in the developer dashboard. Set FAL_KEY as an environment variable in your runtime (a credentials sketch follows these steps).
4. Submit a request
Use fal.subscribe() to send a prompt (and an optional image, audio, or video URL) to the matching endpoint. The client handles the async queue, streams progress via onQueueUpdate, and returns the model's text response when inference is complete.
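
The credentials sketch referenced in step 3: the client reads FAL_KEY from the environment by default, and also accepts a key programmatically for runtimes without environment variables (prefer the environment variable so keys stay out of source):

import { fal } from "@fal-ai/client";

// Optional: set credentials explicitly instead of relying on the
// FAL_KEY environment variable being picked up automatically.
fal.config({ credentials: process.env.FAL_KEY });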
No setup required
Open any of the four Nemotron 3 Nano Omni endpoints in the playground and run a prompt against text, image, audio, or video inputs without writing a line of code.
Open Playground →
Integrate via API
Grab an API key from your dashboard and wire Nemotron 3 Nano Omni into your agent in a few lines of code. Python and JavaScript SDKs available, plus a REST API for any language.
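
For any other language, the same endpoint is reachable over plain HTTP through fal's queue interface. A sketch of the submit step (the response fields shown are assumptions; confirm the exact shape in fal's REST docs):

// Submit a request to fal's HTTP queue; poll the returned URLs for the result.
const response = await fetch("https://queue.fal.run/nvidia/nemotron-3-nano-omni", {
  method: "POST",
  headers: {
    Authorization: `Key ${process.env.FAL_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    prompt: "Summarize the key capabilities of a multimodal agent.",
  }),
});

// Assumed response fields: a request ID plus URLs to poll for
// status and the final output.
const { request_id, status_url, response_url } = await response.json();
console.log(request_id, status_url, response_url);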
Get API Key →
Common questions about Nemotron 3 Nano Omni
What is Nemotron 3 Nano Omni?
Nemotron 3 Nano Omni is an open, efficient multimodal foundation model from NVIDIA, built to power sub-agents that understand and reason across audio, video, images, documents, and text in enterprise agent systems. Combining vision and audio encoders into a unified architecture eliminates the need for separate perception models, simplifying agent development and cutting orchestration overhead. A hybrid Transformer-Mamba MoE design (30B A3B) drives inference efficiency for always-on agents.
What inputs does Nemotron 3 Nano Omni support?
Text, images, video, and audio. Output is text. The model combines vision and audio encoders into a unified architecture, so mixed inputs like screenshots + transcripts + video frames can be reasoned about in the same request.
How long is the context window?
Nemotron 3 Nano Omni supports up to 256K tokens of context. That's enough to sustain long-running agent loops, reason across video timelines, and hold multi-document context without chunking.
What kind of efficiency gains does Nemotron 3 Nano Omni offer?
Collapsing multiple specialized models into a single multimodal system delivers up to 9× higher throughput for video reasoning, reduces orchestration overhead, and removes the need for stitched perception pipelines.
Is Nemotron 3 Nano Omni available on fal?
Yes. Nemotron 3 Nano Omni is available at launch on fal via the playground and API. Contact sales for enterprise access and volume pricing.
What can I build with Nemotron 3 Nano Omni?
Computer-use agents that read UIs, document-intelligence systems over PDFs and charts, audio + video agents for support and research, multimodal retrieval, and agent infrastructure that consolidates OCR, ASR, and vision into a single endpoint.
How widely adopted is the Nemotron 3 family?
The Nemotron 3 family of open models has seen nearly 47 million downloads in the last 12 months, and 6 of the top 12 trending text models on Hugging Face come from the family. Developers choose Nemotron because it behaves predictably, runs efficiently, and integrates cleanly into real systems.
What are the model specifications?
Model card name: Nemotron-3-Nano-Omni-30B-A3B-Reasoning. Size: 30B A3B. Architecture: Mixture of Experts with a hybrid Transformer-Mamba backbone, 3D convolution (Conv3D) layers for temporal-spatial video data, and Efficient Video Sampling (EVS) for long videos. Built on NVIDIA technology including CRADIO, Parakeet, and Nemotron 3 Nano. Context length: 256K. Quantization: FP8 and NVFP4. Supported GPUs: B200, H100, H200, A100, L40S, DGX Spark, and RTX 6000.
How does Nemotron 3 Nano Omni compare to other Nemotron 3 variants?
Nemotron 3 Nano Omni (30B A3B), the model on this page, is a highly efficient multimodal model delivering advanced reasoning and understanding with industry-leading accuracy. Nemotron 3 Nano (30B A3B) is the most cost-efficient model, focusing on targeted tasks to deliver high accuracy at low inference cost. Nemotron 3 Super (120B A12B) is optimized for running many collaborating agents per application on a single GPU, delivering high accuracy for reasoning, tool calling, and instruction following for complex tasks. Nemotron 3 Ultra (~500B A50B) is the best reasoning engine for mission-critical applications that demand maximum capability over multi-step workflows, with consistent behavior across conversations and results.
Can I use Nemotron 3 Nano Omni for commercial projects?
Yes. Outputs from Nemotron 3 Nano Omni on fal can be used in commercial projects. Check fal's terms of service for full details on usage rights and licensing.
How do I get started with the API?
Install the fal SDK (Python or JavaScript), grab an API key from your dashboard, and make your first request in a few lines of code. Serverless, no GPUs to manage.
Start building with Nemotron 3 Nano Omni on fal
Nemotron 3 Nano Omni is live on fal today. Jump into the playground or contact sales for enterprise access and volume pricing.
Ready to transform your enterprise with AI?
Take the first step towards AI-driven innovation. Our team of ML engineers is ready to help you prototype, develop, and scale your AI solutions.

