Available at launch on fal

NVIDIA Nemotron 3 Nano Omni

Handles everything the agent needs to see and hear.

Nemotron 3 Nano Omni is an open, efficient multimodal foundation model that sees, hears, and reads across text, images, video, and audio. It is built to serve as a sub-agent in enterprise agent systems, with up to 256K context and up to 9× higher throughput than stitched-together perception pipelines.



API Documentation

How to get access to the Nemotron 3 Nano Omni API

The client API handles the async submit protocol, streams status updates, and returns the final response when the request is complete. Pick a modality below to see a working example.

import { fal } from "@fal-ai/client";

const result = await fal.subscribe("nvidia/nemotron-3-nano-omni", {
  input: {
    prompt: "Summarize the key capabilities of a multimodal agent.",
  },
  logs: true,
  onQueueUpdate: (update) => {
    if (update.status === "IN_PROGRESS") {
      update.logs.map((log) => log.message).forEach(console.log);
    }
  },
});

console.log(result.data);
console.log(result.requestId);
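The same subscribe call carries media inputs for the other modalities. Here is a sketch of how those payloads might be shaped; the field names `image_url`, `audio_url`, and `video_url` are assumptions for illustration, so confirm them against each endpoint's schema in the fal playground before relying on them.

```typescript
// Sketch of payload shapes for the non-text modalities. The media field
// names (image_url, audio_url, video_url) are assumptions -- check each
// endpoint's schema in the fal playground.
type OmniInput = {
  prompt: string;
  image_url?: string;
  audio_url?: string;
  video_url?: string;
};

function buildOmniInput(
  prompt: string,
  media?: { kind: "image" | "audio" | "video"; url: string }
): OmniInput {
  const input: OmniInput = { prompt };
  if (media?.kind === "image") input.image_url = media.url;
  if (media?.kind === "audio") input.audio_url = media.url;
  if (media?.kind === "video") input.video_url = media.url;
  return input;
}

// A video-understanding payload, ready to pass as `input` to fal.subscribe:
const videoInput = buildOmniInput(
  "List the UI states shown in this screen recording.",
  { kind: "video", url: "https://example.com/session.mp4" }
);
```

Because the model accepts mixed inputs in a single request, the same pattern extends to combining a prompt with whichever media URL the workflow produces.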

One Model, Every Modality

256K
Token Context

Sustained reasoning across long inputs

Built for long-running agent workflows where context must persist across video timelines, multi-document inputs, and ongoing conversations. Up to 256K tokens support continuous reasoning without brittle chunking strategies.
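As a rough illustration of what a 256K-token budget buys, here is a back-of-envelope check; the per-frame and per-word token rates are placeholder assumptions for the sketch, not published figures for the model.

```typescript
// Back-of-envelope context budgeting for a long-running agent loop.
// tokensPerFrame and tokensPerWord are illustrative placeholders, not
// published rates for Nemotron 3 Nano Omni.
const CONTEXT_LIMIT = 256_000;

function fitsInContext(
  frames: number,
  words: number,
  tokensPerFrame = 256,
  tokensPerWord = 1.3
): boolean {
  const used = frames * tokensPerFrame + Math.ceil(words * tokensPerWord);
  return used <= CONTEXT_LIMIT;
}

// A 10-minute clip sampled at 1 frame/s (600 frames) plus a 20,000-word
// document stack: 600 * 256 + 26,000 = 179,600 tokens, within budget.
const ok = fitsInContext(600, 20_000);
```

Under these assumed rates, a long video and a multi-document stack fit in one request, which is the point of avoiding chunking strategies.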

Up to 9×
Higher Throughput

Agent-grade efficiency on every modality

A unified multimodal MoE architecture collapses separate vision and speech stacks into one model, with Efficient Video Sampling (EVS) letting agents process longer videos in the same pass. Up to 9× higher throughput for video reasoning versus stitched pipelines.

Open
Weights & Recipes

Run anywhere on NVIDIA's open ecosystem

Open weights for deployment with full data control, open-source post-training, open synthetic datasets, and open recipes for customization. Available on Hugging Face, supported across leading inference platforms, and packaged as NVIDIA NIM.


Use Cases

What developers can build with Nemotron 3 Nano Omni

Omni unlocks a new class of multimodal agents. Here are a few of the patterns teams are already building on fal.

Computer Use

Screen-Aware Agents

Omni powers the perception loop for agents navigating GUIs: reading screens, understanding UI state over time, and validating outcomes while execution agents handle the actions. This collapses vision and reasoning into a single loop instead of splitting perception across separate pipelines.

Document Intelligence

Reason Across PDFs, Charts, and Tables

Process PDFs, slide decks, financial tables, and screenshots in a single pass. Pull structured answers out of mixed-layout documents without pre-parsing or per-format pipelines.

Audio + Video Agents

Analyze Calls, Clips, and Live Feeds

For customer service, research, and monitoring workflows, Omni maintains continuous audio-video context, tying what was said, shown, and documented into a single reasoning stream instead of disconnected summaries.

Video Understanding

Temporal Context Without the Overhead

Summarize, segment, and reason over long video inputs—without the complexity of multi-stage pipelines. Ideal for monitoring, archiving, and in-product video search.

Multimodal RAG

Unified Retrieval Across Modalities

Build retrieval systems that answer questions over mixed corpora: transcripts alongside slides, user guides alongside screen captures, without forcing each input type through a separate model.

Agent Infrastructure

Replace Fragmented Model Stacks

Consolidate OCR, ASR, vision, and reasoning services into a single endpoint. Fewer hops, fewer failure modes, lower total cost of ownership for production agent systems.


Getting Started

Nemotron 3 Nano Omni API Integration Steps

Get up and running in minutes. No GPUs to manage, no infrastructure to set up.

  1. Install the client

    Pick your package manager: npm for JavaScript, pip for Python.

    npm install --save @fal-ai/client
    pip install fal-client
  2. Create an account on fal

    Sign up to get access to the dashboard and your API keys.

  3. Get your API key

    Locate your API credentials in the developer dashboard. Set FAL_KEY as an environment variable in your runtime.

  4. Submit a request

    Use fal.subscribe() to send a prompt (and an optional image, audio, or video URL) to the matching endpoint. The client handles the async queue, streams progress via onQueueUpdate, and returns the model's text response when inference is complete.
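The steps above can be wired defensively: a small sketch that fails fast when the FAL_KEY environment variable from step 3 is missing. The helper name is ours for illustration, not part of the fal SDK.

```typescript
// Fail fast if the FAL_KEY from step 3 is missing, before any request is
// submitted. requireFalKey is a local helper, not part of the fal SDK.
function requireFalKey(
  env: Record<string, string | undefined> = process.env
): string {
  const key = env.FAL_KEY;
  if (!key) {
    throw new Error("FAL_KEY is not set. Create a key in the fal dashboard.");
  }
  return key;
}
```

Checking credentials before the first `fal.subscribe()` call turns a confusing authentication failure mid-workflow into an immediate, actionable error at startup.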

Try it now

No setup required

Open any of the four Nemotron 3 Nano Omni endpoints in the playground and run a prompt against text, image, audio, or video inputs without writing a line of code.

Open Playground →

For developers

Integrate via API

Grab an API key from your dashboard and wire Nemotron 3 Nano Omni into your agent in a few lines of code. Python and JavaScript SDKs available, plus a REST API for any language.

Get API Key →

FAQ

Common questions about Nemotron 3 Nano Omni

What is Nemotron 3 Nano Omni?

Nemotron 3 Nano Omni is an open, efficient multimodal foundation model from NVIDIA, built to power sub-agents that understand and reason across audio, video, images, documents, and text in enterprise agent systems. Combining vision and audio encoders into a unified architecture eliminates the need for separate perception models, simplifying agent development and cutting orchestration overhead. A hybrid Transformer-Mamba MoE design (30B A3B) drives inference efficiency for always-on agents.

What inputs does Nemotron 3 Nano Omni support?

Text, images, video, and audio. Output is text. The model combines vision and audio encoders into a unified architecture, so mixed inputs like screenshots + transcripts + video frames can be reasoned about in the same request.

How long is the context window?

Nemotron 3 Nano Omni supports up to 256K tokens of context. That's enough to sustain long-running agent loops, reason across video timelines, and hold multi-document context without chunking.

What kind of efficiency gains does Nemotron 3 Nano Omni offer?

Collapsing multiple specialized models into a single multimodal system delivers up to 9× higher throughput, reduces orchestration overhead, and simplifies video reasoning—eliminating stitched pipelines.

Is Nemotron 3 Nano Omni available on fal?

Yes. Nemotron 3 Nano Omni is available at launch on fal via the playground and API. Contact sales for enterprise access and volume pricing.

What can I build with Nemotron 3 Nano Omni?

Computer-use agents that read UIs, document-intelligence systems over PDFs and charts, audio + video agents for support and research, multimodal retrieval, and agent infrastructure that consolidates OCR, ASR, and vision into a single endpoint.

How widely adopted is the Nemotron 3 family?

The Nemotron 3 family of open models has seen nearly 47 million downloads in the last 12 months, and 6 of the top 12 trending text models on Hugging Face come from the family. Developers choose Nemotron because it behaves predictably, runs efficiently, and integrates cleanly into real systems.

What are the model specifications?

Model card name: Nemotron-3-Nano-Omni-30B-A3B-Reasoning. Size: 30B A3B. Architecture: Mixture of Experts with a hybrid Transformer-Mamba backbone, 3D convolution (Conv3D) layers for temporal-spatial video data, and Efficient Video Sampling (EVS) for long videos. Built on NVIDIA technology including CRADIO, Parakeet, and Nemotron 3 Nano. Context length: 256K. Quantization: FP8 and NVFP4. Supported GPUs: B200, H100, H200, A100, L40S, DGX Spark, and RTX 6000.

How does Nemotron 3 Nano Omni compare to other Nemotron 3 variants?

Nemotron 3 Nano Omni (30B A3B), the model on this page, is a highly efficient multimodal model delivering advanced reasoning and understanding with industry-leading accuracy. Nemotron 3 Nano (30B A3B) is the most cost-efficient model, focusing on targeted tasks to deliver high accuracy at low inference cost. Nemotron 3 Super (120B A12B) is optimized for running many collaborating agents per application on a single GPU, delivering high accuracy for reasoning, tool calling, and instruction following on complex tasks. Nemotron 3 Ultra (~500B A50B) is the strongest reasoning engine in the family, built for mission-critical applications that demand maximum capability over multi-step workflows with consistent results across conversations.

Can I use Nemotron 3 Nano Omni for commercial projects?

Yes. Outputs from Nemotron 3 Nano Omni on fal can be used in commercial projects. Check fal's terms of service for full details on usage rights and licensing.

How do I get started with the API?

Install the fal SDK (Python or JavaScript), grab an API key from your dashboard, and make your first request in a few lines of code. Serverless, no GPUs to manage.

Start building with Nemotron 3 Nano Omni on fal

Nemotron 3 Nano Omni is live on fal today. Jump into the playground or contact sales for enterprise access and volume pricing.

Ready to transform your enterprise with AI?

Take the first step towards AI-driven innovation. Our team of ML engineers is ready to help you prototype, develop, and scale your AI solutions.

Enterprise Contact Form