# Nemotron 3 Nano Omni

> Video-reasoning variant of NVIDIA's Nemotron 3 Nano Omni: a 30B-parameter hybrid Transformer-Mamba mixture-of-experts model with 3B active parameters (A3B). It accepts a video plus a text prompt and returns text.


## Overview

- **Endpoint**: `https://fal.run/nvidia/nemotron-3-nano-omni/video`
- **Model ID**: `nvidia/nemotron-3-nano-omni/video`
- **Category**: video-to-text
- **Kind**: inference
- **Tags**: nemotron, nvidia, video-to-text, video-understanding, video-reasoning, reasoning, agentic, agents, open-weights, hybrid-moe, mamba, 30b-a3b



## Pricing

- **Price**: $0.006 per 1000 tokens

For more details, see [fal.ai pricing](https://fal.ai/pricing).
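At $0.006 per 1000 tokens, the cost of a request can be estimated from the `usage` object returned with each response. A minimal sketch, assuming input and output tokens are billed at the same per-token rate:

```python
PRICE_PER_1K_TOKENS = 0.006  # USD, from the pricing above

def estimate_cost(usage: dict) -> float:
    """Estimate request cost in USD from the response's `usage` object.

    ASSUMPTION: input and output tokens are billed at the same rate.
    """
    total_tokens = usage["input_tokens"] + usage["output_tokens"]
    return total_tokens / 1000 * PRICE_PER_1K_TOKENS

# Using the example usage values from the Output Schema section:
print(estimate_cost({"input_tokens": 412, "output_tokens": 87}))  # → 0.002994
```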

## API Information

This model can be used via our HTTP API or, more conveniently, via our client libraries.
See the input and output schemas below, as well as the usage examples.


### Input Schema

The API accepts the following input parameters:


- **`prompt`** (`string`, _required_):
  Text prompt to send to the model. English only.
  - Examples: "Summarize the key capabilities of a multimodal agent."

- **`system_prompt`** (`string`, _optional_):
  Optional system prompt to steer the model. If omitted, the `reasoning_mode` control token is used as the system message.
  - Examples: "You are a concise enterprise assistant."

- **`reasoning_mode`** (`ReasoningModeEnum`, _optional_):
  Whether the model should emit an explicit reasoning trace. `no_think` returns a direct answer; `think` returns chain-of-thought followed by the final answer. Default value: `"no_think"`
  - Default: `"no_think"`
  - Options: `"think"`, `"no_think"`

- **`max_tokens`** (`integer`, _optional_):
  Maximum number of tokens to generate. Default value: `1024`
  - Default: `1024`
  - Range: `1` to `20000`

- **`temperature`** (`float`, _optional_):
  Sampling temperature. Lower is more deterministic. Default value: `0.7`
  - Default: `0.7`
  - Range: `0` to `2`

- **`top_p`** (`float`, _optional_):
  Nucleus sampling probability mass. Default value: `0.95`
  - Default: `0.95`
  - Range: `0` to `1`

- **`video_url`** (`string`, _required_):
  URL of the video to reason about. MP4 format, up to 1080p resolution, maximum length 2 minutes.
  - Examples: "https://storage.googleapis.com/falserverless/example_inputs/nemotron-3-nano-omni/video_in.mp4"



**Required Parameters Example**:

```json
{
  "prompt": "Summarize the key capabilities of a multimodal agent.",
  "video_url": "https://storage.googleapis.com/falserverless/example_inputs/nemotron-3-nano-omni/video_in.mp4"
}
```

**Full Example**:

```json
{
  "prompt": "Summarize the key capabilities of a multimodal agent.",
  "system_prompt": "You are a concise enterprise assistant.",
  "reasoning_mode": "no_think",
  "max_tokens": 1024,
  "temperature": 0.7,
  "top_p": 0.95,
  "video_url": "https://storage.googleapis.com/falserverless/example_inputs/nemotron-3-nano-omni/video_in.mp4"
}
```
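A small client-side check against the ranges above can catch invalid payloads before they hit the API. A sketch only; the endpoint performs its own validation:

```python
def validate_payload(payload: dict) -> list[str]:
    """Return a list of problems with a request payload, per the input schema above."""
    errors = []
    # Required fields.
    for field in ("prompt", "video_url"):
        if not payload.get(field):
            errors.append(f"missing required field: {field}")
    # Optional fields, checked against their documented defaults and ranges.
    if payload.get("reasoning_mode", "no_think") not in ("think", "no_think"):
        errors.append("reasoning_mode must be 'think' or 'no_think'")
    if not 1 <= payload.get("max_tokens", 1024) <= 20000:
        errors.append("max_tokens must be between 1 and 20000")
    if not 0 <= payload.get("temperature", 0.7) <= 2:
        errors.append("temperature must be between 0 and 2")
    if not 0 <= payload.get("top_p", 0.95) <= 1:
        errors.append("top_p must be between 0 and 1")
    return errors
```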


### Output Schema

The API returns the following output format:

- **`output`** (`string`, _required_):
  Generated text response.
  - Examples: "The video shows a golden retriever puppy sitting on a wooden floor."

- **`finish_reason`** (`string`, _optional_):
  Reason generation stopped. Default value: `"stop"`
  - Default: `"stop"`
  - Examples: "stop", "length"

- **`usage`** (`UsageInfo`, _required_):
  Token usage for the request.
  - Examples: {"output_tokens":87,"input_tokens":412}



**Example Response**:

```json
{
  "output": "The video shows a golden retriever puppy sitting on a wooden floor.",
  "finish_reason": "stop",
  "usage": {
    "output_tokens": 87,
    "input_tokens": 412
  }
}
```
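Because `finish_reason` reports `"length"` when generation hits the `max_tokens` limit, it is worth checking before treating the output as complete. A minimal sketch:

```python
def is_truncated(response: dict) -> bool:
    """True if generation stopped because it hit max_tokens rather than finishing."""
    return response.get("finish_reason") == "length"

response = {
    "output": "The video opens with...",
    "finish_reason": "length",
    "usage": {"output_tokens": 1024, "input_tokens": 412},
}
if is_truncated(response):
    # Consider retrying with a higher max_tokens (the cap is 20000).
    print("output truncated; consider raising max_tokens")
```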


## Usage Examples

### cURL

```bash
curl --request POST \
  --url https://fal.run/nvidia/nemotron-3-nano-omni/video \
  --header "Authorization: Key $FAL_KEY" \
  --header "Content-Type: application/json" \
  --data '{
     "prompt": "Summarize the key capabilities of a multimodal agent.",
     "video_url": "https://storage.googleapis.com/falserverless/example_inputs/nemotron-3-nano-omni/video_in.mp4"
   }'
```

### Python

Ensure you have the Python client installed:

```bash
pip install fal-client
```

Then use the API client to make requests:

```python
import fal_client

def on_queue_update(update):
    if isinstance(update, fal_client.InProgress):
        for log in update.logs:
            print(log["message"])

result = fal_client.subscribe(
    "nvidia/nemotron-3-nano-omni/video",
    arguments={
        "prompt": "Summarize the key capabilities of a multimodal agent.",
        "video_url": "https://storage.googleapis.com/falserverless/example_inputs/nemotron-3-nano-omni/video_in.mp4"
    },
    with_logs=True,
    on_queue_update=on_queue_update,
)
print(result)
```
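When `reasoning_mode` is `"think"`, the response contains a reasoning trace followed by the final answer. The trace delimiter is not documented here; the sketch below assumes the trace is wrapped in `<think>…</think>` tags, a common convention for reasoning models — verify against actual output before relying on it:

```python
import re

def split_reasoning(output: str) -> tuple[str, str]:
    """Split a think-mode response into (reasoning_trace, final_answer).

    ASSUMPTION: the trace is wrapped in <think>...</think> tags; adjust the
    pattern if the model uses a different delimiter.
    """
    match = re.search(r"<think>(.*?)</think>\s*(.*)", output, re.DOTALL)
    if match:
        return match.group(1).strip(), match.group(2).strip()
    # No trace found (e.g. reasoning_mode was "no_think").
    return "", output.strip()

trace, answer = split_reasoning(
    "<think>The clip pans over a harbor at dusk.</think>A harbor at sunset."
)
print(answer)  # → A harbor at sunset.
```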

### JavaScript

Ensure you have the JavaScript client installed:

```bash
npm install --save @fal-ai/client
```

Then use the API client to make requests:

```javascript
import { fal } from "@fal-ai/client";

const result = await fal.subscribe("nvidia/nemotron-3-nano-omni/video", {
  input: {
    prompt: "Summarize the key capabilities of a multimodal agent.",
    video_url: "https://storage.googleapis.com/falserverless/example_inputs/nemotron-3-nano-omni/video_in.mp4"
  },
  logs: true,
  onQueueUpdate: (update) => {
    if (update.status === "IN_PROGRESS") {
      update.logs.map((log) => log.message).forEach(console.log);
    }
  },
});
console.log(result.data);
console.log(result.requestId);
```


## Additional Resources

### Documentation

- [Model Playground](https://fal.ai/models/nvidia/nemotron-3-nano-omni/video)
- [API Documentation](https://fal.ai/models/nvidia/nemotron-3-nano-omni/video/api)
- [OpenAPI Schema](https://fal.ai/api/openapi/queue/openapi.json?endpoint_id=nvidia/nemotron-3-nano-omni/video)

### fal.ai Platform

- [Platform Documentation](https://docs.fal.ai)
- [Python Client](https://docs.fal.ai/clients/python)
- [JavaScript Client](https://docs.fal.ai/clients/javascript)
