# Nemotron Diffusion VLM

> Nemotron-Labs-Diffusion-VLM-8B is the vision-language extension of the Nemotron-Labs-Diffusion family.


## Overview

- **Endpoint**: `https://fal.run/fal-ai/nemotron-diffusion-vlm`
- **Model ID**: `fal-ai/nemotron-diffusion-vlm`
- **Category**: vision
- **Kind**: inference
- **Tags**: utility, editing


## Pricing

- **Price**: $0.005 per 1000 tokens

For more details, see [fal.ai pricing](https://fal.ai/pricing).
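
As a rough illustration of the rate above, the following sketch estimates the cost of a request from a token count. It assumes billing scales linearly with the number of tokens. Which tokens are billed (prompt, image, or generated) is not stated here, so `billed_tokens` is a hypothetical input and `estimate_cost_usd` is an illustrative helper, not part of any fal client.

```python
from decimal import Decimal

# Rate taken from the Pricing section: $0.005 per 1000 tokens.
PRICE_PER_1000_TOKENS = Decimal("0.005")


def estimate_cost_usd(billed_tokens: int) -> Decimal:
    """Estimate request cost in USD, assuming linear per-token billing."""
    if billed_tokens < 0:
        raise ValueError("billed_tokens must be non-negative")
    return PRICE_PER_1000_TOKENS * billed_tokens / Decimal(1000)


# Example: a request billed for 512 tokens (the default max_tokens).
print(estimate_cost_usd(512))
```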

## API Information

This model can be used via our HTTP API or more conveniently via our client libraries.
See the input and output schema below, as well as the usage examples.


### Input Schema

The API accepts the following input parameters:


- **`image_url`** (`string`, _required_):
  URL of the image to be processed.
  - Examples: "https://storage.googleapis.com/falserverless/example_inputs/dog.png"

- **`prompt`** (`string`, _required_):
  Question or instruction about the image for the model to answer.
  - Examples: "Describe the image in one short sentence."

- **`max_tokens`** (`integer`, _optional_):
  Maximum number of tokens to generate.
  - Default: `512`
  - Range: `1` to `8192`

- **`num_inference_steps`** (`integer`, _optional_):
  Number of diffusion decoding steps. When omitted, the default of 256 is used and is rounded up only if the upstream block schedule requires it. An explicit value must be at least `max_tokens / block_length` and divisible by `max_tokens / block_length`.
  - Default: `256`
  - Range: `1` to `1024`

- **`inference_steps`** (`integer`, _optional_):
  Hidden alias for `num_inference_steps`.
  - Default: `256`
  - Range: `1` to `1024`

- **`block_length`** (`integer`, _optional_):
  Block length used by diffusion decoding.
  - Default: `32`
  - Range: `8` to `128`

- **`threshold`** (`float`, _optional_):
  Confidence threshold used by diffusion decoding.
  - Default: `0.9`
  - Range: `0` to `1`
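
The constraint on explicit `num_inference_steps` values can be checked on the client before sending a request. The sketch below is one possible reading of that rule, not the service's own validation: the helper names are hypothetical, and taking the ceiling of `max_tokens / block_length` when the division is not exact is an assumption.

```python
def blocks_per_request(max_tokens: int = 512, block_length: int = 32) -> int:
    """Number of decoding blocks: max_tokens / block_length.

    Assumption: a non-exact division is rounded up (ceiling).
    """
    return -(-max_tokens // block_length)


def is_valid_steps(num_inference_steps: int,
                   max_tokens: int = 512,
                   block_length: int = 32) -> bool:
    """True if the value is at least the block count and divisible by it."""
    blocks = blocks_per_request(max_tokens, block_length)
    return num_inference_steps >= blocks and num_inference_steps % blocks == 0


def next_valid_steps(requested: int,
                     max_tokens: int = 512,
                     block_length: int = 32) -> int:
    """Smallest value >= requested that satisfies the rule above."""
    blocks = blocks_per_request(max_tokens, block_length)
    return max(blocks, -(-requested // blocks) * blocks)


# With the defaults there are 512 / 32 = 16 blocks, so 256 steps is valid.
print(is_valid_steps(256))
# 100 is not a multiple of 16; the next multiple of 16 is returned.
print(next_valid_steps(100))
```

Values returned by `next_valid_steps` should still be checked against the documented range of `1` to `1024`.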


**Required Parameters Example**:

```json
{
  "image_url": "https://storage.googleapis.com/falserverless/example_inputs/dog.png",
  "prompt": "Describe the image in one short sentence."
}
```

**Full Example**:

```json
{
  "image_url": "https://storage.googleapis.com/falserverless/example_inputs/dog.png",
  "prompt": "Describe the image in one short sentence.",
  "max_tokens": 512,
  "num_inference_steps": 256,
  "inference_steps": 256,
  "block_length": 32,
  "threshold": 0.9
}
```


### Output Schema

The API returns the following output format:

- **`output`** (`string`, _required_):
  Generated answer.
  - Examples: "The image shows a dog sitting outdoors."

- **`usage`** (`NemotronDiffusionVLMUsage`, _required_):
  Token and diffusion decoding usage information.

- **`timings`** (`NemotronDiffusionVLMTimings`, _required_):
  Request timing breakdown in seconds.


**Example Response**:

```json
{
  "output": "The image shows a dog sitting outdoors.",
  "usage": {},
  "timings": {}
}
```
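
A minimal sketch of handling this response in Python follows. It relies only on the three top-level fields listed above. The inner fields of `usage` and `timings` are not documented here, so they are treated as opaque dictionaries, and `extract_answer` is a hypothetical helper name.

```python
from typing import Any, Dict

REQUIRED_FIELDS = ("output", "usage", "timings")


def extract_answer(response: Dict[str, Any]) -> str:
    """Return the generated answer after checking the required fields exist."""
    missing = [key for key in REQUIRED_FIELDS if key not in response]
    if missing:
        raise KeyError("response is missing field(s): " + ", ".join(missing))
    return response["output"].strip()


# The example response from this section, as a Python dict.
example_response = {
    "output": "The image shows a dog sitting outdoors.",
    "usage": {},
    "timings": {},
}
print(extract_answer(example_response))
```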


## Usage Examples

### cURL

```bash
curl --request POST \
  --url https://fal.run/fal-ai/nemotron-diffusion-vlm \
  --header "Authorization: Key $FAL_KEY" \
  --header "Content-Type: application/json" \
  --data '{
     "image_url": "https://storage.googleapis.com/falserverless/example_inputs/dog.png",
     "prompt": "Describe the image in one short sentence."
   }'
```
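
The same HTTP request can be sketched with only the Python standard library, which may help when the client library is not available. This mirrors the cURL call above. The helper names `build_request` and `send` and the 60-second timeout are assumptions for illustration, and the API key is read from the `FAL_KEY` environment variable as in the cURL example.

```python
import json
import os
import urllib.request

ENDPOINT = "https://fal.run/fal-ai/nemotron-diffusion-vlm"


def build_request(image_url: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build the POST request with the same headers and body as the cURL call."""
    body = json.dumps({"image_url": image_url, "prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={
            "Authorization": f"Key {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response (timeout is an assumption)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


request = build_request(
    "https://storage.googleapis.com/falserverless/example_inputs/dog.png",
    "Describe the image in one short sentence.",
    os.environ.get("FAL_KEY", ""),
)
# Call send(request) to perform the network request.
print(request.get_method(), request.full_url)
```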

### Python

Ensure you have the Python client installed:

```bash
pip install fal-client
```

Then use the API client to make requests:

```python
import fal_client

def on_queue_update(update):
    if isinstance(update, fal_client.InProgress):
        for log in update.logs:
            print(log["message"])

result = fal_client.subscribe(
    "fal-ai/nemotron-diffusion-vlm",
    arguments={
        "image_url": "https://storage.googleapis.com/falserverless/example_inputs/dog.png",
        "prompt": "Describe the image in one short sentence."
    },
    with_logs=True,
    on_queue_update=on_queue_update,
)
print(result)
```

### JavaScript

Ensure you have the JavaScript client installed:

```bash
npm install --save @fal-ai/client
```

Then use the API client to make requests:

```javascript
import { fal } from "@fal-ai/client";

const result = await fal.subscribe("fal-ai/nemotron-diffusion-vlm", {
  input: {
    image_url: "https://storage.googleapis.com/falserverless/example_inputs/dog.png",
    prompt: "Describe the image in one short sentence."
  },
  logs: true,
  onQueueUpdate: (update) => {
    if (update.status === "IN_PROGRESS") {
      update.logs.map((log) => log.message).forEach((message) => console.log(message));
    }
  },
});
console.log(result.data);
console.log(result.requestId);
```


## Additional Resources

### Documentation

- [Model Playground](https://fal.ai/models/fal-ai/nemotron-diffusion-vlm)
- [API Documentation](https://fal.ai/models/fal-ai/nemotron-diffusion-vlm/api)
- [OpenAPI Schema](https://fal.ai/api/openapi/queue/openapi.json?endpoint_id=fal-ai/nemotron-diffusion-vlm)

### fal.ai Platform

- [Platform Documentation](https://docs.fal.ai)
- [Python Client](https://docs.fal.ai/clients/python)
- [JavaScript Client](https://docs.fal.ai/clients/javascript)