# Marlin

> Marlin is a 2B video VLM tuned for the two questions developers actually want to ask of their videos: what is happening, and when?


## Overview

- **Endpoint**: `https://fal.run/fal-ai/marlin`
- **Model ID**: `fal-ai/marlin`
- **Category**: vision
- **Kind**: inference
- **Tags**: utility, editing



## Pricing

- **Price**: $0.015 per 1000 tokens

For more details, see [fal.ai pricing](https://fal.ai/pricing).
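
As a rough illustration of the per-token rate, the sketch below converts a token count into an estimated cost. It assumes you already know the billable token count for a request; this page does not state which tokens (prompt, video, or generated) are counted, and the function name `estimate_cost_usd` is hypothetical.

```python
# Rate taken from the pricing line above: $0.015 per 1000 tokens.
PRICE_PER_1K_TOKENS_USD = 0.015


def estimate_cost_usd(tokens: int) -> float:
    """Return the estimated cost in USD for a billable token count."""
    if tokens < 0:
        raise ValueError("tokens must be non-negative")
    return tokens / 1000 * PRICE_PER_1K_TOKENS_USD


# Example: 2048 billable tokens (the default max_tokens value).
print(f"${estimate_cost_usd(2048):.5f}")  # prints $0.03072
```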

## API Information

This model can be used via our HTTP API or more conveniently via our client libraries.
See the input and output schema below, as well as the usage examples.


### Input Schema

The API accepts the following input parameters:


- **`video_url`** (`string`, _required_):
  URL of the video to caption. Videos up to roughly 2 minutes long are supported.
  - Examples: "https://v3b.fal.media/files/b/0a913346/ZbEaRKcU1dMNYkHl9g1Zz_T4QEyOJ3R3WzuQS9.mp4"

- **`prompt`** (`string`, _required_):
  Caption prompt sent to the model. The example value is Marlin's canonical training prompt — overriding usually degrades output quality.
  - Examples: "Provide a spatial description of this clip followed by time-ranged events.\nFor each event, give the time range as <start - end> and a short description."

- **`max_tokens`** (`integer`, _optional_):
  Maximum number of tokens to generate for the caption.
  - Default: `2048`
  - Range: `64` to `4096`

- **`do_sample`** (`boolean`, _optional_):
  If true, sample using `temperature` and `top_p`; if false, use greedy decoding.
  - Default: `false`

- **`temperature`** (`float`, _optional_):
  Sampling temperature. Only used when `do_sample` is true.
  - Default: `1`
  - Range: `0` to `2`

- **`top_p`** (`float`, _optional_):
  Nucleus sampling threshold. Only used when `do_sample` is true.
  - Default: `1`



**Required Parameters Example**:

```json
{
  "video_url": "https://v3b.fal.media/files/b/0a913346/ZbEaRKcU1dMNYkHl9g1Zz_T4QEyOJ3R3WzuQS9.mp4",
  "prompt": "Provide a spatial description of this clip followed by time-ranged events.\nFor each event, give the time range as <start - end> and a short description."
}
```

**Full Example**:

```json
{
  "video_url": "https://v3b.fal.media/files/b/0a913346/ZbEaRKcU1dMNYkHl9g1Zz_T4QEyOJ3R3WzuQS9.mp4",
  "prompt": "Provide a spatial description of this clip followed by time-ranged events.\nFor each event, give the time range as <start - end> and a short description.",
  "max_tokens": 2048,
  "do_sample": false,
  "temperature": 1,
  "top_p": 1
}
```
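
The constraints listed in the input schema can be checked client-side before a request is sent. The following is a minimal sketch, not part of any fal client library: the helper `build_marlin_input` and the constant `CANONICAL_PROMPT` are hypothetical names. It validates only the ranges documented above (`max_tokens` and `temperature`) and, as a design choice of this sketch, includes the sampling parameters only when `do_sample` is true, since they are otherwise ignored.

```python
from typing import Any, Dict

# The canonical training prompt quoted in the input schema.
CANONICAL_PROMPT = (
    "Provide a spatial description of this clip followed by time-ranged events.\n"
    "For each event, give the time range as <start - end> and a short description."
)


def build_marlin_input(
    video_url: str,
    prompt: str = CANONICAL_PROMPT,
    max_tokens: int = 2048,
    do_sample: bool = False,
    temperature: float = 1.0,
    top_p: float = 1.0,
) -> Dict[str, Any]:
    """Build a request payload, enforcing the documented parameter ranges."""
    if not video_url:
        raise ValueError("video_url is required")
    if not prompt:
        raise ValueError("prompt is required")
    if not 64 <= max_tokens <= 4096:
        raise ValueError("max_tokens must be between 64 and 4096")
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")

    payload: Dict[str, Any] = {
        "video_url": video_url,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "do_sample": do_sample,
    }
    # temperature and top_p only take effect when sampling is enabled.
    if do_sample:
        payload["temperature"] = temperature
        payload["top_p"] = top_p
    return payload
```

The returned dictionary can be passed as the `arguments` value to `fal_client.subscribe`.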


### Output Schema

The API returns the following output format:

- **`scene`** (`string`, _required_):
  Spatial description of the clip.

- **`events`** (`list<EventSegment>`, _required_):
  Time-ranged events parsed from the caption.
  - Array of EventSegment
  - Examples: [{"start":0,"text":"a person waves","end":1.5}]

- **`text`** (`string`, _required_):
  Full post-thinking caption text (Scene + Events) as returned by the model.



**Example Response**:

```json
{
  "scene": "",
  "events": [
    {
      "start": 0,
      "text": "a person waves",
      "end": 1.5
    }
  ],
  "text": ""
}
```
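
To show how the `events` list might be consumed, here is a small sketch that renders each `EventSegment` as a timeline line. It assumes `start` and `end` are expressed in seconds, which this page does not state explicitly; the helper names `format_timestamp` and `format_events` are hypothetical.

```python
from typing import Any, Dict, List


def format_timestamp(seconds: float) -> str:
    """Render a time offset (assumed to be in seconds) as MM:SS.s."""
    total_tenths = round(seconds * 10)  # work in tenths to avoid "60.0" seconds
    minutes, tenths = divmod(total_tenths, 600)
    return f"{minutes:02d}:{tenths / 10:04.1f}"


def format_events(events: List[Dict[str, Any]]) -> List[str]:
    """Return one '[start - end] text' line per event, ordered by start time."""
    ordered = sorted(events, key=lambda e: e["start"])
    return [
        f"[{format_timestamp(e['start'])} - {format_timestamp(e['end'])}] {e['text']}"
        for e in ordered
    ]


# Using the event from the example response above:
response = {"scene": "", "events": [{"start": 0, "text": "a person waves", "end": 1.5}], "text": ""}
for line in format_events(response["events"]):
    print(line)  # prints [00:00.0 - 00:01.5] a person waves
```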


## Usage Examples

### cURL

```bash
curl --request POST \
  --url https://fal.run/fal-ai/marlin \
  --header "Authorization: Key $FAL_KEY" \
  --header "Content-Type: application/json" \
  --data '{
     "video_url": "https://v3b.fal.media/files/b/0a913346/ZbEaRKcU1dMNYkHl9g1Zz_T4QEyOJ3R3WzuQS9.mp4",
     "prompt": "Provide a spatial description of this clip followed by time-ranged events.\nFor each event, give the time range as <start - end> and a short description."
   }'
```

### Python

Ensure you have the Python client installed:

```bash
pip install fal-client
```

Then use the API client to make requests:

```python
import fal_client

def on_queue_update(update):
    if isinstance(update, fal_client.InProgress):
        for log in update.logs:
            print(log["message"])

result = fal_client.subscribe(
    "fal-ai/marlin",
    arguments={
        "video_url": "https://v3b.fal.media/files/b/0a913346/ZbEaRKcU1dMNYkHl9g1Zz_T4QEyOJ3R3WzuQS9.mp4",
        "prompt": "Provide a spatial description of this clip followed by time-ranged events.\nFor each event, give the time range as <start - end> and a short description."
    },
    with_logs=True,
    on_queue_update=on_queue_update,
)
print(result)
```

### JavaScript

Ensure you have the JavaScript client installed:

```bash
npm install --save @fal-ai/client
```

Then use the API client to make requests:

```javascript
import { fal } from "@fal-ai/client";

const result = await fal.subscribe("fal-ai/marlin", {
  input: {
    video_url: "https://v3b.fal.media/files/b/0a913346/ZbEaRKcU1dMNYkHl9g1Zz_T4QEyOJ3R3WzuQS9.mp4",
    prompt: "Provide a spatial description of this clip followed by time-ranged events.\nFor each event, give the time range as <start - end> and a short description."
  },
  logs: true,
  onQueueUpdate: (update) => {
    if (update.status === "IN_PROGRESS") {
      update.logs.map((log) => log.message).forEach(console.log);
    }
  },
});
console.log(result.data);
console.log(result.requestId);
```


## Additional Resources

### Documentation

- [Model Playground](https://fal.ai/models/fal-ai/marlin)
- [API Documentation](https://fal.ai/models/fal-ai/marlin/api)
- [OpenAPI Schema](https://fal.ai/api/openapi/queue/openapi.json?endpoint_id=fal-ai/marlin)

### fal.ai Platform

- [Platform Documentation](https://docs.fal.ai)
- [Python Client](https://docs.fal.ai/clients/python)
- [JavaScript Client](https://docs.fal.ai/clients/javascript)
