# Stable Avatar

> Stable Avatar generates audio-driven avatar videos up to five minutes long.


## Overview

- **Endpoint**: `https://fal.run/fal-ai/stable-avatar`
- **Model ID**: `fal-ai/stable-avatar`
- **Category**: audio-to-video
- **Kind**: inference
- **Tags**: stable-avatar, talking-head, audio-to-video



## Pricing

Your request will cost **$0.10 per generated second**, billed with a minimum of 4 seconds.
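
For example, a one-minute video costs $6.00, while a 3-second clip is billed at the 4-second minimum ($0.40). A minimal sketch of the billing arithmetic (the helper below is illustrative, not part of the API):

```python
# Illustrative sketch of the pricing above: $0.10 per generated second,
# with a 4-second billing minimum.
def estimate_cost_usd(duration_seconds: float) -> float:
    billed_seconds = max(duration_seconds, 4)
    return billed_seconds * 10 / 100  # price in cents, converted to dollars

print(estimate_cost_usd(60))  # 6.0 -> a one-minute video costs $6.00
print(estimate_cost_usd(3))   # 0.4 -> short clips are billed at the minimum
```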

For more details, see [fal.ai pricing](https://fal.ai/pricing).

## API Information

This model can be used via our HTTP API or, more conveniently, via our client libraries.
See the input and output schemas below, as well as the usage examples.


### Input Schema

The API accepts the following input parameters:


- **`image_url`** (`string`, _required_):
  The URL of the image to use as a reference for the video generation.
  - Examples: "https://storage.googleapis.com/falserverless/example_inputs/stable-avatar-input-image.png"

- **`audio_url`** (`string`, _required_):
  The URL of the audio to use as a reference for the video generation.
  - Examples: "https://storage.googleapis.com/falserverless/example_inputs/stable-avatar-input-audio.mp3"

- **`prompt`** (`string`, _required_):
  The prompt to use for the video generation.
  - Examples: "A person is in a relaxed pose. As the video progresses, the character speaks while arm and body movements are minimal and consistent with a natural speaking posture. Hand movements remain minimal. Don't blink too often. Preserve background integrity matching the reference image's spatial configuration, lighting conditions, and color temperature."

- **`aspect_ratio`** (`AspectRatioEnum`, _optional_):
  The aspect ratio of the video to generate. If `auto`, the aspect ratio is determined by the reference image.
  - Default: `"auto"`
  - Options: `"16:9"`, `"1:1"`, `"9:16"`, `"auto"`

- **`guidance_scale`** (`float`, _optional_):
  The guidance scale to use for the video generation.
  - Default: `5`
  - Range: `1` to `10`

- **`audio_guidance_scale`** (`float`, _optional_):
  The audio guidance scale to use for the video generation.
  - Default: `4`
  - Range: `0` to `10`

- **`num_inference_steps`** (`integer`, _optional_):
  The number of inference steps to use for the video generation.
  - Default: `50`
  - Range: `10` to `50`

- **`seed`** (`integer`, _optional_):
  The seed to use for the video generation.

- **`perturbation`** (`float`, _optional_):
  The amount of perturbation to use for the video generation. `0.0` means no perturbation; `1.0` means full perturbation.
  - Default: `0.1`
  - Range: `0` to `1`



**Required Parameters Example**:

```json
{
  "image_url": "https://storage.googleapis.com/falserverless/example_inputs/stable-avatar-input-image.png",
  "audio_url": "https://storage.googleapis.com/falserverless/example_inputs/stable-avatar-input-audio.mp3",
  "prompt": "A person is in a relaxed pose. As the video progresses, the character speaks while arm and body movements are minimal and consistent with a natural speaking posture. Hand movements remain minimal. Don't blink too often. Preserve background integrity matching the reference image's spatial configuration, lighting conditions, and color temperature."
}
```

**Full Example**:

```json
{
  "image_url": "https://storage.googleapis.com/falserverless/example_inputs/stable-avatar-input-image.png",
  "audio_url": "https://storage.googleapis.com/falserverless/example_inputs/stable-avatar-input-audio.mp3",
  "prompt": "A person is in a relaxed pose. As the video progresses, the character speaks while arm and body movements are minimal and consistent with a natural speaking posture. Hand movements remain minimal. Don't blink too often. Preserve background integrity matching the reference image's spatial configuration, lighting conditions, and color temperature.",
  "aspect_ratio": "auto",
  "guidance_scale": 5,
  "audio_guidance_scale": 4,
  "num_inference_steps": 50,
  "perturbation": 0.1
}
```


### Output Schema

The API returns the following output format:

- **`video`** (`File`, _required_):
  The generated video file.
  - Examples: "https://storage.googleapis.com/falserverless/example_outputs/stable-avatar-output.mp4"



**Example Response**:

```json
{
  "video": "https://storage.googleapis.com/falserverless/example_outputs/stable-avatar-output.mp4"
}
```
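
As a minimal sketch of consuming this response, assuming `video` arrives as a plain URL string as shown above:

```python
import json
import urllib.request

# Example response body, copied from above.
raw_body = '{"video": "https://storage.googleapis.com/falserverless/example_outputs/stable-avatar-output.mp4"}'
response = json.loads(raw_body)

# `video` is a URL pointing at the generated MP4; download it locally.
urllib.request.urlretrieve(response["video"], "stable-avatar-output.mp4")
```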


## Usage Examples

### cURL

```bash
curl --request POST \
  --url https://fal.run/fal-ai/stable-avatar \
  --header "Authorization: Key $FAL_KEY" \
  --header "Content-Type: application/json" \
  --data '{
     "image_url": "https://storage.googleapis.com/falserverless/example_inputs/stable-avatar-input-image.png",
     "audio_url": "https://storage.googleapis.com/falserverless/example_inputs/stable-avatar-input-audio.mp3",
     "prompt": "A person is in a relaxed pose. As the video progresses, the character speaks while arm and body movements are minimal and consistent with a natural speaking posture. Hand movements remain minimal. Don't blink too often. Preserve background integrity matching the reference image's spatial configuration, lighting conditions, and color temperature."
   }'
```
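
This example assumes your fal API key is available in the `FAL_KEY` environment variable.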

### Python

Ensure you have the Python client installed:

```bash
pip install fal-client
```

Then use the API client to make requests:

```python
import fal_client

# Print log messages as the request moves through the queue.
def on_queue_update(update):
    if isinstance(update, fal_client.InProgress):
        for log in update.logs:
            print(log["message"])

result = fal_client.subscribe(
    "fal-ai/stable-avatar",
    arguments={
        "image_url": "https://storage.googleapis.com/falserverless/example_inputs/stable-avatar-input-image.png",
        "audio_url": "https://storage.googleapis.com/falserverless/example_inputs/stable-avatar-input-audio.mp3",
        "prompt": "A person is in a relaxed pose. As the video progresses, the character speaks while arm and body movements are minimal and consistent with a natural speaking posture. Hand movements remain minimal. Don't blink too often. Preserve background integrity matching the reference image's spatial configuration, lighting conditions, and color temperature."
    },
    with_logs=True,
    on_queue_update=on_queue_update,
)
print(result)
```
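
The example above passes hosted URLs. If your reference image and audio are local files, the Python client can upload them first; a minimal sketch assuming `fal_client.upload_file` (see the Python client docs) and hypothetical local file paths:

```python
import fal_client

# Upload local reference files to fal storage and get hosted URLs back.
# The file paths below are hypothetical placeholders.
image_url = fal_client.upload_file("reference.png")
audio_url = fal_client.upload_file("speech.mp3")

result = fal_client.subscribe(
    "fal-ai/stable-avatar",
    arguments={
        "image_url": image_url,
        "audio_url": audio_url,
        "prompt": "A person speaks in a relaxed pose with minimal hand movement.",
    },
)
print(result)
```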

### JavaScript

Ensure you have the JavaScript client installed:

```bash
npm install --save @fal-ai/client
```

Then use the API client to make requests:

```javascript
import { fal } from "@fal-ai/client";

const result = await fal.subscribe("fal-ai/stable-avatar", {
  input: {
    image_url: "https://storage.googleapis.com/falserverless/example_inputs/stable-avatar-input-image.png",
    audio_url: "https://storage.googleapis.com/falserverless/example_inputs/stable-avatar-input-audio.mp3",
    prompt: "A person is in a relaxed pose. As the video progresses, the character speaks while arm and body movements are minimal and consistent with a natural speaking posture. Hand movements remain minimal. Don't blink too often. Preserve background integrity matching the reference image's spatial configuration, lighting conditions, and color temperature."
  },
  logs: true,
  // Print log messages as the request progresses through the queue.
  onQueueUpdate: (update) => {
    if (update.status === "IN_PROGRESS") {
      update.logs.map((log) => log.message).forEach(console.log);
    }
  },
});
console.log(result.data);
console.log(result.requestId);
```
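
Here `result.data` contains the payload described in the output schema above, and `result.requestId` identifies the request for later reference.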


## Additional Resources

### Documentation

- [Model Playground](https://fal.ai/models/fal-ai/stable-avatar)
- [API Documentation](https://fal.ai/models/fal-ai/stable-avatar/api)
- [OpenAPI Schema](https://fal.ai/api/openapi/queue/openapi.json?endpoint_id=fal-ai/stable-avatar)

### fal.ai Platform

- [Platform Documentation](https://docs.fal.ai)
- [Python Client](https://docs.fal.ai/clients/python)
- [JavaScript Client](https://docs.fal.ai/clients/javascript)
