Omnihuman 1.5 User Guide

Omnihuman 1.5 via fal generates synchronized video avatars from a single portrait and audio file at $0.16/second. Use 720p for faster, higher-quality output (up to 60s audio) or 1080p for higher resolution (up to 30s).

Last updated: 1/11/2026
Edited by: Zachary Roth
Read time: 8 minutes
From Static Portraits to Speaking Avatars

Video avatar generation has traditionally required specialized pipelines, manual rigging, and significant infrastructure investment. Omnihuman 1.5 on fal reduces this complexity to a single API call, transforming any portrait image and audio file into a synchronized video where the character speaks, sings, or performs with contextually appropriate expressions.

The model builds on ByteDance's Diffusion Transformer architecture, which has demonstrated strong scalability properties for video generation tasks1. Unlike earlier avatar systems that merely synchronized lip movements to audio waveforms, Omnihuman 1.5 generates semantically coherent animations where facial expressions, gestures, and head movements respond to the emotional content and rhythm of speech2. This guide covers authentication, implementation patterns, error handling, and production deployment with webhooks.

API Parameters

The following table documents all available parameters:

Parameter    Type     Required  Default   Constraints
image_url    string   Yes       -         Publicly accessible URL to portrait image
audio_url    string   Yes       -         Publicly accessible URL; max 60s (720p) or 30s (1080p)
resolution   string   No        "1080p"   "720p" or "1080p"
turbo_mode   boolean  No        false     Faster generation with quality trade-off
prompt       string   No        null      Text guidance for expressions and movement

According to fal's documentation, 720p mode generates faster and produces higher-quality output. Use 1080p only when your distribution platform specifically requires the higher resolution.

Authentication and Setup

Integration requires a fal account and API key. Generate credentials from your fal dashboard and store them securely using environment variables.

# Python
pip install fal-client

# JavaScript
npm install @fal-ai/client

Note: The @fal-ai/serverless-client package has been deprecated in favor of @fal-ai/client. See the migration guide for details.

Basic Integration

The subscribe method handles queue submission and polling automatically:

import fal_client

# Set FAL_KEY in your environment before running, rather than hardcoding it:
#   export FAL_KEY="your-api-key-here"

result = fal_client.subscribe(
    "fal-ai/bytedance/omnihuman/v1.5",
    arguments={
        "image_url": "https://example.com/portrait.png",
        "audio_url": "https://example.com/audio.mp3",
        "resolution": "720p",
    },
)

print(result["video"]["url"])  # Temporary URL, valid ~24 hours
print(result["duration"])      # Video length in seconds (used for billing)

For JavaScript, the pattern is similar:

import { fal } from "@fal-ai/client";

const result = await fal.subscribe("fal-ai/bytedance/omnihuman/v1.5", {
  input: {
    image_url: "https://example.com/portrait.png",
    audio_url: "https://example.com/audio.mp3",
    resolution: "720p",
  },
});

Queue and Webhook Integration

For production systems, use queue submission with webhooks instead of blocking on subscribe. This prevents timeout issues and enables better scaling.

Submit to queue with a webhook URL:

const { request_id } = await fal.queue.submit(
  "fal-ai/bytedance/omnihuman/v1.5",
  {
    input: {
      image_url: "https://example.com/portrait.png",
      audio_url: "https://example.com/audio.mp3",
    },
    webhookUrl: "https://your-app.com/api/fal/webhook",
  }
);

Poll status manually when webhooks are not available:

const status = await fal.queue.status("fal-ai/bytedance/omnihuman/v1.5", {
  requestId: request_id,
  logs: true,
});
// status.status: "IN_QUEUE" | "IN_PROGRESS" | "COMPLETED"
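When polling manually, wrap the status check in a loop with an interval and a timeout so a stuck request cannot block your worker forever. A minimal sketch, where `check_status` is a hypothetical stand-in for your own wrapper around the queue status call (returning one of the status strings above):

```python
import time

def poll_until_done(check_status, interval=2.0, timeout=300.0):
    """Poll a status callable until it reports COMPLETED or the timeout elapses.

    `check_status` should return "IN_QUEUE", "IN_PROGRESS", or "COMPLETED".
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = check_status()
        if status == "COMPLETED":
            return status
        time.sleep(interval)  # avoid hammering the status endpoint
    raise TimeoutError("generation did not complete within the timeout")
```

A few seconds between polls is usually enough; generation takes on the order of the clip length, so tight loops only add load.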

Webhook payload structure (POST to your endpoint):

{
  "request_id": "764cabcf-b745-4b3e-ae38-1200304cf45b",
  "gateway_request_id": "764cabcf-b745-4b3e-ae38-1200304cf45b",
  "status": "OK",
  "payload": {
    "video": { "url": "https://..." },
    "duration": 15.3
  }
}

On error, status is "ERROR" with error details in the error field. Webhooks retry up to 10 times over 2 hours if delivery fails. Verify webhook signatures using the X-Fal-Webhook-Signature header against fal's JWKS endpoint. See the webhooks documentation for signature verification details.
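A webhook endpoint should branch on the `status` field of the payload shown above and return 200 quickly so fal does not retry delivery. A minimal dispatch sketch (the function name and return shape are mine; framework wiring and signature verification are omitted):

```python
def handle_fal_webhook(body: dict) -> dict:
    """Summarize a fal webhook payload like the one shown above.

    In a real endpoint, persist the result and respond 200 promptly;
    heavy work (e.g. downloading the video) belongs in a background job.
    """
    if body.get("status") == "OK":
        payload = body.get("payload", {})
        return {
            "request_id": body["request_id"],
            "video_url": payload["video"]["url"],   # temporary URL; download promptly
            "billable_seconds": payload["duration"],
        }
    # status == "ERROR": surface the error for logging and retry decisions
    return {"request_id": body.get("request_id"), "error": body.get("error")}
```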

Error Handling

The API returns structured errors. Common failure scenarios:

  • Invalid/inaccessible URLs: Ensure image and audio URLs are publicly accessible without authentication
  • Duration exceeded: Audio over 30s with 1080p or over 60s with 720p returns a 422 error
  • Rate limiting: Implement exponential backoff; check response headers for retry timing
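The exponential-backoff pattern for rate limiting can be sketched as follows. This is a generic helper, not part of the fal client: `fn` is any callable that raises an exception carrying a `status_code` attribute on failure (how you surface status codes depends on your HTTP client):

```python
import random
import time

def with_backoff(fn, max_attempts=5, base_delay=1.0, retryable=(429, 503)):
    """Retry `fn` with exponential backoff plus jitter on retryable HTTP codes."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            code = getattr(exc, "status_code", None)
            if code not in retryable or attempt == max_attempts - 1:
                raise  # non-retryable error, or out of attempts
            # 1s, 2s, 4s, ... plus a little jitter to avoid thundering herds
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
```

Prefer the retry timing from response headers when the API provides it; backoff is the fallback.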

Error responses follow this structure:

{
  "status": "ERROR",
  "error": "Invalid status code: 422",
  "payload": {
    "detail": "Audio duration exceeds maximum for selected resolution"
  }
}

Validate inputs client-side before submission to provide immediate user feedback and reduce failed requests.
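A client-side duration check against the resolution limits from the parameters table might look like this (helper name and error messages are mine):

```python
# Duration limits per resolution, per the constraints table above.
MAX_AUDIO_SECONDS = {"720p": 60.0, "1080p": 30.0}

def validate_request(audio_seconds: float, resolution: str = "1080p") -> None:
    """Raise ValueError before submission instead of waiting for a 422."""
    if resolution not in MAX_AUDIO_SECONDS:
        raise ValueError(f"resolution must be one of {sorted(MAX_AUDIO_SECONDS)}")
    limit = MAX_AUDIO_SECONDS[resolution]
    if audio_seconds > limit:
        raise ValueError(
            f"audio is {audio_seconds:.1f}s; max for {resolution} is {limit:.0f}s"
        )
```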

Pricing

Omnihuman 1.5 charges $0.16 per second of generated video. The duration field in successful responses indicates the billable length.

Video Length    Cost
10 seconds      $1.60
30 seconds      $4.80
60 seconds      $9.60

Implement cost estimation in user-facing applications by calculating audio_duration * 0.16 before submission.
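That estimate is a one-liner worth centralizing so the rate lives in one place:

```python
RATE_PER_SECOND = 0.16  # USD per second of generated video, per the pricing table

def estimate_cost(audio_seconds: float) -> float:
    """Estimated charge in USD for a clip of the given length."""
    return round(audio_seconds * RATE_PER_SECOND, 2)
```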

Production Checklist

Security:

  • Store API keys in environment variables, never in client-side code
  • For browser applications, proxy requests through your backend
  • Verify webhook signatures to prevent spoofed callbacks

Input validation:

  • Image: publicly accessible URL
  • Audio: publicly accessible URL, duration within resolution limits
  • Supported formats: JPEG/PNG for images; MP3/WAV/M4A for audio

Reliability:

  • Download generated videos immediately; URLs expire after approximately 24 hours
  • Implement retry logic with exponential backoff for transient failures
  • Use webhooks for production workloads instead of long-polling

Monitoring:

  • Track the duration field for cost reconciliation
  • Log request_id for debugging and support requests
  • Monitor webhook delivery success rates

File Handling

Input URLs must be publicly accessible. For files that require authentication or are stored locally, use fal's storage API:

const file = new File([audioBuffer], "audio.mp3", { type: "audio/mpeg" });
const url = await fal.storage.upload(file);
// Use returned URL in your request

The client libraries also accept Base64 data URIs directly, though this impacts performance for large files.
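Building such a data URI from local bytes is straightforward; a small sketch (the helper name is mine, and the MIME type should match your actual file):

```python
import base64

def to_data_uri(data: bytes, mime: str = "audio/mpeg") -> str:
    """Encode raw bytes as a Base64 data URI usable in place of a hosted URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

Base64 inflates payload size by roughly a third, so prefer the storage API for anything beyond small files.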

Scaling Considerations

The fal platform handles infrastructure scaling automatically. For high-volume applications:

  • Submit requests concurrently; the queue system manages parallelization
  • Use webhooks rather than polling to reduce connection overhead
  • Implement request queuing on your side if you need to throttle submission rates
  • Consider fal Serverless for custom deployment requirements
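Client-side throttling of concurrent submissions can be sketched with an asyncio semaphore. Here `submit_one` is a hypothetical async wrapper around your queue submission call:

```python
import asyncio

async def submit_all(jobs, submit_one, max_concurrent=5):
    """Run submissions concurrently while capping in-flight requests."""
    sem = asyncio.Semaphore(max_concurrent)

    async def limited(job):
        async with sem:  # at most max_concurrent submissions run at once
            return await submit_one(job)

    return await asyncio.gather(*(limited(j) for j in jobs))
```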

Start with the subscribe method for development and testing, then migrate to queue submission with webhooks for production deployments where reliability and scale matter.

Common Integration Patterns

Customer service avatars: Generate video responses from support scripts. Pre-render common responses during off-peak hours and serve cached videos for frequent queries. For dynamic responses, the queue-webhook pattern ensures your application remains responsive while generation completes.

Content creation tools: Allow users to upload portraits and record audio directly. Use the storage API to handle user uploads, validate audio duration client-side before submission, and display cost estimates based on audio length. Implement progress indicators using queue status polling for better user experience.

Interactive experiences: For real-time applications, 720p with turbo mode provides the fastest generation. Pre-generate avatar videos for anticipated interactions where possible, and use webhooks to update your application state when generation completes.

Debugging Common Issues

No mouth movement in output: Usually indicates audio encoding issues. Ensure audio files use standard encoding (MP3 at 128kbps or higher, WAV at 16-bit PCM). Re-encode problematic files before submission.
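For WAV inputs, the 16-bit PCM requirement can be checked with Python's standard-library `wave` module before submission (the helper name is mine):

```python
import wave

def is_16bit_pcm_wav(path: str) -> bool:
    """Check that a WAV file uses 16-bit samples (sample width of 2 bytes).

    The `wave` module only reads uncompressed PCM WAV files, so a
    successful open already rules out compressed encodings.
    """
    try:
        with wave.open(path, "rb") as wav:
            return wav.getsampwidth() == 2
    except (wave.Error, OSError):
        return False
```

MP3 bitrate is not exposed by the standard library; for that, shell out to a tool like ffprobe or re-encode unconditionally.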

Inconsistent quality: Image quality directly affects output. Use well-lit portraits with clear facial features and neutral expressions. Avoid heavily compressed images or those with artifacts.

Webhook not received: Verify your endpoint is publicly accessible and returns a 200 status code promptly. Check that your server can handle POST requests at the webhook URL. Review fal's webhook retry behavior if deliveries are delayed.

References

  1. Peebles, William, and Saining Xie. "Scalable Diffusion Models with Transformers." Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023. https://arxiv.org/abs/2212.09748

  2. Jiang, Jianwen, et al. "OmniHuman-1.5: Instilling an Active Mind in Avatars via Cognitive Simulation." arXiv preprint, 2025. https://arxiv.org/abs/2508.19209

about the author
Zachary Roth
A generative media engineer with a focus on growth, Zach has deep expertise in building RAG architecture for complex content systems.