Omnihuman 1.5 via fal generates synchronized video avatars from a single portrait and audio file at $0.16/second. Use 720p for faster, higher-quality output (up to 60s audio) or 1080p for higher resolution (up to 30s).
From Static Portraits to Speaking Avatars
Video avatar generation has traditionally required specialized pipelines, manual rigging, and significant infrastructure investment. Omnihuman 1.5 on fal reduces this complexity to a single API call, transforming any portrait image and audio file into a synchronized video where the character speaks, sings, or performs with contextually appropriate expressions.
The model builds on ByteDance's Diffusion Transformer architecture, which has demonstrated strong scalability properties for video generation tasks [1]. Unlike earlier avatar systems that merely synchronized lip movements to audio waveforms, Omnihuman 1.5 generates semantically coherent animations where facial expressions, gestures, and head movements respond to the emotional content and rhythm of speech [2]. This guide covers authentication, implementation patterns, error handling, and production deployment with webhooks.
API Parameters
The following table documents all available parameters:
| Parameter | Type | Required | Default | Constraints |
|---|---|---|---|---|
| image_url | string | Yes | - | Publicly accessible URL to portrait image |
| audio_url | string | Yes | - | Publicly accessible URL; max 60s (720p) or 30s (1080p) |
| resolution | string | No | "1080p" | "720p" or "1080p" |
| turbo_mode | boolean | No | false | Faster generation with quality trade-off |
| prompt | string | No | null | Text guidance for expressions and movement |
According to fal's documentation, 720p mode generates faster and produces higher-quality output. Use 1080p only when the higher resolution is specifically required for your distribution platform.
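A small client-side helper can encode this trade-off. The sketch below is illustrative, not part of fal's API: pickResolution is a hypothetical name, and the 30s/60s caps come from the parameter table above.

```javascript
// Hypothetical helper: choose the highest resolution the audio length allows.
// The caps (60s at 720p, 30s at 1080p) come from the parameter table above.
function pickResolution(audioDurationSec, prefer1080 = false) {
  if (prefer1080 && audioDurationSec <= 30) return "1080p";
  if (audioDurationSec <= 60) return "720p";
  throw new Error("Audio exceeds the 60s maximum supported at 720p");
}
```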
Authentication and Setup
Integration requires a fal account and API key. Generate credentials from your fal dashboard and store them securely using environment variables.
# Python
pip install fal-client
# JavaScript
npm install @fal-ai/client
Note: The @fal-ai/serverless-client package has been deprecated in favor of @fal-ai/client. See the migration guide for details.
Basic Integration
The subscribe method handles queue submission and polling automatically:
import os
import fal_client

# The client reads FAL_KEY from the environment; set it in your shell or
# secrets manager rather than hardcoding it as shown here.
os.environ["FAL_KEY"] = "your-api-key-here"
result = fal_client.subscribe(
"fal-ai/bytedance/omnihuman/v1.5",
arguments={
"image_url": "https://example.com/portrait.png",
"audio_url": "https://example.com/audio.mp3",
"resolution": "720p"
}
)
print(result["video"]["url"]) # Temporary URL, valid ~24 hours
print(result["duration"]) # Video length in seconds (used for billing)
For JavaScript, the pattern is similar:
import { fal } from "@fal-ai/client";
const result = await fal.subscribe("fal-ai/bytedance/omnihuman/v1.5", {
input: {
image_url: "https://example.com/portrait.png",
audio_url: "https://example.com/audio.mp3",
resolution: "720p",
},
});
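During development it can help to stream progress while subscribe blocks. The client accepts a logs flag and an onQueueUpdate callback for this; the snippet below is a minimal sketch of that pattern:

```javascript
const result = await fal.subscribe("fal-ai/bytedance/omnihuman/v1.5", {
  input: {
    image_url: "https://example.com/portrait.png",
    audio_url: "https://example.com/audio.mp3",
    resolution: "720p",
  },
  logs: true,
  // Called on each queue status change; useful for progress indicators.
  onQueueUpdate: (update) => {
    if (update.status === "IN_PROGRESS") {
      update.logs?.forEach((log) => console.log(log.message));
    }
  },
});
```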
Queue and Webhook Integration
For production systems, use queue submission with webhooks instead of blocking on subscribe. This prevents timeout issues and enables better scaling.
Submit to queue with a webhook URL:
const { request_id } = await fal.queue.submit(
"fal-ai/bytedance/omnihuman/v1.5",
{
input: {
image_url: "https://example.com/portrait.png",
audio_url: "https://example.com/audio.mp3",
},
webhookUrl: "https://your-app.com/api/fal/webhook",
}
);
Poll status manually when webhooks are not available:
const status = await fal.queue.status("fal-ai/bytedance/omnihuman/v1.5", {
requestId: request_id,
logs: true,
});
// status.status: "IN_QUEUE" | "IN_PROGRESS" | "COMPLETED"
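Putting these together, a minimal polling loop might look like the sketch below. waitForResult is a hypothetical helper and the backoff values are arbitrary starting points; fal.queue.result fetches the completed output.

```javascript
// Poll with exponential backoff until the job completes, then fetch the output.
async function waitForResult(requestId) {
  const endpoint = "fal-ai/bytedance/omnihuman/v1.5";
  let delayMs = 2000;
  for (;;) {
    const status = await fal.queue.status(endpoint, { requestId, logs: true });
    if (status.status === "COMPLETED") break;
    await new Promise((resolve) => setTimeout(resolve, delayMs));
    delayMs = Math.min(delayMs * 2, 30_000); // cap the backoff at 30s
  }
  return fal.queue.result(endpoint, { requestId });
}
```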
Webhook payload structure (POST to your endpoint):
{
"request_id": "764cabcf-b745-4b3e-ae38-1200304cf45b",
"gateway_request_id": "764cabcf-b745-4b3e-ae38-1200304cf45b",
"status": "OK",
"payload": {
"video": { "url": "https://..." },
"duration": 15.3
}
}
On error, status is "ERROR" with error details in the error field. Webhooks retry up to 10 times over 2 hours if delivery fails. Verify webhook signatures using the X-Fal-Webhook-Signature header against fal's JWKS endpoint. See the webhooks documentation for signature verification details.
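As a rough sketch of the receiving side, the Express handler below acknowledges quickly and defers heavy work. enqueueDownload is a placeholder for your own job queue, and signature verification (covered in the webhooks documentation) is omitted for brevity.

```javascript
import express from "express";

const app = express();
app.use(express.json());

// Hypothetical hook into your own job queue; replace with real persistence.
function enqueueDownload(requestId, videoUrl) {
  console.log(`queued download for ${requestId}: ${videoUrl}`);
}

app.post("/api/fal/webhook", (req, res) => {
  const { request_id, status, payload, error } = req.body;
  if (status === "OK") {
    // payload.video.url expires in ~24 hours, so start the download promptly.
    enqueueDownload(request_id, payload.video.url);
  } else {
    console.error(`Generation ${request_id} failed:`, error);
  }
  res.sendStatus(200); // respond fast; non-2xx responses trigger fal's retries
});

app.listen(3000);
```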
Error Handling
The API returns structured errors. Common failure scenarios:
- Invalid/inaccessible URLs: Ensure image and audio URLs are publicly accessible without authentication
- Duration exceeded: Audio over 30s with 1080p or over 60s with 720p returns a 422 error
- Rate limiting: Implement exponential backoff; check response headers for retry timing
Error responses follow this structure:
{
"status": "ERROR",
"error": "Invalid status code: 422",
"payload": {
"detail": "Audio duration exceeds maximum for selected resolution"
}
}
Validate inputs client-side before submission to provide immediate user feedback and reduce failed requests.
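A pre-flight check along those lines might look like the sketch below; MAX_AUDIO_SECONDS and validateRequest are hypothetical names mirroring the limits documented above.

```javascript
// Duration caps per resolution, per the constraints table above.
const MAX_AUDIO_SECONDS = { "720p": 60, "1080p": 30 };

// Returns an error message for immediate user feedback, or null if valid.
function validateRequest({ audioDurationSec, resolution = "1080p" }) {
  const limit = MAX_AUDIO_SECONDS[resolution];
  if (limit === undefined) return `Unsupported resolution: ${resolution}`;
  if (audioDurationSec > limit) {
    return `Audio is ${audioDurationSec}s; ${resolution} allows at most ${limit}s`;
  }
  return null;
}
```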
Pricing
Omnihuman 1.5 charges $0.16 per second of generated video. The duration field in successful responses indicates the billable length.
| Video Length | Cost |
|---|---|
| 10 seconds | $1.60 |
| 30 seconds | $4.80 |
| 60 seconds | $9.60 |
Implement cost estimation in user-facing applications by calculating audio_duration * 0.16 before submission.
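For example, a minimal estimator (estimateCostUsd is a hypothetical name; the $0.16/s rate comes from the pricing above):

```javascript
const PRICE_PER_SECOND = 0.16; // Omnihuman 1.5 rate on fal

// Estimate from the input audio length; the final bill is based on the
// duration field returned with the generated video.
function estimateCostUsd(audioDurationSec) {
  return audioDurationSec * PRICE_PER_SECOND;
}

estimateCostUsd(30); // => 4.8, matching the pricing table
```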
Production Checklist
Security:
- Store API keys in environment variables, never in client-side code
- For browser applications, proxy requests through your backend
- Verify webhook signatures to prevent spoofed callbacks
Input validation:
- Image: publicly accessible URL
- Audio: publicly accessible URL, duration within resolution limits
- Supported formats: JPEG/PNG for images; MP3/WAV/M4A for audio
Reliability:
- Download generated videos immediately; URLs expire after approximately 24 hours
- Implement retry logic with exponential backoff for transient failures
- Use webhooks for production workloads instead of long-polling
Monitoring:
- Track the duration field for cost reconciliation
- Log request_id for debugging and support requests
- Monitor webhook delivery success rates
File Handling
Input URLs must be publicly accessible. For files that require authentication or are stored locally, use fal's storage API:
const file = new File([audioBuffer], "audio.mp3", { type: "audio/mpeg" });
const url = await fal.storage.upload(file);
// Use returned URL in your request
The client libraries also accept Base64 data URIs directly, though this impacts performance for large files.
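In a Node.js backend, the same upload can start from a file on disk. The sketch below assumes Node 20+, where File is available as a global:

```javascript
import { readFile } from "node:fs/promises";
import { fal } from "@fal-ai/client";

// Read a local file and hand it to fal storage (Node 20+ provides File globally).
const audioBuffer = await readFile("./audio.mp3");
const file = new File([audioBuffer], "audio.mp3", { type: "audio/mpeg" });
const url = await fal.storage.upload(file);
// Pass `url` as audio_url in your generation request.
```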
Scaling Considerations
The fal platform handles infrastructure scaling automatically. For high-volume applications:
- Submit requests concurrently; the queue system manages parallelization
- Use webhooks rather than polling to reduce connection overhead
- Implement request queuing on your side if you need to throttle submission rates (see the sketch after this list)
- Consider fal Serverless for custom deployment requirements
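One way to throttle on your side is a small worker pool. The sketch below is a minimal example under that assumption; submitAll is a hypothetical helper that keeps at most limit submissions in flight.

```javascript
// Submit jobs with bounded concurrency; each job is an `input` object.
async function submitAll(jobs, limit = 5) {
  const results = [];
  let index = 0;
  async function worker() {
    while (index < jobs.length) {
      const job = jobs[index++]; // safe: JS runs these workers on one thread
      results.push(
        await fal.queue.submit("fal-ai/bytedance/omnihuman/v1.5", {
          input: job,
          webhookUrl: "https://your-app.com/api/fal/webhook",
        })
      );
    }
  }
  await Promise.all(Array.from({ length: limit }, worker));
  return results;
}
```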
Start with the subscribe method for development and testing, then migrate to queue submission with webhooks for production deployments where reliability and scale matter.
Common Integration Patterns
Customer service avatars: Generate video responses from support scripts. Pre-render common responses during off-peak hours and serve cached videos for frequent queries. For dynamic responses, the queue-webhook pattern ensures your application remains responsive while generation completes.
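A cache for pre-rendered responses can be as simple as hashing the portrait/audio pair. The sketch below uses an in-memory Map and a hypothetical getOrGenerate helper; a real deployment would persist the downloaded videos rather than the expiring URLs.

```javascript
import { createHash } from "node:crypto";
import { fal } from "@fal-ai/client";

const videoCache = new Map(); // swap for durable storage in production

function cacheKey(imageUrl, audioUrl) {
  return createHash("sha256").update(`${imageUrl}|${audioUrl}`).digest("hex");
}

async function getOrGenerate(imageUrl, audioUrl) {
  const key = cacheKey(imageUrl, audioUrl);
  if (!videoCache.has(key)) {
    const result = await fal.subscribe("fal-ai/bytedance/omnihuman/v1.5", {
      input: { image_url: imageUrl, audio_url: audioUrl, resolution: "720p" },
    });
    // Recent @fal-ai/client versions nest output under result.data.
    videoCache.set(key, (result.data ?? result).video.url);
  }
  return videoCache.get(key);
}
```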
Content creation tools: Allow users to upload portraits and record audio directly. Use the storage API to handle user uploads, validate audio duration client-side before submission, and display cost estimates based on audio length. Implement progress indicators using queue status polling for better user experience.
Interactive experiences: For real-time applications, 720p with turbo mode provides the fastest generation. Pre-generate avatar videos for anticipated interactions where possible, and use webhooks to update your application state when generation completes.
Debugging Common Issues
No mouth movement in output: Usually indicates audio encoding issues. Ensure audio files use standard encoding (MP3 at 128kbps or higher, WAV at 16-bit PCM). Re-encode problematic files before submission.
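If you shell out to ffmpeg from Node, a re-encode along those lines might look like this sketch (paths are placeholders; requires ffmpeg on the PATH):

```javascript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);

// Re-encode to 44.1 kHz, 128 kbps MP3 before upload.
await run("ffmpeg", [
  "-y", // overwrite the output file if it exists
  "-i", "input.m4a",
  "-ar", "44100",
  "-b:a", "128k",
  "output.mp3",
]);
```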
Inconsistent quality: Image quality directly affects output. Use well-lit portraits with clear facial features and neutral expressions. Avoid heavily compressed images or those with artifacts.
Webhook not received: Verify your endpoint is publicly accessible and returns a 200 status code promptly. Check that your server can handle POST requests at the webhook URL. Review fal's webhook retry behavior if deliveries are delayed.
References
1. Peebles, William, and Saining Xie. "Scalable Diffusion Models with Transformers." Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023. https://arxiv.org/abs/2212.09748
2. Jiang, Jianwen, et al. "OmniHuman-1.5: Instilling an Active Mind in Avatars via Cognitive Simulation." arXiv preprint, 2025. https://arxiv.org/abs/2508.19209
