Omnihuman 1.5 via fal generates synchronized video avatars from a single portrait and audio file at $0.16/second. Use 720p for faster, higher-quality output (up to 60s audio) or 1080p for higher resolution (up to 30s).
From Static Portraits to Speaking Avatars
Video avatar generation has traditionally required specialized pipelines, manual rigging, and significant infrastructure investment. Omnihuman 1.5 on fal reduces this complexity to a single API call, transforming any portrait image and audio file into a synchronized video where the character speaks, sings, or performs with contextually appropriate expressions.
The model builds on ByteDance's Diffusion Transformer architecture, which has demonstrated strong scalability properties for video generation tasks [1]. Unlike earlier avatar systems that merely synchronized lip movements to audio waveforms, Omnihuman 1.5 generates semantically coherent animations where facial expressions, gestures, and head movements respond to the emotional content and rhythm of speech [2]. This guide covers authentication, implementation patterns, error handling, and production deployment with webhooks.
API Parameters
The following table documents all available parameters:
| Parameter | Type | Required | Default | Constraints |
|---|---|---|---|---|
| image_url | string | Yes | - | Publicly accessible URL to portrait image |
| audio_url | string | Yes | - | Publicly accessible URL; max 60s (720p) or 30s (1080p) |
| resolution | string | No | "1080p" | "720p" or "1080p" |
| turbo_mode | boolean | No | false | Faster generation with quality trade-off |
| prompt | string | No | null | Text guidance for expressions and movement |
According to fal's documentation, 720p mode generates faster and produces higher-quality output. Use 1080p only when the higher resolution is specifically required for your distribution platform.
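A small client-side helper can encode this trade-off. The sketch below is illustrative, not part of fal's API: pickResolution is a hypothetical name, and the 30s/60s caps come from the parameter table above.

```javascript
// Hypothetical helper: choose the highest resolution the audio length allows.
// The caps (60s at 720p, 30s at 1080p) come from the parameter table above.
function pickResolution(audioDurationSec, prefer1080 = false) {
  if (prefer1080 && audioDurationSec <= 30) return "1080p";
  if (audioDurationSec <= 60) return "720p";
  throw new Error("Audio exceeds the 60s maximum supported at 720p");
}
```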
Authentication and Setup
Integration requires a fal account and API key. Generate credentials from your fal dashboard and store them securely using environment variables.
# Python
pip install fal-client
# JavaScript
npm install @fal-ai/client
Note: The @fal-ai/serverless-client package has been deprecated in favor of @fal-ai/client. See the migration guide for details.
Basic Integration
The subscribe method handles queue submission and polling automatically:
import os
import fal_client

# The client reads FAL_KEY from the environment; set it in your shell or
# secrets manager rather than hardcoding it as shown here.
os.environ["FAL_KEY"] = "your-api-key-here"
result = fal_client.subscribe(
"fal-ai/bytedance/omnihuman/v1.5",
arguments={
"image_url": "https://example.com/portrait.png",
"audio_url": "https://example.com/audio.mp3",
"resolution": "720p"
}
)
print(result["video"]["url"]) # Temporary URL, valid ~24 hours
print(result["duration"]) # Video length in seconds (used for billing)
For JavaScript, the pattern is similar:
import { fal } from "@fal-ai/client";
const result = await fal.subscribe("fal-ai/bytedance/omnihuman/v1.5", {
input: {
image_url: "https://example.com/portrait.png",
audio_url: "https://example.com/audio.mp3",
resolution: "720p",
},
});
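During development it can help to stream progress while subscribe blocks. The client accepts a logs flag and an onQueueUpdate callback for this; the snippet below is a minimal sketch of that pattern:

```javascript
const result = await fal.subscribe("fal-ai/bytedance/omnihuman/v1.5", {
  input: {
    image_url: "https://example.com/portrait.png",
    audio_url: "https://example.com/audio.mp3",
    resolution: "720p",
  },
  logs: true,
  // Called on each queue status change; useful for progress indicators.
  onQueueUpdate: (update) => {
    if (update.status === "IN_PROGRESS") {
      update.logs?.forEach((log) => console.log(log.message));
    }
  },
});
```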
Queue and Webhook Integration
For production systems, use queue submission with webhooks instead of blocking on subscribe. This prevents timeout issues and enables better scaling.
Submit to queue with a webhook URL:
const { request_id } = await fal.queue.submit(
"fal-ai/bytedance/omnihuman/v1.5",
{
input: {
image_url: "https://example.com/portrait.png",
audio_url: "https://example.com/audio.mp3",
},
webhookUrl: "https://your-app.com/api/fal/webhook",
}
);
Poll status manually when webhooks are not available:
const status = await fal.queue.status("fal-ai/bytedance/omnihuman/v1.5", {
requestId: request_id,
logs: true,
});
// status.status: "IN_QUEUE" | "IN_PROGRESS" | "COMPLETED"
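Putting these together, a minimal polling loop might look like the sketch below. waitForResult is a hypothetical helper and the backoff values are arbitrary starting points; fal.queue.result fetches the completed output.

```javascript
// Poll with exponential backoff until the job completes, then fetch the output.
async function waitForResult(requestId) {
  const endpoint = "fal-ai/bytedance/omnihuman/v1.5";
  let delayMs = 2000;
  for (;;) {
    const status = await fal.queue.status(endpoint, { requestId, logs: true });
    if (status.status === "COMPLETED") break;
    await new Promise((resolve) => setTimeout(resolve, delayMs));
    delayMs = Math.min(delayMs * 2, 30_000); // cap the backoff at 30s
  }
  return fal.queue.result(endpoint, { requestId });
}
```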
Webhook payload structure (POST to your endpoint):
{
"request_id": "764cabcf-b745-4b3e-ae38-1200304cf45b",
"gateway_request_id": "764cabcf-b745-4b3e-ae38-1200304cf45b",
"status": "OK",
"payload": {
"video": { "url": "https://..." },
"duration": 15.3
}
}
On error, status is "ERROR" with error details in the error field. Webhooks retry up to 10 times over 2 hours if delivery fails. Verify webhook signatures using the X-Fal-Webhook-Signature header against fal's JWKS endpoint. See the webhooks documentation for signature verification details.
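As a rough sketch of the receiving side, the Express handler below acknowledges quickly and defers heavy work. enqueueDownload is a placeholder for your own job queue, and signature verification (covered in the webhooks documentation) is omitted for brevity.

```javascript
import express from "express";

const app = express();
app.use(express.json());

// Hypothetical hook into your own job queue; replace with real persistence.
function enqueueDownload(requestId, videoUrl) {
  console.log(`queued download for ${requestId}: ${videoUrl}`);
}

app.post("/api/fal/webhook", (req, res) => {
  const { request_id, status, payload, error } = req.body;
  if (status === "OK") {
    // payload.video.url expires in ~24 hours, so start the download promptly.
    enqueueDownload(request_id, payload.video.url);
  } else {
    console.error(`Generation ${request_id} failed:`, error);
  }
  res.sendStatus(200); // respond fast; non-2xx responses trigger fal's retries
});

app.listen(3000);
```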
Error Handling
The API returns structured errors. Common failure scenarios:
- Invalid/inaccessible URLs: Ensure image and audio URLs are publicly accessible without authentication
- Duration exceeded: Audio over 30s with 1080p or over 60s with 720p returns a 422 error
- Rate limiting: Implement exponential backoff; check response headers for retry timing
Error responses follow this structure:
{
"status": "ERROR",
"error": "Invalid status code: 422",
"payload": {
"detail": "Audio duration exceeds maximum for selected resolution"
}
}
Validate inputs client-side before submission to provide immediate user feedback and reduce failed requests.
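A pre-flight check along those lines might look like the sketch below; MAX_AUDIO_SECONDS and validateRequest are hypothetical names mirroring the limits documented above.

```javascript
// Duration caps per resolution, per the constraints table above.
const MAX_AUDIO_SECONDS = { "720p": 60, "1080p": 30 };

// Returns an error message for immediate user feedback, or null if valid.
function validateRequest({ audioDurationSec, resolution = "1080p" }) {
  const limit = MAX_AUDIO_SECONDS[resolution];
  if (limit === undefined) return `Unsupported resolution: ${resolution}`;
  if (audioDurationSec > limit) {
    return `Audio is ${audioDurationSec}s; ${resolution} allows at most ${limit}s`;
  }
  return null;
}
```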
Pricing
Omnihuman 1.5 charges $0.16 per second of generated video. The duration field in successful responses indicates the billable length.
| Video Length | Cost |
|---|---|
| 10 seconds | $1.60 |
| 30 seconds | $4.80 |
| 60 seconds | $9.60 |
Implement cost estimation in user-facing applications by calculating audio_duration * 0.16 before submission.
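For example, a minimal estimator (estimateCostUsd is a hypothetical name; the $0.16/s rate comes from the pricing above):

```javascript
const PRICE_PER_SECOND = 0.16; // Omnihuman 1.5 rate on fal

// Estimate from the input audio length; the final bill is based on the
// duration field returned with the generated video.
function estimateCostUsd(audioDurationSec) {
  return audioDurationSec * PRICE_PER_SECOND;
}

estimateCostUsd(30); // => 4.8, matching the pricing table
```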
Production Checklist
Security:
- Store API keys in environment variables, never in client-side code
- For browser applications, proxy requests through your backend
- Verify webhook signatures to prevent spoofed callbacks
Input validation:
- Image: publicly accessible URL
- Audio: publicly accessible URL, duration within resolution limits
- Supported formats: JPEG/PNG for images; MP3/WAV/M4A for audio
Reliability:
- Download generated videos immediately; URLs expire after approximately 24 hours
- Implement retry logic with exponential backoff for transient failures
- Use webhooks for production workloads instead of long-polling
Monitoring:
- Track the duration field for cost reconciliation
- Log request_id for debugging and support requests
- Monitor webhook delivery success rates
File Handling
Input URLs must be publicly accessible. For files that require authentication or are stored locally, use fal's storage API:
const file = new File([audioBuffer], "audio.mp3", { type: "audio/mpeg" });
const url = await fal.storage.upload(file);
// Use returned URL in your request
The client libraries also accept Base64 data URIs directly, though this impacts performance for large files.
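In a Node.js backend, the same upload can start from a file on disk. The sketch below assumes Node 20+, where File is available as a global:

```javascript
import { readFile } from "node:fs/promises";
import { fal } from "@fal-ai/client";

// Read a local file and hand it to fal storage (Node 20+ provides File globally).
const audioBuffer = await readFile("./audio.mp3");
const file = new File([audioBuffer], "audio.mp3", { type: "audio/mpeg" });
const url = await fal.storage.upload(file);
// Pass `url` as audio_url in your generation request.
```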
Scaling Considerations
The fal platform handles infrastructure scaling automatically. For high-volume applications:
- Submit requests concurrently; the queue system manages parallelization
- Use webhooks rather than polling to reduce connection overhead
- Implement request queuing on your side if you need to throttle submission rates (see the sketch after this list)
- Consider fal Serverless for custom deployment requirements
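One way to throttle on your side is a small worker pool. The sketch below is a minimal example under that assumption; submitAll is a hypothetical helper that keeps at most limit submissions in flight.

```javascript
// Submit jobs with bounded concurrency; each job is an `input` object.
async function submitAll(jobs, limit = 5) {
  const results = [];
  let index = 0;
  async function worker() {
    while (index < jobs.length) {
      const job = jobs[index++]; // safe: JS runs these workers on one thread
      results.push(
        await fal.queue.submit("fal-ai/bytedance/omnihuman/v1.5", {
          input: job,
          webhookUrl: "https://your-app.com/api/fal/webhook",
        })
      );
    }
  }
  await Promise.all(Array.from({ length: limit }, worker));
  return results;
}
```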
Start with the subscribe method for development and testing, then migrate to queue submission with webhooks for production deployments where reliability and scale matter.
Common Integration Patterns
Customer service avatars: Generate video responses from support scripts. Pre-render common responses during off-peak hours and serve cached videos for frequent queries. For dynamic responses, the queue-webhook pattern ensures your application remains responsive while generation completes.
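A cache for pre-rendered responses can be as simple as hashing the portrait/audio pair. The sketch below uses an in-memory Map and a hypothetical getOrGenerate helper; a real deployment would persist the downloaded videos rather than the expiring URLs.

```javascript
import { createHash } from "node:crypto";
import { fal } from "@fal-ai/client";

const videoCache = new Map(); // swap for durable storage in production

function cacheKey(imageUrl, audioUrl) {
  return createHash("sha256").update(`${imageUrl}|${audioUrl}`).digest("hex");
}

async function getOrGenerate(imageUrl, audioUrl) {
  const key = cacheKey(imageUrl, audioUrl);
  if (!videoCache.has(key)) {
    const result = await fal.subscribe("fal-ai/bytedance/omnihuman/v1.5", {
      input: { image_url: imageUrl, audio_url: audioUrl, resolution: "720p" },
    });
    // Recent @fal-ai/client versions nest output under result.data.
    videoCache.set(key, (result.data ?? result).video.url);
  }
  return videoCache.get(key);
}
```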
Content creation tools: Allow users to upload portraits and record audio directly. Use the storage API to handle user uploads, validate audio duration client-side before submission, and display cost estimates based on audio length. Implement progress indicators using queue status polling for better user experience.
Interactive experiences: For real-time applications, 720p with turbo mode provides the fastest generation. Pre-generate avatar videos for anticipated interactions where possible, and use webhooks to update your application state when generation completes.
Debugging Common Issues
No mouth movement in output: Usually indicates audio encoding issues. Ensure audio files use standard encoding (MP3 at 128kbps or higher, WAV at 16-bit PCM). Re-encode problematic files before submission.
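If you shell out to ffmpeg from Node, a re-encode along those lines might look like this sketch (paths are placeholders; requires ffmpeg on the PATH):

```javascript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);

// Re-encode to 44.1 kHz, 128 kbps MP3 before upload.
await run("ffmpeg", [
  "-y", // overwrite the output file if it exists
  "-i", "input.m4a",
  "-ar", "44100",
  "-b:a", "128k",
  "output.mp3",
]);
```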
Inconsistent quality: Image quality directly affects output. Use well-lit portraits with clear facial features and neutral expressions. Avoid heavily compressed images or those with artifacts.
Webhook not received: Verify your endpoint is publicly accessible and returns a 200 status code promptly. Check that your server can handle POST requests at the webhook URL. Review fal's webhook retry behavior if deliveries are delayed.
References
1. Peebles, William, and Saining Xie. "Scalable Diffusion Models with Transformers." Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023. https://arxiv.org/abs/2212.09748
2. Jiang, Jianwen, et al. "OmniHuman-1.5: Instilling an Active Mind in Avatars via Cognitive Simulation." arXiv preprint, 2025. https://arxiv.org/abs/2508.19209
