fal-ai/ai-avatar/single-text

The MultiTalk model generates a talking avatar video from an image and text. It converts the text to speech automatically, then generates the avatar speaking with lip-sync.

Inference

Commercial use

Input

Image URL*

Accepted file types: jpg, jpeg, png, webp, gif, avif

Text Input*

Voice*

Prompt*
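The four required inputs above can be collected into a single request payload before submission. The following is a minimal sketch rather than the service's official client: the key names (`image_url`, `text_input`, `voice`, `prompt`) and the helper `build_payload` are assumptions chosen to mirror the field labels, and the extension check only handles URLs whose path ends in a file name.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# File types listed as accepted for the image input.
ACCEPTED_IMAGE_TYPES = {"jpg", "jpeg", "png", "webp", "gif", "avif"}


def build_payload(image_url: str, text_input: str, voice: str, prompt: str) -> dict:
    """Collect the four required inputs into one dict (key names are assumed)."""
    fields = {
        "image_url": image_url,
        "text_input": text_input,
        "voice": voice,
        "prompt": prompt,
    }

    # Every field is marked as required, so reject empty or blank values.
    missing = [name for name, value in fields.items() if not value.strip()]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")

    # Compare the URL's file extension with the accepted image types.
    suffix = PurePosixPath(urlparse(image_url).path).suffix.lower().lstrip(".")
    if suffix not in ACCEPTED_IMAGE_TYPES:
        raise ValueError(f"unsupported image type: {suffix or '(none)'}")

    return fields
```

Validating locally like this catches a blank field or an unsupported image type before a billable request is sent.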

Result

{
  "video": {
    "file_size": 797478,
    "content_type": "application/octet-stream",
    "url": "https://v3.fal.media/files/elephant/-huMN0zTaXmBr2CqzCMps_6c9dd31e1d9a4482877747a52a661a0a.mp4",
    "file_name": "6c9dd31e1d9a4482877747a52a661a0a.mp4"
  }
}
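The sample result above can be read with the standard library alone. The sketch below is illustrative: the helper `describe_video` is hypothetical, and it assumes `file_size` is a byte count. Because the sample reports the generic `application/octet-stream` content type, the media type is guessed from the file name instead, with the reported value as a fallback.

```python
import json
import mimetypes

# Sample response copied from the result shown above.
SAMPLE_RESULT = """
{
  "video": {
    "file_size": 797478,
    "content_type": "application/octet-stream",
    "url": "https://v3.fal.media/files/elephant/-huMN0zTaXmBr2CqzCMps_6c9dd31e1d9a4482877747a52a661a0a.mp4",
    "file_name": "6c9dd31e1d9a4482877747a52a661a0a.mp4"
  }
}
"""


def describe_video(result_json: str) -> dict:
    """Extract the download URL, size, and a usable media type from a result."""
    video = json.loads(result_json)["video"]

    # Guess the media type from the file extension; fall back to the
    # content type reported in the response if the extension is unknown.
    guessed_type, _ = mimetypes.guess_type(video["file_name"])

    return {
        "url": video["url"],
        "size_kib": round(video["file_size"] / 1024, 1),  # assumes bytes
        "media_type": guessed_type or video["content_type"],
    }


info = describe_video(SAMPLE_RESULT)
print(info["url"], info["size_kib"], info["media_type"])
```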

Each request costs $0.20 per second.

At 720p, the price is doubled ($0.40 per second).
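The pricing above reduces to simple arithmetic. The sketch below is a rough estimator, not a billing tool: the helper `estimate_cost` is hypothetical, and it assumes the billed seconds are known in advance (for example, the expected length of the output video), which the page does not specify.

```python
from decimal import Decimal

RATE_PER_SECOND = Decimal("0.2")  # USD per second, from the pricing note
MULTIPLIER_720P = 2               # the price is doubled at 720p


def estimate_cost(seconds: float, is_720p: bool = False) -> Decimal:
    """Estimate the request cost in USD for a given number of billed seconds."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")

    # Decimal avoids binary floating-point rounding in currency amounts.
    cost = RATE_PER_SECOND * Decimal(str(seconds))
    return cost * MULTIPLIER_720P if is_720p else cost


print(estimate_cost(5))                # 5 seconds at the base rate
print(estimate_cost(5, is_720p=True))  # 5 seconds at 720p
```

Under these assumptions, 5 billed seconds come to $1.00 at the base rate and $2.00 at 720p.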
