fal-ai/ai-avatar/multi-text

The MultiTalk model generates a multi-person conversation video from an image and text inputs. It converts the text to speech for each person and produces a realistic conversation scene.

Inference

Commercial use


Input

Image URL*

Provide a URL or upload an image file. Accepted file types: jpg, jpeg, png, webp, gif, avif.

First Text Input*

Second Text Input*

Prompt*
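The four required inputs above can be gathered into a single request payload before submission. The following is a minimal sketch, not the service's documented schema: the key names (`image_url`, `first_text_input`, `second_text_input`, `prompt`) are hypothetical snake_case renderings of the form labels, and the helper `build_arguments` is an illustrative name. The extension check assumes the image URL path ends in a file extension, which is not guaranteed for every URL.

```python
from urllib.parse import urlparse

# Accepted image types, as listed for the Image URL input.
ACCEPTED_EXTENSIONS = {"jpg", "jpeg", "png", "webp", "gif", "avif"}


def build_arguments(image_url, first_text, second_text, prompt):
    """Assemble the four required inputs into one payload dict.

    The key names are assumptions derived from the form labels,
    not a confirmed API schema.
    """
    fields = {
        "image_url": image_url,
        "first_text_input": first_text,
        "second_text_input": second_text,
        "prompt": prompt,
    }

    # Every field marked with * on the form is required.
    missing = [name for name, value in fields.items()
               if not value or not value.strip()]
    if missing:
        raise ValueError(f"missing required inputs: {', '.join(missing)}")

    # Inspect only the URL path so a query string does not hide the extension.
    path = urlparse(image_url).path
    extension = path.rsplit(".", 1)[-1].lower() if "." in path else ""
    if extension not in ACCEPTED_EXTENSIONS:
        raise ValueError(f"unsupported image type: {extension or 'none'}")

    return fields


if __name__ == "__main__":
    payload = build_arguments(
        "https://example.com/two-people.png",  # placeholder URL
        "Hi, how was your weekend?",
        "It was great, thanks for asking.",
        "Two people chatting in a cafe",
    )
    print(payload)
```

Validating locally before submitting avoids paying for a request that would fail on a missing field or an unsupported image type.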


Result


{
  "video": {
    "content_type": "application/octet-stream",
    "file_size": 352679,
    "file_name": "30b76b90c2164f9a926527497c20832b.mp4",
    "url": "https://v3.fal.media/files/zebra/lKMkUvzCqKn-gHC0vyUPP_30b76b90c2164f9a926527497c20832b.mp4"
  }
}
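A short sketch of reading this result in Python follows. It uses the sample response above verbatim. The helper name `summarize_video` is illustrative, and treating `file_size` as a byte count is an assumption. Note that the sample `content_type` is the generic `application/octet-stream`, so the sketch infers the container from the file name instead.

```python
import json

# The sample response shown above, copied verbatim.
RESULT_JSON = """
{
  "video": {
    "content_type": "application/octet-stream",
    "file_size": 352679,
    "file_name": "30b76b90c2164f9a926527497c20832b.mp4",
    "url": "https://v3.fal.media/files/zebra/lKMkUvzCqKn-gHC0vyUPP_30b76b90c2164f9a926527497c20832b.mp4"
  }
}
"""


def summarize_video(result):
    """Pull the fields a caller typically needs out of the result dict."""
    video = result["video"]
    return {
        "url": video["url"],
        "file_name": video["file_name"],
        # Assumption: file_size is expressed in bytes.
        "file_size_bytes": video["file_size"],
        # content_type is generic here, so check the file name instead.
        "is_mp4": video["file_name"].lower().endswith(".mp4"),
    }


if __name__ == "__main__":
    summary = summarize_video(json.loads(RESULT_JSON))
    print(summary["url"])
```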

Your request will cost $0.20 per second.

At 720p, the price is doubled.
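The pricing rule can be worked through as a small calculation. This is a sketch under stated assumptions: the page does not say which duration is billed, so `seconds` stands for whatever billable duration the service reports, and the function name `estimate_cost` and the `is_720p` flag are illustrative. `Decimal` is used so currency arithmetic stays exact.

```python
from decimal import Decimal

# Rates taken from the pricing note: $0.20 per second, doubled at 720p.
BASE_RATE_USD_PER_SECOND = Decimal("0.20")
MULTIPLIER_720P = 2


def estimate_cost(seconds, is_720p=False):
    """Estimate the request cost in USD for a billable duration in seconds."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    rate = BASE_RATE_USD_PER_SECOND * (MULTIPLIER_720P if is_720p else 1)
    # Convert via str so a float such as 2.5 becomes an exact Decimal.
    return rate * Decimal(str(seconds))


if __name__ == "__main__":
    print(estimate_cost(5))                # base rate
    print(estimate_cost(5, is_720p=True))  # doubled rate at 720p
```

Under these assumptions, a 5-second request costs $1.00 at the base rate and $2.00 at 720p.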
