
Generate speech from text prompts in different voices using the Kling TTS model, which leverages advanced AI techniques to produce high-quality speech.

Extract seamless tiling textures with PBR attribute maps from images

Accelerated image generation with Ideogram V2 Turbo. Create high-quality visuals, posters, and logos with enhanced speed while maintaining Ideogram's signature quality.

Generate synced sounds for any video and return the new soundtrack (like MMAudio)

State-of-the-art open-source model in aesthetic quality

FLUX.1 [schnell] Redux is a high-performance endpoint for the FLUX.1 [schnell] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.

MultiTalk model generates a talking avatar video from an image and audio file. The avatar lip-syncs to the provided audio with natural facial expressions.

Whether you're working on memes, videos, games, or AI agents, Chatterbox brings your content to life. Use the first TTS model from Resemble AI.

Create high-fidelity video with audio from text with LTX-2 Pro.

Transform your photos into vibrant, cool cartoons with bold outlines and rich colors.

LTX-2.3 is a high-quality, fast AI video model available in Pro and Fast variants for text-to-video, image-to-video, and audio-to-video.

Wan-2.1 Pro is a premium image-to-video model that generates high-quality 1080p videos at 30fps with up to 6 seconds duration, delivering exceptional visual quality and motion diversity from images

Dia directly generates realistic dialogue from transcripts. Audio conditioning enables emotion control. Produces natural nonverbals like laughter and throat clearing.

Generates the same object from different angles (azimuth/elevation)

Generate video with audio from text using LTX-2

Makes images more photorealistic and natural

Generate video clips from multiple image references using Kling 1.6 (standard)

Bring color to old or new black-and-white photos with DDColor.

A fal.ai endpoint that stitches an ordered list of images into an MP4 video by holding each image for a specified number of frames at a configurable frame rate
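
The hold-per-image behavior described above implies simple timing arithmetic: each image occupies a fixed number of consecutive frames, so the clip length is the image count times the hold length, divided by the frame rate. The sketch below illustrates only that arithmetic. The function name `frame_schedule` and its parameter names are hypothetical and are not taken from the endpoint's actual API.

```python
from typing import List, Tuple


def frame_schedule(
    num_images: int, frames_per_image: int, fps: float
) -> Tuple[int, float, List[int]]:
    """Compute timing for a slideshow-style video.

    Hypothetical helper (not part of any fal.ai client): returns the total
    frame count, the duration in seconds, and a list mapping each output
    frame to the index of the image shown in that frame.
    """
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    if frames_per_image <= 0:
        raise ValueError("frames_per_image must be positive")
    if fps <= 0:
        raise ValueError("fps must be positive")

    # Each image index is repeated frames_per_image times, in order.
    schedule = [i for i in range(num_images) for _ in range(frames_per_image)]
    total_frames = len(schedule)
    duration_seconds = total_frames / fps
    return total_frames, duration_seconds, schedule


if __name__ == "__main__":
    total, duration, schedule = frame_schedule(
        num_images=3, frames_per_image=12, fps=24
    )
    print(total, duration)  # 36 frames, 1.5 seconds
```

Under these assumptions, three images held for 12 frames each at 24 fps give a 36-frame, 1.5-second clip.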

Generate video prompts using a variety of techniques including camera direction, style, pacing, special effects and more.

Use React-1 from SyncLabs to refine human emotions and do realistic lip-sync without losing details!

Stable Diffusion v1.5

Ovi can generate videos with audio from image and text inputs.

Qwen-Image (Image-to-Image) transforms and edits input images with high fidelity, enabling precise style transfer, enhancement, and creative modification.

Generate 3D human motions from text via the generation interface of Hunyuan Motion!

Nemotron-Labs-Diffusion-VLM-8B is the vision-language extension of the Nemotron-Labs-Diffusion family.

Generate high-quality video clips with different effects using PixVerse v5

Leffa Virtual TryOn is a high-quality, image-based try-on endpoint that can be used for commercial try-on.