Model Gallery

See all available model APIs provided by fal.ai
Image to Video

Veo 2

Image-to-video is now available at fal! Veo creates videos with realistic motion and high-quality output.

Search Results

31 models found

fal-ai/latentsync
video-to-video

LatentSync is a video-to-video model that generates lip sync animations from audio using advanced algorithms for high-quality synchronization.

new
animation
lip sync
fal-ai/csm-1b
text-to-audio

CSM (Conversational Speech Model) is a speech generation model from Sesame that generates RVQ audio codes from text and audio inputs.

new
conversational
text to speech
fal-ai/diffrhythm
text-to-audio

DiffRhythm is a blazing-fast model for transforming lyrics into full songs. It can generate a full song in less than 30 seconds.

new
music
fal-ai/elevenlabs/tts/multilingual-v2
text-to-audio

Generate multilingual text-to-speech audio using ElevenLabs TTS Multilingual v2.

new
audio
fal-ai/elevenlabs/sound-effects
text-to-audio

Generate sound effects using ElevenLabs' advanced sound effects model.

new
sound
fal-ai/elevenlabs/audio-isolation
audio-to-audio

Isolate audio tracks using ElevenLabs' advanced audio isolation technology.

new
audio
fal-ai/elevenlabs/tts/turbo-v2.5
text-to-speech

Generate high-speed text-to-speech audio using ElevenLabs TTS Turbo v2.5.

new
audio
fal-ai/kokoro/french
text-to-audio

An expressive and natural French text-to-speech model for both European and Canadian French.

speech
fal-ai/kokoro/hindi
text-to-audio

A fast and expressive Hindi text-to-speech model with clear pronunciation and accurate intonation.

speech
fal-ai/kokoro/mandarin-chinese
text-to-audio

A highly efficient Mandarin Chinese text-to-speech model that captures natural tones and prosody.

speech
fal-ai/kokoro/british-english
text-to-audio

A high-quality British English text-to-speech model offering natural and expressive voice synthesis.

speech
fal-ai/kokoro/italian
text-to-audio

A high-quality Italian text-to-speech model delivering smooth and expressive speech synthesis.

speech
fal-ai/kokoro/spanish
text-to-audio

A natural-sounding Spanish text-to-speech model optimized for Latin American and European Spanish.

speech
fal-ai/kokoro/american-english
text-to-audio

Kokoro is a lightweight text-to-speech model that delivers quality comparable to larger models while being significantly faster and more cost-efficient.

speech
fal-ai/kokoro/brazilian-portuguese
text-to-audio

A natural and expressive Brazilian Portuguese text-to-speech model optimized for clarity and fluency.

speech
fal-ai/zonos
text-to-audio

Clone any person's voice and generate speech in that voice using Zonos voice cloning.

voice cloning
fal-ai/kokoro/japanese
text-to-audio

A fast and natural-sounding Japanese text-to-speech model optimized for smooth pronunciation.

speech
fal-ai/yue
text-to-audio

YuE is a groundbreaking series of open-source foundation models designed for music generation, specifically for transforming lyrics into full songs.

music
fal-ai/ffmpeg-api/waveform
json

Get waveform data from audio files using FFmpeg API.

ffmpeg
fal-ai/ffmpeg-api/metadata
json

Get encoding metadata from video and audio files using FFmpeg API.

ffmpeg
fal-ai/sync-lipsync
video-to-video

Generate realistic lip-sync animations from audio using advanced algorithms for high-quality synchronization.

animation
lip sync
fal-ai/auto-caption
video-to-video

Automatically generates text captions for your videos from the audio, styled according to your text colour and font specifications.

captioning
video
fal-ai/mmaudio-v2/text-to-audio
text-to-audio

MMAudio generates synchronized audio given text inputs. It can generate sounds described by a prompt.

audio
fast
fal-ai/sadtalker/reference
image-to-video

Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

animation
fal-ai/playai/tts/v3
text-to-speech

Blazing-fast text-to-speech. Generate audio with improved emotional tones and extensive multilingual support. Ideal for high-volume processing and efficient workflows.

fal-ai/minimax-music
text-to-audio

Generate music from text prompts using the MiniMax model, which leverages advanced AI techniques to create high-quality, diverse musical compositions.

music
fal-ai/mmaudio-v2
video-to-video

MMAudio generates synchronized audio given video and/or text inputs. It can be combined with video models to get videos with audio.

ai video
fast
fal-ai/f5-tts
text-to-audio

F5 TTS

speech
fal-ai/sadtalker
image-to-video

Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

animation
fal-ai/musetalk
image-to-video

MuseTalk is a real-time, high-quality, audio-driven lip-syncing model. Use MuseTalk to animate a face with your own audio.

animation
lip sync
real-time
fal-ai/stable-audio
text-to-audio

Open-source text-to-audio model.

music