Model Gallery

See all available model APIs provided by fal.ai

Kling 2.0 Master

New Kling models are here! You can generate video clips from your prompts or images using Kling 2.0 with exceptional quality, realism, and control.

Search Results

40 models found

fal-ai/stable-audio
text-to-audio

Open-source text-to-audio model.

music

fal-ai/kling-video/lipsync/audio-to-video
text-to-video

Kling LipSync is an audio-to-video model that generates realistic lip movements from audio input.

audio to video
lipsync

fal-ai/zonos
text-to-audio

Clone any person's voice and generate any speech in that voice using Zonos' voice cloning.

voice cloning

fal-ai/dia-tts/voice-clone
audio-to-audio

Clone dialog voices from an audio sample and generate dialogs from text prompts using Dia TTS, which leverages advanced AI techniques to create high-quality text-to-speech.

new
speech

fal-ai/sync-lipsync/v2
video-to-video

Generate realistic lipsync animations from audio with the Sync Lipsync 2.0 model, which uses advanced algorithms for high-quality synchronization.

new
animation
lip sync

fal-ai/minimax-tts/voice-clone
text-to-speech

Clone a voice from an audio sample and generate speech from text prompts using the MiniMax model, which leverages advanced AI techniques to create high-quality text-to-speech.

new
speech

fal-ai/playai/tts/v3
text-to-speech

Blazing-fast text-to-speech. Generate audio with improved emotional tones and extensive multilingual support. Ideal for high-volume processing and efficient workflows.

fal-ai/playai/tts/dialog
text-to-audio

Generate natural-sounding multi-speaker dialogues and audio. Perfect for expressive outputs, storytelling, games, animations, and interactive media.

audio

fal-ai/dia-tts
text-to-speech

Dia directly generates realistic dialogue from transcripts. Audio conditioning enables emotion control. Produces natural nonverbals like laughter and throat clearing.

new
text-to-speech

fal-ai/smart-turn
speech-to-text

An open-source, community-driven, native audio turn detection model by Pipecat AI.

new

fal-ai/elevenlabs/tts/turbo-v2.5
text-to-speech

Generate high-speed text-to-speech audio using ElevenLabs TTS Turbo v2.5.

audio

fal-ai/elevenlabs/tts/multilingual-v2
text-to-audio

Generate multilingual text-to-speech audio using ElevenLabs TTS Multilingual v2.

audio

fal-ai/ffmpeg-api/metadata
json

Get encoding metadata from video and audio files using the FFmpeg API.

ffmpeg

fal-ai/ffmpeg-api/waveform
json

Get waveform data from audio files using the FFmpeg API.

ffmpeg

fal-ai/sync-lipsync
video-to-video

Generate realistic lipsync animations from audio using advanced algorithms for high-quality synchronization.

animation
lip sync

fal-ai/sadtalker/reference
image-to-video

Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

animation

fal-ai/musetalk
image-to-video

MuseTalk is a real-time, high-quality, audio-driven lip-syncing model. Use MuseTalk to animate a face with your own audio.

animation
lip sync
real-time

fal-ai/sadtalker
image-to-video

Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

animation

fal-ai/latentsync
video-to-video

LatentSync is a video-to-video model that generates lip sync animations from audio using advanced algorithms for high-quality synchronization.

animation
lip sync

fal-ai/auto-caption
video-to-video

Automatically generates text captions for your videos from the audio, following your text colour and font specifications.

captioning
video

cassetteai/music-generator
text-to-audio

CassetteAI’s model generates a 30-second sample in under 2 seconds and a full 3-minute track in under 10 seconds. At 44.1 kHz stereo audio, expect a level of professional consistency with no breaks, no squeaks, and no random interruptions in your creations.

music
cassetteai

fal-ai/csm-1b
text-to-audio

CSM (Conversational Speech Model) is a speech generation model from Sesame that generates RVQ audio codes from text and audio inputs.

conversational
text to speech

cassetteai/sound-effects-generator
text-to-audio

Create stunningly realistic sound effects in seconds. CassetteAI's Sound Effects Model generates high-quality SFX up to 30 seconds long in just 1 second of processing time.

new
sound
sfx
sound-effects
cassetteai

fal-ai/diffrhythm
text-to-audio

DiffRhythm is a fast model for transforming lyrics into full songs. It can generate a full song in less than 30 seconds.

music

fal-ai/elevenlabs/sound-effects
text-to-audio

Generate sound effects using ElevenLabs' advanced sound effects model.

sound

fal-ai/kokoro/mandarin-chinese
text-to-audio

A highly efficient Mandarin Chinese text-to-speech model that captures natural tones and prosody.

speech

fal-ai/kokoro/hindi
text-to-audio

A fast and expressive Hindi text-to-speech model with clear pronunciation and accurate intonation.

speech

fal-ai/kokoro/american-english
text-to-audio

Kokoro is a lightweight text-to-speech model that delivers comparable quality to larger models while being significantly faster and more cost-efficient.

speech

fal-ai/kokoro/brazilian-portuguese
text-to-audio

A natural and expressive Brazilian Portuguese text-to-speech model optimized for clarity and fluency.

speech

fal-ai/kokoro/japanese
text-to-audio

A fast and natural-sounding Japanese text-to-speech model optimized for smooth pronunciation.

speech

fal-ai/kokoro/italian
text-to-audio

A high-quality Italian text-to-speech model delivering smooth and expressive speech synthesis.

speech

fal-ai/kokoro/french
text-to-audio

An expressive and natural French text-to-speech model for both European and Canadian French.

speech

fal-ai/kokoro/spanish
text-to-audio

A natural-sounding Spanish text-to-speech model optimized for Latin American and European Spanish.

speech

fal-ai/kokoro/british-english
text-to-audio

A high-quality British English text-to-speech model offering natural and expressive voice synthesis.

speech

fal-ai/yue
text-to-audio

YuE is a groundbreaking series of open-source foundation models designed for music generation, specifically for transforming lyrics into full songs.

music

fal-ai/minimax-music
text-to-audio

Generate music from text prompts using the MiniMax model, which leverages advanced AI techniques to create high-quality, diverse musical compositions.

music

fal-ai/f5-tts
text-to-audio

F5 TTS

speech