
Generate realistic images.

Run any audio-capable LLM with fal. Process audio files (transcription, analysis, and understanding) using Gemini (Google) models. Supports wav, mp3, aiff, aac, ogg, flac, and m4a. Powered by OpenRouter.

Kandinsky 5.0 Pro is a diffusion model for fast, high-quality image-to-video generation.

A high-quality British English text-to-speech model offering natural and expressive voice synthesis.

Text-to-Image endpoint for Qwen-Image-Max. Qwen Image Max improves upon the Qwen Image Plus series by enhancing the realism and naturalness of images.

Replace or dub audio on an existing video with fast audio-only lip-sync.

Wan 2.2's 14B model generates high-resolution, photorealistic images with powerful prompt understanding and fine-grained visual detail.

Create high-quality images with accurate text rendering and rich, knowledge-grounded detail. Supports editing, style transfer, and consistent characters across multiple images.

Generate video clips from your prompts using Kling 1.5 (pro).

LTX-2.3 is a high-quality, fast AI video model available in Pro and Fast variants for text-to-video, image-to-video, and audio-to-video.

Create illusions conditioned on an image.

SAM 3.1 comes with Object Multiplex, a shared-memory approach to joint multi-object tracking that delivers faster speeds when tracking larger numbers of objects.

Generate high-quality video clips from text and image prompts using PixVerse v4.5.

Generate video with audio from images using LTX-2.3.

Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks.
![Super fast text-to-image endpoint for the FLUX.1 Kontext [dev] model with LoRA support, enabling rapid and high-quality image generation using pre-trained LoRA adaptations for personalization, specific styles, brand identities, and product-specific outputs.](https://refinery.fal.media/url/https%3A%2F%2Fstorage.googleapis.com%2Ffal_cdn%2Ffal%2FTraining-4.jpg/tr:w-1920,q-80/Training-4.webp)
Super-fast text-to-image endpoint for the FLUX.1 Kontext [dev] model with LoRA support, enabling rapid, high-quality image generation using pre-trained LoRA adaptations for personalization, specific styles, brand identities, and product-specific outputs.

A blazing fast FLUX dev LoRA trainer for subjects and styles.

Apply artistic styles like impressionism, cubism, or surrealism to your images.

Image editing endpoint for Qwen-Image-Max. Qwen Image Max improves upon the Qwen Image Plus series by enhancing the realism and naturalness of images.

Modify a face to look younger or older while keeping identity realistic.

Sana Sprint is a text-to-image model capable of generating 4K images with exceptional speed.

The latest object-erasing model from Black Forest Labs. Remove unwanted objects and text from images.
![FLUX1.1 [pro] ultra fine-tuned is the newest version of FLUX1.1 [pro] with a fine-tuned LoRA, maintaining professional-grade image quality while delivering up to 2K resolution with improved photo realism.](https://refinery.fal.media/url/https%3A%2F%2Fstorage.googleapis.com%2Ffalserverless%2Fgallery%2Fflux-pro-11-ultra.webp/tr:w-1920,q-80/flux-pro-11-ultra.webp)
FLUX1.1 [pro] ultra fine-tuned is the newest version of FLUX1.1 [pro] with a fine-tuned LoRA, maintaining professional-grade image quality while delivering up to 2K resolution with improved photorealism.
![Text-to-image generation with FLUX.2 [klein] 9B from Black Forest Labs and custom LoRA.](https://refinery.fal.media/url/https%3A%2F%2Fv3b.fal.media%2Ffiles%2Fb%2F0a928e3b%2FsIC-Ne9BMwZZtBvR3FwKN_9a724704a550471a9df59999e9e1017f.jpg/tr:w-1920,q-80/sIC-Ne9BMwZZtBvR3FwKN_9a724704a550471a9df59999e9e1017f.webp)
Text-to-image generation with FLUX.2 [klein] 9B from Black Forest Labs and custom LoRA.

Generate images from your prompts using Luma Photon Flash. Photon Flash is the most creative, personalizable, and intelligent visual model for creatives, bringing a step-function change in the cost of high-quality image generation.

Generate film-grade videos from text prompts with native audio, up to 1080p and 15 seconds, using PixVerse C1.

Nucleus-Image is a text-to-image generation model built on a sparse mixture-of-experts (MoE) diffusion transformer architecture.
![Fine-tune FLUX.2 [dev] from Black Forest Labs with custom datasets. Create specialized LoRA adaptations for specific styles and domains.](https://refinery.fal.media/url/https%3A%2F%2Fv3b.fal.media%2Ffiles%2Fb%2Ftiger%2FnYv87OHdt503yjlNUk1P3_2551388f5f4e4537b67e8ed436333bca.jpg/tr:w-1920,q-80/nYv87OHdt503yjlNUk1P3_2551388f5f4e4537b67e8ed436333bca.webp)
Fine-tune FLUX.2 [dev] from Black Forest Labs with custom datasets. Create specialized LoRA adaptations for specific styles and domains.