
ByteDance's most advanced text-to-video model. Cinematic output with native audio, multi-shot editing, real-world physics, and director-level camera control.

Image editing endpoint for the fast Lite version of Seedream 5.0, supporting high quality intelligent image editing with multiple inputs.

Generate videos from your image prompts using Veo 3.1 fast.

Gemini 3 Pro Image (a.k.a Nano Banana Pro) is Google's state-of-the-art high-fidelity image generation and editing model

bilateral reference framework (BiRefNet) for high-resolution dichotomous image segmentation (DIS)

A new-generation image creation model ByteDance, Seedream 4.5 integrates image generation and image editing capabilities into a single, unified architecture.

Z-Image Turbo is a super fast text-to-image model of 6B parameters developed by Tongyi-MAI.

Use the powerful and accurate topaz image enhancer to enhance your images.

A new-generation image creation model ByteDance, Seedream 4.0 integrates image generation and image editing capabilities into a single, unified architecture.
![Text-to-image generation with FLUX.2 [klein] 9B from Black Forest Labs. Enhanced realism, crisper text generation, and native editing capabilities.](https://refinery.fal.media/url/https%3A%2F%2Fv3b.fal.media%2Ffiles%2Fb%2F0a8a7f3c%2F90FKDpwtSCZTqOu0jUI-V_64c1a6ec0f9343908d9efa61b7f2444b.jpg/tr:w-1920,q-80/90FKDpwtSCZTqOu0jUI-V_64c1a6ec0f9343908d9efa61b7f2444b.webp)
Text-to-image generation with FLUX.2 [klein] 9B from Black Forest Labs. Enhanced realism, crisper text generation, and native editing capabilities.

Veo 3.1 is the latest state-of-the art video generation model from Google DeepMind

ByteDance's most advanced image-to-video model, fast tier. Lower latency and cost with synchronized audio, start and end frame control, and motion prompts.

Generate videos with audio with Seedance 1.5 (supports start & end frame)

Run any LLM with fal. Access Claude (Anthropic), ChatGPT / GPT-5 / GPT-4o (OpenAI), Gemini (Google), Grok (xAI), DeepSeek, Llama (Meta), Qwen (Alibaba), Mistral, and 200+ more models through a single API. Supports reasoning, structured output, and streaming. Powered by OpenRouter.

ByteDance's most advanced reference-to-video model, fast tier. Lower latency and cost with up to 9 images, 3 videos, and 3 audio clips as inputs.

GPT Image 1.5 generates high-fidelity images with strong prompt adherence, preserving composition, lighting, and fine-grained detail.

Generate high-quality images, posters, and logos with Ideogram V3. Features exceptional typography handling and realistic outputs optimized for commercial and creative use.

OpenAI-compatible chat completions API. Drop-in replacement for the OpenAI API — use any OpenAI SDK or client to access Claude, Gemini, Grok, DeepSeek, Llama, Qwen, Mistral, and all OpenAI models (GPT-5, GPT-4o, o3) through fal. Powered by OpenRouter.

Kling 3.0 Pro: Top-tier text-to-video with cinematic visuals, fluid motion, and native audio generation, with multi-shot support.

Clarity upscaler for upscaling images with high very fidelity.

bilateral reference framework (BiRefNet) for high-resolution dichotomous image segmentation (DIS)
![FLUX.1 Image-to-Image is a high-performance endpoint for the FLUX.1 [dev] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.](https://refinery.fal.media/url/https%3A%2F%2Ffal.media%2Ffiles%2Fpanda%2FjJ3ZxKTV6ulhHV6GKi9nZ_68430b557ef64f68bf6f0fed0e78c6f9.jpg/tr:w-1920,q-80/jJ3ZxKTV6ulhHV6GKi9nZ_68430b557ef64f68bf6f0fed0e78c6f9.webp)
FLUX.1 Image-to-Image is a high-performance endpoint for the FLUX.1 [dev] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.

Alibaba's #1-ranked Happy Horse 1.0 — generate 1080p video with synchronized native audio and multilingual lip-sync from text prompts or images.

Generate text-to-speech audio using Eleven-v3 from ElevenLabs.

Google's famous original image generation and editing model, a.k.a Nano Banana

Text to Image endpoint for the fast Lite version of Seedream 5.0, supporting high quality intelligent text-to-image generation.

Generate a video by taking a start frame and an end frame, animating the transition between them while following text-driven style and scene guidance.

Run any Vision Language Model with fal. Analyze and understand images using Claude (Anthropic), GPT-5 / GPT-4o (OpenAI), Gemini (Google), Grok (xAI), Llama (Meta), Qwen, Pixtral (Mistral), and more. Send one or multiple images for captioning, analysis, OCR, or visual Q&A. Powered by OpenRouter.