
Bagel is a 7B parameter multimodal model from Bytedance-Seed that can generate both text and images.

Wan 2.2's 5B FastVideo model produces up to 5 seconds of 720p video at 24 FPS with fluid motion and strong prompt understanding.

Default parameters with automated optimizations and quality improvements.

Run any VLM (Video Language Model) with fal, powered by OpenRouter.

Use the latest PixVerse v5.6 model to turn your text into amazing videos.

Meshy-5 retexture applies new, high-quality textures to existing 3D models using either text prompts or reference images. It supports PBR material generation for realistic, production-ready results.

Structured prompt generation endpoint for Fibo, Bria's SOTA open-source model.

SAM 3.1 comes with Object Multiplex, a shared-memory approach for joint multi-object tracking that delivers faster speeds with a larger number of tracked objects.

Transform your photos into artistic masterpieces inspired by famous styles like Van Gogh's Starry Night or any artistic style you choose.

Generate video clips that maintain consistent, realistic facial features and identity across dynamic video content.

Vidu Start-End to Video generates smooth transition videos between specified start and end images.

Edit videos using plain language and Wan VACE

Vidu Q1 Image to Video generates high-quality 1080p videos with exceptional visual quality and motion diversity from a single image

Super fast endpoint for the FLUX.1 [schnell] model with subject input capabilities, enabling rapid and high-quality image generation for personalization, specific styles, brand identities, and product-specific outputs.

Image-to-image editing with FLUX.2 [klein] 4B from Black Forest Labs and custom LoRA. Precise modifications using natural language descriptions and hex color control.

The PixVerse Extend model extends your videos using high-quality video extension techniques.

Generate videos from prompts using CogVideoX-5B

MoonDreamNext is a multimodal vision-language model for captioning, gaze detection, bbox detection, point detection, and more.

FireRed Image Edit is FireRed's state-of-the-art open-source editing model, re-trained from Qwen Image Edit 2509.

Generate video with audio from images using LTX-2.3 Distilled

Marlin is a 2B video VLM tuned for the two questions developers actually want to ask of their videos: what is happening, and when?

Extends a face into a full body portrait

Moondream 3 is a vision language model that brings frontier-level visual reasoning with native object detection, pointing, and OCR capabilities to real-world applications requiring fast, inexpensive inference at scale.

Meshy-5 remesh lets you remesh existing 3D models and export them in various formats.

Moondream2 is a highly efficient open-source vision language model that combines powerful image understanding capabilities with a remarkably small footprint.

Generate video clips from multiple image references using Kling 1.6 (pro)

Wan 2.2's 14B model with LoRA support generates high-fidelity images with enhanced prompt alignment and style adaptability.

Generate video with audio from audio, text and images using LTX-2