![FLUX.2 [klein] 4B Base](https://refinery.fal.media/url/https%3A%2F%2Fv3b.fal.media%2Ffiles%2Fb%2F0a8b09ad%2FV8uFhTiTNXdAgvt1tbJmB_1335a918cf5542539d5954c13b7d0fef.jpg/tr:w-1920,q-80/V8uFhTiTNXdAgvt1tbJmB_1335a918cf5542539d5954c13b7d0fef.webp)
Text-to-image generation with LoRA support for FLUX.2 [klein] 4B Base from Black Forest Labs. Custom style adaptation and fine-tuned model variations.

An efficient SDXL multi-ControlNet image-to-image model.

Create stickers from faces.

Video-to-video remix endpoint for Sora 2, OpenAI’s advanced model that transforms existing videos based on new text or image prompts, allowing rich edits, style changes, and creative reinterpretations while preserving motion and structure.

Generate long videos in 720p/30fps from images using LongCat Video.

Wan 2.6 reference-to-video flash model.

Pony V7 is a fine-tuned text-to-image model with superior aesthetics and prompt following.

Generate high-quality images, posters, and logos with Ideogram V2A. Features exceptional typography handling and realistic outputs optimized for commercial and creative use.

Generate video with audio from text using LTX-2.3.

Use NAFNet to fix issues like blurriness and noise in your images. This model specializes in image restoration and can help enhance the overall quality of your photography.

Seed 2.0 Mini is a high-performance multimodal model optimized for low latency and high concurrency. It supports text, image, and video input with 256K context and configurable thinking/reasoning modes.

ImagineArt 1.5 text-to-image model generates high-fidelity professional-grade visuals with lifelike realism, strong aesthetics, and text that actually reads correctly.

Moondream 3 is a vision language model that brings frontier-level visual reasoning with native object detection, pointing, and OCR capabilities to real-world applications requiring fast, inexpensive inference at scale.

Hunyuan Video 1.5 is Tencent's latest and best video model.

Bagel is a 7B parameter multimodal model from Bytedance-Seed that can generate both images and text.

Generate video with audio from videos using LTX-2.

Produce high-quality images with minimal inference steps. Optimized for 512x512 input image size.

Reimagine existing images with Ideogram V2's remix feature. Create variations and adaptations while preserving core elements and adding new creative directions through prompt guidance.

Wan 2.6 reference-to-video model.

Wan-2.2 text-to-video generates videos with high visual quality and motion diversity from text prompts. This endpoint supports LoRAs made for Wan 2.2.

DeepSeek Janus-Pro is a novel text-to-image model that unifies multimodal understanding and generation through an autoregressive framework.

Wan-2.1 Pro is a premium text-to-video model that generates high-quality 1080p videos at 30fps, up to 6 seconds long, with exceptional visual quality and motion diversity from text prompts.

Change sections of a video using LTX-2.

Turn your casual photos into stunning professional studio portraits with perfect lighting and high-end photography style.

Generate videos from prompts using LTX Video-0.9.5.

Apply sharpening effects with three modes: basic unsharp mask, smart sharpening with edge preservation, and Contrast Adaptive Sharpening (CAS).
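
To illustrate the first of these modes, here is a minimal sketch of a basic unsharp mask on a single row of pixel values. It is not the endpoint's implementation: the function names `box_blur_1d` and `unsharp_mask_1d`, the `radius` and `amount` parameters, and the use of a box blur in place of the more common Gaussian blur are all assumptions made for illustration.

```python
def box_blur_1d(signal, radius=1):
    """Blur a row of pixel values with a box filter, shrinking the window at the edges."""
    n = len(signal)
    blurred = []
    for i in range(n):
        lo = max(0, i - radius)
        hi = min(n - 1, i + radius)
        window = signal[lo:hi + 1]
        blurred.append(sum(window) / len(window))
    return blurred


def unsharp_mask_1d(signal, radius=1, amount=1.0):
    """Sharpen by adding back the difference between the signal and its blur.

    Hypothetical helper: output = signal + amount * (signal - blurred),
    clamped to the 0..255 range of 8-bit pixel values.
    """
    blurred = box_blur_1d(signal, radius)
    return [
        min(255.0, max(0.0, s + amount * (s - b)))
        for s, b in zip(signal, blurred)
    ]


if __name__ == "__main__":
    row = [50, 50, 50, 200, 200, 200]  # a single step edge
    print(unsharp_mask_1d(row))
```

Flat regions are left unchanged, while values on either side of an edge are pushed further apart, which is what makes the edge look crisper. Edge-preserving "smart" sharpening and CAS vary how strongly this correction is applied per pixel.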

Ovis-Image is a 7B text-to-image model specifically optimized for fast, high-quality text rendering.

Generate realistic audio from a video with an optional text prompt.