![Image-to-image editing with LoRA support for FLUX.2 [klein] 9B Base from Black Forest Labs. Specialized style transfer and domain-specific modifications.](https://refinery.fal.media/url/https%3A%2F%2Fv3b.fal.media%2Ffiles%2Fb%2F0a8b09b1%2F1lRmcLX6NOxTZp285RGPN_85a165d7a3cd4fbfba4c6302d341f0aa.jpg/tr:w-1920,q-80/1lRmcLX6NOxTZp285RGPN_85a165d7a3cd4fbfba4c6302d341f0aa.webp)
Image-to-image editing with LoRA support for FLUX.2 [klein] 9B Base from Black Forest Labs. Specialized style transfer and domain-specific modifications.

Generate high-quality 3D models from a single image using Tripo H3.1.

![Image-to-image editing with LoRA support for FLUX.2 [dev] from Black Forest Labs. Specialized style transfer and domain-specific modifications.](https://refinery.fal.media/url/https%3A%2F%2Fv3b.fal.media%2Ffiles%2Fb%2Ftiger%2FyUqMpmIEFNYAjtwP3j5VH_5a4980d4efa9484c9ad6a85f88d7563d.jpg/tr:w-1920,q-80/yUqMpmIEFNYAjtwP3j5VH_5a4980d4efa9484c9ad6a85f88d7563d.webp)
Image-to-image editing with LoRA support for FLUX.2 [dev] from Black Forest Labs. Specialized style transfer and domain-specific modifications.

Generate videos from text prompts using LTX Video.

Generate realistic lipsync animations from audio with the PixVerse Lipsync model, which uses advanced algorithms for high-quality synchronization.

MiniMax Hailuo-2.3-Fast Image To Video API (Pro, 1080p): an advanced, fast image-to-video generation model with 1080p resolution.

Create videos quickly and economically with the MiniMax Hailuo-02 Image To Video API at 512p resolution.

Generate a video by taking a start frame and an end frame, animating the transition between them while following text-driven style and scene guidance.

Enhances a given raster image using the 'creative upscale' tool, increasing its resolution and making it sharper and cleaner.

Generate high-quality images from text prompts using MiniMax Image-01. Longer text prompts will result in better-quality images.

Recraft V4 was developed with designers to bring true visual taste to AI image generation. Built for brand systems and production-ready workflows, it goes beyond prompt accuracy — delivering stronger composition, refined lighting, realistic materials, and a cohesive aesthetic. The result is imagery shaped by professional design judgment, ready for immediate real-world use without additional post-processing.

Kling Kolors Virtual TryOn v1.5 is a high-quality, image-based try-on endpoint that can be used for commercial try-on.

Kokoro is a lightweight text-to-speech model that delivers comparable quality to larger models while being significantly faster and more cost-efficient.

Kling's Native 4K is a video generation model that directly outputs professional-grade 4K video in one step, eliminating the need for post-production upscaling.

Generate consistent character appearances across multiple images. Maintain facial features, proportions, and distinctive traits for cohesive storytelling and branding.

Transform images, elements, and text into consistent, high-quality video scenes, ensuring stable character identity, object details, and environments.

Tuning-free ID customization.

Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks.

LTX-2.3 is a high-quality, fast AI video model available in Pro and Fast variants for text-to-video, image-to-video, and audio-to-video.

Depth Anything v2 preprocessor.

Bring speech to your text using the Qwen3-TTS Custom-Voice model with pre-trained voices, or use your own custom voice with the Qwen3-TTS Clone Voice model.

GPT Image 1 Mini combines OpenAI's advanced language capabilities, powered by GPT-5, with GPT Image 1 Mini for efficient image generation.

State-of-the-art image-to-3D object generation. Generate a 3D model from a single image.

EVF-SAM2 combines natural language understanding with advanced segmentation capabilities, allowing you to precisely mask image regions using intuitive positive and negative text prompts.

Leverage the state-of-the-art capabilities of Hunyuan Image 3.0 to generate visual content that effectively conveys the message of your written material.

Generate text from speech using ElevenLabs' advanced speech-to-text model.

Fix distorted or blurred photos of people with CodeFormer.