
Wan-2.1 is a text-to-video model that generates videos with high visual quality and motion diversity from text prompts.

A video understanding model that analyzes video content and answers questions about what is happening in the video, based on user prompts.

Generate complete, seamlessly tiling PBR materials, including normal, roughness, base color, height, and metalness maps, at up to 8K resolution.

Generate 3D models from multiple images using Trellis, a native 3D generative model that enables versatile, high-quality 3D asset creation.

Generate ambient sounds from any text prompt, and turn any SFX into a natural loop for ambient soundscapes.

Animate images into cinematic videos with PixVerse C1, supporting 1080p resolution and native audio generation.

Upscale your images with DRCT-Super-Resolution.

Dreamina delivers significant improvements in image aesthetics, precise and diverse styles, and rich detail.

Veo 2 creates videos from images with realistic motion and very high-quality output.

Generate speech from text prompts in a choice of voices using the MiniMax Speech-2.6 HD model, which produces high-quality text-to-speech.

Wan 2.2's 5B model produces up to 5 seconds of 720p video at 24 FPS, with fluid motion and strong prompt understanding.

HiDream-I1 fast is a new open-source image generative foundation model with 17B parameters that achieves state-of-the-art image generation quality within 16 steps.

Vidu's latest Q3 Pro models.

Generate video with audio from images using LTX-2.

Fast inpainting endpoint for the FLUX.1 Kontext [dev] model with LoRA support. It enables rapid, high-quality image inpainting guided by reference images and can apply pre-trained LoRA adaptations for specific styles, brand identities, and product-specific outputs.

Precise camera position and angle control (rotation, zoom, vertical movement).

SAM 3D allows for accurate 3D reconstruction of human body shape and position from a single image.

Generate synchronized sound for any video and return the video with its new soundtrack (similar to MMAudio).

HiDream-I1 full is a new open-source image generative foundation model with 17B parameters that achieves state-of-the-art image generation quality within seconds.

Moondream 3 is a vision language model that brings frontier-level visual reasoning with native object detection, pointing, and OCR capabilities to real-world applications requiring fast, inexpensive inference at scale.

High-quality zero-shot personalization.

Phota's model enables personalized photo editing, preserving identity while seamlessly erasing distractions.

Marlin is a 2B video VLM tuned for the two questions developers actually want to ask of their videos: what is happening, and when?

Generate videos from images using LTX Video.

Generate videos from prompts and images using LTX Video-0.9.7 13B Distilled and custom LoRAs.

OmniHuman generates video using an image of a human figure paired with an audio file. It produces vivid, high-quality videos where the character’s emotions and movements maintain a strong correlation with the audio.

Juggernaut Lightning Flux by RunDiffusion provides blazing-fast, high-quality images rendered at five times the speed of Flux. Perfect for mood boards and mass ideation, this model excels in both realism and prompt adherence.

ImagineArt 2.0 is ImagineArt's latest state-of-the-art visual reasoning text-to-image model, generating high-fidelity, professional-grade visuals with lifelike realism, cinematic effects, and strong aesthetic quality.