
Applies sepia vintage effect to images

Switti is a scale-wise transformer for fast text-to-image generation that outperforms existing T2I AR models and competes with state-of-the-art T2I diffusion models while being faster than distilled diffusion models.

Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks

TEED (Tiny and Efficient Edge Detector) preprocessor.

Apply designs/graphics onto people's shirts

Edit outfits, objects, faces, or restyle your video - all with maximum detail retention.

Generate video with audio from images using LTX-2 and custom LoRA

Edit images from your prompts using Luma Photon. Photon is a highly creative, personalizable, and intelligent visual model for creatives, bringing a step-function change in the cost of high-quality image generation.

SCAIL is a character animation model that uses 3D consistent pose representations to animate reference images with coherent motion, supporting complex movements.

HeyGen Avatar V3 Model for Digital Twin

FFmpeg Utility for Audio Compression

Generate short video clips from your images using SVD v1.1 at Lightning Speed

Semantic image alignment measurements

Extend video with audio using LTX-2.3 and custom LoRA

Generates satellite/aerial view style images

Restyle videos up to 30 min long - maintaining maximum detail quality.

High-fidelity keypoint-driven video object removal - minimal input, strong temporal consistency. Trained on licensed data for risk-free commercial video editing.

Generate video with audio from videos using LTX-2 and custom LoRA

Fast Text-to-Video endpoint for Krea's Wan 14b model.

Reference-free image measurements

LongCat-Video-Avatar is an audio-driven video generation model that generates super-realistic, lip-synchronized long videos with natural dynamics and consistent identity.

Re-animate your videos at lightning speed!

Maya1 is a state-of-the-art speech model by Maya Research for expressive voice generation, built to capture real human emotion and precise voice design.

Train LoRAs for the Qwen-Image-Layered model to customize how images are split into layers.

An efficient SDXL multi-controlnet inpainting model.

Quickly generate high-quality video clips from text and image prompts using PixVerse v4 fast

Train Ideogram on your photos, style, subject, or look, going from a small set of reference images to images that feel consistently yours