Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks

ocr

multimodal

vision
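"Prompt-based" here means that a single Florence-2 model switches tasks based on a special task token at the start of its text prompt. The sketch below shows that idea. The token strings are a subset of the ones listed on the public Florence-2 model card. `FLORENCE2_TASKS` and `build_prompt` are hypothetical helper names, and the hosted endpoint may expose tasks through its own parameters rather than raw tokens.

```python
from typing import Optional

# Subset of task tokens listed on the public Florence-2 model card.
# The dictionary keys are hypothetical friendly names, not part of any API.
FLORENCE2_TASKS = {
    "caption": "<CAPTION>",
    "detailed_caption": "<DETAILED_CAPTION>",
    "object_detection": "<OD>",
    "ocr": "<OCR>",
    "ocr_with_region": "<OCR_WITH_REGION>",
    "phrase_grounding": "<CAPTION_TO_PHRASE_GROUNDING>",
}


def build_prompt(task: str, text: Optional[str] = None) -> str:
    """Return the prompt for `task`; grounding-style tasks append free text after the token."""
    token = FLORENCE2_TASKS[task]  # raises KeyError for an unknown task name
    return token if text is None else token + text


if __name__ == "__main__":
    print(build_prompt("ocr"))
    print(build_prompt("phrase_grounding", "a red car"))
```

The model weights stay the same across calls. Only the leading token changes what the model produces, such as captions, boxes, or OCR text.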

stable-audio-25/audio-to-audio

Generate high quality music and sound effects using Stable Audio 2.5 from StabilityAI

audio

audio-to-audio

live-portrait

Transfer expression from a video to a portrait.

Generate realistic images.

realism

diffusion

text-to-image

meshy/v5/multi-image-to-3d

Meshy-5 Multi-Image generates realistic, production-ready 3D models from multiple images.

SOTA Image Upscaler

z-image/turbo/image-to-image/lora

Generate images from text and images using custom LoRA and Z-Image Turbo, Tongyi-MAI's super-fast 6B model.

Orpheus TTS is a state-of-the-art, Llama-based Speech-LLM designed for high-quality, empathetic text-to-speech generation. The model has been fine-tuned to deliver human-level speech synthesis with exceptional clarity, expressiveness, and real-time performance.

workflow-utilities/extract-nth-frame

FFmpeg utility for extracting the nth frame

image-to-image
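The endpoint's internals are not described here. As a minimal sketch of one way to extract a single frame with the ffmpeg CLI, the helper below builds an argument list that uses ffmpeg's `select` filter to keep only the frame whose 0-based index matches. `build_extract_frame_cmd` and the file names are hypothetical, and the example assumes ffmpeg is installed if you actually run the command.

```python
from typing import List


def build_extract_frame_cmd(video_path: str, frame_index: int, out_path: str) -> List[str]:
    """Return an ffmpeg argument list that saves frame `frame_index` (0-based) as an image."""
    if frame_index < 0:
        raise ValueError("frame_index must be non-negative")
    # select=eq(n\,K) passes only the frame whose 0-based index n equals K;
    # the backslash escapes the comma so ffmpeg's filtergraph parser does not split on it.
    select_filter = f"select=eq(n\\,{frame_index})"
    return [
        "ffmpeg",
        "-i", video_path,
        "-vf", select_filter,
        "-frames:v", "1",  # write a single output frame
        out_path,
    ]


if __name__ == "__main__":
    # Hypothetical file names; pass cmd to subprocess.run(cmd, check=True) if ffmpeg is on PATH.
    cmd = build_extract_frame_cmd("input.mp4", 99, "frame_099.png")
    print(" ".join(cmd))
```

Passing a list instead of a shell string avoids shell quoting issues. The backslash in the filter is still needed because ffmpeg itself treats commas as filter separators.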

flux-krea-lora/image-to-image

FLUX LoRA Image-to-Image is a high-performance endpoint that transforms existing images using FLUX models, leveraging LoRA adaptations to enable rapid and precise image style transfer, modifications, and artistic variations.

lora

style transfer

image-to-image
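LoRA (low-rank adaptation) avoids retraining a full weight matrix W. Instead, it learns two small matrices: B with shape d×r and A with shape r×k. The adapted weight is W' = W + (alpha / r) · B·A. The sketch below applies that generic formula to plain Python lists. It illustrates the technique only and is not how FLUX endpoints load or scale LoRA weights. `matmul` and `merge_lora` are hypothetical helpers.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive matrix product a @ b for small list-of-lists matrices."""
    inner = len(b)
    assert all(len(row) == inner for row in a), "shape mismatch"
    cols = len(b[0])
    return [[sum(a[i][t] * b[t][j] for t in range(inner)) for j in range(cols)]
            for i in range(len(a))]


def merge_lora(w: Matrix, b: Matrix, a: Matrix, alpha: float) -> Matrix:
    """Return W + (alpha / r) * B @ A, where r (the LoRA rank) is the number of rows of A."""
    r = len(a)
    scale = alpha / r
    delta = matmul(b, a)  # low-rank update with the same shape as W
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]


if __name__ == "__main__":
    w = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight
    b = [[1.0], [0.0]]            # d x r with r = 1
    a = [[0.0, 2.0]]              # r x k
    print(merge_lora(w, b, a, alpha=1.0))
```

Because only B and A are trained, a LoRA file stays small. Changing `alpha` scales how strongly the adaptation changes the base model.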

pixverse/v4.5/effects

Generate high quality video clips with different effects using PixVerse v4.5

image-to-video

recraft-20b

Recraft 20b is a new and affordable text-to-image model.

Use the Hunyuan Foley model to bring your videos to life by adding sound effects to them.

add-sound

video-to-video

pixelcut/video-background-removal

Pixelcut's Video Background Remover is an AI segmentation model that erases backgrounds frame by frame, with seamless temporal consistency.

ltx-2.3-quality/audio-to-video

Generate high-quality video with audio from audio, text and images using LTX-2.3

audio-to-video

photomaker

Customizing Realistic Human Photos via Stacked ID Embedding

meshy/v6-preview/text-to-3d

Meshy-6-Preview is the latest model from Meshy. It generates realistic, production-ready 3D models.

text-to-3d

moondream-next

MoonDreamNext is a multimodal vision-language model for captioning, gaze detection, bbox detection, point detection, and more.

multimodal

vision

hidream-o1-image/dev/edit

Unified image generation with HiDream-O1-Image. Create, edit, and personalize high-resolution images up to 2K. A single native model handles text-to-image, editing, and custom subjects without external components.

image-to-image

flux-2-lora-gallery/virtual-tryon

Virtual clothing try-on (2 images: person + garment)

stylized

transform

image-to-image

pixverse/v4.5/transition

Create seamless transition between images using PixVerse v4.5

stylized

transform

image-to-video

longcat-single-avatar/image-audio-to-video

LongCat-Video-Avatar is an audio-driven video generation model that generates super-realistic, lip-synchronized long videos with natural dynamics and consistent identity.

image-to-video

audio-to-video

flux-vision-upscaler

Flux Vision Upscaler for magnifying and upscaling images with high fidelity and creativity.

image-to-image

wan-motion

Wan Motion is a streamlined character animation model that transfers motion from a driving video onto a reference character image. It builds on Wan-Animate, which preserves the original character's proportions, and uses pose retargeting to adapt the driving video's skeleton to the reference character's body shape. This produces more natural results when the two have different builds. It outputs at 720p with optimized defaults for fast, high-quality generation: just provide a video, an image, and an optional prompt.

video-to-video
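Pose retargeting, as described for Wan Motion, adapts the driving skeleton to a differently proportioned character. One simple, generic way to do this in 2D keeps each bone's direction from the driving pose and replaces its length with the reference character's bone length, walking outward from a root joint. The sketch below implements only that textbook idea. It is not Wan Motion's actual method, and the joint names, the `(parent, child)` bone list, and `retarget_pose` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def retarget_pose(driving: Dict[str, Point], reference: Dict[str, Point],
                  bones: List[Tuple[str, str]], root: str) -> Dict[str, Point]:
    """Keep the driving pose's bone directions but use the reference skeleton's bone lengths.

    `bones` lists (parent, child) pairs ordered so that every parent is placed before its children.
    """
    out: Dict[str, Point] = {root: driving[root]}  # anchor the root at its driving position
    for parent, child in bones:
        dx = driving[child][0] - driving[parent][0]
        dy = driving[child][1] - driving[parent][1]
        length = math.hypot(dx, dy)
        ref_len = math.hypot(reference[child][0] - reference[parent][0],
                             reference[child][1] - reference[parent][1])
        # Unit direction of the driving bone (zero vector if the bone collapsed to a point).
        ux, uy = (dx / length, dy / length) if length > 0 else (0.0, 0.0)
        px, py = out[parent]
        out[child] = (px + ux * ref_len, py + uy * ref_len)
    return out


if __name__ == "__main__":
    driving = {"hip": (0.0, 0.0), "knee": (3.0, 0.0), "ankle": (3.0, 4.0)}
    reference = {"hip": (0.0, 0.0), "knee": (0.0, 1.0), "ankle": (0.0, 3.0)}
    print(retarget_pose(driving, reference, [("hip", "knee"), ("knee", "ankle")], root="hip"))
```

The pose (bone angles) comes from the driving video, while the limb lengths come from the reference image. This is why retargeting helps when the two characters have different builds.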