pixverse/c1/text-to-video

Generate film-grade videos from text prompts with native audio, up to 1080p and 15 seconds, using PixVerse C1.

Dia directly generates realistic dialogue from transcripts. Audio conditioning enables emotion control, and the model produces natural nonverbal sounds such as laughter and throat clearing.

vidu/q3/text-to-video/turbo

Vidu's Q3 Turbo Model.

stable-diffusion-v35-medium

Stable Diffusion 3.5 Medium is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that features improved performance in image quality, typography, complex prompt understanding, and resource-efficiency.

SAM 3 is a unified foundation model for promptable segmentation in images and videos. It can detect, segment, and track objects using text or visual prompts such as points, boxes, and masks.

z-image/turbo/controlnet

Generate images from text and edge, depth or pose images using Z-Image Turbo, Tongyi-MAI's super-fast 6B model.

ltx-2-19b/text-to-video

Generate video with audio from text using LTX-2

minimax/preview/speech-2.5-hd

Generate speech from text prompts in different voices using the MiniMax Speech-02 HD model, which produces high-quality text-to-speech.

speech

leffa/virtual-tryon

Leffa Virtual TryOn is a high-quality, image-based try-on endpoint that can be used for commercial try-on.

vidu/q1/reference-to-video

Generate video clips from multiple image references using Vidu Q1.

pixverse/v5.5/transition

Pixverse Transition

image-to-video

sam-3-1/video-rle

SAM 3.1 comes with Object Multiplex, a shared-memory approach to joint multi-object tracking that delivers faster speeds when tracking larger numbers of objects.

stylized

transform

image-to-video

moondream-next/detection

MoonDreamNext Detection is a multimodal vision-language model for gaze detection, bounding-box detection, point detection, and more.

multimodal

wan/v2.2-5b/text-to-video/fast-wan

Wan 2.2's 5B FastVideo model produces up to 5 seconds of 720p video at 24 FPS with fluid motion and strong prompt understanding.

text to video

motion

z-image-trainer

Train LoRAs on Z-Image Turbo, a super-fast 6B-parameter text-to-image model developed by Tongyi-MAI.

flux/srpo

FLUX.1 SRPO [dev] is a 12-billion-parameter flow transformer that generates high-quality, aesthetically strong images from text. It is suitable for personal and commercial use.

text-to-image

bria/genfill

Bria GenFill enables high-quality object addition or visual transformation. Trained exclusively on licensed data for safe and risk-free commercial use. Access the model's source code and weights: https://bria.ai/contact-us

image editing

image-preprocessors/lineart

Line art preprocessor.

stable-video

Generate short video clips from your images using SVD v1.1.

image-to-video

ltx-2.3-quality/text-to-video

Generate high-quality video with audio from text using LTX-2.3

video

image-editing/hair-change

Experiment with different hairstyles, from bald to any style you can imagine, while maintaining natural lighting and realistic results.

stylized

transform

chatterbox/text-to-speech/multilingual

Whether you're working on memes, videos, games, or AI agents, Chatterbox, the first TTS model from Resemble AI, brings your content to life.

multilingual