Ideogram Layerize takes an existing flat graphic, removes its text, and returns structured text containers that you can edit or recompose in HTML or JSON format.
ideogram/v3/layerize-text
image-to-image

stylized
transform
typography
Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks
florence-2-large/open-vocabulary-detection
image-to-image

multimodal
vision
detection
Create detailed, fully textured 3D models from text prompts
hunyuan-3d/v3.1/rapid/text-to-3d
text-to-3d

3d
Replace or dub audio on an existing video with high-accuracy avatar-inference lip-sync.
heygen/v3/lipsync/precision
video-to-video

lipsync
stylized
transform
Moondream2 is a highly efficient open-source vision language model that combines powerful image understanding capabilities with a remarkably small footprint.
moondream2/object-detection
vision

image-to-image
Wan 2.5 text-to-image model.
wan-25-preview/text-to-image
text-to-image

Remove unwanted objects or people from your photos while seamlessly blending the background.
image-editing/object-removal
image-to-image

stylized
transform
SAM 3 is a unified foundation model for promptable segmentation in images and videos. It can detect, segment, and track objects using text or visual prompts such as points, boxes, and masks.
sam-3/image-rle
image-to-image

segmentation
rle
real-time
Super fast endpoint for the FLUX.1 [dev] model with LoRA support, enabling rapid and high-quality image generation using pre-trained LoRA adaptations for personalization, specific styles, brand identities, and product-specific outputs.
flux-krea-lora
text-to-image

lora
personalization
Generate images from text and images using custom LoRA and Z-Image Turbo, Tongyi-MAI's super-fast 6B model.
z-image/turbo/image-to-image/lora
image-to-image

turbo
z-image
fast
Unified image generation with HiDream-O1-Image. Create, edit, and personalize high-resolution images up to 2K. A single native model handles text-to-image, editing, and custom subjects without external components.
hidream-o1-image/dev
text-to-image

Replace backgrounds in existing images with Ideogram V3's replace background feature. Create variations and adaptations while preserving core elements and adding new creative directions through prompt guidance.
ideogram/v3/replace-background
image-to-image

Generate images from your prompts using Luma Photon. Photon is a creative, personalizable, and intelligent visual model for creatives, bringing a step-function change in the cost of high-quality image generation.
luma-photon
text-to-image

Generate video with audio from audio, text and images using LTX-2
ltx-2-19b/audio-to-video
audio-to-video

Create high-fidelity video with audio from images with LTX-2 Pro
ltx-2/image-to-video
image-to-video

Wan 2.2's 5B model produces up to 5 seconds of 720p video at 24 FPS with fluid motion and powerful prompt understanding
wan/v2.2-5b/text-to-video
text-to-video

SOTA Image Upscaler
ccsr
image-to-image

upscaling
FFmpeg utilities to scale videos
workflow-utilities/scale-video
video-to-video

Heygen Translate Model with Extreme Precision
heygen/v2/translate/precision
video-to-video

Generate character-consistent videos from reference images using PixVerse C1, with subject and background references.
pixverse/c1/reference-to-video
image-to-video

video-generation
reference-to-video
pixverse
Generate long, expressive multi-voice speech using Microsoft's powerful TTS
vibevoice/7b
text-to-speech

multi-speaker
podcast
Orpheus TTS is a state-of-the-art, Llama-based Speech-LLM designed for high-quality, empathetic text-to-speech generation. This model has been fine-tuned to deliver human-level speech synthesis, achieving exceptional clarity, expressiveness, and real-time performance.
orpheus-tts
text-to-speech

text to speech
voice synthesis
high-fidelity
Transform images, elements, and text into consistent, high-quality video scenes, ensuring stable character identity, object details, and environments.
kling-video/o1/standard/reference-to-video
image-to-video

Rig humanoid 3D models from GLB URLs with Meshy, returning rigged GLB/FBX files plus basic animations.
meshy/rigging
3d-to-3d

rigging
Hunyuan Video is an open video generation model with high visual quality, motion diversity, text-video alignment, and generation stability. This endpoint generates videos from text descriptions.
hunyuan-video
text-to-video

motion
Meshy-6 is the latest model from Meshy. It generates realistic, production-ready 3D models.
meshy/v6/text-to-3d
text-to-3d

Wan Effects generates high-quality videos with popular effects from images
wan-effects
image-to-video

motion
effects
Juggernaut Base Flux LoRA by RunDiffusion is a drop-in replacement for Flux [Dev] that delivers sharper details, richer colors, and enhanced realism to all your LoRAs and LyCORIS with full compatibility.
rundiffusion-fal/juggernaut-flux-lora
text-to-image

image generation