Text-to-image generation with FLUX.2 [dev] from Black Forest Labs. Enhanced realism, crisper text generation, and native editing capabilities—all at turbo speed.
flux-2/turbo
text-to-image

Generate sound effects using ElevenLabs' advanced sound effects model.
elevenlabs/sound-effects/v2
text-to-audio

sound
[Experimental] Whisper v3 Large -- but optimized by our inference wizards. Same WER, double the performance!
wizper
speech-to-text

transcription
speech
An endpoint for personalized image generation with Flux, guided by a given description.
flux-pulid
image-to-image

personalization
style transfer
Generates the same scene from different angles (azimuth/elevation) with Qwen Image Edit 2511 and the Multiple Angles LoRA.
qwen-image-edit-2511-multiple-angles
image-to-image

stylized
transform
lora
FLUX.1 [pro] Fill is a high-performance endpoint for the FLUX.1 [pro] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.
flux-pro/v1/fill
image-to-image

editing
Kling 2.5 Turbo Standard: Top-tier image-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision.
kling-video/v2.5-turbo/standard/image-to-video
image-to-video

stylized
transform
Kling 3.0 Standard: Top-tier text-to-video with cinematic visuals, fluid motion, and native audio generation, with multi-shot support.
kling-video/v3/standard/text-to-video
text-to-video

Generate 1080p video with synchronized native audio from a text prompt and references. Aspect ratios: 16:9, 9:16, 1:1, 4:3, 3:4. Duration: 3–15s.
new
alibaba/happy-horse/reference-to-video
image-to-video

stylized
transform
lipsync
Kling 2.1 Pro is an advanced endpoint for the Kling 2.1 model, offering professional-grade videos with enhanced visual fidelity, precise camera movements, and dynamic motion control, perfect for cinematic storytelling.
kling-video/v2.1/pro/image-to-video
image-to-video

Text-to-image generation with FLUX.2 [dev] from Black Forest Labs. Enhanced realism, crisper text generation, and native editing capabilities, all in a flash.
flux-2/flash
text-to-image

Generate high-speed text-to-speech audio using ElevenLabs TTS Turbo v2.5.
elevenlabs/tts/turbo-v2.5
text-to-speech

audio
Gemini 3 Pro Image (a.k.a. Nano Banana Pro) is Google's state-of-the-art high-fidelity image generation and editing model.
gemini-3-pro-image-preview
text-to-image

realism
typography
Fastest inference in the world for the 12 billion parameter FLUX.1 [schnell] text-to-image model.
flux-1/schnell
text-to-image

Transform your photos into ultra-high-resolution 3D models in seconds. Film-quality geometry with PBR textures, ready for games, e-commerce, and 3D printing.
hunyuan3d-v3/image-to-3d
image-to-3d

Edit videos using Kling O3 from Kling Team!
kling-video/o3/pro/video-to-video/edit
video-to-video

Gemini 3.1 Flash Image (a.k.a. Nano Banana 2) is Google's new state-of-the-art fast image generation and editing model
gemini-3.1-flash-image-preview/edit
image-to-image

Upscale your images with AuraSR.
aura-sr
image-to-image

upscaling
high-res
Use ffmpeg capabilities to merge 2 or more videos.
ffmpeg-api/merge-videos
video-to-video

Generate videos from a first/last frame using Google's Veo 3.1 Fast
veo3.1/fast/first-last-frame-to-video
image-to-video

VEED Fabric 1.0 is an image-to-video API that turns any image into a talking video
veed/fabric-1.0
image-to-video

lipsync
avatar
Enhances a given raster image using 'crisp upscale' tool, boosting resolution with a focus on refining small details and faces.
recraft/upscale/crisp
image-to-image

upscaling
Recraft V3 is a text-to-image model with the ability to generate long texts, vector art, images in brand style, and much more. As of today, it is SOTA in image generation, proven by Hugging Face's industry-leading Text-to-Image Benchmark by Artificial Analysis.
recraft/v3/text-to-image
text-to-image

vector
typography
style
Generate multilingual text-to-speech audio using ElevenLabs TTS Multilingual v2.
elevenlabs/tts/multilingual-v2
text-to-audio

audio
Kling 2.6 Pro: Top-tier text-to-video with cinematic visuals, fluid motion, and native audio generation.
kling-video/v2.6/pro/text-to-video
text-to-video

FLUX.2 [max] delivers state-of-the-art image generation and advanced image editing with exceptional realism, precision, and consistency.
flux-2-max/edit
image-to-image

flux2
image-editing
high-quality
Generate videos from a first and last frame using Google's Veo 3.1.
veo3.1/first-last-frame-to-video
image-to-video

Experimental version of FLUX.1 Kontext [max] with multi-image handling capabilities.
flux-pro/kontext/max/multi
image-to-image
