Endpoint:
POST https://fal.run/fal-ai/z-image/turbo/controlnet
Endpoint ID: fal-ai/z-image/turbo/controlnet
Quick Start
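
A minimal sketch of calling the endpoint above over plain HTTPS, using only the Python standard library. Several details are assumptions rather than facts taken from this page: the `Authorization: Key ...` header scheme, the `FAL_KEY` environment variable, and the field names `prompt` and `image_url`. The remaining field names are the ones listed under Limitations. The reference image URL is a placeholder.

```python
import json
import urllib.request

ENDPOINT_URL = "https://fal.run/fal-ai/z-image/turbo/controlnet"


def build_request(arguments: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the ControlNet endpoint."""
    body = json.dumps(arguments).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT_URL,
        data=body,
        method="POST",
        headers={
            # Assumption: key-based auth header; check your account documentation.
            "Authorization": f"Key {api_key}",
            "Content-Type": "application/json",
        },
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# "prompt" and "image_url" are assumed field names; the rest appear under Limitations.
example_arguments = {
    "prompt": "a watercolor painting of a lighthouse at dusk",
    "image_url": "https://example.com/reference.png",  # placeholder reference image
    "image_size": "auto",
    "num_inference_steps": 8,
    "num_images": 1,
    "output_format": "png",
    "control_scale": 0.75,
    "control_end": 0.8,
    "preprocess": "canny",
}
```

To run it, something like `send(build_request(example_arguments, os.environ["FAL_KEY"]))` should return the parsed response, provided the assumptions above hold for your account.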
Input Schema
- The prompt to generate an image from.
- `image_size`: The size of the generated image. Default value: `auto`. Possible values: `square_hd`, `square`, `portrait_4_3`, `portrait_16_9`, `landscape_4_3`, `landscape_16_9`, `auto`.
- `num_inference_steps`: The number of inference steps to perform. Default value: `8`. Range: 1 to 8.
- The same seed and the same prompt given to the same version of the model will output the same image every time.
- If true, the media will be returned as a data URI and the output data won't be available in the request history.
- `num_images`: The number of images to generate. Default value: `1`. Range: 1 to 4.
- If set to true, the safety checker will be enabled. Default value: `true`.
- `output_format`: The format of the generated image. Default value: `"png"`. Possible values: `jpeg`, `png`, `webp`.
- `acceleration`: The acceleration level to use. Default value: `"regular"`. Possible values: `none`, `regular`, `high`.
- Whether to enable prompt expansion. Note: this will increase the price by 0.0025 credits per request.
- URL of the image for ControlNet generation.
- `control_scale`: The scale of the ControlNet conditioning. Default value: `0.75`. Range: 0 to 1.
- `control_start`: The start of the ControlNet conditioning. Range: 0 to 1.
- `control_end`: The end of the ControlNet conditioning. Default value: `0.8`. Range: 0 to 1.
- `preprocess`: What kind of preprocessing to apply to the image, if any. Default value: `none`. Possible values: `none`, `canny`, `depth`, `pose`.

Output Schema
- Information about the generated image files.
- The timings of the generation process.
- The seed of the generated image. It matches the seed passed in the input, or the randomly generated seed that was used if none was passed.
- Whether the generated images contain NSFW concepts.
- The prompt used for generating the image.
Input Example
Output Example
Performance
Z-Image Turbo is roughly 3-5x more cost-effective than traditional ControlNet implementations, a result of optimizing the 6B-parameter base for rapid inference. At $0.0065 per megapixel, you get about 153 megapixels per dollar, which suits batch-processing workflows where structural guidance matters more than photorealistic perfection.

| Metric | Result | Context |
|---|---|---|
| Model Size | 6 billion parameters | Optimized for inference speed vs 70B+ alternatives |
| Inference Steps | 1-8 configurable | Default 8 steps balances quality and latency |
| Cost per Megapixel | $0.0065 | 153 megapixels per $1.00 on fal |
| Control Methods | 4 preprocessing modes | None, canny edge, depth map, pose detection |
| Batch Generation | Up to 4 images per request | Parallel generation with shared control input |
| Related Endpoints | Standard image-to-image, LoRA variants | ControlNet vs direct transformation vs custom training |
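
The pricing figures in the table can be checked with a few lines of arithmetic. This is a rough sketch only: it assumes a megapixel is billed as 1,000,000 pixels with no rounding, and it ignores the optional prompt-expansion surcharge. Neither assumption is confirmed by this page, and the helper names are hypothetical.

```python
PRICE_PER_MEGAPIXEL_USD = 0.0065  # taken from the table above


def megapixels_per_dollar(price_per_megapixel: float = PRICE_PER_MEGAPIXEL_USD) -> float:
    """How many megapixels one dollar buys at the given rate."""
    return 1.0 / price_per_megapixel


def estimate_cost_usd(
    width: int,
    height: int,
    num_images: int = 1,
    price_per_megapixel: float = PRICE_PER_MEGAPIXEL_USD,
) -> float:
    """Estimate the cost of one request from the output dimensions.

    Assumption: 1 megapixel = 1,000,000 pixels, billed without rounding.
    """
    megapixels = (width * height) / 1_000_000
    return megapixels * num_images * price_per_megapixel


# 1 / 0.0065 is about 153.8, consistent with "153 megapixels per $1.00".
print(f"{megapixels_per_dollar():.1f} megapixels per dollar")
# Four 1000x1000 images: 4 megapixels at $0.0065 each.
print(f"${estimate_cost_usd(1000, 1000, num_images=4):.4f} for four 1000x1000 images")
```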
Structural Control Without Compromise
Z-Image Turbo routes your prompt through three parallel conditioning pathways: text embedding, reference image structure, and optional preprocessing filters. Unlike pure text-to-image models that hallucinate spatial relationships, this architecture extracts edge maps, depth channels, or skeletal poses from your input, then enforces those constraints during diffusion.

What this means for you:
- Configurable control strength (0-1 scale): Dial conditioning intensity anywhere from about 0.9 for strict adherence down to about 0.3 for loose interpretation (the default is 0.75). This is critical when your reference image has good composition but needs significant style deviation
- Temporal control windowing: Apply ControlNet guidance only during part of the generation, for example the first 0-40% of steps (start and end are configurable; the default end is 0.8), letting early diffusion lock structure while late steps refine aesthetics
- Four preprocessing modes: Feed raw images directly or auto-extract canny edges (sharp boundaries), depth maps (spatial layering), or pose skeletons (human/character positioning) without external tools
- Multi-format output with safety: Generate 1-4 variants simultaneously in JPEG, PNG, or WebP with optional built-in safety filtering, so you can batch-test style variations while maintaining structural consistency
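
To make the temporal windowing bullet concrete, the sketch below maps a start/end fraction onto individual inference steps. It illustrates the idea only. The exact rule the service uses to turn fractions into steps is not documented on this page, so the half-open interval, the assumed default start of 0.0, and the function name are all assumptions.

```python
def active_control_steps(
    num_steps: int,
    control_start: float = 0.0,  # assumed default; the page lists no default for the start
    control_end: float = 0.8,    # documented default for the end of conditioning
) -> list[int]:
    """Return the 0-indexed steps during which ControlNet guidance would apply.

    Assumption: step i of num_steps sits at progress i / num_steps, and guidance
    is active while control_start <= progress < control_end.
    """
    if not 0.0 <= control_start <= control_end <= 1.0:
        raise ValueError("expected 0 <= control_start <= control_end <= 1")
    return [
        step
        for step in range(num_steps)
        if control_start <= step / num_steps < control_end
    ]


# Under these assumptions, 8 steps with the default window guide steps 0 through 6,
# and a 0-40% window guides only steps 0 through 3.
print(active_control_steps(8))
print(active_control_steps(8, control_end=0.4))
```

With only 8 steps available, each step covers 12.5% of the schedule, so small changes to the start or end fraction may not change which steps are guided.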
Technical Specifications
| Spec | Details |
|---|---|
| Architecture | Z-Image Turbo 6B |
| Input Formats | Text prompt + reference image URL (JPEG, PNG, WebP, GIF, AVIF) |
| Output Formats | JPEG, PNG, WebP with configurable dimensions |
| Preprocessing Options | None, Canny edge detection, Depth estimation, Pose detection |
| Control Parameters | Scale (0-1), temporal start/end windowing, inference steps (1-8) |
| License | Commercial use permitted |
How It Stacks Up
**Z-Image Turbo Standard** ($0.0065 per megapixel, same base cost): Standard image-to-image prioritizes direct style transfer without intermediate edge/depth extraction, ideal for texture swaps and color grading where spatial relationships already match your target. ControlNet trades processing simplicity for precise geometric control when your reference structure needs enforcement.

**FASHN Virtual Try-On V1.5**: Z-Image Turbo ControlNet offers general-purpose structural conditioning across edge, depth, and pose modalities for diverse creative workflows. FASHN specializes in garment-to-body fitting with proprietary try-on algorithms optimized for fashion e-commerce, trading generality for domain-specific accuracy in clothing visualization.

Related
- Z-Image Turbo — Image Generation
- Z Image Base — Image Generation
- Z Image Base (LoRA) — Image Generation
- Z-Image Turbo Seamless Tiling — Image Generation
Limitations
- `image_size` restricted to: `square_hd`, `square`, `portrait_4_3`, `portrait_16_9`, `landscape_4_3`, `landscape_16_9`, `auto`
- `num_inference_steps` range: 1 to 8
- `num_images` range: 1 to 4
- `output_format` restricted to: `jpeg`, `png`, `webp`
- `acceleration` restricted to: `none`, `regular`, `high`
- `control_scale` range: 0 to 1
- `control_start` range: 0 to 1
- `control_end` range: 0 to 1
- `preprocess` restricted to: `none`, `canny`, `depth`, `pose`
- Content moderation via safety checker
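
Because every limit above is a simple enumeration or numeric range, a request can be checked locally before it is sent. The following is a minimal client-side sketch under that assumption. The helper name `validate_arguments` is hypothetical, and the server remains the authority on what it accepts.

```python
# Allowed values and ranges copied from the Limitations list above.
ALLOWED_VALUES = {
    "image_size": {
        "square_hd", "square", "portrait_4_3", "portrait_16_9",
        "landscape_4_3", "landscape_16_9", "auto",
    },
    "output_format": {"jpeg", "png", "webp"},
    "acceleration": {"none", "regular", "high"},
    "preprocess": {"none", "canny", "depth", "pose"},
}
ALLOWED_RANGES = {
    "num_inference_steps": (1, 8),
    "num_images": (1, 4),
    "control_scale": (0, 1),
    "control_start": (0, 1),
    "control_end": (0, 1),
}


def validate_arguments(arguments: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    errors = []
    for name, allowed in ALLOWED_VALUES.items():
        if name in arguments and arguments[name] not in allowed:
            errors.append(f"{name} must be one of {sorted(allowed)}")
    for name, (low, high) in ALLOWED_RANGES.items():
        if name in arguments and not low <= arguments[name] <= high:
            errors.append(f"{name} must be between {low} and {high}")
    return errors


# Fields that are absent are skipped, so partial argument dicts can be checked too.
print(validate_arguments({"num_inference_steps": 9, "output_format": "gif"}))
```

Returning a list instead of raising on the first problem lets a caller report every invalid field in one pass.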