Z Image Trainer Training
Pricing: $2.26 per 1,000-step training run, billed per step, so a 2,000-step run costs $4.52. The minimum run is 100 steps, which costs $0.226.
Z-Image Turbo LoRA trainer | [text-to-image]
Tongyi-MAI's Z-Image Turbo LoRA trainer fine-tunes custom LoRA adapters on the 6B-parameter Z-Image Turbo base model for $2.26 per 1,000 training steps. Trading generalist breadth for specialized precision, this training endpoint lets you encode specific visual styles or content patterns into reusable LoRA weights. It is built for teams that need repeatable brand aesthetics, content creators maintaining visual consistency, and developers deploying style-controlled image generation at scale.
Use Cases: Brand-consistent image generation | Custom style deployment | Production visual workflows
Performance
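Training cost scales linearly with step count, so short exploratory runs are cheap. Because you only train a LoRA adapter on top of an existing base model rather than a foundation model from scratch, iterating on datasets and settings stays economically practical.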
| Metric | Result | Context |
|---|---|---|
| Base Model Size | 6B parameters | Z-Image Turbo foundation optimized for speed |
| Training Cost | $2.26 per 1,000 steps | Scales linearly: 2,000 steps = $4.52, 5,000 steps = $11.30 |
| Minimum Training | 100 steps ($0.226) | Enables rapid prototyping iterations |
| Step Range | 100-10,000 steps | Configurable in 100-step increments |
| Related Endpoints | Z Image Text to Image, Z Image with LoRA | Base inference and LoRA-enhanced generation variants |
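The pricing rows above can be checked with a short calculation. This is a minimal sketch that assumes cost is exactly linear in steps, as the table states, and that valid step counts are multiples of 100 between 100 and 10,000; the function and constant names are illustrative, not part of any API.

```python
# Pricing from the table above: $2.26 per 1,000 steps, billed linearly per step.
PRICE_PER_1000_STEPS = 2.26
MIN_STEPS = 100
MAX_STEPS = 10_000
STEP_INCREMENT = 100


def training_cost(steps: int) -> float:
    """Return the estimated USD cost of a training run with `steps` steps."""
    if not MIN_STEPS <= steps <= MAX_STEPS:
        raise ValueError(f"steps must be between {MIN_STEPS} and {MAX_STEPS}")
    if steps % STEP_INCREMENT != 0:
        raise ValueError(f"steps must be a multiple of {STEP_INCREMENT}")
    # Linear scaling; rounding to 4 decimals removes floating-point noise.
    return round(steps * PRICE_PER_1000_STEPS / 1000, 4)


for steps in (100, 1000, 2000, 5000):
    print(f"{steps:>6} steps -> ${training_cost(steps):.3f}")
```

Running it reproduces the table: $0.226, $2.260, $4.520 and $11.300. Validating the step count before estimating cost catches out-of-range values locally instead of after a request is sent.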
Training Control That Adapts to Your Use Case
Z Image Trainer exposes three distinct training modes: content, style, and balanced. This lets you bias the LoRA toward subject matter preservation or artistic treatment depending on your application requirements.
What this means for you:
- Flexible caption handling: Supply a per-image caption file named after the image's root filename (ROOT.txt, e.g. `cat_01.jpg` paired with `cat_01.txt`), or fall back to a single default caption for the entire dataset, reducing preprocessing work
- Configurable learning rate: Adjust the 0.0001 default to control training aggressiveness, balancing convergence speed against overfitting risk
- Training mode selection: Choose content focus for subject consistency, style focus for artistic transfer, or balanced for general-purpose adaptation
- Production-ready outputs: Receive diffusers-compatible LoRA weights and configuration files ready for immediate deployment in inference workflows
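The caption fallback and training options above can be sketched as follows. This is an illustration under stated assumptions, not the trainer's actual implementation: the parameter names (`steps`, `training_mode`, `learning_rate`, `default_caption`), the accepted image extensions, and the choice to raise an error when an image has neither a caption file nor a default caption are all hypothetical. Only the three mode names, the 0.0001 default learning rate, and the ROOT.txt pairing come from this page.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Dict, Optional

# The three documented modes; the parameter names below are hypothetical.
TRAINING_MODES = {"content", "style", "balanced"}
# Assumed image extensions; check the API documentation for the supported list.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


@dataclass
class TrainingConfig:
    steps: int
    training_mode: str
    learning_rate: float = 1e-4  # the documented 0.0001 default
    default_caption: Optional[str] = None

    def __post_init__(self) -> None:
        if self.training_mode not in TRAINING_MODES:
            raise ValueError(f"training_mode must be one of {sorted(TRAINING_MODES)}")
        if self.learning_rate <= 0:
            raise ValueError("learning_rate must be positive")
        if not 100 <= self.steps <= 10_000 or self.steps % 100:
            raise ValueError("steps must be a multiple of 100 in [100, 10000]")


def resolve_captions(files: Dict[str, str], default_caption: Optional[str]) -> Dict[str, str]:
    """Map each image path in an archive listing to its caption.

    `files` maps archive paths to contents (only .txt contents are read).
    A caption file shares the image's root name: cat_01.jpg -> cat_01.txt.
    """
    captions = {}
    for path in files:
        p = PurePosixPath(path)
        if p.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip caption files and anything that is not an image
        caption_path = str(p.with_suffix(".txt"))
        if caption_path in files:
            captions[path] = files[caption_path].strip()
        elif default_caption is not None:
            captions[path] = default_caption  # dataset-wide fallback
        else:
            raise ValueError(f"no caption for {path} and no default caption set")
    return captions


files = {
    "cat_01.jpg": "",
    "cat_01.txt": "a tabby cat in watercolor style\n",
    "cat_02.jpg": "",
}
config = TrainingConfig(steps=1000, training_mode="style", default_caption="a cat")
print(resolve_captions(files, config.default_caption))
# {'cat_01.jpg': 'a tabby cat in watercolor style', 'cat_02.jpg': 'a cat'}
```

Here `cat_01.jpg` uses its own caption file while `cat_02.jpg` falls back to the default caption, which is the behavior the caption-handling bullet describes.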
Technical Specifications
| Spec | Details |
|---|---|
| Architecture | Z-Image Turbo |
| Input Formats | ZIP archive (images + optional .txt captions per image) |
| Output Formats | Diffusers LoRA weights, JSON config file |
| Training Steps | 100-10,000 (configurable in 100-step increments) |
| License | Commercial use permitted |
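The input format in the table is a ZIP archive of images with optional per-image `.txt` captions. A minimal sketch of packaging such an archive with Python's standard `zipfile` module follows; the function name and the set of image extensions are assumptions, and how the archive is then uploaded or referenced is not covered here.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import List

# Assumed image extensions; check the API documentation for the supported list.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def build_dataset_zip(image_dir: Path, zip_path: Path) -> List[str]:
    """Zip every image plus its matching ROOT.txt caption, if present.

    Returns the archive member names in the order they were written.
    """
    names = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for image in sorted(image_dir.iterdir()):
            if image.suffix.lower() not in IMAGE_SUFFIXES:
                continue  # captions are added alongside their image below
            zf.write(image, arcname=image.name)
            names.append(image.name)
            caption = image.with_suffix(".txt")
            if caption.exists():
                zf.write(caption, arcname=caption.name)
                names.append(caption.name)
    return names


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "images"
    src.mkdir()
    (src / "cat_01.png").write_bytes(b"placeholder image bytes")
    (src / "cat_01.txt").write_text("a tabby cat")
    (src / "cat_02.png").write_bytes(b"placeholder image bytes")
    names = build_dataset_zip(src, Path(tmp) / "dataset.zip")
    print(names)  # ['cat_01.png', 'cat_01.txt', 'cat_02.png']
```

Images without a caption file (here `cat_02.png`) are still included; they would rely on the dataset-wide default caption.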
How It Stacks Up
Z Image Text to Image ($0.039/image) – Z Image Trainer produces reusable LoRA weights that keep a style consistent across any number of later generations, trading an upfront training cost ($2.26 per 1,000 steps) for consistent, reusable style at inference time. The base Z Image inference endpoint handles one-off generations where no custom style encoding is required.
AuraFlow Text to Image ($0.055/image) – Z Image Trainer prioritizes speed-optimized training on a 6B parameter base for rapid iteration cycles. AuraFlow targets maximum output quality through a larger architecture, ideal for final production renders where training time investment isn't a constraint.