Z Image Trainer Training

fal-ai/z-image-trainer
Train LoRAs on Z-Image Turbo, a fast 6B-parameter text-to-image model developed by Tongyi-MAI.

Training costs $2.26 per 1,000 steps and scales linearly with step count, so a 2,000-step run costs $4.52. The minimum is 100 steps, which costs $0.226.

Z-Image Turbo LoRA trainer | [text-to-image]

The Z-Image Turbo LoRA trainer fine-tunes Tongyi-MAI's 6B-parameter Z-Image Turbo model at $2.26 per 1,000 training steps. It trades generalist breadth for specialized precision: the endpoint encodes a specific visual style or content pattern into reusable LoRA weights. It is built for teams that need repeatable brand aesthetics, content creators maintaining visual consistency, and developers deploying style-controlled image generation at scale.

Use Cases: Brand-consistent image generation | Custom style deployment | Production visual workflows


Performance

Training cost scales linearly with step count, making iterative experimentation economically viable compared to training larger foundation models from scratch.

Metric | Result | Context
Base Model Size | 6B parameters | Z-Image Turbo foundation optimized for speed
Training Cost | $2.26 per 1,000 steps | Scales linearly: 2,000 steps = $4.52, 5,000 steps = $11.30
Minimum Training | 100 steps ($0.226) | Enables rapid prototyping iterations
Step Range | 100-10,000 steps | Configurable in 100-step increments
Related Endpoints | Z Image Text to Image, Z Image with LoRA | Base inference and LoRA-enhanced generation variants
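
The linear pricing above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the service: the function name `training_cost_usd` is hypothetical, and the range and increment checks simply mirror the limits listed in the table.

```python
from decimal import Decimal

# Figures taken from the pricing table above.
PRICE_PER_1000_STEPS = Decimal("2.26")
MIN_STEPS = 100
MAX_STEPS = 10_000
STEP_INCREMENT = 100


def training_cost_usd(steps: int) -> Decimal:
    """Estimate the cost in USD of a training run (hypothetical helper)."""
    if not MIN_STEPS <= steps <= MAX_STEPS:
        raise ValueError(f"steps must be between {MIN_STEPS} and {MAX_STEPS}")
    if steps % STEP_INCREMENT != 0:
        raise ValueError(f"steps must be a multiple of {STEP_INCREMENT}")
    # Decimal avoids binary floating-point rounding in the price.
    return PRICE_PER_1000_STEPS * steps / 1000


if __name__ == "__main__":
    for steps in (100, 1000, 2000, 5000):
        print(f"{steps} steps: ${training_cost_usd(steps):.3f}")
```

Using `Decimal` rather than `float` keeps the results exact to the cent, which matters when summing the cost of many short prototyping runs.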

Training Control That Adapts to Your Use Case

Z Image Trainer exposes three distinct training modes: content, style, and balanced. This lets you bias the LoRA toward subject matter preservation or artistic treatment depending on your application requirements.

What this means for you:

  • Flexible caption handling: Supply a caption for each image as a text file with the same base name (ROOT.txt for an image named ROOT), or fall back to a single default caption for the entire dataset, so uncaptioned datasets need no extra preprocessing
  • Configurable learning rate: Adjust the 0.0001 default to control training aggressiveness, balancing convergence speed against overfitting risk
  • Training mode selection: Choose content focus for subject consistency, style focus for artistic transfer, or balanced for general-purpose adaptation
  • Production-ready outputs: Receive diffusers-compatible LoRA weights and configuration files ready for immediate deployment in inference workflows
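
As an illustration of the caption convention above, the following sketch packages a folder of images and their matching caption files into a ZIP archive. It assumes the ROOT.txt convention means "same base name as the image"; the helper name `build_dataset_zip` and the list of image extensions are assumptions, since the text does not state which image formats the trainer accepts.

```python
import zipfile
from pathlib import Path

# Assumption: the accepted image types are not listed in the text above.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def build_dataset_zip(image_dir: Path, zip_path: Path) -> list[str]:
    """Zip each image with its ROOT.txt caption, if one exists.

    Returns the base names of images that have no caption file, i.e. the
    ones that would rely on the default caption.
    """
    uncaptioned = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for image in sorted(image_dir.iterdir()):
            if image.suffix.lower() not in IMAGE_SUFFIXES:
                continue  # skip captions, notes, and other non-image files
            archive.write(image, arcname=image.name)
            caption = image.with_suffix(".txt")
            if caption.is_file():
                archive.write(caption, arcname=caption.name)
            else:
                uncaptioned.append(image.stem)
    return uncaptioned
```

Returning the uncaptioned base names makes it easy to decide, before uploading, whether a default caption is needed at all.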

Technical Specifications

Spec | Details
Architecture | Z-Image Turbo
Input Formats | ZIP archive (images + optional .txt captions per image)
Output Formats | Diffusers LoRA weights, JSON config file
Training Steps | 100-10,000 (configurable in 100-step increments)
License | Commercial use permitted
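
To show how the settings described on this page fit together, here is a hedged sketch that validates them and assembles a request payload. Every field name (`image_data_url`, `steps`, `learning_rate`, `training_mode`, `default_caption`) and the 1,000-step default are assumptions made for illustration; the endpoint's API documentation is the authority on the actual schema.

```python
from typing import Optional

# Values stated on this page.
TRAINING_MODES = {"content", "style", "balanced"}
DEFAULT_LEARNING_RATE = 1e-4
MIN_STEPS, MAX_STEPS, STEP_INCREMENT = 100, 10_000, 100


def build_training_arguments(
    dataset_url: str,
    steps: int = 1000,  # assumption: the page does not state a default
    learning_rate: float = DEFAULT_LEARNING_RATE,
    training_mode: str = "balanced",
    default_caption: Optional[str] = None,
) -> dict:
    """Validate settings and return a payload dict (field names are hypothetical)."""
    if not MIN_STEPS <= steps <= MAX_STEPS or steps % STEP_INCREMENT != 0:
        raise ValueError("steps must be 100-10,000 in 100-step increments")
    if learning_rate <= 0:
        raise ValueError("learning_rate must be positive")
    if training_mode not in TRAINING_MODES:
        raise ValueError(f"training_mode must be one of {sorted(TRAINING_MODES)}")
    arguments = {
        "image_data_url": dataset_url,  # URL of the uploaded ZIP archive
        "steps": steps,
        "learning_rate": learning_rate,
        "training_mode": training_mode,
    }
    if default_caption is not None:
        arguments["default_caption"] = default_caption
    return arguments
```

A payload built this way could then be submitted to `fal-ai/z-image-trainer` with whichever client you use. Validating locally first means a rejected request does not cost a round trip.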



How It Stacks Up

Z Image Text to Image ($0.039/image) – Z Image Trainer produces reusable LoRA weights for style consistency across unlimited generations, trading upfront training cost ($2.26 per 1,000 steps) for downstream inference efficiency. The base Z Image inference endpoint handles one-off generations where custom style encoding isn't required.
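
To put the upfront training cost in perspective, the sketch below expresses a training run as the number of base-endpoint generations that would cost the same. It uses only the two prices quoted above; the function name is hypothetical. The per-image price of LoRA inference is not given here, so this is a rough scale comparison rather than a break-even calculation.

```python
# Prices quoted in the comparison above.
TRAINING_PRICE_PER_1000_STEPS = 2.26  # USD
BASE_PRICE_PER_IMAGE = 0.039          # USD, Z Image Text to Image


def equivalent_base_images(steps: int) -> float:
    """Number of base-endpoint images costing the same as a training run."""
    training_cost = TRAINING_PRICE_PER_1000_STEPS * steps / 1000
    return training_cost / BASE_PRICE_PER_IMAGE


if __name__ == "__main__":
    for steps in (100, 1000, 5000):
        print(f"{steps} steps cost about as much as "
              f"{equivalent_base_images(steps):.0f} base images")
```

By this measure, a 1,000-step run costs roughly the same as 58 base generations.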

AuraFlow Text to Image ($0.055/image) – Z Image Trainer prioritizes speed-optimized training on a 6B parameter base for rapid iteration cycles. AuraFlow targets maximum output quality through a larger architecture, ideal for final production renders where training time investment isn't a constraint.