Qwen Image 2512 Trainer V2 Training
Your request will cost $0.00095 per step (a minimum of 500 steps is charged). For $0.95 you can fine-tune a LoRA for 1000 steps.
Qwen-Image-2512 Trainer V2
Fine-tunes the Qwen-Image-2512 model using LoRA to teach it new subjects, objects, or styles from your images and captions.
Input Parameters
(required)
URL to a zip archive containing image + caption pairs. Each image should have a matching caption file with the same basename.
Supported formats: standard image formats such as JPG, PNG, and WebP.
Default:
Fallback caption used for images that have no caption file. If this is not set and a caption is missing, training fails.
Default:
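The expected archive layout (each image zipped alongside a caption file with the same basename) can be prepared with a short script. This is an illustrative sketch, not part of the service: the `.txt` caption extension, the image-extension filter, and the helper name are assumptions.

```python
import zipfile
from pathlib import Path

def build_dataset_zip(folder, out_path="dataset.zip"):
    """Zip every image in `folder` together with its same-basename caption file."""
    folder = Path(folder)
    with zipfile.ZipFile(out_path, "w") as zf:
        for img in sorted(folder.glob("*")):
            # Assumed image extensions; adjust to the formats the trainer accepts.
            if img.suffix.lower() not in {".jpg", ".jpeg", ".png", ".webp"}:
                continue
            zf.write(img, img.name)
            cap = img.with_suffix(".txt")   # caption convention assumed here
            if cap.exists():                # missing captions rely on the fallback
                zf.write(cap, cap.name)
    return out_path
```

Upload the resulting archive to any publicly reachable URL and pass that URL as the dataset parameter.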
| Dataset Size | Recommended Steps |
|---|---|
| 5-10 images | 500-1000 |
| 10-30 images | 1000-2000 |
| 30-100 images | 2000-4000 |
| 100+ images | 4000+ |
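Combining the table above with the pricing note ($0.00095 per step, 500-step minimum), a small helper can estimate a step budget and its cost. The function names are hypothetical; only the numbers come from this page.

```python
PRICE_PER_STEP = 0.00095   # pricing quoted above
MIN_BILLED_STEPS = 500     # minimum number of steps charged

def recommended_steps(num_images):
    """Step range from the table above, as (low, high); high is None for 100+ images."""
    if num_images <= 10:
        return (500, 1000)
    if num_images <= 30:
        return (1000, 2000)
    if num_images <= 100:
        return (2000, 4000)
    return (4000, None)

def training_cost(steps):
    """Cost in USD; at least MIN_BILLED_STEPS steps are billed."""
    return round(max(steps, MIN_BILLED_STEPS) * PRICE_PER_STEP, 6)
```

For example, a 20-image subject dataset maps to 1000-2000 steps, i.e. roughly $0.95 to $1.90.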
Default:
Use a lower value for slower, conservative learning, or a higher value for faster, aggressive learning.
How the Training Works
Image-caption pairing: Each image is paired with the caption file that shares its basename.
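The pairing and fallback behavior described above can be sketched as follows. This is an assumption-laden illustration: the `.txt` extension and function name are not from the service, but the fallback-or-fail logic mirrors the fallback-caption parameter documented earlier.

```python
from pathlib import Path

def resolve_caption(image_path, default_caption=None):
    """Return the caption paired with an image: a same-basename .txt file,
    else the fallback caption, else an error (the documented failure mode)."""
    cap = Path(image_path).with_suffix(".txt")
    if cap.exists():
        return cap.read_text(encoding="utf-8").strip()
    if default_caption is not None:
        return default_caption
    raise FileNotFoundError(f"no caption file for {image_path} and no fallback set")
```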
Aspect ratio bucketing: Images are assigned to the nearest bucket matching their aspect ratio, preserving natural proportions.
| Bucket (H×W) | AR (W:H) | Orientation |
|---|---|---|
| 1344×576 | 3:7 | Portrait |
| 1280×720 | 9:16 | Portrait |
| 1248×832 | 2:3 | Portrait |
| 1152×864 | 3:4 | Portrait |
| 1152×896 | 7:9 | Portrait |
| 1024×1024 | 1:1 | Square |
| 896×1152 | 9:7 | Landscape |
| 864×1152 | 4:3 | Landscape |
| 832×1248 | 3:2 | Landscape |
| 720×1280 | 16:9 | Landscape |
| 576×1344 | 7:3 | Landscape |
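Nearest-bucket assignment can be sketched by minimizing the distance between log aspect ratios, a common approach for this kind of bucketing; the trainer's exact selection rule is not documented here, so treat this as an approximation built from the table above.

```python
import math

# (height, width) pairs copied from the bucket table above.
BUCKETS = [(1344, 576), (1280, 720), (1248, 832), (1152, 864), (1152, 896),
           (1024, 1024), (896, 1152), (864, 1152), (832, 1248), (720, 1280),
           (576, 1344)]

def nearest_bucket(width, height):
    """Return the (H, W) bucket whose aspect ratio is closest to the image's."""
    img_log_ar = math.log(width / height)
    return min(BUCKETS, key=lambda hw: abs(math.log(hw[1] / hw[0]) - img_log_ar))
```

For example, a 3000×2000 photo (3:2) lands in the 832×1248 bucket, and a 1080×1920 phone screenshot (9:16) lands in 1280×720.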
Caption dropout: 5% of the time, captions are dropped during training to help the model generalize.
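The 5% caption dropout above amounts to replacing the caption with an empty string for a random 5% of training samples; a minimal sketch (function name assumed):

```python
import random

def training_caption(caption, dropout_prob=0.05, rng=random):
    # With probability dropout_prob, drop the caption entirely so the
    # model also learns from images without text conditioning.
    return "" if rng.random() < dropout_prob else caption
```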
Tips for Good Results
Dataset
- Size: 5-10 images minimum, 15-30 optimal for subjects, 20-50 for styles
- Quality: High resolution (1024px+), good lighting, no watermarks
- Variety: Different poses, angles, lighting, backgrounds
Captions
Write specific, descriptive captions with a consistent style:
- Good: a detailed description of the subject, setting, lighting, and style
- Bad: a single generic word or an empty caption
Tip: Try using image-captioning models such as Moondream-3 to generate captions automatically.
Trigger phrases: For subject training, include a unique trigger phrase in every caption, and use the same trigger in your prompts at inference time.
Troubleshooting
| Problem | Signs | Solution |
|---|---|---|
| Overfitting | Outputs match training images exactly, poor generalization | Reduce steps or learning rate; add more images |
| Underfitting | Outputs don't resemble training data | Increase steps or learning rate; improve captions |
| Caption errors | Training fails | Add caption files or set a fallback caption |