
Qwen Image 2512 Trainer V2

fal-ai/qwen-image-2512-trainer-v2
Fast LoRA trainer for Qwen-Image-2512
Training · Commercial use

Your request will cost $0.00095 per step (a minimum of 500 steps is charged). For $0.95 you can fine-tune a LoRA for 1000 steps.
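For back-of-the-envelope budgeting, the billing rule above reduces to a one-liner; the helper below is just a sketch of that arithmetic (the per-step rate and the 500-step minimum come from this page):

```python
def estimate_cost(steps: int, rate: float = 0.00095, min_billed: int = 500) -> float:
    """Estimated charge in USD: the per-step rate applies to at least
    `min_billed` steps, even for shorter runs."""
    return max(steps, min_billed) * rate

print(estimate_cost(1000))  # 0.95
print(estimate_cost(300))   # 0.475 -- billed as 500 steps
```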


Qwen-Image-2512 Trainer V2

Fine-tunes the Qwen-Image-2512 model using LoRA to teach it new subjects, objects, or styles from your images and captions.

Input Parameters

Training data (required)

URL to a zip archive containing image + caption pairs. Each image (e.g., photo1.jpg) should have a matching caption file with the same base name (e.g., photo1.txt).

Supported formats include standard image types such as JPG, JPEG, PNG, and WebP.
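As an illustration of the expected archive layout, here is a minimal sketch that zips a folder of image + caption pairs; the folder name is hypothetical, and the extension filter assumes common image types:

```python
import zipfile
from pathlib import Path

def build_dataset_zip(folder: str, out_path: str = "dataset.zip") -> str:
    """Zip image + caption pairs (e.g. photo1.jpg + photo1.txt) for upload."""
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(Path(folder).iterdir()):
            if path.suffix.lower() in {".jpg", ".jpeg", ".png", ".webp"}:
                zf.write(path, path.name)
                caption = path.with_suffix(".txt")
                if caption.exists():                 # optional if a default
                    zf.write(caption, caption.name)  # caption is set
    return out_path

build_dataset_zip("my_training_images")  # hypothetical folder
```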

Default caption (optional)

Fallback caption applied to images without a caption file. If this is not set and a caption is missing, training fails.

Steps

Number of training steps. Recommended ranges by dataset size:

Dataset Size     Recommended Steps
5-10 images      500-1000
10-30 images     1000-2000
30-100 images    2000-4000
100+ images      4000+
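The table maps directly onto a small lookup; the function below simply mirrors the rows above (the open-ended "4000+" row has no upper bound):

```python
def recommended_steps(num_images: int) -> tuple[int, int | None]:
    """Recommended training-step range by dataset size (mirrors the table)."""
    if num_images <= 10:
        return (500, 1000)
    if num_images <= 30:
        return (1000, 2000)
    if num_images <= 100:
        return (2000, 4000)
    return (4000, None)  # "4000+": no fixed upper bound

print(recommended_steps(25))  # (1000, 2000)
```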

Learning rate

Use a lower value for slower, more conservative learning and a higher value for faster, more aggressive learning.
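Putting the parameters together, a training request could look like the sketch below. `fal_client.subscribe` is the fal Python client's standard call, but the argument names are assumptions inferred from the parameter descriptions on this page; verify them against the model's API schema before use.

```python
import fal_client  # pip install fal-client; expects FAL_KEY in the environment

# NOTE: argument names are assumptions based on this page's parameter
# descriptions, not confirmed schema names.
result = fal_client.subscribe(
    "fal-ai/qwen-image-2512-trainer-v2",
    arguments={
        "images_data_url": "https://example.com/dataset.zip",  # hypothetical URL
        "steps": 1000,
        "learning_rate": 0.0005,       # hypothetical value
        "default_caption": "a photo",  # fallback for uncaptioned images
    },
)
print(result)
```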

How the Training Works

Image-caption pairing: Each image (e.g., photo1.jpg) is paired with the caption file that shares its base name (e.g., photo1.txt).

Aspect ratio bucketing: Images are assigned to the nearest bucket matching their aspect ratio, preserving natural proportions; a sketch of this assignment follows the table below.

Bucket (H×W)    AR      Orientation
1344×576        3:7     Portrait
1280×720        9:16    Portrait
1248×832        2:3     Portrait
1152×864        3:4     Portrait
1152×896        7:9     Portrait
1024×1024       1:1     Square
896×1152        9:7     Landscape
864×1152        4:3     Landscape
832×1248        3:2     Landscape
720×1280        16:9    Landscape
576×1344        7:3     Landscape
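A sketch of how nearest-bucket assignment might work, using the buckets from the table; the log-space distance metric is an assumption (the page only says images go to the nearest bucket by aspect ratio):

```python
import math

# (height, width) buckets from the table above
BUCKETS = [
    (1344, 576), (1280, 720), (1248, 832), (1152, 864), (1152, 896),
    (1024, 1024),
    (896, 1152), (864, 1152), (832, 1248), (720, 1280), (576, 1344),
]

def nearest_bucket(height: int, width: int) -> tuple[int, int]:
    """Pick the bucket whose W:H ratio is closest to the image's, comparing
    in log space so that, e.g., 2:1 and 1:2 are equally far from 1:1."""
    ar = width / height
    return min(BUCKETS, key=lambda b: abs(math.log(ar) - math.log(b[1] / b[0])))

print(nearest_bucket(1080, 1920))  # (720, 1280) -- the 16:9 landscape bucket
```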

Caption dropout: 5% of the time, captions are dropped during training to help the model generalize.
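Caption dropout itself is a one-line operation; a minimal sketch of the 5% behavior described above:

```python
import random

def maybe_drop_caption(caption: str, p: float = 0.05) -> str:
    """With probability p (5% per the description above), return an empty
    caption so the model also learns without text conditioning."""
    return "" if random.random() < p else caption
```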

Tips for Good Results

Dataset
  • Size: 5-10 images minimum, 15-30 optimal for subjects, 20-50 for styles
  • Quality: High resolution (1024px+), good lighting, no watermarks
  • Variety: Different poses, angles, lighting, backgrounds
Captions

Write specific, descriptive captions with consistent style:

  • Good: "a woman in a red leather jacket standing on a beach at sunset, soft warm light"
  • Bad: "a photo of a woman"

Tip: Try using an image captioning model like Moondream-3 to generate captions for your dataset.

Trigger phrases: For subject training, use a unique trigger phrase in all captions (e.g., a rare token such as "TOK"). Include the same trigger at inference time.
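One way to keep the trigger consistent is to prepend it to every caption file before zipping the dataset; the trigger string and folder name below are hypothetical:

```python
from pathlib import Path

TRIGGER = "TOK"  # hypothetical trigger token -- any rare, unique string works

for caption_file in Path("my_training_images").glob("*.txt"):
    text = caption_file.read_text().strip()
    if not text.startswith(TRIGGER):                  # avoid double-prepending
        caption_file.write_text(f"{TRIGGER}, {text}")
```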

Troubleshooting
Problem          Signs                                                        Solution
Overfitting      Outputs match training images exactly, poor generalization  Reduce steps or the learning rate, add more images
Underfitting     Outputs don't resemble training data                        Increase steps or the learning rate, improve captions
Caption errors   Training fails                                              Add caption files or set a default caption