Qwen Image Layered Trainer Developer Guide

Train custom LoRAs for the Qwen-Image-Layered model using structured zip archives containing base images with corresponding transparent layers. The API accepts learning rate, step count, and caption parameters, returning LoRA weights and config files.

Training Custom Layer Decomposition Models

Image layer decomposition separates a composite image into distinct, independently editable components while preserving visual coherence. Recent advances in diffusion-based generative models have enabled sophisticated approaches to this problem, with layered representations proving essential for precise content creation workflows [1]. The Qwen Image Layered Trainer on fal provides API access to train specialized LoRA weights for custom layer separation tasks.

The trainer produces LoRA (Low-Rank Adaptation) weights that modify how the Qwen-Image-Layered model performs decomposition. By training on domain-specific examples, developers can teach the model custom separation patterns for architectural elements, product isolation, or design component extraction. This guide covers the complete workflow from data preparation through inference integration.

Prerequisites

Before starting, ensure you have:

  • A fal API key from your dashboard
  • Training images in PNG or WebP format with transparency
  • Python 3.8+ with fal_client or Node.js with @fal-ai/client

Python Setup:

import fal_client
import os

os.environ['FAL_KEY'] = 'your-api-key-here'

JavaScript Setup:

import { fal } from "@fal-ai/client";

fal.config({
  credentials: "your-api-key-here",
});

For detailed authentication options, see the quickstart documentation.

Training Data Structure

The trainer requires a zip archive with specific naming conventions:

File Pattern | Purpose | Required
ROOT_start.EXT | Base composite image | Yes
ROOT_end.EXT | First decomposed layer | Yes
ROOT_end2.EXT through ROOT_end8.EXT | Additional layers (up to 8 total) | No
ROOT.txt | Caption describing the decomposition | No

All images within a group must share the same root name, every group must contain the same number of layers, and all images must be PNG or WebP to preserve alpha-channel transparency.
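
To assemble an archive in this shape, a small packaging script helps. The sketch below is a minimal example that assumes your files already follow the naming convention; build_training_zip and the ./product_layers folder are hypothetical names.

import zipfile
from pathlib import Path

# Minimal sketch: package pre-named training files into a flat zip.
# Assumes files already follow the ROOT_start/ROOT_end convention.
def build_training_zip(source_dir: str, output_path: str = "training-data.zip") -> None:
    with zipfile.ZipFile(output_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(Path(source_dir).glob("*")):
            if path.suffix.lower() in {".png", ".webp", ".txt"}:
                archive.write(path, arcname=path.name)

build_training_zip("./product_layers")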

Pre-flight Validation Checklist:

Before submitting a training job, verify your zip archive meets these requirements (the sketch after this list automates most of them):

  • All image groups have consistent layer counts
  • File names follow the exact ROOT_start/ROOT_end pattern
  • Images use PNG or WebP format only
  • Either caption files exist for each group or you provide a default_caption
  • Zip file is publicly accessible via URL
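
A local pre-flight script can catch most of these issues before upload. This is a minimal sketch of the checklist above; validate_training_zip is a hypothetical helper, and it cannot verify the final item (public URL accessibility):

import re
import zipfile
from collections import defaultdict
from typing import List

# Matches ROOT_start.EXT, ROOT_end.EXT, and ROOT_end2 through ROOT_end8.
LAYER_PATTERN = re.compile(r"^(?P<root>.+)_(?P<part>start|end[2-8]?)\.(png|webp)$", re.IGNORECASE)

def validate_training_zip(zip_path: str) -> List[str]:
    problems: List[str] = []
    groups = defaultdict(set)
    captions = set()
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            if name.lower().endswith(".txt"):
                captions.add(name[:-4])
                continue
            match = LAYER_PATTERN.match(name)
            if match:
                groups[match["root"]].add(match["part"].lower())
            else:
                problems.append(f"Unexpected file name or format: {name}")
    for root, parts in groups.items():
        if "start" not in parts:
            problems.append(f"Group '{root}' is missing {root}_start")
        if "end" not in parts:
            problems.append(f"Group '{root}' is missing {root}_end")
    layer_counts = {len(parts - {"start"}) for parts in groups.values()}
    if len(layer_counts) > 1:
        problems.append(f"Inconsistent layer counts across groups: {sorted(layer_counts)}")
    missing = set(groups) - captions
    if missing:
        problems.append(f"Groups without captions (set default_caption): {sorted(missing)}")
    return problems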

Python Integration

import fal_client
import time
from typing import Any, Dict, Optional

def train_layered_lora(
    data_url: str,
    learning_rate: float = 0.0001,
    steps: int = 1000,
    default_caption: Optional[str] = None
) -> Dict[str, Any]:

    arguments = {
        "image_data_url": data_url,
        "learning_rate": learning_rate,
        "steps": steps
    }

    if default_caption:
        arguments["default_caption"] = default_caption

    handler = fal_client.submit(
        "fal-ai/qwen-image-layered-trainer",
        arguments=arguments
    )

    print(f"Training job submitted: {handler.request_id}")

    # Poll while the job is queued or running; get() blocks for the
    # final result and raises if training fails.
    while isinstance(handler.status(), (fal_client.Queued, fal_client.InProgress)):
        time.sleep(30)
    return handler.get()

result = train_layered_lora(
    data_url="https://your-storage.com/training-data.zip",
    steps=1000,
    default_caption="Product with transparent background layers"
)

lora_url = result['diffusers_lora_file']['url']

JavaScript Integration

import { fal } from "@fal-ai/client";

async function trainLayeredLoRA(config) {
  const {
    dataUrl,
    learningRate = 0.0001,
    steps = 1000,
    defaultCaption = null,
  } = config;

  const input = {
    image_data_url: dataUrl,
    learning_rate: learningRate,
    steps: steps,
  };

  if (defaultCaption) {
    input.default_caption = defaultCaption;
  }

  const result = await fal.subscribe("fal-ai/qwen-image-layered-trainer", {
    input: input,
    logs: true,
    onQueueUpdate: (update) => {
      if (update.status === "IN_PROGRESS") {
        console.log("Training in progress...");
      }
    },
  });

  return {
    loraUrl: result.diffusers_lora_file.url,
    configUrl: result.config_file.url,
  };
}
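
Calling the helper mirrors the Python usage above; the URL shown is a placeholder:

const { loraUrl, configUrl } = await trainLayeredLoRA({
  dataUrl: "https://your-storage.com/training-data.zip",
  steps: 1000,
  defaultCaption: "Product with transparent background layers",
});
console.log("LoRA weights:", loraUrl);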

API Parameters

Parameter | Default | Range | Description
image_data_url | Required | URL | Publicly accessible zip archive containing training data
learning_rate | 0.0001 | 0.00005 to 0.0002 | Lower values produce conservative adaptations; higher values suit dramatic separation patterns
steps | 1000 | 100 to 10000 | Training iterations; more steps improve quality but increase time linearly
default_caption | None | String | Fallback description when individual .txt files are missing

For current pricing, check the model page directly as rates may change.

Using Your Trained LoRA

After training completes, apply your LoRA weights using the inference endpoint at fal-ai/qwen-image-layered/lora:

result = fal_client.subscribe(
    "fal-ai/qwen-image-layered/lora",
    arguments={
        "image_url": "https://your-image.png",
        "num_layers": 4,
        "loras": [{"path": lora_url}]
    }
)

layers = result['images']  # Array of decomposed RGBA layer images
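
To persist the decomposition, each returned layer can be downloaded by URL. A minimal sketch, assuming each entry in images carries a hosted url field like the trainer outputs shown above; verify the exact field names against the inference API's response schema:

import requests

for i, layer in enumerate(result["images"]):
    response = requests.get(layer["url"], timeout=60)
    response.raise_for_status()
    with open(f"layer_{i}.png", "wb") as f:
        f.write(response.content)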

The inference endpoint accepts up to 3 LoRAs simultaneously, which are merged to produce the final decomposition. See the inference API documentation for complete parameter details.

Error Handling

import fal_client
import logging
import time
from typing import Dict, Optional

def safe_train_lora(data_url: str, max_retries: int = 3, **kwargs) -> Optional[Dict]:
    for attempt in range(max_retries):
        try:
            return train_layered_lora(data_url, **kwargs)
        except fal_client.exceptions.ValidationError as e:
            logging.error(f"Invalid input: {e}")
            return None
        except fal_client.exceptions.RateLimitError:
            wait_time = 2 ** attempt * 60
            logging.warning(f"Rate limited, waiting {wait_time}s")
            time.sleep(wait_time)
        except Exception as e:
            if attempt == max_retries - 1:
                raise
            time.sleep(30)
    return None
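
Usage matches the unguarded helper; a None return signals a validation failure that retrying will not fix:

result = safe_train_lora(
    "https://your-storage.com/training-data.zip",
    steps=1000,
)
if result is None:
    logging.error("Training aborted; fix the dataset before resubmitting.")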

Common Errors:

Error | Cause | Solution
ValidationError | Malformed zip structure | Verify file naming follows ROOT_start/ROOT_end pattern
ValidationError | Missing captions | Add .txt files or provide default_caption
ValidationError | Unsupported format | Convert images to PNG or WebP
RateLimitError | Too many concurrent requests | Implement exponential backoff

For additional error patterns, see the FAQ documentation.

Production Considerations

Asynchronous Processing: For training jobs, avoid blocking on completion. Use the Queue API to submit jobs and webhooks to receive results:

handler = fal_client.submit(
    "fal-ai/qwen-image-layered-trainer",
    arguments=arguments,
    webhook_url="https://your-server.com/webhook"
)
# Handler returns immediately; results delivered to webhook
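
On the receiving side, a small endpoint can unpack the delivered result. A minimal sketch assuming FastAPI; the envelope fields shown (status, payload) follow fal's documented webhook shape, but verify them against the webhook documentation before relying on them:

from fastapi import FastAPI, Request

app = FastAPI()

@app.post("/webhook")
async def handle_training_result(request: Request):
    body = await request.json()
    if body.get("status") == "OK":
        lora_url = body["payload"]["diffusers_lora_file"]["url"]
        # Persist the weights or trigger downstream deployment here.
        print(f"Training complete: {lora_url}")
    else:
        print(f"Training failed: {body.get('error')}")
    return {"received": True}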

Storage: Trained LoRA weights are hosted on fal infrastructure and accessible via the returned URL. For production deployments requiring persistent storage, download and host the weights in your own infrastructure.

Layer Complexity: Models trained with 2 to 3 layers converge faster than those handling 6 to 8 layer decompositions. Start with simpler structures when your use case permits.

Dataset Size: Keep individual training groups under 50 images for optimal performance. Larger datasets should be distributed across multiple training runs. Training time scales linearly with step count, so a 2000-step job takes approximately twice as long as a 1000-step job.

Training Data Best Practices

Effective layer decomposition training depends on high-quality input data. The model learns decomposition patterns from the relationships between your base images and their corresponding layers.

Caption Strategy: Descriptive, task-oriented captions outperform content-specific descriptions. "Product photography with transparent background and shadow layer" teaches decomposition better than "red sneaker on white." When training across diverse content, use the default_caption parameter to provide consistent task framing.

Layer Consistency: Maintain consistent semantic meaning for each layer position across your training set. If _end.png represents the primary subject in one image group, it should represent the primary subject in all groups. This consistency helps the model learn predictable decomposition behavior.

Resolution Considerations: The model accepts various input resolutions. While higher resolutions preserve more detail, they also increase training time. A resolution of 768x768 provides reasonable quality for most use cases. Match your training resolution to your expected inference resolution for best results.

Validation Set: Consider holding out 10-15% of your data to evaluate trained LoRA quality before production deployment. Compare decomposition results against your held-out ground truth to assess whether additional training steps would improve output.
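
One simple way to score held-out examples is a per-layer pixel error between the model's output and your ground-truth layer files. A minimal sketch assuming Pillow and NumPy; RMSE is just one convenient signal, not an established benchmark metric for this model:

import numpy as np
from PIL import Image

def layer_rmse(pred_path: str, truth_path: str) -> float:
    # Compare RGBA pixels; lower values mean closer to ground truth.
    truth = Image.open(truth_path).convert("RGBA")
    pred = Image.open(pred_path).convert("RGBA").resize(truth.size)
    a = np.asarray(pred, dtype=np.float32)
    b = np.asarray(truth, dtype=np.float32)
    return float(np.sqrt(np.mean((a - b) ** 2)))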

References

  1. Yang, J., Liu, Q., Li, Y., et al. "Generative Image Layer Decomposition with Visual Effects." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025. https://openaccess.thecvf.com/content/CVPR2025/papers/Yang_Generative_Image_Layer_Decomposition_with_Visual_Effects_CVPR_2025_paper.pdf
