Qwen Image Layered Trainer Developer Guide

Train custom LoRAs for the Qwen-Image-Layered model using structured zip archives containing base images with corresponding transparent layers. The API accepts learning rate, step count, and caption parameters, returning LoRA weights and config files.

Training Custom Layer Decomposition Models

Image layer decomposition separates a composite image into distinct, independently editable components while preserving visual coherence. Recent advances in diffusion-based generative models have enabled sophisticated approaches to this problem, with layered representations proving essential for precise content creation workflows [1]. The Qwen Image Layered Trainer on fal provides API access to train specialized LoRA weights for custom layer separation tasks.

The trainer produces LoRA (Low-Rank Adaptation) weights that modify how the Qwen-Image-Layered model performs decomposition. By training on domain-specific examples, developers can teach the model custom separation patterns for architectural elements, product isolation, or design component extraction. This guide covers the complete workflow from data preparation through inference integration.

Prerequisites

Before starting, ensure you have:

  • A fal API key from your dashboard
  • Training images in PNG or WebP format with transparency
  • Python 3.8+ with fal_client or Node.js with @fal-ai/client

Python Setup:

import fal_client
import os

os.environ['FAL_KEY'] = 'your-api-key-here'

JavaScript Setup:

import { fal } from "@fal-ai/client";

fal.config({
  credentials: "your-api-key-here",
});

For detailed authentication options, see the quickstart documentation.

Training Data Structure

The trainer requires a zip archive with specific naming conventions:

File Pattern | Purpose | Required
ROOT_start.EXT | Base composite image | Yes
ROOT_end.EXT | First decomposed layer | Yes
ROOT_end2.EXT through ROOT_end8.EXT | Additional layers (up to 8 total) | No
ROOT.txt | Caption describing the decomposition | No

All images within a group must share the same root name, every group must contain the same number of layers, and all images must be PNG or WebP to preserve alpha-channel transparency.
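
To assemble an archive in this shape, a small packaging script helps. The sketch below is a minimal example that assumes your files already follow the naming convention; build_training_zip and the ./product_layers folder are hypothetical names.

import zipfile
from pathlib import Path

# Minimal sketch: package pre-named training files into a flat zip.
# Assumes files already follow the ROOT_start/ROOT_end convention.
def build_training_zip(source_dir: str, output_path: str = "training-data.zip") -> None:
    with zipfile.ZipFile(output_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(Path(source_dir).glob("*")):
            if path.suffix.lower() in {".png", ".webp", ".txt"}:
                archive.write(path, arcname=path.name)

build_training_zip("./product_layers")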

Pre-flight Validation Checklist:

Before submitting a training job, verify your zip archive meets these requirements (the sketch after this list automates most of them):

  • All image groups have consistent layer counts
  • File names follow the exact ROOT_start/ROOT_end pattern
  • Images use PNG or WebP format only
  • Either caption files exist for each group or you provide a default_caption
  • Zip file is publicly accessible via URL
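
A local pre-flight script can catch most of these issues before upload. This is a minimal sketch of the checklist above; validate_training_zip is a hypothetical helper, and it cannot verify the final item (public URL accessibility):

import re
import zipfile
from collections import defaultdict
from typing import List

# Matches ROOT_start.EXT, ROOT_end.EXT, and ROOT_end2 through ROOT_end8.
LAYER_PATTERN = re.compile(r"^(?P<root>.+)_(?P<part>start|end[2-8]?)\.(png|webp)$", re.IGNORECASE)

def validate_training_zip(zip_path: str) -> List[str]:
    problems: List[str] = []
    groups = defaultdict(set)
    captions = set()
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            if name.lower().endswith(".txt"):
                captions.add(name[:-4])
                continue
            match = LAYER_PATTERN.match(name)
            if match:
                groups[match["root"]].add(match["part"].lower())
            else:
                problems.append(f"Unexpected file name or format: {name}")
    for root, parts in groups.items():
        if "start" not in parts:
            problems.append(f"Group '{root}' is missing {root}_start")
        if "end" not in parts:
            problems.append(f"Group '{root}' is missing {root}_end")
    layer_counts = {len(parts - {"start"}) for parts in groups.values()}
    if len(layer_counts) > 1:
        problems.append(f"Inconsistent layer counts across groups: {sorted(layer_counts)}")
    missing = set(groups) - captions
    if missing:
        problems.append(f"Groups without captions (set default_caption): {sorted(missing)}")
    return problems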

Python Integration

import fal_client
import time
from typing import Any, Dict, Optional

def train_layered_lora(
    data_url: str,
    learning_rate: float = 0.0001,
    steps: int = 1000,
    default_caption: Optional[str] = None
) -> Dict[str, Any]:

    arguments = {
        "image_data_url": data_url,
        "learning_rate": learning_rate,
        "steps": steps
    }

    if default_caption:
        arguments["default_caption"] = default_caption

    handler = fal_client.submit(
        "fal-ai/qwen-image-layered-trainer",
        arguments=arguments
    )

    print(f"Training job submitted: {handler.request_id}")

    # Poll while the job is queued or running; get() blocks for the
    # final result and raises if training fails.
    while isinstance(handler.status(), (fal_client.Queued, fal_client.InProgress)):
        time.sleep(30)
    return handler.get()

result = train_layered_lora(
    data_url="https://your-storage.com/training-data.zip",
    steps=1000,
    default_caption="Product with transparent background layers"
)

lora_url = result['diffusers_lora_file']['url']

JavaScript Integration

import { fal } from "@fal-ai/client";

async function trainLayeredLoRA(config) {
  const {
    dataUrl,
    learningRate = 0.0001,
    steps = 1000,
    defaultCaption = null,
  } = config;

  const input = {
    image_data_url: dataUrl,
    learning_rate: learningRate,
    steps: steps,
  };

  if (defaultCaption) {
    input.default_caption = defaultCaption;
  }

  const result = await fal.subscribe("fal-ai/qwen-image-layered-trainer", {
    input: input,
    logs: true,
    onQueueUpdate: (update) => {
      if (update.status === "IN_PROGRESS") {
        console.log("Training in progress...");
      }
    },
  });

  return {
    loraUrl: result.diffusers_lora_file.url,
    configUrl: result.config_file.url,
  };
}
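
Calling the helper mirrors the Python usage above; the URL shown is a placeholder:

const { loraUrl, configUrl } = await trainLayeredLoRA({
  dataUrl: "https://your-storage.com/training-data.zip",
  steps: 1000,
  defaultCaption: "Product with transparent background layers",
});
console.log("LoRA weights:", loraUrl);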

API Parameters

Parameter | Default | Range | Description
image_data_url | Required | URL | Publicly accessible zip archive containing training data
learning_rate | 0.0001 | 0.00005 to 0.0002 | Lower values produce conservative adaptations; higher values suit dramatic separation patterns
steps | 1000 | 100 to 10000 | Training iterations; more steps improve quality but increase time linearly
default_caption | None | String | Fallback description when individual .txt files are missing

For current pricing, check the model page directly as rates may change.

Using Your Trained LoRA

After training completes, apply your LoRA weights using the inference endpoint at fal-ai/qwen-image-layered/lora:

result = fal_client.subscribe(
    "fal-ai/qwen-image-layered/lora",
    arguments={
        "image_url": "https://your-image.png",
        "num_layers": 4,
        "loras": [{"path": lora_url}]
    }
)

layers = result['images']  # Array of decomposed RGBA layer images
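
To persist the decomposition, each returned layer can be downloaded by URL. A minimal sketch, assuming each entry in images carries a hosted url field like the trainer outputs shown above; verify the exact field names against the inference API's response schema:

import requests

for i, layer in enumerate(result["images"]):
    response = requests.get(layer["url"], timeout=60)
    response.raise_for_status()
    with open(f"layer_{i}.png", "wb") as f:
        f.write(response.content)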

The inference endpoint accepts up to 3 LoRAs simultaneously, which are merged to produce the final decomposition. See the inference API documentation for complete parameter details.

Error Handling

import fal_client
import logging
import time
from typing import Dict, Optional

def safe_train_lora(data_url: str, max_retries: int = 3, **kwargs) -> Optional[Dict]:
    for attempt in range(max_retries):
        try:
            return train_layered_lora(data_url, **kwargs)
        except fal_client.exceptions.ValidationError as e:
            logging.error(f"Invalid input: {e}")
            return None
        except fal_client.exceptions.RateLimitError:
            wait_time = 2 ** attempt * 60
            logging.warning(f"Rate limited, waiting {wait_time}s")
            time.sleep(wait_time)
        except Exception as e:
            if attempt == max_retries - 1:
                raise
            time.sleep(30)
    return None
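
Usage matches the unguarded helper; a None return signals a validation failure that retrying will not fix:

result = safe_train_lora(
    "https://your-storage.com/training-data.zip",
    steps=1000,
)
if result is None:
    logging.error("Training aborted; fix the dataset before resubmitting.")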

Common Errors:

Error | Cause | Solution
ValidationError | Malformed zip structure | Verify file naming follows ROOT_start/ROOT_end pattern
ValidationError | Missing captions | Add .txt files or provide default_caption
ValidationError | Unsupported format | Convert images to PNG or WebP
RateLimitError | Too many concurrent requests | Implement exponential backoff

For additional error patterns, see the FAQ documentation.

Production Considerations

Asynchronous Processing: For training jobs, avoid blocking on completion. Use the Queue API to submit jobs and webhooks to receive results:

handler = fal_client.submit(
    "fal-ai/qwen-image-layered-trainer",
    arguments=arguments,
    webhook_url="https://your-server.com/webhook"
)
# Handler returns immediately; results delivered to webhook
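
On the receiving side, a small endpoint can unpack the delivered result. A minimal sketch assuming FastAPI; the envelope fields shown (status, payload) follow fal's documented webhook shape, but verify them against the webhook documentation before relying on them:

from fastapi import FastAPI, Request

app = FastAPI()

@app.post("/webhook")
async def handle_training_result(request: Request):
    body = await request.json()
    if body.get("status") == "OK":
        lora_url = body["payload"]["diffusers_lora_file"]["url"]
        # Persist the weights or trigger downstream deployment here.
        print(f"Training complete: {lora_url}")
    else:
        print(f"Training failed: {body.get('error')}")
    return {"received": True}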

Storage: Trained LoRA weights are hosted on fal infrastructure and accessible via the returned URL. For production deployments requiring persistent storage, download and host the weights in your own infrastructure.

Layer Complexity: Models trained with 2 to 3 layers converge faster than those handling 6 to 8 layer decompositions. Start with simpler structures when your use case permits.

Dataset Size: Keep individual training groups under 50 images for optimal performance. Larger datasets should be distributed across multiple training runs. Training time scales linearly with step count, so a 2000-step job takes approximately twice as long as a 1000-step job.

Training Data Best Practices

Effective layer decomposition training depends on high-quality input data. The model learns decomposition patterns from the relationships between your base images and their corresponding layers.

Caption Strategy: Descriptive, task-oriented captions outperform content-specific descriptions. "Product photography with transparent background and shadow layer" teaches decomposition better than "red sneaker on white." When training across diverse content, use the default_caption parameter to provide consistent task framing.

Layer Consistency: Maintain consistent semantic meaning for each layer position across your training set. If _end.png represents the primary subject in one image group, it should represent the primary subject in all groups. This consistency helps the model learn predictable decomposition behavior.

Resolution Considerations: The model accepts various input resolutions. While higher resolutions preserve more detail, they also increase training time. A resolution of 768x768 provides reasonable quality for most use cases. Match your training resolution to your expected inference resolution for best results.

Validation Set: Consider holding out 10-15% of your data to evaluate trained LoRA quality before production deployment. Compare decomposition results against your held-out ground truth to assess whether additional training steps would improve output.
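
One simple way to score held-out examples is a per-layer pixel error between the model's output and your ground-truth layer files. A minimal sketch assuming Pillow and NumPy; RMSE is just one convenient signal, not an established benchmark metric for this model:

import numpy as np
from PIL import Image

def layer_rmse(pred_path: str, truth_path: str) -> float:
    # Compare RGBA pixels; lower values mean closer to ground truth.
    truth = Image.open(truth_path).convert("RGBA")
    pred = Image.open(pred_path).convert("RGBA").resize(truth.size)
    a = np.asarray(pred, dtype=np.float32)
    b = np.asarray(truth, dtype=np.float32)
    return float(np.sqrt(np.mean((a - b) ** 2)))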

References

  1. Yang, J., Liu, Q., Li, Y., et al. "Generative Image Layer Decomposition with Visual Effects." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025. https://openaccess.thecvf.com/content/CVPR2025/papers/Yang_Generative_Image_Layer_Decomposition_with_Visual_Effects_CVPR_2025_paper.pdf
