Personalization Training (fine-tuning)

fal-ai/personalization
Train a model to generate images based on photos of a person.

1. Calling the API#

Install the client#

The client provides a convenient way to interact with the model API.

npm install --save @fal-ai/client

Setup your API Key#

Set FAL_KEY as an environment variable in your runtime.

export FAL_KEY="YOUR_API_KEY"

Submit a request#

The client handles the full submit protocol for you: it tracks request status updates and returns the result once the request completes.

import { fal } from "@fal-ai/client";

const result = await fal.subscribe("fal-ai/personalization", {
  input: {
    images_data_url: "",
    photo_class: "Man"
  },
  logs: true,
  onQueueUpdate: (update) => {
    if (update.status === "IN_PROGRESS") {
      update.logs.map((log) => log.message).forEach(console.log);
    }
  },
});
console.log(result.data);
console.log(result.requestId);

2. Authentication#

The API uses an API Key for authentication. We recommend setting the FAL_KEY environment variable in your runtime whenever possible.

API Key#

In case your app is running in an environment where you cannot set environment variables, you can set the API Key manually as a client configuration.

import { fal } from "@fal-ai/client";

fal.config({
  credentials: "YOUR_FAL_KEY"
});

3. Queue#

Submit a request#

The client API provides a convenient way to submit requests to the model.

import { fal } from "@fal-ai/client";

const { request_id } = await fal.queue.submit("fal-ai/personalization", {
  input: {
    images_data_url: "",
    photo_class: "Man"
  },
  webhookUrl: "https://optional.webhook.url/for/results",
});

Fetch request status#

You can fetch the status of a request to check if it is completed or still in progress.

import { fal } from "@fal-ai/client";

const status = await fal.queue.status("fal-ai/personalization", {
  requestId: "764cabcf-b745-4b3e-ae38-1200304cf45b",
  logs: true,
});
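
If you are not using webhooks, you can simply poll the status until the request completes. A minimal polling sketch, assuming the queue reports a COMPLETED status once the training run is done (the requestId is a placeholder):

import { fal } from "@fal-ai/client";

// Poll the queue status every 10 seconds until the request completes.
const requestId = "764cabcf-b745-4b3e-ae38-1200304cf45b";

let status = await fal.queue.status("fal-ai/personalization", { requestId });
while (status.status !== "COMPLETED") {
  await new Promise((resolve) => setTimeout(resolve, 10_000));
  status = await fal.queue.status("fal-ai/personalization", { requestId });
}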

Get the result#

Once the request is completed, you can fetch the result. See the Output Schema for the expected result format.

import { fal } from "@fal-ai/client";

const result = await fal.queue.result("fal-ai/personalization", {
  requestId: "764cabcf-b745-4b3e-ae38-1200304cf45b"
});
console.log(result.data);
console.log(result.requestId);

4. Files#

Some attributes in the API accept file URLs as input. Whenever that's the case, you can pass either your own URL or a Base64 data URI.

Data URI (base64)#

You can pass a Base64 data URI as a file input. The API will handle the file decoding for you. Keep in mind that, while convenient, this approach can impact request performance for large files.
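
For example, in a Node.js environment you can read a local archive and build the data URI yourself. A minimal sketch (the zip path is a placeholder):

import { readFile } from "node:fs/promises";

// Read a local zip archive and encode it as a Base64 data URI.
// Practical only for small archives; large payloads slow down requests.
const bytes = await readFile("./training-photos.zip");
const imagesDataUrl = `data:application/zip;base64,${bytes.toString("base64")}`;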

Hosted files (URL)#

You can also pass your own URLs, as long as they are publicly accessible. Be aware that some hosts may block cross-site requests, apply rate limits, or flag the request as bot traffic.

Uploading files#

We provide a convenient file storage that allows you to upload files and use them in your requests. You can upload files using the client API and use the returned URL in your requests.

import { fal } from "@fal-ai/client";

const file = new File(["Hello, World!"], "hello.txt", { type: "text/plain" });
const url = await fal.storage.upload(file);
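
For this model you would typically upload the zip archive of training photos and pass the returned URL as images_data_url. A sketch, assuming Node.js 20+ (which provides the global File) and a local training-photos.zip:

import { readFile } from "node:fs/promises";
import { fal } from "@fal-ai/client";

// Upload the training archive to fal storage, then reference the
// returned URL in the training request.
const bytes = await readFile("./training-photos.zip");
const file = new File([bytes], "training-photos.zip", { type: "application/zip" });
const imagesDataUrl = await fal.storage.upload(file);

const { request_id } = await fal.queue.submit("fal-ai/personalization", {
  input: {
    images_data_url: imagesDataUrl,
    photo_class: "Person",
  },
});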

Read more about file handling in our file upload guide.

5. Schema#

Input#

images_data_url string* required

URL to a zip archive with images of a consistent style. Use at least 10 images; more is better.

data_archive_format string

File format used to archive the training artifacts.

captions_file_url string

URL to a JSONL file with captions. Each line should contain a JSON object with a 'file_name' field that matches a file name in the images_data_url archive, and a 'text' field with the caption. The captions should contain TOK, TOK1, etc.

The file should contain lines that look like this:

{"file_name": "image1.jpg", "text": "In the style of TOK A picture of a cat."}
{"file_name": "image2.jpg", "text": "In the style of TOK A picture of a dog."}

If a captions file is not provided, captions will be generated with LLaVA, with TOK prepended to the start of each caption.
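
If you want full control over the captions, you can generate the JSONL file with a few lines of code before adding it to your upload. A minimal sketch, with placeholder file names and captions:

import { writeFile } from "node:fs/promises";

// Build captions.jsonl: one JSON object per line, each with a
// 'file_name' matching an image in the archive and a 'text' caption
// containing the TOK identifier.
const captions = [
  { file_name: "photo1.jpg", text: "A photograph of a TOK smiling outdoors." },
  { file_name: "photo2.jpg", text: "A photograph of a TOK wearing a hat." },
];

await writeFile("captions.jsonl", captions.map((c) => JSON.stringify(c)).join("\n"));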

caption_column string

The column in the captions file that contains the captions. Default value: "text"

instance_prompt string

The prompt to use for the instance. If not set, per-image captions are used instead. Default value: "A photograph of a TOK"

rank integer

Rank of the model. Default value: 32

model_url string

Path to a pretrained model or a model identifier from huggingface.co/models. Default value: "stabilityai/stable-diffusion-xl-base-1.0"

vae_url string

Path to a pretrained VAE model with better numerical stability. Default value: "madebyollin/sdxl-vae-fp16-fix"

revision string

Revision of the pretrained model identifier from huggingface.co/models. Default is None.

variant string

Variant of the model files of the pretrained model identifier from huggingface.co/models. Default is fp16.

token_abstraction string

Identifier specifying the instance. Default value: "TOK"

num_new_tokens_per_abstraction integer

Number of new tokens inserted into the tokenizers per token_abstraction identifier. Default value: 2

seed integer

A seed for reproducible training. Default value: 42

resolution_width integer

The width resolution for input images. Default value: 768

resolution_height integer

The height resolution for input images. Default value: 768

center_crop boolean

Whether to center crop input images. Default is False.

random_flip boolean

Whether to randomly flip images horizontally. Default is False.

train_text_encoder boolean

Whether to train the text encoder. Default is False, since textual inversion is used by default.

num_train_epochs integer

Number of training epochs. Default is None, in which case max_train_steps is used.

max_train_steps integer

Total number of training steps to perform. Default value: 1000

learning_rate float

Initial learning rate for the unet. Default value: 0.00008

text_encoder_lr float

Learning rate for the text encoder. Default value: 0.00008

lr_scheduler string

The learning rate scheduler type to use. Default value: "constant"

snr_gamma float

SNR weighting gamma for rebalancing the loss. Default value: 0.5

lr_warmup_steps integer

Number of warmup steps in the learning rate scheduler. Default value: 500

lr_num_cycles integer

Number of hard resets in the learning rate scheduler. Default value: 1

lr_power float

Power factor of the polynomial scheduler. Default value: 1

train_text_encoder_ti boolean

Whether to use textual inversion. Default value: true

train_text_encoder_ti_frac float

Fraction of epochs over which to perform textual inversion. Default value: 0.5

train_text_encoder_frac float

Fraction of epochs over which to perform text encoder tuning. Default value: 1

optimizer string

The optimizer type to use. Default value: "adamw"

adam_beta1 float

The beta1 parameter for the Adam optimizer. Default value: 0.9

adam_beta2 float

The beta2 parameter for the Adam optimizer. Default value: 0.999

prodigy_beta3 float

Beta3 coefficient for the Prodigy optimizer. Default is None.

prodigy_decouple boolean

Use AdamW-style decoupled weight decay. Default value: true

adam_weight_decay float

Weight decay for the unet parameters. Default value: 0.0001

adam_weight_decay_text_encoder float

Weight decay for the text encoder. Default value: 0.001

adam_epsilon float

Epsilon value for the optimizer. Default value: 1e-8

prodigy_use_bias_correction boolean

Use bias correction for the Prodigy optimizer. Default value: true

prodigy_safeguard_warmup boolean

Remove lr from the denominator of the D estimate for the Prodigy optimizer. Default value: true

batch_size integer

Batch size for training. Default value: 6

caption_dropout float

Fraction of captions to drop. Default is 0.0.

skip_caption_generation boolean

Whether to skip caption generation. This only applies if no captions file is provided. Default is False.

max_grad_norm float

Maximum gradient norm for clipping. Default value: 1

with_prior_preservation boolean

Whether to use prior preservation loss. Default value: true

prior_loss_weight float

Weight of the prior preservation loss. Default value: 1

photo_class PhotoClassEnum* required

The class of the photo. Default is Man.

Possible enum values: Man, Woman, Person, Boy, Girl, Baby
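
For reference, the enum fields in this schema correspond to plain string unions if you type your request payloads yourself; a sketch (these type aliases are illustrative, not exported by the client):

// Illustrative unions collecting the enum values documented in this schema.
type PhotoClass = "Man" | "Woman" | "Person" | "Boy" | "Girl" | "Baby";
type ClassImageMaskStyle = "none" | "normal" | "invert";
type LoraType = "lora" | "lokr" | "loha";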

target_unet_modules list<string>

The target modules for the unet. Default value: to_k,to_q,to_v,to_out.0,conv1,conv2

target_text_encoder_modules list<string>

The target modules for the text encoder. Default value: q_proj,k_proj,v_proj,out_proj

cache_latents boolean

Whether to cache latents. Default is False.

use_lora boolean

Whether to use LoRA. Default value: true

debug_dataset boolean

Whether to save a debug copy of the dataset. Default is False.

random_crop_offset_x integer

Random crop offset on the x axis. Default is 0.

random_crop_offset_y integer

Random crop offset on the y axis. Default is 0.

clip_seg_mask_prompt string

Prompt used to compute the CLIPSeg segmentation mask. Default value: "head"

class_image_mask_style ClassImageMaskStyleEnum

The style of the mask for the class image. Default value: "normal"

Possible enum values: none, normal, invert

clip_seg_mask_temperature float

Temperature for the CLIPSeg segmentation mask. Default value: 1

clip_seg_mask_bias float

Bias for the CLIPSeg segmentation mask. Default value: 0.001

random_rotation_start integer

Start of the random rotation range. Default value: -2

random_rotation_end integer

End of the random rotation range. Default value: 2

use_dora boolean

Whether to use DoRA. Default value: true

lora_type LoraTypeEnum

The type of LoRA to use. Default value: "lora"

Possible enum values: lora, lokr, loha

face_aware_cropping boolean

Whether to use face-aware cropping. Default value: true

noise_offset float

Noise offset. Default is 0.0.

gradient_accumulation_steps integer

Number of gradient accumulation steps. Default value: 1

upscale_vae_32bit boolean

Whether to upscale the VAE to 32-bit. Default value: true

debug_loss_masks boolean

Whether to save debug loss masks. Default is False.

max_timestep_trained integer

Maximum timestep trained. Default value: 1000

min_timestep_trained integer

Minimum timestep trained. Default is 0.

disable_unet_during_ti_training boolean

Whether to disable the unet during textual inversion training. Default is False.

{
  "images_data_url": "",
  "caption_column": "text",
  "instance_prompt": "A photograph of a TOK",
  "rank": 32,
  "model_url": "stabilityai/stable-diffusion-xl-base-1.0",
  "vae_url": "madebyollin/sdxl-vae-fp16-fix",
  "token_abstraction": "TOK",
  "num_new_tokens_per_abstraction": 2,
  "seed": 42,
  "resolution_width": 768,
  "resolution_height": 768,
  "max_train_steps": 1000,
  "learning_rate": 0.00008,
  "text_encoder_lr": 0.00008,
  "lr_scheduler": "constant",
  "snr_gamma": 0.5,
  "lr_warmup_steps": 500,
  "lr_num_cycles": 1,
  "lr_power": 1,
  "train_text_encoder_ti": true,
  "train_text_encoder_ti_frac": 0.5,
  "train_text_encoder_frac": 1,
  "optimizer": "adamw",
  "adam_beta1": 0.9,
  "adam_beta2": 0.999,
  "prodigy_decouple": true,
  "adam_weight_decay": 0.0001,
  "adam_weight_decay_text_encoder": 0.001,
  "adam_epsilon": 1e-8,
  "prodigy_use_bias_correction": true,
  "prodigy_safeguard_warmup": true,
  "batch_size": 6,
  "max_grad_norm": 1,
  "with_prior_preservation": true,
  "prior_loss_weight": 1,
  "photo_class": "Man",
  "target_unet_modules": [
    "to_k",
    "to_q",
    "to_v",
    "to_out.0",
    "conv1",
    "conv2"
  ],
  "target_text_encoder_modules": [
    "q_proj",
    "k_proj",
    "v_proj",
    "out_proj"
  ],
  "use_lora": true,
  "clip_seg_mask_prompt": "head",
  "class_image_mask_style": "normal",
  "clip_seg_mask_temperature": 1,
  "clip_seg_mask_bias": 0.001,
  "random_rotation_start": -2,
  "random_rotation_end": 2,
  "use_dora": true,
  "lora_type": "lora",
  "face_aware_cropping": true,
  "gradient_accumulation_steps": 1,
  "upscale_vae_32bit": true,
  "max_timestep_trained": 1000
}
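
Putting the schema together: only images_data_url and photo_class are required, and every other field falls back to the defaults listed above. A sketch of a request that overrides a few common parameters (the archive URL is a placeholder):

import { fal } from "@fal-ai/client";

// Submit a training run, overriding a handful of documented inputs
// and leaving the rest at their defaults.
const result = await fal.subscribe("fal-ai/personalization", {
  input: {
    images_data_url: "https://example.com/training-photos.zip",
    photo_class: "Woman",
    instance_prompt: "A photograph of a TOK",
    max_train_steps: 1500,
    seed: 42,
  },
  logs: true,
});
console.log(result.data);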

Output#

diffusers_lora_file File

URL to the trained diffusers LoRA weights.

kohya_lora_file File

URL to the trained Kohya LoRA weights.

unet_file File

URL to the trained unet weights.

text_encoder_1_file File

URL to the trained weights of the first text encoder.

text_encoder_2_file File

URL to the trained weights of the second text encoder.

embeddings_file File

URL to the trained text embeddings, if textual inversion training was enabled.

config_file File* required

URL to the training configuration file.

debug_dataset File

URL to the debug dataset.

debug_masks File

URL to the debug masks.

{
  "config_file": {
    "url": "",
    "content_type": "image/png",
    "file_name": "z9RV14K95DvU.png",
    "file_size": 4404019
  }
}
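
Each output file object exposes a url you can download directly. A sketch of saving the trained diffusers LoRA weights in Node.js (the local file name is illustrative, and diffusers_lora_file may be absent depending on the training configuration):

import { writeFile } from "node:fs/promises";
import { fal } from "@fal-ai/client";

// Fetch the completed result and download the LoRA weights.
const result = await fal.queue.result("fal-ai/personalization", {
  requestId: "764cabcf-b745-4b3e-ae38-1200304cf45b",
});

const { url } = result.data.diffusers_lora_file;
const response = await fetch(url);
await writeFile("lora-weights.safetensors", Buffer.from(await response.arrayBuffer()));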

Other types#

File#

url string* required

The URL where the file can be downloaded from.

content_type string

The mime type of the file.

file_name string

The name of the file. It will be auto-generated if not provided.

file_size integer

The size of the file in bytes.

file_data string

File data