How to Use FLUX AI: Prompts, Settings, and Practical Tips for Better Images

FLUX wants natural language prompts with the subject first, doesn't support negative prompts, and rewards camera specifications and descriptive lighting for photorealistic results.

last updated: 2/28/2026
edited by: John Ozuysal
read time: 22 minutes

This guide covers everything you need to actually use FLUX well: how to pick the right model version, how to write prompts that produce what you're imagining, which settings matter, and the mistakes that quietly wreck your outputs.

Picking the Right FLUX Model First

Before you start thinking about the perfect prompt, you'll have to choose the right model. After all, FLUX isn't one model: it's a "family" with very different speed, quality, and cost trade-offs.

FLUX.1 [schnell] is the fast one. It generates images in 1-4 inference steps, which makes it great for prototyping, real-time apps, and situations where you're iterating quickly and don't need maximum fidelity. On fal, it costs $0.003 per megapixel.

FLUX.1 [dev] is the workhorse. Most community LoRAs target this model. It needs 20-50 inference steps for quality output, but the results are a clear step up from schnell. On fal, it runs at $0.025 per megapixel.

FLUX 1.1 [pro] Ultra is the high-resolution flagship. It generates up to 4-megapixel images natively, with a "Raw" mode that produces candid, less-processed photography aesthetics. API-only. $0.06 per image on fal.

FLUX.2 [pro] is the latest generation. It's built on a 32-billion-parameter rectified flow transformer with a new Mistral Small 3.2 (24B) vision-language model replacing the T5 text encoder used in FLUX.1. $0.03 for the first megapixel, $0.015 per additional megapixel on fal.

FLUX.2 [dev] is the open-weight version of FLUX.2. Same 32B architecture, same JSON prompting and HEX color features, but designed as the foundation for custom LoRA training and fine-tuning. Unlike [pro], it exposes inference steps and guidance scale parameters for developer control.

FLUX Kontext is for editing. Feed it an existing image with text instructions, and it modifies the image while preserving its identity. Character consistency, style transfer, text changes, outfit swaps. $0.04 per image for the pro tier on fal and $0.08/image for [max].

FLUX.2 Klein is the compact model family for real-time use. It comes in two sizes: a 4B version and a 9B version, each available as Base and Distilled variants. The 4B model fits in roughly 13GB of VRAM and runs on consumer GPUs like the RTX 3090 or RTX 4070.

FLUX Model Comparison

| Model | Best For | Price on fal | License |
|---|---|---|---|
| FLUX.1 [schnell] | Fast prototyping, real-time apps, quick iteration | $0.003/megapixel | Apache 2.0 |
| FLUX.1 [dev] | Community LoRAs, fine-tuning, balanced quality | $0.025/megapixel | Non-commercial (commercial license available) |
| FLUX 1.1 [pro] Ultra | High-resolution native output up to 4MP | $0.06/image | API-only, commercial |
| FLUX.2 [pro] | Production output with zero configuration | $0.03 first MP, $0.015/additional MP | API-only, commercial |
| FLUX.2 [dev] | Fine-tuning, custom LoRAs, developer control | $0.012/megapixel | Non-commercial (commercial license available) |
| FLUX Kontext [pro] | Image editing, character consistency, style transfer | $0.04/image | API-only, commercial |
| FLUX.2 Klein 4B | Real-time apps, edge deployment, consumer GPUs | $0.009/megapixel of input and output (input resized to 1MP) | Apache 2.0 |

Here's a quick decision framework:

  • Need speed? Use schnell or Klein 4B.
  • Need quality with fine-tuning support? Use FLUX.1 [dev] or FLUX.2 [dev].
  • Need production output with zero configuration? Use FLUX.2 [pro].
  • Need to edit existing images? Use Kontext.
  • Need the highest native resolution? Use 1.1 [pro] Ultra.

How FLUX reads your prompts (and why it's different)

If you're coming from Stable Diffusion, this is the single biggest adjustment.

FLUX.1 uses a dual text encoder system. One is CLIP, which handles image-text alignment. The other is T5-XXL, an 11-billion-parameter language model (FLUX uses just the encoder portion, roughly 4.6B parameters) that actually reads and understands your prompt as a sentence.

FLUX.2 swaps both encoders for Mistral Small 3.2, a 24B vision-language model that's even better at interpreting complex descriptions. The smaller FLUX.2 Klein variants use Qwen3 encoders instead.

What this means in practice: FLUX doesn't want keyword lists. It wants natural language.

Here's the same idea expressed both ways:

Keyword-style (Stable Diffusion habit): woman, red dress, beach, sunset, bokeh, 8k, masterpiece, best quality

Natural language (what FLUX actually wants): A woman in a red silk dress standing barefoot on a sandy beach at sunset, warm golden light behind her, shallow depth of field with soft bokeh across the water

The natural language version tells FLUX what's happening in the scene. It gives the model spatial relationships, material details, and lighting context that keyword lists can't convey.

And forget about Stable Diffusion weight syntax. Constructions like (emphasis)++ or (word:1.5) do nothing in FLUX. The model ignores them entirely.

How to structure a FLUX prompt

FLUX weighs earlier tokens more heavily than later ones, so prompt structure matters: put the most important information first.

Here's the hierarchy that seems to work best:

  1. Subject first. What is the image of? Start here, every time.
  2. Then action or pose. What is the subject doing?
  3. Then the environment. Where is this happening?
  4. Then lighting. How is the scene lit?
  5. Then style and technical specs. What camera, what look, and what mood?

Here's that hierarchy in a real prompt:

Portrait of a middle-aged marathon runner catching his breath, sweat on his forehead, city street at dawn with empty storefronts behind him, soft backlight with cool blue tones, shot on Sony A7IV with 85mm f/1.8 lens

Generated using FLUX 1.1 [pro] ultra on fal.

If you bury the subject at the end of a long description, FLUX may deprioritize it. This is the most common structural mistake we've seen new users make.
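
To keep that ordering consistent across many generations, you can assemble prompts programmatically. This is a minimal sketch; the function name and argument names are illustrative, not part of any SDK.

```python
# Hypothetical helper: assembles a FLUX prompt in the recommended order
# (subject -> action -> environment -> lighting -> style).
def build_flux_prompt(subject, action="", environment="", lighting="", style=""):
    parts = [subject, action, environment, lighting, style]
    return ", ".join(p for p in parts if p)  # skip any empty slots

prompt = build_flux_prompt(
    subject="Portrait of a middle-aged marathon runner",
    action="catching his breath, sweat on his forehead",
    environment="city street at dawn with empty storefronts behind him",
    lighting="soft backlight with cool blue tones",
    style="shot on Sony A7IV with 85mm f/1.8 lens",
)
```

Because the subject argument always comes first, it always lands at the start of the string, where FLUX weighs it most heavily.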

Pro tip for using FLUX 1.1 [pro] Ultra on fal: turn on "enhance prompt" to have your prompt automatically rewritten for better results.

FLUX doesn't use negative prompts

This trips up a lot of people who are used to Stable Diffusion workflows. FLUX's architecture is guidance-distilled. It doesn't support negative prompts.

If you try to pass a negative_prompt parameter through the standard FluxPipeline, it'll throw an error.

Instead of telling the model what not to do, tell it what you want.

Don't do this: negative_prompt: "blurry, low quality, bad hands, deformed"

Do this instead: sharp focus, crisp detail, accurate hands, natural proportions

Positive phrasing works better anyway. You're giving the model a target to hit instead of a list of things to dodge.
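
If you're porting Stable Diffusion workflows, a small lookup table can translate habitual negative terms into positive targets. A sketch, with an illustrative (and deliberately incomplete) mapping:

```python
# Illustrative mapping from common negative-prompt terms to the
# positive phrasing FLUX responds to. Extend as needed.
POSITIVE_EQUIVALENTS = {
    "blurry": "sharp focus",
    "low quality": "crisp detail",
    "bad hands": "accurate hands",
    "deformed": "natural proportions",
}

def positives_for(negative_prompt):
    # Split a comma-separated negative prompt and swap in positive targets;
    # unknown terms pass through unchanged for manual review.
    terms = [t.strip() for t in negative_prompt.split(",")]
    return ", ".join(POSITIVE_EQUIVALENTS.get(t, t) for t in terms)
```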

Using FLUX in day-to-day operations: best practices

Prompt length: the sweet spot

The T5 encoder in FLUX.1 [dev] supports up to 512 tokens. FLUX.1 [schnell] is capped at 256 tokens. But longer doesn't mean better in either case.

40-50 words is the sweet spot for most use cases.

Very short prompts (under 10 words) get expanded internally by the model, which means FLUX fills in details from its training data.

Very long prompts (200 words and up) will get internally summarized, which means parts of your description may get compressed or dropped.
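
A quick word-count check catches both failure modes before you spend credits. This is a rough heuristic only: tokens aren't exactly words, and the thresholds are the ones described above, not hard API limits.

```python
def prompt_length_ok(prompt, low=10, high=200):
    # Under ~10 words FLUX fills in details from training data;
    # 200+ words risk internal summarization. Word count is a
    # rough proxy for tokens.
    n = len(prompt.split())
    return low <= n < high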

Rendering text in FLUX images

Text rendering is one of FLUX's strongest capabilities. Almost no other open-weight model comes close to its accuracy here, and FLUX.2 improves on it further.

To get the best results:

Always use quotation marks around the exact text you want. Example: A chalkboard sign that says "TODAY'S SPECIAL: LAVENDER LATTE"

Specify font characteristics separately. Example: Bold serif font in dark green, centered on a cream-colored banner

ALL CAPS in the prompt produces ALL CAPS in the image. If you write "HELLO" in the prompt, you'll get "HELLO" in the output. If you write "hello" you'll get "hello."

Shorter text renders more accurately. Single words and short phrases (2-5 words) are highly reliable. Full sentences can work, but the error rate goes up with length.

Clean backgrounds help enormously. Text over busy, detailed backgrounds is harder for any model to render cleanly. If text accuracy matters, simplify the area behind it.

Place text explicitly in your description. Example: At the top of the poster, "SUMMER SALE" in large block letters. Below it, "50% OFF" in smaller italic font. Telling FLUX where to put text gives you much more control than just mentioning it somewhere in the prompt.
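
The quoting and placement rules above can be bundled into a tiny formatter. The function and argument names here are our own shorthand, not an official API.

```python
def text_block(placement, exact_text, font_desc):
    # Quotation marks tell FLUX to render the text verbatim;
    # font characteristics are described separately, as recommended.
    return f'{placement}, "{exact_text}" in {font_desc}'

poster = ". ".join([
    text_block("At the top of the poster", "SUMMER SALE", "large block letters"),
    text_block("Below it", "50% OFF", "smaller italic font"),
])
```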

Camera specifications unlock photorealism

This is one of the most effective techniques in FLUX prompting, and a lot of people skip it. Naming specific camera equipment shapes the entire character of the output.

Shot on Canon EOS R5, 85mm lens at f/2.8 tells FLUX to produce a look consistent with that camera system: shallow depth of field, specific rendering of skin tones, a particular quality of background blur.

Shot on iPhone 16 produces a completely different character. More casual, natural, candid. The kind of image you'd see in someone's camera roll.

Hasselblad X2D, medium format, natural light pushes toward editorial quality. Larger sensor rendering, finer tonal gradations, that medium-format feel.

You don't need to be a photographer to use this. Just pick a camera system that matches the feeling you want, and FLUX will interpret it.

Some combinations that work well:

  • For professional portraits: Canon EOS R5, 85mm f/1.4, studio lighting
  • For street photography: Fujifilm X-T5, 23mm f/2, available light
  • For product shots: Phase One IQ4, 120mm macro, softbox lighting
  • For casual content: iPhone 16, natural light, candid angle
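
If you reuse these combinations often, a preset table keeps them consistent across a project. The preset names below are our own shorthand:

```python
# Illustrative camera presets mapped to the combinations listed above.
CAMERA_PRESETS = {
    "portrait": "Canon EOS R5, 85mm f/1.4, studio lighting",
    "street": "Fujifilm X-T5, 23mm f/2, available light",
    "product": "Phase One IQ4, 120mm macro, softbox lighting",
    "casual": "iPhone 16, natural light, candid angle",
}

def with_camera(prompt, preset):
    # Appends the camera spec; the subject still leads the prompt.
    return f"{prompt}, shot on {CAMERA_PRESETS[preset]}"
```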

Describe how light behaves, not just what it is

There's a big difference between naming a lighting condition and describing how light interacts with the scene.

Don't just write golden hour lighting.

Instead: warm golden sunset light streaming through the window, casting long shadows across the hardwood floor, with dust particles visible in the light beam

The second version tells FLUX what the light does in the space. It gives the model information about direction, quality, interaction with surfaces, and atmosphere.

A few more examples:

  • Harsh noon sun creating deep contrast shadows under the awnings is better than bright sunlight.
  • Soft overcast light wrapping evenly around the subject's face is better than diffused lighting.
  • Neon signs reflecting off wet asphalt in pinks and blues is better than neon lights at night.

The more you describe light as something that moves through and interacts with the scene, the more realistic and intentional your results will look.

How to handle JSON structured prompts in FLUX.2

FLUX.2 introduced JSON-formatted prompts that give you granular control over complex compositions. Instead of writing everything as a single block of natural language, you can structure the prompt as a JSON object:

{
  "scene": "A dimly lit jazz club in 1960s New York",
  "subjects": [
    {
      "type": "musician",
      "description": "African American man in a charcoal suit, salt-and-pepper beard",
      "pose": "playing an upright bass with closed eyes",
      "position": "foreground left"
    }
  ],
  "style": "Cinematic film photography",
  "color_palette": ["#2C1810", "#D4A574", "#8B4513"],
  "lighting": "Single warm spotlight from above, smoke in the air catching the light",
  "mood": "Intimate and contemplative",
  "composition": "rule of thirds",
  "camera": {
    "angle": "slightly low angle",
    "distance": "medium shot",
    "lens": "50mm"
  }
}

JSON excels at scenes with multiple subjects, precise positioning, and situations where you need attributes to stay attached to the right elements. For single-subject images, natural language prompts usually work just as well.
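
Rather than hand-writing JSON, you can build the structure as a Python dict and serialize it, which guarantees valid syntax. Whether a given endpoint accepts the serialized JSON directly as its prompt string is endpoint-specific; check your provider's docs.

```python
import json

# Build the structured prompt as a dict, then serialize it.
scene = {
    "scene": "A dimly lit jazz club in 1960s New York",
    "subjects": [
        {"type": "musician", "pose": "playing an upright bass", "position": "foreground left"}
    ],
    "lighting": "Single warm spotlight from above",
    "color_palette": ["#2C1810", "#D4A574"],
}
prompt = json.dumps(scene, indent=2)  # pass this string as the prompt
```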

You can also specify exact colors using HEX codes. Include the word "color" or "hex" before the code:

A wall painted in color #2ECC71

The car in color #1A1A1A with accents in hex #FFD700

This is useful for brand assets where color precision matters.
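
A simple validation step prevents a malformed code from silently degrading the color match. A sketch:

```python
import re

# Six-digit HEX code, e.g. "#2ECC71".
HEX_RE = re.compile(r"^#[0-9A-Fa-f]{6}$")

def color_phrase(noun, hex_code):
    # FLUX.2 expects the word "color" or "hex" before the code.
    if not HEX_RE.match(hex_code):
        raise ValueError(f"not a 6-digit hex code: {hex_code}")
    return f"{noun} in color {hex_code}"
```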

How to reference images in prompts (FLUX.2)

FLUX.2 supports up to 10 reference images in a single generation. You can reference them directly in the prompt using @ notation or ordinal indexing:

@image1 wearing the outfit from @image2

Combine the style of @image1 with the composition of @image3

The person from image 1 standing in the setting from image 2

This opens up workflows like: generate a character once, then place them in different scenes while maintaining consistency. Or take a product photo and apply it to different backgrounds without ControlNet setups.

Note that some platforms impose lower limits on reference images. The model supports 10, but check your provider's documentation for any caps.
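
A request payload pairing @ references with uploaded image URLs might look like the sketch below. The `image_urls` parameter name is an assumption; verify it against the schema of the FLUX.2 endpoint you're actually calling.

```python
# Hypothetical payload: the "image_urls" field is an assumption, not a
# confirmed parameter name -- check the endpoint schema on fal first.
arguments = {
    "prompt": "@image1 wearing the outfit from @image2",
    "image_urls": [
        "https://example.com/character.png",
        "https://example.com/outfit.png",
    ],
}
assert len(arguments["image_urls"]) <= 10  # model-level cap
```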

Example prompts that work

Here are tested prompts across different use cases. You can try them on fal's playground to see the results firsthand.

Photorealistic portrait: Close-up portrait of an elderly Japanese woman with deep smile lines, silver hair pulled back loosely, wearing an indigo linen shirt, soft window light from the left, shot on Fujifilm GFX100S, 110mm f/2, shallow depth of field

Generated using FLUX 1.1 [pro] ultra on fal.

Product photography: A matte black ceramic coffee mug on a light oak table, steam rising from fresh coffee, morning sunlight streaming from the right, clean minimal background, shot on Phase One IQ4, 120mm macro lens, f/8

Generated using FLUX.2 [pro] on fal.

Text rendering: A weathered wooden sign mounted on a brick wall that reads "OPEN DAILY" in hand-painted white capital letters, with "est. 1987" in smaller cursive below, afternoon sunlight, urban setting

Generated using FLUX.2 [pro] on fal.

Illustration style: A cozy bookshop interior in the style of Studio Ghibli, warm afternoon light filtering through tall windows, wooden shelves overflowing with books, a calico cat sleeping on a stack of novels, watercolor texture with soft edges

Generated using FLUX.1 [dev] on fal.

Architectural visualization: Modern minimalist house with floor-to-ceiling glass walls overlooking a calm lake at dusk, warm interior lighting contrasting with cool blue twilight outside, concrete and natural wood materials, shot from a low angle on a wide 24mm lens

Generated using FLUX.1 Kontext [max] on fal.

Marketing asset with text: A gradient background transitioning from color #1A1A2E to color #16213E, with "LAUNCH DAY" in large bold white sans-serif text centered in the upper third, and "March 15, 2026" in smaller light gray text below

Generated using FLUX.1 [schnell] on fal.

Editing images with FLUX Kontext

FLUX isn't just for generating images from scratch. FLUX.1 Kontext [pro] is a 12-billion-parameter multimodal flow transformer built specifically for in-context image editing. You feed it an existing image along with a text instruction, and it modifies the image while keeping everything else intact.

The basic pattern: provide an image_url and a prompt describing the edit.

result = fal_client.subscribe(
    "fal-ai/flux-pro/kontext",
    arguments={
        "prompt": "Change the background to a tropical beach at sunset",
        "image_url": "https://your-image-url.com/photo.jpg",
    },
)

The model reads both the image context and your text instruction, so edits tend to feel natural rather than pasted-on.

Here are some practical editing prompts that work well:

  • Changing backgrounds: Place the subject in a modern office with floor-to-ceiling windows overlooking a city skyline
  • Swapping outfits: Change the person's clothing to a navy blue suit with a white shirt and no tie
  • Editing text in images: Change "SALE" to "SOLD OUT" on the storefront sign
  • Style transfer: Transform this photo into a pencil sketch on textured paper with soft shading
  • Adding objects: Add a steaming cup of coffee on the table next to the laptop

Generated using FLUX.1 Kontext [pro] on fal.

Kontext [pro] costs $0.04 per image on fal; the [dev] variant runs at $0.025 per megapixel.

Run FLUX through fal's API

The fastest way to start generating with FLUX is through fal's API. No GPU provisioning, no model loading, no cold starts on popular endpoints.

Install the client library:

# Python
pip install fal-client

# JavaScript
npm install @fal-ai/client

Set your API key (grab one from fal.ai/dashboard/keys):

export FAL_KEY="your-key-here"

Generate an image with FLUX.1 [dev]:

import fal_client

result = fal_client.subscribe(
    "fal-ai/flux/dev",
    arguments={
        "prompt": "Close-up of a weathered leather journal on a dark "
                  "wooden desk, fountain pen beside it, warm lamplight "
                  "from the left, shot on Leica Q3, 28mm, f/1.7",
        "image_size": "landscape_4_3",
        "num_inference_steps": 28,
        "guidance_scale": 3.5,
    },
)
print(result["images"][0]["url"])

Or in JavaScript:

import { fal } from "@fal-ai/client";

const result = await fal.subscribe("fal-ai/flux/dev", {
  input: {
    prompt:
      "Close-up of a weathered leather journal on a dark " +
      "wooden desk, fountain pen beside it, warm lamplight " +
      "from the left, shot on Leica Q3, 28mm, f/1.7",
    image_size: "landscape_4_3",
    num_inference_steps: 28,
    guidance_scale: 3.5,
  },
});
console.log(result.data.images[0].url);

The subscribe method handles queuing automatically and returns the result when it's ready.

For production workflows where you don't want to hold a connection open, use the queue-based approach:

// Submit
const { request_id } = await fal.queue.submit("fal-ai/flux/dev", {
  input: {
    prompt: "your prompt here",
  },
  webhookUrl: "https://your-app.com/webhook",
});

// Check status
const status = await fal.queue.status("fal-ai/flux/dev", {
  requestId: request_id,
});

// Get result
const result = await fal.queue.result("fal-ai/flux/dev", {
  requestId: request_id,
});

For FLUX.2 [pro], the call is simpler because there's no guidance scale or step count to set:

result = fal_client.subscribe(
    "fal-ai/flux-2-pro",
    arguments={
        "prompt": "A cinematic aerial view of terraced rice paddies "
                  "at sunrise, morning mist filling the valleys between "
                  "emerald green tiers, warm golden light, drone shot",
        "image_size": "landscape_16_9",
    },
)

Every FLUX endpoint on fal also has an interactive playground where you can test prompts before writing code. Browse fal's model gallery and search for the model you want.

about the author
John Ozuysal
Founder of House of Growth. 2x entrepreneur, 1x exit, mentor at 500, Plug and Play, and Techstars.