GLM-Image combines a 9B-parameter autoregressive generator with a 7B-parameter diffusion decoder to produce images with accurate text rendering. Access it through fal at $0.15 per megapixel with support for text-to-image, style transfer, and multi-subject consistency.
What GLM-Image Does Differently
GLM-Image generates images with accurate embedded text, a capability where pure diffusion models consistently struggle. The model combines a 9B-parameter autoregressive generator (initialized from GLM-4-9B) with a 7B-parameter diffusion decoder using a single-stream DiT architecture. The autoregressive component handles semantic understanding and text layout, while the diffusion decoder synthesizes visual details.
This hybrid approach addresses a documented limitation in text-to-image generation. Diffusion models face challenges with the long-range spatial dependencies required for coherent text rendering [1]. GLM-Image's architecture overcomes this by using the autoregressive component for global composition before passing to the diffusion decoder for visual refinement. The practical result: you can request specific text in images, such as poster headlines or product labels, and receive accurate outputs without extensive prompt engineering.
API Setup and Authentication
Install the fal client library. Note that @fal-ai/serverless-client has been deprecated in favor of @fal-ai/client.
For Python:
```bash
pip install fal-client
```
For JavaScript/TypeScript:
```bash
npm install --save @fal-ai/client
```
Configure authentication through environment variables:
```bash
export FAL_KEY="your-api-key-here"
```
The GLM-Image endpoints are fal-ai/glm-image for text-to-image and fal-ai/glm-image/image-to-image for image editing workflows.
Basic Integration
Here is a minimal Python implementation:
```python
import fal_client

result = fal_client.subscribe(
    "fal-ai/glm-image",
    arguments={
        "prompt": "A conference poster with the headline 'AI Summit 2026' in bold sans-serif",
        "image_size": "landscape_16_9",
        "num_inference_steps": 30
    }
)

print(result['images'][0]['url'])
```
For JavaScript:
```javascript
import { fal } from "@fal-ai/client";

const result = await fal.subscribe("fal-ai/glm-image", {
  input: {
    prompt:
      "A conference poster with the headline 'AI Summit 2026' in bold sans-serif",
    image_size: "landscape_16_9",
    num_inference_steps: 30,
  },
});

console.log(result.data.images[0].url);
```
The subscribe method handles job submission, polling, and result retrieval. Generation time varies based on resolution and inference steps.
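Under the hood, subscribe submits the job to fal's queue and polls until it completes. A minimal, dependency-injected sketch of that polling loop (the status strings, timing, and function names here are illustrative; the real client manages all of this for you):

```python
import time

def wait_for_completion(get_status, timeout=120.0, interval=1.0):
    """Poll a status callable until the job finishes or the deadline passes.

    get_status is any zero-argument callable returning a status string;
    terminal states are assumed to be "COMPLETED" or "FAILED".
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status in ("COMPLETED", "FAILED"):
            return status
        time.sleep(interval)
    raise TimeoutError("generation did not finish in time")
```

Separating the polling policy from the status source makes the loop easy to test and to tune (longer intervals for large batches, shorter for interactive use).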
Request Parameters
The following parameters are documented in fal's API schema:
| Parameter | Type | Default | Description |
|---|---|---|---|
| prompt | string | required | Text description; supports specific text rendering requests |
| image_size | enum or object | square_hd | Preset sizes or custom {width, height} with values 512-2048, divisible by 32 |
| num_inference_steps | integer | 30 | Denoising steps; range 10-100 |
| guidance_scale | float | 1.5 | Prompt adherence; higher values follow prompts more literally |
| seed | integer | random | Fixed seed ensures reproducible outputs |
| num_images | integer | 1 | Batch size (1-4) |
| enable_safety_checker | boolean | true | NSFW content filtering |
| output_format | enum | jpeg | jpeg or png |
| sync_mode | boolean | false | Returns base64 data URI instead of URL |
| enable_prompt_expansion | boolean | false | LLM enhancement of prompts |
Preset image sizes include:
- square_hd, square (1:1)
- landscape_16_9, landscape_4_3, landscape_3_2, landscape_hd
- portrait_16_9, portrait_4_3, portrait_3_2, portrait_hd
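For custom sizes, the table above requires dimensions between 512 and 2048, divisible by 32. A small helper to clamp and snap arbitrary dimensions to valid values (a sketch based on those constraints; the function names are ours, not part of the API):

```python
def snap_dimension(value, lo=512, hi=2048, step=32):
    """Clamp to [lo, hi] and round to the nearest multiple of step."""
    clamped = max(lo, min(hi, value))
    return round(clamped / step) * step

def custom_image_size(width, height):
    """Build a {width, height} object accepted by image_size."""
    return {"width": snap_dimension(width), "height": snap_dimension(height)}
```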
The guidance_scale default of 1.5 is lower than typical diffusion models because the autoregressive component provides semantic guidance. The underlying model uses sampling with high temperature (approximately 0.9-0.95 depending on implementation), meaning outputs vary across runs even with identical prompts unless you specify a seed.
Image-to-Image Workflows
For editing and style transfer, use the fal-ai/glm-image/image-to-image endpoint with image_urls:
```python
result = fal_client.subscribe(
    "fal-ai/glm-image/image-to-image",
    arguments={
        "prompt": "Apply watercolor style while preserving the subject",
        "image_urls": ["https://example.com/photo.jpg"],
        "num_inference_steps": 35
    }
)
```
The endpoint accepts up to 4 reference images. The first image typically serves as the primary subject, with additional images providing style or compositional guidance.
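A small helper that assembles multi-reference arguments following that convention (a sketch: the 4-image cap and first-image-as-subject ordering come from the description above; the helper name is illustrative):

```python
def build_edit_arguments(prompt, subject_url, style_urls=()):
    """Assemble image-to-image arguments: the subject image first,
    then up to three style/composition references (4 images total)."""
    urls = [subject_url, *style_urls]
    if len(urls) > 4:
        raise ValueError("the endpoint accepts at most 4 reference images")
    return {"prompt": prompt, "image_urls": urls}
```

The resulting dict can be passed directly as the arguments parameter of the subscribe call above.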
Error Handling
Production applications should handle three failure categories:
- Validation errors (invalid parameters): Do not retry
- Content policy violations (safety checker triggered): Do not retry
- Transient errors (network, capacity): Retry with backoff
```python
import time

import fal_client

def generate_with_retry(prompt, max_retries=3):
    for attempt in range(max_retries):
        try:
            return fal_client.subscribe(
                "fal-ai/glm-image",
                arguments={"prompt": prompt}
            )
        except Exception:
            # In production, inspect the error type here: validation and
            # content-policy failures should be re-raised immediately
            # rather than retried.
            if attempt < max_retries - 1:
                time.sleep(2 ** attempt)  # exponential backoff: 1s, 2s, 4s
            else:
                raise
```
For production systems, implement proper exception handling based on fal's client SDK error types and log failures to your monitoring system.
Response Handling
GLM-Image returns structured responses:
```json
{
  "images": [
    {
      "url": "https://fal.media/files/...",
      "width": 1280,
      "height": 720,
      "content_type": "image/jpeg"
    }
  ],
  "seed": 42,
  "has_nsfw_concepts": [false],
  "prompt": "Original prompt text"
}
```
Image URLs expire after 24 hours. Download and store images immediately after generation:
```python
import requests

def save_generated_image(result, filepath):
    image_url = result['images'][0]['url']
    response = requests.get(image_url)
    response.raise_for_status()  # fail loudly on expired or broken URLs
    with open(filepath, 'wb') as f:
        f.write(response.content)
```
Cost and Performance
GLM-Image pricing on fal is $0.15 per megapixel of output. A 1280x720 image (approximately 0.92 megapixels) costs roughly $0.14.
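At $0.15 per megapixel, per-image cost is straightforward to estimate before you generate (a sketch based on the pricing above; actual billing follows fal's invoice):

```python
PRICE_PER_MEGAPIXEL = 0.15  # USD, per the pricing above

def estimate_cost(width, height, num_images=1):
    """Estimate generation cost from output dimensions in pixels."""
    megapixels = (width * height) / 1_000_000
    return megapixels * PRICE_PER_MEGAPIXEL * num_images
```

For the 1280x720 example: 921,600 pixels is 0.9216 megapixels, or about $0.14 per image.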
Optimization approaches:
- Use batch generation with num_images up to 4 rather than separate API calls
- Choose appropriate image sizes for your use case; smaller outputs cost less
- Set sync_mode to true for latency-sensitive applications to receive base64 data directly
- Implement caching with prompt and seed combinations for reproducible outputs
- Use webhooks for async workflows to eliminate polling overhead
The fal platform handles rate limiting through automatic request queuing.
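The caching suggestion above can be sketched as a thin wrapper keyed on (prompt, seed); the generate callable is injected, so any client call can sit behind it (names are illustrative):

```python
def cached_generate(generate, cache, prompt, seed):
    """Return a cached result for (prompt, seed) if present; otherwise
    call generate and memoize. Only seeded requests are deterministic,
    so unseeded calls should bypass a cache like this."""
    key = (prompt, seed)
    if key not in cache:
        cache[key] = generate(prompt, seed)
    return cache[key]
```

In production the dict would typically be replaced by Redis or another shared store, matching the queue-based patterns below.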
Production Architecture
For high-volume deployments, consider these patterns:
- Queue-based processing: Buffer generation requests through Redis or similar to handle traffic spikes without overwhelming your application
- Webhook callbacks: Configure fal webhooks for completion notifications instead of polling, which reduces overhead and connection hold times
- Monitoring: Track error rates and generation times; sudden increases in content policy violations may indicate prompt injection attempts
- Fallback strategies: Implement graceful degradation when generation fails, whether showing cached alternatives or providing clear retry options
When to Use GLM-Image
GLM-Image excels in specific scenarios where text accuracy matters:
- Marketing materials requiring legible headlines, taglines or CTAs
- Educational diagrams with labels and annotations
- Product mockups showing packaging text or UI elements
- Social media graphics with integrated typography
- Presentation slides combining visuals with text elements
For general image generation without text requirements, other models may offer faster generation or different aesthetic qualities. GLM-Image's strength is the combination of visual quality with reliable text rendering, a capability that has historically required post-processing or compositing workflows.
The model also supports identity-preserving generation and multi-subject consistency across images, making it suitable for creating series of related visuals that maintain character or style coherence.
Autoregressive approaches can achieve competitive image generation quality when properly scaled [2]. GLM-Image applies this principle with a hybrid architecture optimized for text-rendering accuracy, addressing a documented weakness in pure diffusion approaches.
The complete API documentation at docs.fal.ai covers additional options including queue management, file uploads, and client-side integration patterns.
References
1. Zhang, C. et al. "Text-to-image Diffusion Models in Generative AI: A Survey." arXiv:2303.07909, 2023. https://arxiv.org/abs/2303.07909
2. Sun, P. et al. "Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation." arXiv:2406.06525, 2024. https://arxiv.org/abs/2406.06525
