OmniGen v1 Text to Image

fal-ai/omnigen-v1
OmniGen is a unified image generation model that can generate a wide range of images from multi-modal prompts. It can be used for tasks such as Image Editing, Personalized Image Generation, Virtual Try-On, Multi-Person Generation, and more.
Inference | Commercial use

OmniGen v1 | [text/image-to-image]

OmniGen is a unified image generation model that handles multi-modal prompts across image editing, personalized generation, virtual try-on, and multi-person composition at $0.10 per processed megapixel. Trading specialized single-task architectures for unified multi-modal flexibility, it consolidates workflows that typically require switching between separate models. This matters for production teams managing complex image generation pipelines where reference images, editing constraints, and generation need to work together seamlessly.

Use Cases: Image Editing | Personalized Image Generation | Virtual Try-On


Performance

OmniGen's unified architecture processes up to 4 megapixels with multi-image input support at $0.10 per processed megapixel.

| Metric | Result | Context |
|---|---|---|
| Resolution Support | Up to 4 megapixels (2048×2048) | Multiple preset sizes including square_hd |
| Multi-Image Input | Multiple reference images per prompt | Via `<\|image_1\|>` syntax in prompts |
| Inference Steps | 1-50 steps (default 50) | Configurable quality/speed tradeoff |
| Cost per Megapixel | $0.10 | ~10 one-megapixel generations per $1.00 |
| Batch Generation | 1-4 images per request | Parallel output with shared inference cost |
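
As a rough illustration of the pricing model, the sketch below estimates request cost assuming billing is simply $0.10 multiplied by total output megapixels across the batch; actual per-request rounding isn't specified here, so treat the numbers as approximate.

```python
# Rough cost estimate, assuming cost = $0.10 x total output megapixels
# summed over all images in the batch (billing/rounding details may differ).
PRICE_PER_MEGAPIXEL = 0.10

def estimate_cost(width: int, height: int, num_images: int = 1) -> float:
    megapixels = (width * height) / 1_000_000
    return PRICE_PER_MEGAPIXEL * megapixels * num_images

print(f"1024x1024, 1 image:  ${estimate_cost(1024, 1024):.2f}")     # ~$0.10
print(f"2048x2048, 4 images: ${estimate_cost(2048, 2048, 4):.2f}")  # ~$1.68
```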

Multi-Modal Prompt Architecture

OmniGen consolidates image editing, personalization, and generation into a single unified model that accepts text prompts with embedded image references. Where traditional workflows require separate models for inpainting, style transfer, and generation, OmniGen processes multi-modal inputs through a shared architecture with dual guidance controls.

What this means for you:

  • Reference Image Integration: Embed multiple images directly in text prompts using `<|image_1|>` syntax, with no separate preprocessing pipeline required for personalization or style transfer workflows (see the request sketch after this list)

  • Dual Guidance Control: Independent text guidance (0-20 CFG) and image guidance (0-20) scales let you balance prompt adherence against reference image influence for precise creative control

  • Task-Agnostic Pipeline: Handle virtual try-on, multi-person generation, and targeted editing through the same endpoint—consolidate API calls and simplify production infrastructure

  • Flexible Output Configuration: Generate 1-4 images per request with deterministic seeding, configurable inference steps (1-50), and JPEG/PNG output format selection
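
The sketch below puts these pieces together using the fal Python client (`fal_client`). Only `input_image_urls`, `output_format`, the `<|image_1|>` prompt tokens, and the preset sizes are confirmed by this page; the remaining parameter names (`image_size`, `guidance_scale`, `img_guidance_scale`, `num_inference_steps`, `num_images`, `seed`), the example URLs, and the response shape are assumptions to verify against the API documentation.

```python
# Minimal OmniGen v1 request sketch via the fal Python client.
# Parameter names other than input_image_urls / output_format are assumed;
# confirm them against the endpoint's API reference.
import fal_client

result = fal_client.subscribe(
    "fal-ai/omnigen-v1",
    arguments={
        # Reference images are embedded in the prompt as <|image_N|> tokens
        # and supplied as URLs in the same order (hypothetical URLs below).
        "prompt": "The person in <|image_1|> wearing the jacket in <|image_2|>, studio lighting",
        "input_image_urls": [
            "https://example.com/person.jpg",
            "https://example.com/jacket.jpg",
        ],
        "image_size": "portrait_4_3",   # preset from the spec table (assumed name)
        "guidance_scale": 3.0,          # text guidance, 0-20 (assumed name)
        "img_guidance_scale": 1.6,      # image guidance, 0-20 (assumed name)
        "num_inference_steps": 50,      # 1-50, default 50 (assumed name)
        "num_images": 2,                # 1-4 images per request (assumed name)
        "seed": 42,                     # deterministic seeding (assumed name)
        "output_format": "png",         # jpeg or png
    },
)

# Typical fal response shape (assumed): a list of image objects with URLs.
for image in result.get("images", []):
    print(image["url"])
```

Raising `guidance_scale` shifts the balance toward the text prompt, while `img_guidance_scale` does the same for the reference images; the values above are illustrative starting points, not documented defaults.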


Technical Specifications

| Spec | Details |
|---|---|
| Architecture | OmniGen |
| Input Formats | Text prompts, multiple image URLs via `input_image_urls` parameter |
| Output Formats | JPEG, PNG (configurable via `output_format`) |
| Resolution Options | square_hd, square, portrait_4_3, portrait_16_9, landscape_4_3, landscape_16_9 (up to 4MP) |
| License | Commercial use enabled |
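
If reference images live on local disk rather than at public URLs, the fal Python client can host them first. The sketch below assumes `fal_client.upload_file(path)` returns a hosted URL suitable for `input_image_urls`; verify the helper and its return value against the client documentation.

```python
# Sketch: turn local reference images into URLs for input_image_urls.
# Assumes fal_client.upload_file(path) -> hosted URL (verify against client docs).
import fal_client

person_url = fal_client.upload_file("person.jpg")
jacket_url = fal_client.upload_file("jacket.jpg")

result = fal_client.subscribe(
    "fal-ai/omnigen-v1",
    arguments={
        "prompt": "The person in <|image_1|> trying on the outfit in <|image_2|>",
        "input_image_urls": [person_url, jacket_url],
        "output_format": "jpeg",
    },
)
print(result)
```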

API Documentation | Quickstart Guide | Enterprise Pricing


How It Stacks Up

AuraFlow Text to Image ($0.055 per image) – OmniGen prioritizes multi-modal prompt integration and task flexibility at $0.10 per processed megapixel versus AuraFlow's single-task text-to-image focus. AuraFlow delivers faster inference for straightforward text-to-image generation where reference image integration isn't required, trading multi-modal capability for roughly 2x lower cost on a one-megapixel output.