OmniGen v1 Text to Image
OmniGen v1 | [text/image-to-image]
OmniGen is a unified image generation model that handles multi-modal prompts across image editing, personalized generation, virtual try-on, and multi-person composition at $0.10 per processed megapixel. Trading specialized single-task architectures for unified multi-modal flexibility, it consolidates workflows that typically require switching between separate models. This matters for production teams managing complex image generation pipelines where reference images, editing constraints, and generation need to work together seamlessly.
Use Cases: Image Editing | Personalized Image Generation | Virtual Try-On
Performance
OmniGen's unified architecture processes up to 4 megapixels with multi-image input support at $0.10 per processed megapixel.
| Metric | Result | Context |
|---|---|---|
| Resolution Support | Up to 4 megapixels (2048×2048) | Multiple preset sizes including square_hd |
| Multi-Image Input | Multiple reference images | Via `<|image_1|>` syntax in prompts |
| Inference Steps | 1-50 steps (default 50) | Configurable quality/speed tradeoff |
| Cost per Megapixel | $0.10 | 10 one-megapixel generations per $1.00 |
| Batch Generation | 1-4 images per request | Parallel output with shared inference cost |
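The per-megapixel pricing can be turned into a quick estimate. The following is a minimal sketch, not the billing logic itself: it assumes one megapixel means 1,000,000 pixels and that cost scales linearly with output pixels without rounding to whole megapixels. The function name `estimate_cost_usd` is hypothetical.

```python
PRICE_PER_MEGAPIXEL_USD = 0.10  # price stated on this page


def estimate_cost_usd(width: int, height: int) -> float:
    """Estimate the cost of one generated image.

    Assumptions (not confirmed by the source): 1 MP = 1,000,000 pixels,
    and billing is linear in pixels with no rounding to whole megapixels.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    megapixels = width * height / 1_000_000
    return round(megapixels * PRICE_PER_MEGAPIXEL_USD, 4)


if __name__ == "__main__":
    for w, h in [(1000, 1000), (1024, 1024), (2048, 2048)]:
        print(f"{w}x{h}: ${estimate_cost_usd(w, h):.4f}")
```

Under these assumptions a 1000×1000 image costs $0.10, which is where the "10 generations per $1.00" figure comes from; larger outputs cost proportionally more.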
Multi-Modal Prompt Architecture
OmniGen consolidates image editing, personalization, and generation into a single unified model that accepts text prompts with embedded image references. Where traditional workflows require separate models for inpainting, style transfer, and generation, OmniGen processes multi-modal inputs through a shared architecture with dual guidance controls.
What this means for you:
- Reference Image Integration: Embed multiple images directly in text prompts using `<|image_1|>` syntax, with no separate preprocessing pipeline required for personalization or style transfer workflows
- Dual Guidance Control: Independent text guidance (0-20 CFG) and image guidance (0-20) scales let you balance prompt adherence against reference image influence for precise creative control
- Task-Agnostic Pipeline: Handle virtual try-on, multi-person generation, and targeted editing through the same endpoint, consolidating API calls and simplifying production infrastructure
- Flexible Output Configuration: Generate 1-4 images per request with deterministic seeding, configurable inference steps (1-50), and JPEG/PNG output format selection
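To make the controls above concrete, here is a hedged sketch of how a request payload might be assembled and checked before sending. Only `input_image_urls` and `output_format` are parameter names taken from this page; `guidance_scale`, `img_guidance_scale`, `num_inference_steps`, `num_images`, and `seed` are assumed names, the guidance defaults are illustrative values, and the mapping of `<|image_N|>` to the N-th URL (1-based) is also an assumption. The helper `build_request` is hypothetical and makes no network call.

```python
import re

# Matches reference-image placeholders such as <|image_1|> in a prompt.
PLACEHOLDER = re.compile(r"<\|image_(\d+)\|>")


def build_request(prompt, input_image_urls, text_guidance=3.0, image_guidance=1.6,
                  steps=50, num_images=1, seed=None, output_format="jpeg"):
    """Validate inputs against the ranges listed on this page and build a payload."""
    # Assumption: <|image_N|> refers to the N-th URL, counting from 1.
    referenced = {int(n) for n in PLACEHOLDER.findall(prompt)}
    unmatched = sorted(n for n in referenced if not 1 <= n <= len(input_image_urls))
    if unmatched:
        raise ValueError(f"prompt references images without URLs: {unmatched}")
    if not 0 <= text_guidance <= 20 or not 0 <= image_guidance <= 20:
        raise ValueError("guidance scales must be within 0-20")
    if not 1 <= steps <= 50:
        raise ValueError("inference steps must be within 1-50")
    if not 1 <= num_images <= 4:
        raise ValueError("num_images must be within 1-4")
    if output_format not in ("jpeg", "png"):
        raise ValueError("output_format must be 'jpeg' or 'png'")

    payload = {
        "prompt": prompt,
        "input_image_urls": list(input_image_urls),  # name from this page
        "guidance_scale": text_guidance,             # assumed name
        "img_guidance_scale": image_guidance,        # assumed name
        "num_inference_steps": steps,                # assumed name
        "num_images": num_images,                    # assumed name
        "output_format": output_format,              # name from this page
    }
    if seed is not None:
        payload["seed"] = seed                       # assumed name
    return payload
```

For example, `build_request("The person in <|image_1|> wearing the jacket from <|image_2|>", [url_a, url_b], seed=42)` would return a payload referencing both images, while a prompt that mentions `<|image_3|>` with only two URLs would be rejected before any request is made.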
Technical Specifications
| Spec | Details |
|---|---|
| Architecture | OmniGen |
| Input Formats | Text prompts, multiple image URLs via `input_image_urls` parameter |
| Output Formats | JPEG, PNG (configurable via `output_format`) |
| Resolution Options | square_hd, square, portrait_4_3, portrait_16_9, landscape_4_3, landscape_16_9 (up to 4MP) |
| License | Commercial use enabled |
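The resolution options can be checked client-side before a request is made. This is a small sketch under stated assumptions: the preset names come from the table above, the 4MP ceiling is taken as 2048×2048 pixels, and support for custom `(width, height)` sizes expressed as a `width`/`height` mapping is an assumption rather than something this page confirms. The helper `check_image_size` is hypothetical.

```python
# Preset names as listed in the specifications table.
IMAGE_SIZE_PRESETS = (
    "square_hd", "square",
    "portrait_4_3", "portrait_16_9",
    "landscape_4_3", "landscape_16_9",
)
MAX_PIXELS = 2048 * 2048  # the "up to 4MP" ceiling, read as 2048x2048


def check_image_size(size):
    """Return a validated size: a preset name, or a width/height mapping.

    Assumption: custom sizes are accepted as {"width": ..., "height": ...}.
    """
    if isinstance(size, str):
        if size not in IMAGE_SIZE_PRESETS:
            raise ValueError(f"unknown preset: {size!r}")
        return size
    width, height = size
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if width * height > MAX_PIXELS:
        raise ValueError("requested size exceeds the 4MP limit")
    return {"width": width, "height": height}
```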
How It Stacks Up
AuraFlow Text to Image ($0.055 per image) – OmniGen prioritizes multi-modal prompt integration and task flexibility at $0.10 per processed megapixel versus AuraFlow's single-task text-to-image focus. AuraFlow delivers faster inference for straightforward text-to-image generation where reference image integration isn't required, trading multi-modal capabilities for roughly 2x cost efficiency on simple prompts at about one megapixel ($0.055 per image versus $0.10).
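Because one model is priced per image and the other per megapixel, the comparison depends on output size. The sketch below works through the arithmetic using only the two prices quoted above; it assumes OmniGen's cost is linear in megapixels and that AuraFlow's per-image price does not vary with resolution. The function names are hypothetical.

```python
OMNIGEN_USD_PER_MEGAPIXEL = 0.10   # price quoted on this page
AURAFLOW_USD_PER_IMAGE = 0.055     # price quoted on this page


def omnigen_cost(megapixels: float) -> float:
    """OmniGen cost for one image, assuming linear per-megapixel billing."""
    return megapixels * OMNIGEN_USD_PER_MEGAPIXEL


def break_even_megapixels() -> float:
    """Output size at which both models cost the same per image."""
    return AURAFLOW_USD_PER_IMAGE / OMNIGEN_USD_PER_MEGAPIXEL


def cost_ratio(megapixels: float) -> float:
    """How many times more OmniGen costs than AuraFlow at a given size."""
    return omnigen_cost(megapixels) / AURAFLOW_USD_PER_IMAGE


if __name__ == "__main__":
    print(f"break-even: {break_even_megapixels():.2f} MP")
    print(f"ratio at 1 MP: {cost_ratio(1.0):.2f}x")
```

Under these assumptions the two prices meet at 0.55 megapixels, and at one megapixel OmniGen costs about 1.8 times as much per image.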