Nano Banana 2 Text to Image
Pricing: $0.08 per image at the standard 1K resolution, so $1.00 covers 12 single-image runs. 2K and 4K outputs are charged at 1.5x and 2x the standard rate, respectively, and 0.5K (512px) outputs at 0.75x. If web search is used, an additional $0.015 is charged. Note: pricing is subject to change.
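The multipliers above combine simply, but it is easy to misapply them when estimating a budget. Below is a minimal sketch of a cost estimator in Python, assuming the rates listed here, the 1-4 image batch limit from the specifications table, and (an assumption, since the page does not say) that the $0.015 web search fee is charged once per request rather than per image. The function names `estimate_cost` and `runs_per_budget` are hypothetical helpers, not part of any fal.ai client.

```python
from decimal import Decimal

# Standard (1K) price per image, as listed on the pricing line.
BASE_PRICE = Decimal("0.08")

# Resolution multipliers relative to the standard 1K rate.
RESOLUTION_MULTIPLIER = {
    "0.5K": Decimal("0.75"),
    "1K": Decimal("1"),
    "2K": Decimal("1.5"),
    "4K": Decimal("2"),
}

# Web search surcharge. ASSUMPTION: charged once per request, not per image.
WEB_SEARCH_FEE = Decimal("0.015")


def estimate_cost(num_images: int, resolution: str = "1K", web_search: bool = False) -> Decimal:
    """Estimate the charge for one request (a sketch, not an official quote)."""
    if not 1 <= num_images <= 4:
        raise ValueError("batch size must be between 1 and 4 images")
    if resolution not in RESOLUTION_MULTIPLIER:
        raise ValueError(f"unknown resolution: {resolution!r}")
    per_image = BASE_PRICE * RESOLUTION_MULTIPLIER[resolution]
    total = per_image * num_images
    if web_search:
        total += WEB_SEARCH_FEE
    return total


def runs_per_budget(budget: Decimal, resolution: str = "1K") -> int:
    """Number of whole single-image runs a budget covers at a given resolution."""
    per_image = BASE_PRICE * RESOLUTION_MULTIPLIER[resolution]
    return int(budget // per_image)
```

`Decimal` is used instead of `float` so that sums such as $0.08 + $0.015 stay exact; `runs_per_budget(Decimal("1.00"))` reproduces the "12 runs per $1.00" figure by rounding 12.5 down to whole runs.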
Google's Gemini 3.1 Flash Image architecture generates vibrant, high-fidelity visuals at speed, combining the reasoning capabilities of a multimodal foundation model with the efficiency of Flash-optimized inference. It understands creative intent holistically rather than matching keywords, producing images with accurate text rendering, character consistency, and coherent spatial composition in seconds.
Built for: Marketing campaigns and social media assets | Product photography and visualization | Designs requiring accurate in-image typography | Storyboarding with consistent characters across frames
Reasoning-Guided, Flash-Fast
Built on Google's Gemini 3.1 Flash Image foundation, Nano Banana 2 reasons about composition, lighting, and spatial relationships before rendering. Traditional diffusion models treat a prompt as weighted tokens; this architecture instead interprets creative direction the way a multimodal language model would, capturing nuance and context that single-modality systems miss, and then renders at Flash-tier speed.
What this means for you:
- Vibrant output: Rich color, punchy contrast, and visual coherence out of the box without post-processing
- Accurate text rendering: Character-by-character validated typography in multiple languages, directly in generated images
- Character consistency: Maintain identity for up to 5 people across generations for storyboarding and campaign work
- Natural language control: Describe mood, style, and context conversationally without mastering prompt engineering syntax
- Web-grounded generation: Optionally ground outputs in real-time web information for factually current visuals
Technical Specifications
| Spec | Details |
|---|---|
| Architecture | Gemini 3.1 Flash Image (Nano Banana 2) |
| Input | Text prompts (natural language); up to 14 reference images for editing |
| Output Formats | PNG, JPEG, WebP |
| Resolution | 0.5K / 512px (0.75x rate), 1K (default, standard rate), 2K (1.5x rate), 4K (2x rate) |
| Aspect Ratios | auto, 21:9, 16:9, 3:2, 4:3, 5:4, 1:1, 4:5, 3:4, 2:3, 9:16 |
| Batch | 1-4 images per request |
| Character Consistency | Up to 5 people across generations |
| Watermarking | SynthID digital watermarking on all outputs |
| Web Search | Optional grounding via `enable_web_search` or `enable_google_search` |
| License | Commercial use enabled through fal.ai |
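The table's limits can be checked client-side before a request is sent. The sketch below is a hypothetical request builder in Python: apart from `enable_web_search`, which the table names, the field names (`num_images`, `aspect_ratio`, `resolution`, `output_format`) and the defaults for aspect ratio and output format are assumptions for illustration, not the documented fal.ai input schema. Only the allowed values and the 1-4 batch limit come from the table.

```python
from dataclasses import dataclass, asdict

# Allowed values copied from the Technical Specifications table.
ASPECT_RATIOS = {"auto", "21:9", "16:9", "3:2", "4:3", "5:4", "1:1", "4:5", "3:4", "2:3", "9:16"}
RESOLUTIONS = {"0.5K", "1K", "2K", "4K"}
OUTPUT_FORMATS = {"png", "jpeg", "webp"}


@dataclass
class TextToImageRequest:
    # HYPOTHETICAL field names, except enable_web_search (named in the spec table).
    prompt: str
    num_images: int = 1
    aspect_ratio: str = "auto"   # ASSUMPTION: default not stated on this page
    resolution: str = "1K"       # 1K is the documented default
    output_format: str = "png"   # ASSUMPTION: default not stated on this page
    enable_web_search: bool = False

    def validate(self) -> None:
        """Raise ValueError if any field falls outside the documented limits."""
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if not 1 <= self.num_images <= 4:
            raise ValueError("num_images must be between 1 and 4")
        if self.aspect_ratio not in ASPECT_RATIOS:
            raise ValueError(f"unsupported aspect ratio: {self.aspect_ratio!r}")
        if self.resolution not in RESOLUTIONS:
            raise ValueError(f"unsupported resolution: {self.resolution!r}")
        if self.output_format not in OUTPUT_FORMATS:
            raise ValueError(f"unsupported output format: {self.output_format!r}")

    def to_payload(self) -> dict:
        """Validate, then return a plain dict suitable for JSON encoding."""
        self.validate()
        return asdict(self)
```

For example, `TextToImageRequest(prompt="Launch poster with the headline 'Spring Sale'", num_images=2, aspect_ratio="16:9").to_payload()` returns a dict ready to serialize; check the model's actual input schema before relying on these keys.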
How It Stacks Up
vs. Nano Banana Pro (Gemini 3 Pro Image): Nano Banana 2 runs on the Flash architecture and prioritizes speed and vibrant output, generating images in seconds at $0.08/image, while Pro optimizes for maximum reasoning depth at $0.15/image. Choose Nano Banana 2 for fast iteration and production volume, and Pro for maximum compositional complexity.
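For production volume, the per-image gap compounds quickly. This small Python sketch compares the two quoted rates at a given image count; it assumes both prices are flat per-image rates at standard resolution with no volume discounts, which this page does not confirm for Pro.

```python
from decimal import Decimal

# Per-image prices quoted on this page.
# ASSUMPTION: both are flat standard-resolution rates with no batch discount.
PRICE_PER_IMAGE = {
    "Nano Banana 2": Decimal("0.08"),
    "Nano Banana Pro": Decimal("0.15"),
}


def volume_cost(model: str, images: int) -> Decimal:
    """Cost of generating `images` images at the model's per-image rate."""
    return PRICE_PER_IMAGE[model] * images


def savings(images: int) -> Decimal:
    """Amount saved by choosing Nano Banana 2 over Pro for the same image count."""
    return volume_cost("Nano Banana Pro", images) - volume_cost("Nano Banana 2", images)
```

Under these assumptions, 1,000 images cost $80 with Nano Banana 2 versus $150 with Pro, a $70 difference.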
vs. FLUX.2 [dev]: Nano Banana 2 delivers semantic-aware generation with native text rendering and character consistency through Gemini's multimodal reasoning. FLUX.2 [dev] prioritizes resolution control and fine detail preservation for technical illustration workflows.
vs. Original Nano Banana (Gemini 2.5 Flash Image): Nano Banana 2 adds reasoning-guided generation, dramatically improved text rendering, native multi-resolution output (1K/2K/4K), character consistency, multi-image compositing, and web search grounding, delivering a generational leap in quality while maintaining Flash-tier speed.