Nano Banana Pro
Endpoint: 
POST https://fal.run/fal-ai/nano-banana-pro
Endpoint ID: fal-ai/nano-banana-pro
Quick Start
Examples
An action shot of a black lab swimming in an inground suburban swimming pool. The camera is placed meticulously on the water line, dividing the image in half, revealing both the dog’s head above water holding a tennis ball in its mouth, and its paws paddling underwater.

Input Schema
- The text prompt to generate an image from.
- The number of images to generate (num_images). Default value: 1. Range: 1 to 4.
- The seed for the random number generator.
- The aspect ratio of the generated image. Default value: 1:1. Possible values: auto, 21:9, 16:9, 3:2, 4:3, 5:4, 1:1, 4:5, 3:4, 2:3, 9:16.
- The format of the generated image (output_format). Default value: "png". Possible values: jpeg, png, webp.
- The safety tolerance level for content moderation (safety_tolerance). 1 is the most strict (blocks most content), 6 is the least strict. Default value: "4". Possible values: 1, 2, 3, 4, 5, 6.
- If True, the media will be returned as a data URI and the output data won’t be available in the request history.
- The resolution of the image to generate (resolution). Default value: "1K". Possible values: 1K, 2K, 4K.
- Experimental parameter to limit the number of generations from each round of prompting to 1. Set to True to disregard any instructions in the prompt regarding the number of images to generate.
- Enable web search for the image generation task. This will allow the model to use the latest information from the web to generate the image.
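
The parameters described above can be assembled into a JSON request body for the POST endpoint. The following is a minimal sketch using only the Python standard library, not an official client. The key names num_images, output_format, safety_tolerance, and resolution appear under Limitations on this page; the key names prompt and aspect_ratio, the `Authorization: Key ...` header format, and the FAL_KEY environment variable are assumptions, as are the helper names build_payload and build_request.

```python
import json
import os
import urllib.request

ENDPOINT = "https://fal.run/fal-ai/nano-banana-pro"


def build_payload(prompt, num_images=1, aspect_ratio="1:1",
                  output_format="png", safety_tolerance="4", resolution="1K"):
    """Return a request body using the defaults listed in the Input Schema.

    The key names "prompt" and "aspect_ratio" are assumptions; the other
    four names are taken from the Limitations section of this page.
    """
    return {
        "prompt": prompt,
        "num_images": num_images,
        "aspect_ratio": aspect_ratio,
        "output_format": output_format,
        "safety_tolerance": safety_tolerance,
        "resolution": resolution,
    }


def build_request(payload, api_key):
    """Prepare (but do not send) a POST request to the endpoint.

    The "Authorization: Key ..." header format is an assumption.
    """
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Key {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_request(
        build_payload("A black lab swimming in a suburban pool"),
        os.environ.get("FAL_KEY", "<your-key>"),
    )
    print(request.get_method(), request.full_url)
    # To actually send it: json.load(urllib.request.urlopen(request))
```

Building the request separately from sending it keeps the payload easy to inspect and log before any billed call is made.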
Output Schema
The generated images.
The description of the generated images.
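
When the data-URI option described in the Input Schema is set to True, the media is returned inline as a data URI rather than as a hosted file. The sketch below shows one way to decode such a URI with the Python standard library. The helper name decode_data_uri is hypothetical, and which response field carries the URI is not specified on this page, so the example works on a locally built URI.

```python
import base64
from urllib.parse import unquote_to_bytes


def decode_data_uri(uri):
    """Split a data URI into (media_type, raw_bytes).

    Handles both base64 and percent-encoded payloads. An empty media
    type is reported as "text/plain" (simplified default).
    """
    if not uri.startswith("data:"):
        raise ValueError("not a data URI")
    header, sep, data = uri[len("data:"):].partition(",")
    if not sep:
        raise ValueError("data URI is missing the ',' separator")
    if header.endswith(";base64"):
        media_type = header[: -len(";base64")]
        raw = base64.b64decode(data)
    else:
        media_type = header
        raw = unquote_to_bytes(data)
    return media_type or "text/plain", raw


if __name__ == "__main__":
    # Locally constructed stand-in for a returned image URI.
    example = "data:image/png;base64," + base64.b64encode(b"\x89PNG").decode("ascii")
    media_type, raw = decode_data_uri(example)
    print(media_type, len(raw))
```

The returned bytes can then be written to a file whose extension matches the requested output format.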
Input Example
Output Example
Beyond CLIP: Multimodal Understanding
Built on Google’s Gemini 3 Pro foundation, Nano Banana Pro processes prompts through the same multimodal architecture that powers conversational AI, understanding nuance, context, and creative intent rather than simple keyword matching. Where traditional diffusion models treat prompts as collections of weighted tokens, this approach interprets your creative direction holistically, capturing relationships between concepts that single-modality systems miss.

What this means for you:

- Semantic accuracy: Generates images that match creative intent, not just literal prompt keywords; for example, it understands that “1960s aesthetic” implies grain, color palette, and composition choices
- Reduced iteration cycles: First-generation outputs align with complex briefs, cutting revision rounds compared to keyword-dependent models
- Batch efficiency: Process approximately 7 generations per dollar with consistent quality across variations, making A/B testing and campaign asset creation economically viable
- Natural language control: Direct the model with conversational prompts describing mood, style, and context without mastering prompt engineering syntax
- Advanced text rendering: Industry-leading text generation capabilities for creating legible text in multiple languages, fonts, and calligraphy styles directly within images
Performance Optimized for Quality
Google’s multimodal foundation prioritizes quality and reasoning depth over raw speed, optimized for production workflows requiring sophisticated outputs.

| Metric | Result | Context |
|---|---|---|
| Cost per Image | $0.15 | ~7 generations per $1.00 on fal.ai; 4K outputs will be charged at double the standard rate |
| Architecture | Gemini 3 Pro Image | Multimodal foundation model with enhanced reasoning |
| Generation Philosophy | Quality-first | Prioritizes complex compositions and accuracy over speed |
| Batch Processing | Multiple images supported | Via num_images parameter in API |
| Resolution Options | 1K, 2K, 4K | Configurable via API |
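
The pricing row in the table above reduces to simple arithmetic. The sketch below estimates the cost of a request, assuming the $0.15 rate applies to both 1K and 2K outputs and that "double the standard rate" means $0.30 per 4K image. The helper name estimate_cost is hypothetical, and actual billing may differ.

```python
from decimal import Decimal

BASE_PRICE = Decimal("0.15")  # USD per image, from the table above
# 4K is billed at double the standard rate; treating both 1K and 2K as
# the standard rate is an assumption.
RATE_MULTIPLIER = {"1K": 1, "2K": 1, "4K": 2}


def estimate_cost(num_images, resolution="1K"):
    """Estimate the USD cost of one request (hypothetical helper)."""
    if resolution not in RATE_MULTIPLIER:
        raise ValueError(f"unknown resolution: {resolution!r}")
    if not 1 <= num_images <= 4:
        raise ValueError("num_images must be between 1 and 4")
    return BASE_PRICE * RATE_MULTIPLIER[resolution] * num_images


if __name__ == "__main__":
    print(estimate_cost(4, "1K"))   # 0.60
    print(estimate_cost(4, "4K"))   # 1.20
    # Images per dollar at the base rate: 1.00 / 0.15
    print(round(Decimal("1.00") / BASE_PRICE, 2))   # 6.67
```

The last line shows where the "~7 generations per $1.00" figure comes from: 1.00 / 0.15 is about 6.67, which the table rounds up.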
Technical Specifications
| Spec | Details |
|---|---|
| Architecture | Gemini 3 Pro Image (Nano Banana Pro) |
| Input Formats | Text prompts with natural language support; multi-image blending (up to 14 images) |
| Output Formats | PNG, JPEG, WebP image files |
| Aspect Ratios | Multiple aspect ratios including 21:9, 16:9, 3:2, 4:3, 5:4, 1:1, 4:5, 3:4, 2:3, 9:16 |
| Character Consistency | Maintains consistency and resemblance for up to 5 people across generations |
| Watermarking | SynthID digital watermarking on all outputs; visible watermark for non-Ultra subscribers |
| License | Commercial use enabled through fal.ai |
| Launch Date | November 20, 2025 |
How It Stacks Up
vs. FLUX.1 [dev]: Nano Banana Pro achieves semantic-aware generation with industry-leading text rendering through Gemini 3 Pro’s multimodal reasoning, making it ideal for marketing materials requiring accurate typography. FLUX.1 [dev] prioritizes maximum resolution control and fine detail preservation for technical illustration workflows.

vs. Stable Diffusion 3.5: Nano Banana Pro achieves natural language interpretation and real-world knowledge integration through Gemini architecture, making it ideal for teams creating infographics and data visualizations without prompt engineering expertise. Stable Diffusion 3.5 prioritizes open-source flexibility for custom fine-tuning and on-premise deployment scenarios.

vs. Original Nano Banana (Gemini 2.5 Flash Image): Nano Banana Pro trades speed for quality, offering enhanced reasoning, superior text rendering, better character consistency, and advanced composition capabilities. Original Nano Banana remains available for rapid iterations and simple edits at lower cost ($0.039/image).

Related
- Nano Banana Pro — Image Generation
Limitations
- num_images range: 1 to 4
- output_format restricted to: jpeg, png, webp
- safety_tolerance restricted to: 1, 2, 3, 4, 5, 6
- resolution restricted to: 1K, 2K, 4K
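
The limits above can be checked on the client before a request is sent. This is a minimal sketch, assuming the payload is a plain dictionary keyed by the four parameter names listed here. The helper name validate_input is hypothetical, and the server remains the authority on what it accepts.

```python
ALLOWED = {
    "output_format": {"jpeg", "png", "webp"},
    "safety_tolerance": {"1", "2", "3", "4", "5", "6"},
    "resolution": {"1K", "2K", "4K"},
}


def validate_input(payload):
    """Return a list of problems found; an empty list means none were found."""
    problems = []

    # num_images must be an integer from 1 to 4 (bool is rejected explicitly,
    # since bool is a subclass of int in Python).
    n = payload.get("num_images", 1)
    if isinstance(n, bool) or not isinstance(n, int) or not 1 <= n <= 4:
        problems.append(f"num_images must be an integer from 1 to 4, got {n!r}")

    # Enumerated fields: compare as strings so 4 and "4" are both accepted
    # for safety_tolerance (a leniency chosen for this sketch).
    for key, allowed in ALLOWED.items():
        if key in payload and str(payload[key]) not in allowed:
            problems.append(
                f"{key} must be one of {sorted(allowed)}, got {payload[key]!r}"
            )
    return problems


if __name__ == "__main__":
    print(validate_input({"num_images": 2, "resolution": "2K"}))
    print(validate_input({"num_images": 5, "output_format": "gif"}))
```

Returning a list rather than raising on the first failure lets a caller report every invalid field at once.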
