Nano Banana 2 Image to Image
Each request costs $0.08 per image at the standard (1K) rate, so $1.00 covers 12 images. 2K and 4K outputs are charged at 1.5 times and 2 times the standard rate, respectively. 0.5K (512px) outputs are charged at 0.75 times the standard rate. If web search is used, an additional $0.015 is charged. Note: Pricing is subject to change.
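The pricing rules above can be worked through in a few lines of Python. This is a minimal sketch, not an official calculator: the function name `estimate_cost` is hypothetical, and it assumes the $0.015 web search fee applies once per request, which the pricing note does not state explicitly.

```python
from decimal import Decimal

# Rates taken from the pricing note above; they are subject to change.
BASE_PRICE = Decimal("0.08")        # standard rate per image (1K)
WEB_SEARCH_FEE = Decimal("0.015")   # assumed to be charged once per request
RESOLUTION_MULTIPLIER = {
    "0.5K": Decimal("0.75"),
    "1K": Decimal("1"),
    "2K": Decimal("1.5"),
    "4K": Decimal("2"),
}


def estimate_cost(num_images, resolution="1K", web_search=False):
    """Estimate the cost in USD of one request (hypothetical helper)."""
    if num_images < 1:
        raise ValueError("num_images must be at least 1")
    if resolution not in RESOLUTION_MULTIPLIER:
        raise ValueError(f"unknown resolution: {resolution}")
    cost = BASE_PRICE * RESOLUTION_MULTIPLIER[resolution] * num_images
    if web_search:
        cost += WEB_SEARCH_FEE
    return cost


# Whole standard-rate images that fit in a $1.00 budget.
images_per_dollar = int(Decimal("1.00") // BASE_PRICE)
```

For example, `estimate_cost(4, "4K")` multiplies $0.08 by 2 and then by 4 images, and `estimate_cost(2, "2K", web_search=True)` adds the search fee on top of two 2K images.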
Nano Banana 2 [image-to-image]
Google's Gemini 3.1 Flash Image architecture edits and transforms images with multimodal understanding, combining up to 14 reference images with natural language instructions to deliver precise, context-aware modifications at Flash-tier speed. It reasons about what you want changed and what should stay intact, producing edits that respect composition, lighting, and style coherence.
Built for: Product photo retouching and variation | Style transfer and creative remixing | Multi-image compositing and scene assembly | Iterative design workflows requiring fast turnaround
Edit with Intent, Not Masks
Built on Google's Gemini 3.1 Flash Image foundation, Nano Banana 2 Edit understands your editing instructions semantically. Instead of requiring manual masks or region selection, describe what you want changed in plain language and the model reasons about which elements to modify while preserving the rest. Supply up to 14 reference images for compositing, style guidance, or multi-subject scenes.
What this means for you:
- Multi-image input: Combine up to 14 reference images in a single request for compositing, style matching, or subject transfer
- Natural language editing: Describe edits conversationally - no masks, layers, or region coordinates needed
- Context-aware preservation: The model understands what to change and what to leave untouched, maintaining coherence across the edit
- Vibrant output: Rich color, punchy contrast, and visual fidelity carried through from source images
- Web-grounded editing: Optionally ground edits in real-time web information via `enable_web_search` or `enable_google_search`
Technical Specifications
| Spec | Details |
|---|---|
| Architecture | Gemini 3.1 Flash Image (Nano Banana 2) |
| Input | Text prompt (required) + 1 to 14 reference images (required) |
| Output Formats | PNG, JPEG, WebP |
| Resolution | 0.5K (512px, 0.75x rate), 1K (default), 2K (1.5x rate), 4K (2x rate) |
| Aspect Ratios | auto, 21:9, 16:9, 3:2, 4:3, 5:4, 1:1, 4:5, 3:4, 2:3, 9:16 |
| Batch | 1-4 images per request |
| Watermarking | SynthID digital watermarking on all outputs |
| Web Search | Optional grounding via `enable_web_search` or `enable_google_search` |
| License | Commercial use enabled through fal.ai |
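
The limits in the table above can be checked client-side before a request is sent. The sketch below is illustrative only: `build_edit_request` is a hypothetical helper, and the payload field names `prompt`, `image_urls`, `num_images`, `aspect_ratio`, `resolution`, and `output_format` are assumptions rather than the confirmed API schema. Only `enable_web_search` is named on this page. The 0.5K tier is taken from the pricing note.

```python
# Limits copied from the specification table; field names are assumptions.
ALLOWED_ASPECT_RATIOS = {
    "auto", "21:9", "16:9", "3:2", "4:3", "5:4",
    "1:1", "4:5", "3:4", "2:3", "9:16",
}
ALLOWED_RESOLUTIONS = {"0.5K", "1K", "2K", "4K"}
ALLOWED_OUTPUT_FORMATS = {"png", "jpeg", "webp"}
MAX_REFERENCE_IMAGES = 14
MAX_BATCH = 4


def build_edit_request(prompt, image_urls, num_images=1, aspect_ratio="auto",
                       resolution="1K", output_format="png",
                       enable_web_search=False):
    """Validate inputs against the documented limits and return a payload dict."""
    if not prompt or not prompt.strip():
        raise ValueError("a text prompt is required")
    if not 1 <= len(image_urls) <= MAX_REFERENCE_IMAGES:
        raise ValueError("provide between 1 and 14 reference images")
    if not 1 <= num_images <= MAX_BATCH:
        raise ValueError("num_images must be between 1 and 4")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {aspect_ratio}")
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution}")
    if output_format not in ALLOWED_OUTPUT_FORMATS:
        raise ValueError(f"unsupported output format: {output_format}")
    return {
        "prompt": prompt.strip(),
        "image_urls": list(image_urls),
        "num_images": num_images,
        "aspect_ratio": aspect_ratio,
        "resolution": resolution,
        "output_format": output_format,
        "enable_web_search": enable_web_search,
    }


# Example: two 2K variations of a single product photo (placeholder URL).
request = build_edit_request(
    "Replace the background with a sunlit studio; keep the product unchanged.",
    ["https://example.com/product.png"],
    num_images=2,
    resolution="2K",
)
```

Validating locally keeps obviously invalid requests, such as 15 reference images or a batch of 5, from being submitted at all. The actual endpoint may accept different field names or values, so check the API schema before relying on this shape.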
How It Stacks Up
vs. Nano Banana 2 Text-to-Image: The edit endpoint takes existing images as input alongside a text prompt, enabling modifications, compositing, and style transfer rather than generation from scratch. Use text-to-image for creating new visuals, edit for transforming existing ones.
vs. Nano Banana Pro Edit (Gemini 3 Pro Image): Nano Banana 2 Edit prioritizes speed and vibrant output on the Flash architecture, delivering edits in seconds where Pro optimizes for maximum reasoning depth. Choose Nano Banana 2 for fast iteration, Pro for complex multi-step edits requiring deeper compositional reasoning.
vs. FLUX.2 [dev] Image-to-Image: Nano Banana 2 Edit accepts up to 14 reference images with semantic understanding of edit instructions through Gemini's multimodal reasoning. FLUX.2 [dev] offers strength-based image-to-image with fine control over how much of the original to preserve.

