Nano Banana Pro API

Nano Banana Pro
Edit

Endpoint: POST https://fal.run/fal-ai/nano-banana-pro
Endpoint ID: fal-ai/nano-banana-pro

Quick Start

import fal_client

def on_queue_update(update):
    # Print log messages while the request is still in progress.
    if isinstance(update, fal_client.InProgress):
        for log in update.logs:
            print(log["message"])

result = fal_client.subscribe(
    "fal-ai/nano-banana-pro",
    arguments={
        "prompt": "An action shot of a black lab swimming in an inground suburban swimming pool. The camera is placed meticulously on the water line, dividing the image in half, revealing both the dogs head above water holding a tennis ball in it's mouth, and it's paws paddling underwater."
    },
    with_logs=True,
    on_queue_update=on_queue_update,
)
print(result)

Examples

An action shot of a black lab swimming in an inground suburban swimming pool. The camera is placed meticulously on the water line, dividing the image in half, revealing both the dogs head above water holding a tennis ball in it’s mouth, and it’s paws paddling underwater.

Input Schema

prompt

string

required

The text prompt to generate an image from.

num_images

integer

default:"1"

The number of images to generate. Default value: 1. Range: 1 to 4.

seed

integer

The seed for the random number generator.

aspect_ratio

Enum

default:"1:1"

The aspect ratio of the generated image. Default value: 1:1. Possible values: auto, 21:9, 16:9, 3:2, 4:3, 5:4, 1:1, 4:5, 3:4, 2:3, 9:16.

output_format

OutputFormatEnum

default:"png"

The format of the generated image. Default value: "png". Possible values: jpeg, png, webp.

safety_tolerance

SafetyToleranceEnum

default:"4"

The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Default value: "4". Possible values: 1, 2, 3, 4, 5, 6.

sync_mode

boolean

default:"false"

If True, the media will be returned as a data URI and the output data won’t be available in the request history.

resolution

ResolutionEnum

default:"1K"

The resolution of the image to generate. Default value: "1K". Possible values: 1K, 2K, 4K.

limit_generations

boolean

default:"false"

Experimental parameter that limits the number of generations from each round of prompting to 1. Set to True to disregard any instructions in the prompt regarding the number of images to generate.

enable_web_search

boolean

default:"false"

Enable web search for the image generation task. This will allow the model to use the latest information from the web to generate the image.
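
The constraints listed in the schema above can be checked locally before a request is sent. The following is a minimal sketch, not part of the fal client: `build_arguments` is a hypothetical helper name, and the allowed values are simply copied from the schema on this page. The server remains the authority on what it accepts.

```python
# Allowed values copied from the input schema above.
ASPECT_RATIOS = {"auto", "21:9", "16:9", "3:2", "4:3", "5:4",
                 "1:1", "4:5", "3:4", "2:3", "9:16"}
OUTPUT_FORMATS = {"jpeg", "png", "webp"}
SAFETY_TOLERANCES = {"1", "2", "3", "4", "5", "6"}
RESOLUTIONS = {"1K", "2K", "4K"}


def build_arguments(prompt, num_images=1, aspect_ratio="1:1",
                    output_format="png", safety_tolerance="4",
                    resolution="1K", seed=None):
    """Hypothetical helper: validate values locally, return an arguments dict."""
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("prompt is required and must be a non-empty string")
    if not isinstance(num_images, int) or not 1 <= num_images <= 4:
        raise ValueError("num_images must be an integer from 1 to 4")

    # Each enum-like field is checked against the values the schema lists.
    checks = [
        ("aspect_ratio", aspect_ratio, ASPECT_RATIOS),
        ("output_format", output_format, OUTPUT_FORMATS),
        ("safety_tolerance", safety_tolerance, SAFETY_TOLERANCES),
        ("resolution", resolution, RESOLUTIONS),
    ]
    for name, value, allowed in checks:
        if value not in allowed:
            raise ValueError(f"{name} must be one of {sorted(allowed)}, got {value!r}")

    arguments = {
        "prompt": prompt,
        "num_images": num_images,
        "aspect_ratio": aspect_ratio,
        "output_format": output_format,
        "safety_tolerance": safety_tolerance,
        "resolution": resolution,
    }
    # seed is optional, so it is only included when the caller supplies one.
    if seed is not None:
        arguments["seed"] = seed
    return arguments
```

The returned dict could then be passed as `arguments=` to `fal_client.subscribe`, as in the Quick Start. Note that `safety_tolerance` is a string in the schema, so `"4"` passes the check and the integer `4` does not.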

Output Schema

images

list<ImageFile>

required

The generated images.

description

string

required

The description of the generated images.

Input Example

{
  "prompt": "An action shot of a black lab swimming in an inground suburban swimming pool. The camera is placed meticulously on the water line, dividing the image in half, revealing both the dogs head above water holding a tennis ball in it's mouth, and it's paws paddling underwater.",
  "num_images": 1,
  "aspect_ratio": "1:1",
  "output_format": "png",
  "safety_tolerance": "4",
  "sync_mode": false,
  "resolution": "1K",
  "limit_generations": false,
  "enable_web_search": false
}

Output Example

{
  "images": [
    {
      "content_type": "image/png",
      "file_name": "nano-banana-pro-t2i-output.png",
      "url": "https://storage.googleapis.com/falserverless/example_outputs/nano-banana-pro-t2i-output.png"
    }
  ],
  "description": ""
}
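
A result shaped like the output example above can be unpacked with a few lines of standard-library Python. This is a sketch under one stated assumption: when `sync_mode` is true, the data URI is assumed to arrive in the same `url` field that otherwise holds a hosted link. The schema does not say which field carries it, and both function names are hypothetical.

```python
import base64


def image_bytes_from_data_uri(uri):
    """Decode a base64 data URI such as 'data:image/png;base64,....'."""
    header, _, payload = uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URI")
    return base64.b64decode(payload)


def split_images(result):
    """Return (remote_urls, inline_images) from a result dict.

    Assumption: inline images (sync_mode=True) appear as data URIs in the
    'url' field; hosted images appear there as ordinary https links.
    """
    remote_urls, inline_images = [], []
    for image in result.get("images", []):
        url = image["url"]
        if url.startswith("data:"):
            inline_images.append(image_bytes_from_data_uri(url))
        else:
            remote_urls.append(url)
    return remote_urls, inline_images
```

With the output example above, `split_images` would return the single storage URL in the first list and leave the second list empty.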

Endpoint: POST https://fal.run/fal-ai/nano-banana-pro/edit
Endpoint ID: fal-ai/nano-banana-pro/edit

Quick Start

import fal_client

def on_queue_update(update):
    # Print log messages while the request is still in progress.
    if isinstance(update, fal_client.InProgress):
        for log in update.logs:
            print(log["message"])

result = fal_client.subscribe(
    "fal-ai/nano-banana-pro/edit",
    arguments={
        "prompt": "make a photo of the man driving the car down the california coastline",
        "image_urls": [
            "https://storage.googleapis.com/falserverless/example_inputs/nano-banana-edit-input.png",
            "https://storage.googleapis.com/falserverless/example_inputs/nano-banana-edit-input-2.png"
        ]
    },
    with_logs=True,
    on_queue_update=on_queue_update,
)
print(result)

Examples

make a photo of the man driving the car down the california coastline

Input Schema

prompt

string

required

The prompt for image editing.

num_images

integer

default:"1"

The number of images to generate. Default value: 1. Range: 1 to 4.

seed

integer

The seed for the random number generator.

aspect_ratio

Enum

default:"auto"

The aspect ratio of the generated image. Default value: auto. Possible values: auto, 21:9, 16:9, 3:2, 4:3, 5:4, 1:1, 4:5, 3:4, 2:3, 9:16.

output_format

OutputFormatEnum

default:"png"

The format of the generated image. Default value: "png". Possible values: jpeg, png, webp.

safety_tolerance

SafetyToleranceEnum

default:"4"

The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Default value: "4". Possible values: 1, 2, 3, 4, 5, 6.

sync_mode

boolean

default:"false"

If True, the media will be returned as a data URI and the output data won’t be available in the request history.

image_urls

list<string>

required

The URLs of the images to use for image-to-image generation or image editing.

resolution

ResolutionEnum

default:"1K"

The resolution of the image to generate. Default value: "1K". Possible values: 1K, 2K, 4K.

limit_generations

boolean

default:"false"

Experimental parameter that limits the number of generations from each round of prompting to 1. Set to True to disregard any instructions in the prompt regarding the number of images to generate.

enable_web_search

boolean

default:"false"

Enable web search for the image generation task. This will allow the model to use the latest information from the web to generate the image.
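
Because `image_urls` is required for the edit endpoint, it can be worth checking the list before submitting. The sketch below is illustrative only: `build_edit_arguments` is a hypothetical helper, the 14-image ceiling is taken from the "up to 14 images" figure quoted elsewhere in this document (whether the API enforces it is an assumption), and restricting inputs to http(s) URLs is a local convention of this sketch rather than a documented rule.

```python
from urllib.parse import urlparse

# Assumption: borrowed from the "multi-image blending (up to 14 images)"
# figure in this document; the API may enforce a different limit or none.
MAX_INPUT_IMAGES = 14


def build_edit_arguments(prompt, image_urls, **options):
    """Hypothetical helper: check prompt and image_urls, then merge options."""
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("prompt is required and must be a non-empty string")
    if isinstance(image_urls, str):
        # A bare string would otherwise be split into single characters.
        raise TypeError("image_urls must be a list of URLs, not a single string")

    urls = list(image_urls)
    if not urls:
        raise ValueError("image_urls is required and must not be empty")
    if len(urls) > MAX_INPUT_IMAGES:
        raise ValueError(f"at most {MAX_INPUT_IMAGES} input images are assumed")

    for url in urls:
        # Local convention for this sketch: only http(s) links pass.
        if urlparse(url).scheme not in ("http", "https"):
            raise ValueError(f"not an http(s) URL: {url!r}")

    # Any other schema fields (aspect_ratio, resolution, ...) pass through as-is.
    return {"prompt": prompt, "image_urls": urls, **options}
```

For example, `build_edit_arguments("make a photo of the man driving the car", urls, aspect_ratio="auto")` would return a dict suitable for the `arguments=` parameter shown in the Quick Start.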

Output Schema

images

list<ImageFile>

required

The edited images.

description

string

required

The description of the generated images.

Input Example

{
  "prompt": "make a photo of the man driving the car down the california coastline",
  "num_images": 1,
  "aspect_ratio": "auto",
  "output_format": "png",
  "safety_tolerance": "4",
  "sync_mode": false,
  "image_urls": [
    "https://storage.googleapis.com/falserverless/example_inputs/nano-banana-edit-input.png",
    "https://storage.googleapis.com/falserverless/example_inputs/nano-banana-edit-input-2.png"
  ],
  "resolution": "1K",
  "limit_generations": false,
  "enable_web_search": false
}

Output Example

{
  "images": [
    {
      "content_type": "image/png",
      "file_name": "nano-banana-pro-edit-output.png",
      "url": "https://storage.googleapis.com/falserverless/example_outputs/nano-banana-pro-edit-output.png"
    }
  ],
  "description": ""
}

Google’s Gemini 3 Pro Image architecture delivers production-quality visuals at $0.15 per image. It interprets context like a multimodal foundation model rather than matching keywords like a traditional diffusion system. It trades raw speed for deeper semantic interpretation and reasoning, turning complex creative direction into accurate visuals without elaborate prompt engineering. That makes it a fit for teams that need studio-quality results with strong text rendering and character consistency.

Built for: Marketing campaign generation | Product visualization workflows | Creative content production requiring text accuracy | Infographic and diagram creation at scale

Beyond CLIP: Multimodal Understanding

Built on Google’s Gemini 3 Pro foundation, Nano Banana Pro processes prompts through the same multimodal architecture that powers conversational AI, so it picks up nuance, context, and creative intent rather than matching keywords. Where traditional diffusion models treat prompts as collections of weighted tokens, this approach interprets your creative direction holistically, capturing relationships between concepts that single-modality systems miss.

What this means for you:

Semantic accuracy: Generates images that match creative intent, not just literal prompt keywords. For example, it understands that “1960s aesthetic” implies grain, color palette, and composition choices
Reduced iteration cycles: First-generation outputs align with complex briefs, cutting revision rounds compared to keyword-dependent models
Batch efficiency: At $0.15 per image, one dollar covers roughly 6.7 generations (about 7) with consistent quality across variations, making A/B testing and campaign asset creation economically viable
Natural language control: Direct the model with conversational prompts describing mood, style, and context without mastering prompt engineering syntax
Advanced text rendering: Industry-leading text generation capabilities for creating legible text in multiple languages, fonts, and calligraphy styles directly within images

Performance Optimized for Quality

Google’s multimodal foundation prioritizes quality and reasoning depth over raw speed, optimized for production workflows requiring sophisticated outputs.

Metric	Result	Context
Cost per Image	$0.15	~7 generations per $1.00 on fal.ai; 4K outputs are charged at double the standard rate
Architecture	Gemini 3 Pro Image	Multimodal foundation model with enhanced reasoning
Generation Philosophy	Quality-first	Prioritizes complex compositions and accuracy over speed
Batch Processing	Multiple images supported	Via `num_images` parameter in API
Resolution Options	1K, 2K, 4K	Configurable via API

Note: Google has not published generation-time benchmarks; the model is optimized for quality rather than speed.
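
The pricing in the table above reduces to simple arithmetic, sketched here for budgeting. Only two figures come from this page: $0.15 per image and the doubling for 4K. Billing 2K at the standard rate is an assumption (the page mentions a surcharge only for 4K), and both function names are hypothetical.

```python
from decimal import Decimal

BASE_PRICE = Decimal("0.15")      # per-image price quoted in the table above
FOUR_K_MULTIPLIER = Decimal("2")  # "4K outputs ... double the standard rate"


def estimate_cost(num_images, resolution="1K"):
    """Estimated charge in USD for num_images at the given resolution.

    Assumption: 1K and 2K are billed at the base price; only 4K is doubled.
    """
    multiplier = FOUR_K_MULTIPLIER if resolution == "4K" else Decimal("1")
    return BASE_PRICE * multiplier * num_images


def images_within_budget(budget, resolution="1K"):
    """Whole number of images that fit in a USD budget."""
    return int(Decimal(str(budget)) // estimate_cost(1, resolution))
```

Under these assumptions, one dollar covers six whole images at the base price (1 / 0.15 ≈ 6.67, which the table rounds to ~7) and three at 4K.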

Technical Specifications

Spec	Details
Architecture	Gemini 3 Pro Image (Nano Banana Pro)
Input Formats	Text prompts with natural language support; multi-image blending (up to 14 images)
Output Formats	PNG, JPEG, WebP image files
Aspect Ratios	21:9, 16:9, 3:2, 4:3, 5:4, 1:1, 4:5, 3:4, 2:3, 9:16 (plus auto)
Character Consistency	Maintains consistency and resemblance for up to 5 people across generations
Watermarking	SynthID digital watermarking on all outputs; visible watermark for non-Ultra subscribers
License	Commercial use enabled through fal.ai
Launch Date	November 20, 2025
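
When a target canvas size is known in pixels, one way to choose among the fixed aspect ratios listed above is to pick the nearest one. This is a small sketch, not something the API provides: `closest_aspect_ratio` is a hypothetical helper, and comparing ratios on a log scale is a design choice made here so that wide and tall mismatches are treated symmetrically.

```python
import math

# Fixed ratios listed in this document ("auto" is omitted on purpose,
# because it has no numeric value to compare against).
SUPPORTED_RATIOS = ["21:9", "16:9", "3:2", "4:3", "5:4",
                    "1:1", "4:5", "3:4", "2:3", "9:16"]


def closest_aspect_ratio(width, height):
    """Return the supported ratio label nearest to width:height."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    target = math.log(width / height)

    def distance(label):
        w, h = map(int, label.split(":"))
        # Log scale: 2:1 and 1:2 sit equally far from 1:1.
        return abs(math.log(w / h) - target)

    return min(SUPPORTED_RATIOS, key=distance)
```

For instance, a 1920x1080 canvas maps to `"16:9"` and a 1080x1920 canvas to `"9:16"`; the resulting label can be passed as the `aspect_ratio` argument.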

How It Stacks Up

vs. FLUX.1 [dev]: Nano Banana Pro achieves semantic-aware generation with industry-leading text rendering through Gemini 3 Pro’s multimodal reasoning, making it ideal for marketing materials requiring accurate typography. FLUX.1 [dev] prioritizes maximum resolution control and fine detail preservation for technical illustration workflows.

vs. Stable Diffusion 3.5: Nano Banana Pro achieves natural language interpretation and real-world knowledge integration through Gemini architecture, making it ideal for teams creating infographics and data visualizations without prompt engineering expertise. Stable Diffusion 3.5 prioritizes open-source flexibility for custom fine-tuning and on-premise deployment scenarios.

vs. Original Nano Banana (Gemini 2.5 Flash Image): Nano Banana Pro trades speed for quality, offering enhanced reasoning, superior text rendering, better character consistency, and advanced composition capabilities. Original Nano Banana remains available for rapid iterations and simple edits at lower cost ($0.039/image).

Related

Nano Banana Pro: Image Generation

Limitations

num_images range: 1 to 4
output_format restricted to: jpeg, png, webp
safety_tolerance restricted to: 1, 2, 3, 4, 5, 6
resolution restricted to: 1K, 2K, 4K
