fan-cam Template

AI Fan Cam Workflow

Create a personalized live sports broadcast fan cam from one user photo.

This fal.ai workflow turns a user image into a realistic spectator cutaway video. The user uploads a photo, describes the match or live event, and selects the spectator reaction. The workflow creates one 4K broadcast-style frame with GPT Image 2, then animates it into a 15-second Kling 3 video using Kling's multi-prompt input.

Use the template here:

Open the fan cam workflow template

What It Is For

The workflow is built for personalized fan cam videos that look like live TV crowd cutaways. The generated person should feel naturally embedded in the stands, watching the event with the selected reaction.

It can adapt to many live event contexts:

Football and soccer matches
Basketball arena cutaways
Tennis crowd reactions
Formula 1 grandstand shots
Combat sports audience reactions
Volleyball, cricket, rugby, hockey, baseball, esports, and other live events

The intended look is realistic broadcast capture, not a clean portrait. The frame should feel slightly compressed, crowded, off-center, and naturally imperfect.

Inputs

Photo URLs

Upload the user's own photo first. This image is used as the identity reference for the featured spectator.

Optional extra images can be added as event, venue, broadcast, or style references.

Event Details

Describe the sport or event in plain language. Include the match, teams or players, venue, broadcast style, outfit colors, scoreboard idea, and any crowd context that matters.

Example:

text
World Cup football match, the uploaded person is a national team supporter in a packed night stadium, realistic live TV crowd cutaway, compact score overlay, tense match atmosphere, natural stadium lighting.

Spectator Reaction

Choose the visible reaction for the featured spectator.

Supported reactions:

text
excited
happy
laughing
sad
neutral
angry
surprised
nervous
focused

The selected reaction controls the face, posture, eye direction, mouth shape, shoulders, and small movements in the final video.
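
Because the reaction drives so much of the final video, it can help to validate it before the workflow runs. The following is a minimal sketch, not part of the template itself: `SUPPORTED_REACTIONS` copies the list above, and `normalize_reaction` is a hypothetical helper name.

```python
# Reaction names copied from the supported list in this template.
SUPPORTED_REACTIONS = (
    "excited", "happy", "laughing", "sad", "neutral",
    "angry", "surprised", "nervous", "focused",
)


def normalize_reaction(raw: str) -> str:
    """Hypothetical helper: trim and lowercase the input, then validate it."""
    reaction = raw.strip().lower()
    if reaction not in SUPPORTED_REACTIONS:
        allowed = ", ".join(SUPPORTED_REACTIONS)
        raise ValueError(f"Unsupported reaction {raw!r}; choose one of: {allowed}")
    return reaction
```

Rejecting unknown values early avoids sending the planner a reaction it has no guidance for.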

Workflow Nodes

| Node | Model or utility | Purpose |
| --- | --- | --- |
| Merge Text | `fal-ai/workflow-utilities/merge-text` | Combines event details and spectator reaction. |
| Planner | `openrouter/router` with GPT-5.5 | Writes the GPT Image 2 prompt and five Kling video prompts. |
| JSON Extract | `fal-ai/workflow-utilities/json-extract` | Extracts the image prompt and video prompts from the planner output. |
| Image Generation | `openai/gpt-image-2/edit` | Creates one personalized 4K broadcast frame. |
| Video Generation | `fal-ai/kling-video/v3/standard/image-to-video` | Animates the frame into a 15-second fan cam video. |
| Output | Display node | Returns the video, generated frame, image prompt, reaction, and planner JSON. |

Model Chain

text
User photo + event details + reaction
-> Merge Text
-> GPT-5.5 planner
-> JSON Extract
-> GPT Image 2 edit
-> Kling 3 standard image to video
-> Output
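
The first three steps of the chain are plain text handling, so they can be approximated locally for testing. The sketch below is a stand-in under stated assumptions, not the behavior of the fal utility nodes: the merge format, the function names, and the planner JSON keys `image_prompt` and `video_prompts` are all hypothetical.

```python
import json


def merge_text(event_details: str, reaction: str) -> str:
    """Local stand-in for the Merge Text node (the layout is an assumption)."""
    return (
        f"Event details: {event_details.strip()}\n"
        f"Spectator reaction: {reaction.strip()}"
    )


def extract_plan(planner_output: str) -> dict:
    """Local stand-in for JSON Extract: parse the first JSON object in the text."""
    start = planner_output.find("{")
    if start == -1:
        raise ValueError("Planner output contains no JSON object")
    # raw_decode parses one JSON document starting at `start` and ignores trailing text.
    plan, _end = json.JSONDecoder().raw_decode(planner_output, start)
    # Assumed key names; the real planner schema is not shown in this template.
    if not isinstance(plan.get("image_prompt"), str):
        raise ValueError("Plan is missing a string 'image_prompt'")
    if not isinstance(plan.get("video_prompts"), list):
        raise ValueError("Plan is missing a list 'video_prompts'")
    return plan
```

Parsing from the first `{` tolerates a planner that wraps its JSON in a sentence of explanation.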

GPT Image 2 Frame

The first generation step creates a single horizontal 16:9 broadcast frame.

Endpoint:

text
openai/gpt-image-2/edit

Frame settings:

json
{
  "image_size": {
    "width": 3840,
    "height": 2160
  },
  "quality": "low",
  "output_format": "jpeg",
  "num_images": 1
}
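
As a rough illustration, the frame settings can be combined with the planner's image prompt and the uploaded photos into one request body. This is a sketch only: the `prompt` and `image_urls` keys and the `build_frame_request` name are assumptions, so check the endpoint's schema before relying on them.

```python
from fractions import Fraction

# Settings copied from the frame configuration in this template.
FRAME_SETTINGS = {
    "image_size": {"width": 3840, "height": 2160},
    "quality": "low",
    "output_format": "jpeg",
    "num_images": 1,
}


def build_frame_request(image_prompt: str, photo_urls: list[str]) -> dict:
    """Hypothetical helper that assembles the image generation input."""
    size = FRAME_SETTINGS["image_size"]
    # 3840 / 2160 reduces to exactly 16 / 9, the horizontal broadcast ratio.
    if Fraction(size["width"], size["height"]) != Fraction(16, 9):
        raise ValueError("Frame size must be 16:9")
    if not photo_urls:
        raise ValueError("At least the user's own photo is required")
    return {
        **FRAME_SETTINGS,
        "prompt": image_prompt,          # assumed key name
        "image_urls": list(photo_urls),  # assumed key name; user photo goes first
    }
```

Keeping the user's photo at index 0 mirrors the input order described under Photo URLs.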

The image prompt should preserve the uploaded person's identity while placing them naturally inside the selected event. The prompt should ask for realistic TV capture quality, a professional broadcast camera look, mild compression noise, subtle motion blur, natural skin texture, and real crowd depth.

Avoid portrait framing, beauty retouching, changed facial anatomy, fake logos, sponsor marks, unstable scoreboard text, and isolated subject composition.

Kling 3 Video

The second generation step animates the generated frame.

Endpoint:

text
fal-ai/kling-video/v3/standard/image-to-video

Video settings:

json
{
  "duration": "15",
  "shot_type": "customize",
  "cfg_scale": 0.3,
  "generate_audio": true
}

The generated frame is passed to Kling as one image element and referenced as @Element1 in every video prompt. This keeps the subject stable and avoids exceeding Kling's image element limit.

Multi Prompt Structure

Kling receives five prompts. Each prompt lasts 3 seconds, so the five shots together fill the full 15-second video.

json
[
  { "prompt": "Use @Element1 as the exact broadcast cutaway...", "duration": "3" },
  { "prompt": "Stay on @Element1 with subtle spectator motion...", "duration": "3" },
  { "prompt": "Animate @Element1 with a small broadcast camera correction...", "duration": "3" },
  { "prompt": "Continue @Element1 in a tighter live TV angle...", "duration": "3" },
  { "prompt": "End on @Element1 as the crowd reaction rises...", "duration": "3" }
]
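
The structure above can be produced from a plain list of prompt strings. The sketch below assumes the video length is split evenly across the shots and that durations are sent as strings, as in the example; `build_multi_prompt` is a hypothetical helper name.

```python
def build_multi_prompt(prompts: list[str], total_seconds: int = 15) -> list[dict]:
    """Pair each prompt with an equal share of the total duration."""
    if not prompts:
        raise ValueError("At least one prompt is required")
    seconds_each, remainder = divmod(total_seconds, len(prompts))
    if remainder or seconds_each == 0:
        raise ValueError(
            f"{total_seconds} seconds cannot be split evenly across {len(prompts)} prompts"
        )
    # Durations are strings to match the JSON shown in this template.
    return [{"prompt": text, "duration": str(seconds_each)} for text in prompts]
```

With five prompts and the default 15 seconds, each entry gets a duration of `"3"`.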

Each Kling prompt should:

Reference @Element1
Stay under 430 characters
Preserve the same person, outfit, venue, lighting, crowd, and overlay
Match the selected reaction
Use sport-specific language
Avoid face morphing, beautification, unstable text, and excessive camera movement
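
The mechanical rules in this list (the `@Element1` reference, the character limit, five shots, 15 seconds in total) can be checked before calling Kling. This is a minimal sketch with a hypothetical function name; the content rules, such as matching the reaction or avoiding face morphing, still need human or planner review.

```python
MAX_PROMPT_CHARS = 430  # each prompt should stay under this length


def validate_multi_prompt(shots: list[dict], total_seconds: int = 15) -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if len(shots) != 5:
        problems.append(f"expected 5 prompts, got {len(shots)}")
    for index, shot in enumerate(shots, start=1):
        prompt = shot.get("prompt", "")
        if "@Element1" not in prompt:
            problems.append(f"prompt {index} does not reference @Element1")
        if len(prompt) >= MAX_PROMPT_CHARS:
            problems.append(f"prompt {index} has {len(prompt)} characters")
    # Durations arrive as strings such as "3", so convert before summing.
    total = sum(int(shot.get("duration", "0")) for shot in shots)
    if total != total_seconds:
        problems.append(f"durations add up to {total}s, expected {total_seconds}s")
    return problems
```

Returning every problem at once, rather than raising on the first, makes it easier to send the planner a single correction request.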

Example Event Details

Football

text
International football final, uploaded person is a supporter in a packed night stadium, team colors in the crowd, compact score overlay, tense live broadcast crowd cutaway, natural stadium lighting.

Suggested reaction:

text
nervous

Basketball

text
Professional basketball playoff game, uploaded person is seated in the lower bowl near the court, arena lights, hardwood glow in the background, compact quarter and game clock overlay, live TV crowd reaction shot.

Suggested reaction:

text
excited

Tennis

text
Major tennis final, uploaded person is a spectator in the center court crowd, restrained audience atmosphere, green court context, compact tennis scoreboard, realistic live broadcast cutaway.

Suggested reaction:

text
focused

Formula 1

text
Formula 1 street circuit grandstand, uploaded person watches from a packed spectator section, timing graphics, race-day broadcast atmosphere, sunlight, track action implied offscreen.

Suggested reaction:

text
surprised

Combat Sports

text
Championship fight night, uploaded person is seated in the lower bowl crowd, dramatic arena light, round clock overlay, fans reacting around them, realistic sports broadcast audience cutaway.

Suggested reaction:

text
angry

Example GPT Image 2 Prompt

text
Use the uploaded photo as the identity reference for the featured spectator. Preserve the real face, age impression, skin tone, hair, glasses if present, facial structure, natural pores, and ordinary imperfections. Create a horizontal 16:9 realistic TV capture quality broadcast screenshot from an international football final. The person is seated naturally among packed supporters, watching the match with a nervous reaction: alert eyes, tight mouth, slightly raised shoulders, and a small forward lean. Use a professional broadcast camera look with mild compression noise, subtle motion blur, off-center crop, foreground heads partly blocking the view, imperfect background faces, natural stadium light, and a compact score overlay. No AI beauty retouching, no face anatomy changes, no portrait orientation, no studio portrait, no passport photo, no influencer look, no fake logos, no readable sponsor marks, no warped text, no anime, no cartoon.

Example Kling Multi Prompts

json
[
  {
    "prompt": "Use @Element1 as the exact live football broadcast cutaway. Preserve the featured spectator, packed crowd, compact score overlay, seat layout, lighting, outfit, and nervous reaction. Add mild TV feed vibration, tiny head motion, natural crowd shifts, and realistic compression.",
    "duration": "3"
  },
  {
    "prompt": "Stay on @Element1. The spectator keeps watching the match with alert eyes, a small blink, tight mouth, and stiff shoulders. Nearby fans shift naturally and lean toward the action. Keep identity, wardrobe, overlay, stadium light, and crowd depth stable.",
    "duration": "3"
  },
  {
    "prompt": "Animate @Element1 with a subtle broadcast operator correction: a gentle push in through foreground heads and a tiny pan. The spectator remains embedded in the crowd, nervous and unposed. Preserve the same face, seats, overlay, venue light, and live TV texture.",
    "duration": "3"
  },
  {
    "prompt": "Continue @Element1 in a slightly tighter live TV angle. The spectator's eyes track offscreen action for a beat, shoulders stay tense, and background fans react softly. Keep the same person, crowd layout, compression, scoreboard, and natural motion blur.",
    "duration": "3"
  },
  {
    "prompt": "End on @Element1 as the crowd reaction rises around the spectator. Fans behind lift arms and lean forward out of focus while the featured person stays nervous and locked on the match. Keep broadcast realism, stable overlay, identity, outfit, and venue consistent.",
    "duration": "3"
  }
]

Output Fields

| Output | Description |
| --- | --- |
| `video` | Final Kling video |
| `image` | Generated 4K broadcast frame |
| `image_prompt` | GPT Image 2 prompt used for the frame |
| `reaction` | Selected spectator reaction |
| `plan_json` | Full planner JSON |

Recommended Details Template

text
[sport or event], [team/player 1] vs [team/player 2], [venue or competition], uploaded person is a spectator in the crowd, [wardrobe or color details], [broadcast style], compact realistic scoreboard overlay, natural live TV crowd cutaway, ordinary spectator realism.
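
If event details are collected from a form, the template can be filled programmatically. The sketch below is one possible approach; the field names and `build_event_details` are hypothetical, and the fixed phrases are copied from the template above.

```python
DETAILS_TEMPLATE = (
    "{event}, {side_a} vs {side_b}, {venue}, "
    "uploaded person is a spectator in the crowd, {wardrobe}, {broadcast_style}, "
    "compact realistic scoreboard overlay, natural live TV crowd cutaway, "
    "ordinary spectator realism."
)

REQUIRED_FIELDS = ("event", "side_a", "side_b", "venue", "wardrobe", "broadcast_style")


def build_event_details(**fields: str) -> str:
    """Fill the details template, refusing missing or blank fields."""
    missing = [name for name in REQUIRED_FIELDS if not fields.get(name, "").strip()]
    if missing:
        raise ValueError(f"Missing fields: {', '.join(missing)}")
    cleaned = {name: fields[name].strip() for name in REQUIRED_FIELDS}
    return DETAILS_TEMPLATE.format(**cleaned)
```

For example, `build_event_details(event="Football final", side_a="Team A", side_b="Team B", venue="night stadium", wardrobe="red scarf", broadcast_style="live TV broadcast")` yields a single line ready for the Event Details input.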
