fan-cam Template
AI Fan Cam Workflow
Create a personalized live sports broadcast fan cam from one user photo.
This fal.ai workflow turns a user image into a realistic spectator cutaway video. The user uploads a photo, describes the match or live event, and selects the spectator reaction. The workflow creates one 4K broadcast-style frame with GPT Image 2, then animates it into a 15-second Kling 3 video using a multi-prompt sequence.
Use the template here:
Open the fan cam workflow template
What It Is For
The workflow is built for personalized fan cam videos that look like live TV crowd cutaways. The generated person should feel naturally embedded in the stands, watching the event with the selected reaction.
It can adapt to many live event contexts:
- Football and soccer matches
- Basketball arena cutaways
- Tennis crowd reactions
- Formula 1 grandstand shots
- Combat sports audience reactions
- Volleyball, cricket, rugby, hockey, baseball, esports, and other live events
The intended look is realistic broadcast capture, not a clean portrait. The frame should feel slightly compressed, crowded, off-center, and naturally imperfect.
Inputs
Photo URLs
Upload the user's own photo first. This image is used as the identity reference for the featured spectator.
Optional extra images can be added as event, venue, broadcast, or style references.
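The ordering rule above (user photo first, optional references after) can be sketched as a small helper. This is an illustration only: `buildImageUrls` and its parameter names are hypothetical, and the example URLs are placeholders. The only name taken from this page is `image_urls`, the input field shown in the API snippet.

```typescript
// Hypothetical helper: orders photo URLs so the identity reference comes first.
// The result is meant for the `image_urls` input field of the workflow.
function buildImageUrls(userPhotoUrl: string, referenceUrls: string[] = []): string[] {
  if (userPhotoUrl.trim() === "") {
    throw new Error("A user photo URL is required as the identity reference.");
  }
  // Drop empty entries and any duplicate of the user photo.
  const extras = referenceUrls.filter(
    (url) => url.trim() !== "" && url !== userPhotoUrl
  );
  return [userPhotoUrl, ...extras];
}

// Placeholder URLs, for illustration only.
const imageUrls = buildImageUrls("https://example.com/me.jpg", [
  "https://example.com/stadium.jpg",
]);
console.log(imageUrls);
```

Keeping the user photo at index 0 makes the identity reference unambiguous even when several venue or style references are attached.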
Event Details
Describe the sport or event in plain language. Include the match, teams or players, venue, broadcast style, outfit colors, scoreboard idea, and any crowd context that matters.
Example:
```text
World Cup football match, the uploaded person is a national team supporter in a packed night stadium, realistic live TV crowd cutaway, compact score overlay, tense match atmosphere, natural stadium lighting.
```
Spectator Reaction
Choose the visible reaction for the featured spectator.
Supported reactions:
```text
excited
happy
laughing
sad
neutral
angry
surprised
nervous
focused
```
The selected reaction controls the face, posture, eye direction, mouth shape, shoulders, and small movements in the final video.
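If the reaction arrives as free text from a form, it may help to normalize it against the supported list before calling the workflow. The sketch below is a client-side assumption, not part of the workflow itself; `REACTIONS`, `Reaction`, and `parseReaction` are illustrative names.

```typescript
// The nine supported reactions listed in this section.
const REACTIONS = [
  "excited",
  "happy",
  "laughing",
  "sad",
  "neutral",
  "angry",
  "surprised",
  "nervous",
  "focused",
] as const;

type Reaction = (typeof REACTIONS)[number];

// Hypothetical normalizer: trims and lowercases the input, then returns
// null when the value is not one of the supported reactions.
function parseReaction(value: string): Reaction | null {
  const normalized = value.trim().toLowerCase();
  const known = (REACTIONS as readonly string[]).indexOf(normalized) !== -1;
  return known ? (normalized as Reaction) : null;
}

console.log(parseReaction("  Nervous "));
console.log(parseReaction("bored"));
```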
Workflow Nodes
| Node | Model or utility | Purpose |
|---|---|---|
| Merge Text | `fal-ai/workflow-utilities/merge-text` | Combines event details and spectator reaction. |
| Planner | `openrouter/router` with GPT-5.5 | Writes the GPT Image 2 prompt and five Kling video prompts. |
| JSON Extract | `fal-ai/workflow-utilities/json-extract` | Extracts the image prompt and video prompts from the planner output. |
| Image Generation | `openai/gpt-image-2/edit` | Creates one personalized 4K broadcast frame. |
| Video Generation | `fal-ai/kling-video/v3/standard/image-to-video` | Animates the frame into a 15-second fan cam video. |
| Output | Display node | Returns the video, generated frame, image prompt, reaction, and planner JSON. |
Model Chain
```text
User photo + event details + reaction
-> Merge Text
-> GPT-5.5 planner
-> JSON Extract
-> GPT Image 2 edit
-> Kling 3 standard image to video
-> Output
```
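The planner's JSON is the hand-off point between the text and generation steps, so it can be useful to check its shape when inspecting `plan_json`. The sketch below assumes a shape this page does not document: `image_prompt` mirrors the output field name, while `video_prompts` and the `FanCamPlan` and `parsePlan` names are hypothetical.

```typescript
// Assumed plan shape: one image prompt plus five video prompts.
// `video_prompts` is a hypothetical field name, not confirmed by the template.
interface FanCamPlan {
  image_prompt: string;
  video_prompts: string[];
}

// Parses raw planner output and fails loudly when the shape is unexpected.
function parsePlan(raw: string): FanCamPlan {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("Plan must be a JSON object.");
  }
  const record = data as Record<string, unknown>;
  const imagePrompt = record["image_prompt"];
  const videoPrompts = record["video_prompts"];
  if (typeof imagePrompt !== "string" || imagePrompt.length === 0) {
    throw new Error("Plan is missing a non-empty image_prompt.");
  }
  if (!Array.isArray(videoPrompts) || videoPrompts.length !== 5) {
    throw new Error("Plan must contain exactly five video prompts.");
  }
  const prompts: string[] = [];
  for (const item of videoPrompts) {
    if (typeof item !== "string") {
      throw new Error("Every video prompt must be a string.");
    }
    prompts.push(item);
  }
  return { image_prompt: imagePrompt, video_prompts: prompts };
}
```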
GPT Image 2 Frame
The first generation step creates a single horizontal 16:9 broadcast frame.
Endpoint:
```text
openai/gpt-image-2/edit
```
Frame settings:
```json
{
  "image_size": { "width": 3840, "height": 2160 },
  "quality": "low",
  "output_format": "jpeg",
  "num_images": 1
}
```
The image prompt should preserve the uploaded person's identity while placing them naturally inside the selected event. The prompt should ask for realistic TV capture quality, a professional broadcast camera look, mild compression noise, subtle motion blur, natural skin texture, and real crowd depth.
Avoid portrait framing, beauty retouching, changed facial anatomy, fake logos, sponsor marks, unstable scoreboard text, and isolated subject composition.
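If the frame size is ever changed, a quick check helps confirm it still reduces to the horizontal 16:9 ratio the frame is meant to have. A minimal sketch; the `gcd` and `aspectRatio` helpers are illustrative and not part of the workflow.

```typescript
// Greatest common divisor via the Euclidean algorithm.
function gcd(a: number, b: number): number {
  return b === 0 ? a : gcd(b, a % b);
}

// Reduces a pixel size to its simplest width:height ratio.
function aspectRatio(width: number, height: number): string {
  const divisor = gcd(width, height);
  return `${width / divisor}:${height / divisor}`;
}

// The frame size from the settings above.
const frame = { width: 3840, height: 2160 };
console.log(aspectRatio(frame.width, frame.height)); // prints "16:9"
```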
Kling 3 Video
The second generation step animates the generated frame.
Endpoint:
```text
fal-ai/kling-video/v3/standard/image-to-video
```
Video settings:
```json
{
  "duration": "15",
  "shot_type": "customize",
  "cfg_scale": 0.3,
  "generate_audio": true
}
```
The generated frame is passed to Kling as one image element and referenced as `@Element1` in every video prompt. This keeps the subject stable and avoids exceeding Kling's image element limit.
Multi Prompt Structure
Kling receives five prompts. Each prompt lasts 3 seconds.
```json
[
  { "prompt": "Use @Element1 as the exact broadcast cutaway...", "duration": "3" },
  { "prompt": "Stay on @Element1 with subtle spectator motion...", "duration": "3" },
  { "prompt": "Animate @Element1 with a small broadcast camera correction...", "duration": "3" },
  { "prompt": "Continue @Element1 in a tighter live TV angle...", "duration": "3" },
  { "prompt": "End on @Element1 as the crowd reaction rises...", "duration": "3" }
]
```
Each Kling prompt should:
- Reference `@Element1`
- Stay under 430 characters
- Preserve the same person, outfit, venue, lighting, crowd, and overlay
- Match the selected reaction
- Use sport-specific language
- Avoid face morphing, beautification, unstable text, and excessive camera movement
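The mechanical rules above (five prompts, an `@Element1` reference in each, under 430 characters, durations that add up to the 15-second video) can be checked before a run. This is a sketch under assumptions: the `KlingPrompt` shape follows the `prompt` and `duration` keys shown in the JSON above, while `validateMultiPrompt` is a hypothetical helper. The stylistic rules, such as matching the reaction, still need a human read.

```typescript
// Item shape taken from the multi-prompt JSON shown above.
interface KlingPrompt {
  prompt: string;
  duration: string;
}

// Hypothetical validator: returns a list of problems, empty when all checks pass.
function validateMultiPrompt(items: KlingPrompt[], totalSeconds = 15): string[] {
  const problems: string[] = [];
  if (items.length !== 5) {
    problems.push(`expected 5 prompts, got ${items.length}`);
  }
  let sum = 0;
  items.forEach((item, index) => {
    const n = index + 1;
    if (item.prompt.indexOf("@Element1") === -1) {
      problems.push(`prompt ${n} does not reference @Element1`);
    }
    // "Under 430 characters" means a length of 429 or less.
    if (item.prompt.length >= 430) {
      problems.push(`prompt ${n} is ${item.prompt.length} characters; keep it under 430`);
    }
    sum += Number(item.duration);
  });
  if (sum !== totalSeconds) {
    problems.push(`durations sum to ${sum}s, expected ${totalSeconds}s`);
  }
  return problems;
}
```

Returning a list of problems rather than throwing on the first one makes it easier to fix a whole planner response in one pass.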
Example Event Details
Football
```text
International football final, uploaded person is a supporter in a packed night stadium, team colors in the crowd, compact score overlay, tense live broadcast crowd cutaway, natural stadium lighting.
```
Suggested reaction:
```text
nervous
```
Basketball
```text
Professional basketball playoff game, uploaded person is seated in the lower bowl near the court, arena lights, hardwood glow in the background, compact quarter and game clock overlay, live TV crowd reaction shot.
```
Suggested reaction:
```text
excited
```
Tennis
```text
Major tennis final, uploaded person is a spectator in the center court crowd, restrained audience atmosphere, green court context, compact tennis scoreboard, realistic live broadcast cutaway.
```
Suggested reaction:
```text
focused
```
Formula 1
```text
Formula 1 street circuit grandstand, uploaded person watches from a packed spectator section, timing graphics, race-day broadcast atmosphere, sunlight, track action implied offscreen.
```
Suggested reaction:
```text
surprised
```
Combat Sports
```text
Championship fight night, uploaded person is seated in the lower bowl crowd, dramatic arena light, round clock overlay, fans reacting around them, realistic sports broadcast audience cutaway.
```
Suggested reaction:
```text
angry
```
Example GPT Image 2 Prompt
```text
Use the uploaded photo as the identity reference for the featured spectator. Preserve the real face, age impression, skin tone, hair, glasses if present, facial structure, natural pores, and ordinary imperfections. Create a horizontal 16:9 realistic TV capture quality broadcast screenshot from an international football final. The person is seated naturally among packed supporters, watching the match with a nervous reaction: alert eyes, tight mouth, slightly raised shoulders, and a small forward lean. Use a professional broadcast camera look with mild compression noise, subtle motion blur, off-center crop, foreground heads partly blocking the view, imperfect background faces, natural stadium light, and a compact score overlay. No AI beauty retouching, no face anatomy changes, no portrait orientation, no studio portrait, no passport photo, no influencer look, no fake logos, no readable sponsor marks, no warped text, no anime, no cartoon.
```
Example Kling Multi Prompts
```json
[
  {
    "prompt": "Use @Element1 as the exact live football broadcast cutaway. Preserve the featured spectator, packed crowd, compact score overlay, seat layout, lighting, outfit, and nervous reaction. Add mild TV feed vibration, tiny head motion, natural crowd shifts, and realistic compression.",
    "duration": "3"
  },
  {
    "prompt": "Stay on @Element1. The spectator keeps watching the match with alert eyes, a small blink, tight mouth, and stiff shoulders. Nearby fans shift naturally and lean toward the action. Keep identity, wardrobe, overlay, stadium light, and crowd depth stable.",
    "duration": "3"
  },
  {
    "prompt": "Animate @Element1 with a subtle broadcast operator correction: a gentle push in through foreground heads and a tiny pan. The spectator remains embedded in the crowd, nervous and unposed. Preserve the same face, seats, overlay, venue light, and live TV texture.",
    "duration": "3"
  },
  {
    "prompt": "Continue @Element1 in a slightly tighter live TV angle. The spectator's eyes track offscreen action for a beat, shoulders stay tense, and background fans react softly. Keep the same person, crowd layout, compression, scoreboard, and natural motion blur.",
    "duration": "3"
  },
  {
    "prompt": "End on @Element1 as the crowd reaction rises around the spectator. Fans behind lift arms and lean forward out of focus while the featured person stays nervous and locked on the match. Keep broadcast realism, stable overlay, identity, outfit, and venue consistent.",
    "duration": "3"
  }
]
```
Output Fields
| Output | Description |
|---|---|
| `video` | Final Kling video |
| `image` | Generated 4K broadcast frame |
| `image_prompt` | GPT Image 2 prompt used for the frame |
| `reaction` | Selected spectator reaction |
| `plan_json` | Full planner JSON |
Recommended Details Template
```text
[sport or event], [team/player 1] vs [team/player 2], [venue or competition], uploaded person is a spectator in the crowd, [wardrobe or color details], [broadcast style], compact realistic scoreboard overlay, natural live TV crowd cutaway, ordinary spectator realism.
```
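The bracketed slots in the template can be filled programmatically when details come from a form. A minimal sketch: the `EventDetails` field names and the `buildDetails` function are hypothetical, the sample values are made up, and the fixed phrases are copied from the template above.

```typescript
// Hypothetical field names for the bracketed slots in the template.
interface EventDetails {
  event: string;          // [sport or event]
  sideA: string;          // [team/player 1]
  sideB: string;          // [team/player 2]
  venue: string;          // [venue or competition]
  wardrobe: string;       // [wardrobe or color details]
  broadcastStyle: string; // [broadcast style]
}

// Joins the slots and the fixed phrases in template order.
function buildDetails(d: EventDetails): string {
  return [
    d.event,
    `${d.sideA} vs ${d.sideB}`,
    d.venue,
    "uploaded person is a spectator in the crowd",
    d.wardrobe,
    d.broadcastStyle,
    "compact realistic scoreboard overlay",
    "natural live TV crowd cutaway",
    "ordinary spectator realism.",
  ].join(", ");
}

// Made-up sample values, for illustration only.
console.log(
  buildDetails({
    event: "Cup final",
    sideA: "Reds",
    sideB: "Blues",
    venue: "City Arena",
    wardrobe: "red scarf",
    broadcastStyle: "evening TV broadcast",
  })
);
```

The resulting string can be passed as the `details` input shown in the API snippet.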
API Endpoint
The API follows common HTTP semantics and should work with the language of your preference. Below you will find the API endpoint and some code snippets to help you get started:
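```javascript
import { fal } from "@fal-ai/client";

// Start the workflow and open a stream of its events.
const stream = await fal.stream("workflows/template/fan-cam", {
  input: {
    image_urls: [], // user photo first, then any optional reference images
    details: "",    // plain-language event description
  },
});

// Log each event as it arrives.
for await (const event of stream) {
  console.log(event);
}

// Wait for the final workflow output.
const result = await stream.done();
```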