
State of Generative Media
Volume 1 · image · video · audio
In partnership with Artificial Analysis

From e-commerce teams to visual designers, anyone can now generate hundreds of production-ready images in minutes. A few years ago, this volume would have required thousands of photographers, studios, and production staff. The cost structure that has governed e-commerce and other digital verticals has shifted, and traditional barriers to content production are evaporating thanks to generative media infrastructure.

The primary impact of breakthroughs in generative technology is the expansion of creative potential for users and builders alike. Entertainment applications initiated adoption of generative media, but in 2025 production applications (e.g. e-commerce, advertising, creative studios) drove scale[1], and by year’s end 88% of organizations had deployed AI in at least one business function[2].

Jeffrey Katzenberg articulated the fundamental transformation:

It’s the democratization of storytelling at a level that has never happened in the existence of humankind.

Jeffrey Katzenberg, Founding Partner of WndrCo and founder of DreamWorks Animation [3]

This shift emerged from rapid advances in generative technology, as models reached levels of quality, controllability, and reliability once reserved for specialized production teams.

This report examines how generative technology and trends accelerated in 2025. These insights draw heavily from revelatory survey data collected across a diverse range of organizations and individual users[4]. We’ve included highlights from industry leaders who spoke at the Generative Media Conference in October, as well as the most impactful market research covering the changing landscapes in generative technology. Generative media is changing how we tell stories, build businesses and engage with users.

This signals the start of a new chapter in the digital age.

In 2025, video generation models delivered outputs that passed visual Turing tests for untrained observers[5]. Technical capabilities advanced dramatically across image, video, and audio generation, with different modalities reaching similar stages of evolution. Image editing capabilities revitalized a category that had appeared to be declining. Across industries and modalities, infrastructure optimization reduced latency enough to enable real-time applications.

Timeline

While individual launches defined key inflection points, the broader story of 2025 was scale. Model releases were no longer isolated breakthroughs. They became continuous across modalities, driving expansion across every creative medium:

| New models by modality* | Total endpoints |
|---|---|
| Video | 450 |
| Image | 406 |
| Audio | 59 |
| 3D | 35 |
| Speech | 35 |
| Total | 985 |

*Models integrated into the fal platform in 2025

Image generation transformed experimental workflows into production pipelines in 2025. Black Forest Labs released Flux.1 Dev with superior prompt adherence, text rendering accuracy, and human pose fidelity. The model established benchmarks that competitors pursued for months. GPT Image 1 by OpenAI created a cultural moment for the next generation of users, with the model’s Studio Ghibli aesthetic capturing billions of views across social platforms.

| Timeline | Model | Company | Impact |
|---|---|---|---|
| Aug 2024 | Flux.1 Dev | Black Forest Labs | Shattered performance ceiling, superior prompt adherence |
| Mar 2025 | GPT Image 1 | OpenAI | True multimodal image generation, defining cultural moment |
| Aug 2025 | Qwen Image Edit | Alibaba | Open-source image editing with LoRA |
| Aug 2025 | Nano Banana v1 | Google DeepMind | Consumer accessibility without technical proficiency |

Black Forest Labs launched Flux Kontext, the first dedicated image editing model achieving character consistency, style transfer, and localized editing at near-real-time speeds. Qwen Image Edit arrived as one of the first open-source image editing models with LoRA capability, democratizing fine-tuning for developers without enterprise compute budgets.
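For builders, models like these are typically a single hosted API call away. The sketch below shows roughly what an image-edit request might look like with fal’s Python client; the endpoint id, argument names, and response shape are illustrative assumptions rather than documented specifics.

```python
# Rough sketch of a hosted image-editing call via the fal Python client
# (pip install fal-client). The endpoint id and response shape below are
# assumed for illustration and may differ from the live API.
import fal_client

result = fal_client.subscribe(
    "fal-ai/qwen-image-edit",  # hypothetical endpoint id
    arguments={
        "image_url": "https://example.com/product.jpg",
        "prompt": "replace the background with a marble studio surface",
    },
)

# Image endpoints typically return URLs for the generated outputs.
for image in result["images"]:
    print(image["url"])
```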

Google DeepMind’s Nano Banana (v1) proved that users without technical proficiency could generate production-quality content through natural language. ByteDance’s Seedream 4.0 offered faster generation at lower computational costs while maintaining comparable output quality.

Cinematic pre-viz
[Generated using Seedream 4.5](https://fal.ai/sandbox/share/B6QP5y1qNVBL), a model by ByteDance
| Model* | Input | Output |
|---|---|---|
| Nano Banana Pro Edit | Image | Image |
| Reve | Text | Image |
| Seedream 4.5 | Image | Image |
| Qwen Image Max | Image | Image |
| FLUX.1 dev | Text | Image |

*most popular models (avg. requests per day)

Eight major video generation releases in ten months produced rapid competitive iteration. Performance leadership changed hands multiple times as companies shipped at speeds uncommon in enterprise software. Google DeepMind released Veo 2 in December 2024, establishing physically accurate video as the quality benchmark: the model simulated gravity, water dynamics, and object interactions convincingly, setting the quality bar for production-ready video generation.

Hyper-realism
Generated using Veo 3.1, a model by Google DeepMind
| Timeline | Model | Company | Key Innovation |
|---|---|---|---|
| Dec 2024 | Veo 2 | Google DeepMind | Physically accurate video, quality benchmark |
| Apr 2025 | Kling 2.0 | Kuaishou | First-frame-last-frame narrative control |
| May 2025 | Veo 3 | Google DeepMind | Native audio generation with video output |
| Jul 2025 | MirageLSD | Decart | Live-stream diffusion, real-time generation |
| Sep 2025 | Sora 2 | OpenAI | Scene-aware multi-modal generation |

PixVerse v4 in February 2025 expanded access beyond technical users, demonstrating that sophisticated video generation could reach mainstream creators. Kling 2.0 in April 2025 introduced first-frame-last-frame functionality, giving creators precise narrative control over generated sequences and allowing for consistent character depictions.

Veo 3 in May 2025 enabled rapid-turnaround workflows for social media and content channels. The model’s combination of speed, quality, and native audio generation let content creators generate, iterate, and publish within hours rather than days. Competition intensified through summer 2025: MiniMax’s Hailuo 02 and ByteDance’s Seedance 1.0 both launched in June 2025, proving multiple technical approaches could achieve top performance simultaneously.

MirageLSD from Decart in July 2025 generated video frame-by-frame in real-time through live-stream diffusion. The approach opened applications in live streaming and interactive entertainment that batch-processing models could not address. Sora 2 in September 2025 combined native audio with excellent multi-shot generation in a single output, enabling coherent scene transitions without manual editing.
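Conceptually, live-stream diffusion swaps batch clip generation for an autoregressive loop: each frame is produced under a hard per-frame deadline while conditioning on a short window of recently emitted frames. The toy sketch below illustrates only that control flow; `denoise_next_frame` is a hypothetical stand-in for a real model, not Decart’s implementation.

```python
# Toy illustration of a live-stream diffusion loop: autoregressive,
# frame-by-frame generation under a real-time deadline. The "model"
# here is a hypothetical stub, not an actual diffusion network.
import time
from collections import deque

import numpy as np

FPS = 24
FRAME_BUDGET = 1.0 / FPS          # hard per-frame latency budget
CONTEXT_FRAMES = 8                # sliding window of conditioning frames

def denoise_next_frame(context: list) -> np.ndarray:
    """Hypothetical stand-in for a single-frame generation step."""
    noise = np.random.rand(64, 64, 3)
    # Condition on recent frames so the stream stays temporally coherent.
    return 0.9 * np.mean(context, axis=0) + 0.1 * noise

context: deque = deque(
    [np.zeros((64, 64, 3))] * CONTEXT_FRAMES, maxlen=CONTEXT_FRAMES
)

for _ in range(FPS * 5):          # five seconds of simulated stream
    start = time.perf_counter()
    frame = denoise_next_frame(list(context))
    context.append(frame)         # each output becomes future conditioning
    elapsed = time.perf_counter() - start
    time.sleep(max(0.0, FRAME_BUDGET - elapsed))  # hold the frame rate
```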

Major releases arrived every 4-6 weeks in 2025, with performance improvements expanding viable use cases across entertainment, marketing, and education.

| Model* | Input | Output |
|---|---|---|
| Kling 2.5 | Image | Video |
| Seedance 1.5 Pro | Image | Video |
| Sora 2 | Video, text, image | Video |
| Veo 3.1 | Text | Video |
| FILM | Video | Video |

*most popular models (avg. requests per day)

Audio became one of the most production-ready generative media categories in 2025. ElevenLabs Turbo v2.5 is among the most widely used low-latency text-to-speech systems (~250–300ms), while MiniMax Speech-02 (May 2025) achieved 99% human voice similarity across 32 languages. As one generative voice user noted, "Sub-300ms is table stakes for voice AI. Above that, the experience breaks."[6]
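Because sub-300ms is the threshold at which voice interfaces stop feeling laggy, teams typically measure time-to-first-audio rather than total synthesis time. A minimal sketch of that measurement, assuming a generic streaming TTS endpoint; the URL and payload are placeholders, not a real API:

```python
# Measure time-to-first-audio-byte for a streaming TTS endpoint.
# The URL and payload below are placeholders, not any provider's API.
import time

import requests

start = time.perf_counter()
resp = requests.post(
    "https://api.example.com/v1/tts/stream",   # hypothetical endpoint
    json={"text": "Hello! How can I help you today?", "voice": "default"},
    stream=True,
    timeout=10,
)
resp.raise_for_status()

for chunk in resp.iter_content(chunk_size=1024):
    if chunk:                                   # first audio bytes arrived
        ttfb_ms = (time.perf_counter() - start) * 1000
        print(f"time to first audio: {ttfb_ms:.0f} ms")
        break
```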

Open-source alternatives expanded accessibility. Kokoro TTS under Apache 2.0 achieved production quality with 82 million parameters. Dia 1.6B TTS from Nari Labs provided ultra-realistic dialogue synthesis.

Eleven Music by ElevenLabs (August 2025) was the first major AI music model trained entirely on licensed data, establishing opt-in participation and 50/50 royalty splits for artists. Suno drove rapid consumer adoption with high-quality, prompt-based song generation. Models like Mirelo SFX v1.5 (October 2025) automatically created synchronized sound effects and music from video.

Dance beat
Generated using ElevenLabs Music, a model by ElevenLabs
| Model* | Input | Output |
|---|---|---|
| ElevenLabs TTS Turbo v2.5 | Text | Audio |
| Minimax TTS | Text | Audio |
| ElevenLabs Multilingual v2 | Text | Audio |

*most popular models (avg. requests per day)

| Model* | Input | Output |
|---|---|---|
| Whisper v3 | Speech | Text |
| Whisper | Speech | Text |
| ElevenLabs STT | Speech | Text |

*most popular models (avg. requests per day)

3D generation matured from experimental outputs to production assets in 2025, compressing modeling timelines from weeks to minutes. Tencent released Hunyuan 3D 2.0 in January 2025. Deemos launched HyperRodin Gen 1.5 in April 2025 with 4 billion parameters. Meshy launched version 5 in July 2025 and a v6 preview in October, recognized by Andreessen Horowitz in their game developer survey[18].

Architecture
Generated using Hunyuan 3D 3.0, a model by Tencent

Tripo 3.0 in September 2025 served over 3 million creators and 700+ enterprises[26]. Microsoft’s TRELLIS 2, released in December 2025, can generate high-resolution assets in under 3 seconds, creating opportunities for real-time applications.

Further innovation in 3D models is on the horizon. Generated meshes still require topology cleanup for animation workflows. Geometric accuracy falters on intricate mechanical assemblies. Hard-surface modeling can demand significant manual refinement.

| Model* | Input | Output |
|---|---|---|
| Tripo SR | Image | 3D |
| Trellis | Image | 3D |
| Meshy 6 | Text | 3D |
| SAM 3D | Image | 3D |
| Hunyuan 3D | Image | 3D |

*most popular models (avg. requests per day)

Creativity is not machines generating machine outputs. It's the kid who doesn't have access to a VFX lab who can now do these things. We're accelerating human creativity.

Steve Jang, Generative Media Conference (October 24, 2025) [27]

World models simultaneously generate and simulate interactive 3D environments where all modalities converge. Google DeepMind announced Genie 2 in December 2024, generating playable 3D environments from single image prompts. Users and AI agents navigate using keyboard and mouse, with the model simulating action consequences within physically consistent spaces. The system maintains consistency for 10-20 seconds, with some environments persisting up to a minute[25].

Fei-Fei Li’s World Labs launched Marble in November 2025, the first commercial world model product. Marble generates persistent and downloadable 3D environments from text, images, videos, or panoramas. The platform outputs environments as Gaussian splats, meshes, or videos, integrating into Unity, Unreal Engine, and VR headsets.

World models are unifying video generation’s temporal understanding with 3D modeling’s spatial reasoning in real-time interaction. This enables autonomous vehicles to train in simulated cities and game developers to prototype worlds from sketches. Current systems mostly serve prototype deployments rather than full production releases.

Generative media adoption: 89% personal vs. 57% organizational [4]

Foundation models will continue improving on core metrics (resolution, temporal consistency, physical realism), but improvement rates will likely decelerate as models approach fundamental limits. Addressing limitations will require architectural innovation beyond current diffusion and transformer approaches. Recent model releases signal the potential for new directions:

I was enthralled with generative media from the first day I used Stable Diffusion. Every day, the quality of AI content on my feeds is getting better. It is everywhere.

Justine Moore, Generative Media Conference (October 24, 2025) [28]
Product photography
[Generated using Flux 2 Max](https://fal.ai/models/fal-ai/flux-2-max/edit/playground?share=380da2c0-e34a-45c3-9af4-a6f62ece374e), a model by Black Forest Labs

Flux.2 achieved 3x faster inference with comparable quality through architectural refinements, changing high-volume image generation economics. Enhanced prompt following and improved text rendering address persistent production deployment challenges.

Advertising
[Generated using Flux 2 Max](https://fal.ai/models/fal-ai/flux-2-max/edit/playground?share=f3e1ad99-6f4a-476d-84a1-256556fc87c7), a model by Black Forest Labs

Wan 2.6 from Alibaba’s Tongyi Lab launched December 16, 2025, introducing native audio-visual synchronization. The model generates 15-second videos at 1080p with synchronized dialogue, sound effects, and background music while maintaining character consistency across multi-shot narratives.

Kling O1 introduced innovations in video editing, enabling complex multi-step instructions that previously required manual chaining, reducing human intervention for sophisticated video workflows.

Gaming asset
Generated using Hyper3D Rodin, a model by Deemos

SAM 3D from Meta launched November 19, 2025, reconstructing 3D objects with geometry, texture, and spatial layout from single images. Two variants address everyday items (SAM 3D Objects) and human pose estimation (SAM 3D Body), achieving high win rates over existing methods.

Enterprise generative AI adoption accelerated through 2025, with adoption rates varying by industry vertical and use case. Personal users bypassed technical requirements through emerging consumer applications, gaining immediate access without specialized expertise. Organizations faced distinct barriers: model orchestration complexity, integration decisions, and cost management all constrained deployment pacing. Businesses accessed generative technology through two pathways, applications (65%) and APIs (62%)[4], with many using both.

Image generation: 44% in production workflows vs. 39% for video [4]
| Industry Vertical | Adoption Rate | Primary Use Cases |
|---|---|---|
| Advertising | 56% | Rapid creation of campaign visuals, banner ads, social graphics at scale |
| Entertainment, Media & Creative Storytelling | 43% | Storyboarding, pre-visualization, special effects, short-form promotional clips |
| Creative Software or Tools | 31% | Design platforms, creative software, video/image editing tools |
| Educational and Training Content | 30% | Interactive learning videos, animated explainers |
| Retail & E-Commerce | 19% | Automated product photography, catalog images, virtual try-on mockups |
| Architecture & Real Estate | 8% | 3D renders, staging visuals, and concept imagery for developments |

Source: Artificial Analysis & fal (2025). State of Generative Media Survey Report 2025 [4]

Production deployment maturity varied by modality. 31% of organizations are still in the prototyping phase of deploying generative models into their workflows[4]. Creative teams gravitated toward generative applications for rapid iteration without code, while engineering organizations prioritized API integration for programmatic control and workflow automation.

As frontier model access becomes increasingly commoditized, adoption is expanding beyond early entertainment-led experimentation. Organizations across advertising, e-commerce, and creative production are moving toward reliable production infrastructure, where consistent performance, scalability, and cost efficiency matter most.

Return on generative media investment materialized faster than expected for new enterprise software technology[4]. The details, however, reveal return on investment is still split: organizations achieving strong ROI concentrate on specific high-value use cases with clear metrics, while those pursuing broad experimentation report disappointing returns[9].

| ROI Status | Percentage of Organizations |
|---|---|
| Already profitable | 34% |
| Expecting returns within 12 months | 31% |
| Total achieving ROI ≤ 12 months | 65% |

Organizations reporting measurable ROI focused on three categories: efficiency gains, cost reduction, and revenue expansion. 74% of companies report their initiatives meet or exceed ROI expectations[9]. For the creative marketing platform Pimento, results were achieved by eliminating cold-start delays rather than maximizing quality, since marketers needed to test dozens of variations quickly before increasing sophistication and fidelity. Deployment reduced generation times by 80%, doubling their feature shipping pace[21].
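Cold starts occur because an idle endpoint must load model weights onto a GPU before serving its first request. One common mitigation, sketched below as a generic pattern rather than Pimento’s actual approach, is to keep endpoints warm with lightweight periodic requests; the endpoint ids and `warm_up` helper are illustrative:

```python
# Illustrative keep-warm loop: ping endpoints on an interval so weights
# stay resident and the first user request avoids a cold start.
# Endpoint ids and `warm_up` are assumptions, not a specific API.
import threading
import time

ENDPOINTS = ["image-edit-v1", "banner-gen-v2"]   # hypothetical endpoint ids
WARM_INTERVAL_S = 60

def warm_up(endpoint: str) -> None:
    """Send a tiny no-op request so the worker's weights stay resident."""
    print(f"pinging {endpoint}")                 # stand-in for a real call

def keep_warm() -> None:
    while True:
        for endpoint in ENDPOINTS:
            warm_up(endpoint)
        time.sleep(WARM_INTERVAL_S)

# Run in the background for the lifetime of the service process.
threading.Thread(target=keep_warm, daemon=True).start()
time.sleep(WARM_INTERVAL_S * 2)                  # keep this demo alive briefly
```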

Game studios needed speed more than hosting control, as competitive advantage came from offering the latest capabilities before competitors did. The digital creative platform Layer built on this insight, enabling a lean team to release new models to studios within 24 hours[22].

Organizations achieving generative scale made structural changes beyond deploying new technology. 43% redesigned workflows and production pipelines, 33% invested in staff training and upskilling, and 30% allocated dedicated budget for media generation infrastructure[4].

Advertising agencies

Marketing organizations showed 75% generative AI adoption, up from 61% in 2024. However, 80% reported using AI on less than half of their work. Legal concerns dominated hesitation: 94% cited intellectual property ownership and liability as implementation challenges[10]. Integration with existing creative workflows (e.g. through Adobe Creative Suite, DAM systems or campaign platforms) proved more challenging than anticipated.

Monologue
Generated using Minimax Speech 2.6, a model by Minimax

Agencies achieving scale implemented generative media for content variation and A/B testing rather than primary asset creation. 72% of marketers identified GenAI as the most important trend for H2 2025[11], yet only 30% achieved full integration across the campaign lifecycle[12]. This gap revealed pressing infrastructure needs: programmatic generation at campaign scale, brand consistency enforcement, and audit trails for legal compliance in certain industries.
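An audit trail at campaign scale usually means recording, for every generated asset, enough metadata to reproduce or retract it later. A minimal sketch of such a record, independent of any particular provider; all names here are illustrative:

```python
# Minimal audit-trail record for generated campaign assets: log the
# model, inputs, and output reference so every asset is traceable.
import json
import time
import uuid
from dataclasses import asdict, dataclass

@dataclass
class GenerationRecord:
    request_id: str
    model: str
    prompt: str
    seed: int
    output_url: str
    created_at: float

def log_generation(model: str, prompt: str, seed: int, output_url: str) -> None:
    record = GenerationRecord(
        request_id=str(uuid.uuid4()),
        model=model,
        prompt=prompt,
        seed=seed,
        output_url=output_url,
        created_at=time.time(),
    )
    # Append-only JSON lines keep the trail simple to ship to compliance.
    with open("generation_audit.jsonl", "a") as f:
        f.write(json.dumps(asdict(record)) + "\n")

log_generation("image-model-x", "summer sale banner, brand palette", 42,
               "https://cdn.example.com/assets/banner-001.png")
```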

E-commerce platforms

E-commerce platforms demonstrated high adoption rates, with product image generation becoming a core infrastructure capability. Matt Koenig articulated the critical constraint that distinguishes e-commerce from other verticals:

The creativity of models absolutely cannot interfere with product fidelity. Images and videos must have a faithful representation of every product.

Matt Koenig, Product Manager, Shopify
3D printing
Generated using SAM 3D, a model by Meta

Film and production studios

Film and television production exhibited cautiously optimistic adoption in major operational workflows. Major studios allocated less than 3% of production budgets to generative AI while shifting 7% of operational spending toward AI-enabled tools for contract management, permitting, and planning[13]. Independent studios followed different patterns: 65+ AI-centric film studios have launched since 2022[15], employing generative AI throughout production pipelines.

The 68% adoption rate among all media companies[14] reflects deployment in pre-visualization, automated editing, and post-production VFX augmentation, rather than primary content creation. Still, media companies’ AI spending is projected to grow at a 37.2% CAGR from 2024 to 2029, from $2.6B to $12.5B[14], signaling sustained investment despite currently conservative production budgets.

VFX
Generated using Seedance 1.5, a model by ByteDance

The varying rates of generative adoption underscore how established studios are optimizing operational costs, while new entrants compete on production economics restructured by new capabilities. Katzenberg highlighted the underlying institutional constraint at the Generative Media Conference:

The greatest innovations that occur, they don’t happen within the legacy enterprises. They’re just not able to let go of the past and innovate into the future.

Jeffrey Katzenberg, Founding Partner of WndrCo and founder of DreamWorks Animation [16]

Gaming companies

Gaming studios showed strong adoption, with 68% actively implementing AI in workflows[17]. Gaming’s generative media growth was driven by rapid iteration requirements incompatible with traditional asset timelines, and by technical infrastructure focused on predictable generation. 40% of studios experienced productivity gains exceeding 20%, and 25% achieved cost savings above 20%[18].

Gaming
Generated using Grok Imagine, a model by xAI

Use cases ranged from concept art acceleration to texture generation, NPC dialogue variation, animation in-betweening, and procedural level generation augmented with AI-generated details. Infrastructure selection emphasized generation speed (41%) and reliability (37%) over maximum quality[4]. This preference explained gaming’s significant adoption of generative media APIs: programmatic control enables integration into development pipelines.
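In practice, programmatic control often means batching many generation requests through the asset pipeline with bounded concurrency. A sketch of that pattern; `generate_asset` is a hypothetical wrapper around whichever generation API a studio uses:

```python
# Sketch of a batched asset-generation step with bounded concurrency.
# `generate_asset` is a hypothetical wrapper around any generation API.
import asyncio

MAX_CONCURRENT = 8    # keep provider load and costs predictable

async def generate_asset(prompt: str) -> str:
    await asyncio.sleep(0.1)             # stands in for a network call
    return f"https://cdn.example.com/{abs(hash(prompt))}.png"

async def generate_batch(prompts: list) -> list:
    sem = asyncio.Semaphore(MAX_CONCURRENT)

    async def bounded(prompt: str) -> str:
        async with sem:                  # cap in-flight requests
            return await generate_asset(prompt)

    return await asyncio.gather(*(bounded(p) for p in prompts))

prompts = [f"rusty sci-fi crate, variant {i}" for i in range(32)]
urls = asyncio.run(generate_batch(prompts))
print(f"generated {len(urls)} assets")
```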

Gaming’s infrastructure requirements differ from other verticals: real-time generation for dynamic content, batch processing for asset libraries, and game engine integration. There is high demand for infrastructure that enables rapid model deployment and scales with unpredictable player loads. Burkay Gur outlined a future for generative gaming:

Text-to-game will be the continuation of text-to-video; it’s essentially making the video output interactive. We are not too far away from that. It’s a great use case for world models.

Burkay Gur, Co-founder and CEO, fal

This convergence of video and interactivity represents a fundamental shift from content creation to world simulation. Text-to-game capabilities would enable dynamic, AI-generated game environments responding to player actions in real-time, transforming gaming from pre-authored experiences to emergent narratives.

Education

The education sector represents one of the largest untapped opportunities for generative media, combining massive market size with historically limited technology adoption. Sonya Huang articulated this potential:

I’m most excited for the education use case. Education is a market that is so important and has never had that many compelling business cases behind it. The challenge is the bottleneck to create high quality content at scale that is most ideal for the learner.

Sonya Huang, Partner, Sequoia
Education
Generated using Kling 3.0, a model by Kuaishou

Traditional educational content production faces the same constraints Huang identified: creating high-quality, personalized content at scale has been economically prohibitive. Gorkem Yurtseven had an equally strong prediction on generative AI in education:

The education market is almost untouched right now with video generation. And there is so much potential there, it’s just waiting for the quality and predictability to open up new use cases.

Gorkem Yurtseven, Co-founder and CTO, fal

Current limitations, particularly consistency and controllability, constrain educational deployments. Educational content requires factual accuracy, cultural sensitivity, and curriculum coherence across multi-week lesson sequences. As these capabilities mature, education could become one of the largest generative media markets, driven by the need for personally optimized learning at massive scale.

Infrastructure quality became a determining factor in development velocity during 2025. Organizations successful in scaling generative AI deployments prioritized optimized serving infrastructure over model selection. In gaming, studios needed to focus resources on core business competencies rather than GPU management[22].

Despite consistent positive ROI reporting, challenges persist in achieving production deployment at scale. Infrastructure providers varied widely in speed and reliability, with cold starts disrupting user flows[21].

| Decision Criterion | Organizations Prioritizing |
|---|---|
| Cost optimization | 58% |
| Model availability | 49% |
| Generation speed | 41% |
| Reliability and uptime | 37% |
| Data security and compliance | 34% |

Infrastructure Selection Criteria [4]

| Provider | Image Generation APIs | Video Generation APIs |
|---|---|---|
| fal.ai | 50% | 44% |
| Google AI Studio | 33% | 56% |
| OpenAI | 39% | - |
| Replicate | 15% | 22% |

Infrastructure Provider Adoption [4]

These technical choices compound over time. Products processing millions of daily requests build competitive advantages through sustained optimization at the kernel and network layers, not quarterly feature sprints. Infrastructure partner responsiveness mattered as much as raw performance; trust and willingness to collaborate on tests and benchmarks became critical selection factors[21].

Enterprise production deployments use a median of 14 different models[24]. The belief that single "omni models" would handle all generative tasks proved incorrect. Production deployments revealed that task-specific optimization consistently outperformed general-purpose approaches for specialized applications.

People predicted omni models that can generate every type of token but it’s becoming more clear that you need to optimize for a specific output. The best upscaling model is just doing upscaling; all these special tasks require their own models with their own weights.

Gorkem Yurtseven, Co-founder and CTO, fal

This proliferation creates complexity that enterprise organizations struggle to manage. The opportunity is significant for tools that simplify model selection, testing, switching, and performance monitoring across multiple providers.
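One common shape for such tooling is a task-to-model router with ordered fallbacks. A minimal sketch of the pattern; the model ids and `call_model` helper are illustrative, not any provider’s API:

```python
# Minimal task-to-model router with ordered fallbacks, the pattern that
# emerges once deployments span many specialized models. Ids are made up.
ROUTES = {
    "upscale":    ["upscaler-pro-v3", "upscaler-lite-v1"],
    "image-edit": ["edit-model-a", "edit-model-b"],
    "tts":        ["tts-turbo", "tts-multilingual"],
}

def call_model(model_id: str, payload: dict) -> dict:
    """Hypothetical provider call; raises on failure to trigger fallback."""
    return {"model": model_id, "ok": True}

def run_task(task: str, payload: dict) -> dict:
    last_error = None
    for model_id in ROUTES[task]:        # try models in preference order
        try:
            return call_model(model_id, payload)
        except Exception as err:          # fall through to the next model
            last_error = err
    raise RuntimeError(f"all models failed for task {task!r}") from last_error

print(run_task("upscale", {"image_url": "https://example.com/in.png"}))
```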

Video generation: 62% personal adoption vs. 32% organizational [4]

Certain enterprise use cases increasingly favor open-source models over closed APIs for production deployments. The transparency of open source lets enterprise teams audit model behavior, ensure data isolation, and deploy on-premises without vendor lock-in. Jennifer Li shared insights at the Generative Media Conference:

If we have an open source model where the code is available, the model is available, they can test this, they can play with it... the barriers to entry for self-hosting are much lower compared to closed models.

Jennifer Li, Managing Partner, Andreessen Horowitz

Whether or not they self-host, organizations need superior capabilities in inference optimization, multi-tenancy efficiency, and geographic distribution. As foundation models become more commoditized, infrastructure decisions will determine the speed and success of generative deployments.

Film score
Generated using Minimax Music, a model by Minimax

The trajectory of generative media development through 2026+ is clear. Three major themes will dominate:

1. Multimodal advancements (e.g. world models)
2. Infrastructure optimization
3. Democratization of creative tools

Expertise will become orchestration rather than execution. Taste becomes the scarce resource, while capability becomes abundant. As technical capabilities are commoditized, the fundamental value proposition shifts. "It’s the storytelling that matters."[19]

The creative democratization that Katzenberg described will manifest in new forms. Solo entrepreneurs will generate visual content indistinguishable from large-scale production companies. Successful organizations building generative technology into their products will compete on orchestration, deployment reliability, and domain-specific optimization.

The durable competitive moats will belong to teams that understand how to best deploy generative media, now that generating professional media is easier than ever.

State of Generative Media Volume 1, published 2026
