Quora Poe and fal
Industry: Consumer Internet
Use Case: High-volume generation of images and videos via inference
Results: Driving 50% of Poe's image and video messages, faster response times, and higher user satisfaction
Poe is a platform by Quora that aims to be the central hub for AI, a go-to place where users can easily access and experiment with the best AI available. Just as the web browser unlocked the potential of the early internet, Poe seeks to make AI accessible and user-friendly for everyone.
Challenge: Poe wanted to provide users with faster, higher-capability image and video results and handle a high and growing volume of requests
A key part of this strategy involves seamlessly integrating the latest image and video generation capabilities into the user experience. The ability for users to quickly create and share visual content unlocks new opportunities to explore what's possible and democratize content creation, making AI more engaging and practical.
Generative AI models for images and videos are typically large and computationally intensive, leading to high latency during inference. Poe needed a high-performance solution to ensure fast and high-quality responses—particularly for image and video output—without compromising on model accuracy or user experience.
The Poe team looked for an infrastructure partner that could:
- Optimize inference for large generative models.
- Provide seamless access to the latest AI models for image and video generation.
- Scale reliably to handle a large user base and increasing volume of AI requests.
Solution: Partnering with fal
By integrating fal's high-performance inference pipeline, Poe was able to:
- Reduce Inference Latency: fal's optimized infrastructure cut response times significantly, delivering lightning-fast outputs for Poe's image and video bots.
- Access Cutting-Edge Models: Poe can now select from fal's constantly updated library of generative media models, ensuring users always benefit from the latest breakthroughs.
- Handle High Volumes: fal's robust API endpoints scale effortlessly with Poe's traffic, keeping the user experience smooth and responsive even during peak loads.
"We've been impressed with the speed and scalability of fal's inference throughout our partnership. Their optimized inference pipeline and quick access to newly released models have helped our product keep up with the rapidly developing market for generative media."
— Spencer Chan, Product Lead, Poe by Quora
Outcome: fal drives 50% of Poe's image and video messages, faster response times, and higher user satisfaction
- Image/Video Message Contribution: fal bots accounted for ~50% of image and video generation messages on Poe in January 2025
- Like Ratio: fal's image and video generation bots receive 18% more positive feedback than other multimedia bots, a sign of higher user satisfaction
- Image/Video Bots Share: fal currently powers 40% of Poe's official image and video generation bots
- Speed: fal's response times are 36% faster than those of other providers (Flux-dev)
- Enhanced User Experience: fal's quick turnaround on optimization tasks helps Poe maintain a top-tier AI experience, further solidifying user loyalty.
"fal currently powers 50% of Poe's official image and video generation bots. The fal team is one of the fastest-moving organizations we work with and consistently goes the extra mile to optimize inference and ensure great user experience. We are excited to work together to scale both of our platforms as the incredibly rapid progress in AI continues and we make it accessible to the world."
— Adam D'Angelo, CEO of Quora
Building on the success of fal's integration, Poe plans to explore additional generative AI capabilities and scale further as user demand grows. fal will continue to optimize inference pipelines, ensuring Poe remains at the forefront of fast, cutting-edge AI experiences.