PlayAI and fal
Industry: Generative Voice AI
Use Case: High-volume text-to-speech (TTS) inference
Results: Faster inference times, 28% lower latency
PlayAI develops foundational voice models that power cutting-edge text-to-speech (TTS) experiences. Their mission is to help businesses and developers quickly integrate lifelike TTS into applications ranging from content creation and consumer apps to voice agents and customer support, where reliability and low latency are paramount.
Challenge: PlayAI wanted to provide users with faster text-to-speech results while scaling inference capacity
Generative voice models are large and computationally intensive, often causing high latency during inference. With growing user demand for instant, natural-sounding TTS, PlayAI faced four main challenges:
- Latency-Sensitive Workloads: PlayAI's voice agent and customer support customers require near-instant audio responses, with sub-300 ms latency essential to maintaining a good user experience.
- Demand for Rapid Model Release Cycles: To stay ahead of market needs, PlayAI needed to quickly roll out new or updated TTS models without lengthy deployment processes.
- Scaling to Meet Demand: Not only was PlayAI growing quickly, it also saw large swings in demand as usage shifted across countries by time of day and as new customers onboarded. PlayAI needed a partner who could absorb volume spikes automatically.
- More Efficient Fine-Tuning: To address diverse use cases and languages, PlayAI needed an infrastructure that supports rapid and cost-effective fine-tuning.
PlayAI sought an infrastructure partner with a track record of accelerating inference, handling volume spikes and streamlining fine-tuning while maintaining the highest standards for reliability and voice fidelity.
Partnering with fal
PlayAI onboarded onto fal in less than a week with minimal code changes, thanks to fal's managed GPU infrastructure and comprehensive monitoring. By integrating fal's high-performance inference pipeline, PlayAI achieved the following (an illustrative integration call is sketched after the list):
- Distributed Global GPU Network: fal routes TTS requests to the nearest GPU locations, minimizing latency for users worldwide and ensuring consistently fast, high-quality audio generation.
- Rapid Model Releases: With fal's streamlined deployment process, PlayAI released three new voice models in quick succession, reducing time to market and making updates available to developers sooner.
- Accelerated Inference: fal's platform delivers faster inference while preserving the quality of generated voices.
- Seamless Scaling on Demand: fal automatically scales compute resources in under a second, enabling high availability at a flexible cost.
- Efficient Fine-Tuning: fal's platform dramatically shortens iteration cycles for advanced TTS fine-tuning.
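To illustrate how small that integration footprint can be, here is a minimal sketch of calling a fal-hosted PlayAI TTS endpoint with fal's published Python client. The endpoint ID, argument names, and response shape are illustrative assumptions, not PlayAI's actual configuration:

```python
import fal_client  # pip install fal-client; authenticates via the FAL_KEY env var

# The endpoint ID and argument names below are illustrative assumptions;
# check fal's model gallery for the exact PlayAI TTS endpoint and schema.
result = fal_client.subscribe(
    "fal-ai/playai/tts/v3",
    arguments={"input": "Hello! Thanks for calling. How can I help you today?"},
)

# fal endpoints typically return a URL to the generated asset.
print(result["audio"]["url"])
```

A single blocking call like this is all a basic integration needs; fal's client also exposes queue-based submission and async variants for higher-volume workloads.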
Outcome: PlayAI's TTS models now run with 28% lower latency
- 28% lower latency: Users benefit from near-instant voice generation, significantly boosting satisfaction and engagement.
- Supported >25% month-over-month user growth: Delivering rapid, high-fidelity TTS attracted new customers and broadened PlayAI's user base.
- 120 ms average latency (Time to First Audio) for PlayAI's 3.0 mini model: Even under a 3x traffic spike, fal's region-aware architecture helped maintain sub-150 ms latency, keeping PlayAI's services responsive at peak load (see the benchmarking sketch below).
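As context for these numbers, Time to First Audio (TTFA) is the delay between submitting a request and receiving the first audio bytes. A minimal sketch for benchmarking latency yourself follows, again assuming fal's Python client and an illustrative endpoint ID; it times a blocking call end to end, which upper-bounds TTFA, since streaming endpoints deliver the first chunk before the full request completes:

```python
import statistics
import time

import fal_client  # pip install fal-client; needs FAL_KEY set


def end_to_end_latency(text: str) -> float:
    """Time one blocking TTS request, in seconds.

    This measures the full round trip, an upper bound on Time to First
    Audio: a streaming endpoint would return its first chunk earlier.
    """
    start = time.perf_counter()
    fal_client.subscribe(
        "fal-ai/playai/tts/v3",   # illustrative endpoint ID
        arguments={"input": text},  # illustrative argument name
    )
    return time.perf_counter() - start


samples = [end_to_end_latency("Testing, one two three.") for _ in range(10)]
print(f"median latency: {statistics.median(samples) * 1000:.0f} ms")
```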
"Working with fal has completely transformed our text-to-speech infrastructure. Our customers love the near-instant voice responses, we can scale globally, and the fine-tuning speed is unmatched. We're excited to expand our partnership and push the boundaries of generative voice AI."
— Mahmoud Felfel, CEO of PlayAI
Building on this success, PlayAI plans to further leverage fal's infrastructure for next-generation voice models. As user demands evolve, fal remains a trusted partner to optimize performance and scale seamlessly, ensuring PlayAI continues to set the standard in generative voice AI.