The /data filesystem on fal is a distributed, parallel storage system. It performs best when multiple files are read concurrently. Sequential file reads, loading one weight file at a time, significantly underutilize the filesystem and result in slower cold starts.
## The Problem
Most model loading libraries (HuggingFace, PyTorch) read weight files sequentially, one shard at a time. On a distributed filesystem, this leaves most of the available bandwidth idle.

## The Solution: Pre-Read Files in Parallel
Before loading your model, pre-read all weight files into the OS page cache using parallel I/O. When the model loader then reads the files, they are already cached in memory and load instantly: the `from_pretrained()` call reads from the page cache instead of the network.
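As a concrete sketch, the pre-read can be a one-liner built from standard shell tools; the model directory below is an illustrative example, not a fixed path:

```shell
# Read every file under MODEL_DIR with 32 parallel readers, discarding
# the bytes; the useful side effect is that the OS page cache is now warm.
# MODEL_DIR is an example path; point it at your checkpoint directory on /data.
MODEL_DIR="${MODEL_DIR:-/data/models/my-model}"
find "$MODEL_DIR" -type f -print0 | xargs -0 -n 1 -P 32 cat > /dev/null
```

Here `-n 1` gives each `cat` a single file and `-P 32` keeps 32 of them running at once, so many files are pulled over the network concurrently.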
## Using It in Your App
Add the pre-read step at the beginning of your `setup()` method, before model loading:
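A minimal sketch of what this can look like; `MyApp`, the model path, and the commented-out load call are illustrative assumptions, not fal's exact API:

```python
import subprocess

class MyApp:  # in a real fal app this would be your App subclass
    def setup(self):
        # Pre-read all weight files with 32 parallel readers so that the
        # model load below hits the OS page cache. The path is illustrative.
        subprocess.run(
            "find /data/models/my-model -type f -print0"
            " | xargs -0 -n 1 -P 32 cat > /dev/null",
            shell=True,
            check=False,  # a failed pre-read should not block model loading
        )
        # Then load the model as usual, for example:
        # self.model = AutoModel.from_pretrained("/data/models/my-model")
```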
## Python Helper
For a cleaner approach, wrap the pre-read step in a small Python function and call it at the top of `setup()`.

## When to Use This
| Scenario | Recommendation |
|---|---|
| Loading HuggingFace models from /data | Use parallel pre-reading |
| Loading custom PyTorch checkpoints with multiple shard files | Use parallel pre-reading |
| Loading a single large file | Less benefit — the bottleneck is single-file transfer speed. Consider FlashPack instead |
| Models already in local node cache (warm runners) | Minimal benefit — files are already fast to read |
## How It Works
fal’s /data is a distributed filesystem that can serve many files concurrently at high throughput. When you read files sequentially, you use only a fraction of the available bandwidth. The `xargs -P 32` approach fires off 32 concurrent `cat` commands, each reading a different file. The OS caches the file contents in memory (the page cache), so when your model loader reads the same files moments later, it reads from RAM instead of the network.
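You can observe the effect by timing two consecutive reads of the same file. `timed_read` below is an illustrative helper, and the first call is only "cold" if the file is not already cached:

```python
import time

def timed_read(path: str, chunk_size: int = 1 << 20) -> float:
    """Read the file front to back and return the elapsed time in seconds."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        # Reading in fixed-size chunks pulls the file through the page
        # cache without holding the whole file in Python memory.
        while f.read(chunk_size):
            pass
    return time.perf_counter() - start

# cold = timed_read("/data/models/my-model/shard-00001.bin")  # network read
# warm = timed_read("/data/models/my-model/shard-00001.bin")  # page cache
```

On a warm page cache, the second read is typically far faster than the first, which is exactly what the pre-read step sets up for the model loader.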
## Comparison with FlashPack
| | Parallel Pre-Reading | FlashPack |
|---|---|---|
| What it does | Pre-reads files into OS cache | Streams tensors directly to GPU |
| Works with | Any files in any format | PyTorch models (custom format) |
| Requires conversion | No — works with existing files | Yes — must convert to .flashpack format |
| Best for | Quick win with existing models | Maximum performance with converted models |
## Related
- **FlashPack**: High-throughput tensor loading at up to 25 Gbps
- **Persistent Storage**: How /data works and caching behavior