Metrics - fal

GET /serverless/metrics

Metrics

curl --request GET \
  --url https://api.fal.ai/v1/serverless/metrics \
  --header 'Authorization: Key <api-key>'

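// The API key must be prefixed with "Key " in the Authorization header.
const options = {method: 'GET', headers: {Authorization: 'Key <api-key>'}};

// The endpoint returns Prometheus text, so read the body as text rather than JSON.
fetch('https://api.fal.ai/v1/serverless/metrics', options)
  .then(res => res.text())
  .then(body => console.log(body))
  .catch(err => console.error(err));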

"# HELP fal_app_runners Number of fal app runners\n# TYPE fal_app_runners gauge\nfal_app_runners{application=\"my/app\",machine_type=\"NVIDIA B200\",state=\"running\"} 21\n# HELP fal_app_queue_size Current size of the fal app queue\n# TYPE fal_app_queue_size gauge\nfal_app_queue_size{application=\"my/app\"} 7\n# HELP fal_app_concurrent_requests Current number of concurrent requests being processed\n# TYPE fal_app_concurrent_requests gauge\nfal_app_concurrent_requests{application=\"my/app\"} 3\n# HELP fal_app_requests_completed Number of requests completed in the last minute\n# TYPE fal_app_requests_completed gauge\nfal_app_requests_completed{application=\"my/app\",method=\"POST\",status=\"200\"} 18\n# HELP fal_app_requests_received Number of requests received in the last minute\n# TYPE fal_app_requests_received gauge\nfal_app_requests_received{application=\"my/app\",method=\"POST\"} 24\n# HELP fal_app_request_latency Number of requests completed, bucketed by latency in seconds\n# TYPE fal_app_request_latency gauge\nfal_app_request_latency{application=\"my/app\",le=\"1.0\"} 5\nfal_app_request_latency{application=\"my/app\",le=\"+Inf\"} 24"

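The example response above is in the Prometheus text exposition format, which is usually consumed by a Prometheus scraper directly. If you want to read it in your own code, a minimal sketch of a parser could look like the following. The function name `parsePrometheusText` and the shape of the returned objects are illustrative assumptions, not part of the fal API, and the parser does not cover the full format (for example, it does not unescape label values or handle timestamps).

```javascript
// Minimal, partial parser for Prometheus text exposition output.
// Hypothetical helper: the name and return shape are not part of the fal API.
function parsePrometheusText(text) {
  const samples = [];
  for (const rawLine of text.split("\n")) {
    const line = rawLine.trim();
    // Skip blank lines and "# HELP" / "# TYPE" comment lines.
    if (line === "" || line.startsWith("#")) continue;
    // metric_name{label="value",...} value
    const match = line.match(/^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)/);
    if (!match) continue;
    const [, name, labelText = "", rawValue] = match;
    const labels = {};
    const labelPattern = /([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"/g;
    for (const [, key, value] of labelText.matchAll(labelPattern)) {
      labels[key] = value; // escaped characters are kept as-is
    }
    // Note: Number() yields NaN for special values such as "+Inf".
    samples.push({ name, labels, value: Number(rawValue) });
  }
  return samples;
}

// Usage with two sample lines taken from the example response.
const sampleText = [
  "# HELP fal_app_queue_size Current size of the fal app queue",
  "# TYPE fal_app_queue_size gauge",
  'fal_app_queue_size{application="my/app"} 7',
  'fal_app_runners{application="my/app",machine_type="NVIDIA B200",state="running"} 21',
].join("\n");
const parsed = parsePrometheusText(sampleText);
console.log(parsed);
```

Each parsed sample keeps its labels, so values can be filtered by `application`, `machine_type`, or `state` afterwards.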
{
  "error": {
    "type": "authorization_error",
    "message": "Authentication required"
  }
}

{
  "error": {
    "type": "rate_limited",
    "message": "Rate limit exceeded"
  }
}

{
  "error": {
    "type": "server_error",
    "message": "An unexpected error occurred"
  }
}
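
The three error bodies above share one shape: an `error` object with a `type` and a `message`. A small sketch of how a client might read them is shown below. The helper name `describeError` is hypothetical, and the choice of which error types to treat as retryable is an assumption of this example, not something the reference states.

```javascript
// Assumption: retrying on rate limits and server errors is this example's
// own policy; the API reference does not prescribe retry behavior.
const RETRYABLE_TYPES = new Set(["rate_limited", "server_error"]);

// Hypothetical helper: turns an error response body into a small summary object.
function describeError(bodyText) {
  let parsedBody;
  try {
    parsedBody = JSON.parse(bodyText);
  } catch {
    // The body was not JSON; surface it unchanged.
    return { type: "unknown", message: bodyText, retryable: false };
  }
  const type = parsedBody?.error?.type ?? "unknown";
  const message = parsedBody?.error?.message ?? "No message provided";
  return { type, message, retryable: RETRYABLE_TYPES.has(type) };
}

// Usage with one of the documented error bodies.
const rateLimited = describeError(
  '{"error":{"type":"rate_limited","message":"Rate limit exceeded"}}'
);
console.log(rateLimited);
```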

Authorizations

Authorization (string, header, required)

API key must be prefixed with "Key ", e.g. Authorization: Key YOUR_API_KEY
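
As a sketch of the prefix rule above, the helper below adds `Key ` to a bare API key and leaves an already-prefixed value alone. The names `buildAuthHeaders` and `fetchMetrics` are hypothetical, and the example assumes a runtime with a global `fetch` (for instance Node.js 18 or later, or a browser).

```javascript
// Hypothetical helper: builds the Authorization header in the documented
// "Key YOUR_API_KEY" form, without doubling the prefix.
function buildAuthHeaders(apiKey) {
  const trimmed = apiKey.trim();
  const value = trimmed.startsWith("Key ") ? trimmed : `Key ${trimmed}`;
  return { Authorization: value };
}

// Hypothetical wrapper around the documented endpoint; resolves to the
// Prometheus text body. Assumes a global fetch implementation.
async function fetchMetrics(apiKey) {
  const res = await fetch("https://api.fal.ai/v1/serverless/metrics", {
    method: "GET",
    headers: buildAuthHeaders(apiKey),
  });
  if (!res.ok) {
    throw new Error(`Metrics request failed with status ${res.status}`);
  }
  return res.text();
}

// Usage (not executed here because it needs a real key and network access):
// fetchMetrics("YOUR_API_KEY").then((body) => console.log(body));
console.log(buildAuthHeaders("YOUR_API_KEY"));
```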

Response

Prometheus-compatible metrics retrieved successfully

Prometheus-compatible metrics in text format

Example:

"# HELP fal_app_queue_size Current size of the fal app queue\n# TYPE fal_app_queue_size gauge\nfal_app_queue_size{application=\"my/app\"} 10"

Analytics

Time-bucketed metrics for your serverless app endpoints, including request counts, success/error rates, and latency percentiles across all inbound traffic. `prepare_duration` reflects queue/prepare time before execution; `duration` is request execution time.

This endpoint shows all inbound requests to endpoints you own, not just your own calls. This is ideal for monitoring your deployed apps, tracking SLAs, and exporting data to tools like BigQuery or Grafana. You must own all requested endpoints; returns 403 otherwise.

A bare app id ('<owner>/<name>') automatically includes the app's registered route-level endpoints (e.g. '<owner>/<name>/turbo'); results stay grouped by the route-level id they were recorded under. Pass a route-level id to filter to that route exactly.

**Metric Selection:** You must specify which metrics to include using the `expand` query parameter. Only requested metrics will be populated in the response, allowing you to optimize query performance and data transfer.

**Available Metrics:** The `expand` parameter accepts these values, grouped by category:

*Volume*
- `request_count`: Total number of requests in the time bucket
- `success_count`: Successful requests (2xx responses)
- `user_error_count`: User errors (4xx responses)
- `error_count`: Server errors (5xx responses)

*Error type breakdown*
- `startup_error_count`: Startup errors (startup timeout, scheduling failure)
- `connection_error_count`: Connection errors (timeout, disconnected, refused)
- `timeout_error_count`: Request timeout errors
- `runtime_error_count`: Runtime errors (internal error, server error)

*Queue / prepare latency*
- `p50_prepare_duration`, `p75_prepare_duration`, `p90_prepare_duration`, `p95_prepare_duration`, `p99_prepare_duration`: Time from request submission until execution starts

*Request execution latency*
- `p25_duration`, `p50_duration`, `p75_duration`, `p90_duration`, `p95_duration`, `p99_duration`: Time spent processing the request

*Cold boot*
- `cold_boot_count`: Requests with cold boot (startup > 1s)
- `p50_cold_boot_duration`, `p75_cold_boot_duration`, `p90_cold_boot_duration`: Cold boot duration percentiles

*Billing*
- `total_billable_duration`: Aggregate billed execution time

**Key Features:**
- See all traffic to your apps across all callers
- Selective metric inclusion via expand parameter
- Performance metrics (latency percentiles, duration stats)
- Reliability metrics (success/error rates, request counts)
- Error type breakdown (startup, connection, timeout, runtime)
- Cold boot metrics (count, latency percentiles)
- Billing duration tracking
- Time-bucketed data for trend analysis
- Flexible date range and timeframe options

**Common Use Cases:**
- Monitor your serverless app performance and reliability
- Export analytics to your own observability tools
- Analyze latency trends across all callers
- Track error rates and SLA compliance
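
Because only the metrics named in `expand` are populated, it can help to validate the names before sending a request. The sketch below checks a list against the metric names listed above and builds a query string. The helper name `buildExpandQuery` is hypothetical, and joining the values with commas is an assumption: this page does not say how multiple `expand` values are encoded, and it does not give the Analytics endpoint URL, so none is used here.

```javascript
// Metric names as listed in the Analytics description above.
const KNOWN_METRICS = new Set([
  "request_count", "success_count", "user_error_count", "error_count",
  "startup_error_count", "connection_error_count",
  "timeout_error_count", "runtime_error_count",
  "p50_prepare_duration", "p75_prepare_duration", "p90_prepare_duration",
  "p95_prepare_duration", "p99_prepare_duration",
  "p25_duration", "p50_duration", "p75_duration",
  "p90_duration", "p95_duration", "p99_duration",
  "cold_boot_count", "p50_cold_boot_duration",
  "p75_cold_boot_duration", "p90_cold_boot_duration",
  "total_billable_duration",
]);

// Hypothetical helper: validates metric names and returns URLSearchParams.
// Assumption: multiple values are sent as one comma-separated `expand` value.
function buildExpandQuery(metrics) {
  if (metrics.length === 0) {
    throw new Error("At least one metric must be requested via expand");
  }
  const unknown = metrics.filter((name) => !KNOWN_METRICS.has(name));
  if (unknown.length > 0) {
    throw new Error(`Unknown metrics: ${unknown.join(", ")}`);
  }
  return new URLSearchParams({ expand: metrics.join(",") });
}

// Usage: request a volume metric and a latency percentile.
const query = buildExpandQuery(["request_count", "p50_duration"]);
console.log(query.get("expand"));
```

Validating locally catches typos such as `p50_latency` before a request is made, rather than after an empty field comes back.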
