fal apps run on standard Python, so you can instrument them with the OpenTelemetry SDK the same way you would any other service. Add the SDK to requirements, initialize a tracer in setup(), and wrap your inference stages with spans. For custom metrics instrumentation, see Custom Metrics. For tracing across multiple fal apps, see Cross-Service Tracing.

Prerequisites

You need an OTLP-compatible backend. Any of the following work:
| Backend | OTLP endpoint | Auth header |
| --- | --- | --- |
| New Relic (US) | https://otlp.nr-data.net:4318 | api-key=<INGEST_LICENSE_KEY> |
| New Relic (EU) | https://otlp.eu01.nr-data.net:4318 | api-key=<INGEST_LICENSE_KEY> |
| Datadog (US) | https://otlp.datadoghq.com | dd-api-key=<API_KEY> |
| Datadog (EU) | https://otlp.datadoghq.eu | dd-api-key=<API_KEY> |
| Grafana Cloud | https://otlp-gateway-prod-<region>.grafana.net/otlp | Authorization=Basic <base64(instanceId:token)> |
| Honeycomb | https://api.honeycomb.io | x-honeycomb-team=<API_KEY> |
Store your credentials as fal secrets so they are available as environment variables on the runner without being embedded in your code.
fal secrets set OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp.datadoghq.com
fal secrets set OTEL_EXPORTER_OTLP_HEADERS="dd-api-key=<YOUR_API_KEY>"
See Managing Secrets for details on how fal secrets work.
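
If the endpoint secret is not set (for example, when running locally), the OTLP exporter logs connection errors on every export batch. One way to guard against this is to fall back to OpenTelemetry's built-in no-op tracer when the endpoint variable is absent. The sketch below is illustrative, and maybe_setup_tracer is a hypothetical helper rather than part of the example later on this page; see Production Configuration for conditional tracing in more depth.
Python
import os

from opentelemetry import trace


def maybe_setup_tracer(service_name: str):
    # Hypothetical helper: when no provider is configured, trace.get_tracer
    # returns a no-op tracer, so spans are cheap and nothing is exported.
    if not os.environ.get("OTEL_EXPORTER_OTLP_ENDPOINT"):
        return trace.get_tracer(service_name), None

    from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
    from opentelemetry.sdk.resources import Resource
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import BatchSpanProcessor

    provider = TracerProvider(resource=Resource.create({"service.name": service_name}))
    provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
    trace.set_tracer_provider(provider)
    return trace.get_tracer(service_name), provider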

Adding Traces to Your App

Add opentelemetry-sdk and opentelemetry-exporter-otlp-proto-http to your app's requirements. The exporter reads OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_EXPORTER_OTLP_HEADERS from the environment automatically, so no endpoint or auth code is required. Initialize the tracer in setup(): the provider and export connection are then created once per runner, not once per request. The example below builds on the Stable Diffusion XL quickstart and adds spans around each stage of a text-to-image request.
Python
import os

import fal
from fal.toolkit import Image
from pydantic import BaseModel, Field


def setup_tracer(service_name: str):
    from opentelemetry import trace
    from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
    from opentelemetry.sdk.resources import Resource
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import BatchSpanProcessor

    resource = Resource.create({"service.name": service_name})
    provider = TracerProvider(resource=resource)
    provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
    trace.set_tracer_provider(provider)
    return trace.get_tracer(service_name), provider


class Input(BaseModel):
    prompt: str = Field(description="The prompt to generate an image from")
    num_inference_steps: int = Field(default=20)


class Output(BaseModel):
    image: Image
    trace_id: str


class TextToImageApp(fal.App):
    machine_type = "GPU-H100"
    requirements = [
        "hf-transfer==0.1.9",
        "diffusers[torch]==0.32.2",
        "torch==2.10.0",
        "transformers[sentencepiece]==4.51.0",
        "accelerate==1.6.0",
        "opentelemetry-sdk==1.41.0",
        "opentelemetry-exporter-otlp-proto-http==1.41.0",
    ]

    def setup(self):
        # Enable HF Transfer for faster downloads
        os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

        import torch
        from diffusers import StableDiffusionXLPipeline

        self.tracer, self.tracer_provider = setup_tracer("text-to-image")

        self.pipe = StableDiffusionXLPipeline.from_pretrained(
            "stabilityai/stable-diffusion-xl-base-1.0",
            torch_dtype=torch.float16,
            variant="fp16",
            use_safetensors=True,
        ).to("cuda")

        # Warmup runs once per runner at startup - not per request.
        # It compiles CUDA kernels so the first real request does not pay that cost.
        with self.tracer.start_as_current_span("warmup") as span:
            span.set_attribute("model.name", "stable-diffusion-xl-base-1.0")
            self.pipe("warmup")

    @fal.endpoint("/")
    def run(self, input: Input) -> Output:
        with self.tracer.start_as_current_span("text-to-image") as root:
            root.set_attribute("model.name", "stable-diffusion-xl-base-1.0")
            root.set_attribute("prompt.length", len(input.prompt))
            root.set_attribute("num_inference_steps", input.num_inference_steps)

            with self.tracer.start_as_current_span("inference") as span:
                span.set_attribute("num_inference_steps", input.num_inference_steps)
                result = self.pipe(
                    input.prompt,
                    num_inference_steps=input.num_inference_steps,
                )

            with self.tracer.start_as_current_span("upload"):
                image = Image.from_pil(result.images[0])

            trace_id = format(root.get_span_context().trace_id, "032x")

        return Output(image=image, trace_id=trace_id)

    def teardown(self):
        # Flush buffered spans before SIGKILL (5s grace period).
        # For sampling, batch tuning, and conditional tracing see Production Configuration.
        if self.tracer_provider:
            self.tracer_provider.force_flush(timeout_millis=4000)

Span Structure

The example above produces a tree of spans under a single root:
text-to-image
├── inference
└── upload
The warmup span is recorded once at startup as its own trace, separate from individual requests. Each request produces its own text-to-image root span. The parent span's duration covers all of its children, so text-to-image reflects the total request time including upload. The trace appears in your backend like this, with inference and upload shown as timed children of the root span:
[Screenshot: span tree with the text-to-image root span and its inference and upload children in a trace backend]

Span Attributes

Call span.set_attribute(key, value) to attach metadata to a span. Attributes appear as filterable fields in your backend’s trace viewer, so you can search for all traces where num_inference_steps is above a threshold or prompt.length exceeds a limit.
Python
with self.tracer.start_as_current_span("inference") as span:
    span.set_attribute("model.name", "stable-diffusion-xl-base-1.0")
    span.set_attribute("num_inference_steps", input.num_inference_steps)
    span.set_attribute("prompt.length", len(input.prompt))
    span.set_attribute("guidance_scale", 7.5)
Attribute keys follow the OpenTelemetry semantic conventions where applicable; for model-specific attributes, use a consistent namespace like model.* or inference.*. Attribute values must be strings, booleans, ints, floats, or homogeneous sequences of those; the SDK rejects anything else with a warning.

Marking Errors

Use record_exception and set_status to mark a span as failed. This is the portable OpenTelemetry pattern: all OTLP backends interpret StatusCode.ERROR as a failed span, whereas a custom error attribute is backend-specific metadata.
Python
from opentelemetry.trace import Status, StatusCode

with self.tracer.start_as_current_span("inference") as span:
    try:
        result = self.pipe(input.prompt)
    except RuntimeError as e:
        span.record_exception(e)
        span.set_status(Status(StatusCode.ERROR, str(e)))
        raise
BatchSpanProcessor exports spans asynchronously in the background. On a long-running runner, spans are batched and exported on a schedule. On shutdown, spans still in the buffer are flushed in teardown(). See Production Configuration for how to configure this flush.
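The batching behavior is tunable through constructor arguments. The sketch below spells out the SDK defaults for illustration; the matching OTEL_BSP_* environment variables work as well, and the defaults are usually fine.
Python
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Values shown are the SDK defaults, written out for illustration.
processor = BatchSpanProcessor(
    OTLPSpanExporter(),
    max_queue_size=2048,  # spans buffered before new ones are dropped
    schedule_delay_millis=5000,  # how often the background thread exports
    max_export_batch_size=512,  # spans sent per export request
    export_timeout_millis=30000,  # per-export timeout
)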

What’s Next

Custom Metrics

Add counters, histograms, and gauges to your app

Cross-Service Tracing

Connect traces across two fal apps into a single parent trace

Production Configuration

Sampling, batch export tuning, and graceful flush on shutdown

App Lifecycle

How setup() and teardown() fit into the runner lifecycle