Using Zipkin and OpenTelemetry on Azure Container Apps

9 minute read | By Anthony Salemo

This post will go over how to set up a basic opentelemetry implementation that exports traces to Zipkin as a backend, all hosted with a Container App Environment

Overview

Note, this is not using the “managed” opentelemetry collector that Container Apps has as a in-preview feature, which is described here: Collect and read OpenTelemetry data in Azure Container Apps (preview)

This post will cover setting up a “custom” implementation, which wil have three (3) general parts:

  • Client: The client/application. For the sake of blog, this won’t go much into the client and setting up SDK or codeless instrumentation for the application. This can be any application using the opentelemetry SDK or codeless agent
  • Collector: The opentelemetry collector
  • Backend: Zipkin

Setting up

If not done so already, create a Container App Environment and deploy your opentelemetry instrumented application

Instrumenting an application with opentelemetry can be found in Zero code instrumentation or opentelemetry - Language APIs & SDKs

NOTE: The below FQDN’s used for Zipkin is the Kubernetes-based pod name. The FQDN used for the opentelemetry collector is an internal FQDN. It is possible to use an external FQDN for both of these.

Zipkin

We can use the Docker Hub image for Zipkin to quickly set up a basic Zipkin applicatiion. Zipkin defaults to using in-memory storage, therefor any container or pod/replica restarts/recreations will cause this data to be deleted.

In a production scenario, you want to set up a storage backend of either Cassandra or ElasticSearch. Configuration on how to do this can be found here: openzipkin - storage

For this proof-of-concept, we’ll be using in-memory storage.

  1. Create a Container App using the Zipkin Docker Hub image openzipkin/zipkin:latest - make sure to remember the name of the Container App - this is important for later. We’ll just name this “zipkin”:

    Container App creation with Zipkin image in the portal

  2. Set the following:
    • Ingress traffic: Accepting traffic from anywhere
    • Ingress port: 9411.

    For the purpose of this, we’ll be setting the above ingress traffic setting so we can view data in the browser. In production scenarios, we would want to lock down traffic to a known set of IP’s.

    Container App creation with ingress config in the portal

  3. Create the Container App. Since we’ll be using in-memory storage we want to set our minimim replica count to one (1). After creation, navigate to the Scale blade and change the Min replicas count:

    Min replica config in the portal

  4. Zipkin should now be set up. If you navigate to your Zipkin Container App FQDN, you’ll see the default search page for traces.

    Zipkin UI

Opentelemetry

We can use the opentelemetry Docker Hub image to quickly set up our collector. To make this work, this will require an Azure Storage account with Azure Files.

Alternatively, you could create your own image and override the default collector .yml file.

  1. Create a Container App using the following:
    • Name: otel
    • Image source: Docker Hub or other registries
    • Image type: Public
    • Registry login server: otel/opentelemetry-collector-contrib:0.114.0.
    • Arguments override: --config=/etc/otelcol-contrib/otel-collector.yml

    Container App creation for otel collector in the portal

  2. Set the following:
    • Ingress traffic: Limited to Container Apps Environment
    • Ingress type: HTTP/2
    • Target port: 4317

    Otel collector ingress configuration

  3. Create the Container App. After creation, navigate to the Scale blade and change the Min replicas count to one (1):

    Min replica config in the portal

  4. At this point, we need to create an Azure Files-based storage volume and mount it to our container in the pod.

    • a) Create a file named otel-collector.yml with the following contents and then upload it to an Azure File Share in your Azure Storage Account:

    IMPORTANT: If you used a different name other than zipkin for the Container App, make sure to update the below endpoint value by replacing zipkin with the name of your Container App.

     receivers:
       otlp:
         protocols:
           grpc:
             endpoint: 0.0.0.0:4317
    
     processors:
         batch:
    
     exporters:
       zipkin:
         endpoint: "http://zipkin/api/v2/spans"
         tls:
           insecure: true
    
     extensions:
         health_check:
         pprof:
         zpages:
    
     service:
         extensions: [health_check, pprof, zpages]
         pipelines:
           traces:
             receivers: [otlp]
             processors: [batch]
             exporters: [zipkin]
    
    • b) Navigate to your Container App Environment -> Azure Files - and create a storage resource that references the Storage Account and File Share that the above otel-collector.yml file resides in. Since we’re just referencing a singular .yml that’s only being read-in, this is set to read-only. This can be set to read-write if desired.

      Storage configuration on the environment

  5. With the storage resource now created on the environment, we need to create a volume and mount it to the container

    a) Go to the Container App -> Volumes - and add a new volume:

    Volume configuration on the Container App

    b) Go to the Container App -> Containers -> Edit and deploy -> select the “otel” container and choose Edit -> Volume mounts - and add a new volume to this container.

    Choose the volume name that was added in the previous Volumes section (above) - set the Mount path to /etc/otelcol-contrib:

    Volume mount configuration on the Container App

  6. At this point, the collector should now be set up.

Client

  1. Create your Container App. This example will be using a Flask application instrumented with the Open Telemetry Python SDK. Below is an example of what is being used in terms of instrumentation:

     def initialize_instrumentation():
         # Service name is required for most backends
         resource = Resource(attributes={
             SERVICE_NAME: os.getenv('OTEL_SERVICE_NAME',
                                     'otel-sdk-examples-python-sdk')
         })
    
         traceProvider = TracerProvider(resource=resource)
    
         if os.getenv('ENVIRONMENT') == 'dev':
             processor = BatchSpanProcessor(ConsoleSpanExporter())
         else:
             processor = BatchSpanProcessor(OTLPSpanExporter(
                 endpoint=os.getenv('OTEL_EXPORTER_OTLP_ENDPOINT', 'localhost:4317')))
    
         traceProvider.add_span_processor(processor)
         trace.set_tracer_provider(traceProvider)
    
     ... other code ...
    

    If the ENVIRONMENT environment variable is set to dev, we export telemetry to the console. Otherwise, we’ll use the OTLP exporter.

    Furthermore, if OTEL_EXPORTER_OTLP_ENDPOINT is set, we’ll use that value, otherwise we use a default of localhost:4317

  2. During Container App creation (or post-creation), set the following environment variables:

    • OTEL_EXPORTER_OTLP_ENDPOINT: https://otel.internal.funnyname-1234abc.yourregion.azurecontainerapps.io/ - you can find your opentelemetry collector’s FQDN by going to the Ingress blade on the application
    • OTEL_EXPORTER_OTLP_PROTOCOL: grpc
    • OTEL_SERVICE_NAME: Set this to the name of your Container App

Validate tracing

Assuming everything is now properly set up, make a few requests to your client application - and then navigate to your Zipkin appliications FQDN. Select Run query - if this was properly set up and instrumented, traces should now populate:

Zipkin traces

Troubleshooting

It is heavily recommended to have either Log Analytics or Azure Monitor set up as log destinations. Or, another destination acting as a log collector.

Setting the log destination to “none” will not persist logging. See Log Options for details.

Collector troubleshooting

unsupported protocol scheme:

  • You’re likely forgetting to include a protocol scheme (eg. http:// or https://) in the endpoint property within otel-collector.yml. Ensure this is a valid and resolvable URI.

context deadline exceeded:

  • This can happen if you include a port within the URI. Typically, this would work in a setup such as docker-compose or other Kubernetes implementations. But doing something such as http://zipkin:9411/api/v2/spans within your endpoint property will cause this error. Ommiting the port (ex. http://zipkin/api/v2/spans) resolves this
  • This may also happen if trying to target a backend that is simply not reacheable due to network traffic being blocked. You can attempt to use curl or other tooling to validate it’s reacheable.

If the collector container is exiting or failing to start and you want to view general stdout/err, query the ContainerAppConsoleLogs_CL table (Log Analytics) or ContainerAppConsoleLogs table (Azure Monitor), for example:

ContainerAppConsoleLogs_CL
| where ContainerAppName_s == "otel"
| project TimeGenerated, Log_s, ContainerGroupName_s

For container or pod lifecycle events, you can use ContainerAppSystemLogs_CL or ContainerAppSystemLogs

Application troubleshooting

The way to go about this may depend on the language and if using the SDK or the codeless agent. As a first attempt, you can increase verbosity of opentelemetry logging through the OTEL_LOG_LEVEL environment variable.

However, you still may need to configure stdout/err from the application, if in the case the above variable doesn’t change logging. For example, with our Python application using the SDK, the below additional set up is needed to see opentelemetry SDK errors and logging:

# Attach OTLP handler to root logger
logging.getLogger().addHandler(handler)
# Add a handler to see logging to console
logging.getLogger().addHandler(logging.StreamHandler())
logging.basicConfig(level=logging.INFO)

Other languages may vary.

StatusCode.UNAVAILABLE encountered while exporting logs to [somendpoint]

  • If this is repeating and telemetry is not appearing in Zipkin, this may be due to an invalid/incorrect URI set for the OTEL_EXPORTER_OTLP_ENDPOINT variable.

StatusCode.UNIMPLEMENTED

  • This may not actually directly indicate an error. With our Zipkin example, we’re only setting this up to accept traces. If we’re also trying to export logging and metrics from our application, for example, even though the collector isn’t set to collect/export these - we’ll see this error.

    • Failed to export logs to otel.internal.funnyname-123abc.youregion.azurecontainerapps.io, error code: StatusCode.UNIMPLEMENTED
  • Ensure if wanting to capture metrics and logs (or other telemetry types), that this is correctly configured through the opentelemetry collector under service.pipelines. If the backend doesn’t accept types of metrics or logs (as an example with Zipkin’s case here), you can potentially resolve these errors by setting the environment variables OTEL_METRICS_EXPORTER=none and OTEL_LOGS_EXPORTER=none - or - by programmatically excluding these.

For all other troubleshooting, ensure to review application logs through the ContainerAppConsoleLogs_CL or ContainerAppConsoleLogs table. For container or pod lifecycle events, you can use ContainerAppSystemLogs_CL or ContainerAppSystemLogs