Intro to Distributed Systems and traceability using Jaeger – Pt.2

Published by Hesham Abo El-Magd

To me, traceability is basically logging, but in a better and more structured context.

In the first part of this series, we discussed the history of distributed systems and microservices, using the Amazon website as an example of what microservices look like and how they all communicate with each other.

Moving forward: what do you use to debug a service request that passes through many microservices, when you want to correlate all of the events and transactions that happened during that specific request? You use a transaction ID (correlation ID) that flows through your system, and you stamp that ID onto the traces collected throughout the entire workflow.
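For illustration, here is a minimal, hedged sketch of how that ID travels between services with the OpenTracing Python API; the downstream service name and URL are hypothetical placeholders:

import requests
from opentracing.propagation import Format

def call_downstream(tracer, parent_span):
    # Inject the span context (which carries the trace / correlation ID)
    # into the outgoing HTTP headers so the next service can continue it.
    headers = {}
    tracer.inject(parent_span.context, Format.HTTP_HEADERS, headers)
    # "inventory-service" is a hypothetical downstream microservice.
    return requests.get("http://inventory-service/items", headers=headers)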

In this post, we’ll dig deeper into Jaeger, a powerful distributed tracing tool that implements and adheres to the OpenTracing specification. The OpenTracing specification describes the semantics of transactions in distributed systems and defines a unified model for how monitoring and tracing systems should capture them.

First, let’s have a look at the Jaeger components and how they interact with each other:

Jaeger components:

  1. The Jaeger agent listens for and collects tracing data. In our Kubernetes scenario it is deployed as a DaemonSet, as part of the Jaeger stack, on all cluster nodes that will host workloads.
  2. The Jaeger collector is a Swiss Army knife that acts as a funnel or adapter in three basic roles (see the configuration sketch after this list):
    • As a receiver that accepts various tracing formats from the Jaeger agents.
    • As a processor that runs the tracing data through a processing pipeline.
    • As an exporter that sends the tracing data to your choice of storage backend: Cassandra (if you prefer a column-based NoSQL DB) or Elasticsearch (if you’re into indexing and searching).
      ** Kafka topics can be used as in-flight buffer storage. You will then also need the Ingester component, whose sole purpose is to read the tracing data from the Kafka topics and firehose it into another storage backend such as Cassandra or Elasticsearch.
      ** I wouldn’t add another component (i.e. the Ingester) if I had the luxury not to, since it would probably add more latency.
  3. The Jaeger Query component, a UI service that retrieves tracing data using search expressions.
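As a hedged sketch of the exporter role: Jaeger documents environment variables such as SPAN_STORAGE_TYPE for selecting the collector’s storage backend; a minimal fragment (the surrounding container spec is omitted) might look like this:

env:
  - name: SPAN_STORAGE_TYPE
    value: cassandra   # or "elasticsearch"; "kafka" requires the Ingester
  - name: CASSANDRA_SERVERS
    value: cassandra   # hostname of the Cassandra service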

Deploying Jaeger into Kubernetes Cluster

There are several ways to deploy Jaeger into your Kubernetes cluster. The one recommended by Jaeger is the Kubernetes Operator, which basically deploys CustomResourceDefinitions (CRDs) to communicate with the Kubernetes API. Before that, we need to take care of deploying the required cluster roles, cluster role bindings, and service accounts:

kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: jaeger-operator-in-myproject
  namespace: myproject
subjects:
- kind: ServiceAccount
  name: jaeger-operator
  namespace: observability
roleRef:
  kind: Role
  name: jaeger-operator
  apiGroup: rbac.authorization.k8s.io
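
Once the operator and its RBAC are in place, requesting a Jaeger instance is just a matter of applying a custom resource. A minimal, hedged example (the instance name "simplest" is illustrative; by default this deploys the all-in-one strategy):

apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: simplest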

This is one way to do it; the other way, which I’m going to use in this post, is Helm charts. Why Helm charts? Because Helm is a package manager for Kubernetes that provides a "push button" deployment style, bundling all the versioned, pre-configured application resources as one unit. Furthermore, you can override the chart’s values by modifying the values.yaml file.

  1. Make sure Helm is installed on your workstation, on a bastion host that can reach your cluster, or on a CI node, following these instructions.
  2. Use the official Helm chart from the jaegertracing GitHub repo.
  • Add the Jaeger Tracing Helm repository:
helm repo add jaegertracing https://jaegertracing.github.io/helm-charts
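  • Dump the chart’s default values to a local values.yaml so they can be edited (helm show values is a standard Helm 3 command):
helm show values jaegertracing/jaeger > values.yaml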

As stated before, we have two options for the preferred backend storage, and we can choose one simply by modifying the values.yaml file as follows:

# Default values for jaeger.
# This is a YAML-formatted file.
# Jaeger values are grouped by component. Cassandra values override subchart values

provisionDataStore:
  cassandra: true # I'm choosing to deploy cassandra as part of jaeger helm chart to be used as a backend storage.
  elasticsearch: false
  kafka: false

----snip--------

storage:
  # allowed values (cassandra, elasticsearch)
  type: cassandra
  cassandra:
    host: cassandra
    port: 9042
    tls:
      enabled: false
      secretName: cassandra-tls-secret
    user: user
    usePassword: true
    password: password
    keyspace: jaeger_v1_test

----snip--------

cassandra:
  persistence:
    # To enable persistence, please see the documentation for the Cassandra chart
    enabled: true # Enable a PersistentVolume for the Cassandra instance; check the storageClasses provided by your cloud provider.
    storageClass: "my-disk-storage"
  config:
    cluster_name: jaeger
    seed_size: 1
    dc_name: dc1
    rack_name: rack1
    endpoint_snitch: GossipingPropertyFileSnitch

----snip--------

schema:
  annotations: {}
  image: jaegertracing/jaeger-cassandra-schema
  imagePullSecrets: []
  pullPolicy: IfNotPresent
  resources: {}
  • To install a release named jaeger using the above modified values.yaml file:
helm install jaeger jaegertracing/jaeger -f values.yaml 
  • The result of running the installation command should look like the below:
  • And if you run kubectl get pod in the jaeger namespace:

As we can see, we have 3 jaeger-agent instances, since my current cluster consists of 3 worker nodes, 3 instances of the jaeger-cassandra DB, one jaeger-collector instance, and one jaeger-query instance.

Moreover, to access the Jaeger UI, you need to port-forward the query service (a sketch follows); the UI should then look like the below:
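A hedged sketch of that port-forward, assuming the chart’s default service name jaeger-query; check kubectl get svc for the actual name and port in your cluster:

kubectl port-forward svc/jaeger-query 8080:80
# The UI is then reachable at http://localhost:8080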

After taking care of deploying Jaeger, we will go ahead and create a simple Flask app that defines a few REST API endpoints, which will help us test tracing each of them.

from flask import Flask, render_template
from jaeger_client import Config
from flask_opentracing import FlaskTracing

app = Flask(__name__, template_folder='template')

# Configure the Jaeger client: the constant sampler traces every request,
# spans are logged, and the reporter ships them one at a time.
config = Config(
    config={
        'sampler': {'type': 'const', 'param': 1},
        'logging': True,
        'reporter_batch_size': 1,
    },
    service_name="service",
)
jaeger_tracer = config.initialize_tracer()

# trace_all_requests=True, so FlaskTracing opens a span for every request.
tracing = FlaskTracing(jaeger_tracer, True, app)
app.config["DEBUG"] = True

@app.route('/', methods=['GET'])
def home():
    return render_template('home.html')

@app.route('/liquid', methods=['GET'])
def liquid():
    return render_template('liquid.html')

@app.route('/healthz', methods=['GET'])
def healthz():
    return render_template('healthz.html')

@app.errorhandler(404)
def page_not_found(e):
    return render_template('error.html'), 404

@app.errorhandler(500)
def internal_server_error(e):
    return render_template('error.html'), 500

if __name__ == '__main__':
    app.run(debug=True, host="0.0.0.0")

As shown in the code snippet, we define three endpoints that we can call/curl and then trace using Jaeger. The first block is where we configure the Jaeger client in Flask and wire up the tracing middleware that annotates each request with a trace.
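As a hedged, optional extension (not part of the original app): beyond the automatic per-request spans from FlaskTracing, you can annotate interesting work with child spans through the standard OpenTracing API. The /work endpoint below is purely illustrative:

@app.route('/work', methods=['GET'])
def work():
    # Open a child span under the request span created by FlaskTracing.
    with jaeger_tracer.start_active_span('expensive-computation') as scope:
        scope.span.set_tag('component', 'demo')
        result = sum(i * i for i in range(10000))  # stand-in workload
    return str(result)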

Moving forward, we create a Docker image out of this Flask app, build it, and push it to Docker Hub:

FROM alpine:3.13

RUN apk add --no-cache py3-pip python3 && \
    pip3 install flask Flask-Opentracing jaeger-client

COPY . /usr/src/frontend

ENV FLASK_APP main.py

WORKDIR /usr/src/frontend

CMD flask run --host=0.0.0.0 --port=5000

docker build -t heshamaboelmagd/health-check-api:v3.1 .
docker push heshamaboelmagd/health-check-api:v3.1

Now we need to create a Kubernetes deployment using the Docker image we just created, as shown below:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: health-check
  annotations:
    "sidecar.jaegertracing.io/inject": "true"
spec:
  selector:
    matchLabels:
      run: health-check
  replicas: 2
  template:
    metadata:
      labels:
        run: health-check
    spec:
      containers:
      - name: health-check
        image: heshamaboelmagd/health-check-api:v3.1
        ports:
        - containerPort: 5000
---
apiVersion: v1
kind: Service
metadata:
  name: health-check
  labels:
    run: health-check
spec:
  type: LoadBalancer
  ports:
  - port: 5000
    targetPort: 5000
  selector:
    run: health-check

The annotation in the deployment manifest ("sidecar.jaegertracing.io/inject": "true") is what injects the Jaeger agent sidecar container, which collects the traces from our Flask app and sends them to the Jaeger collector component; the annotation is picked up by the Jaeger Operator, which watches deployments for it.

Let’s explore the deployment status:

We can see that, although our Kubernetes deployment manifest contains only one container, the Jaeger annotation we added caused the jaeger-agent sidecar to be created for us, as shown below:
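A hedged way to confirm the injection from the command line (the label selector matches the deployment above):

# Print the container names of one health-check pod; seeing both
# "health-check" and "jaeger-agent" confirms the sidecar.
kubectl get pods -l run=health-check \
  -o jsonpath='{.items[0].spec.containers[*].name}'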

Next, we will call our endpoints, either by curling them or by opening the REST API URL in the browser, as follows:

Since my REST API’s Kubernetes service is of type LoadBalancer, I can simply curl it:
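A hedged example; replace <EXTERNAL-IP> with the external address reported by kubectl get svc health-check:

curl http://<EXTERNAL-IP>:5000/liquid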

or using the browser:

Heading to the Jaeger UI, we can filter for our service in the Service drop-down list; in the Operation list we can choose any endpoint. I will choose liquid, set Limit Results to 30, and then click Find Traces.

Use Cases:

From a bigger-workload perspective, with a frontend, a backend, and Redis and/or MySQL already deployed on the cluster (or planned), the trace view in the dashboard should look like the below:

The detailed view of the traces shows the API requests and how many spans each received request produced.

Grafana-Jaeger Integration:

In our usual Liquid monitoring solution, we bundle Prometheus, Grafana, and Loki, and we also add Jaeger as a data source in Grafana. This lets us visualize our tracing further, and it also works hand in hand with Grafana Loki, as below:
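For reference, a hedged sketch of provisioning Jaeger as a Grafana data source, using Grafana’s standard provisioning file format (the URL assumes the jaeger-query service from earlier; adjust the host and port to your setup):

apiVersion: 1
datasources:
  - name: Jaeger
    type: jaeger
    access: proxy
    url: http://jaeger-query.jaeger.svc:16686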

Conclusion:

Although the demo here is simple, distributed tracing becomes invaluable for very complicated systems and architectures. Depending on your architectural maturity and platform capabilities, you may be able to get a fully working distributed tracing capability within your clusters, thanks to the auto-instrumentation offered by Jaeger. To learn more about how Jaeger can help you monitor and resolve performance issues in your cluster, visit the official documentation.

Hesham Abo El-Magd

Hesham is interested in the field of microservices and service architecture, focusing on containers and container orchestration, especially Kubernetes and cloud-native solutions.