Retina by Microsoft OSS

KubeCon Europe 2024 wrapped up this past week with major announcements from various vendors, and one that stood out to me is Retina. Microsoft released an open-source, cloud-agnostic Kubernetes network observability platform that provides a path to customizable telemetry. You have multiple options for where to store that telemetry, such as Prometheus (managed or unmanaged), Azure Monitor, and more. Notably, Retina leverages eBPF on Linux for increased visibility into network traffic and aims to simplify packet captures.

From the documentation. Image credit: retina.sh

Now that we have an overview, the intention of this post is to see what Retina looks like in action and to explore some of its capabilities from the CLI.

Getting Started

For context, I'm running the following (a quick way to verify the toolchain is shown after the list).

  • Kubernetes 1.29
  • Retina Repo
  • Helm
  • Kubectl
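
If you want to sanity-check the toolchain before starting, the usual version commands cover it:

kubectl version --client
helm version
git --version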

We start by simply cloning the repository.

git clone https://github.com/microsoft/retina.git
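
The make target in the next step runs from the repository root, so change into the clone first:

cd retina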

Then we are instructed to run the following from the command line to start the installation.

make helm-install

For the purposes of this post, it's important to note that the make helm-install command initiates basic mode; multiple modes exist, and they're covered later for context.
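
Before moving on, a quick way to confirm the Retina pods are up is worth running. I'm assuming the chart installs into kube-system here; adjust the namespace if your install differs:

kubectl get pods -n kube-system | grep retina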

Next, we will run through the installation of unmanaged Prometheus.

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

Now, a key area: we have to use these specific values. For reference, we are going to deploy Prometheus with the values stated in the file below.

windowsMonitoring:
  enabled: true

prometheusOperator:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.io/os
                operator: In
                values:
                  - linux

  admissionWebhooks:
    deployment:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: kubernetes.io/os
                    operator: In
                    values:
                      - linux
    patch:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: kubernetes.io/os
                    operator: In
                    values:
                      - linux

prometheus:
  prometheusSpec:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: kubernetes.io/os
                  operator: In
                  values:
                    - linux
    additionalScrapeConfigs: |
      - job_name: "retina-pods"
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_container_name]
            action: keep
            regex: retina(.*)
          - source_labels:
              [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
            separator: ":"
            regex: ([^:]+)(?::\d+)?
            target_label: __address__
            replacement: ${1}:${2}
            action: replace
          - source_labels: [__meta_kubernetes_pod_node_name]
            action: replace
            target_label: instance
        metric_relabel_configs:
          - source_labels: [__name__]
            action: keep
            regex: (.*)

We can use this file directly, or you can reference it via the CLI as shown below.

helm install prometheus -n kube-system -f deploy/prometheus/values.yaml prometheus-community/kube-prometheus-stack

This command for some reason produced errors, so I installed the Helm chart without the values file and chose to pass it in via an upgrade.
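
As a rough sketch, that install-then-upgrade route looks like the following, assuming the same release name, namespace, and values file as the command above:

helm install prometheus prometheus-community/kube-prometheus-stack -n kube-system
helm upgrade prometheus prometheus-community/kube-prometheus-stack -n kube-system -f deploy/prometheus/values.yaml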

If you run into issues like I did, the actual source Helm chart is found at https://artifacthub.io/packages/helm/prometheus-community/prometheus; you can then use that install method and pass in deploy/prometheus/values.yaml via the --values flag.

I then used the port-forward commands below to check the Prometheus UI.

export POD_NAME=$(kubectl get pods --namespace default -l "app.kubernetes.io/name=prometheus,app.kubernetes.io/instance=prometheus" -o jsonpath="{.items[0].metadata.name}")
kubectl port-forward --namespace default $POD_NAME 9090

We want to navigate to Status -> Targets -> kubernetes-pods.

Now let's remember that for the visualization piece of the stack we will need a managed or, my favorite, an unmanaged instance of Grafana.

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm install grafana grafana/grafana
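
The chart's post-install notes for grabbing the temporary admin password and port-forwarding boil down to roughly the following (release name grafana and namespace default, matching the install above; treat this as a sketch and prefer the notes Helm prints for your install):

kubectl get secret --namespace default grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
export POD_NAME=$(kubectl get pods --namespace default -l "app.kubernetes.io/name=grafana,app.kubernetes.io/instance=grafana" -o jsonpath="{.items[0].metadata.name}")
kubectl --namespace default port-forward $POD_NAME 3000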

Once you've retrieved the temporary password and have the port-forward going, your dashboard should look like the following.

Navigate to Dashboards -> + Create -> Import Dashboard, then input the following URL for reference: https://grafana.com/grafana/dashboards/18814-kubernetes-networking-clusters/

Once this is added, the dashboard should look like the view shown below.

To recap what's going on at this point: we've deployed Grafana and Prometheus for monitoring and metric scraping, and we've imported the dashboard that will cover network activity.

In the background I've changed the installation mode so I can demonstrate capture, which we'll use through the CLI further below.

To understand the modes, three are defined:

  • Basic: metrics aggregated by node; proportional to the number of nodes in the cluster.
  • Advanced/Pod-level with remote context: basic metrics plus extra metrics aggregated by source and destination pod; has scale limitations, since metric cardinality is unbounded (proportional to the number of source/destination pairs, including external IPs).
  • Advanced/Pod-level with local context: basic metrics plus extra metrics from the "local" pod (source for outgoing traffic, destination for incoming traffic); allows selecting which pods to observe (create metrics for) via annotations; designed for scale.

Using Capture Mode with Retina CLI

First, to use the CLI we need to navigate to the releases page; currently the only supported underlying OS is Linux.

wget https://github.com/microsoft/retina/releases/download/v0.0.1/kubectl-retina-linux-amd64
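
Depending on the release asset you may need to extract it first; either way, the binary needs to be executable and on your PATH under the kubectl-retina name so kubectl can discover it as a plugin. Roughly:

chmod +x kubectl-retina-linux-amd64
sudo mv kubectl-retina-linux-amd64 /usr/local/bin/kubectl-retina
kubectl retina --help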

For some odd reason, even after I extracted this, I ran into some issues getting it running.

A dirty workaround is the following: if you are in the cloned repo, move into the cli directory and notice main.go, which is essentially the entry point of the CLI, so you can run go run main.go and it acts the same, just not as pretty.
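
Assuming the repo layout described above, that workaround looks roughly like this:

cd retina/cli
go run main.go capture create --help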

The command used for this capture creates a job; you are essentially directing what network traffic you want to capture with the following.

kubectl retina capture create --host-path /mnt/capture --namespace default --node-selectors "kubernetes.io/os=linux"

This starts the capture as a job and will complete fairly quickly; to ensure you aren't waiting, you can also pass in --no-wait=true.
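
For example, the same capture fired off without waiting on completion:

kubectl retina capture create --host-path /mnt/capture --namespace default --node-selectors "kubernetes.io/os=linux" --no-wait=true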

Now, since we didn't specify a dedicated output location, note that we have options for where we can put the capture by running the following.

kubectl retina capture create --host-path /mnt/capture --namespace capture --node-selectors "kubernetes.io/os=linux"

  • This assumes you have /mnt/capture created; if you haven't, create the directory and give yourself ownership of it, as shown below.
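
Creating the directory ahead of time looks like this:

sudo mkdir -p /mnt/capture
sudo chown $(whoami):$(whoami) /mnt/capture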

This should populate a tarball in our desired location; however, I was unable to get the outputs to appear inside /mnt/capture.

I ran a few commands to track down where the logs were stored, but ran into multiple issues actually locating anything under /mnt/capture.
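
The exact commands aren't reproduced here, but a minimal sketch of that kind of digging, assuming the capture job landed in the namespace passed to the CLI and using a placeholder job name, looks like this:

kubectl get jobs -n default
kubectl get pods -n default --selector=job-name=<capture-job-name>
kubectl logs -n default job/<capture-job-name>
ls -la /mnt/capture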

Summary

While this is just scratching the surface, and the project is still relatively early, the hardest part of getting this far was the communication between Grafana and Prometheus capturing metrics from my cluster. I ran into numerous issues getting to the end of this, but the overarching goal is to use capture mode tailored to your target. The tarball was supposed to contain the metadata from the network capture, which we can inspect further for debugging or deeper analysis of our cluster. It's important to decide ahead of time where you will store the captures; this is just one approach, and you can also use the CRD option that the documentation works off of. I'm going to run more testing on this in Azure Kubernetes Service, using a managed offering rather than a bare-metal cluster. Retina ensures the portability of captures, so they can be stored wherever you'd like for further analysis, and the great thing about using the Grafana ecosystem is all of the data sources that can connect and be visualized, so I can see this improving further.