AKS Advanced Networking Capabilities in Action

Azure Kubernetes Service (AKS) has a new capability that enhances observability: Advanced Container Networking Services. In a nutshell, this is a suite of observability services for your Kubernetes cluster, offering visibility through the Hubble UI and native integration with Azure Monitor and Azure Managed Grafana, or you can Bring Your Own. As of today, June 1st 2024, the service is in preview, but it becomes a paid offering in a couple of days on June 4th.

Some of the feature's capabilities are noted below.

  • Node-level metrics: node-level visibility into traffic volume, dropped packets, and number of connections.
  • Hubble metrics (DNS and pod-level metrics): Layer 4/Layer 7 packet flows; you will see these later in a visual.
  • Hubble flow logs: flow logs can answer latency questions and help you troubleshoot a specific path if you need to dive deeper.

For Cilium, which is the data plane our provisioned cluster will use, the following node-level metrics are supported.

Pod-Level Metrics (Hubble Metrics)

Current Limitations

  • Pod-level metrics are currently only available on Linux.
  • The Cilium data plane is supported starting with Kubernetes 1.29.
  • Metric labels may have subtle differences between Cilium and non-Cilium clusters.
  • The Cilium data plane does not currently support DNS metrics.

It’s also worth noting, for scale, that Azure Managed Prometheus and Grafana impose their own service-specific scale limitations.

Getting Started

To get started, I’m going to run with the following configuration:

  • AKS 1.29 as the minimum version
  • Azure Grafana + Managed Prometheus
  • Azure Monitor

First, add and update the aks-preview extension in the Azure CLI.

az extension add --name aks-preview
az extension update --name aks-preview

Then we have to register the preview feature to start using this, so we’ll call the following. Notice the feature falls under the Microsoft.ContainerService namespace.

az feature register --namespace "Microsoft.ContainerService" --name "AdvancedNetworkingPreview"

This will show as Registering if you don’t already have the feature registered. Once that’s done, you can run the same command, changing register to show.

az feature show --namespace "Microsoft.ContainerService" --name "AdvancedNetworkingPreview"
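Once the feature eventually shows as Registered, the usual final step for preview features is to refresh the Microsoft.ContainerService provider registration so the change is picked up:

az provider register --namespace "Microsoft.ContainerService"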

While that is registering, we can define what our cluster will be by scripting the following. I’ve shortened it to one script for easy creation.

#!/bin/bash
# Set environment variables for the resource group name and location. Make sure to replace the placeholders with your own values.
export RESOURCE_GROUP="<resource-group-name>"
export LOCATION="<azure-region>"

# Create a resource group
az group create --name $RESOURCE_GROUP --location $LOCATION

# Set an environment variable for the AKS cluster name. Make sure to replace the placeholder with your own value.
export CLUSTER_NAME="<aks-cluster-name>"

# Create an AKS cluster
az aks create \
  --name $CLUSTER_NAME \
  --resource-group $RESOURCE_GROUP \
  --api-server-authorized-ip-ranges "x.x.x.x/32" \
  --generate-ssh-keys \
  --location eastus \
  --max-pods 250 \
  --network-plugin azure \
  --network-plugin-mode overlay \
  --network-dataplane cilium \
  --node-count 2 \
  --pod-cidr 192.168.0.0/16 \
  --kubernetes-version 1.29 \
  --enable-advanced-network-observability

Once your cluster is up and running, grab the credentials by running the following.

az aks get-credentials --name $CLUSTER_NAME --resource-group $RESOURCE_GROUP
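As a quick sanity check (not part of the original flow, just a habit), you can confirm the nodes are Ready and that the Cilium data plane pods came up. The k8s-app=cilium label is an assumption based on the stock Cilium DaemonSet labeling, so adjust if your pods are labeled differently.

kubectl get nodes -o wide
# Label assumes the default Cilium DaemonSet labels
kubectl get pods -n kube-system -l k8s-app=cilium -o wide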

Since we are creating Azure Managed Prometheus and Grafana, we will create the underlying resources in another script.

nano grafana.sh
#!/bin/bash
#Set an environment variable for the Grafana name. Make sure to replace the placeholder with your own value.
export AZURE_MONITOR_NAME="azure-east-aks"
export RESOURCE_GROUP="aks-demo-networking"

# Create Azure monitor resource
az resource create --resource-group $RESOURCE_GROUP --namespace microsoft.monitor --resource-type accounts --name $AZURE_MONITOR_NAME --location eastus --properties '{}'

# Creating Grafana
export GRAFANA_NAME="aks-grafana"

# Create the instance
az grafana create \
  --name $GRAFANA_NAME \
  --resource-group $RESOURCE_GROUP 

Then we can make this executable with chmod +x grafana.sh and run ./grafana.sh.

After all of these commands have completed, we can grab the Grafana and Azure Monitor resource IDs into variables using the following, taken from the documentation.

# Ensure RESOURCE_GROUP, GRAFANA_NAME, and AZURE_MONITOR_NAME are exported from the earlier steps
grafanaId=$(az grafana show \
               --name $GRAFANA_NAME \
               --resource-group $RESOURCE_GROUP \
               --query id \
               --output tsv)
azuremonitorId=$(az resource show \
                    --resource-group $RESOURCE_GROUP \
                    --name $AZURE_MONITOR_NAME \
                    --resource-type "Microsoft.Monitor/accounts" \
                    --query id \
                    --output tsv)
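Before wiring these into the cluster, it’s worth a quick check that both variables actually resolved; an empty value here is the usual cause of a confusing error on the next step.

echo "Grafana ID:       $grafanaId"
echo "Azure Monitor ID: $azuremonitorId"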

Then we have to link Azure Monitor and Grafana to the AKS Cluster.

az aks update --name $CLUSTER_NAME \
  --resource-group $RESOURCE_GROUP \
  --enable-azure-monitor-metrics \
  --azure-monitor-workspace-resource-id $azuremonitorId \
  --grafana-resource-id $grafanaId

If you run into an issue similar to the one shown below, you’ll have to register Microsoft.AlertsManagement. In the portal this is under Subscriptions -> Settings -> Resource Providers -> search for “Microsoft.AlertsManagement”.
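If you’d rather stay in the CLI than click through the portal, the same registration can be done with the standard provider commands:

az provider register --namespace Microsoft.AlertsManagement
# Registration can take a few minutes; poll until this returns "Registered"
az provider show --namespace Microsoft.AlertsManagement --query registrationState --output tsv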

To validate the link is in place, check that the Azure Monitor agent, metrics components, and more were created in the cluster.
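The exact command isn’t shown here, but one way to check is to look for the metrics agent pods the managed Prometheus add-on deploys into kube-system. The ama-metrics naming reflects how the add-on currently ships, so treat it as an assumption.

# Azure Monitor managed Prometheus components land in kube-system
kubectl get pods -n kube-system | grep ama-metrics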

Now you can navigate to your Grafana instance in the UI and start to uncover the Advanced Networking dashboards backed by Managed Prometheus.

I’m assuming this is publicly accessible here, but in a production environment it would likely be segmented behind Private Link or a firewall, using our Entra ID credentials to authenticate.

Select the Endpoint then navigate to Dashboards.

Under the Azure Managed Prometheus folder look for the following visual to see the dashboards from this enablement.

Selecting the first one to see what is going on at the cluster level, this snapshot shows some nice metrics.

Exploring the Pod Flows (Workload) I can see the connections at the pod level as shown.

The pod namespace metrics can show you granular connections between namespaces.

Installing Hubble CLI

For the rest of this you’ll need the Hubble CLI; I’ve used the installation from the source documentation below.

HUBBLE_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/hubble/master/stable.txt)
HUBBLE_ARCH=amd64
if [ "$(uname -m)" = "aarch64" ]; then HUBBLE_ARCH=arm64; fi
curl -L --fail --remote-name-all https://github.com/cilium/hubble/releases/download/$HUBBLE_VERSION/hubble-linux-${HUBBLE_ARCH}.tar.gz{,.sha256sum}
sha256sum --check hubble-linux-${HUBBLE_ARCH}.tar.gz.sha256sum
sudo tar xzvfC hubble-linux-${HUBBLE_ARCH}.tar.gz /usr/local/bin
rm hubble-linux-${HUBBLE_ARCH}.tar.gz{,.sha256sum}
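A quick version check confirms the binary landed in /usr/local/bin and is on your PATH:

hubble version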

Then let’s run the following command.

kubectl get pods -o wide -n kube-system -l k8s-app=hubble-relay

Run a port-forward and open another terminal as well.

kubectl port-forward -n kube-system svc/hubble-relay --address 127.0.0.1 4245:443

We still need to run a few more lines to get access to use Hubble.

Create a script as shown below; I’ve used install.sh for this file.

#!/usr/bin/env bash

set -euo pipefail
set -x

# Directory where certificates will be stored
CERT_DIR="$(pwd)/.certs"
mkdir -p "$CERT_DIR"

declare -A CERT_FILES=(
  ["tls.crt"]="tls-client-cert-file"
  ["tls.key"]="tls-client-key-file"
  ["ca.crt"]="tls-ca-cert-files"
)

for FILE in "${!CERT_FILES[@]}"; do
  KEY="${CERT_FILES[$FILE]}"
  JSONPATH="{.data['${FILE//./\\.}']}"

# Retrieve the secret and decode it
  kubectl get secret hubble-relay-client-certs -n kube-system -o jsonpath="${JSONPATH}" | base64 -d > "$CERT_DIR/$FILE"

# Set the appropriate hubble CLI config
  hubble config set "$KEY" "$CERT_DIR/$FILE"
done

hubble config set tls true
hubble config set tls-server-name instance.hubble-relay.cilium.io

Make it executable with chmod +x install.sh and run ./install.sh.

Once that has run, let’s check the secrets in our cluster.

kubectl get secrets -n kube-system | grep hubble-
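With the hubble-relay port-forward from earlier still running in the other terminal (the CLI defaults to 127.0.0.1:4245, which is what we forwarded), you can smoke-test the connection before moving on to the UI:

# Confirm the CLI can reach the relay over the port-forward
hubble status
# Pull a small sample of recent flows from kube-system
hubble observe --namespace kube-system --last 20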

We will need to add another file, hubble-ui.yaml, for accessing the UI.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: hubble-ui
  namespace: kube-system
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: hubble-ui
  labels:
    app.kubernetes.io/part-of: retina
rules:
  - apiGroups:
      - networking.k8s.io
    resources:
      - networkpolicies
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - componentstatuses
      - endpoints
      - namespaces
      - nodes
      - pods
      - services
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - apiextensions.k8s.io
    resources:
      - customresourcedefinitions
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - cilium.io
    resources:
      - "*"
    verbs:
      - get
      - list
      - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: hubble-ui
  labels:
    app.kubernetes.io/part-of: retina
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: hubble-ui
subjects:
  - kind: ServiceAccount
    name: hubble-ui
    namespace: kube-system
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: hubble-ui-nginx
  namespace: kube-system
data:
  nginx.conf: |
    server {
        listen       8081;
        server_name  localhost;
        root /app;
        index index.html;
        client_max_body_size 1G;
        location / {
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            # CORS
            add_header Access-Control-Allow-Methods "GET, POST, PUT, HEAD, DELETE, OPTIONS";
            add_header Access-Control-Allow-Origin *;
            add_header Access-Control-Max-Age 1728000;
            add_header Access-Control-Expose-Headers content-length,grpc-status,grpc-message;
            add_header Access-Control-Allow-Headers range,keep-alive,user-agent,cache-control,content-type,content-transfer-encoding,x-accept-content-transfer-encoding,x-accept-response-streaming,x-user-agent,x-grpc-web,grpc-timeout;
            if ($request_method = OPTIONS) {
                return 204;
            }
            # /CORS
            location /api {
                proxy_http_version 1.1;
                proxy_pass_request_headers on;
                proxy_hide_header Access-Control-Allow-Origin;
                proxy_pass http://127.0.0.1:8090;
            }
            location / {
                try_files $uri $uri/ /index.html /index.html;
            }
            # Liveness probe
            location /healthz {
                access_log off;
                add_header Content-Type text/plain;
                return 200 'ok';
            }
        }
    }
---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: hubble-ui
  namespace: kube-system
  labels:
    k8s-app: hubble-ui
    app.kubernetes.io/name: hubble-ui
    app.kubernetes.io/part-of: retina
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: hubble-ui
  template:
    metadata:
      labels:
        k8s-app: hubble-ui
        app.kubernetes.io/name: hubble-ui
        app.kubernetes.io/part-of: retina
    spec:
      serviceAccount: hubble-ui
      serviceAccountName: hubble-ui
      automountServiceAccountToken: true
      containers:
      - name: frontend
        image: mcr.microsoft.com/oss/cilium/hubble-ui:v0.12.2   
        imagePullPolicy: Always
        ports:
        - name: http
          containerPort: 8081
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8081
        readinessProbe:
          httpGet:
            path: /
            port: 8081
        resources: {}
        volumeMounts:
        - name: hubble-ui-nginx-conf
          mountPath: /etc/nginx/conf.d/default.conf
          subPath: nginx.conf
        - name: tmp-dir
          mountPath: /tmp
        terminationMessagePolicy: FallbackToLogsOnError
        securityContext: {}
      - name: backend
        image: mcr.microsoft.com/oss/cilium/hubble-ui-backend:v0.12.2
        imagePullPolicy: Always
        env:
        - name: EVENTS_SERVER_PORT
          value: "8090"
        - name: FLOWS_API_ADDR
          value: "hubble-relay:443"
        - name: TLS_TO_RELAY_ENABLED
          value: "true"
        - name: TLS_RELAY_SERVER_NAME
          value: ui.hubble-relay.cilium.io
        - name: TLS_RELAY_CA_CERT_FILES
          value: /var/lib/hubble-ui/certs/hubble-relay-ca.crt
        - name: TLS_RELAY_CLIENT_CERT_FILE
          value: /var/lib/hubble-ui/certs/client.crt
        - name: TLS_RELAY_CLIENT_KEY_FILE
          value: /var/lib/hubble-ui/certs/client.key
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8090
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8090
        ports:
        - name: grpc
          containerPort: 8090
        resources: {}
        volumeMounts:
        - name: hubble-ui-client-certs
          mountPath: /var/lib/hubble-ui/certs
          readOnly: true
        terminationMessagePolicy: FallbackToLogsOnError
        securityContext: {}
      nodeSelector:
        kubernetes.io/os: linux 
      volumes:
      - configMap:
          defaultMode: 420
          name: hubble-ui-nginx
        name: hubble-ui-nginx-conf
      - emptyDir: {}
        name: tmp-dir
      - name: hubble-ui-client-certs
        projected:
          defaultMode: 0400
          sources:
          - secret:
              name: hubble-relay-client-certs
              items:
                - key: tls.crt
                  path: client.crt
                - key: tls.key
                  path: client.key
                - key: ca.crt
                  path: hubble-relay-ca.crt
---
kind: Service
apiVersion: v1
metadata:
  name: hubble-ui
  namespace: kube-system
  labels:
    k8s-app: hubble-ui
    app.kubernetes.io/name: hubble-ui
    app.kubernetes.io/part-of: retina
spec:
  type: ClusterIP
  selector:
    k8s-app: hubble-ui
  ports:
    - name: http
      port: 80
      targetPort: 8081

Now once that manifest is finally complete, apply it to the cluster and then run the following port-forward.
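Applying it is a plain kubectl apply against the hubble-ui.yaml file we just created:

kubectl apply -f hubble-ui.yaml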

kubectl port-forward -n kube-system svc/hubble-ui 12000:80

Navigate to http://localhost:12000/

I clicked on gatekeeper-system to show the following visual.

I can also navigate to kube-system to get a larger view of operations in this area.

Once you’ve explored this, feel free to delete resources as needed or keep them around for the visuals; just know that pricing charges go into effect soon.

Summary

Tear down resources by running the following.

az group delete --name aks-demo-networking

This will destroy the cluster and all of our components; the great thing about keeping it all in one resource group is quick management.

This announcement appears to spearhead continued support for Cilium and tighter integration with Azure Managed Grafana. It’s also really welcome to see continued support for Bring Your Own Prometheus if you want more control. Check out Azure Advanced Networking Capabilities for AKS if you’re trying to get more native integration with Azure Monitor.