Navigating Federal Information Processing Standards in Azure Kubernetes Service

Organizations that operate in highly sensitive data domains often have to validate FIPS (Federal Information Processing Standards) Level 2 compliance as they adopt new technologies. This blog is intended to show how to enable FIPS on Azure Kubernetes Service, along with a brief overview of what FIPS is and where it applies. FIPS 140 is a standard that defines minimum security requirements for cryptographic modules in information technology products and systems, with levels that indicate increasing degrees of assurance. As your organization starts to migrate to containerized applications across cloud platforms, you'll likely run into the question of how much security the provider enables for organizations and what exactly they provide. Azure Kubernetes Service supports FIPS 140-2 with both Linux and Windows node pools; you can enable this at creation or segment that part of your cluster by adding an additional node pool with FIPS-compliant nodes. From a practitioner standpoint, you should know that FIPS 140-3 has superseded FIPS 140-2. The Cryptographic Module Validation Program (CMVP), a joint effort led by NIST, provides a searchable list of validated cryptographic modules that you can use to vet vendors, and Microsoft appears on this list.

Azure Kubernetes Service FIPS-Enabled Node Pools

AKS offers a simple way of ensuring you're running nodes that are FIPS Level 2 compliant: you can pass a single parameter from the command line as shown below, or you can take the more involved route, as I do, and use Terraform as Infrastructure as Code.
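If you just want the CLI route, a single flag on az aks nodepool add requests FIPS-enabled node images. A minimal sketch, reusing the resource group and cluster names from this demo with a placeholder node pool name:

az aks nodepool add \
  --resource-group aks-chaos-mesh-rg \
  --cluster-name aks-chaos-mesh-aks \
  --name fipsnp \
  --enable-fips-image

And here's the more involved Terraform route: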

module "aks" {
  source                            = "Azure/aks/azurerm"
  version                           = "7.5.0"
  resource_group_name               = azurerm_resource_group.aks.name
  kubernetes_version                = var.kubernetes_version
  orchestrator_version              = var.kubernetes_version
  prefix                            = "aks-chaos-mesh"
  network_plugin                    = "kubenet"
  vnet_subnet_id                    = lookup(module.aks-vnet.vnet_subnets_name_id, "subnet0")
  os_disk_size_gb                   = 50
  sku_tier                          = "Standard" # defaults to Free
  private_cluster_enabled           = false
  rbac_aad                          = var.rbac_aad
  role_based_access_control_enabled = var.role_based_access_control_enabled
  http_application_routing_enabled  = false
  enable_auto_scaling               = true
  enable_host_encryption            = false
  log_analytics_workspace_enabled   = false
  agents_min_count                  = 1
  agents_max_count                  = 3
  agents_count                      = null 
  agents_max_pods                   = 100
  agents_pool_name                  = "system"
  agents_availability_zones         = ["1", "2"]
  agents_type                       = "VirtualMachineScaleSets"
  agents_size                       = var.agents_size
  workload_identity_enabled         = true
  oidc_issuer_enabled               = true
  default_node_pool_fips_enabled    = true

  agents_labels = {
    "nodepool" : "defaultnodepool"
  }

  agents_tags = {
    "Agent" : "defaultnodepoolagent"
  }

  ingress_application_gateway_enabled = false

  network_policy             = "calico"
  net_profile_dns_service_ip = "10.0.0.10"
  net_profile_service_cidr   = "10.0.0.0/16"

  key_vault_secrets_provider_enabled = true
  secret_rotation_enabled            = true
  secret_rotation_interval           = "3m"

  depends_on = [module.aks-vnet]
}
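
If you'd rather keep the default pool as-is and segment FIPS workloads into their own pool, as mentioned earlier, you can attach an additional FIPS-enabled node pool to the cluster. Below is a minimal sketch, assuming the module exposes the cluster ID through an aks_id output; the pool name and node count are placeholders.

resource "azurerm_kubernetes_cluster_node_pool" "fips" {
  name                  = "fipspool"                 # placeholder pool name
  kubernetes_cluster_id = module.aks.aks_id          # assumes the module's aks_id output
  vm_size               = var.agents_size
  node_count            = 1
  fips_enabled          = true                       # requests FIPS-enabled node images
  vnet_subnet_id        = lookup(module.aks-vnet.vnet_subnets_name_id, "subnet0")
}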

Of course I won't leave you hanging: the module block above is our aks.tf file, and we can also reference the main.tf below.

terraform {
  required_version = ">=1.3"
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = ">= 3.0, < 4.0"
    }
    kubectl = {
      source  = "gavinbunney/kubectl"
      version = "1.14.0"
    }
    helm = {
      source  = "hashicorp/helm"
      version = "2.10.1"
    }
  }
}
provider "azurerm" {
  features {}

}

provider "kubectl" {
  config_path = "~/.kube/config"
}

provider "helm" {
  kubernetes {
    config_path = "~/.kube/config"
}
}
resource "azurerm_resource_group" "aks" {
  name     = "aks-chaos-mesh-rg"
  location = "East US"
}

# Creates a VNet/subnet with the ability to use the mapping as shown; see https://registry.terraform.io/modules/Azure/subnets/azurerm/latest
module "aks-vnet" {
  source              = "Azure/subnets/azurerm"
  version             = "1.0.0"
  resource_group_name = azurerm_resource_group.aks.name
  subnets = {
    subnet0 = {
      address_prefixes = ["10.52.0.0/24"]
    }
  }
  virtual_network_address_space = ["10.52.0.0/16"]
  virtual_network_location      = var.region
  virtual_network_name          = "aks-chaos-vnet"
}

module "aks-vnet2" {
  source              = "Azure/subnets/azurerm"
  version             = "1.0.0"
  resource_group_name = azurerm_resource_group.aks.name
  subnets = {
    subnet0 = {
      address_prefixes = ["10.0.0.0/24"]
    }
  }
  virtual_network_address_space = ["10.0.0.0/16"]
  virtual_network_location      = var.region
  virtual_network_name          = "aks-chaos-vnet2"
}

Additionally, this would be incomplete if I didn't include the variable.tf, listed below.

#variable.tf


variable "region" {
  type    = string
  default = "eastus"
}

variable "agents_size" {
  default     = "Standard_D2s_v3"
  description = "The default virtual machine size for the Kubernetes agents"
  type        = string
}

variable "kubernetes_version" {
  description = "Specify which Kubernetes release to use. The default used is the latest Kubernetes version available in the region"
  type        = string
  default     = null
}

variable "os_sku" {
  type        = string
  default     = null
  description = "(Optional) Specifies the OS SKU used by the agent pool. Possible values include: `Ubuntu`, `CBLMariner`, `Mariner`, `Windows2019`, `Windows2022`. If not specified, the default is `Ubuntu` if OSType=Linux or `Windows2019` if OSType=Windows. And the default Windows OSSKU will be changed to `Windows2022` after Windows2019 is deprecated. Changing this forces a new resource to be created."
}

variable "pod_subnet_id" {
  type        = string
  default     = null
  description = "(Optional) The ID of the Subnet where the pods in the default Node Pool should exist. Changing this forces a new resource to be created."
}

variable "private_cluster_enabled" {
  type        = bool
  default     = false
  description = "If true cluster API server will be exposed only on internal IP address and available only in cluster vnet."
}

variable "private_cluster_public_fqdn_enabled" {
  type        = bool
  default     = false
  description = "(Optional) Specifies whether a Public FQDN for this Private Cluster should be added. Defaults to `false`."
}

variable "private_dns_zone_id" {
  type        = string
  default     = null
  description = "(Optional) Either the ID of Private DNS Zone which should be delegated to this Cluster, `System` to have AKS manage this or `None`. In case of `None` you will need to bring your own DNS server and set up resolving, otherwise cluster will have issues after provisioning. Changing this forces a new resource to be created."
}

variable "public_network_access_enabled" {
  type        = bool
  default     = true
  description = "(Optional) Whether public network access is allowed for this Kubernetes Cluster. Defaults to `true`. Changing this forces a new resource to be created."
  nullable    = false
}

variable "public_ssh_key" {
  type        = string
  default     = ""
  description = "A custom ssh key to control access to the AKS cluster. Changing this forces a new resource to be created."
}

variable "rbac_aad" {
  type        = bool
  default     = false
  description = "(Optional) Is Azure Active Directory integration enabled?"
  nullable    = false
}

variable "role_based_access_control_enabled" {
  type        = bool
  default     = false
  description = "Enable Role Based Access Control."
  nullable    = false
}


variable "sku_tier" {
  type        = string
  default     = "Free"
  description = "The SKU Tier that should be used for this Kubernetes Cluster. Possible values are `Free` and `Standard`"

  validation {
    condition     = contains(["Free", "Standard"], var.sku_tier)
    error_message = "The SKU Tier must be either `Free` or `Standard`. `Paid` is no longer supported since AzureRM provider v3.51.0."
  }
}

variable "tags" {
  type        = map(string)
  default     = {}
  description = "Any tags that should be present on the AKS cluster resources"
}

variable "aks_virtual_network" {
  type        = string
  default     = "vnet-aks"
  description = "virtual network name"
}

variable "aks_vnet_address_space" {
  description = "Specifies the address prefix of the AKS subnet"
  default     = ["10.0.0.0/16"]
  type        = list(string)
}

variable "subnet_delegation" {
  type = map(list(object({
    name = string
    service_delegation = object({
      name    = string
      actions = optional(list(string))
    })
  })))
  default     = {}
  description = "`service_delegation` blocks for `azurerm_subnet` resource, subnet names as keys, list of delegation blocks as value, more details about delegation block could be found at the [document](https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/subnet#delegation)."
  nullable    = false
}

variable "subnet_enforce_private_link_endpoint_network_policies" {
  type        = map(bool)
  default     = {}
  description = "A map with key (string) `subnet name`, value (bool) `true` or `false` to indicate enable or disable network policies for the private link endpoint on the subnet. Default value is false."
}

variable "subnet_names" {
  type        = list(string)
  default     = ["subnet1"]
  description = "A list of public subnets inside the vNet."
}

variable "subnet_prefixes" {
  type        = list(string)
  default     = ["10.0.1.0/24"]
  description = "The address prefix to use for the subnet."
}

variable "default_node_pool_subnet_name" {
  description = "Specifies the name of the subnet that hosts the default node pool"
  default     = "SystemSubnet"
  type        = string
}

variable "default_node_pool_subnet_address_prefix" {
  description = "Specifies the address prefix of the subnet that hosts the default node pool"
  default     = ["10.0.0.0/20"]
  type        = list(string)
}

variable "subnet_service_endpoints" {
  type        = map(list(string))
  default     = {}
  description = "A map with key (string) `subnet name`, value (list(string)) to indicate enabled service endpoints on the subnet. Default value is []."
}

variable "use_for_each" {
  type    = bool
  default = true
}

variable "api_server_authorized_ip_ranges" {
  type        = set(string)
  default     = null
  description = "(Optional) The IP ranges to allow for incoming traffic to the server nodes."
}

variable "api_server_subnet_id" {
  type        = string
  default     = null
  description = "(Optional) The ID of the Subnet where the API server endpoint is delegated to."
}

A few items of note from our code: we aren't restricting access to the API server, and this should be limited on a need-to-know basis since the control plane orchestrates the nodes. For this quick spin-up I've added the parameter api_server_authorized_ip_ranges = ["xx.xx.xx.xx/xx"] for my own use when doing a quick demo, which ensures minimal access through this mechanism. In production, however, you should use private link endpoints along with a bastion host in a peered virtual network for this access.
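
For reference, here's a minimal sketch of how that argument slots into the module block in aks.tf, assuming the module version in use still accepts api_server_authorized_ip_ranges; the CIDR shown is a documentation placeholder, not a real address.

module "aks" {
  # ...existing arguments from aks.tf...
  api_server_authorized_ip_ranges = ["203.0.113.10/32"] # placeholder; replace with your own egress CIDR
}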

Let's run the following to see what Terraform intends to create.

terraform init
terraform plan

We can see that the parameter we passed through the module is added under the default node pool in the plan. This tells the API request that we need a FIPS-enabled image.
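
Under the hood, the module passes this through to the default_node_pool block of the azurerm_kubernetes_cluster resource. An illustrative sketch of what ends up configured (abbreviated, not our full plan output):

resource "azurerm_kubernetes_cluster" "main" {
  # ...
  default_node_pool {
    name         = "system"
    fips_enabled = true # provisions FIPS-enabled node images for the default pool
    # ...
  }
}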

After you're satisfied with the parameters, run a terraform apply (ideally after you've scanned your configuration prior to moving to production). Since we're only demoing this, I'm going to tear it down after the demonstration.
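
For completeness, the apply and the eventual teardown are simply:

terraform apply
terraform destroy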

Validating FIPS on Azure Kubernetes Service

After we've applied our configuration, we can get access to our cluster by pulling its credentials.

az account set --subscription <id>
az aks get-credentials --resource-group aks-chaos-mesh-rg --name aks-chaos-mesh-aks --overwrite-existing

Once we've authenticated, let's check out the nodes we have running.

kubectl get nodes -o wide

We can see that our kernel version reflects 5.4.0-1121-azure-fips. We can validate this further by accessing the node itself, which we do by initiating debug mode with a command shown shortly. First, though, I want to show how an auditor might validate that FIPS is enabled on the node pool, and then we'll actually dive into the node.

az aks show --resource-group aks-chaos-mesh-rg --name aks-chaos-mesh-aks --query="agentPoolProfiles[].{Name:name, enableFips:enableFips}" -o table

To debug a node, the only change you'll need to make in the command below is your node name.

kubectl debug node/aks-system-65776705-vmss000000 -it --image=mcr.microsoft.com/dotnet/runtime-deps:6.0

You should see a prompt indicating a debugger pod is being created on the node, and we are now accessing the aks-system node.

cat /proc/sys/crypto/fips_enabled

This reads the kernel's FIPS flag; a response of 1 signifies FIPS mode is enabled, while 0 means it is not.

Additionally, FIPS-enabled node pools carry the label kubernetes.azure.com/fips_enabled=true, which we can use to target workloads onto those nodes. In our case we only have one node pool, so this wouldn't be much of a challenge.
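
Assuming that label name, a quick selector confirms which nodes carry it:

kubectl get nodes -l kubernetes.azure.com/fips_enabled=true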

If we run a kubectl describe nodes <node> we can see the label applied; it's quite a lot of output, so I took a wider view for this.
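
To cut through the noise, you can also filter the describe output for the FIPS label, using the node name from our earlier listing:

kubectl describe node aks-system-65776705-vmss000000 | grep -i fips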

Summary

Federal Information Processing Standards vary in levels and are notably an area of focus for organizations operating with sensitive data that isn't considered classified; they provide minimum security requirements for cryptographic modules, and this post shows how you can operate Kubernetes in line with them. While this doesn't encompass every other area of security concern we could cover, it's a good starting point, and if Azure Kubernetes Service is your back yard you can reference this link. I'm going to cover more of this implementation in future posts; for example, if you require the most secure workloads, with data in use being encrypted, consider confidential computing images, which are supported in AKS as well. Reference architecture in areas such as PCI-DSS is covered extensively in the Architecture Center, such as this link.