Azure ML on AKS with Trusted Access

Trusted Access, currently in preview, gives the Azure services that need it for their operations secure access to the Kubernetes API server without requiring a traditional private endpoint. The feature uses the service’s system-assigned managed identity as the intermediary that authenticates against your AKS clusters.

As with any feature rolled out before “General Availability”, the documentation explicitly states that it is offered on a self-service, opt-in basis and that previews are provided “as-is” and “as available”. In other words, until this feature reaches fully supported General Availability in AKS, I wouldn’t recommend bringing it into production conversations yet.

Requirements

  • Azure Subscription
  • Resource types that support system-assigned managed identity (most Azure services support this)
  • If you use the Azure CLI, the aks-preview extension must be version 0.5.74 or later

First we have to make sure the latest aks-preview extension is installed in our CLI. Run the following commands:

az login # if you haven't already
az extension add --name aks-preview
az extension update --name aks-preview
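To confirm the extension really is new enough, a small check like the one below can help. This is a minimal sketch: the function names version_ge and check_aks_preview are made up for illustration, it assumes the 0.5.74 minimum refers to the aks-preview extension version, and it relies on a sort that supports -V (GNU coreutils).

```shell
# version_ge A B: succeeds when version A >= version B (needs sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_aks_preview: hypothetical helper that compares the installed
# aks-preview version against the minimum named in the requirements.
check_aks_preview() {
  required="0.5.74"
  installed="$(az extension show --name aks-preview --query version --output tsv)"
  if version_ge "$installed" "$required"; then
    echo "aks-preview $installed is new enough"
  else
    echo "aks-preview $installed is older than $required" >&2
    return 1
  fi
}
```

Call check_aks_preview after the update command; a non-zero exit code means the extension still needs updating.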

This should bring us up to the latest aks-preview release. Once those commands complete, we have to register the feature, which tells Azure that we want to use this specific preview feature in our subscription.

az feature register --namespace "Microsoft.ContainerService" --name "TrustedAccessPreview"

This should return fairly quickly; when I ran it, it took about 3-5 seconds.

Validate that the feature is registered with the following command.

az feature show --namespace "Microsoft.ContainerService" --name "TrustedAccessPreview"
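If the state has not reached Registered yet, the same command can be polled. The sketch below is one way to do that; wait_for_feature and the POLL_SECONDS variable are hypothetical names, and the attempt count and interval are arbitrary choices.

```shell
# wait_for_feature [ATTEMPTS]: polls the feature state until it reads
# "Registered" or the attempts run out. POLL_SECONDS sets the pause
# between checks (default 10).
wait_for_feature() {
  attempts="${1:-30}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    state="$(az feature show --namespace "Microsoft.ContainerService" \
      --name "TrustedAccessPreview" --query properties.state --output tsv)"
    echo "TrustedAccessPreview state: $state"
    if [ "$state" = "Registered" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "${POLL_SECONDS:-10}"
  done
  return 1
}
```

Running wait_for_feature with no arguments checks up to 30 times, ten seconds apart.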

Now that we’ve registered our “feature”, we also have to refresh the resource provider (think of it as the parent of Trusted Access) with the following command:

az provider register --namespace Microsoft.ContainerService

If you are going to use API server authorized IP ranges to limit the API server’s visibility to a private IP or a /32 block, you also have to authorize the Azure Machine Learning IPs for your region; these are provided by this link.

The cluster I’m deploying is in East US, so the JSON search looks like the following. Mind you, we are using the service’s accessible IP ranges by region.

If you pass these ranges as parameters in Terraform, the plan output should look like this in your terminal.

I received an issue with the IPv6 ranges in the parameter. I’m going to test this and add the IPv6 ranges via the portal after deployment to see if that works.
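The regional ranges can also be pulled with the CLI rather than searched by hand. The sketch below makes several assumptions: the service tag is named AzureMachineLearning.EastUS, the function names ipv4_only and ml_ipv4_prefixes are invented for illustration, and IPv6 entries are simply dropped because they caused the issue described above (whether the parameter accepts IPv6 at all is not confirmed here).

```shell
# ipv4_only: reads CIDR prefixes on stdin and drops any line that
# contains ':' (that is, IPv6 prefixes).
ipv4_only() {
  grep -v ':' || true
}

# ml_ipv4_prefixes [TAG]: prints the IPv4 prefixes of one service tag
# as a single comma-separated line. The default tag name is an assumption.
ml_ipv4_prefixes() {
  tag="${1:-AzureMachineLearning.EastUS}"
  az network list-service-tags --location eastus \
    --query "values[?name=='$tag'].properties.addressPrefixes[]" \
    --output tsv |
    ipv4_only | paste -sd, -
}
```

The comma-separated result could then be combined with your own address and passed to the --api-server-authorized-ip-ranges parameter of az aks update, or pasted into the Terraform variable.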

I’m also deploying the Azure Machine Learning extension add-on as code; the plan output for it is shown here.

If the deployment goes well, you should see this on your AKS cluster’s page in the portal.

This took much longer than anticipated when deploying via Terraform. In any case, after refreshing several times over thirty minutes, I was able to connect and see that most of the extension pods were running.

kubectl get pods -A
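To avoid scanning the whole listing while waiting, the output can be narrowed to the pods that are not yet healthy. This is a small sketch; pods_not_ready is a hypothetical helper, and it assumes the extension’s pods live in the azureml namespace and that the third column of kubectl’s default output is the pod status.

```shell
# pods_not_ready [NAMESPACE]: lists pods whose STATUS column is neither
# Running nor Completed, as "name (status)".
pods_not_ready() {
  ns="${1:-azureml}"
  kubectl get pods -n "$ns" --no-headers |
    awk '$3 != "Running" && $3 != "Completed" { print $1 " (" $3 ")" }'
}
```

An empty result means every pod in the namespace reports Running or Completed.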

After this update, which I assumed had gone well, I got an error back from the API server. I believe I got ahead of myself trying to use Terraform to deploy the extension, so to recreate the actions I’m going to use the native CLI.

az k8s-extension create --name AzureML --extension-type Microsoft.AzureML.Kubernetes --config enableTraining=True enableInference=True inferenceRouterServiceType=LoadBalancer allowInsecureConnections=True InferenceRouterHA=False --cluster-type managedClusters --cluster-name aks-chaos-mesh-aks --resource-group aks-chaos-mesh-rg --scope cluster

If you are following along, these parameters match the repository I’m working out of: https://github.com/sn0rlaxlife/aks-chaos-engineer

Notice the setting “allowInsecureConnections=True” in this command: it is only appropriate for a proof of concept. If you need something more secure, such as for production, pass the TLS certificate and key instead with --config-protected sslCertPemFile=<file-path-to-cert-PEM> sslKeyPemFile=<file-path-to-cert-Key>.
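For reference, the secured variant of the command might look like the sketch below. Treat it as an assumption-laden example rather than the definitive form: create_azureml_extension_tls is a made-up wrapper, the sslCname setting and allowInsecureConnections=False are my reading of the extension’s settings, and the remaining values are copied from the command above.

```shell
# create_azureml_extension_tls CERT KEY CNAME: hypothetical wrapper that
# deploys the extension with a TLS certificate instead of insecure mode.
create_azureml_extension_tls() {
  cert="$1"
  key="$2"
  cname="$3"
  # Fail early if either PEM file is missing.
  [ -f "$cert" ] && [ -f "$key" ] || {
    echo "certificate or key file not found" >&2
    return 1
  }
  az k8s-extension create --name AzureML \
    --extension-type Microsoft.AzureML.Kubernetes \
    --config enableTraining=True enableInference=True \
      inferenceRouterServiceType=LoadBalancer \
      allowInsecureConnections=False InferenceRouterHA=False \
      sslCname="$cname" \
    --config-protected sslCertPemFile="$cert" sslKeyPemFile="$key" \
    --cluster-type managedClusters --cluster-name aks-chaos-mesh-aks \
    --resource-group aks-chaos-mesh-rg --scope cluster
}
```

The three arguments are the certificate PEM path, the key PEM path, and the DNS name the certificate was issued for.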

I’m assuming you have a Machine Learning workspace; if not, go ahead and create one, because you will use it to connect to the AKS cluster. Once the extension has been applied to our cluster, I’m using the UI for the next step to help those not too familiar with the CLI commands, but you can use either.

Select Kubernetes Clusters

You don’t have to assign a namespace, but for logical isolation it is ideal to define one, such as “ml-ops” (example). Make sure you use a managed identity, which can be either system-assigned or user-assigned.

It appears I’m still receiving an error because the extension on the cluster isn’t fully deployed. No worries, we will come back to that.

Going back to our CLI, we can see that the azureml namespace on our cluster now has more pods, following the parameters we added.

We can also see the roles active in our namespace by running the following:

kubectl get roles -n azureml

After our extension finishes updating with all settings applied, we can see the Succeeded state.
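The same state can be read from the CLI instead of the portal. A minimal sketch, assuming the extension name, cluster, and resource group used earlier; extension_state is a hypothetical helper name.

```shell
# extension_state: prints the provisioning state of the AzureML extension
# on the cluster used in this walkthrough (for example "Succeeded").
extension_state() {
  az k8s-extension show --name AzureML \
    --cluster-type managedClusters \
    --cluster-name aks-chaos-mesh-aks \
    --resource-group aks-chaos-mesh-rg \
    --query provisioningState --output tsv
}
```

Anything other than Succeeded suggests waiting a little longer before attaching the cluster in the workspace.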

Now we can move back to the Azure Machine Learning Studio UI, reconfigure our attachment, and see it succeed.

Trusted Access in Action

Okay, if you are still with me, this is where we use our parameters from the CLI for Trusted Access.

# Sample command from the docs; I've updated the resource group, cluster name, and source-resource-id

az aks trustedaccess rolebinding create -g aks-chaos-mesh-rg --cluster-name aks-chaos-mesh-aks -n ml-binding \
  --source-resource-id /subscriptions/00000-000000-00000-00000/resourceGroups/aks-chaos-mesh-rg/providers/Microsoft.MachineLearningServices/workspaces/aks-chaos-mesh \
  --roles Microsoft.MachineLearningServices/workspaces/mlworkload
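The long source resource ID is easy to mistype, so it can help to build it from its parts and then list the bindings to confirm the result. This is a sketch only; workspace_resource_id and list_trusted_access_bindings are hypothetical helper names, and the subscription ID is the placeholder from the sample command.

```shell
# workspace_resource_id SUBSCRIPTION RESOURCE_GROUP WORKSPACE:
# prints the Azure resource ID of a Machine Learning workspace.
workspace_resource_id() {
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.MachineLearningServices/workspaces/%s\n' \
    "$1" "$2" "$3"
}

# list_trusted_access_bindings: shows the Trusted Access role bindings
# that currently exist on the cluster from this walkthrough.
list_trusted_access_bindings() {
  az aks trustedaccess rolebinding list \
    --resource-group aks-chaos-mesh-rg \
    --cluster-name aks-chaos-mesh-aks \
    --output table
}
```

The output of workspace_resource_id can be passed straight to --source-resource-id, and az aks trustedaccess role list --location eastus shows which roles are available to bind.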

We want this binding to allow write and delete operations on the AKS cluster, but we can also scope it down to only what we intend it to be responsible for. The documentation at the link below covers the specific roles and outlines each API permission.

https://learn.microsoft.com/en-us/azure/machine-learning/how-to-assign-roles?view=azureml-api-2&tabs=labeler

If you run into issues, as I did when assigning even the sample binding to the cluster, here are a few items to consider. I’m running locally in VS Code, and the Azure CLI is highly sensitive to missed updates; with preview features especially, you have to make sure it is up to date to run some of these commands. Mind you, the sample code in the AKS Trusted Access documentation is wrong; you have to dive into the documentation located here

We have successfully done the following.

  • Deployed the Azure ML extension to AKS to host our workloads
  • Deployed Azure ML Studio
  • Created a Trusted Access binding to Azure ML Studio for segmented access

Remember to destroy your resources so you don’t incur those pesky costs that can add up!

Summary

The Azure Kubernetes Service roadmap continues to iterate on features that enhance native use of Azure services. Once this goes to General Availability, I could see more guides being published on using it for production workloads, such as ensuring segmentation of the roles that access services. Role-based access control is a focal point in Kubernetes as a whole; if you’re not using an RBAC analyzer or a similar KSPM-type product or mechanism, it can get out of hand without enforced guardrails. This is a step forward in segmented access that could likely extend to some JIT iteration, which I could see happening with Entra moving into the SASE space as well. It’s interesting to see the connections being made to Azure ML and the adoption of MLOps for Kubernetes clusters. I will post more in this space, but through a security lens, with architecture outlining how the decisions are made.