GitHub Actions with Azure ML Jobs

Introduction

It's no secret that GitHub is a premier development platform for source control management, and it also has a robust feature set for continuous integration/continuous deployment (CI/CD) through GitHub Actions. In this blog post I'm going to use the example code in a repository here, which is part of a challenge for understanding Machine Learning Operations (MLOps). I've been studying in the background for the Azure Data Scientist certification (DP-100), which covers automating experiments and jobs, building ML models, and touches on integration with Azure DevOps and GitHub Actions.

Pre-requisites

If you plan on following along with this demo you’ll need the following items.

  • Azure Subscription
  • Github Account
  • A Service Principal (Scoped to Resource Group as a Contributor)
  • Azure Machine Learning Workspace + Compute Cluster

For starters, any integration with a CI/CD system requires an authentication/authorization method so the system can act on your behalf in your environment. Visually, this can be described as shown below.

You can see from the visual that we have a few items stored in the resource group. These are deployed via ./setup.sh, with the exception of the compute cluster, which the script doesn't create; we can use the initial compute attached to the workspace to run this as well.
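If you do want a dedicated cluster such as the cpu-cluster referenced later in src/job.yml, a minimal sketch of creating one from the CLI is below; the VM size and instance counts are my assumptions, so adjust them to your quota.

# Create a small autoscaling compute cluster in the workspace (size/instance counts are assumptions)
az ml compute create --name cpu-cluster --type AmlCompute \
  --size Standard_DS3_v2 --min-instances 0 --max-instances 2 \
  --resource-group <your-resource-group> --workspace-name <your-workspace-name>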

Getting started

First things first, we start with a forked clone of the existing repository. I'm using Cloud Shell, but you can use WSL/Linux to follow along.

git clone https://github.com/sn0rlaxlife/mlops-demo.git
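If you're working outside Cloud Shell, make sure the Azure CLI and its ml extension are available, since the setup script and the commands below rely on them; a quick check looks like this.

# Install the Azure ML CLI (v2) extension and sign in
az extension add -n ml
az login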

After cloning and changing into the mlops-demo directory, we make the setup script executable and run it; we'll edit sp.sh with our own parameters shortly.

chmod +x ./setup.sh
./setup.sh

After the script runs, you'll notice this error from the data asset we're trying to register.

This is expected until I fix the last part of the script. For simplicity's sake: Azure ML looks for a file named MLTable in the folder, and it has to be present when registering the data asset via the CLI.
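For reference, the MLTable file is a small YAML file that sits next to the data (here, in ./experimentation/data/) and tells Azure ML how to read it. A minimal sketch for a CSV such as diabetes-dev.csv could look like this; the transformation settings are my assumptions, not the repository's exact file.

$schema: https://azuremlschemas.azureedge.net/latest/MLTable.schema.json
type: mltable
paths:
  - file: ./diabetes-dev.csv
transformations:
  - read_delimited:
      delimiter: ','
      header: all_files_same_headers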

To fix this, we'll run the following.

az ml data create --type mltable --name "diabetes-dev-folder" --path ./experimentation/data/

This adds our MLTable data asset to the workspace. Navigate to your resource group and then Azure ML Studio -> Data to view it.
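If you'd rather verify this from the CLI instead of the Studio UI, listing the asset's versions should show the new registration; the resource group and workspace names are placeholders.

# List registered versions of the data asset
az ml data list --name "diabetes-dev-folder" \
  --resource-group <your-resource-group> --workspace-name <your-workspace-name>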

For the second part of the setup we need service principal credentials to put in our GitHub secrets; this is handled in sp.sh.

#!/bin/bash

# Fetch subscription ID
SUBSCRIPTION_ID=$(az account show --query id --output tsv)

# Define the resource group (update this with your actual resource group name or method to fetch it)
RESOURCE_GROUP="your-resource-group-name"  # Replace this with the actual resource group or a method to fetch it

# Define the service principal name (hardcoded or fetch as needed)
SERVICE_PRINCIPAL_NAME="<service-principal-name>"  # Replace with the actual service principal name

# Note: nothing sensitive is hard-coded here; the subscription ID is fetched
# dynamically, and only the resource group and service principal name need editing.
echo "Creating SP related to RG housing ML Workspace"

az ad sp create-for-rbac --name "$SERVICE_PRINCIPAL_NAME" --role contributor \
  --scopes "/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP"

To be clear, don't commit any sensitive values to your repository. The script grabs the subscription ID dynamically from the Azure CLI, and the only hard-coded values are the resource group (which varies with the random prefix from the setup script) and the service principal name. Once you've updated those two values, we can execute the script.

chmod +x sp.sh
./sp.sh

The output of this, which I'll omit here, is sensitive, so ensure you keep it private and delete it after use.

{
  "appId": "example",
  "displayName": "example",
  "password": "example",
  "tenant": "example"
}

Now, navigate to your forked repository in GitHub and select the Settings tab, then under Security choose Secrets and variables -> Actions.

This will be a bit tedious, but you'll have to update the values one by one, and it should look like this.

Or, if you want this all in one, you can use the JSON structure from above and store it as a single AZURE_CREDENTIALS secret.
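If you have the GitHub CLI installed, that single secret can also be set from the terminal; this sketch assumes the restructured JSON (shown later in this post) is saved to a local file named creds.json, a name I'm using purely for illustration.

# Set the combined credentials secret on your fork (run from inside the cloned repo,
# or add --repo <owner>/mlops-demo); creds.json is a hypothetical local file
gh secret set AZURE_CREDENTIALS < creds.json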

Now we have to update src/job.yml, which houses the values that will run our Azure ML job.

$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
code: model
command: >-
  python train.py
  --training_data ${{inputs.training_data}}
  --reg_rate ${{inputs.reg_rate}}
inputs:
  training_data: 
    type: uri_folder 
    path: azureml://datastores/experimentation/data/diabetes-dev.csv #This will change to your unique URI
  reg_rate: 0.01
environment: azureml:AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest #Change to curated environment if required
compute: azureml:cpu-cluster # Change to your cluster name
experiment_name: diabetes-training
description: train-model

To grab the Azure ML data asset path, you'll navigate to the Azure Machine Learning Studio.

You'll see the asset noted here; it will have the name we defined earlier.

Inside the asset, the right-hand side lists the data source; this houses the URI required.

Once you've updated the data source, the file should look similar to this; I had to change a few items related to the cluster creation.

Located in /src/job.yml, this lists the following: we're using train.py (already fixed and defined), which takes the arguments training_data and reg_rate.
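Before handing this off to the workflow, you can sanity-check the job spec by submitting it once from the CLI yourself; a sketch, with placeholder resource group and workspace names.

# Submit the job manually to confirm the spec, data path, and compute are valid
az ml job create --file src/job.yml \
  --resource-group <your-resource-group> --workspace-name <your-workspace-name>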

After running the workflow manually via GitHub to test the connection, it appears that if you store the credentials in a single secret, they have to be structured like this.

{
    "clientSecret":  "******",
    "subscriptionId":  "******",
    "tenantId":  "******",
    "clientId":  "******"
}
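That JSON structure is what the azure/login action consumes from the secret; inside the workflow, the login step typically looks something like this sketch (the step names are mine, and the secret name assumes you stored it as AZURE_CREDENTIALS).

steps:
  - name: Check out repository
    uses: actions/checkout@v4

  - name: Azure login
    uses: azure/login@v1
    with:
      creds: ${{ secrets.AZURE_CREDENTIALS }}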

Once that's fixed, we navigate into GitHub and go to the .github/workflows folder in the repository.

Select 02-manual-trigger-job.yaml; on the right-hand side there will be a View Runs button, which you'll select.

You'll see Run workflow, and the drop-down will have a few options; we'll select Run workflow on branch main.
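If you prefer the terminal, the same manual trigger can be fired with the GitHub CLI, assuming the workflow declares a workflow_dispatch trigger.

# Manually dispatch the workflow on the main branch of your fork
gh workflow run 02-manual-trigger-job.yaml --ref main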

The issue I ran into was the direct use of the Azure ML datastore URI; I've added it as a secret to the repository and refactored the workflow.

The workflow then uses sed to replace a placeholder and inject the value at runtime.
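A sketch of what such steps can look like in the workflow is below; the placeholder token, secret names, and resource group/workspace values are illustrative, not necessarily what the repository uses.

# Replace a placeholder token in the job spec with the secret value at runtime
- name: Inject datastore URI into job spec
  run: sed -i "s|<DATASTORE_URI_PLACEHOLDER>|${{ secrets.DATASTORE_URI }}|g" src/job.yml

# Submit the job using the patched spec
- name: Submit Azure ML job
  run: az ml job create --file src/job.yml --resource-group <your-resource-group> --workspace-name <your-workspace-name>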

This appears to work when we trigger the job, and the workflow now succeeds.

The only values I'd recommend securing further would be the resource group and workspace names when triggering via the CLI, although these aren't as sensitive since we've already authenticated and authorized against our account; they could likely be stored as secrets or set up prior to the run, similar to how I used sed.

Inside our Azure ML workspace you'll see the job now appearing, and the Created by field will show the service principal as the submitting identity.

Since the compute cluster is requested on demand, it has to scale up before it can run the job, as shown below.

Completion of Job

After the job runs, you'll see its status move from Running to Completed.

If you navigate to Metrics, you can see the training metrics that have been produced.

Under the Outputs + logs tab, you'll see the user logs, which contain the visuals.

Summary

Integrating GitHub Actions with Azure Machine Learning is one way of extending the use of training jobs submitted to your workspace. Ideally, you want to review the inputs that are submitted rather than acting on every commit; triggering a workflow by pushing directly to the repo is not considered a best practice.
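For example, gating the workflow behind a pull request gives a reviewer the chance to inspect the job spec before anything is submitted; a minimal trigger sketch, where the branch and path filters are assumptions.

on:
  pull_request:
    branches:
      - main
    paths:
      - src/**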