Batch Jobs in Azure OpenAI

Introduction

In the evolving landscape of generative AI, optimizing API submissions is crucial for both cost and performance. Whether you're fine-tuning token usage or streamlining context-aware requests using Retrieval-Augmented Generation (RAG), finding the right tools can make a significant difference.

One of the most promising solutions is the Azure OpenAI Batch API, designed specifically for handling large-scale, high-volume processing tasks. It offers up to a 50% discount on processing costs compared to standard pricing, making it an attractive option for businesses looking to optimize their AI workflows.

Example code used in this post is provided here.

Global Batch is currently available in the following regions:

  • East US
  • West US
  • Sweden Central

The supported models for Global Batch are the following:

Model          Version             Supported
gpt-4o         2024-05-13          Yes (text + vision)
gpt-4o-mini    2024-07-18          Yes (text + vision)
gpt-4          turbo-2024-04-09    Yes (text only)
gpt-4          0613                Yes
gpt-35-turbo   0125                Yes
gpt-35-turbo   1106                Yes
gpt-35-turbo   0613                Yes
This table comes from the documentation listed here.

Prerequisites

  • Azure subscription
  • Python 3.8 or later
  • The OpenAI Python library installed (installation shown below)
  • Jupyter Notebooks
  • An Azure OpenAI resource with a model deployed using the ‘Global-Batch’ deployment type
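The library install is a single cell in the notebook (shown here for completeness; %pip is the notebook-friendly form of pip):

# Install the OpenAI Python library from inside the notebook
%pip install openai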

Enabling a ‘Global-Batch’ deployment of Azure OpenAI is shown below, from the ‘Deployments’ page in Azure AI Studio.

We select the base model; I’m going with gpt-4o-mini, but you can choose any model from the table above in one of the supported regions.

Once this is deployed, we can work with our code in the Jupyter notebook using our credentials, whether we authenticate via Entra ID or an API key.
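For reference, a minimal client setup might look like the following. This is a sketch assuming API-key authentication; the environment variable names and the API version string are my own placeholders, so substitute your resource’s values.

import os
from openai import AzureOpenAI

# Build the client from environment variables; if you authenticate via
# Entra ID instead, pass a token provider rather than an API key
client = AzureOpenAI(
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2024-07-01-preview",
)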

Interacting with batch jobs requires a .jsonl (JSON Lines) file, similar to the one used for fine-tuned deployments. Note that the model field in each line must be set to the name of your Global-Batch deployment.

{"custom_id": "request-1", "method": "POST", "url": "/chat/completions", "body": {"model": "<deployment-name>", "messages": [{"role": "system", "content": "You are a helpful assistant."},{"role": "user", "content": [{"type": "text", "text": "What’s in this image?"},{"type": "image_url","image_url": {"url": "https://www.investopedia.com/thmb/g6tXiy8r7-C1D2hmfq64HCGqZps=/4835x1792/filters:no_upscale():max_bytes(150000):strip_icc()/AStockTicker3-b2e09bfee6254daca63b0374104144fc.png"}}]}],"max_tokens": 1000}}
{"custom_id": "request-2", "method": "POST", "url": "/chat/completions", "body": {"model": "<deployment-name>", "messages": [{"role": "system", "content": "You are a helpful assistant."},{"role": "user", "content": [{"type": "text", "text": "What’s in this image?"},{"type": "image_url","image_url": {"url": "https://ca-times.brightspotcdn.com/dims4/default/f2a296b/2147483647/strip/true/crop/4032x3024+0+0/resize/1200x900!/format/webp/quality/75/?url=https%3A%2F%2Fcalifornia-times-brightspot.s3.amazonaws.com%2F00%2F57%2F6b62e0e84a738cd3ac7c6d45b77a%2Fla-me-abcarian-column.jpg"}}]}],"max_tokens": 1000}}

Notice these lines are a little long. I ran a simple search for the following images: one is a breakdown of a stock ticker symbol, the other is a dog sleeping on a bed.


It appears this ran into issues, presumably due to the truncated URL, so in that event I’ve reverted to the test.jsonl file.
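The upload itself follows the pattern from the docs; a minimal sketch, assuming the test.jsonl file mentioned above sits next to the notebook:

# Upload the .jsonl file with purpose="batch" so the service
# accepts it for batch processing
file = client.files.create(
    file=open("test.jsonl", "rb"),
    purpose="batch"
)

file_id = file.id
print(file.model_dump_json(indent=2))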

Once this runs we can see the id that is generated, stored in file_id.

Using the sample code from the docs for tracking, the next lines in our Jupyter notebook will look like this.

# Wait until the uploaded file is in processed state
import time
import datetime 

status = "pending"
while status != "processed":
    time.sleep(15)
    file_response = client.files.retrieve(file_id)
    status = file_response.status
    print(f"{datetime.datetime.now()} File Id: {file_id}, Status: {status}")

Now that the file we want to use for the Batch API has been submitted and processed, we have to send it to the API.

# Submit a batch job with the file
batch_response = client.batches.create(
    input_file_id=file_id,
    endpoint="/chat/completions",
    completion_window="24h",
)

# Save batch ID for later use
batch_id = batch_response.id

print(batch_response.model_dump_json(indent=2))

Since this is running in a notebook, we won’t have to re-authenticate to Azure OpenAI; the client is already available from the pre-existing code we’ve run. Notice we are accessing client.batches.create().

The output should have a timestamp, and it will update with a status as shown below.

To track what is happening in the background, we run the following code in our notebook (this is from the documentation).

import time
import datetime 

# The job starts as 'validating' once submitted; poll every 60 seconds
# until one of three terminal states is reached
status = "validating"
while status not in ("completed", "failed", "canceled"):
    time.sleep(60)
    batch_response = client.batches.retrieve(batch_id)
    status = batch_response.status
    print(f"{datetime.datetime.now()} Batch Id: {batch_id},  Status: {status}")

I’ve added comments to the code to help you understand the syntax further. We are waiting for one of three terminal states in a while loop, starting at ‘validating’, which was returned by our last code block. The following table is cited in this link.

Status        Description
validating    The input file is being validated before the batch processing can begin.
failed        The input file has failed the validation process.
in_progress   The input file was successfully validated and the batch is currently running.
finalizing    The batch has completed and the results are being prepared.
completed     The batch has been completed and the results are ready.
expired       The batch wasn’t able to be completed within the 24-hour time window.
cancelling    The batch is being cancelled (this may take up to 10 minutes to go into effect).
cancelled     The batch was cancelled.
Source: documentation linked above.
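As the cancelling/cancelled rows suggest, a submitted job can also be stopped by id. A one-liner sketch using the same client:

# Cancel a submitted batch job; per the table above,
# cancellation may take up to 10 minutes to take effect
client.batches.cancel(batch_id)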

After you’ve received the ‘completed’ status, run the following code in a separate code block.

print(batch_response.model_dump_json(indent=2))

Part of the first batch job I submitted reported ‘completed’, but the run itself failed, so it didn’t produce any of the desired vision output. I’ve resubmitted a job to test this further and see if the response completes with a successful run.
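When a batch reports ‘completed’ but requests inside it fail, the per-request failures are written to a separate error file on the batch object. A sketch for inspecting it, assuming batch_response is the batch retrieved above:

# If output is missing, check the error file attached to the batch
if batch_response.error_file_id:
    error_response = client.files.content(batch_response.error_file_id)
    print(error_response.text)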

import json

# Retrieve the contents of the batch output file by its id
file_response = client.files.content(batch_response.output_file_id)
raw_responses = file_response.text.strip().split('\n')  
# Loop over the raw responses, parse each line, and print formatted JSON
for raw_response in raw_responses:  
    json_response = json.loads(raw_response)  
    formatted_json = json.dumps(json_response, indent=2)  
    print(formatted_json)
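If you only want the model’s answer rather than the full envelope, each output line nests the chat completion under response.body. A sketch for pulling the text out; the field names assume the documented batch output shape:

# Extract just the assistant's reply from each batch output line
for raw_response in raw_responses:
    json_response = json.loads(raw_response)
    message = json_response["response"]["body"]["choices"][0]["message"]["content"]
    print(f"{json_response['custom_id']}: {message}")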

This produces the following for our image.

So it does detect the image accurately using the vision capability of gpt-4o-mini. Additionally, in the job I submitted an image of a stock ticker symbol.

If you want to see the batch jobs visually, Azure AI Studio lists them on the left-hand side under tools.

Conclusion

If you’re considering an alternative for submitting jobs that don’t have the utmost need for low latency, batch jobs could be your go-to for various tasks and operations. They could be event-driven: for example, a file change on a storage account could trigger a weekly batch job that analyzes the latest file. Given that the context window is quite large and the models are multi-modal, you could also use this for a number of image-processing functions. Cost will play an important part in deciding where and when you want to process inputs for your users/application, and this can be an area for further exploration.