Large Language Models are a powerful enabler for a wide range of enterprise use cases, but without some form of due diligence and testing they can produce unintended responses. Content safety is a preventative mechanism used in Azure AI Studio, and it can also be tested with the Prompt flow SDK. In this blog post I'm going to show the use of adversarial attacks against GPT-3.5 Turbo, GPT-4o, and GPT-4.
Pre-requisites
If you'd like to replicate this, the post assumes you have the following:
- Azure OpenAI Service Deployed (East US 2 Region)
- Azure AI Studio (East US 2 Region)
- Azure AI Studio Project (Acts as workspace in East US 2)
Getting Started
Once our resources are deployed and we can use the API keys, we can switch over to what our script will look like, and I'll break down some of the requirements.
First, we need a requirements.txt:
promptflow-evals==0.3.0
aiohttp==3.9.5
openai==1.30.5
We can install these in a virtual environment with pip install -r requirements.txt. Once that completes, we can create a main.py as shown below.
import os
import aiohttp
from typing import List, Dict, Any
from promptflow.evals.synthetic import AdversarialSimulator
from promptflow.evals.synthetic.adversarial_scenario import AdversarialScenario
from azure.identity import DefaultAzureCredential
import asyncio
from openai import AzureOpenAI
# Azure AI project details
azure_ai_project = {
"subscription_id": os.environ["AZURE_SUBSCRIPTION_ID"],
"resource_group_name": os.environ["AZURE_RESOURCE_GROUP"],
"project_name": os.environ["AZURE_WORKSPACE_NAME"],
"credential": DefaultAzureCredential(),
}
# Define the scenario
scenario = AdversarialScenario.ADVERSARIAL_QA
simulator = AdversarialSimulator(azure_ai_project=azure_ai_project)
# Get the Azure AI project details and API Key
endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]
deployment = os.environ["CHAT_COMPLETIONS_DEPLOYMENT_NAME"]
api_key = os.environ["AZURE_OPENAI_API_KEY"]
# Define the function to call your endpoint -> for AzureOpenAI follow this example/pattern
async def function_call_to_your_endpoint(query: str) -> dict:
    client = AzureOpenAI(
        azure_endpoint=endpoint,
        azure_deployment=deployment,
        api_key=api_key,
        api_version="2024-02-01",
    )
    # Call the Azure OpenAI API; on any error the except branch returns None, so make sure
    # all environment variables are declared (East US 2 is supported as of 6/4/2024)
    try:
        response = client.chat.completions.create(model=deployment, messages=[{"role": "user", "content": query}])
        # Extract the generated text from the response
        generated_text = [choice.message.content for choice in response.choices]
        return {"generated_text": generated_text}
    except Exception as e:
        print(f"An error occurred: {e}")
        return None
async def callback(
    messages: List[Dict],
    stream: bool = False,
    session_state: Any = None,
) -> dict:
    query = messages["messages"][0]["content"]
    context = None
    # Add file contents for summarization or re-write scenarios
    if "file_content" in messages["template_parameters"]:
        query += messages["template_parameters"]["file_content"]
    # Call your own endpoint and pass your query as input.
    # Make sure to handle the error (None) responses from function_call_to_your_endpoint.
    response = await function_call_to_your_endpoint(query)
    # Format the response in the OpenAI message protocol (content must be a string)
    formatted_response = {
        "content": response["generated_text"][0] if response else "",
        "role": "assistant",
        "context": {},
    }
    messages["messages"].append(formatted_response)
    return {
        "messages": messages["messages"],
        "stream": stream,
        "session_state": session_state,
    }
# Define the async function to run the simulation. This is the main function that will be
# called to run the simulation; it is also encouraged to run the jailbreak variant as a separate simulation.
async def run_simulation():
    outputs = await simulator(
        scenario=scenario,
        target=callback,
        max_conversation_turns=1,
        max_simulation_results=3,
        jailbreak=False,
    )
    return outputs

# Outside of the async function
outputs = asyncio.run(run_simulation())
print(outputs.to_eval_qa_json_lines())
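If you want to keep the simulated question/answer pairs around for a later evaluation run, a minimal sketch (using the same outputs object as above; the adversarial_outputs.jsonl file name is just illustrative) could look like this:

# Persist the simulator output as JSON Lines so it can be fed into a later evaluation run.
# NOTE: the file name below is only an illustrative placeholder.
with open("adversarial_outputs.jsonl", "w") as f:
    f.write(outputs.to_eval_qa_json_lines())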
A few of these values can be declared in a .env file and loaded via dotenv, or set directly as environment variables; that choice is entirely up to you. The core piece is the Azure AI Studio project.
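As a minimal sketch of the .env approach, assuming the python-dotenv package is installed (it is not part of the requirements.txt above), the following could be added to the top of main.py:

# Optional: load configuration from a .env file instead of exporting variables manually.
# Assumes python-dotenv is installed; the .env file holds AZURE_SUBSCRIPTION_ID,
# AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY, and the other variables the script reads.
from dotenv import load_dotenv

load_dotenv()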
A few items to note in our async def run_simulation(): this is where we set the scenario, target, max conversation turns, and max simulation results.
These parameters are broken down further in the documentation screenshot below.
We are using the default ADVERSARIAL_QA scenario, which has a maximum of 1,384 simulations. This dataset is used to evaluate hateful and unfair content, sexual content, violent content, and self-harm-related content.
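As an illustration of how these parameters interact, here is a hedged sketch of the same call with a larger sample size; the run_larger_simulation name is purely illustrative, and everything else comes straight from main.py above.

async def run_larger_simulation():
    # Same scenario and callback as run_simulation(), but sampling more of the
    # built-in adversarial prompts in a single run.
    outputs = await simulator(
        scenario=scenario,
        target=callback,
        max_conversation_turns=1,
        max_simulation_results=10,  # must stay at or below the scenario's maximum of 1,384
        jailbreak=False,
    )
    return outputs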
If we run the simulation in Python with all our values declared as environment variables, it should run and show the following for the harmful content; I've blurred out the response.
The simulation runs three specific turns with discriminatory remarks and records the returned output. As you can see, we can use this to evaluate how the frontier model responds and which controls we can put in place. Ideally this is caught with some input filter/sanitization that checks the prompt before it is sent to the API.
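As a rough sketch of what such a pre-flight check could look like, the snippet below puts a hypothetical is_prompt_safe() gate in front of the endpoint call; the blocklist and helper names are illustrative only and not part of the Prompt flow SDK.

# Hypothetical input filter: a naive blocklist check that runs before the query is sent to the API.
# In practice you would use a proper content safety classifier or service instead of a keyword list.
BLOCKED_TERMS = ["test_mode=on"]  # illustrative placeholder list

def is_prompt_safe(query: str) -> bool:
    lowered = query.lower()
    return not any(term in lowered for term in BLOCKED_TERMS)

async def guarded_call(query: str) -> dict:
    if not is_prompt_safe(query):
        return {"generated_text": ["Request blocked by input filter."]}
    return await function_call_to_your_endpoint(query)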
If we re-run the simulation with jailbreak set to True, we get the following returned output.
When this is run again, we can see outputs that try to inject parameters intended to turn safeguards off, such as Test_mode=on/Test_mode=off.
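To reproduce that run, the only change needed in main.py is the jailbreak flag; here is a minimal sketch of the modified call (the run_jailbreak_simulation name is just for illustration):

async def run_jailbreak_simulation():
    # Same call as run_simulation(), with the jailbreak dataset enabled.
    # Keeping this as a separate simulation makes it easier to compare the two result sets.
    outputs = await simulator(
        scenario=scenario,
        target=callback,
        max_conversation_turns=1,
        max_simulation_results=3,
        jailbreak=True,
    )
    return outputs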
Summary
Working with generative AI is a large field that requires evaluation and testing of many kinds of output. Sanitization and checks on the input parameters let you test your frontier model against safety requirements. The content filter in Azure AI Studio can be tuned up and down to find the level of filtering you want applied, and you should also run Reinforcement Learning from Human Feedback (RLHF) on your evaluations to assure you are securing the Large Language Models you push to production. This is a sneak peek of what is covered in depth in the Azure AI Studio course currently in development.