Azure Data Factory Moving Data into CosmosDB

Azure Data Factory is Microsoft’s service for moving data, but also for building ELT (Extract, Load, Transform) or the more traditional ETL (Extract, Transform, Load) workflows. It’s one way of getting data into your Cosmos DB account and its containers. For this post, I’ll deploy the following items (a rough CLI sketch for provisioning them follows the list):

  • Azure Cosmos DB Account (Core SQL API)
  • Azure Data Factory
  • Azure SQL Database (populated with sample data)
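
Since the focus of this post is the Data Factory piece, here is only a rough sketch of how these could be provisioned from the CLI. The resource group name comes from the command later in the post; the Cosmos DB account, database, container, SQL server, and credential values are placeholders I’m assuming, and the AdventureWorksLT sample name is also an assumption on my part:

# Resource group in West US (the region discussed below)
az group create --name "CosmosDBProd-WR" --location "westus"

# Azure Cosmos DB account using the Core (SQL) API
az cosmosdb create --name "<cosmos-account-name>" --resource-group "CosmosDBProd-WR" --kind GlobalDocumentDB

# SQL API database and container to land the copied data in
az cosmosdb sql database create --account-name "<cosmos-account-name>" --resource-group "CosmosDBProd-WR" --name "<database-name>"
az cosmosdb sql container create --account-name "<cosmos-account-name>" --resource-group "CosmosDBProd-WR" --database-name "<database-name>" --name "<container-name>" --partition-key-path "/id"

# Source: serverless Azure SQL Database populated with sample data
az sql server create --name "<sql-server-name>" --resource-group "CosmosDBProd-WR" --location "westus" --admin-user "<sql-admin-user>" --admin-password "<sql-admin-password>"
az sql db create --name "<sql-db-name>" --server "<sql-server-name>" --resource-group "CosmosDBProd-WR" --sample-name AdventureWorksLT --edition GeneralPurpose --family Gen5 --compute-model Serverless --capacity 1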

A few notes on resources in Azure: some services are what are known as global resources, such as Azure Front Door, meaning that the service isn’t limited to a single deployment region. Not all services are global, so keep that in mind when you’re strategizing your solutions, along with the latency/performance targets outlined in your SLA.

What is shown here is the Azure Cosmos DB account (Core SQL API) and the Azure SQL Database with its SQL server. Now let’s create the Azure Data Factory, and I’ll show what the view looks like once we’re up and running.

az datafactory create --factory-name "CosmosDBWest-CA" --resource-group "CosmosDBProd-WR"

Your output, as you’ll notice, shows the location as “westus”. This is because the declared --resource-group is based in West US, so the data factory deploys to that region (a quick way to confirm this is sketched below).
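
If you want to confirm where the factory landed, or pick a different region up front, something along these lines should work (the --location value here is just an example):

# Show the region the factory was deployed to
az datafactory show --factory-name "CosmosDBWest-CA" --resource-group "CosmosDBProd-WR" --query location

# Or pass --location at creation time to deploy to a region other than the resource group's
az datafactory create --factory-name "CosmosDBWest-CA" --resource-group "CosmosDBProd-WR" --location "WestUS2"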

CLI Output

Once we open the resource we’ve created, your screen will look like the image below; we’re selecting the Azure Data Factory Studio option.

Azure Data Factory resource

As you can see, I’ve named the pipeline I created “CosmosDB”. Now we’ll add our data source, and our sink (the output) will be the Cosmos DB account.

After we drag the Copy Data activity in from “Move and Transform”, we’ll start populating the source settings.

If you’ve set up your Azure SQL Database with a SQL username/password like me, you’ll go to the Source and enter those credentials, and set the Sink to the Cosmos DB account.

Think of this step as authorizing access via the Azure subscription, server name, and database you’re going to use. The authentication type has a few options; here we’re using SQL Authentication (the equivalent linked services are sketched below).
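
For reference, the same source/sink connections can also be defined as Data Factory linked services from the CLI. This is a hedged sketch rather than exactly what the Studio generates; the linked-service names, server/account names, database names, and credentials are all placeholders:

# Source linked service: Azure SQL Database using SQL Authentication
az datafactory linked-service create --factory-name "CosmosDBWest-CA" --resource-group "CosmosDBProd-WR" --linked-service-name "AzureSqlSource" --properties '{"type":"AzureSqlDatabase","typeProperties":{"connectionString":"Server=tcp:<sql-server-name>.database.windows.net,1433;Database=<sql-db-name>;User ID=<sql-admin-user>;Password=<sql-admin-password>;Encrypt=true"}}'

# Sink linked service: the Cosmos DB (Core SQL API) account
az datafactory linked-service create --factory-name "CosmosDBWest-CA" --resource-group "CosmosDBProd-WR" --linked-service-name "CosmosDbSink" --properties '{"type":"CosmosDb","typeProperties":{"connectionString":"AccountEndpoint=https://<cosmos-account-name>.documents.azure.com:443/;AccountKey=<account-key>;Database=<database-name>"}}'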

Once the data is mapped, I wrote a small statement to extract from the table.
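
The exact statement depends on your sample data; assuming the AdventureWorksLT sample that ships with Azure SQL Database (an assumption on my part, your table will differ), it can be as simple as:

SELECT CustomerID, FirstName, LastName, EmailAddress
FROM SalesLT.Customer;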

Once satisfied, select “Publish All”.

Now select “Trigger Now” to run the pipeline (this can also be done from the CLI, as sketched below).
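
If you prefer, the run can also be triggered and checked from the CLI; this assumes the pipeline name “CosmosDB” used above:

# Kick off a run of the pipeline (returns a runId)
az datafactory pipeline create-run --factory-name "CosmosDBWest-CA" --resource-group "CosmosDBProd-WR" --name "CosmosDB"

# Check the status of that run using the runId returned above
az datafactory pipeline-run show --factory-name "CosmosDBWest-CA" --resource-group "CosmosDBProd-WR" --run-id "<run-id-from-previous-output>"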

Now that the data is moving into our sink, the Azure Cosmos DB account, let’s navigate to the Data Explorer.

Back in the Data Explorer in Cosmos DB, you’ll see the data we queried.

I’ve selected one of the items to show what is in the data itself; as you can see, it’s JSON formatted (an illustrative example of the shape is below).
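
The exact fields depend on your source table, but a document landing in Cosmos DB looks roughly like the following. The customer fields are illustrative placeholders; id plus the underscore-prefixed properties (_rid, _etag, _ts, and so on) are system properties Cosmos DB adds to every item:

{
  "id": "<generated-guid>",
  "CustomerID": 1,
  "FirstName": "Orlando",
  "LastName": "Gee",
  "EmailAddress": "orlando0@adventure-works.com",
  "_rid": "<system-generated>",
  "_etag": "<system-generated>",
  "_ts": 1650000000
}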

You’ve just transferred your data into Cosmos DB; now it’s time to start interacting with it and deciding how you plan to optimize this database for scale.

In this post, you’ve been exposed to building a pipeline with Azure Data Factory, using access to our serverless Azure SQL Database, and transferring the sample data to a NoSQL Azure Cosmos DB account (Core SQL API).

References:

https://docs.microsoft.com/en-us/azure/data-factory/quickstart-create-data-factory-portal

https://docs.microsoft.com/en-us/cli/azure/datafactory?view=azure-cli-latest#az-datafactory-create