This week we are going to continue exploring Linked Services within Azure Data Factory. This post builds on the Azure SQL linked service we made in a previous ‘How to Azure Data Factory’ post. Each post builds on the last until we have incrementally built an Azure Data Factory pipeline. By the end of this post, we will have created a Linked Service that connects to an Azure Data Lake Storage Gen2 account.
What is a Linked Service, and Why Does it Matter?
At its core, a Linked Service is a connection to a data source and/or destination. Azure Data Factory relies on Linked Services as the backbone of the platform. They let you “Bring Your Own Connection String,” and the list of supported connectors is extensive. You can connect to virtually anywhere that holds data, and you can even copy a dataset straight from a URL! The Linked Service concept, and the breadth of connectors behind it, is really what makes Azure Data Factory such a powerful platform.
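To make the idea concrete: behind the UI, Data Factory stores each linked service as a small JSON definition. The sketch below builds one in Python for an Azure Data Lake Storage Gen2 account that authenticates with the factory's managed identity. The linked service name ls_adls_gen2_demo and the account name mydatalake are placeholders made up for illustration, so treat this as a rough picture of the shape rather than an export from a real factory.

```python
import json


def adls_gen2_linked_service(name: str, account_name: str) -> dict:
    """Build a linked service definition for ADLS Gen2 using managed identity."""
    return {
        "name": name,
        "properties": {
            # "AzureBlobFS" is the connector type Data Factory uses for ADLS Gen2.
            "type": "AzureBlobFS",
            "typeProperties": {
                # With the factory's managed identity, no secret is stored here;
                # only the data lake endpoint is needed.
                "url": f"https://{account_name}.dfs.core.windows.net",
            },
        },
    }


if __name__ == "__main__":
    # Both arguments are hypothetical placeholder values.
    definition = adls_gen2_linked_service("ls_adls_gen2_demo", "mydatalake")
    print(json.dumps(definition, indent=2))
```

Notice that the definition holds only a name, a connector type, and the endpoint. Access is granted separately on the storage account itself.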
How to Create an Azure Data Lake Storage Gen2 Linked Service
Before we begin, make sure to have your Azure Data Factory instance open.
1) First, navigate to the Manage tab in Azure Data Factory.
2) Then, select the ‘Linked Services’ tab and click the ‘New’ button.
3) A list of linked services will appear. Search for and select Azure Data Lake Storage Gen2, and click ‘Create.’
4) Name your linked service appropriately. I like to start every linked service name with ‘ls_’ to denote the type within the name. This is useful when searching for your linked services within your pipelines.
5) Now we need to tell Azure Data Factory where our specific Gen2 Data Lake is located. You can either select it from your Azure subscription or enter the URL of the data lake you would like to use.
6) For this instance, we’ll use the managed identity authentication method. Be sure to take note of the managed identity name and object ID at the bottom of the page. Next, we will need to grant that managed identity access to our Azure Data Lake Storage Gen2 resource within the Azure Portal.
7) Open a new browser tab and navigate to your Azure portal at https://portal.azure.com. At the Azure Data Lake Storage Gen2 resource page, select Access Control (IAM), click ‘Add,’ followed by ‘Add Role Assignment.’
8) Give the assignment the role of Storage Blob Data Contributor, set ‘Assign access to’ to Data Factory, and then search for and select your Azure Data Factory resource from your subscription. Once all of those are filled out, click ‘Save’ and close the tab.
9) Switch back to the Azure Data Factory tab. Once all the information is entered, you can test the connection to the Azure Data Lake Storage Gen2. If the test returns a green indicator, simply click ‘Create.’ You have successfully created an Azure Data Lake Storage Gen2 Linked Service in Azure Data Factory!
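Steps 7 and 8 can also be scripted instead of clicked through. As a rough sketch, the Python below assembles the equivalent Azure CLI ‘az role assignment create’ call. It only builds and prints the command. The subscription ID, resource group, storage account name, and object ID are all placeholders to replace with your own values, and actually running the printed command assumes the Azure CLI is installed and you are signed in with permission to assign roles.

```python
import shlex
from typing import List


def storage_account_scope(subscription_id: str, resource_group: str, account_name: str) -> str:
    """Return the resource ID of a storage account, used as the role scope."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{account_name}"
    )


def role_assignment_command(principal_object_id: str, scope: str) -> List[str]:
    """Build the Azure CLI arguments that grant Storage Blob Data Contributor."""
    return [
        "az", "role", "assignment", "create",
        # The object ID noted in step 6 identifies the factory's managed identity.
        "--assignee-object-id", principal_object_id,
        # Managed identities are service principals in the directory.
        "--assignee-principal-type", "ServicePrincipal",
        "--role", "Storage Blob Data Contributor",
        "--scope", scope,
    ]


if __name__ == "__main__":
    # Every value below is a hypothetical placeholder.
    scope = storage_account_scope(
        "00000000-0000-0000-0000-000000000000", "rg-demo", "mydatalake"
    )
    command = role_assignment_command("11111111-1111-1111-1111-111111111111", scope)
    # Print a copy-pasteable shell command rather than executing it.
    print(shlex.join(command))
```

Scoping the assignment to the single storage account, rather than the whole resource group or subscription, keeps the factory's access as narrow as the walkthrough above does.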
If you would like to explore options for partnering with Tallan to help build out your business’s cloud data analytics platform, please reach out to me at Conner.Wulf@tallan.com or connect on LinkedIn.