Welcome back! This post is the final part of a series walking through how to set up a basic copy data pipeline in Azure Data Factory. Thus far we have achieved the following:
- Created a SQL Database Linked Service in Azure Data Factory
- Created an Azure Data Lake Gen2 Storage Account Linked Service
- Created an Azure SQL Database Dataset
- Created a Data Lake Storage Gen2 Dataset
In this final step, we will create a pipeline that uses the datasets and linked services created in the previous posts to copy data from a SQL table to a Parquet file stored in an Azure Data Lake Storage Gen2 account.
What is a Pipeline, and Where Does it Fit in?
A pipeline is a logical grouping of activities, and an activity uses one or more datasets to produce one or more datasets. Pipelines can be used to extract and load your data. This post will explore how to configure a pipeline to use the two linked services and datasets we created in earlier posts.
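If it helps to see that relationship outside the UI, here is a minimal sketch of the same idea using the azure-mgmt-datafactory Python SDK. The activity and dataset names below are placeholders standing in for the datasets created earlier in this series, not values ADF requires:

```python
# Sketch only: how the SDK models the pipeline -> activity -> dataset relationship.
from azure.mgmt.datafactory.models import (
    AzureSqlSource, CopyActivity, DatasetReference, ParquetSink, PipelineResource
)

# An activity consumes one or more input datasets and produces one or more output datasets.
copy_cars = CopyActivity(
    name="Copy_Cars_To_Parquet",  # placeholder activity name
    inputs=[DatasetReference(type="DatasetReference", reference_name="ds_Training_Cars_Sql")],       # SQL dataset from the earlier post
    outputs=[DatasetReference(type="DatasetReference", reference_name="ds_Training_Cars_Parquet")],  # Parquet dataset from the earlier post
    source=AzureSqlSource(),  # configured in detail later in this post
    sink=ParquetSink(),
)

# A pipeline is simply the named, logical grouping of such activities.
pipeline = PipelineResource(activities=[copy_cars])
```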
Creating an Empty Pipeline
1. Navigate to the ‘Author’ tab and expand the ‘Pipelines’ dropdown.
2. Select the folder you would like to create the pipeline under; we will create ours under the ‘Training’ sub-directory. Click the ellipsis next to the folder name and select ‘New pipeline.’
3. Name your pipeline. The name should follow the ‘pl_<<Name>>’ convention; for our example we will name our pipeline pl_Training_Cars. (An equivalent call with the Python SDK is sketched after these steps.)
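For reference, here is a hedged sketch of creating the same empty pipeline programmatically with the azure-mgmt-datafactory Python SDK. The subscription, resource group, and factory names are placeholders you would replace with your own:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import PipelineFolder, PipelineResource

# Placeholder identifiers -- substitute your own subscription, resource group, and factory.
adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Publish an empty pipeline named with the pl_ prefix convention, under the Training folder.
adf_client.pipelines.create_or_update(
    "<resource-group>",
    "<data-factory-name>",
    "pl_Training_Cars",
    PipelineResource(activities=[], folder=PipelineFolder(name="Training")),
)
```

Note that, unlike the authoring canvas, create_or_update publishes the pipeline to the service immediately.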
1. Within the empty pipeline you created, navigate to the Activities pane and drag in the ‘Copy Data’ activity.
2. Name your activity appropriately and then select the Source tab. We will now choose the dataset for our SQL table in our Training database.
3. Add the SQL Server dataset we created previously as the source dataset of the Copy Data activity. Once loaded, the Source tab lets you either run a specific query or pull the whole table (both options appear in the sketch after these steps).
4. Next, switch to the Sink tab within the pipeline and select your Data Lake Parquet dataset as the sink dataset of the copy activity.
5. Your pipeline is now configured. If you click “Debug” on the top ribbon, your pipeline will be validated and queued to run. Notice that in the Output pane you can inspect the details, input, and output of the pipeline while it runs (a programmatic run-and-monitor sketch follows these steps).
6. If you need to define an explicit schema mapping for your copy activity, click the Mapping tab under the copy activity and select ‘Import schemas.’ ADF will automatically import the schema from your source and sink datasets.
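To tie steps 2 through 6 together, here is a hedged sketch of the same source, sink, and mapping configuration expressed with the Python SDK. It reuses the adf_client from the earlier sketch; the dataset names, table name, and columns (Make, Model, Year) are assumptions for illustration rather than values from this series:

```python
from azure.mgmt.datafactory.models import (
    AzureSqlSource, CopyActivity, DatasetReference, ParquetSink, PipelineResource
)

# Source: leave sql_reader_query unset to pull the whole table, or supply a query.
source = AzureSqlSource(sql_reader_query="SELECT Make, Model, [Year] FROM dbo.Cars")

# Sink: the Data Lake Storage Gen2 Parquet dataset created in the previous post.
sink = ParquetSink()

# Optional explicit column mapping -- the programmatic counterpart of the Mapping tab.
# Without a translator, the copy activity maps columns by name.
mapping = {
    "type": "TabularTranslator",
    "mappings": [
        {"source": {"name": "Make"},  "sink": {"name": "Make"}},
        {"source": {"name": "Model"}, "sink": {"name": "Model"}},
        {"source": {"name": "Year"},  "sink": {"name": "Year"}},
    ],
}

copy_cars = CopyActivity(
    name="Copy_Cars_To_Parquet",
    inputs=[DatasetReference(type="DatasetReference", reference_name="ds_Training_Cars_Sql")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="ds_Training_Cars_Parquet")],
    source=source,
    sink=sink,
    translator=mapping,
)

# Replace the empty pl_Training_Cars pipeline with the configured copy activity.
adf_client.pipelines.create_or_update(
    "<resource-group>", "<data-factory-name>", "pl_Training_Cars",
    PipelineResource(activities=[copy_cars]),
)
```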
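And as a rough programmatic counterpart to Debug, the sketch below queues a run of the published pipeline and polls its status. Keep in mind that Debug in the UI runs the unpublished draft on the canvas, while create_run executes the published version:

```python
import time

# Queue a run of the published pl_Training_Cars pipeline.
run = adf_client.pipelines.create_run(
    "<resource-group>", "<data-factory-name>", "pl_Training_Cars"
)

# Poll the run until it completes -- roughly what the Output pane shows in the UI.
while True:
    pipeline_run = adf_client.pipeline_runs.get(
        "<resource-group>", "<data-factory-name>", run.run_id
    )
    print(pipeline_run.status)  # Queued / InProgress / Succeeded / Failed
    if pipeline_run.status not in ("Queued", "InProgress"):
        break
    time.sleep(15)
```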
If you would like to explore options for partnering with Tallan to help build out your business's cloud data analytics platform, please reach out to me at Conner.Wulf@tallan.com or connect on LinkedIn.
Click here to view all of Tallan’s latest offerings, and find what’s right for your organization.