Published on: 8 March 2022
Author: Ramesh Kanjinghat
Recently I had to copy data from SQL Server and store it as XML files in Azure Data Lake Storage. I had to use XML format because the client system understands only XML format. We use Azure Data Factory (ADF) and unfortunately ADF doesn't have built in datasets to store data in XML format. We can save data in formats like AVRO, Parquet, JSON etc but XML is not an option.
It is kind of a workaround but I was able to do it using DelimitedText format. So, I thought I will share it just in case. At high level we use FOR XML clause and DeimitedText dataset.
SQL Server select statement can return data in XML format. So, I have update my select statement to use this clause. Please refer, https://docs.microsoft.com/en-us/sql/relational-databases/xml/for-xml-sql-server?view=sql-server-ver15 to learn more about SQL Server FOR XML clause .
Step 1: Create a dataset
- Go to Author > Datasets > New dataset.
- Select Azure Data Lake Storage Gen2 from source list and hit Continue.
- Select DelimitedText as format and hit Continue.
In the above screenshot you can see that XML is an option while creating a dataset. But once you create a dataset as XML format then that dataset cannot be used as Sink in the pipelines.
- Provide a Name, select appropriate Linked service.
- Provide the File path. Make sure that you only provide the folder path and not the file name. Leave the file box, number 5 in below image.
- Hit OK.
- Once the dataset is created select it and on the Connection tab select No quote character for the property Quote character .
This is very important. If we don't do this then the whole XML will be stored as a string value in the file. That is the whole xml content will be enclosed in either single quotes or double quotes, based on what you have selected.
- Publish the dataset.
Step 2: Create a pipeline
- Go to Author > Pipelines > New pipeline.
- Provide a name , if you want to change the default name.
- Drag and drop and copy activity and go to Source tab.
- Select your Source dataset (SQL Server dataset).
- Select Query as Use query option.
If it is a stored procedure then make sure that you update your SP have FOR XML clause.
- Complete rest of the fields.
- Go to Sink tab and select our DelimitedText dataset, in this case GenericFormat.
- Change File extension to .xml.
- Publish your pipeline.
Now run the pipeline. If it completes successfully then you can see an xml file created in your storage.