Azure Data Factory – rename a file in the data lake using the Gen2 REST API

I recently needed to rename a file in a data lake, and ideally I wanted to do this with Azure Data Factory, since it was part of a larger import already running in Data Factory. Searching the web, there are quite a few posts about how to use the ADLS Gen2 REST API to rename files in a data lake; however, I struggled to find any examples of doing it from Azure Data Factory with a web activity. I got stung on a number of things in Data Factory while setting this up, mostly related to authorisation and the odd missing slash! So I figured I'd do a quick post here to help anyone else encountering similar issues and hopefully save others some of the pain I experienced 😉

Before we go any further, for reference purposes we're using the ADLS data lake Gen2 REST API, more specifically the 'path create' operation (a rename is actually performed by creating the new path and pointing it at the existing file via a header). See the Microsoft reference documentation here: https://learn.microsoft.com/en-us/rest/api/storageservices/datalakestoragegen2/path/create
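In outline, the call we'll be building in Data Factory looks roughly like the sketch below (the curly-brace placeholders are purely illustrative; the real values come from pipeline parameters later on):

    PUT https://{accountName}.dfs.core.windows.net/{filesystem}/{pathToNewFile}
    x-ms-rename-source: /{filesystem}/{pathToExistingFile}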

Okay, now that's out of the way, let's look at the authorisation part. The easiest way I found to do this from Data Factory is to use the managed identity of the data factory itself. To do this you need to go to the data lake in question and give the relevant data factory identity the 'Storage Blob Data Contributor' role so it can read/write/create the files. On my particular lake the data factory also has 'Contributor' permissions, as it's doing a lot more than just renaming files.

Now, back in Data Factory, I've created a new pipeline and added the following parameters for this example. Hopefully the names are self-explanatory! The only thing to note is that the file path parameters should not contain the name of the container/filesystem; they are just the paths underneath it. So, for example, you might have the following parameter values (there's a quick worked example using these just after the list):

  • StorageAccountName = mydatalake
  • ContainerName = somecontainername
  • OldFilePath = somefolder/subfolder/myfile.csv
  • NewFilePath = somefolder/subfolder/newfilename.csv
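To make that concrete, with the example values above, the two pieces we'll build later in the pipeline would end up looking like this (illustrative values only):

    Target URL:         https://mydatalake.dfs.core.windows.net/somecontainername/somefolder/subfolder/newfilename.csv
    x-ms-rename-source: /somecontainername/somefolder/subfolder/myfile.csv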

Next we add a 'Web' activity onto the pipeline canvas. I've called mine 'Rename file in datalake', but of course you could name it whatever you like. Now let's change some of the web activity's settings (there's a rough JSON sketch of the finished activity just after this list):

  • URL = this is the full URL of the new file in the data lake, NOT the location of the current, existing file!
  • Method = this is the web method for the request; we need to use 'PUT'
  • Body = this particular request does not require a body, however Data Factory complains about an empty body so we use a workaround. Click 'add dynamic content' under the Body value and set the expression to @toLower(''); this produces an empty body and stops Data Factory complaining 🙂
  • Authentication = set this to 'System Assigned Managed Identity', as this will make the web request to the API as the data factory's own managed identity (make sure you've granted the data factory's identity permissions on the data lake!), and we don't need to worry about getting or setting tokens and all that jazz
  • Resource = set this to 'https://storage.azure.com'
  • Headers = we can get away with a single header here, 'x-ms-rename-source'. This is the location of the current, existing file in the form '/container/path/to/file'. In addition, and although not technically required for the call to work, we should probably add the API version header in case things change in the future. If you want to do this, add another header called 'x-ms-version' and set the value to the current version shown in the Microsoft documentation linked above (at the time of writing the API version is '2021-10-04')
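Putting those settings together, the JSON behind the web activity ends up looking something like the sketch below. Treat this as a rough guide to where each setting lives rather than something to paste in verbatim; the two '@concat' expressions are abbreviated here and spelled out in full in the next two sections:

    {
        "name": "Rename file in datalake",
        "type": "WebActivity",
        "typeProperties": {
            "method": "PUT",
            "url": {
                "value": "@concat('https://', ..., pipeline().parameters.NewFilePath)",
                "type": "Expression"
            },
            "body": {
                "value": "@toLower('')",
                "type": "Expression"
            },
            "headers": {
                "x-ms-rename-source": {
                    "value": "@concat('/', ..., pipeline().parameters.OldFilePath)",
                    "type": "Expression"
                },
                "x-ms-version": "2021-10-04"
            },
            "authentication": {
                "type": "MSI",
                "resource": "https://storage.azure.com"
            }
        }
    }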

Now we'll take a look at the URL setting and the dynamic expression we're using to build the URL. Here I've used the 'concat' function to build the string value we need:
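Assuming the pipeline parameters defined earlier, the expression looks along these lines (a sketch, so adjust it if your lake uses a different endpoint suffix):

    @concat('https://', pipeline().parameters.StorageAccountName, '.dfs.core.windows.net/', pipeline().parameters.ContainerName, '/', pipeline().parameters.NewFilePath)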

Now we need a slightly different expression for the 'x-ms-rename-source' header value. This is not a URL; it's just the location of the existing file (but note it includes the name of the container too, with a leading slash). See below:
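Again assuming the same pipeline parameters, a sketch of the header expression would be:

    @concat('/', pipeline().parameters.ContainerName, '/', pipeline().parameters.OldFilePath)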

That's it. As long as you've set the correct permissions for the data factory to access the data lake and got the file locations and URLs correct, you're good to go. Hope that helps someone; let me know in the comments if I've missed anything or if you found this useful. See you next time…