Azure Data Factory – rename a file in the data lake using the Gen2 REST API

I recently needed to rename a file in a data lake, and ideally I wanted to do this with Azure Data Factory, since it was part of a larger import already running in Data Factory. Searching the web, there are quite a few posts about how to use the ADLS Gen2 REST API to rename files in a data lake; however, I struggled to find any examples of doing it from Azure Data Factory with a web activity. I got stung on a number of things in Data Factory while setting this up, mostly related to authorisation and the odd missing slash! So I figured I'd do a quick post here to help anyone else encountering similar issues and hopefully save others some of the pain I experienced 😉

Before we go any further, for reference purposes we're using the ADLS data lake Gen2 REST API, more specifically the 'path create' operation (a rename is actually performed by creating the new path and pointing it at the existing file via a header). See the Microsoft reference documentation here: https://learn.microsoft.com/en-us/rest/api/storageservices/datalakestoragegen2/path/create
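In outline, the call we'll be building in Data Factory looks roughly like the sketch below (the curly-brace placeholders are purely illustrative; the real values come from pipeline parameters later on):

    PUT https://{accountName}.dfs.core.windows.net/{filesystem}/{pathToNewFile}
    x-ms-rename-source: /{filesystem}/{pathToExistingFile}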

Okay, now that's out of the way, let's look at the authorisation part. The easiest way I found to do this from Data Factory is to use the managed identity of the data factory itself. To do this you need to go to the data lake in question and give the relevant data factory identity the 'Storage Blob Data Contributor' role so it can read/write/create the files. On my particular lake the data factory also has 'Contributor' permissions, as it's doing a lot more than just renaming files.

Now, back in Data Factory, I've created a new pipeline and added the following parameters for this example. Hopefully the names are self-explanatory! The only thing to note is that the file path parameters should not contain the name of the container/filesystem; they are just the paths underneath it. So, for example, you might have the following parameter values (there's a quick worked example using these just after the list):

  • StorageAccountName = mydatalake
  • ContainerName = somecontainername
  • OldFilePath = somefolder/subfolder/myfile.csv
  • NewFilePath = somefolder/subfolder/newfilename.csv
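To make that concrete, with the example values above, the two pieces we'll build later in the pipeline would end up looking like this (illustrative values only):

    Target URL:         https://mydatalake.dfs.core.windows.net/somecontainername/somefolder/subfolder/newfilename.csv
    x-ms-rename-source: /somecontainername/somefolder/subfolder/myfile.csv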

Next we add a 'Web' activity onto the pipeline canvas. I've called mine 'Rename file in datalake', but of course you could name it whatever you like. Now let's change some of the web activity's settings (there's a rough JSON sketch of the finished activity just after this list):

  • URL = this is the full URL of the new file in the data lake, NOT the location of the current, existing file!
  • Method = this is the web method for the request; we need to use 'PUT'
  • Body = this particular request does not require a body, however Data Factory complains about an empty body so we use a workaround. Click 'add dynamic content' under the Body value and set the expression to @toLower(''); this produces an empty body and stops Data Factory complaining 🙂
  • Authentication = set this to 'System Assigned Managed Identity', as this will make the web request to the API as the data factory's own managed identity (make sure you've granted the data factory's identity permissions on the data lake!), and we don't need to worry about getting or setting tokens and all that jazz
  • Resource = set this to 'https://storage.azure.com'
  • Headers = we can get away with a single header here, 'x-ms-rename-source'. This is the location of the current, existing file in the form '/container/path/to/file'. In addition, and although not technically required for the call to work, we should probably add the API version header in case things change in the future. If you want to do this, add another header called 'x-ms-version' and set the value to the current version shown in the Microsoft documentation linked above (at the time of writing the API version is '2021-10-04')
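Putting those settings together, the JSON behind the web activity ends up looking something like the sketch below. Treat this as a rough guide to where each setting lives rather than something to paste in verbatim; the two '@concat' expressions are abbreviated here and spelled out in full in the next two sections:

    {
        "name": "Rename file in datalake",
        "type": "WebActivity",
        "typeProperties": {
            "method": "PUT",
            "url": {
                "value": "@concat('https://', ..., pipeline().parameters.NewFilePath)",
                "type": "Expression"
            },
            "body": {
                "value": "@toLower('')",
                "type": "Expression"
            },
            "headers": {
                "x-ms-rename-source": {
                    "value": "@concat('/', ..., pipeline().parameters.OldFilePath)",
                    "type": "Expression"
                },
                "x-ms-version": "2021-10-04"
            },
            "authentication": {
                "type": "MSI",
                "resource": "https://storage.azure.com"
            }
        }
    }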

Now we'll take a look at the URL setting and the dynamic expression we're using to build the URL. Here I've used the 'concat' function to build the string value we need:
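Assuming the pipeline parameters defined earlier, the expression looks along these lines (a sketch, so adjust it if your lake uses a different endpoint suffix):

    @concat('https://', pipeline().parameters.StorageAccountName, '.dfs.core.windows.net/', pipeline().parameters.ContainerName, '/', pipeline().parameters.NewFilePath)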

Now we need a slightly different expression for the 'x-ms-rename-source' header value. This is not a URL; it's just the location of the existing file (but note it includes the name of the container too, with a leading slash). See below:
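Again assuming the same pipeline parameters, a sketch of the header expression would be:

    @concat('/', pipeline().parameters.ContainerName, '/', pipeline().parameters.OldFilePath)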

That's it. As long as you've set the correct permissions for the data factory to access the data lake and got the file locations and URLs correct, you're good to go. Hope that helps someone; let me know in the comments if I've missed anything or if you found this useful. See you next time…