
Recently I was playing around with Synapse notebooks trying to query an Azure SQL database. There’s a fair bit of documentation on how to do this using usernames/passwords and service principals (i.e. app registrations). However, there’s not much out there if you’d like to use managed identity. It was much more of a faff than I expected trying to find out what to use to connect, but after some playing around I found a couple of methods that seem reasonably straightforward – at least IMHO.
As stated above, I wanted to connect using Azure Active Directory authentication (or bloody ‘Microsoft Entra’ as it’s now been renamed – why Microsoft, why?!?). We can do this from a notebook using the Spark utilities and mssparkutils.credentials.getToken() with the audience parameter value of “DW” (see here for the official documentation). So it boils down to the following PySpark code:-
# Set our server and database names
server_name = "some-sql-server-here.database.windows.net"
database_name = "SomeDatabaseName"
# We're connecting using the SQL server driver ;-)
driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
# Build our connection string
jdbc_url = f"jdbc:sqlserver://{server_name}:1433;database={database_name};encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30"
# Get an Azure AD (sorry, Entra ID) access token for the workspace managed identity. See the docs at the
# link below for more (and a list of the different 'audience' parameter values you can pass). For Synapse
# SQL or Azure SQL, we use "DW"
# https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-secure-credentials-with-tokenlibrary?pivots=programming-language-python#mssparkutilscredentialsgettoken
token = mssparkutils.credentials.getToken("DW")
# Build our query
query = "(SELECT TOP 1 * FROM [SomeSchema].[SomeTable]) as table_query"
# Set the connection properties
connection_properties = {
    "driver": driver,
    "accessToken": token
}
# Run the query
df = spark.read.jdbc(url = jdbc_url, table = query, properties = connection_properties)
# Display output (if you like)
display(df)
Or using an alternative syntax, depending on your preference:-
# Set our server and database names
server_name = "some-sql-server-here.database.windows.net"
database_name = "SomeDatabaseName"
# We're connecting using the SQL server driver ;-)
driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
# Build our connection string
jdbc_url = f"jdbc:sqlserver://{server_name}:1433;database={database_name};encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30"
# Get an Azure AD (sorry, Entra ID) access token for the workspace managed identity. See the docs at the
# link below for more (and a list of the different 'audience' parameter values you can pass). For Synapse
# SQL or Azure SQL, we use "DW"
# https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-secure-credentials-with-tokenlibrary?pivots=programming-language-python#mssparkutilscredentialsgettoken
token = mssparkutils.credentials.getToken("DW")
# Build our query
query = "SELECT TOP 1 * FROM [SomeSchema].[SomeTable]"
# Run and load results into a data frame
df = (spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("driver", driver)
    .option("query", query)
    .option("accessToken", token)
    .load())
# Display output (if you like)
display(df)

However, there is also another method we can use, which again involves the Spark utilities package. This uses the mssparkutils.credentials.getPropertiesAll() function to read the properties of a linked service – just pass the name of the linked service as the parameter. This method means we can effectively use a linked service to parameterise our code (yes, I know I could just pass parameters to the notebook, but anyway…).

I personally quite like this approach (I like both actually) as it centralises the connection details in a linked service that might be used for several other things besides your notebooks. So in the spirit of not repeating yourself, it seems like a nice way to go if this is your architecture. Using this approach also means that you don’t need to pull connection parameters from somewhere (like a metadata or configuration database) into a Data Factory pipeline and then pass them to the notebook. Although saying that, you might still want to pass the name of the linked service to the notebook (or not) as a parameter – there’s a quick sketch of that after the code below.
It might not suit all situations, such as if you’re using a parameterised linked service, but it could easily be modified to pass parameters to the notebook as normal. Anyway, on to the code…
# We'll use json library functions to read the linked service properties
import json
# Need to create an Azure SQL linked service first if you haven't already, then
# just specify its name here ;-)
linked_service_name = "Name of your linked service"
# Get an access token to access the server using Entra (Azure AD) authentication with the managed identity
# of the Synapse workspace. If we didn't connect using this method then we'd need to connect with a
# username/password (or use a service principal with an app id etc...), so this is more secure. For more
# information see the link below:-
#
# https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/microsoft-spark-utilities?pivots=programming-language-python#credentials-utilities
#
# Note:
# When creating the linked service you'll need to make sure that under the 'Version' option you choose 'legacy'
# instead of 'recommended', otherwise a 'connection-string' property won't be created in the underlying JSON, and
# getConnectionStringOrCreds() needs that property to work!
access_token = mssparkutils.credentials.getConnectionStringOrCreds(linked_service_name)
# Get all the properties of the linked service, as we'll use them to build our own connection string
# without having to specify the details in the notebook or pass them as parameters. Translate into a
# JSON object so we can easily get the property values later
linked_service_properties = json.loads(mssparkutils.credentials.getPropertiesAll(linked_service_name))
# Example properties for an SQL linked service... try a print(linked_service_properties) to see them all
# AuthType = OAuth2
# AuthKey = [REDACTED]
# Id = None
# Type = AzureSqlDatabase
# Endpoint = some-server-name-would-be-here.database.windows.net
# Database = name of your database would be here
# Set connection, server and database details
driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
server_name = linked_service_properties.get("Endpoint")
database_name = linked_service_properties.get("Database")
# By default spark.read.jdbc just pulls from a named table...
# table_name = "SomeSchema.SomeTable"
# ...but you can also wrap a custom query as a 'table', which is what we'll use here
table_name = "(SELECT TOP 1 * FROM [SomeSchema].[SomeTable]) as table_query"
# Build the SQL database url
jdbc_url = f"jdbc:sqlserver://{server_name}:1433;database={database_name};encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30"
# Set the connection properties
connection_properties = {
    "driver": driver,
    "accessToken": access_token
}
# Read the table/query into a dataframe
df = spark.read.jdbc(url = jdbc_url, table = table_name, properties = connection_properties)
# Show results
display(df)
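As mentioned above, if you’d rather not hard-code the linked service name you can pass it to the notebook as a parameter instead. A minimal sketch, assuming a pipeline notebook activity passes a base parameter called linked_service_name (the name is just for illustration) – in Synapse Studio you mark the cell as a parameter cell via the cell’s ‘Toggle parameter cell’ option:-
# Parameter cell: the default value below is used when running interactively,
# but a value passed in from a pipeline's notebook activity will override it
linked_service_name = "Name of your linked service"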

You’ll also need to add permissions to your Azure SQL database (if you haven’t already) so that the managed identity of the Synapse workspace can read the tables (or do whatever else you need). This is the same as for any Azure AD / Entra permissions on Azure SQL. Note that in the code below I’ve added it to the db_owner role, which will allow it to read/write/drop/etc… you may not want it to have such powerful permissions, so you might want to add it to a different role instead for your purposes:-
CREATE USER [Synapse workspace name here] FROM EXTERNAL PROVIDER;
ALTER ROLE db_owner ADD MEMBER [Synapse workspace name here];
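For example, if the notebook only ever needs to read data, the built-in db_datareader role would be a less privileged alternative (just a suggestion – pick whatever role fits your needs):-
CREATE USER [Synapse workspace name here] FROM EXTERNAL PROVIDER;
ALTER ROLE db_datareader ADD MEMBER [Synapse workspace name here];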

Oh, and I almost forgot – if you’re using the ‘linked service’ approach, make sure your linked service is configured to use managed identity, i.e. its ‘Authentication type’ setting is set to ‘System Assigned Managed Identity’ (otherwise it won’t work, duh!).
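For reference, the underlying JSON of a ‘legacy’ Azure SQL linked service using the workspace managed identity looks roughly like the sketch below (illustrative only – the names are made up and the exact properties Synapse generates may differ). The giveaway is a connection string with no user name or password in it:-
{
    "name": "AzureSqlDatabaseLinkedService",
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            "connectionString": "Data Source=tcp:some-sql-server-here.database.windows.net,1433;Initial Catalog=SomeDatabaseName;"
        }
    }
}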

Anyway, hope some of this comes in useful for someone. I’ve not tried this on Microsoft Fabric yet, but now it’s got workspace ‘managed’ identities this should be possible I guess (maybe with some code tweaks).
Till next time 😉
Update: I tried to get this working in Microsoft Fabric; however, the mssparkutils.credentials.getToken() function doesn’t yet support the “DW” audience required to connect to Azure SQL. Currently (at the time of writing) it only supports a small number of Fabric-related audiences – see here for the list…