Microsoft Fabric – NHS data processing… part 7, the semantic model and DAX

person pointing on paper
Photo by Lukas on Pexels.com

So we’ve got our warehouse ready, with a nice star schema structure. Let’s get working with PowerBI and see what we can do. Remember from our earlier posts, we’re looking to answer questions that the business has regarding its data. So we need to bear in mind who we’re creating dashboards and their related semantic models for…

Before we can do any visualisations though, first we need to create a semantic model that PowerBI can use to read the data in the warehouse. Note that by default Fabric will automatically create a semantic model for you for every lakehouse or warehouse. However, it might not be the structure you want, plus there are some restrictions with editing the default model, like renaming columns. Instead we’ll create a fresh new separate semantic model which we can use with PowerBI for this example.

In Fabric, change to ‘persona’ to ‘Power BI’ by clicking the icon at the bottom left of the browser window:-

Now, if we go to our workspace and click the ‘+ New’ button at the top of the page, we’ll get the following options which include ‘Semantic model’, this is what we want…

Adding all the dimensions and fact tables from our data warehouse to the model, we end up with a model that looks something like this…

Don’t forget to create relationships between the fact table and each dimension table, by dragging the dimension key field in the fact table to the respective dimension table key field. In addition, we want to make sure each relationship has the ‘Assume referential integrity’ box checked (see below), otherwise the resulting queries generated by PowerBI will be made with LEFT joins instead of more efficient INNER joins!

yellow tape measure
Photo by Rodolfo Clix on Pexels.com

What we need to do now is create any measures that we’ll need for our visualisations. This is where we need to use some simple DAX. You need to know or have some idea of the types of visualisations you’ll want to create, as this will influence the need for certain measures. We’ll need to create a number of similar ones for various situations in our eventual dashboard, but also note that we’ll need to create some that only apply at the end of a chosen period – this are usually referred to as semi-additive measures, see this great post by SQLBI here for a good description and related DAX. If we don’t create these we could get our aggregations adding up data over an entire period of time instead of only at the end of the period. Sometimes we might want it one way, or the other, or even both… I’ve also placed these in clear display folders so it doesn’t get too messy.

Also note that I’ve hidden any measures or fields in all tables that should not be visible or available to end users or dashboard designers. This is always a good practice to keep your semantic model clean, simple and easy to understand and use. Remember, we want the model to be almost so easy that anyone without any knowledge can look at it and grasp what does what. If your model doesn’t do that, it needs improvement. I personally like to think about how an executive at the business might see it… if it doesn’t make instant sense to them, they’ll just get frustrated… 🤦‍♂️

The first and most important measure of course is the number of patients waiting. This one is just a SUM of the field that contains the number of patients waiting (duh!).

Total Number Of Patients = SUM('Patients Waiting Monthly'[Number Of Patients Waiting])

For the ‘at period end’ calculations there are a number of methods we can use. In this example I’ve used the LASTNONBLANK function get the last date for the specified measure so that we’re calculating only on that date, depending on what period is currently selected.

Total Number Of Patients (At Period End) = CALCULATE(
    [Total Number Of Patients],
    LASTNONBLANK(
        'Calendar'[Date],
        [Total Number Of Patients]
    )
)
text on letter board
Photo by Leeloo The First on Pexels.com

Last but not least, one thing I like to do is rename visible fields from the Pascal case we’ve used in the database, to normal descriptive English with spaces etc… Why bother? The reason I recommend doing this is that often your semantic model will or could be exposed to users to perform their own queries or create reports, but when someone drags a field onto the canvas for a report you want it to look professional. So do you want someone being able to create a slicer with the value ‘ThisIsSomeFieldInPascalCase’ or do you think that ‘This is some field in Pascal case’ looks more professional? Of course they could do the renaming themselves, but are you really going to rely on them to do this? Should they even have to?

What, do I have to manually rename each damn field?!?

In a previous blog post (see here) I showed how you could use some M and PowerQuery (in PowerBI desktop) to automatically transform the column names to descriptive English with spaces, however that isn’t something we can do here. Although, saying that, it would be a nice future feature for Fabric… I remember the old multidimensional cubes in SSAS where Visual Studio would automatically split your column names and insert spaces for you if you used Pascal case or snake case, I miss those days… 😎

However, in this case, yes, your only choices are either manually renaming each and every column or using a 3rd party tool to ‘script’ the change for you. I’ll maybe do another follow up post on how you could do this…

crop male employee writing on whiteboard in office
Photo by Malte Luk on Pexels.com

Ok, sorry, just one more thing… Whilst building semantic models, remember, you’re still in world of data warehousing… so you want any common changes as close to the source (but not in the actual source source data!) so that people further on down the line don’t have to each make the same (sometimes tedious) changes. Same goes for measures, dimensions, almost everything. I’ve been at places where several people have written the same measure calculations several different ways in several places, each with different results! 🤦‍♂️

So, do it in one place, if you can… obviously report-time only measures such as percentages cannot be pushed to the warehouse, you could only provide numerators and denominators in the warehouse and then create the percentage measure in the semantic model. But if a calculation, or even calculated column, can be pushed to the warehouse instead of being calculated in the semantic model, that’s where it should be. If nothing else, it should almost always improve performance as the calculation will already have been pre-calculated in the warehouse layer.

Right, ok, I’ll stop there… In the next post we’ll look at how we can create some simple visualisations for a ‘referral to treatment’ dashboard 🫡