Microsoft Fabric – NHS data processing… part 6, transforming the data

transformers sculptures under the blue sky
Photo by Soly Moses on Pexels.com

Ok, so we’ve figured out how we’re going to model this data in the previous article, now we need to actually transform the raw data into our data model. In this example we’re going to use the warehouse part of Fabric, we could of course use a lakehouse instead, but as we’ve already used lakehouses for other parts of this walkthrough I figured we’d take a look at the warehouse side of Fabric this time.

Anyone coming from an SQL Server background will be very familiar here, even more so if you’ve used Synapse dedicated SQL pools or Synapse serverless SQL before, as some of the same ‘limitations’ there also exist (at least at the time of writing) with Fabric warehouse too. Some of these limitations (again, at the time of writing) are things like no identity columns, no scalar functions, only unenforced primary and foreign keys etc…

However, some of the warehouse features which aren’t offered by the lakehouse are (at least IMHO) pretty useful:-

  • Schemas – You can separate out tables, views and procs like you would normally do in good old SQL server. Also, you can of course use security over specific schemas instead of individual tables.
  • Stored procedures – If you love your stored procs, this approach is for you. Stored procedures do not exist in lakehouses so you’ve got to code either with loads of notebooks, or store your code in tables and dynamically execute using a notebook.

That being said though, currently there are some (what I’d consider key) things which lakehouses have over warehouses, such as partitioning. Although I’d say its highly likely that particular feature will come to the warehouse in time. Also, there is no merge statement yet in the warehouse, but again I suspect that will come soon. These are two things that have simple workarounds though, so its not really a deal breaker…

Actually, my only real pain point with the warehouse at the moment is that you can’t create scalar functions yet. I’ve got loads of scalar functions from decades of SQL development, but if I wanted to use them I’d need to refactor the code into a stored procedure… Although again, I’d be surprised if this feature didn’t arrive eventually.

Anyway, enough of that. Lets fire up our good friend SQL server management studio (yes, I’ve tried Azure data explorer, but I’ve got a lot of love and history with management studio) and get building some tables, views and stored procedures! 🫡

diverse men giving fist bump in aged hallway
Photo by William Fortunato on Pexels.com

Oh actually, just a quick recap before I forget. In terms of our databases, this is what we’ve got so far. Looking at the SSMS (SQL Management Studio) object explorer for our workspace we see this…

I’ve named them appropriately, so ‘****Lakehouse’ is using lakehouse tech and ‘****Warehouse’ is using the warehouse side of Fabric. Hopefully that was obvious anyway though 😉. We’ve got our incremental, transient data arriving in the LandingLakehouse, which is then merged into full history tables in the StagingLakehouse. Now we’re going to take that data from the history tables and transform it in the DataWarehouse. This database would effectively be the ‘gold’ layer in the medallion architecture if you prefer that nomenclature.

photo of a we have more signage
Photo by RDNE Stock project on Pexels.com

You could potentially have another database after this if you like, separating out further aggregated tables and maybe also more views. A bit like the ‘platinum’ layer sometimes seen in medallion architecture. Although its likely most data would stay in same form or structure as it is in the DataWarehouse. So the natural question would be, do you copy (i.e. duplicate) existing data and aggregate further in a separate database? Or do you just stay within the DataWarehouse? Or do you even separate out data into business domain related data marts? I’ll leave that up to you, if there is a good business reason to do it, then go ahead. Remember, in data warehousing its not a crime to duplicate data. We’re often sacrificing things (e.g. processing time, duplication of data etc…) to get the best and most efficient final query performance. That’s also why we’re using Kimball and the star schema design, not because its pretty, but because of the performance benefits over other approaches when querying the data in its final form…

However, before I go on another data warehousing rant, lets get back on topic 😮

As I mentioned, in this example we’re going to use schemas to separate out our data into related areas. If you were using a lakehouse instead, you’d have to clearly name all your tables as you don’t have the luxury of schemas over there, sorry! This also means you could apply security if you needed to at a schema level, but its fine if you prefer not to use them. In terms of the actual table names themselves, my personal preference is to name the tables by their content (or by entity) rather than their content and type (e.g. DimPatient) as I like to keep the table name itself ‘pure’ and instead postfix the table type to the schema name. So, what we’re looking at then is schema/table names in the following format…

  • “Dataset Name, Table Type” . “Content Name”

Which equates to the following tables for our dataset:-

  • ReferralToTreatmentDimension.CommissionerOrganisation
  • ReferralToTreatmentDimension.ProviderOrganisation
  • ReferralToTreatmentDimension.TreatmentFunction
  • ReferralToTreatmentDimension.WaitingStatus
  • ReferralToTreatmentFact.PatientsWaitingMonthly

Again, this naming convention is purely my personal preference, feel free to use whatever convention you prefer. Either way though, remember to stick to that convention and also build processes around it. Why? This way, rather than just asking developers to use a specific naming convention, this approach should force any developers to always use the convention. If they don’t, things in the process won’t work properly during debugging and errors will get thrown. Thus, to get things working, they have no choice but to adopt the conventions you’ve agreed on. This helps prevent that old situation of everyone just using whatever conventions they like and the whole thing becoming like the wild west, with who knows how many different naming schemes in use (I’m sure you’ve seen this before).

We just need a few more tables though to complete this initial ‘build’, we’ll need the following:-

  • Calendar dimension – Virtually every data warehouse needs one of these, so we can break down data at various chronological levels, like month, quarter, year etc…
  • Week banding dimension – As discussed in the previous post, we’re going to create a dimension containing all the different week bandings (e.g. 1 to 2 weeks, 3 to 4 weeks or under 18 weeks, over 18 weeks etc…) so we can split our measures by various levels of week bandings.

What we’ll end up with is this list of tables…

We’ll create a stored per table to handle the building and processing of the table, but we’ll also create a ‘master’ stored procedure per dataset (e.g. Referral To Treatment) to process all tables for a particular dataset. This way, the data factory only needs to worry about calling a single procedure per dataset to actually build that dataset, its in the warehouse how we decide how the tables and processed.

So, below we’ll eventually have the following list of procedures:-

Lets look at some code for a few of the key tables above, essentially we’ll re-use similar code for each procedure and change the transformation part of the code where appropriate. One thing I’ve not done here which I tend to do for most data warehouse projects is create ‘code generator’ procedures to save me having to type out similar pieces of code again and again. I’ll maybe do an article on that in the future, but for this example I’ve just manually written the procedures (with a bit of copy and paste of course 😉).

First, we’ve got the calendar, note that for this I’ve actually created a view which generates the calendar values so this procedure is a simple insert/update or upsert (as there isn’t a MERGE command available yet).

CREATE PROCEDURE [Chronological].[uspProcessCalendar] AS
BEGIN
	-- Insert any new rows
	INSERT INTO
    [Chronological].[Calendar] (
		[ChronologicalCalendarKey]
		,[CalendarDate]
		,[CalendarDateTime]
		,[CalendarYear]
		,[CalendarQuarter]
		,[CalendarMonth]
		,[CalendarQuarterNameAndYearSortOrder]
		,[CalendarQuarterNameAndYear]
		,[CalendarMonthNameAndYearSortOrder]
		,[CalendarMonthNameAndYear]
		,[DayOfTheYear]
		,[DayOfTheQuarter]
		,[DayOfTheMonth]
		,[DayOfTheWeek]
		,[FiscalYearSortOrder]
		,[FiscalYear]
		,[FiscalQuarter]
		,[FiscalMonth]
		,[FiscalDay]
		,[FiscalQuarterName]
		,[FiscalQuarterNameAndYearSortOrder]
		,[FiscalQuarterNameAndYear]
		,[FiscalMonthNameAndYearSortOrder]
		,[FiscalMonthNameAndYear]
		,[FiscalYearBeginningDate]
		,[FiscalYearEndingDate]
		,[YearBeginningDate]
		,[QuarterBeginningDate]
		,[MonthBeginningDate]
		,[WeekBeginningDate]
		,[YearEndingDate]
		,[QuarterEndingDate]
		,[MonthEndingDate]
		,[WeekEndingDate]
		,[MonthNameSortOrder]
		,[MonthNameShort]
		,[MonthNameLong]
		,[DayNameSortOrder]
		,[DayNameShort]
		,[DayNameLong]
		,[NumberOfDaysInMonth]
		,[IsWeekend]
		,[IsWorkingDay]
		,[IsUKBankHoliday]
		,[IsNewYearsDay]
		,[IsGoodFriday]
		,[IsEasterSunday]
		,[IsEasterMonday]
		,[IsEarlyMayBankHoliday]
		,[IsSpringBankHoliday]
		,[IsSummerBankHoliday]
		,[IsChristmasDay]
		,[IsBoxingDay]
		,[IsLeapYear]
		,[IsPastDate]
		,[IsYesterday]
		,[IsToday]
		,[IsTomorrow]
		,[IsFutureDate]
		,[IsCurrentCalendarYear]
		,[IsCurrentCalendarQuarter]
		,[IsCurrentMonth]
		,[IsCurrentWeek]
		,[IsCurrentFiscalYear]
		,[IsCurrentFiscalQuarter]
		,[IsLastCalendarYear]
		,[IsLastCalendarQuarter]
		,[IsLastMonth]
		,[IsLastWeek]
		,[IsLastFiscalYear]
		,[IsLastFiscalQuarter]
		,[IsFirstDayOfTheWeek]
		,[IsFirstDayOfTheMonth]
		,[IsFirstDayOfTheQuarter]
		,[IsFirstDayOfTheCalendarYear]
		,[IsFirstDayOfTheFiscalYear]
		,[IsFirstDayOfTheDecade]
		,[IsFirstDayOfTheCentury]
		,[IsLastDayOfTheWeek]
		,[IsLastDayOfTheMonth]
		,[IsLastDayOfTheQuarter]
		,[IsLastDayOfTheCalendarYear]
		,[IsLastDayOfTheFiscalYear]
		,[IsLastDayOfTheDecade]
		,[IsLastDayOfTheCentury]
	)

	SELECT
	[ChronologicalCalendarKey]
	,[CalendarDate]
	,[CalendarDateTime]
	,[CalendarYear]
	,[CalendarQuarter]
	,[CalendarMonth]
	,[CalendarQuarterNameAndYearSortOrder]
	,[CalendarQuarterNameAndYear]
	,[CalendarMonthNameAndYearSortOrder]
	,[CalendarMonthNameAndYear]
	,[DayOfTheYear]
	,[DayOfTheQuarter]
	,[DayOfTheMonth]
	,[DayOfTheWeek]
	,[FiscalYearSortOrder]
	,[FiscalYear]
	,[FiscalQuarter]
	,[FiscalMonth]
	,[FiscalDay]
	,[FiscalQuarterName]
	,[FiscalQuarterNameAndYearSortOrder]
	,[FiscalQuarterNameAndYear]
	,[FiscalMonthNameAndYearSortOrder]
	,[FiscalMonthNameAndYear]
	,[FiscalYearBeginningDate]
	,[FiscalYearEndingDate]
	,[YearBeginningDate]
	,[QuarterBeginningDate]
	,[MonthBeginningDate]
	,[WeekBeginningDate]
	,[YearEndingDate]
	,[QuarterEndingDate]
	,[MonthEndingDate]
	,[WeekEndingDate]
	,[MonthNameSortOrder]
	,[MonthNameShort]
	,[MonthNameLong]
	,[DayNameSortOrder]
	,[DayNameShort]
	,[DayNameLong]
	,[NumberOfDaysInMonth]
	,[IsWeekend]
	,[IsWorkingDay]
	,[IsUKBankHoliday]
	,[IsNewYearsDay]
	,[IsGoodFriday]
	,[IsEasterSunday]
	,[IsEasterMonday]
	,[IsEarlyMayBankHoliday]
	,[IsSpringBankHoliday]
	,[IsSummerBankHoliday]
	,[IsChristmasDay]
	,[IsBoxingDay]
	,[IsLeapYear]
	,[IsPastDate]
	,[IsYesterday]
	,[IsToday]
	,[IsTomorrow]
	,[IsFutureDate]
	,[IsCurrentCalendarYear]
	,[IsCurrentCalendarQuarter]
	,[IsCurrentMonth]
	,[IsCurrentWeek]
	,[IsCurrentFiscalYear]
	,[IsCurrentFiscalQuarter]
	,[IsLastCalendarYear]
	,[IsLastCalendarQuarter]
	,[IsLastMonth]
	,[IsLastWeek]
	,[IsLastFiscalYear]
	,[IsLastFiscalQuarter]
	,[IsFirstDayOfTheWeek]
	,[IsFirstDayOfTheMonth]
	,[IsFirstDayOfTheQuarter]
	,[IsFirstDayOfTheCalendarYear]
	,[IsFirstDayOfTheFiscalYear]
	,[IsFirstDayOfTheDecade]
	,[IsFirstDayOfTheCentury]
	,[IsLastDayOfTheWeek]
	,[IsLastDayOfTheMonth]
	,[IsLastDayOfTheQuarter]
	,[IsLastDayOfTheCalendarYear]
	,[IsLastDayOfTheFiscalYear]
	,[IsLastDayOfTheDecade]
	,[IsLastDayOfTheCentury]
	
	FROM 
	[Chronological].[vwCalendarValues] [src]

	WHERE
    NOT EXISTS(
		SELECT
        1

		FROM
        [Chronological].[Calendar] [trg]

		WHERE
        [trg].[ChronologicalCalendarKey] = [src].[ChronologicalCalendarKey]
	)

	-- Update existing rows where there are differences
	UPDATE
	[trg]

	SET
	[CalendarDate]							= [src].[CalendarDate]
	,[CalendarDateTime]						= [src].[CalendarDateTime]
	,[CalendarYear]							= [src].[CalendarYear]
	,[CalendarQuarter]						= [src].[CalendarQuarter]
	,[CalendarMonth]						= [src].[CalendarMonth]
	,[CalendarQuarterNameAndYearSortOrder]	= [src].[CalendarQuarterNameAndYearSortOrder]
	,[CalendarQuarterNameAndYear]			= [src].[CalendarQuarterNameAndYear]
	,[CalendarMonthNameAndYearSortOrder]	= [src].[CalendarMonthNameAndYearSortOrder]
	,[CalendarMonthNameAndYear]				= [src].[CalendarMonthNameAndYear]
	,[DayOfTheYear]							= [src].[DayOfTheYear]
	,[DayOfTheQuarter]						= [src].[DayOfTheQuarter]
	,[DayOfTheMonth]						= [src].[DayOfTheMonth]
	,[DayOfTheWeek]							= [src].[DayOfTheWeek]
	,[FiscalYearSortOrder]					= [src].[FiscalYearSortOrder]
	,[FiscalYear]							= [src].[FiscalYear]
	,[FiscalQuarter]						= [src].[FiscalQuarter]
	,[FiscalMonth]							= [src].[FiscalMonth]
	,[FiscalDay]							= [src].[FiscalDay]
	,[FiscalQuarterName]					= [src].[FiscalQuarterName]
	,[FiscalQuarterNameAndYearSortOrder]	= [src].[FiscalQuarterNameAndYearSortOrder]
	,[FiscalQuarterNameAndYear]				= [src].[FiscalQuarterNameAndYear]
	,[FiscalMonthNameAndYearSortOrder]		= [src].[FiscalMonthNameAndYearSortOrder]
	,[FiscalMonthNameAndYear]				= [src].[FiscalMonthNameAndYear]
	,[FiscalYearBeginningDate]				= [src].[FiscalYearBeginningDate]
	,[FiscalYearEndingDate]					= [src].[FiscalYearEndingDate]
	,[YearBeginningDate]					= [src].[YearBeginningDate]
	,[QuarterBeginningDate]					= [src].[QuarterBeginningDate]
	,[MonthBeginningDate]					= [src].[MonthBeginningDate]
	,[WeekBeginningDate]					= [src].[WeekBeginningDate]
	,[YearEndingDate]						= [src].[YearEndingDate]
	,[QuarterEndingDate]					= [src].[QuarterEndingDate]
	,[MonthEndingDate]						= [src].[MonthEndingDate]
	,[WeekEndingDate]						= [src].[WeekEndingDate]
	,[MonthNameSortOrder]					= [src].[MonthNameSortOrder]
	,[MonthNameShort]						= [src].[MonthNameShort]
	,[MonthNameLong]						= [src].[MonthNameLong]
	,[DayNameSortOrder]						= [src].[DayNameSortOrder]
	,[DayNameShort]							= [src].[DayNameShort]
	,[DayNameLong]							= [src].[DayNameLong]
	,[NumberOfDaysInMonth]					= [src].[NumberOfDaysInMonth]
	,[IsWeekend]							= [src].[IsWeekend]
	,[IsWorkingDay]							= [src].[IsWorkingDay]
	,[IsUKBankHoliday]						= [src].[IsUKBankHoliday]
	,[IsNewYearsDay]						= [src].[IsNewYearsDay]
	,[IsGoodFriday]							= [src].[IsGoodFriday]
	,[IsEasterSunday]						= [src].[IsEasterSunday]
	,[IsEasterMonday]						= [src].[IsEasterMonday]
	,[IsEarlyMayBankHoliday]				= [src].[IsEarlyMayBankHoliday]
	,[IsSpringBankHoliday]					= [src].[IsSpringBankHoliday]
	,[IsSummerBankHoliday]					= [src].[IsSummerBankHoliday]
	,[IsChristmasDay]						= [src].[IsChristmasDay]
	,[IsBoxingDay]							= [src].[IsBoxingDay]
	,[IsLeapYear]							= [src].[IsLeapYear]
	,[IsPastDate]							= [src].[IsPastDate]
	,[IsYesterday]							= [src].[IsYesterday]
	,[IsToday]								= [src].[IsToday]
	,[IsTomorrow]							= [src].[IsTomorrow]
	,[IsFutureDate]							= [src].[IsFutureDate]
	,[IsCurrentCalendarYear]				= [src].[IsCurrentCalendarYear]
	,[IsCurrentCalendarQuarter]				= [src].[IsCurrentCalendarQuarter]
	,[IsCurrentMonth]						= [src].[IsCurrentMonth]
	,[IsCurrentWeek]						= [src].[IsCurrentWeek]
	,[IsCurrentFiscalYear]					= [src].[IsCurrentFiscalYear]
	,[IsCurrentFiscalQuarter]				= [src].[IsCurrentFiscalQuarter]
	,[IsLastCalendarYear]					= [src].[IsLastCalendarYear]
	,[IsLastCalendarQuarter]				= [src].[IsLastCalendarQuarter]
	,[IsLastMonth]							= [src].[IsLastMonth]
	,[IsLastWeek]							= [src].[IsLastWeek]
	,[IsLastFiscalYear]						= [src].[IsLastFiscalYear]
	,[IsLastFiscalQuarter]					= [src].[IsLastFiscalQuarter]
	,[IsFirstDayOfTheWeek]					= [src].[IsFirstDayOfTheWeek]
	,[IsFirstDayOfTheMonth]					= [src].[IsFirstDayOfTheMonth]
	,[IsFirstDayOfTheQuarter]				= [src].[IsFirstDayOfTheQuarter]
	,[IsFirstDayOfTheCalendarYear]			= [src].[IsFirstDayOfTheCalendarYear]
	,[IsFirstDayOfTheFiscalYear]			= [src].[IsFirstDayOfTheFiscalYear]
	,[IsFirstDayOfTheDecade]				= [src].[IsFirstDayOfTheDecade]
	,[IsFirstDayOfTheCentury]				= [src].[IsFirstDayOfTheCentury]
	,[IsLastDayOfTheWeek]					= [src].[IsLastDayOfTheWeek]
	,[IsLastDayOfTheMonth]					= [src].[IsLastDayOfTheMonth]
	,[IsLastDayOfTheQuarter]				= [src].[IsLastDayOfTheQuarter]
	,[IsLastDayOfTheCalendarYear]			= [src].[IsLastDayOfTheCalendarYear]
	,[IsLastDayOfTheFiscalYear]				= [src].[IsLastDayOfTheFiscalYear]
	,[IsLastDayOfTheDecade]					= [src].[IsLastDayOfTheDecade]
	,[IsLastDayOfTheCentury]				= [src].[IsLastDayOfTheCentury]

	FROM
    [Chronological].[Calendar] [trg]

	INNER JOIN
    [Chronological].[vwCalendarValues] [src]
	ON
	[trg].[ChronologicalCalendarKey] = [src].[ChronologicalCalendarKey]
	AND (		
		[trg].[CalendarDate]							<> [src].[CalendarDate]
		OR [trg].[CalendarDateTime]						<> [src].[CalendarDateTime]
		OR [trg].[CalendarYear]							<> [src].[CalendarYear]
		OR [trg].[CalendarQuarter]						<> [src].[CalendarQuarter]
		OR [trg].[CalendarMonth]						<> [src].[CalendarMonth]
		OR [trg].[CalendarQuarterNameAndYearSortOrder]	<> [src].[CalendarQuarterNameAndYearSortOrder]
		OR [trg].[CalendarQuarterNameAndYear]			<> [src].[CalendarQuarterNameAndYear]
		OR [trg].[CalendarMonthNameAndYearSortOrder]	<> [src].[CalendarMonthNameAndYearSortOrder]
		OR [trg].[CalendarMonthNameAndYear]				<> [src].[CalendarMonthNameAndYear]
		OR [trg].[DayOfTheYear]							<> [src].[DayOfTheYear]
		OR [trg].[DayOfTheQuarter]						<> [src].[DayOfTheQuarter]
		OR [trg].[DayOfTheMonth]						<> [src].[DayOfTheMonth]
		OR [trg].[DayOfTheWeek]							<> [src].[DayOfTheWeek]
		OR [trg].[FiscalYearSortOrder]					<> [src].[FiscalYearSortOrder]
		OR [trg].[FiscalYear]							<> [src].[FiscalYear]
		OR [trg].[FiscalQuarter]						<> [src].[FiscalQuarter]
		OR [trg].[FiscalMonth]							<> [src].[FiscalMonth]
		OR [trg].[FiscalDay]							<> [src].[FiscalDay]
		OR [trg].[FiscalQuarterName]					<> [src].[FiscalQuarterName]
		OR [trg].[FiscalQuarterNameAndYearSortOrder]	<> [src].[FiscalQuarterNameAndYearSortOrder]
		OR [trg].[FiscalQuarterNameAndYear]				<> [src].[FiscalQuarterNameAndYear]
		OR [trg].[FiscalMonthNameAndYearSortOrder]		<> [src].[FiscalMonthNameAndYearSortOrder]
		OR [trg].[FiscalMonthNameAndYear]				<> [src].[FiscalMonthNameAndYear]
		OR [trg].[FiscalYearBeginningDate]				<> [src].[FiscalYearBeginningDate]
		OR [trg].[FiscalYearEndingDate]					<> [src].[FiscalYearEndingDate]
		OR [trg].[YearBeginningDate]					<> [src].[YearBeginningDate]
		OR [trg].[QuarterBeginningDate]					<> [src].[QuarterBeginningDate]
		OR [trg].[MonthBeginningDate]					<> [src].[MonthBeginningDate]
		OR [trg].[WeekBeginningDate]					<> [src].[WeekBeginningDate]
		OR [trg].[YearEndingDate]						<> [src].[YearEndingDate]
		OR [trg].[QuarterEndingDate]					<> [src].[QuarterEndingDate]
		OR [trg].[MonthEndingDate]						<> [src].[MonthEndingDate]
		OR [trg].[WeekEndingDate]						<> [src].[WeekEndingDate]
		OR [trg].[MonthNameSortOrder]					<> [src].[MonthNameSortOrder]
		OR [trg].[MonthNameShort]						<> [src].[MonthNameShort]
		OR [trg].[MonthNameLong]						<> [src].[MonthNameLong]
		OR [trg].[DayNameSortOrder]						<> [src].[DayNameSortOrder]
		OR [trg].[DayNameShort]							<> [src].[DayNameShort]
		OR [trg].[DayNameLong]							<> [src].[DayNameLong]
		OR [trg].[NumberOfDaysInMonth]					<> [src].[NumberOfDaysInMonth]
		OR [trg].[IsWeekend]							<> [src].[IsWeekend]
		OR [trg].[IsWorkingDay]							<> [src].[IsWorkingDay]
		OR [trg].[IsUKBankHoliday]						<> [src].[IsUKBankHoliday]
		OR [trg].[IsNewYearsDay]						<> [src].[IsNewYearsDay]
		OR [trg].[IsGoodFriday]							<> [src].[IsGoodFriday]
		OR [trg].[IsEasterSunday]						<> [src].[IsEasterSunday]
		OR [trg].[IsEasterMonday]						<> [src].[IsEasterMonday]
		OR [trg].[IsEarlyMayBankHoliday]				<> [src].[IsEarlyMayBankHoliday]
		OR [trg].[IsSpringBankHoliday]					<> [src].[IsSpringBankHoliday]
		OR [trg].[IsSummerBankHoliday]					<> [src].[IsSummerBankHoliday]
		OR [trg].[IsChristmasDay]						<> [src].[IsChristmasDay]
		OR [trg].[IsBoxingDay]							<> [src].[IsBoxingDay]
		OR [trg].[IsLeapYear]							<> [src].[IsLeapYear]
		OR [trg].[IsPastDate]							<> [src].[IsPastDate]
		OR [trg].[IsYesterday]							<> [src].[IsYesterday]
		OR [trg].[IsToday]								<> [src].[IsToday]
		OR [trg].[IsTomorrow]							<> [src].[IsTomorrow]
		OR [trg].[IsFutureDate]							<> [src].[IsFutureDate]
		OR [trg].[IsCurrentCalendarYear]				<> [src].[IsCurrentCalendarYear]
		OR [trg].[IsCurrentCalendarQuarter]				<> [src].[IsCurrentCalendarQuarter]
		OR [trg].[IsCurrentMonth]						<> [src].[IsCurrentMonth]
		OR [trg].[IsCurrentWeek]						<> [src].[IsCurrentWeek]
		OR [trg].[IsCurrentFiscalYear]					<> [src].[IsCurrentFiscalYear]
		OR [trg].[IsCurrentFiscalQuarter]				<> [src].[IsCurrentFiscalQuarter]
		OR [trg].[IsLastCalendarYear]					<> [src].[IsLastCalendarYear]
		OR [trg].[IsLastCalendarQuarter]				<> [src].[IsLastCalendarQuarter]
		OR [trg].[IsLastMonth]							<> [src].[IsLastMonth]
		OR [trg].[IsLastWeek]							<> [src].[IsLastWeek]
		OR [trg].[IsLastFiscalYear]						<> [src].[IsLastFiscalYear]
		OR [trg].[IsLastFiscalQuarter]					<> [src].[IsLastFiscalQuarter]
		OR [trg].[IsFirstDayOfTheWeek]					<> [src].[IsFirstDayOfTheWeek]
		OR [trg].[IsFirstDayOfTheMonth]					<> [src].[IsFirstDayOfTheMonth]
		OR [trg].[IsFirstDayOfTheQuarter]				<> [src].[IsFirstDayOfTheQuarter]
		OR [trg].[IsFirstDayOfTheCalendarYear]			<> [src].[IsFirstDayOfTheCalendarYear]
		OR [trg].[IsFirstDayOfTheFiscalYear]			<> [src].[IsFirstDayOfTheFiscalYear]
		OR [trg].[IsFirstDayOfTheDecade]				<> [src].[IsFirstDayOfTheDecade]
		OR [trg].[IsFirstDayOfTheCentury]				<> [src].[IsFirstDayOfTheCentury]
		OR [trg].[IsLastDayOfTheWeek]					<> [src].[IsLastDayOfTheWeek]
		OR [trg].[IsLastDayOfTheMonth]					<> [src].[IsLastDayOfTheMonth]
		OR [trg].[IsLastDayOfTheQuarter]				<> [src].[IsLastDayOfTheQuarter]
		OR [trg].[IsLastDayOfTheCalendarYear]			<> [src].[IsLastDayOfTheCalendarYear]
		OR [trg].[IsLastDayOfTheFiscalYear]				<> [src].[IsLastDayOfTheFiscalYear]
		OR [trg].[IsLastDayOfTheDecade]					<> [src].[IsLastDayOfTheDecade]
		OR [trg].[IsLastDayOfTheCentury]				<> [src].[IsLastDayOfTheCentury]
	)
END

For the view that generates the calendar, this is the code I’m using. Note that I’ve only populated certain fields, so not all the calendar is fully populated. I still need to either refactor my old scalar functions I previously used or migrate this into the procedure with the code from the functions.

CREATE VIEW [Chronological].[vwCalendarValues] AS

WITH [cteDigits] AS (
	SELECT n FROM (
		VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)
	) v(n)
), [cteNumberSequence] AS (
	SELECT TOP (DATEDIFF(DAY, '18991231', GETDATE()))
	[RowNumber] = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
	
	FROM 
	[cteDigits] [units]
	
	CROSS JOIN [cteDigits] [tens]
	CROSS JOIN [cteDigits] [hundreds]
	CROSS JOIN [cteDigits] [thousands]
	CROSS JOIN [cteDigits] [hundredthousands]
	
	ORDER BY 
	[RowNumber]
), [cteDates] AS (
	SELECT
	[DateValue] = CAST(DATEADD(DAY, [RowNumber], '18991231') AS DATE)
	
	FROM 
	[cteNumberSequence]
), [cteFiscalDates] AS (
	SELECT
	[DateValue]

	,[FiscalYear] = CAST(CASE
		WHEN MONTH([DateValue]) < 4 THEN YEAR([DateValue]) - 1
		ELSE YEAR([DateValue])
	END AS VARCHAR(4)) + '-' + RIGHT(CAST(1 + CASE
		WHEN MONTH([DateValue]) < 4 THEN YEAR([DateValue]) - 1
		ELSE YEAR([DateValue])
	END AS VARCHAR(4)), 2)
	
	,[FiscalQuarter] = CASE DATEPART(QUARTER, [DateValue])
		WHEN 1 THEN 4
		WHEN 2 THEN	1
		WHEN 3 THEN	2
		ELSE 3
	END

	,[FiscalMonth] = (DATEPART(MONTH, [DateValue]) + 8) % 12 + 1

	,[FiscalDay] = 1 + DATEDIFF(DAY, CASE
		WHEN MONTH([DateValue]) < 4 THEN CAST(YEAR([DateValue]) - 1 AS VARCHAR(4)) + '0401' 
		ELSE CAST(YEAR([DateValue]) AS VARCHAR(4)) + '0401'
	END, [DateValue])

	FROM
    [cteDates]
)

SELECT
[ChronologicalCalendarKey] = CAST(FORMAT([d].[DateValue], 'yyyyMMdd') AS INT)
    
,[CalendarDate] = [d].[DateValue]
,[CalendarDateTime] = CAST([d].[DateValue] AS DATETIME2(0))
	
,[CalendarYear] = YEAR([d].[DateValue])
,[CalendarQuarter] = DATEPART(QUARTER, [d].[DateValue])
,[CalendarMonth] = MONTH([d].[DateValue])

,[CalendarQuarterNameAndYearSortOrder] = CAST(FORMAT([d].[DateValue], 'yyyy') + CAST(DATEPART(QUARTER, [d].[DateValue]) AS VARCHAR(1)) AS INT)
,[CalendarQuarterNameAndYear] = 'Q' + CAST(DATEPART(QUARTER, [d].[DateValue]) AS VARCHAR(1)) + ' ' + FORMAT([d].[DateValue], 'yyyy')
,[CalendarMonthNameAndYearSortOrder] = CAST(FORMAT([d].[DateValue], 'yyyyMM') AS INT)
,[CalendarMonthNameAndYear] = FORMAT([d].[DateValue], 'MMM yyyy')

,[DayOfTheYear] = DATEPART(DAYOFYEAR, [d].[DateValue])
,[DayOfTheQuarter] = 0
,[DayOfTheMonth] = DAY([d].[DateValue])
,[DayOfTheWeek] = 0
    
,[FiscalYearSortOrder] = CAST(REPLACE([d].[FiscalYear], '-', '') AS INT)
,[FiscalYear] = [d].[FiscalYear]
,[FiscalQuarter] = [d].[FiscalQuarter]
,[FiscalMonth] = [d].[FiscalMonth]
,[FiscalDay] = [d].[FiscalDay]

,[FiscalQuarterName] = 'Q' + CAST([d].[FiscalQuarter] AS VARCHAR(1))
,[FiscalQuarterNameAndYearSortOrder] = CAST(REPLACE([d].[FiscalYear], '-', '') + CAST([d].[FiscalQuarter] AS VARCHAR(1)) AS INT)
,[FiscalQuarterNameAndYear] = 'Q' + CAST([d].[FiscalQuarter] AS VARCHAR(1)) + ' ' + [d].[FiscalYear]
,[FiscalMonthNameAndYearSortOrder] = CAST(REPLACE([d].[FiscalYear], '-', '') + FORMAT([d].[FiscalMonth], '00') AS INT)
,[FiscalMonthNameAndYear] = FORMAT([d].[DateValue], 'MMM') + ' ' + [d].[FiscalYear]

,[FiscalYearBeginningDate] = '99991231'
,[FiscalYearEndingDate] = '99991231'

,[YearBeginningDate] = '99991231'
,[QuarterBeginningDate] = '99991231'
,[MonthBeginningDate] = '99991231'
,[WeekBeginningDate] = '99991231'

,[YearEndingDate] = '99991231'
,[QuarterEndingDate] = '99991231'
,[MonthEndingDate] = '99991231'
,[WeekEndingDate] = '99991231'

,[MonthNameSortOrder] = MONTH([d].[DateValue])
,[MonthNameShort] = LEFT(DATENAME(MONTH, [d].[DateValue]), 3)
,[MonthNameLong] = DATENAME(MONTH, [d].[DateValue])

,[DayNameSortOrder] = DATEPART(WEEKDAY, [d].[DateValue]) -- todo
,[DayNameShort] = LEFT(DATENAME(WEEKDAY, [d].[DateValue]), 3)
,[DayNameLong] = DATENAME(WEEKDAY, [d].[DateValue])

,[NumberOfDaysInMonth] = 0

,[IsWeekend] = 0
,[IsWorkingDay] = 0
    	
,[IsUKBankHoliday] = 0
,[IsNewYearsDay] = 0
,[IsGoodFriday] = 0
,[IsEasterSunday] = 0
,[IsEasterMonday] = 0
,[IsEarlyMayBankHoliday] = 0
,[IsSpringBankHoliday] = 0
,[IsSummerBankHoliday] = 0
,[IsChristmasDay] = 0
,[IsBoxingDay] = 0
	
,[IsLeapYear] = 0

,[IsPastDate] = 0
,[IsYesterday] = 0
,[IsToday] = 0
,[IsTomorrow] = 0
,[IsFutureDate] = 0
    
,[IsCurrentCalendarYear] = 0
,[IsCurrentCalendarQuarter] = 0
,[IsCurrentMonth] = 0
,[IsCurrentWeek] = 0
    
,[IsCurrentFiscalYear] = 0
,[IsCurrentFiscalQuarter] = 0
    
,[IsLastCalendarYear] = 0
,[IsLastCalendarQuarter] = 0
,[IsLastMonth] = 0
,[IsLastWeek] = 0
    
,[IsLastFiscalYear] = 0
,[IsLastFiscalQuarter] = 0
    	
,[IsFirstDayOfTheWeek] = 0
,[IsFirstDayOfTheMonth] = 0
,[IsFirstDayOfTheQuarter] = 0
,[IsFirstDayOfTheCalendarYear] = 0
,[IsFirstDayOfTheFiscalYear] = 0
,[IsFirstDayOfTheDecade] = 0
,[IsFirstDayOfTheCentury] = 0
    
,[IsLastDayOfTheWeek] = 0
,[IsLastDayOfTheMonth] = 0
,[IsLastDayOfTheQuarter] = 0
,[IsLastDayOfTheCalendarYear] = 0
,[IsLastDayOfTheFiscalYear] = 0
,[IsLastDayOfTheDecade] = 0
,[IsLastDayOfTheCentury] = 0

FROM 
[cteFiscalDates] [d]

Next the week banding dimension, again I’ve used a view to generate the source data for this, then just created the procedure to do the ‘upsert’ like with the calendar.

CREATE VIEW [Chronological].[vwWeekTimeBandingValues] AS

WITH [cteDigits] AS (
	SELECT 
	[Digit] 
	
	FROM (
		VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)
	) [v](
		[Digit]
	)
), [cteNumberSequence] AS (
	SELECT TOP (200)
	[NumberOfWeeks] = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1
	
	FROM 
	[cteDigits] [units]
	
	CROSS JOIN [cteDigits] [tens]
	CROSS JOIN [cteDigits] [hundreds]

	ORDER BY 
	[NumberOfWeeks]
)

SELECT
[ChronologicalWeekTimeBandingKey] = [NumberOfWeeks] + 1
,[NumberOfWeeks]

-- Greater than bands. Logic is = greater than X weeks, up to and including X+1 weeks
,[GreaterThanOneWeekBandShortDescription] = FORMAT(1 * ([dv].[NumberOfWeeks] / 1), '00') 
	+ ' to ' 
	+ FORMAT(1 * ([dv].[NumberOfWeeks] / 1) + 1, '00')
	
,[GreaterThanOneWeekBandLongDescription] = 'Greater than '
	+ FORMAT(1 * ([dv].[NumberOfWeeks] / 1), '00') 
	+ ' to ' 
	+ FORMAT(1 * ([dv].[NumberOfWeeks] / 1) + 1, '00')
	+ ' week(s)'

,[GreaterThanTwoWeeksBandShortDescription] = FORMAT(2 * ([dv].[NumberOfWeeks] / 2), '00') 
	+ ' to ' 
	+ FORMAT(2 * ([dv].[NumberOfWeeks] / 2) + 2, '00')
	
,[GreaterThanTwoWeeksBandLongDescription] = 'Greater than '
	+ FORMAT(2 * ([dv].[NumberOfWeeks] / 2), '00') 
	+ ' to ' 
	+ FORMAT(2 * ([dv].[NumberOfWeeks] / 2) + 2, '00')
	+ ' week(s)'

,[GreaterThanThreeWeeksBandShortDescription] = FORMAT(3 * ([dv].[NumberOfWeeks] / 3), '00') 
	+ ' to ' 
	+ FORMAT(3 * ([dv].[NumberOfWeeks] / 3) + 3, '00')

,[GreaterThanThreeWeeksBandLongDescription] = 'Greater than '
	+ FORMAT(3 * ([dv].[NumberOfWeeks] / 3), '00') 
	+ ' to ' 
	+ FORMAT(3 * ([dv].[NumberOfWeeks] / 3) + 3, '00')
	+ ' week(s)'

-- 104+ weeks (useful for some NHS RTT reporting)
,[GreaterThanOneWeekBandShortDescriptionLimitedTo104Weeks] = CASE
	WHEN [dv].[NumberOfWeeks] < 104 THEN FORMAT([dv].[NumberOfWeeks], '00') + ' to ' + FORMAT([dv].[NumberOfWeeks] + 1, '00')
	WHEN [dv].[NumberOfWeeks] = 104 THEN '104+'
	ELSE 'Not recorded'
END

-- Over bands. Logic is just whether its X weeks or over
,[OneWeekOrOverDescription] = FORMAT(1 * ([dv].[NumberOfWeeks] / 1), '0') + ' week(s) or over'
,[TwoWeeksOrOverDescription] = FORMAT(2 * ([dv].[NumberOfWeeks] / 2), '0') + ' week(s) or over'
,[ThreeWeeksOrOverDescription] = FORMAT(3 * ([dv].[NumberOfWeeks] / 3), '0') + ' week(s) or over'

-- Under bands. Logic is just whether its X weeks or under
,[OneWeekOrUnderDescription] = FORMAT(1 * ((1 + [dv].[NumberOfWeeks]) / 1), '0') + ' week(s) or under'
,[TwoWeeksOrUnderDescription] = FORMAT(2 * ((2 + [dv].[NumberOfWeeks]) / 2), '0') + ' week(s) or under'
,[ThreeWeeksOrUnderDescription] = FORMAT(3 * ((3 + [dv].[NumberOfWeeks]) / 3), '0') + ' week(s) or under'

-- 18 weeks
,[IsUnder18Weeks] = CASE
	WHEN [dv].[NumberOfWeeks] < 18 THEN 1
	ELSE 0
END

,[IsUnderOrEqualTo18Weeks] = CASE
	WHEN [dv].[NumberOfWeeks] <= 18 THEN 1
	ELSE 0
END

,[OverOrUnder18WeeksDescription] = CASE
	WHEN [dv].[NumberOfWeeks] < 18 THEN 'Under 18 weeks'
	ELSE 'Over 18 weeks'
END

,[OverOrUnderOrEqualTo18WeeksDescription] = CASE
	WHEN [dv].[NumberOfWeeks] <= 18 THEN 'Under or equal to 18 weeks'
	ELSE 'Over 18 weeks'
END

FROM
[cteNumberSequence] [dv]

UNION ALL 

-- 'Unknown'
SELECT
[ChronologicalWeekTimeBandingKey] = -1
,[NumberOfWeeks] = -1

,[GreaterThanOneWeekBandShortDescription] = 'Unknown'
,[GreaterThanOneWeekBandLongDescription] = 'Unknown number of weeks'
,[GreaterThanTwoWeeksBandShortDescription] = 'Unknown'
,[GreaterThanTwoWeeksBandLongDescription] = 'Unknown number of weeks'
,[GreaterThanThreeWeeksBandShortDescription] = 'Unknown'
,[GreaterThanThreeWeeksBandLongDescription] = 'Unknown number of weeks'

,[GreaterThanOneWeekBandShortDescriptionLimitedTo104Weeks] = 'Unknown number of weeks'

,[OneWeekOrOverDescription] = 'Unknown number of weeks'
,[TwoWeeksOrOverDescription] = 'Unknown number of weeks'
,[ThreeWeeksOrOverDescription] = 'Unknown number of weeks'
,[OneWeekOrUnderDescription] = 'Unknown number of weeks'
,[TwoWeeksOrUnderDescription] = 'Unknown number of weeks'
,[ThreeWeeksOrUnderDescription] = 'Unknown number of weeks'

,[IsUnder18Weeks] = 0
,[IsUnderOrEqualTo18Weeks] = 0
,[OverOrUnder18WeeksDescription] = 'Unknown number of weeks'
,[OverOrUnderOrEqualTo18WeeksDescription] = 'Unknown number of weeks'

UNION ALL

-- 'Not specified'
SELECT
[ChronologicalWeekTimeBandingKey] = -2
,[NumberOfWeeks] = -2

,[GreaterThanOneWeekBandCode] = 'Unspecified'
,[GreaterThanOneWeekBandLongDescription] = 'Unspecified number of weeks'
,[GreaterThanTwoWeeksBandShortDescription] = 'Unspecified'
,[GreaterThanTwoWeeksBandLongDescription] = 'Unspecified number of weeks'
,[GreaterThanThreeWeeksBandShortDescription] = 'Unspecified'
,[GreaterThanThreeWeeksBandLongDescription] = 'Unspecified number of weeks'

,[GreaterThanOneWeekBandShortDescriptionLimitedTo104Weeks] = 'Unspecified number of weeks'

,[OneWeekOrOverDescription] = 'Unspecified number of weeks'
,[TwoWeeksOrOverDescription] = 'Unspecified number of weeks'
,[ThreeWeeksOrOverDescription] = 'Unspecified number of weeks'
,[OneWeekOrUnderDescription] = 'Unspecified number of weeks'
,[TwoWeeksOrUnderDescription] = 'Unspecified number of weeks'
,[ThreeWeeksOrUnderDescription] = 'Unspecified number of weeks'

,[IsUnder18Weeks] = 0
,[IsUnderOrEqualTo18Weeks] = 0
,[OverOrUnder18WeeksDescription] = 'Unspecified number of weeks'
,[OverOrUnderOrEqualTo18WeeksDescription] = 'Unspecified number of weeks'

UNION ALL

-- NHS specific - New RTT Periodd
SELECT
[ChronologicalWeekTimeBandingKey] = -99
,[NumberOfWeeks] = -99

,[GreaterThanOneWeekBandCode] = 'New RTT Period'
,[GreaterThanOneWeekBandLongDescription] = 'New RTT Period'
,[GreaterThanTwoWeeksBandShortDescription] = 'New RTT Period'
,[GreaterThanTwoWeeksBandLongDescription] = 'New RTT Period'
,[GreaterThanThreeWeeksBandShortDescription] = 'New RTT Period'
,[GreaterThanThreeWeeksBandLongDescription] = 'New RTT Period'

,[GreaterThanOneWeekBandShortDescriptionLimitedTo104Weeks] = 'New RTT Period'

,[OneWeekOrOverDescription] = 'New RTT Period'
,[TwoWeeksOrOverDescription] = 'New RTT Period'
,[ThreeWeeksOrOverDescription] = 'New RTT Period'
,[OneWeekOrUnderDescription] = 'New RTT Period'
,[TwoWeeksOrUnderDescription] = 'New RTT Period'
,[ThreeWeeksOrUnderDescription] = 'New RTT Period'

,[IsUnder18Weeks] = 1
,[IsUnderOrEqualTo18Weeks] = 1
,[OverOrUnder18WeeksDescription] = 'New RTT Period'
,[OverOrUnderOrEqualTo18WeeksDescription] = 'New RTT Period'

So in terms of views we’ve got just these two:-

For the referral to treatment dimensions we’ve got the following code for example. Note that we need to manually calculate an ‘identity’ column value. Secondly, note the ‘high watermark’ field. I’m using this to determine which rows to pull from the history tables, but also I’m keeping it in the dimension table itself to then be able to check each rows watermark later and figure out the latest data we’ve got. To be honest, its much more common to store the watermark in a ‘control table’ as a single row, and I probably should do this, however for this example I was investigating the possibility of keeping it ‘local’ and thus not requiring another external ‘control table’. This particular way is probably less performant that using a separate table, so keep that in mind, maybe if you take this further try using a separate table and see what you think (anyway, that’s my excuse for experimenting with something outside the norm 😮):-

Oh, one more thing, if you’re coming from an SQL background, you’ll soon realise that there are no ‘temp’ tables in Fabric warehouses. To combat this, you just have to use normal tables, just remember to drop them after use to keep things clean…

CREATE PROCEDURE [ReferralToTreatmentDimension].[uspProcessCommissionerOrganisation] AS
BEGIN
	-- Figure out the current max ID
	DECLARE @maxID INT = (
		SELECT 
		MAX([CommissionerOrganisationKey]) 
	
		FROM 
		[ReferralToTreatmentDimension].[CommissionerOrganisation]
	)

	IF @maxID IS NULL SET @maxID = 0

	-- Figure out current high watermark
	DECLARE @currentHighWatermark DATETIME2(0) = (
		SELECT
        MAX([HighWatermark])

		FROM
        [ReferralToTreatmentDimension].[CommissionerOrganisation]
	)
	
	IF @currentHighWatermark IS NULL SET @currentHighWatermark = '19700101'
	
	-- Create ICB geography staging table
	DROP TABLE IF EXISTS [ReferralToTreatmentStaging].[ICBGeography];
	CREATE TABLE [ReferralToTreatmentStaging].[ICBGeography] 
	AS
	SELECT DISTINCT
	[ICBOrganisationCode] = [icb].[ICB23CDH]
	,[ICBCounty] = CAST([cn].[_c1] AS VARCHAR(100))

	FROM 
	[LandingLakehouse].[dbo].[icb_names_and_codes] [icb]

	INNER JOIN
	[LandingLakehouse].[dbo].[gridall] [ga]
	ON
	[icb].[ICB23CDH] = [ga].[_c42]

	INNER JOIN
	[LandingLakehouse].[dbo].[county_names_and_codes] [cn]
	ON
	[ga].[_c6] = [cn].[_c0]

	-- Create staging table with transformed data
	DROP TABLE IF EXISTS [ReferralToTreatmentStaging].[CommissionerOrganisation];
	CREATE TABLE [ReferralToTreatmentStaging].[CommissionerOrganisation] 
	AS
	SELECT
	[HighWatermark] = MAX(ISNULL([staging].[DateTimeUpdated], [staging].[DateTimeLoaded]))
	,[OrganisationCode] = [staging].[CommissionerOrgCode]
	,[OrganisationName] = ISNULL([staging].[CommissionerOrgName], 'Name not specified')
	,[ICBCode] = ISNULL([staging].[CommissionerParentOrgCode], 'No ICB code')
	,[ICBName] = ISNULL([staging].[CommissionerParentName], 'No ICB name')
	,[icb].[ICBCounty]
	
	FROM
	[Staginglakehouse].[dbo].[referral_to_treatment] [staging]

	LEFT JOIN
	[ReferralToTreatmentStaging].[ICBGeography] [icb]
	ON
	[staging].[CommissionerParentOrgCode] = [icb].[ICBOrganisationCode]

	WHERE
	CAST(ISNULL([staging].[DateTimeUpdated], [staging].[DateTimeLoaded]) AS DATETIME2(0)) > @currentHighWatermark

	GROUP BY
	[staging].[CommissionerOrgCode]
	,[staging].[CommissionerOrgName]
	,[staging].[CommissionerParentOrgCode]
	,[staging].[CommissionerParentName]
	,[icb].[ICBCounty]
		
	-- Insert new records
	INSERT INTO 
	[ReferralToTreatmentDimension].[CommissionerOrganisation] (
		[CommissionerOrganisationKey]
		,[DateTimeLoaded]
		,[DateTimeUpdated]
		,[DateTimeDeleted]
		,[IsCurrent]
		,[ValidFromDate]
		,[ValidToDate]
		,[HighWatermark]
		,[OrganisationCode]
		,[OrganisationName]
		,[ICBCode]
		,[ICBName]
		,[ICBCounty]
	) 

	SELECT
	[CommissionerOrganisationKey] = @maxID + ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
	,[DateTimeLoaded] = GETDATE()
	,[DateTimeUpdated] = NULL
	,[DateTimeDeleted] = NULL
	,[IsCurrent] = 1
	,[ValidFromDate] = CAST('19000101' AS DATE)
	,[ValidToDate] = CAST('99991231' AS DATE)
	,[HighWatermark] = [src].[HighWatermark]
	,[OrganisationCode] = [src].[OrganisationCode]
	,[OrganisationName] = [src].[OrganisationName]	
	,[ICBCode] = [src].[ICBCode]
	,[ICBName] = [src].[ICBName]
	,[ICBCounty] = [src].[ICBCounty]

	FROM
	[ReferralToTreatmentStaging].[CommissionerOrganisation] [src]

	WHERE 
	NOT EXISTS(
		SELECT
		1

		FROM
		[ReferralToTreatmentDimension].[CommissionerOrganisation] [trg]

		WHERE
		[src].[OrganisationCode] = [trg].[OrganisationCode]
	)

	-- Update existing rows	
	UPDATE
	[trg]

	SET
	[DateTimeUpdated] = GETDATE()
	,[HighWatermark] = [src].[HighWatermark]
	,[OrganisationName] = [src].[OrganisationName]	
	,[ICBCode] = [src].[ICBCode]
	,[ICBName] = [src].[ICBName]
	,[ICBCounty] = [src].[ICBCounty]

	FROM
	[ReferralToTreatmentDimension].[CommissionerOrganisation] [trg]	

	INNER JOIN
	[ReferralToTreatmentStaging].[CommissionerOrganisation] [src]
	ON
	[trg].[OrganisationCode] = [src].[OrganisationCode]
	AND (
		[trg].[OrganisationName] <> [src].[OrganisationName]		
		OR [trg].[ICBCode] <> [src].[ICBCode]
		OR [trg].[ICBName] <> [src].[ICBName]
		OR [trg].[ICBCounty] <> [src].[ICBCounty]
	)

	-- Bye
	DROP TABLE IF EXISTS [ReferralToTreatmentStaging].[ICBGeography];
	DROP TABLE IF EXISTS [ReferralToTreatmentStaging].[CommissionerOrganisation];
END

Finally, we’ve got the fact table. Note that here we’re doing some pivoting and other bits of transforms to get the data in the structure we want:-

CREATE PROCEDURE [ReferralToTreatmentFact].[uspProcessPatientsWaitingMonthly] AS
BEGIN
	-- Figure out the current max ID
	DECLARE @maxID INT = (
		SELECT 
		MAX([PatientsWaitingMonthlyKey]) 
	
		FROM 
		[ReferralToTreatmentFact].[PatientsWaitingMonthly]
	)

	IF @maxID IS NULL SET @maxID = 0

	-- Figure out current high watermark
	DECLARE @currentHighWatermark DATETIME2(0) = (
		SELECT
        MAX([HighWatermark])

		FROM
        [ReferralToTreatmentFact].[PatientsWaitingMonthly]
	)
	
	IF @currentHighWatermark IS NULL SET @currentHighWatermark = '19700101'

	-- Create staging table
	DROP TABLE IF EXISTS [ReferralToTreatmentStaging].[PatientsWaitingMonthlyUnpivoted];
	CREATE TABLE [ReferralToTreatmentStaging].[PatientsWaitingMonthlyUnpivoted] AS
	SELECT
	[HighWatermark] = ISNULL([DateTimeUpdated], [DateTimeLoaded])

	--,[staging].[Period]

	-- Translate period into period ending date for the calendar later	
	,[PeriodEndingDate] = DATEADD(DAY, -1, DATEADD(MONTH, 1, 
		CAST('01-'+ RIGHT([staging].[Period], LEN([staging].[Period]) - 4) AS DATE)
	))
		
	,[staging].[ProviderParentOrgCode]
	--,[staging].[ProviderParentName]
	,[staging].[ProviderOrgCode]
	--,[staging].[ProviderOrgName]
	,[staging].[CommissionerParentOrgCode]
	--,[staging].[CommissionerParentName]
	,[staging].[CommissionerOrgCode]
	--,[staging].[CommissionerOrgName]
	,[staging].[RTTPartType]
	--,[staging].[RTTPartDescription]
	,[staging].[TreatmentFunctionCode]
	--,[staging].[TreatmentFunctionName]

	-- Unpivot the week banding into rows
	,[unpvt].[WeekBandingCode]

	-- Main measure
	,[NumberOfPatientsWaiting] = CAST([unpvt].[NumberOfPatientsWaiting] AS INT)	
	
	FROM
	[StagingLakehouse].[dbo].[referral_to_treatment] [staging]

	-- We take the week banding columns and unpivot them into
	-- rows so we can join the 'code' column onto a dimension
	CROSS APPLY (
		VALUES
		 ('00 To 01'				,[staging].[Gt00To01WeeksSUM1])
		,('01 To 02'				,[staging].[Gt01To02WeeksSUM1])
		,('02 To 03'				,[staging].[Gt02To03WeeksSUM1])
		,('03 To 04'				,[staging].[Gt03To04WeeksSUM1])
		,('04 To 05'				,[staging].[Gt04To05WeeksSUM1])
		,('05 To 06'				,[staging].[Gt05To06WeeksSUM1])
		,('06 To 07'				,[staging].[Gt06To07WeeksSUM1])
		,('07 To 08'				,[staging].[Gt07To08WeeksSUM1])
		,('08 To 09'				,[staging].[Gt08To09WeeksSUM1])
		,('09 To 10'				,[staging].[Gt09To10WeeksSUM1])
		,('10 To 11'				,[staging].[Gt10To11WeeksSUM1])
		,('11 To 12'				,[staging].[Gt11To12WeeksSUM1])
		,('12 To 13'				,[staging].[Gt12To13WeeksSUM1])
		,('13 To 14'				,[staging].[Gt13To14WeeksSUM1])
		,('14 To 15'				,[staging].[Gt14To15WeeksSUM1])
		,('15 To 16'				,[staging].[Gt15To16WeeksSUM1])
		,('16 To 17'				,[staging].[Gt16To17WeeksSUM1])
		,('17 To 18'				,[staging].[Gt17To18WeeksSUM1])
		,('18 To 19'				,[staging].[Gt18To19WeeksSUM1])
		,('19 To 20'				,[staging].[Gt19To20WeeksSUM1])
		,('20 To 21'				,[staging].[Gt20To21WeeksSUM1])
		,('21 To 22'				,[staging].[Gt21To22WeeksSUM1])
		,('22 To 23'				,[staging].[Gt22To23WeeksSUM1])
		,('23 To 24'				,[staging].[Gt23To24WeeksSUM1])
		,('24 To 25'				,[staging].[Gt24To25WeeksSUM1])
		,('25 To 26'				,[staging].[Gt25To26WeeksSUM1])
		,('26 To 27'				,[staging].[Gt26To27WeeksSUM1])
		,('27 To 28'				,[staging].[Gt27To28WeeksSUM1])
		,('28 To 29'				,[staging].[Gt28To29WeeksSUM1])
		,('29 To 30'				,[staging].[Gt29To30WeeksSUM1])
		,('30 To 31'				,[staging].[Gt30To31WeeksSUM1])
		,('31 To 32'				,[staging].[Gt31To32WeeksSUM1])
		,('32 To 33'				,[staging].[Gt32To33WeeksSUM1])
		,('33 To 34'				,[staging].[Gt33To34WeeksSUM1])
		,('34 To 35'				,[staging].[Gt34To35WeeksSUM1])
		,('35 To 36'				,[staging].[Gt35To36WeeksSUM1])
		,('36 To 37'				,[staging].[Gt36To37WeeksSUM1])
		,('37 To 38'				,[staging].[Gt37To38WeeksSUM1])
		,('38 To 39'				,[staging].[Gt38To39WeeksSUM1])
		,('39 To 40'				,[staging].[Gt39To40WeeksSUM1])
		,('40 To 41'				,[staging].[Gt40To41WeeksSUM1])
		,('41 To 42'				,[staging].[Gt41To42WeeksSUM1])
		,('42 To 43'				,[staging].[Gt42To43WeeksSUM1])
		,('43 To 44'				,[staging].[Gt43To44WeeksSUM1])
		,('44 To 45'				,[staging].[Gt44To45WeeksSUM1])
		,('45 To 46'				,[staging].[Gt45To46WeeksSUM1])
		,('46 To 47'				,[staging].[Gt46To47WeeksSUM1])
		,('47 To 48'				,[staging].[Gt47To48WeeksSUM1])
		,('48 To 49'				,[staging].[Gt48To49WeeksSUM1])
		,('49 To 50'				,[staging].[Gt49To50WeeksSUM1])
		,('50 To 51'				,[staging].[Gt50To51WeeksSUM1])
		,('51 To 52'				,[staging].[Gt51To52WeeksSUM1])
		,('52 To 53'				,[staging].[Gt52To53WeeksSUM1])
		,('53 To 54'				,[staging].[Gt53To54WeeksSUM1])
		,('54 To 55'				,[staging].[Gt54To55WeeksSUM1])
		,('55 To 56'				,[staging].[Gt55To56WeeksSUM1])
		,('56 To 57'				,[staging].[Gt56To57WeeksSUM1])
		,('57 To 58'				,[staging].[Gt57To58WeeksSUM1])
		,('58 To 59'				,[staging].[Gt58To59WeeksSUM1])
		,('59 To 60'				,[staging].[Gt59To60WeeksSUM1])
		,('60 To 61'				,[staging].[Gt60To61WeeksSUM1])
		,('61 To 62'				,[staging].[Gt61To62WeeksSUM1])
		,('62 To 63'				,[staging].[Gt62To63WeeksSUM1])
		,('63 To 64'				,[staging].[Gt63To64WeeksSUM1])
		,('64 To 65'				,[staging].[Gt64To65WeeksSUM1])
		,('65 To 66'				,[staging].[Gt65To66WeeksSUM1])
		,('66 To 67'				,[staging].[Gt66To67WeeksSUM1])
		,('67 To 68'				,[staging].[Gt67To68WeeksSUM1])
		,('68 To 69'				,[staging].[Gt68To69WeeksSUM1])
		,('69 To 70'				,[staging].[Gt69To70WeeksSUM1])
		,('70 To 71'				,[staging].[Gt70To71WeeksSUM1])
		,('71 To 72'				,[staging].[Gt71To72WeeksSUM1])
		,('72 To 73'				,[staging].[Gt72To73WeeksSUM1])
		,('73 To 74'				,[staging].[Gt73To74WeeksSUM1])
		,('74 To 75'				,[staging].[Gt74To75WeeksSUM1])
		,('75 To 76'				,[staging].[Gt75To76WeeksSUM1])
		,('76 To 77'				,[staging].[Gt76To77WeeksSUM1])
		,('77 To 78'				,[staging].[Gt77To78WeeksSUM1])
		,('78 To 79'				,[staging].[Gt78To79WeeksSUM1])
		,('79 To 80'				,[staging].[Gt79To80WeeksSUM1])
		,('80 To 81'				,[staging].[Gt80To81WeeksSUM1])
		,('81 To 82'				,[staging].[Gt81To82WeeksSUM1])
		,('82 To 83'				,[staging].[Gt82To83WeeksSUM1])
		,('83 To 84'				,[staging].[Gt83To84WeeksSUM1])
		,('84 To 85'				,[staging].[Gt84To85WeeksSUM1])
		,('85 To 86'				,[staging].[Gt85To86WeeksSUM1])
		,('86 To 87'				,[staging].[Gt86To87WeeksSUM1])
		,('87 To 88'				,[staging].[Gt87To88WeeksSUM1])
		,('88 To 89'				,[staging].[Gt88To89WeeksSUM1])
		,('89 To 90'				,[staging].[Gt89To90WeeksSUM1])
		,('90 To 91'				,[staging].[Gt90To91WeeksSUM1])
		,('91 To 92'				,[staging].[Gt91To92WeeksSUM1])
		,('92 To 93'				,[staging].[Gt92To93WeeksSUM1])
		,('93 To 94'				,[staging].[Gt93To94WeeksSUM1])
		,('94 To 95'				,[staging].[Gt94To95WeeksSUM1])
		,('95 To 96'				,[staging].[Gt95To96WeeksSUM1])
		,('96 To 97'				,[staging].[Gt96To97WeeksSUM1])
		,('97 To 98'				,[staging].[Gt97To98WeeksSUM1])
		,('98 To 99'				,[staging].[Gt98To99WeeksSUM1])
		,('99 To 100'				,[staging].[Gt99To100WeeksSUM1])
		,('100 To 101'				,[staging].[Gt100To101WeeksSUM1])
		,('101 To 102'				,[staging].[Gt101To102WeeksSUM1])
		,('102 To 103'				,[staging].[Gt102To103WeeksSUM1])
		,('103 To 104'				,[staging].[Gt103To104WeeksSUM1])
		,('104+'					,[staging].[Gt104WeeksSUM1])
		,('Unknown number of weeks'	,[staging].[Patientswithunknownclockstartdate])
		,('New RTT Period'			,IIF([RTTPartType] = 'Part_3', [staging].[TotalAll], 0))
	) [unpvt] (
		[WeekBandingCode]
		,[NumberOfPatientsWaiting]
	)
	
	WHERE
	CAST(ISNULL([staging].[DateTimeUpdated], [staging].[DateTimeLoaded]) AS DATETIME2(0)) > @currentHighWatermark
	AND [TreatmentFunctionCode] <> 'C_999' -- Ignore total rows!

	-- Need to add unknown member rows!
	DROP TABLE IF EXISTS [ReferralToTreatmentStaging].[PatientsWaitingMonthly];
	CREATE TABLE [ReferralToTreatmentStaging].[PatientsWaitingMonthly] AS
	SELECT
	[unpvt].[HighWatermark]
	
	--,[unpvt].[PeriodEndingDate]
	,[ChronologicalCalendarKey] = [dim_calendar].[ChronologicalCalendarKey]
	
	--,[unpvt].[WeekBandingCode]
	,[ChronologicalWeekTimeBandingKey] = [dim_week_banding].[ChronologicalWeekTimeBandingKey]

	--,[unpvt].[ProviderOrgCode]
	,[ProviderOrganisationKey] = [dim_provider].[ProviderOrganisationKey]

	--,[unpvt].[CommissionerOrgCode]
	,[CommissionerOrganisationKey] = [dim_commissioner].[CommissionerOrganisationKey]

	--,[unpvt].[RTTPartType]
	,[WaitingStatusKey] = [dim_status].[WaitingStatusKey]

	--,[unpvt].[TreatmentFunctionCode]
	,[TreatmentFunctionKey] = [dim_treatment_function].[TreatmentFunctionKey]
		
	,[unpvt].[NumberOfPatientsWaiting]

	FROM
	[ReferralToTreatmentStaging].[PatientsWaitingMonthlyUnpivoted] [unpvt]

	INNER JOIN
	[Chronological].[Calendar] [dim_calendar]
	ON
	[unpvt].[PeriodEndingDate] = [dim_calendar].[CalendarDate]

	INNER JOIN
	[Chronological].[WeekTimeBanding] [dim_week_banding]
	ON
	LOWER([unpvt].[WeekBandingCode]) = LOWER([dim_week_banding].[GreaterThanOneWeekBandShortDescriptionLimitedTo104Weeks])

	INNER JOIN
	[ReferralToTreatmentDimension].[ProviderOrganisation] [dim_provider]
	ON
	[unpvt].[ProviderOrgCode] = [dim_provider].[OrganisationCode]

	INNER JOIN
	[ReferralToTreatmentDimension].[CommissionerOrganisation] [dim_commissioner]
	ON
	[unpvt].[CommissionerOrgCode] = [dim_commissioner].[OrganisationCode]

	INNER JOIN
	[ReferralToTreatmentDimension].[WaitingStatus] [dim_status]		
	ON
	[unpvt].[RTTPartType] = [dim_status].[WaitingStatusType]

	INNER JOIN
	[ReferralToTreatmentDimension].[TreatmentFunction] [dim_treatment_function]
	ON
	[unpvt].[TreatmentFunctionCode] = [dim_treatment_function].[TreatmentFunctionCode]

	-- Load new rows
	INSERT INTO
    [ReferralToTreatmentFact].[PatientsWaitingMonthly] (
		[PatientsWaitingMonthlyKey]
		,[DateTimeLoaded]
		,[DateTimeUpdated]
		,[DateTimeDeleted]
		,[HighWatermark]
		,[ChronologicalCalendarKey]
		,[ProviderOrganisationKey]
		,[CommissionerOrganisationKey]
		,[WaitingStatusKey]
		,[TreatmentFunctionKey]
		,[ChronologicalWeekTimeBandingKey]
		,[NumberOfPatientsWaiting]
	)

	SELECT
    [PatientsWaitingMonthlyKey] = @maxID + ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
	,[DateTimeLoaded] = GETDATE()
	,[DateTimeUpdated] = NULL
	,[DateTimeDeleted] = NULL
	,[HighWatermark]
	,[ChronologicalCalendarKey]
	,[ProviderOrganisationKey]
	,[CommissionerOrganisationKey]
	,[WaitingStatusKey]
	,[TreatmentFunctionKey]
	,[ChronologicalWeekTimeBandingKey]
	,[NumberOfPatientsWaiting]

	FROM
    [ReferralToTreatmentStaging].[PatientsWaitingMonthly] [src]

	WHERE
    NOT EXISTS(
		SELECT
        1

		FROM
        [ReferralToTreatmentFact].[PatientsWaitingMonthly] [trg]

		WHERE
        [trg].[ChronologicalCalendarKey]				= [src].[ChronologicalCalendarKey]
		AND [trg].[ProviderOrganisationKey]				= [src].[ProviderOrganisationKey]
		AND [trg].[CommissionerOrganisationKey]			= [src].[CommissionerOrganisationKey]
		AND [trg].[WaitingStatusKey]					= [src].[WaitingStatusKey]
		AND [trg].[TreatmentFunctionKey]				= [src].[TreatmentFunctionKey]
		AND [trg].[ChronologicalWeekTimeBandingKey]		= [src].[ChronologicalWeekTimeBandingKey]
	)

	-- Update existing rows	where there are differences
	UPDATE
	[trg]

	SET
	[DateTimeUpdated]					= GETDATE()
	,[HighWatermark]					= [src].[HighWatermark]
	,[ChronologicalCalendarKey]			= [src].[ChronologicalCalendarKey]
	,[ProviderOrganisationKey]			= [src].[ProviderOrganisationKey]
	,[CommissionerOrganisationKey]		= [src].[CommissionerOrganisationKey]
	,[WaitingStatusKey]					= [src].[WaitingStatusKey]
	,[TreatmentFunctionKey]				= [src].[TreatmentFunctionKey]
	,[ChronologicalWeekTimeBandingKey]	= [src].[ChronologicalWeekTimeBandingKey]
	,[NumberOfPatientsWaiting]			= [src].[NumberOfPatientsWaiting]

	FROM
	[ReferralToTreatmentFact].[PatientsWaitingMonthly] [trg]	
	
	INNER JOIN
	[ReferralToTreatmentStaging].[PatientsWaitingMonthly] [src]
	ON
	[trg].[ChronologicalCalendarKey]				= [src].[ChronologicalCalendarKey]
	AND [trg].[ProviderOrganisationKey]				= [src].[ProviderOrganisationKey]
	AND [trg].[CommissionerOrganisationKey]			= [src].[CommissionerOrganisationKey]
	AND [trg].[WaitingStatusKey]					= [src].[WaitingStatusKey]
	AND [trg].[TreatmentFunctionKey]				= [src].[TreatmentFunctionKey]
	AND [trg].[ChronologicalWeekTimeBandingKey]		= [src].[ChronologicalWeekTimeBandingKey]
	AND (
		NOT EXISTS(SELECT [trg].[NumberOfPatientsWaiting] INTERSECT SELECT [src].[NumberOfPatientsWaiting])
	)
	
	-- Delete rows from the period being loaded/updated if no longer there in source!
	DROP TABLE IF EXISTS [ReferralToTreatmentStaging].[PatientsWaitingMonthlyUnpivoted];
	DROP TABLE IF EXISTS [ReferralToTreatmentStaging].[PatientsWaitingMonthly];
END

Are we done now finally? Sort of… although, something sneaky is hiding in the data that could scupper the whole thing 😮

photo of a boy near leaves
Photo by Min An on Pexels.com

Remember earlier in the series I mentioned that NHS data can be quite challenging? Well, no matter the source of the data you’re dealing with, you should always take a good look at it and have a play around with it because you can often discover things that are not clear from just looking at the names of the fields (even if you understand the definitions). This could range from simple queries and manually looking at data or even profiling data. In our case taking a closer look at the raw data and doing some quick calculations produces figures that don’t match the figures given in some of the PDF ‘overview’ reports that you can also download from the same page as the raw data. So what gives?

Well, this is a case of a single file that contains several levels of granularity… oh yes, hiding in the data are some of those sneaky ‘total’ rows I mentioned in earlier posts in this series. These rows effectively just total up other rows. So, if you just ‘sum’ up any of the measures without taking this into account you’ll get a much higher number than you would be expecting. So, in the fact processing procedure I’ve made sure to delete this particular ‘total’ row so it doesn’t affect our figures, that’s what’s going on here 🤦‍♂️

With that overview of the transformations, we’re almost done. Note that I’ll put all of this code on our GitHub if you need want to grab any of the other bits of code not shown here. It’ll be the whole Fabric workspace git repo for anyone to clone/fork/copy and fiddle about with in their own Fabric workspace if they want to.

Next time, we’ll look into building the semantic model we’ll use with PowerBI and the new awesome ‘direct lake’ mode to create a simple basic dashboard (no more PowerBI model refreshes required, oh yes).

See you then 🫡