Solution Pattern - SQL Server Family - Creating a Dimension from a PIT
Purpose
Section titled “Purpose”This Solution Pattern shows how to create a (time-variant) dimension from a Point-In-Time (PIT) table with a single SQL statement.
Motivation
Section titled “Motivation”A PIT table pre-computes the combined timeline across the Satellites of a Hub: for every key and every change moment it records which Satellite record was in effect, by its surrogate key, inscription timestamp, and inscription record identifier. With the temporal alignment already resolved, building a dimension becomes straightforward — the Satellites are attached with simple equi-joins, without any range predicates or window logic over the source objects.
Applicability
Section titled “Applicability”This pattern applies when a PIT table exists for the Hub the dimension describes, and the dimension should reflect the full change history (a time-variant, ‘type 2’ style dimension). It is the preferred construction when the combination of Satellites is queried frequently enough to justify maintaining the PIT.
Structure
Section titled “Structure”The statement selects from the PIT and decorates it with the descriptive context:
- Each Satellite is joined on the exact record locator the PIT carries for it: surrogate key, inscription timestamp, and inscription record identifier. These are equi-joins — the PIT already did the temporal work.
- The dimension key is generated at runtime with
ROW_NUMBERover a deterministic ordering. - The time period of each dimension record is the PIT’s state from timestamp, closed by a
LEADto the next state per key (with the9999-12-31high-end default). - The remaining columns map the business key and the descriptive attributes from the joined Satellites.
/******************************************************************************* * Data Engine Thinking ******************************************************************************* * * Purpose: * - SQL example for Creating A Dimension From A PIT. * * Disclaimer: * - See disclaimer.md in the repository root. * ******************************************************************************/
SELECT -- Dimension key ROW_NUMBER() OVER (ORDER BY [sat1].[Inscription_Timestamp], [sat1].[Inscription_Record_Id], [PIT_Sales].[State_From_Timestamp], [PIT_Sales].[Customer_Code] ) AS [Customer_Key],
-- Time periods [PIT_Sales].[State_From_Timestamp] AS [From_Timestamp], LEAD([PIT_Sales].[State_From_Timestamp],1,'9999-12-31') OVER ( PARTITION BY [PIT_Sales].[Customer_Surrogate_Key] ORDER BY [PIT_Sales].[State_From_Timestamp] ASC ) AS [Before_Timestamp],
-- Descriptive columns [PIT_Sales].[Customer_Code] AS [Identifier], [sat1].[Customer_First_Name] AS [First_Name], [sat1].[Customer_Birth_Date] AS [Birth_Date], [sat2].[Customer_Contact_Number] AS [Contact_Number], [sat2].[Customer_Email_Address] AS [Email_Address], [sat2].[Customer_Contact_Preferences] AS [Contact_Preference]FROM PIT_Sales
-- Join Satellite1: Customer Details SalesLEFT JOIN [Sat_Customer_Details_Sales] sat1 ON PIT_Sales.Sat_Customer_Details_Sales_Customer_Surrogate_Key = sat1.Customer_Surrogate_Key AND PIT_Sales.Sat_Customer_Details_Sales_Inscription_Timestamp = sat1.Inscription_Timestamp AND PIT_SALES.Sat_Customer_Details_Sales_Inscription_Record_Id = sat1.Inscription_Record_Id
-- Join Satellite 2: Customer Details ContactLEFT JOIN [Sat_Customer_Details_Contact] sat2 ON PIT_Sales.Sat_Customer_Details_Contact_Customer_Surrogate_Key = sat1.Customer_Surrogate_Key AND PIT_Sales.Sat_Customer_Details_Contact_Inscription_Timestamp = sat1.Inscription_Timestamp AND PIT_SALES.Sat_Customer_Details_Contact_Inscription_Record_Id = sat1.Inscription_Record_IdImplementation guidelines
Section titled “Implementation guidelines”- Generate the statement from metadata: the PIT, the participating Satellites, their record locator columns, and the data item mappings are all known in the solution metadata.
- Use
LEFT JOINs to the Satellites, so periods where a Satellite has no record yet still produce a dimension record (withNULLs for that Satellite’s attributes). - Keep the
ROW_NUMBERordering deterministic and stable (for example inscription timestamp, inscription record identifier, state from timestamp, business key), so reruns produce the same keys for the same data. - Alternatively, persist the dimension key in the PIT itself; the dimension statement then simply selects it, and facts referencing the same PIT use identical keys by construction.
- Derive the closing
Before_Timestampper key withLEAD, consistent with the closed-open convention used across the solution.
Considerations and consequences
Section titled “Considerations and consequences”- Runtime-generated dimension keys are only stable as long as the underlying data and ordering are unchanged: a reload after new data arrives produces different key values. Consumers must treat the dimension as fully refreshed, or the keys must be persisted (in the PIT or a key map) for stability.
- The dimension’s grain equals the PIT’s grain: one record per key per change moment. Snapshot-based PITs produce snapshot-grain dimensions accordingly.
- The pattern’s simplicity is bought upstream — the PIT must be maintained as Satellites load. Where no PIT is warranted, a dimension can be created directly from the integration layer objects instead.
- Virtualising the dimension (as a view over the PIT) is a natural fit: the statement is deterministic and inexpensive relative to the PIT construction itself.