Solution Pattern - SQL Server Family - Creating a Dimension from Integration Layer Objects
Purpose
Section titled “Purpose”This Solution Pattern shows how to create a dimension directly from integration layer objects — Hubs and Satellites — without an intermediate Point-In-Time (PIT) table, as a single layered SQL statement.
Motivation
Section titled “Motivation”Maintaining a PIT table is worthwhile when a combination of Satellites is queried often, but it is not a prerequisite for delivery. A dimension can be assembled straight from the integration layer: the timeline construction that a PIT would otherwise pre-compute is simply performed inline, and a checksum comparison reduces the result to the records that represent genuine changes for the selected columns.
This keeps the presentation layer flexible — dimensions can be defined, adjusted, and virtualised without first refactoring helper structures.
Applicability
Section titled “Applicability”This pattern applies when a time-variant dimension is needed and no PIT table exists (or is warranted) for the Hub it describes. It is well suited to virtualised delivery, where the dimension is a view over the integration layer.
Structure
Section titled “Structure”The statement is a set of nested layers, each with one responsibility:
- Combine all from timestamp values. The union of change moments across the participating objects forms the timeline, as in combining multiple time-variant objects.
- Derive time periods. A
LEADper key closes each interval. - Integration layer column selection. The Hub and Satellites are joined point-in-time against the periods, and the dimension’s columns are selected.
- Checksum preparation and comparison. A checksum across the selected columns is calculated and compared with the previous record per key, so only records that changed within the dimension’s column scope remain.
- Final formatting. Column aliasing and final presentation-layer formatting, with the comparison filter applied.
/******************************************************************************* * Data Engine Thinking ******************************************************************************* * * Purpose: * - Structure for creating a dimension directly from integration layer * objects, as a layered (virtual) statement. * * Disclaimer: * - See disclaimer.md in the repository root. This is an illustrative * structure; placeholders in angle brackets are generated from metadata. * ******************************************************************************/
SELECT <Final column formatting and aliasing>FROM( SELECT <Checksum comparison> FROM ( SELECT <Checksum preparation> FROM ( SELECT <Integration layer column selection> FROM ( SELECT <Derive time periods> FROM ( <Combine all from timestamp values> ) Timestamps ) Timeperiods JOIN <Integration layer objects> ) Timelines ) Calculate_Checksum) FinalWHERE <Checksum comparison filter>Implementation guidelines
Section titled “Implementation guidelines”- Generate the statement from metadata: the participating objects, the column selection, and the checksum definition all derive from the dimension’s mapping metadata.
- Scope the checksum to the columns the dimension actually selects. The time periods apply to the whole record, so narrowing the column selection creates adjacent duplicates — the checksum comparison is what compacts them away.
- Keep the layers strictly nested as shown; each layer only references the previous one, which keeps the generated SQL predictable and debuggable.
- Use the solution’s standard low-end and high-end timestamp defaults, and closed-open period semantics, throughout.
- The same construction serves persisted and virtual dimensions; for a virtual dimension, deploy the statement as a view.
Considerations and consequences
Section titled “Considerations and consequences”- The timeline construction and checksum comparison run at query time. For frequently queried or heavily combined dimensions, a PIT table moves that work upstream — see Creating a Dimension from a PIT.
- Because compacting is driven by the selected columns, changing the dimension’s column selection changes which records survive; the dimension is consistent for its definition, not a fixed subset of the integration layer.
- Window functions and point-in-time joins over large Satellites benefit from supporting indexes on key and timestamp columns.
- The dimension key concern is the same as for the PIT-based construction: runtime-generated keys are not stable across reloads unless persisted.