Solution Pattern - SQL Server Family - Data Vault Satellite
Purpose
Section titled “Purpose”This Solution Pattern shows how to load a Data Vault Satellite with set-based SQL, covering the preparation of the incoming data delta and both unitemporal and bitemporal loading. It is a metadata-driven implementation of the Design Pattern - Data Vault - Satellite.
Motivation
Section titled “Motivation”The Satellite stores the descriptive context for a Core Business Concept or Natural Business Relationship, and tracks how that context changes over time. Loading a Satellite is more involved than loading a Hub or Link: the incoming data must be prepared into a clean change set, compared against what is already recorded, and inserted in a way that preserves the timeline. Structuring this as a small number of set-based, metadata-generated statements keeps the process repeatable across every Satellite in the solution.
Applicability
Section titled “Applicability”This pattern applies to SQL-based Satellite loads in the integration layer. The unitemporal form applies when history is tracked by inscription timestamp only; the bitemporal form applies when an additional state timeline (state from / state before timestamps) is maintained, and incoming records can affect time periods that are already recorded.
Structure
Section titled “Structure”The Satellite load consists of a preparation step followed by the load itself.
Preparing the data delta
Section titled “Preparing the data delta”The incoming data is shaped into a clean change set before it is compared against the target. The inner selection gathers the records and derives the supporting attributes (surrogate key, checksum, change data indicator); a compacting preparation and filter then remove records that do not represent a genuine change, so only meaningful changes are carried forward:
/******************************************************************************* * Data Engine Thinking ******************************************************************************* * * Purpose: * - SQL example for Preparing the Satellite. * * Disclaimer: * - See disclaimer.md in the repository root. * ******************************************************************************/
SELECT <compacting selection>FROM( SELECT <compacting preparation> FROM ( SELECT <inner selection> FROM ) compacting_preparation) compacting_selectionWHERE <compacting filter>Loading a unitemporal Satellite
Section titled “Loading a unitemporal Satellite”The unitemporal load inserts the prepared changes that are not yet recorded in the target Satellite:
- A
LEFT OUTER JOINon the surrogate key, inscription timestamp, and inscription record identifier prevents reprocessing of records that have already been loaded. - A
ROW_NUMBERper surrogate key establishes the order of the incoming changes. - The most recent incoming change per key (
KEY_ROW_NUMBER = 1) is compared against the current record in the target — by checksum, and by change data indicator when the checksums match — so an unchanged record is not inserted again. - Older incoming changes (
KEY_ROW_NUMBER != 1) are inserted as-is, since they precede the change that was just evaluated.
/******************************************************************************* * Data Engine Thinking ******************************************************************************* * * Purpose: * - SQL example for Loading an unitemporal Satellite. * * Disclaimer: * - See disclaimer.md in the repository root. * ******************************************************************************/
INSERT INTO [Sat Customer]( [CUSTOMER_SK], [Inscription_Timestamp], [Inscription_Record_Id], [State_From_Timestamp], [State_Before_Timestamp], [Change_Data_Indicator], [Checksum], [Audit_Trail_Id], [CUSTOMER_FIRST_NAME], [CUSTOMER_DATE_OF_BIRTH])SELECT […]FROM( SELECT compacting_selection.[CUSTOMER_SK], compacting_selection.[Inscription_Timestamp], compacting_selection.[Inscription_Record_Id], compacting_selection.[State_From_Timestamp], compacting_selection.[State_Before_Timestamp], compacting_selection.[Change_Data_Indicator], compacting_selection.[Checksum], Compacting_selection.[Audit_Trail_Id], compacting_selection.[CUSTOMER_FIRST_NAME], compacting_selection.[CUSTOMER_DATE_OF_BIRTH], CAST(ROW_NUMBER() OVER ( PARTITION BY compacting_selection.[CUSTOMER_SK] ORDER BY compacting_selection.[CUSTOMER_SK], compacting_selection.[Inscription_Timestamp], compacting_selection.[Inscription_Record_Id] ) AS INT) AS KEY_ROW_NUMBER, target_object.[Checksum] AS [Lookup_Checksum], target_object.[Change_Data_Indicator] AS [Lookup_Change_Data_Indicator] FROM ( <Compacting selection> ) compacting_selection -- Prevent reprocessing LEFT OUTER JOIN [Sat Customer] target_object ON compacting_selection.[CUSTOMER_SK] = target_object.[CUSTOMER_SK] AND compacting_selection.[Inscription_Timestamp] = target_object.[Inscription_Timestamp] AND compacting_selection.[Inscription_Record_Id] = target_object.[Inscription_Record_Id] WHERE satellite.[CUSTOMER_SK] IS NULL) final-- Change mergingWHERE( KEY_ROW_NUMBER = 1 AND (( [Checksum] != [Lookup_Checksum] ) -- The checksums are different OR ( [Checksum] = [Lookup_Checksum] AND [Change_Data_Indicator] != [Lookup_Change_Data_Indicator]) -- The checksums are the same but the CDC is different ))OR( -- It's not the most recent change in the set, so the record can be inserted as-is KEY_ROW_NUMBER != 1)Loading a bitemporal Satellite
Section titled “Loading a bitemporal Satellite”The bitemporal load additionally maintains the state timeline. An incoming record can overlap time periods that are already recorded, in which case the existing records must be split so the timeline remains continuous:
- The
NewRecordsset contains the prepared changes, guarded against reprocessing. - Existing records that overlap an incoming record on the state timeline are identified using their interval relationships.
- For each overlap, the existing record is duplicated to preserve the portion of its period to the right and to the left of the incoming change, with the state from / state before timestamps adjusted accordingly.
- The final insert combines the new records with the preserved left and right portions of the affected existing records.
/******************************************************************************* * Data Engine Thinking ******************************************************************************* * * Purpose: * - SQL example for Loading an bitemporal Satellite. * * Disclaimer: * - See disclaimer.md in the repository root. * ******************************************************************************/
WITHNewRecords AS( SELECT […] ( <compacting selection> ) LEFT JOIN <prevent reprocessing>), ExistingRight( All new records, and all records from the target object with overlapping time periods SELECT […] FROM <NewRecords> UNION SELECT […] FROM <target data object with overlaps>
), PreserveRightPartOfExistingRecords AS( -- Duplicate the existing record, preserving the right portion SELECT NewRecords.[CUSTOMER_SK] -- Use the new inscription timestamp and record id ,NewRecords.[Inscription_Timestamp] ,NewRecords.[Inscription_Record_Id] -- Replace the existing state from timestamp with new state before timestamp ,COALESCE(NewRecords.State_Before_Timestamp,'9999-12-31') AS [State_From_Timestamp] -- Keep the existing state before timestamp ,ExistingRight.[State_Before_Timestamp] -- Map the available existing data ,ExistingRight.<columns> -- Order the records - Only the most recent record is of interest ,ROW_NUMBER() OVER (PARTITION BY ExistingRight.[CODE] ,NewRecords.[State_Before_Timestamp] -- New State From Timestamp ORDER BY ExistingRight.[Inscription_Timestamp] DESC ,ExistingRight.[Inscription_Record_Id] DESC) AS rownum FROM <NewRecords> JOIN <ExistingRight> ON NewRecords.[CUSTOMER_SK] = ExistingRight.[CUSTOMER_SK] WHERE -- Compare only with previous records NewRecords.[Inscription_Timestamp] > ExistingRight.[Inscription_Timestamp] -- Find Allen Relationship (right) AND COALESCE(NewRecords.State_Before_Timestamp,'9999-12-31') > ExistingRight.[State_From_Timestamp] AND COALESCE(NewRecords.State_Before_Timestamp,'9999-12-31') < ExistingRight.[State_Before_Timestamp]), ExistingLeft( SELECT […] FROM <PreserveRightPartOfExistingRecords> UNION SELECT […] FROM <NewRecords> UNION SELECT […] FROM <target data object with overlaps>), PreserveLeftPartOfExistingRecords AS( -- Duplicate the existing record, preserving the left portion SELECT NewRecords.[CUSTOMER_SK] -- Use new inscription timestamp and record id ,NewRecords.[Inscription_Timestamp] ,NewRecords.[Inscription_Record_Id] -- Keep the existing state before timestamp ,ExistingLeft.[State_From_Timestamp] AS [State_From_Timestamp] -- Replace the existing state before timestamp with new state from timestamp ,NewRecords.[State_From_Timestamp] AS [State_Before_Timestamp] -- Map the available existing data ,ExistingLeft. <available columns> ,[…] -- Order the records - Only the most recent record is of interest ,ROW_NUMBER() OVER (PARTITION BY ExistingLeft.[CODE] ,NewRecords.State_From_Timestamp -- New State Before Timestamp ORDER BY ExistingLeft.[Inscription_Timestamp] DESC ,ExistingLeft.[Inscription_Record_Id] DESC) AS rownum FROM <NewRecords> JOIN <ExistingLeft> ON NewRecords.[CUSTOMER_SK] = ExistingLeft.[CUSTOMER_SK] WHERE -- Compare only with previous records NewRecords.Inscription_Timestamp > ExistingLeft.Inscription_Timestamp -- Find Allen Relationship (left) AND NewRecords.State_From_Timestamp > ExistingLeft.State_From_Timestamp AND NewRecords.State_From_Timestamp < ExistingLeft.State_Before_Timestamp)INSERT INTO <target data object>( <column list>)SELECT <PreserveLeftPartOfExistingRecords>WHERE rownum = 1UNIONSELECT <PreserveRightPartOfExistingRecords>WHERE rownum = 1UNIONSELECT <NewRecords>Implementation guidelines
Section titled “Implementation guidelines”- Generate the statements from metadata: the data object names, surrogate key, attribute columns, checksum definition, and temporal columns are supplied by the solution metadata, so the same templates serve every Satellite.
- Use a checksum across the descriptive attributes to detect changes without comparing every column individually, and keep the checksum definition consistent across the solution.
- Keep the change data indicator in the comparison: when checksums match but the indicator differs (for example, a delete followed by a re-insert of identical values), the record still represents a change.
- Compare only the most recent incoming change per key against the target. Earlier changes in the same batch are by definition new history and can be inserted directly — this is what the
ROW_NUMBERsplit achieves. - Guard against reprocessing on the full logical key (surrogate key, inscription timestamp, inscription record identifier), so reruns of the same batch do not duplicate records.
- For bitemporal loads, derive the timeline adjustments deterministically from the interval relationships between incoming and existing records, and only keep the most recent duplicate per affected period (
rownum = 1). - End-dating (deriving the state before timestamp from the following record) can be maintained as part of the load or derived afterwards; either way, derive it consistently.
Considerations and consequences
Section titled “Considerations and consequences”- The Satellite is insert-only: changes, including logical deletes, are recorded as new records. Corrections to the timeline in the bitemporal form are also implemented as inserts of adjusted duplicates, which preserves auditability at the cost of additional records.
- Record compacting during preparation discards records that do not represent genuine changes. The definition of “genuine change” — which attributes participate in the checksum — therefore determines what history is retained.
- The bitemporal pattern is markedly more complex than the unitemporal one. Apply it only where a second timeline is genuinely required; a unitemporal Satellite is sufficient when the inscription order is the only timeline of interest.
- Set-based timeline splitting reads and joins the target Satellite. On large Satellites, indexing on the surrogate key and temporal columns is essential to keep the overlap detection efficient.