Skip to content

Solution Pattern - SQL Server Family - Data Vault Satellite

This Solution Pattern shows how to load a Data Vault Satellite with set-based SQL, covering the preparation of the incoming data delta and both unitemporal and bitemporal loading. It is a metadata-driven implementation of the Design Pattern - Data Vault - Satellite.

The Satellite stores the descriptive context for a Core Business Concept or Natural Business Relationship, and tracks how that context changes over time. Loading a Satellite is more involved than loading a Hub or Link: the incoming data must be prepared into a clean change set, compared against what is already recorded, and inserted in a way that preserves the timeline. Structuring this as a small number of set-based, metadata-generated statements keeps the process repeatable across every Satellite in the solution.

This pattern applies to SQL-based Satellite loads in the integration layer. The unitemporal form applies when history is tracked by inscription timestamp only; the bitemporal form applies when an additional state timeline (state from / state before timestamps) is maintained, and incoming records can affect time periods that are already recorded.

The Satellite load consists of a preparation step followed by the load itself.

The incoming data is shaped into a clean change set before it is compared against the target. The inner selection gathers the records and derives the supporting attributes (surrogate key, checksum, change data indicator); a compacting preparation and filter then remove records that do not represent a genuine change, so only meaningful changes are carried forward:

/*******************************************************************************
* Data Engine Thinking
*******************************************************************************
*
* Purpose:
* - SQL example for Preparing the Satellite.
*
* Disclaimer:
* - See disclaimer.md in the repository root.
*
******************************************************************************/
SELECT
<compacting selection>
FROM
(
SELECT
<compacting preparation>
FROM
(
SELECT
<inner selection>
FROM
) compacting_preparation
) compacting_selection
WHERE <compacting filter>

The unitemporal load inserts the prepared changes that are not yet recorded in the target Satellite:

  • A LEFT OUTER JOIN on the surrogate key, inscription timestamp, and inscription record identifier prevents reprocessing of records that have already been loaded.
  • A ROW_NUMBER per surrogate key establishes the order of the incoming changes.
  • The most recent incoming change per key (KEY_ROW_NUMBER = 1) is compared against the current record in the target — by checksum, and by change data indicator when the checksums match — so an unchanged record is not inserted again.
  • Older incoming changes (KEY_ROW_NUMBER != 1) are inserted as-is, since they precede the change that was just evaluated.
/*******************************************************************************
* Data Engine Thinking
*******************************************************************************
*
* Purpose:
* - SQL example for Loading an unitemporal Satellite.
*
* Disclaimer:
* - See disclaimer.md in the repository root.
*
******************************************************************************/
INSERT INTO [Sat Customer]
(
[CUSTOMER_SK],
[Inscription_Timestamp],
[Inscription_Record_Id],
[State_From_Timestamp],
[State_Before_Timestamp],
[Change_Data_Indicator],
[Checksum],
[Audit_Trail_Id],
[CUSTOMER_FIRST_NAME],
[CUSTOMER_DATE_OF_BIRTH]
)
SELECT
[…]
FROM
(
SELECT
compacting_selection.[CUSTOMER_SK],
compacting_selection.[Inscription_Timestamp],
compacting_selection.[Inscription_Record_Id],
compacting_selection.[State_From_Timestamp],
compacting_selection.[State_Before_Timestamp],
compacting_selection.[Change_Data_Indicator],
compacting_selection.[Checksum],
Compacting_selection.[Audit_Trail_Id],
compacting_selection.[CUSTOMER_FIRST_NAME],
compacting_selection.[CUSTOMER_DATE_OF_BIRTH],
CAST(ROW_NUMBER() OVER (
PARTITION BY compacting_selection.[CUSTOMER_SK]
ORDER BY compacting_selection.[CUSTOMER_SK],
compacting_selection.[Inscription_Timestamp],
compacting_selection.[Inscription_Record_Id]
) AS INT) AS KEY_ROW_NUMBER,
target_object.[Checksum] AS [Lookup_Checksum],
target_object.[Change_Data_Indicator] AS [Lookup_Change_Data_Indicator]
FROM
(
<Compacting selection>
) compacting_selection
-- Prevent reprocessing
LEFT OUTER JOIN [Sat Customer] target_object
ON compacting_selection.[CUSTOMER_SK] = target_object.[CUSTOMER_SK]
AND compacting_selection.[Inscription_Timestamp] = target_object.[Inscription_Timestamp]
AND compacting_selection.[Inscription_Record_Id] = target_object.[Inscription_Record_Id]
WHERE satellite.[CUSTOMER_SK] IS NULL
) final
-- Change merging
WHERE
(
KEY_ROW_NUMBER = 1
AND (( [Checksum] != [Lookup_Checksum] )
-- The checksums are different
OR ( [Checksum] = [Lookup_Checksum]
AND [Change_Data_Indicator] != [Lookup_Change_Data_Indicator])
-- The checksums are the same but the CDC is different
)
)
OR
(
-- It's not the most recent change in the set, so the record can be inserted as-is
KEY_ROW_NUMBER != 1
)

The bitemporal load additionally maintains the state timeline. An incoming record can overlap time periods that are already recorded, in which case the existing records must be split so the timeline remains continuous:

  • The NewRecords set contains the prepared changes, guarded against reprocessing.
  • Existing records that overlap an incoming record on the state timeline are identified using their interval relationships.
  • For each overlap, the existing record is duplicated to preserve the portion of its period to the right and to the left of the incoming change, with the state from / state before timestamps adjusted accordingly.
  • The final insert combines the new records with the preserved left and right portions of the affected existing records.
/*******************************************************************************
* Data Engine Thinking
*******************************************************************************
*
* Purpose:
* - SQL example for Loading an bitemporal Satellite.
*
* Disclaimer:
* - See disclaimer.md in the repository root.
*
******************************************************************************/
WITH
NewRecords AS
(
SELECT […]
(
<compacting selection>
)
LEFT JOIN
<prevent reprocessing>
)
, ExistingRight
(
All new records, and all records from the target object with overlapping time periods
SELECT […] FROM <NewRecords>
UNION
SELECT […] FROM <target data object with overlaps>
)
, PreserveRightPartOfExistingRecords AS
(
-- Duplicate the existing record, preserving the right portion
SELECT
NewRecords.[CUSTOMER_SK]
-- Use the new inscription timestamp and record id
,NewRecords.[Inscription_Timestamp]
,NewRecords.[Inscription_Record_Id]
-- Replace the existing state from timestamp with new state before timestamp
,COALESCE(NewRecords.State_Before_Timestamp,'9999-12-31') AS [State_From_Timestamp]
-- Keep the existing state before timestamp
,ExistingRight.[State_Before_Timestamp]
-- Map the available existing data
,ExistingRight.<columns>
-- Order the records - Only the most recent record is of interest
,ROW_NUMBER()
OVER (PARTITION BY ExistingRight.[CODE]
,NewRecords.[State_Before_Timestamp] -- New State From Timestamp
ORDER BY ExistingRight.[Inscription_Timestamp] DESC
,ExistingRight.[Inscription_Record_Id] DESC)
AS rownum
FROM <NewRecords>
JOIN <ExistingRight>
ON NewRecords.[CUSTOMER_SK] = ExistingRight.[CUSTOMER_SK]
WHERE
-- Compare only with previous records
NewRecords.[Inscription_Timestamp] > ExistingRight.[Inscription_Timestamp]
-- Find Allen Relationship (right)
AND COALESCE(NewRecords.State_Before_Timestamp,'9999-12-31') > ExistingRight.[State_From_Timestamp]
AND COALESCE(NewRecords.State_Before_Timestamp,'9999-12-31') < ExistingRight.[State_Before_Timestamp]
)
, ExistingLeft
(
SELECT […] FROM <PreserveRightPartOfExistingRecords>
UNION
SELECT […] FROM <NewRecords>
UNION
SELECT […] FROM <target data object with overlaps>
)
, PreserveLeftPartOfExistingRecords AS
(
-- Duplicate the existing record, preserving the left portion
SELECT
NewRecords.[CUSTOMER_SK]
-- Use new inscription timestamp and record id
,NewRecords.[Inscription_Timestamp]
,NewRecords.[Inscription_Record_Id]
-- Keep the existing state before timestamp
,ExistingLeft.[State_From_Timestamp] AS [State_From_Timestamp]
-- Replace the existing state before timestamp with new state from timestamp
,NewRecords.[State_From_Timestamp] AS [State_Before_Timestamp]
-- Map the available existing data
,ExistingLeft. <available columns>
,[…]
-- Order the records - Only the most recent record is of interest
,ROW_NUMBER()
OVER (PARTITION BY ExistingLeft.[CODE]
,NewRecords.State_From_Timestamp -- New State Before Timestamp
ORDER BY ExistingLeft.[Inscription_Timestamp] DESC
,ExistingLeft.[Inscription_Record_Id] DESC)
AS rownum
FROM <NewRecords>
JOIN <ExistingLeft>
ON NewRecords.[CUSTOMER_SK] = ExistingLeft.[CUSTOMER_SK]
WHERE
-- Compare only with previous records
NewRecords.Inscription_Timestamp > ExistingLeft.Inscription_Timestamp
-- Find Allen Relationship (left)
AND NewRecords.State_From_Timestamp > ExistingLeft.State_From_Timestamp
AND NewRecords.State_From_Timestamp < ExistingLeft.State_Before_Timestamp
)
INSERT INTO <target data object>
(
<column list>
)
SELECT <PreserveLeftPartOfExistingRecords>
WHERE rownum = 1
UNION
SELECT <PreserveRightPartOfExistingRecords>
WHERE rownum = 1
UNION
SELECT <NewRecords>
  • Generate the statements from metadata: the data object names, surrogate key, attribute columns, checksum definition, and temporal columns are supplied by the solution metadata, so the same templates serve every Satellite.
  • Use a checksum across the descriptive attributes to detect changes without comparing every column individually, and keep the checksum definition consistent across the solution.
  • Keep the change data indicator in the comparison: when checksums match but the indicator differs (for example, a delete followed by a re-insert of identical values), the record still represents a change.
  • Compare only the most recent incoming change per key against the target. Earlier changes in the same batch are by definition new history and can be inserted directly — this is what the ROW_NUMBER split achieves.
  • Guard against reprocessing on the full logical key (surrogate key, inscription timestamp, inscription record identifier), so reruns of the same batch do not duplicate records.
  • For bitemporal loads, derive the timeline adjustments deterministically from the interval relationships between incoming and existing records, and only keep the most recent duplicate per affected period (rownum = 1).
  • End-dating (deriving the state before timestamp from the following record) can be maintained as part of the load or derived afterwards; either way, derive it consistently.
  • The Satellite is insert-only: changes, including logical deletes, are recorded as new records. Corrections to the timeline in the bitemporal form are also implemented as inserts of adjusted duplicates, which preserves auditability at the cost of additional records.
  • Record compacting during preparation discards records that do not represent genuine changes. The definition of “genuine change” — which attributes participate in the checksum — therefore determines what history is retained.
  • The bitemporal pattern is markedly more complex than the unitemporal one. Apply it only where a second timeline is genuinely required; a unitemporal Satellite is sufficient when the inscription order is the only timeline of interest.
  • Set-based timeline splitting reads and joins the target Satellite. On large Satellites, indexing on the surrogate key and temporal columns is essential to keep the overlap detection efficient.