Skip to content

Solution Pattern - SQL Server Family - Persistent Staging Area

This Solution Pattern shows how to load a Persistent Staging Area (PSA) data object from the Landing Area with a single, set-based, insert-only SQL statement. It is a metadata-driven implementation of the Design Pattern - Generic - Persistent Staging Area.

The PSA records every change presented to the data solution, in arrival order, without ever updating or deleting. The load must therefore do three things reliably: never reprocess what has already been recorded, never record a non-change, and preserve the arrival order of genuine changes — all while remaining safe to re-run.

This pattern applies to SQL-based PSA loads from a Landing Area that delivers a data delta with the staging layer’s standard attributes (inscription timestamp, inscription record identifier, change data indicator, and checksum). The same template serves every PSA object, with the object names and key columns supplied by the solution metadata.

The statement inserts the Landing Area records that represent genuine changes:

  • A LEFT OUTER JOIN against the target PSA on the natural key, inscription timestamp, and inscription record identifier prevents reprocessing records that have already been loaded.
  • A second lookup retrieves the most recently arrived PSA record per key that is not logically deleted, supplying the checksum and change data indicator to compare against.
  • A ROW_NUMBER per natural key establishes the arrival order of the incoming records.
  • The change-merging filter keeps the first incoming record per key only when it differs from the PSA’s current state — by checksum, or by change data indicator when the checksums match — and keeps all later incoming records as-is, since they follow a record that was just established as a change.

The generic template:

/*******************************************************************************
* Data Engine Thinking
*******************************************************************************
*
* Purpose:
* - SQL example for Solution pattern example.
*
* Disclaimer:
* - See disclaimer.md in the repository root.
*
******************************************************************************/
INSERT INTO <Persistent Staging Table> (<Columns>)
SELECT <Columns>, Lookup_Checksum, Lookup_Change_Data_Indicator, Key_Row_Number, <Framework Columns>
FROM
(
SELECT
LDA.<Columns>
COALESCE(maxsub.[Lookup_Checksum],'N/A') AS [Lookup_Checksum],
COALESCE(maxsub.[Change_Data_Indicator],'C') AS [Lookup_Change_Data_Indicator],
ROW_NUMBER() OVER
(
PARTITION BY LDA.<Natural Key(s)> ORDER BY LDA.<Natural Key(s)>, LDA.[Inscription_Timestamp], LDA.[Inscription_Timestamp]
) AS [Key_Row_Number]
FROM <Landing Area Table> LDA
-- Prevent reloading already processed data
LEFT OUTER JOIN <PSA Table> PSA
ON PSA.<Natural Key(s)> = LDA.<Natural Key(s)>
AND PSA.[Inscription_Timestamp] = LDA.[Inscription_Timestamp]
AND PSA.[Inscription_Record_Id] = LDA.[Inscription_Record_Id]
-- Query the most recently arrived PSA record which is not logically deleted.
LEFT OUTER JOIN
(
SELECT
A.<Natural Key(s)>,
A.[Lookup_Checksum] AS [Lookup_Lookup_Checksum],
A.[Change_Data_Indicator] AS [Lookup_Change_Data_Indicator]
FROM <Persistent Staging Table> A
JOIN
(
SELECT
<Natural Key(s)>,
[Inscription_Timestamp] AS [Max_Inscription_Timestamp],
MAX([Inscription_Record_Id]) OVER (PARTITION BY <Natural Key(s), [Inscription_Timestamp] ORDER BY <Natural Key(s), [Inscription_Timestamp]) AS [Max_Inscription_Record_Id],
ROW_NUMBER() OVER (PARTITION BY <Natural Key(s) ORDER BY <Natural Key(s), [Inscription_Timestamp] DESC) AS [Max_RowNum]
FROM <Persistent Staging Table>
) C
ON A.<Natural Key(s)> = C.<Natural Key(s)>
AND A.[Inscription_Timestamp] = C.[Max_Inscription_Timestamp]
AND A.[Inscription_Record_Id] = C.[Max_Inscription_Record_Id]
AND 1 = C.[Max_RowNum]
WHERE A.[Lookup_Change_Data_Indicator] != 'D'
) maxsub
ON LDA.<Natural Key(s)> = maxsub.<Natural Key(s)>
WHERE PSA.<Natural Key(s)> IS NULL
) sub
WHERE
[Key_Row_Number] != 1
OR
(
[Key_Row_Number] = 1 AND
(
([Checksum] != [Lookup_Checksum]) OR
(
[Checksum] = [Lookup_Lookup_Checksum] AND
[Change_Data_Indicator] != [Lookup_Change_Data_Indicator]
)
)
)

A worked, runnable example for the FastChangeCo case:

/*******************************************************************************
* Data Engine Thinking
*******************************************************************************
*
* Purpose:
* - SQL example for Solution Pattern Example - PSA.
*
* Disclaimer:
* - See disclaimer.md in the repository root.
*
******************************************************************************/
--------------------------------------------------------------------------------
-- PSA solution pattern
-- Version in book
INSERT INTO PSA.EntityClass (
InscriptionTimestamp
,InscriptionRecordID
,SourceTimestamp
,ChangeDataIndicator
,AuditTrailID
,ModelCode
,EntityClassCode
,EntityClassName
,Checksum
)
SELECT
InscriptionTimestamp
,InscriptionRecordID
,SourceTimestamp
,ChangeDataIndicator
,AuditTrailID
,ModelCode
,EntityClassCode
,EntityClassName
,Checksum
FROM (
SELECT
lda.InscriptionTimestamp
,lda.InscriptionRecordID
,lda.SourceTimestamp
,lda.ChangeDataIndicator
,lda.AuditTrailID -- take from LDA for better testing
,lda.ModelCode
,lda.EntityClassCode
,lda.EntityClassName
,lda.Checksum
,COALESCE(submaxpsa.Checksum, 'N/A') AS submaxpsaChecksum
,COALESCE(submaxpsa.ChangeDataIndicator, 'C') AS submaxpsaChangeDataIndicator -- Better indicator value due to datatype missmatch - Testcase (4)
,ROW_NUMBER() OVER (PARTITION BY lda.ModelCode
,lda.EntityClassCode
ORDER BY lda.ModelCode
,lda.EntityClassCode
,lda.InscriptionTimestamp)
AS rownum
FROM LDA.EntityClass AS lda
-- Prevent reprocessing
LEFT OUTER JOIN PSA.EntityClass AS psa
ON lda.ModelCode = psa.ModelCode
AND lda.EntityClassCode = psa.EntityClassCode
AND lda.InscriptionTimestamp = psa.InscriptionTimestamp
AND lda.InscriptionRecordID = psa.InscriptionRecordID
-- Most recently arrived PSA record
LEFT OUTER JOIN (
SELECT
psa.ModelCode
,psa.EntityClassCode
,psa.Checksum
,psa.ChangeDataIndicator
FROM PSA.EntityClass AS psa
INNER JOIN
(
SELECT
MAX(InscriptionTimestamp) AS maxInscriptionTimestamp
,MAX(InscriptionRecordID) AS maxInscriptionRecordID
,ModelCode
,EntityClassCode
FROM PSA.EntityClass
GROUP BY ModelCode
,EntityClassCode
) AS maxpsa
ON psa.ModelCode = maxpsa.ModelCode
AND psa.EntityClassCode = maxpsa.EntityClassCode
AND psa.InscriptionTimestamp = maxpsa.maxInscriptionTimestamp
AND psa.InscriptionRecordID = maxpsa.maxInscriptionRecordID
WHERE psa.ChangeDataIndicator != 'D' -- Don't re-compare already deleted rows.
) AS submaxpsa
ON lda.ModelCode = submaxpsa.ModelCode
AND lda.EntityClassCode = submaxpsa.EntityClassCode
WHERE psa.EntityClassCode IS NULL -- Prevent reprocessing
AND psa.ModelCode IS NULL -- Prevent reprocessing
) AS psalda
WHERE
-- all subsequent changes fo a bk
rownum != 1 OR
-- check on first/oldest change for a bk
(rownum = 1 AND (
(Checksum != submaxpsaChecksum) OR
(Checksum = submaxpsaChecksum AND ChangeDataIndicator != submaxpsaChangeDataIndicator))
)
;
SELECT *
FROM PSA.EntityClass
ORDER BY ModelCode,EntityClassCode, InscriptionTimestamp, InscriptionRecordID
;
--------------------------------------------------------------------------------
-- Test cases:
-- 1) Same oldest record in LDA as max record in PSA not inserted
-- 2) maxInscriptionTimestamp and maxInscriptionRecordID do not corelate since a lower InscriptionTimestamp contains maxInscriptionRecordID
-- 3) rownum - if InscriptionRecordID not sorted in table
-- 4) reinsert deleted record
  • Generate the statement from metadata: the Landing Area and PSA object names, the natural key, and the attribute columns are supplied by the solution metadata.
  • Guard against reprocessing on the full logical key — natural key, inscription timestamp, and inscription record identifier — so reruns of the same batch are harmless.
  • Compare only the first incoming record per key against the PSA’s current state; later records in the same batch are by definition new history.
  • Use the checksum for change detection across the attributes, and keep the change data indicator in the comparison so a delete followed by a re-insert of identical values is still recorded.
  • Exclude logically deleted records from the comparison baseline, so a key that re-appears after a delete is recorded as a new change.
  • The load is insert-only and idempotent: it can run continuously and in parallel with downstream processing, in line with the PSA’s role as a transaction log.
  • Change detection depends on a consistent checksum definition; differences in trimming, formatting, or NULL handling produce false changes or missed changes.
  • The most-recent-record lookup reads the target PSA per key. On large PSA objects, indexing on the natural key and inscription columns keeps this efficient.