Solution Pattern - SQL Server Family - Persistent Staging Area
Purpose
Section titled “Purpose”This Solution Pattern shows how to load a Persistent Staging Area (PSA) data object from the Landing Area with a single, set-based, insert-only SQL statement. It is a metadata-driven implementation of the Design Pattern - Generic - Persistent Staging Area.
Motivation
Section titled “Motivation”The PSA records every change presented to the data solution, in arrival order, without ever updating or deleting. The load must therefore do three things reliably: never reprocess what has already been recorded, never record a non-change, and preserve the arrival order of genuine changes — all while remaining safe to re-run.
Applicability
Section titled “Applicability”This pattern applies to SQL-based PSA loads from a Landing Area that delivers a data delta with the staging layer’s standard attributes (inscription timestamp, inscription record identifier, change data indicator, and checksum). The same template serves every PSA object, with the object names and key columns supplied by the solution metadata.
Structure
Section titled “Structure”The statement inserts the Landing Area records that represent genuine changes:
- A
LEFT OUTER JOINagainst the target PSA on the natural key, inscription timestamp, and inscription record identifier prevents reprocessing records that have already been loaded. - A second lookup retrieves the most recently arrived PSA record per key that is not logically deleted, supplying the checksum and change data indicator to compare against.
- A
ROW_NUMBERper natural key establishes the arrival order of the incoming records. - The change-merging filter keeps the first incoming record per key only when it differs from the PSA’s current state — by checksum, or by change data indicator when the checksums match — and keeps all later incoming records as-is, since they follow a record that was just established as a change.
The generic template:
/******************************************************************************* * Data Engine Thinking ******************************************************************************* * * Purpose: * - SQL example for Solution pattern example. * * Disclaimer: * - See disclaimer.md in the repository root. * ******************************************************************************/
INSERT INTO <Persistent Staging Table> (<Columns>)SELECT <Columns>, Lookup_Checksum, Lookup_Change_Data_Indicator, Key_Row_Number, <Framework Columns>FROM( SELECT LDA.<Columns> COALESCE(maxsub.[Lookup_Checksum],'N/A') AS [Lookup_Checksum], COALESCE(maxsub.[Change_Data_Indicator],'C') AS [Lookup_Change_Data_Indicator], ROW_NUMBER() OVER ( PARTITION BY LDA.<Natural Key(s)> ORDER BY LDA.<Natural Key(s)>, LDA.[Inscription_Timestamp], LDA.[Inscription_Timestamp] ) AS [Key_Row_Number] FROM <Landing Area Table> LDA -- Prevent reloading already processed data LEFT OUTER JOIN <PSA Table> PSA ON PSA.<Natural Key(s)> = LDA.<Natural Key(s)> AND PSA.[Inscription_Timestamp] = LDA.[Inscription_Timestamp] AND PSA.[Inscription_Record_Id] = LDA.[Inscription_Record_Id] -- Query the most recently arrived PSA record which is not logically deleted. LEFT OUTER JOIN ( SELECT A.<Natural Key(s)>, A.[Lookup_Checksum] AS [Lookup_Lookup_Checksum], A.[Change_Data_Indicator] AS [Lookup_Change_Data_Indicator] FROM <Persistent Staging Table> A JOIN ( SELECT <Natural Key(s)>, [Inscription_Timestamp] AS [Max_Inscription_Timestamp], MAX([Inscription_Record_Id]) OVER (PARTITION BY <Natural Key(s), [Inscription_Timestamp] ORDER BY <Natural Key(s), [Inscription_Timestamp]) AS [Max_Inscription_Record_Id], ROW_NUMBER() OVER (PARTITION BY <Natural Key(s) ORDER BY <Natural Key(s), [Inscription_Timestamp] DESC) AS [Max_RowNum] FROM <Persistent Staging Table> ) C ON A.<Natural Key(s)> = C.<Natural Key(s)> AND A.[Inscription_Timestamp] = C.[Max_Inscription_Timestamp] AND A.[Inscription_Record_Id] = C.[Max_Inscription_Record_Id] AND 1 = C.[Max_RowNum] WHERE A.[Lookup_Change_Data_Indicator] != 'D' ) maxsub ON LDA.<Natural Key(s)> = maxsub.<Natural Key(s)> WHERE PSA.<Natural Key(s)> IS NULL) subWHERE [Key_Row_Number] != 1 OR ( [Key_Row_Number] = 1 AND ( ([Checksum] != [Lookup_Checksum]) OR ( [Checksum] = [Lookup_Lookup_Checksum] AND [Change_Data_Indicator] != [Lookup_Change_Data_Indicator] ) ) )A worked, runnable example for the FastChangeCo case:
/******************************************************************************* * Data Engine Thinking ******************************************************************************* * * Purpose: * - SQL example for Solution Pattern Example - PSA. * * Disclaimer: * - See disclaimer.md in the repository root. * ******************************************************************************/
---------------------------------------------------------------------------------- PSA solution pattern-- Version in bookINSERT INTO PSA.EntityClass ( InscriptionTimestamp ,InscriptionRecordID ,SourceTimestamp ,ChangeDataIndicator ,AuditTrailID ,ModelCode ,EntityClassCode ,EntityClassName ,Checksum)SELECT InscriptionTimestamp ,InscriptionRecordID ,SourceTimestamp ,ChangeDataIndicator ,AuditTrailID ,ModelCode ,EntityClassCode ,EntityClassName ,ChecksumFROM (SELECT lda.InscriptionTimestamp ,lda.InscriptionRecordID ,lda.SourceTimestamp ,lda.ChangeDataIndicator ,lda.AuditTrailID -- take from LDA for better testing ,lda.ModelCode ,lda.EntityClassCode ,lda.EntityClassName ,lda.Checksum ,COALESCE(submaxpsa.Checksum, 'N/A') AS submaxpsaChecksum ,COALESCE(submaxpsa.ChangeDataIndicator, 'C') AS submaxpsaChangeDataIndicator -- Better indicator value due to datatype missmatch - Testcase (4) ,ROW_NUMBER() OVER (PARTITION BY lda.ModelCode ,lda.EntityClassCode ORDER BY lda.ModelCode ,lda.EntityClassCode ,lda.InscriptionTimestamp) AS rownumFROM LDA.EntityClass AS lda -- Prevent reprocessingLEFT OUTER JOIN PSA.EntityClass AS psa ON lda.ModelCode = psa.ModelCode AND lda.EntityClassCode = psa.EntityClassCode AND lda.InscriptionTimestamp = psa.InscriptionTimestamp AND lda.InscriptionRecordID = psa.InscriptionRecordID -- Most recently arrived PSA recordLEFT OUTER JOIN ( SELECT psa.ModelCode ,psa.EntityClassCode ,psa.Checksum ,psa.ChangeDataIndicator FROM PSA.EntityClass AS psa INNER JOIN ( SELECT MAX(InscriptionTimestamp) AS maxInscriptionTimestamp ,MAX(InscriptionRecordID) AS maxInscriptionRecordID ,ModelCode ,EntityClassCode FROM PSA.EntityClass GROUP BY ModelCode ,EntityClassCode ) AS maxpsa ON psa.ModelCode = maxpsa.ModelCode AND psa.EntityClassCode = maxpsa.EntityClassCode AND psa.InscriptionTimestamp = maxpsa.maxInscriptionTimestamp AND psa.InscriptionRecordID = maxpsa.maxInscriptionRecordID WHERE psa.ChangeDataIndicator != 'D' -- Don't re-compare already deleted rows.) AS submaxpsa ON lda.ModelCode = submaxpsa.ModelCode AND lda.EntityClassCode = submaxpsa.EntityClassCodeWHERE psa.EntityClassCode IS NULL -- Prevent reprocessing AND psa.ModelCode IS NULL -- Prevent reprocessing) AS psaldaWHERE -- all subsequent changes fo a bk rownum != 1 OR -- check on first/oldest change for a bk (rownum = 1 AND ( (Checksum != submaxpsaChecksum) OR (Checksum = submaxpsaChecksum AND ChangeDataIndicator != submaxpsaChangeDataIndicator)) );
SELECT *FROM PSA.EntityClassORDER BY ModelCode,EntityClassCode, InscriptionTimestamp, InscriptionRecordID;---------------------------------------------------------------------------------- Test cases:-- 1) Same oldest record in LDA as max record in PSA not inserted-- 2) maxInscriptionTimestamp and maxInscriptionRecordID do not corelate since a lower InscriptionTimestamp contains maxInscriptionRecordID-- 3) rownum - if InscriptionRecordID not sorted in table-- 4) reinsert deleted recordImplementation guidelines
Section titled “Implementation guidelines”- Generate the statement from metadata: the Landing Area and PSA object names, the natural key, and the attribute columns are supplied by the solution metadata.
- Guard against reprocessing on the full logical key — natural key, inscription timestamp, and inscription record identifier — so reruns of the same batch are harmless.
- Compare only the first incoming record per key against the PSA’s current state; later records in the same batch are by definition new history.
- Use the checksum for change detection across the attributes, and keep the change data indicator in the comparison so a delete followed by a re-insert of identical values is still recorded.
- Exclude logically deleted records from the comparison baseline, so a key that re-appears after a delete is recorded as a new change.
Considerations and consequences
Section titled “Considerations and consequences”- The load is insert-only and idempotent: it can run continuously and in parallel with downstream processing, in line with the PSA’s role as a transaction log.
- Change detection depends on a consistent checksum definition; differences in trimming, formatting, or
NULLhandling produce false changes or missed changes. - The most-recent-record lookup reads the target PSA per key. On large PSA objects, indexing on the natural key and inscription columns keeps this efficient.