Solution Pattern - SQL Server Family - Compacting as a Post-Process
Purpose
Section titled “Purpose”This Solution Pattern shows how to remove redundant records from already-stored time-variant data in a single set-based pass — rows whose values are identical to the preceding record for the same key, and therefore represent no actual change. This is record condensing applied as a post-process.
Motivation
Section titled “Motivation”After loading — for example into the Persistent Staging Area — or after combining data objects, a data set can contain consecutive records that carry no genuine change: the same values and the same change data indicator, separated only by a later timestamp. These rows add storage and obscure the real change history without conveying anything.
Because the data is already stored, the column scope and the timeline order are both known. That makes it possible to identify the redundant rows by comparing each record with its predecessor and remove them in one pass, without re-deriving anything.
Unlike Functional Compacting, which reduces a timeline by a rule about time and can discard real changes, this pattern removes only rows that represent no change at all — so it is lossless with respect to meaningful history.
Applicability
Section titled “Applicability”This pattern applies to SQL-based, stored time-variant data objects (the PSA in particular) where redundant no-change records should be removed after the fact. The same checksum comparison can also be expressed as a filter over a query result when a non-destructive form is required.
Structure
Section titled “Structure”The statement compares each record with the previous one for the same key and deletes the duplicates:
- A CTE computes a checksum over the business key and the tracked columns, using the solution’s standard derivation (trim, convert,
NULLsentinel, delimiter,HASHBYTES). LAGover the checksum and the change data indicator, partitioned by key and ordered by key and timestamp, fetches the value of the preceding record.- A row is redundant — and is deleted — when both its checksum and its change data indicator equal those of the preceding record. The first record per key has no predecessor (
LAGreturnsNULL) and is always kept.
/******************************************************************************* * Data Engine Thinking ******************************************************************************* * * Purpose: * - Record compacting as a post-process: because the data is already stored, * the column scope and timeline order are known, so redundant rows can be * removed in a single pass. * * Disclaimer: * - See disclaimer.md in the repository root. Placeholders such as <Key>, * <Column 1> and <table_name> must be substituted. * ******************************************************************************/
WITHCompactingCTE AS( SELECT HASHBYTES('SHA1', COALESCE(TRIM(CONVERT(NVARCHAR(MAX), <Key>)), '~!@|N/A') + '\@|' + COALESCE(TRIM(CONVERT(NVARCHAR(MAX), <Column 1>)), '~!@|N/A') + '\@|' + COALESCE(TRIM(CONVERT(NVARCHAR(MAX), <Column 2>)), '~!@|N/A') + '\@|' ) AS [Checksum], * FROM <table_name>),Subselect AS( SELECT *, LAG([Checksum]) OVER (PARTITION BY <Key> ORDER BY <Key>,<Timestamp>) AS [Next_Checksum], LAG([Change_Data_Indicator]) OVER (PARTITION BY <Key> ORDER BY <Key>,<Timestamp>) AS [Next_Change_Data_Indicator] FROM CompactingCTE)DELETE FROM SubselectWHERE [Checksum] = [Next_Checksum]AND [Change_Data_Indicator] = [Next_Change_Data_Indicator];To use this non-destructively over a result set, keep the inverse condition instead of deleting: retain rows where the checksum or the change data indicator differs from the preceding record (or where there is no predecessor).
Implementation guidelines
Section titled “Implementation guidelines”- Generate the checksum from metadata: the business key and tracked columns, with trimming, conversion,
NULLsentinel, delimiter, and hash algorithm identical to the rest of the solution, so the same definition of “no change” is applied everywhere. - Keep the change data indicator in the comparison: a record whose values match the previous one but whose indicator differs (for example, a logical delete followed by a re-insert of identical values) is a genuine change and must be retained.
- Order strictly by key and timestamp so each record is compared with its true predecessor; add a tie-breaker (such as the inscription record identifier) when timestamps are not unique per key.
- Index the key and timestamp columns supporting the
LAGpartition and order. - For a non-destructive context, apply the inverse filter as a
SELECTrather than aDELETE, so the source remains intact and the condensed set is produced on read.
Considerations and consequences
Section titled “Considerations and consequences”- The
DELETEform is destructive: it removes rows from the stored object in place. Run it only where the object is itself derived and reproducible, or retain the source of truth elsewhere. - It is lossless with respect to change: only rows that represent no change are removed, so the meaningful history is preserved exactly. This is the key difference from functional compacting.
- Correctness depends on a consistent checksum. Differences in trimming, formatting, or
NULLhandling produce false matches (real changes removed) or false differences (redundancy retained). - The window functions pass over the full object; on large data sets, ensure the supporting indexes exist and consider batching the delete.
Related patterns
Section titled “Related patterns”- Solution Pattern - SQL Server Family - Functional Compacting — the complementary approach: reducing a timeline by a rule about time, rather than removing only unchanged rows.
- Design Pattern - Generic - Loading Landing Area Tables Using Record Condensing
- Design Pattern - Generic - Using Checksums for Row Comparison
- Solution Pattern - SQL Server Family - Persistent Staging Area