Usually, in data migration or ETL projects there is a need for comparing source data with that of the destination once the data is loaded. The challenge is in comparing each field from the source with its counterpart on the destination, for each record – and you can have a whole lot of fields.
The query that you would write to do the comparison, would be a large tangle of comparison operators – even for tables of a few columns, increasing in complexity as the number of columns grow.
A simpler solution would be to first convert the source and the destination into tables of just three fields: The Key field, the Field field and the Value field. (In case you have a multi-column key, you would obviously have more than three fields – but the principle is the same)
Consider the following source:
Seq_code | Product_Name | Color | Unit_Price | Category_Name | Weight |
C01 | Crystal parrot | None | 130.00 | ORNAMENT | .75 |
C02 | Silver pirate | Silver | 35.00 | ORNAMENT | 1 |
C03 | Pearl necklace | White | 220.00 | JEWELRY | .35 |
C05 | Ceramic dog | Blue | 15.00 | ORNAMENT | .35 |
and then consider the following destination (assuming that data is already loaded through some sort of ETL process):
Code | Name | Color | Price | Category | Weight |
C01 | Crystal parrot | None | 130.00 | Ornament | 75 |
C02 | Silver pirate | Silver | 35.00 | Ornament | 1 |
C03 | Pearl necklace | White | 220.00 | Jewelry | 35 |
C05 | Ceramic dog | Blue | 15.00 | Ornament | 35 |
The Need
You need to compare these two, in order to pull out any discrepancies.
The Plan
You first mold each of these (i.e. the source and destination) to look like this:
Key | Field | Value |
C01 | Name | Crystal parrot |
C01 | Color | None |
C01 | Price | 130.00 |
C01 | Category | Ornament |
C01 | Weight | 75 |
C02 | Name | Silver pirate |
… | … | … |
C05 | Weight | 35 |
Here, you have the Key field (Code) of each record repeated for each field of that record along with the name of the field and the value of the field.
Before you do the molding you need to ensure that the field names in the original tables are the same. I.e. Seq_code in the source and Code in the destination should be changed to have the same name: Code, maybe.
Hence, the molded source should look like this:
Key | Field | Value |
C01 | Name | Crystal parrot |
C01 | Color | None |
C01 | Price | 130.00 |
C01 | Category | ORNAMENT |
C01 | Weight | .75 |
C02 | Name | Silver pirate |
… | … | … |
C05 | Weight | .35 |
You now do the comparison using INNER JOIN on the Key + Field columns, and a WHERE to filter out the values that do not match. The molding can be done using UNPIVOT, so that there is no need to create any intermediate tables.
The Code
WITH -- Pivot the source data SourcePivot AS ( SELECT [Code], [Field], [Value] FROM ( SELECT [Code] = [Seq_Code], [Name] = [Product_Name], [Color], [Price] = CAST([Unit_Price] AS VARCHAR(20)), -- Data type conversion since all [Category] = [Category_Name], -- fields have to be of the same [Weight] = CAST([Weight] AS VARCHAR(20)) -- data type. FROM Source ) AS Source UNPIVOT ( [Value] FOR [Field] IN ([Name], [Color], [Price], [Category], [Weight]) ) AS UnPvt ), -- Pivot the destination data DestinationPivot AS ( SELECT [Code], [Field], [Value] FROM ( SELECT [Code], [Name], [Color], [Price] = CAST([Price] AS VARCHAR(20)), [Category], [Weight] = CAST(CAST([Weight] AS DECIMAL(8, 2)) AS VARCHAR(20)) FROM Destination ) AS Destination UNPIVOT ( [Value] FOR [Field] IN ([Name], [Color], [Price], [Category], [Weight]) ) AS UnPvt ) -- Query for discrepancies SELECT SP.[Code], SP.[Field], SP.[Value] AS [SourceValue], DP.[Value] AS [DestinationValue] FROM SourcePivot SP INNER JOIN DestinationPivot DP ON SP.[Code] = DP.[Code] AND SP.[Field] = DP.[Field] WHERE SP.[Value] <> DP.[Value] COLLATE SQL_Latin1_General_CP1_CS_AS -- Use a case-sensitive collation on case insensitive -- tables/datbase if case difference is a concern.
The Result
Code | Field | SourceValue | DestinationValue |
C01 | Category | ORNAMENT | Ornament |
C01 | Weight | 0.75 | 75 |
C02 | Category | ORNAMENT | Ornament |
C03 | Category | JEWELRY | Jewelry |
C03 | Weight | 0.35 | 35 |
C05 | Category | ORNAMENT | Ornament |
C05 | Weight | 0.35 | 35 |