Comparing Results using T-SQL

Usually, in data migration or ETL projects there is a need for comparing source data with that of the destination once the data is loaded. The challenge is in comparing each field from the source with its counterpart on the destination, for each record – and you can have a whole lot of fields.

The query that you would write to do the comparison, would be a large tangle of comparison operators – even for tables of a few columns, increasing in complexity as the number of columns grow.

A simpler solution would be to first convert the source and the destination into tables of just three fields: The Key field, the Field field and the Value field. (In case you have a multi-column key, you would obviously have more than three fields – but the principle is the same)

Consider the following source:

Table 1: Source
Seq_code Product_Name Color Unit_Price Category_Name Weight
C01 Crystal parrot None 130.00 ORNAMENT .75
C02 Silver pirate Silver 35.00 ORNAMENT 1
C03 Pearl necklace White 220.00 JEWELRY .35
C05 Ceramic dog Blue 15.00 ORNAMENT .35

and then consider the following destination (assuming that data is already loaded through some sort of ETL process):

Table 2: Destination
Code Name Color Price Category Weight
C01 Crystal parrot None 130.00 Ornament 75
C02 Silver pirate Silver 35.00 Ornament 1
C03 Pearl necklace White 220.00 Jewelry 35
C05 Ceramic dog Blue 15.00 Ornament 35

The Need

You need to compare these two, in order to pull out any discrepancies.

The Plan

You first mold each of these (i.e. the source and destination) to look like this:

Table 3: Pivoted Destination
Key Field Value
C01 Name Crystal parrot
C01 Color None
C01 Price 130.00
C01 Category Ornament
C01 Weight 75
C02 Name Silver pirate
C05 Weight 35

Here, you have the Key field (Code) of each record repeated for each field of that record along with the name of the field and the value of the field.

Before you do the molding you need to ensure that the field names in the original tables are the same. I.e. Seq_code in the source and Code in the destination should be changed to have the same name: Code, maybe.

Hence, the molded source should look like this:

Table 4: Pivoted Source
Key Field Value
C01 Name Crystal parrot
C01 Color None
C01 Price 130.00
C01 Category ORNAMENT
C01 Weight .75
C02 Name Silver pirate
C05 Weight .35

You now do the comparison using INNER JOIN on the Key + Field columns, and a WHERE to filter out the values that do not match. The molding can be done using UNPIVOT, so that there is no need to create any intermediate tables.

The Code

WITH
-- Pivot the source data
SourcePivot
AS
(
	SELECT	[Code],
			[Field],
			[Value]
	FROM	(
				SELECT	[Code] =		[Seq_Code],
						[Name] =		[Product_Name],
						[Color],
						[Price] =		CAST([Unit_Price] AS VARCHAR(20)),	-- Data type conversion since all
						[Category] =	[Category_Name],					-- fields have to be of the same
						[Weight] =		CAST([Weight] AS VARCHAR(20))		-- data type.
				FROM	Source
			) AS Source
			UNPIVOT
			(
				[Value] FOR [Field] IN ([Name], [Color], [Price], [Category], [Weight])
			) AS UnPvt
),

-- Pivot the destination data
DestinationPivot
AS
(
	SELECT	[Code],
			[Field],
			[Value]
	FROM	(
				SELECT	[Code],
						[Name],
						[Color],
						[Price] =	CAST([Price] AS VARCHAR(20)),
						[Category],
						[Weight] =	CAST(CAST([Weight] AS DECIMAL(8, 2)) AS VARCHAR(20))
				FROM	Destination
			) AS Destination
			UNPIVOT
			(
				[Value] FOR [Field] IN ([Name], [Color], [Price], [Category], [Weight])
			) AS UnPvt
)

-- Query for discrepancies
SELECT	SP.[Code],
		SP.[Field],
		SP.[Value] AS [SourceValue],
		DP.[Value] AS [DestinationValue]
FROM	SourcePivot SP
		INNER JOIN DestinationPivot DP
			ON SP.[Code] = DP.[Code]
			AND SP.[Field] = DP.[Field]
WHERE	SP.[Value] <> DP.[Value] COLLATE SQL_Latin1_General_CP1_CS_AS -- Use a case-sensitive collation on case insensitive
																	  -- tables/datbase if case difference is a concern.

The Result

Table 5: Comparison Result
Code Field SourceValue DestinationValue
C01 Category ORNAMENT Ornament
C01 Weight 0.75 75
C02 Category ORNAMENT Ornament
C03 Category JEWELRY Jewelry
C03 Weight 0.35 35
C05 Category ORNAMENT Ornament
C05 Weight 0.35 35

<<Download sample code>>

Advertisements

Unpivot Transformation with Multiple Destination Columns

The need to unpivot data during an ETL operation is quite common. One method to perform the unpivoting is writing the source query in such a way that the data starts off unpivoted using the UNPIVOT T-SQL statement in SQL Server 2005 and later. This however would obviously not work out on a source that does not support unpivoting such as a text file or Office Excel or even older versions of popular RDBMSs. The other method of course is to use the Unpivot Transformation in Integration Services.

The Unpivot Transformation is quite a simple component, especially if you need to unpivot a bunch of columns to a single destination column. Unpivoting to multple destination columns however, seems a tad challenging the first time round, at least for me it did. Until I figured out how to. This post is all about unpivoting a columns to multiple destination columns.

Consider this scenario. A simple Product table (which I shall call Product_Source this point forward):

Source Table (Product_Source)

Needs to be transformed into the following Product table (which I shall call Product_Destination this point forward):

Destination Table (Product_Destination)

Configuring the Unpivot transformation for this would be quite simple, and would look like this:

Unpivot Transformation for Single Destinations

The problem starts when you need the Product_Destination table looking like this, with multiple destination columns (Quantity and Price):

Destination Table (Product_Destination) with Multiple Destination Columns

You add the price columns as well (as in the image below) to the Unpivot transformation,

Unpivot Transformation for Multiple Destination Columns

and get the following validation error (encountered by a lot of people):

PivotKeyValue is not valid. In an UnPivot transform with more than one unpivoted DestinationColumn, the set of PivotKeyValues per destination must match exactly.

The reason is this. If you look closely at the Pivot Key Value column of the Unpivot transformation you would notice that for the Quantity destination column, the set of values are ‘Quantity1’, ‘Quantity2’ and ‘Quantity3’. Whereas, for the Price destination column, the set of values are ‘Price1’, ‘Price2’ and ‘Price3’, which are clearly not matching. The solution to the problem here is quite trivial. Change the default values of the Pivot Key Value column of the offending destination column to match the values of the other destination columns. Like this:

Mapping for Unpivot Transformation for Multiple Destination Columns

And you’re good to go!

I’ve attached a sample solution of this scenario, so that you too could try it out first hand.