The 5 Year Difference

Data Warehousing/Data Marting nearly always required the need for ETL. And most of the time it was the ETLing of data to create dimensions. And the one type of dimension I have worked with most is the type now known as Slowly Changing Dimension (this term was not used much before the advent of SQL Server 2005, but still).

The ETL tool that I have been using, Data Transformation Services of SQL Server 2000 was a decent enough tool for the purpose. I was quite content with it until; along came SQL Server 2005 carrying with it DTS’s new form: SSIS (SQL Server Integration Services). SSIS gives you things which could only be dreamt when working with DTS.

Let me now explain this magic through, how I would go about ETLing data from the source to a destination using each of the two technologies. First, DTS and then SSIS.

The source is a SQL Server 2000 table of Products already arranged in a parent-child hierarchy, and looking like this:



Th
e destination for the DTS process will be a SQL Server 2000 table laid out in a structure which would later make up a star schema, like this:


while the destination for the SSIS process will be a SQL Server 2005 table laid out just like the one above, like this:

while the data after the ETL in both cases should look like this:

DTS Steps (SQL Server 2000)

  • Create a new DTS package under Data Transformation Services in SQL Server Enterprise Manager.
  • Create two database connections.
    • One for the source
    • The other for the destination
  • Add a Data Driven Query Task.
  • Write a SQL statement to build a nice result set to transfer, under Source tab.
  • Set the destination
  • Create a Lookup so that we could check whether the record which is being transfered is a new record or an existing record.
  • Write an ActiveX script with logic to check if the record exists (using the Lookup created earlier) in the destination and appropriately call the Insert Query or the Update Query. (Takes quite a long time)
  • Write the Insert Query and the Update Query and set their properties.

SSIS Steps (SQL Server 2005)

  • Create a new Integration Services project in Business Intelligence Development Studio.
  • Create two connection managers for the source and destination.
  • Add a Data Flow Task.
  • Add a Data Flow Source to the task and set the source to it.
  • Add a Slowly Changing Dimension object. (The wizard opens up to unveil the magic!)
  • Write a SQL statement to build a nice result set to transfer.
  • Continue with the wizard and voila!!! No ActiveX scripts, No lookups to worry about, No Insert Queries nor Update Queries!!! – Your ETL is done! Basically the only thing you wrote was the source query , which again can be built using the Query Builder.

So now, I can build my dimensions quickly and that also without writing any code; just a series of clicks and drags!

Advertisements