Populating an Intermediate Fact Table


This is but one scenario where and how we use an intermediate fact table. Intermediate fact tables are used in cases where you have a many-to-many relationship between two dimensions (also known as Many-to-many dimensions). To introduce a simple example:

Imagine a bank. Account holders. Bank accounts. A bank account can be held by multiple account holders (joint account), whereas an account holder may hold multiple accounts. Transactions are usually recorded based on accounts. However, when business users need to analyze the business data they would like to analyze from the Account holder point of view as well, hence the data model may have to be put up to look something like this:

IntermediateFactTable

Check out these links to know more about many-to-many dimensions; The Many to Many Relationship, Many-to-Many dimensions in Analysis Services or try binging it.

My scenario is this. The source system has Groups and Users. Both of them have been identified as dimensions (DimGroup and DimUser). ETLs already created and tested. Since a user can exist in multiple groups and also obviously a group can have multiple users in them, makes this an ideal many-to-many scenario. Something simple and noteworthy in this scenario is that historical tracking is not necessary. Which means I need not store the history of a user belonging to a particular group 3 months ago and now does not.

This is how the data population was done (image contains detailed steps):

PopulatingIntermediateTable

Reasoning: The data to be stored in the target is made up of only two columns that are composite (GroupID and UserID), hence using the slowly changing dimension would not do. The records that come in from the source after getting looked up for the appropriate keys in their respective dimension tables (i.e. DimGroup and DimUser) either shall be inserted if not matched with the destination (i.e. DimGroupDimUser) or be sent on their way (matching records). These matching records are right-outer-joined against records from the destination (i.e. DimGroupDimUser) in order to be deleted (where matching records’ UserID == NULL and matching records’ GroupID == NULL). There are no updates that can happen here. If an update happens at the source; a record will be deleted and another inserted at the destination (This is more or less why the Slowly Changing Dimension cannot be used in this scenario).

Of course, I feel that this could be done in other (maybe even better ways). I would love to hear your comments on this.

 

Leave a comment