Sunday, October 16, 2005

How to handle deleted records from a source system in a data warehouse?

Question from comp.databases.olap USENET group:

I would like to see if someone can share their experience in handling deleted records from legacy source system in data warehouse. I don't see much coverage on this topic in Kimball literatures on how to handle this in ETL and model design. Did I miss something?

My response:

Hi Doug,

This is an excellent question. In my experience, the event of deleting records can be a very valuable source of information about a business process, and thus, it is very useful to capture the event in the data warehouse. I have typically handled this by adding a DELETED column to the fact or dimension table that stores a value of Y or N (or 0 or 1) for deleted versus valid records.

Then, from the end-user tool, during query execution, you can modify your query criteria to check for records that are marked deleted versus valid. To ensure adequate performance, make sure that the DELETED column is indexed using the appropriate technique for your database. In Oracle DBs, low cardinality columns like this are usually retrieved most efficiently by using bitmap indexes.

Note that for auditing purposes, this method works very well, because the underlying integrity of the systems are not challenged by using flags to mark deleted records. It is entirely possible to see the deleted records with the appropriate query.

Good Luck,

Nenshad

1 comment:

  1. Explicit deletes are fairly easy to handle with a deleted flag, how do you propose to handle updates that replaces (ie deletes the old value) data.

    ReplyDelete