Avneet Sarang's Blog: Data Translations – A proven efficient solution

Background

At one of my clients, back in 2005, we came across a situation where data from Legacy systems needed to be migrated to a new application. Additionally, data was to flow down from the new application to the data warehouse. The requirement for certain data attributes was to translate legacy codes to new codes for the new application, and to translate the new codes back to legacy codes for the data warehouse (to meet the reporting requirements).

For example, let's say Gender in the legacy system had the following values –

MALE
FEMALE

The new application required –

M (instead of MALE)
F (instead of FEMALE)

The data warehouse required the gender to be the same as in legacy system –

MALE
FEMALE

There were a number of such data translation requirements.

Goal

I wanted to come up with a solution that would -

Avoid hard-coding of these data translations / business rules within the code
Give an ability to easily maintain (update) the translations if the business rules changed
Allow 'new' business rules / data translations to be accommodated within the same design
Make the business rules / data translations available at a centralized place for multiple applications / processes to access them

Solution

I decided to build a translation table within the database (Oracle, at that time; could be any database) that would have the following three primary columns –

Translation_Key
Original_Value
Translated_Value

The translation table had a unique key constraint defined, which was a combination of all the three columns mentioned above. This ensured that no duplicate entries would be created in the table.

The data in this table was as follows –

Translation_Key	Original_Value	Translated_Value
GENDER_CODE_FOR_APP	MALE	M
GENDER_CODE_FOR_APP	FEMALE	F
GENDER_CODE_FOR_DW	M	MALE
GENDER_CODE_FOR_DW	F	FEMALE
…..and so on…	…..	…..

The translation key identifies a set of translations for a particular attribute.

To access the table, the combination of first two columns (Translation_Key and Original_Value) was used to lookup and retrieve the required data translation (third column – Translated_Value).

Thus, for the data load to the new application,

Translation_Key = GENDER_CODE_FOR_APP and Original_Value (say, MALE) was used to retrieve the Translated_Value (M in this case).

Similarly, for the data warehouse load,

Translation_Key = GENDER_CODE_FOR_DW and Original_Value (say, F) was used to retrieve the Translated_Value (FEMALE in this case).

We did a one-time exercise initially and populated the translation table with all the required data translations / business rules. It was easy to sit with the business to review and get the necessary approval.

I have used the same solution at two of my most recent clients. This solution has worked very well and would be one my recommended solutions where value-to-value translations are required in an ETL process.

In my next few blog entries, I will discuss how to populate the translation table, and talk about the best approach to implement the solution in an ETL environment.

Avneet Sarang's Blog

Pages

Thursday, August 19, 2010

Data Translations – A proven efficient solution

No comments:

Post a Comment