Background
At one of my clients, back in 2005, we came across a situation where data from Legacy systems needed to be migrated to a new application. Additionally, data was to flow down from the new application to the data warehouse. The requirement for certain data attributes was to translate legacy codes to new codes for the new application, and to translate the new codes back to legacy codes for the data warehouse (to meet the reporting requirements).
For example, let's say Gender in the legacy system had the following values –
- MALE
- FEMALE
The new application required –
- M (instead of MALE)
- F (instead of FEMALE)
The data warehouse required the gender to be the same as in legacy system –
- MALE
- FEMALE
There were a number of such data translation requirements.
Goal
I wanted to come up with a solution that would -
- Avoid hard-coding of these data translations / business rules within the code
- Give an ability to easily maintain (update) the translations if the business rules changed
- Allow 'new' business rules / data translations to be accommodated within the same design
- Make the business rules / data translations available at a centralized place for multiple applications / processes to access them
Solution
I decided to build a translation table within the database (Oracle, at that time; could be any database) that would have the following three primary columns –
- Translation_Key
- Original_Value
- Translated_Value
The data in this table was as follows –
Translation_Key | Original_Value | Translated_Value |
GENDER_CODE_FOR_APP | MALE | M |
GENDER_CODE_FOR_APP | FEMALE | F |
GENDER_CODE_FOR_DW | M | MALE |
GENDER_CODE_FOR_DW | F | FEMALE |
…..and so on… | ….. | ….. |
The translation key identifies a set of translations for a particular attribute.
To access the table, the combination of first two columns (Translation_Key and Original_Value) was used to lookup and retrieve the required data translation (third column – Translated_Value).
To access the table, the combination of first two columns (Translation_Key and Original_Value) was used to lookup and retrieve the required data translation (third column – Translated_Value).
Thus, for the data load to the new application,
Translation_Key = GENDER_CODE_FOR_APP and Original_Value (say, MALE) was used to retrieve the Translated_Value (M in this case).
Similarly, for the data warehouse load,
Translation_Key = GENDER_CODE_FOR_DW and Original_Value (say, F) was used to retrieve the Translated_Value (FEMALE in this case).
We did a one-time exercise initially and populated the translation table with all the required data translations / business rules. It was easy to sit with the business to review and get the necessary approval.
I have used the same solution at two of my most recent clients. This solution has worked very well and would be one my recommended solutions where value-to-value translations are required in an ETL process.
In my next few blog entries, I will discuss how to populate the translation table, and talk about the best approach to implement the solution in an ETL environment.
No comments:
Post a Comment