
Wednesday, August 25, 2010

Using conditions as parameters in Filter Transformation (Informatica)

This post might be a very basic one. 

I was in a discussion with a colleague, translating requirements into a design.

The requirement was to build flexible functionality into our code to include or exclude records based on a threshold value.

For example, the source (a flat file) has the following records –

ID    Value
1     0.3
2     0.5
3     0.6
4     1.0

Based on a business decision at a later stage on the project, the requirement could be any of the following –

Req1: Retrieve all the records that have a Value >= 0.5.
OR
Req2: Retrieve only the records that have a Value > 0.5 (i.e., exclude records where Value = 0.5)

The condition on the attribute Value needed to be parameterized.

Challenge
In a Filter transformation, unlike in a Source Qualifier transformation, you cannot use a mapping parameter to supply the comparison operator itself.

In other words, you cannot define a parameter such as –
$$Condition = >= 0.5
and use the following condition in the Filter transformation –
Value $$Condition

You will get an error:
<< PM Parse Error >> missing operator
Value >>>> <<<< $$Condition

Solution
The workaround is to use two parameters.

$$Condition1 (Datatype = double)
$$Condition2 (Datatype = double)

In the Filter transformation, write the following condition –
Value = $$Condition1 OR Value > $$Condition2

So, if the decision is to go with Req1 (retrieve all records with a Value >= 0.5), then assign the following parameter values:
$$Condition1 = 0.5
$$Condition2 = 0.5
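
For reference, here is a minimal parameter file sketch. The folder, workflow, and session names (Dev_Folder, wf_load, s_m_load) are hypothetical placeholders, and both parameters are assumed to be declared in the mapping under Mappings > Parameters and Variables with datatype double –

[Dev_Folder.WF:wf_load.ST:s_m_load]
$$Condition1=0.5
$$Condition2=0.5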

The resultant condition in the filter transformation would be:
Value = 0.5 OR Value > 0.5
which is logically equivalent to Value >= 0.5.

The output will be –

ID    Value
2     0.5
3     0.6
4     1.0


If the decision is to go with Req2 (retrieve only records with a Value > 0.5), then assign:
$$Condition1 = -999 (or any value that does not occur in the source data)
$$Condition2 = 0.5
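
Using the same hypothetical folder, workflow, and session names as in the earlier sketch, the parameter file entries would be –

[Dev_Folder.WF:wf_load.ST:s_m_load]
$$Condition1=-999
$$Condition2=0.5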

The resultant condition in the filter transformation would be –
Value = -999 OR Value > 0.5
Since no record has a Value of -999, this effectively reduces to Value > 0.5.

The output will be –

ID    Value
3     0.6
4     1.0


Thursday, August 19, 2010

Data Translations – A proven, efficient solution

Background
At one of my clients, back in 2005, we came across a situation where data from legacy systems needed to be migrated to a new application. Additionally, data was to flow down from the new application to the data warehouse. For certain data attributes, the requirement was to translate legacy codes to new codes for the new application, and to translate the new codes back to legacy codes for the data warehouse (to meet the reporting requirements).

For example, let's say Gender in the legacy system had the following values –
  1. MALE
  2. FEMALE
The new application required –
  1. M (instead of MALE)
  2. F (instead of FEMALE)
The data warehouse required the gender to be the same as in legacy system –
  1. MALE
  2. FEMALE
There were a number of such data translation requirements.

Goal
I wanted to come up with a solution that would –
  • Avoid hard-coding these data translations / business rules within the code
  • Make it easy to maintain (update) the translations if the business rules changed
  • Allow new business rules / data translations to be accommodated within the same design
  • Make the business rules / data translations available in a centralized place for multiple applications / processes to access

Solution
I decided to build a translation table within the database (Oracle, at that time; could be any database) that would have the following three primary columns –
  1. Translation_Key
  2. Original_Value
  3. Translated_Value
The translation table had a unique key constraint defined on the combination of all three columns. This ensured that no duplicate entries could be created in the table.
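
As a minimal Oracle DDL sketch (the table name, column sizes, and constraint name are assumptions; the three columns and the unique key come from the design above) –

CREATE TABLE translation (
    translation_key    VARCHAR2(50)   NOT NULL,  -- identifies a set of translations for one attribute
    original_value     VARCHAR2(100)  NOT NULL,  -- the incoming code
    translated_value   VARCHAR2(100)  NOT NULL,  -- the code it maps to
    CONSTRAINT uq_translation
        UNIQUE (translation_key, original_value, translated_value)
);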

The data in this table was as follows –

Translation_Key        Original_Value    Translated_Value
GENDER_CODE_FOR_APP    MALE              M
GENDER_CODE_FOR_APP    FEMALE            F
GENDER_CODE_FOR_DW     M                 MALE
GENDER_CODE_FOR_DW     F                 FEMALE
...and so on...

The translation key identifies a set of translations for a particular attribute.


To access the table, the combination of the first two columns (Translation_Key and Original_Value) was used to look up and retrieve the required data translation (the third column, Translated_Value).

Thus, for the data load to the new application,
Translation_Key = GENDER_CODE_FOR_APP and Original_Value (say, MALE) was used to retrieve the Translated_Value (M in this case).

Similarly, for the data warehouse load,
Translation_Key = GENDER_CODE_FOR_DW and Original_Value (say, F) was used to retrieve the Translated_Value (FEMALE in this case).
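
In SQL terms, each lookup is a simple keyed select. A sketch against the table above (literal values shown inline for clarity) –

SELECT translated_value
  FROM translation
 WHERE translation_key = 'GENDER_CODE_FOR_DW'
   AND original_value  = 'F';
-- returns: FEMALE

In an ETL tool such as Informatica, this would typically be a Lookup transformation with the lookup condition on Translation_Key and Original_Value.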


We did a one-time exercise initially and populated the translation table with all the required data translations / business rules. This made it easy to sit with the business, review the rules, and get the necessary approvals.
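
For illustration, that initial seed could be as simple as a reviewed script of inserts. A sketch using the sample rows above; the unique key constraint rejects any accidental duplicates –

INSERT INTO translation VALUES ('GENDER_CODE_FOR_APP', 'MALE',   'M');
INSERT INTO translation VALUES ('GENDER_CODE_FOR_APP', 'FEMALE', 'F');
INSERT INTO translation VALUES ('GENDER_CODE_FOR_DW',  'M',      'MALE');
INSERT INTO translation VALUES ('GENDER_CODE_FOR_DW',  'F',      'FEMALE');
COMMIT;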

I have used the same solution at two of my most recent clients. It has worked very well and would be one of my recommended solutions wherever value-to-value translations are required in an ETL process.

In my next few blog entries, I will discuss how to populate the translation table, and talk about the best approach to implement the solution in an ETL environment.