Sunday, 30 July 2017 14:46

Testing Type 2 Slowly Changing Dimensions using ETL Validator

Written by
Rate this item
(0 votes)

Type 2 Slowly Changing Dimensions are used in the Data Warehouses for tracking changes to the data by preserving historical values. This is achieved by creating a new record in the dimension whenever a value in the set of key columns is modified and maintaining start and end date for the records. The latest records are either identified by querying for records that are not end dated or by maintaining a flag (eg. current_flg) to easily identify them. 

 

Testing SCD Type 2 Dimensions

Testing SCD Type 2 Dimensions is tricky because it cannot be achieved by a simple comparison of the source and target data.  In this article we will examine different aspects of Type 2 SCD that can be tested using ETL Validator. 

For the sake of this article, lets consider an Employee dimension (EMPLOYEE_D) of SCD Type 2 which is sourced from a table called EMPLOYEE in the source system. 

 

EMPLOYEE Table has the following columns
ROW_ID
EMP_NO
FIRST_NAME
LAST_NAME
SSN
DOB
JOB_TITLE
SALARY

EMPLOYEE_DIM SCD Type 2 Dimension Table has the following columns
ROW_WID
EMP_NO
FIRST_NAME
LAST_NAME
SSN
DOB
JOB_TITLE
SALARY
START_DT
END_DT
CURRENT_FLG

 

Test 1: Verifying the current data

Use a Query Compare test case in ETL Validator to compare the current data records in the SCD Type 2 Employee_Dim with the data in the source Employee table. 

Source Query : select ROW_ID, EMP_NO, FIRST_NAME, LAST_NAME, SSN, DOB, JOB_TITLE, SALARY from EMPLOYEE


Target Query : select ROW_ID, EMP_NO, FIRST_NAME, LAST_NAME, SSN, DOB, JOB_TITLE, SALARY from EMPLOYEE_DIM where CURRENT_FLG = 'Y'

Test 2: Verifying the uniqueness of the key columns in the SCD

The combination of the key columns in the SCD should be Unique. For the above example, the columns EMP_NO, FIRST_NAME, LAST_NAME, SSN, DOB, JOB_TITLE, SALARY comprise of an unique key in the EMPLOYEE_DIM dimension. This can be easily verified using the Duplicate Check Rule in the Data Rules test plan of ETL Validator. The query generated by ETL Validator using the Duplicate Check Rule should be something like below: 

 

 select EMP_NO, FIRST_NAME, LAST_NAME, SSN, DOB, JOB_TITLE, SALARY, COUNT(*) CNT from EMPLOYEE_DIM group by EMP_NO, FIRST_NAME, LAST_NAME, SSN, DOB, JOB_TITLE, SALARY having COUNT(*)>1

 

This query should not return any rows. 

Test 3: Verifying that historical data is preserved and new records are getting created

Whenever there is a change to the values in the key columns a new record should be inserted in the EMPLOYEE_DIM and the old record should be end dated. ETL Validator's Component Test Case can be used to verify this functionality. The Component test case has a feature of Baseline and Compare which can be used to identify the changes in the EMPLOYEE_DIM.  
 
Below are the steps :

  •   Create a Component test case and take a snapshot of the current values in the EMPLOYEE_DIM (called Baseline) 
  •   Modify a few records in the source EMPLOYEE table by updating the values in the key columns such as SALARY, LAST_NAME
  •   Execute the ETL process so the the EMPLOYEE_DIM has the latest data
  •   Run the Component test case to compare the Baseline data with the Result table and identify the differences. Verify that the differences are as expected. 

 

ETL Validator thus provides a complete framework for automating the testing of SCD Type 2 dimensions. 

Read 432 times Last modified on Monday, 31 July 2017 03:32