Type 2 Slowly Changing Dimensions (SCDs) are used in data warehouses to track changes to data while preserving historical values. This is achieved by inserting a new record into the dimension whenever a value in the set of key columns is modified, and by maintaining start and end dates for each record. The latest records are identified either by querying for records that are not end dated or by maintaining a flag (e.g. current_flg) that makes them easy to find.
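To make the mechanism concrete, here is a minimal Python sketch of how a Type 2 load might apply one source row to an in-memory dimension. It is an illustration only, not how any particular ETL tool works; the START_DATE, END_DATE and CURRENT_FLG column names, the 'Y'/'N' flag values and the convention of end dating the old record with the load date are all assumptions.

```python
from datetime import date
from typing import Dict, List, Optional, Sequence

Row = Dict[str, object]


def apply_scd2_change(
    dim_rows: List[Row],
    source_row: Row,
    natural_key: str,
    tracked_columns: Sequence[str],
    load_date: date,
) -> None:
    """Apply one source row to an in-memory SCD Type 2 dimension (modified in place).

    Assumed layout: START_DATE, END_DATE (None = not end dated) and CURRENT_FLG ('Y'/'N').
    """
    # Find the current version of this entity, if there is one.
    current: Optional[Row] = next(
        (r for r in dim_rows
         if r[natural_key] == source_row[natural_key] and r["CURRENT_FLG"] == "Y"),
        None,
    )
    if current is not None:
        if all(current[c] == source_row[c] for c in tracked_columns):
            return  # no tracked column changed: keep the existing current record
        # A tracked value changed: end date the old version and clear its flag.
        current["END_DATE"] = load_date
        current["CURRENT_FLG"] = "N"
    # Insert the new version (or the first version of a new entity) as current.
    new_row: Row = {natural_key: source_row[natural_key]}
    new_row.update({c: source_row[c] for c in tracked_columns})
    new_row.update(START_DATE=load_date, END_DATE=None, CURRENT_FLG="Y")
    dim_rows.append(new_row)
```

For example, loading an employee and then loading the same employee with a different SALARY leaves two rows: the first end dated with CURRENT_FLG 'N', the second current. Some warehouses instead end date the old row with the day before the load date; that is a one-line change.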
Testing SCD Type 2 dimensions is tricky because it cannot be done by simply comparing the source and target data. In this article, we examine different aspects of Type 2 SCDs that can be tested using ETL Validator.
For the sake of this article, let's consider an SCD Type 2 Employee dimension (EMPLOYEE_DIM) that is sourced from a table called EMPLOYEE in the source system.
The EMPLOYEE table has the following columns:
The EMPLOYEE_DIM SCD Type 2 dimension table has the following columns:
Use a Query Compare test case in ETL Validator to compare the current records in the SCD Type 2 EMPLOYEE_DIM with the data in the source EMPLOYEE table.
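A query comparison essentially runs one query on each side and compares the results. The sketch below approximates that idea with Python's built-in sqlite3 module and two EXCEPT queries; it is not ETL Validator's implementation, and the CURRENT_FLG column with the value 'Y' is an assumption about how the dimension marks current records. Because EXCEPT removes duplicate rows, duplicates are better caught by the separate uniqueness check described next.

```python
import sqlite3
from typing import List, Tuple

# Columns present in both the source EMPLOYEE table and EMPLOYEE_DIM.
COMPARE_COLUMNS = "EMP_NO, FIRST_NAME, LAST_NAME, SSN, DOB, JOB_TITLE, SALARY"


def compare_current_to_source(conn: sqlite3.Connection) -> Tuple[List[tuple], List[tuple]]:
    """Return (rows only in the source, rows only in the current dimension records)."""
    # Assumption: current records are flagged with CURRENT_FLG = 'Y'.
    current_dim = f"SELECT {COMPARE_COLUMNS} FROM EMPLOYEE_DIM WHERE CURRENT_FLG = 'Y'"
    source = f"SELECT {COMPARE_COLUMNS} FROM EMPLOYEE"
    only_in_source = conn.execute(f"{source} EXCEPT {current_dim}").fetchall()
    only_in_dim = conn.execute(f"{current_dim} EXCEPT {source}").fetchall()
    return only_in_source, only_in_dim
```

Both lists should be empty; any row in either one points to a current record that is missing, stale or unexpected.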
The combination of the key columns in the SCD should be unique. In the example above, the columns EMP_NO, FIRST_NAME, LAST_NAME, SSN, DOB, JOB_TITLE and SALARY together form a unique key in the EMPLOYEE_DIM dimension. This can be easily verified using the Duplicate Check Rule in the Data Rules test plan of ETL Validator. The query generated by ETL Validator for the Duplicate Check Rule should look something like the following:
This query should not return any rows.
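A duplicate check of this kind is typically a GROUP BY over the key columns with HAVING COUNT(*) > 1. The Python/sqlite3 sketch below runs such a query; it is an approximation of the idea, not the exact SQL that ETL Validator generates.

```python
import sqlite3
from typing import List

KEY_COLUMNS = "EMP_NO, FIRST_NAME, LAST_NAME, SSN, DOB, JOB_TITLE, SALARY"

# Every combination of key values that appears more than once is a duplicate.
DUPLICATE_CHECK_SQL = f"""
    SELECT {KEY_COLUMNS}, COUNT(*) AS DUP_COUNT
    FROM EMPLOYEE_DIM
    GROUP BY {KEY_COLUMNS}
    HAVING COUNT(*) > 1
"""


def find_duplicate_keys(conn: sqlite3.Connection) -> List[tuple]:
    """Return one row per duplicated key combination; an empty list means the check passed."""
    return conn.execute(DUPLICATE_CHECK_SQL).fetchall()
```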
Whenever the values in the key columns change, a new record should be inserted into EMPLOYEE_DIM and the old record should be end dated. ETL Validator's Component Test Case can be used to verify this behavior. The Component Test Case has a Baseline and Compare feature that can be used to identify the changes in EMPLOYEE_DIM.
Below are the steps:
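Independently of the tool, the core of a baseline-and-compare check can be sketched in Python: compare a snapshot of EMPLOYEE_DIM taken before an ETL run with one taken after it. The surrogate key EMP_KEY, the END_DATE and CURRENT_FLG columns and the 'Y'/'N' values are assumptions about the dimension's layout; the function only illustrates the rules a Type 2 load should satisfy.

```python
from typing import Dict, List

Row = Dict[str, object]
TRACKED = ("FIRST_NAME", "LAST_NAME", "SSN", "DOB", "JOB_TITLE", "SALARY")


def check_scd2_changes(baseline: List[Row], after: List[Row]) -> List[str]:
    """Compare EMPLOYEE_DIM snapshots taken before and after an ETL run.

    Assumes a hypothetical surrogate key EMP_KEY. Returns a list of problems;
    an empty list means the SCD Type 2 behaviour looks right.
    """
    problems: List[str] = []
    after_by_key = {r["EMP_KEY"]: r for r in after}
    after_current: Dict[object, Row] = {}
    for r in after:
        if r["CURRENT_FLG"] == "Y":
            if r["EMP_NO"] in after_current:
                problems.append(f"EMP_NO {r['EMP_NO']}: more than one current record")
            after_current[r["EMP_NO"]] = r

    for old in (r for r in baseline if r["CURRENT_FLG"] == "Y"):
        emp = old["EMP_NO"]
        new = after_current.get(emp)
        if new is None:
            problems.append(f"EMP_NO {emp}: no current record after the load")
        elif new["EMP_KEY"] == old["EMP_KEY"]:
            # Same record still current: tracked values must not be overwritten in place.
            if any(new[c] != old[c] for c in TRACKED):
                problems.append(f"EMP_NO {emp}: tracked values updated in place")
        else:
            # A new version was inserted, so the baseline version must now be end dated.
            old_now = after_by_key.get(old["EMP_KEY"])
            if old_now is None:
                problems.append(f"EMP_NO {emp}: previous version was deleted")
            elif old_now["END_DATE"] is None or old_now["CURRENT_FLG"] != "N":
                problems.append(f"EMP_NO {emp}: previous version was not end dated")
            if all(new[c] == old[c] for c in TRACKED):
                problems.append(f"EMP_NO {emp}: new version without a tracked change")
    return problems
```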
ETL Validator thus provides a complete framework for automating the testing of SCD Type 2 dimensions.
Over the last few years, we have been working with a number of customers across industries such as Life Sciences, Financial Services and Higher Education. While data testing problems are similar across industries, and JSON and flat files are common everywhere, the preferred file formats differ. In some industries Avro is popular, while in others, depending on the use case, Parquet is more common.
At Datagaps, one of our key challenges is enabling customers to test any of these formats with ease. We could build a native connector for each format, but that is not always the best option. Instead, we try to leverage open standards and open source frameworks to support our customers. This is where Apache Drill (https://drill.apache.org/) comes into play.
Over the last six months, we have been drilling around a bit and absolutely love the speed and flexibility that Apache Drill provides. As of version 3.4.5, we use Drill as the interface between ETL Validator and all of the file formats mentioned above (except flat files, which are common enough to deserve native connectors). In this blog, I want to take a few minutes to explain how easy it is to get started with Drill and integrate it with ETL Validator.
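Independently of ETL Validator, a quick way to get a feel for Drill is its REST API: a running drillbit's web server (port 8047 by default) accepts a POST to /query.json with a JSON body containing queryType and query. The sketch below uses only the Python standard library; the localhost URL is an assumption to adjust for your installation.

```python
import json
import urllib.request
from typing import Any, Dict, List

# Assumption: a local drillbit with the web server on its default port.
DRILL_QUERY_URL = "http://localhost:8047/query.json"


def build_drill_request(sql: str, url: str = DRILL_QUERY_URL) -> urllib.request.Request:
    """Build a POST request for Drill's REST query endpoint."""
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def run_drill_query(sql: str, url: str = DRILL_QUERY_URL) -> List[Dict[str, Any]]:
    """Send the query and return the result rows (requires a running Drill instance)."""
    with urllib.request.urlopen(build_drill_request(sql, url)) as response:
        return json.load(response)["rows"]
```

For example, run_drill_query("SELECT * FROM dfs.`/tmp/employees.parquet` LIMIT 10") would read a Parquet file through Drill's dfs storage plugin; the file path here is hypothetical.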
It is nice to see Tableau maintain its leadership position in Gartner's Magic Quadrant for the third consecutive year.
This is important for us at Datagaps because the testing automation needs for various BI platforms follow a similar trend. It is interesting that the traditional BI players do not appear in any of the quadrants, but this picture reflects the number of BI Validator downloads we see on a regular basis. Based on customer demand and our experience supporting other BI platforms, we currently support the following test plans for Tableau:
Business analysts and testing teams can baseline and compare workbook PDF snapshots to identify data and layout differences. This drastically cuts down the time it takes to regression test Tableau workbooks. Similarly, PDF snapshots of individual views can be baselined and compared, identifying both visual and data differences with ease.
Over the last few years, the Salesforce platform has become an incredible force in the market for various reasons. The most obvious use case is, of course, its CRM capabilities. In addition, many organizations have started using the power of the Force.com platform to build and deploy custom applications in the cloud at an incredibly fast pace.
While this journey is truly exciting, there is always an underlying need to test the data and ensure that it is as expected. In this blog, I highlight a few use cases and how ETL Validator can help you address them.
Oracle E-Business Suite (EBS) R12 is a significant new version with valuable new features and capabilities. Although there is an upgrade path from EBS 11i to R12, most companies reimplement R12 and migrate the data from their 11i instance. Reimplementation can be a complex project, but it also gives companies the opportunity to improve their implementation.
When moving from EBS R12.1 to R12.2, companies generally perform an in-place upgrade. One of our customers was upgrading from EBS R12.1 to R12.2 and wanted to verify that the upgrade did not cause any issues with the data in their data warehouse. While testing the data warehouse and the dashboards can help identify data issues caused by the upgrade, it is also important to test the data in the EBS R12 instance from the backend. This type of testing is called database testing.
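As a minimal illustration of backend database testing around an in-place upgrade, the Python sketch below captures per-table row counts before and after the upgrade and reports any table whose count changed or that disappeared. It assumes a DB-API 2.0 connection, a trusted list of tables to check, and no business activity between the two snapshots. Row counts only catch gross problems, so real database tests usually add column-level comparisons as well.

```python
from typing import Dict, Iterable, Optional, Tuple


def capture_row_counts(conn, tables: Iterable[str]) -> Dict[str, int]:
    """Count rows in each table using any DB-API 2.0 connection.

    Table names are interpolated into SQL, so they must come from a trusted list.
    """
    counts = {}
    cursor = conn.cursor()
    for table in tables:
        cursor.execute(f"SELECT COUNT(*) FROM {table}")
        counts[table] = cursor.fetchone()[0]
    cursor.close()
    return counts


def compare_row_counts(
    before: Dict[str, int], after: Dict[str, int]
) -> Dict[str, Tuple[Optional[int], Optional[int]]]:
    """Return {table: (count_before, count_after)} for every table that differs."""
    return {
        table: (before.get(table), after.get(table))
        for table in sorted(before.keys() | after.keys())
        if before.get(table) != after.get(table)
    }
```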
Subject Areas in Oracle OBI are great for business users. They hide the complexity involved in warehousing projects and provide an easy-to-use mechanism for creating ad hoc analyses based on business users' needs.
From an IT perspective, testing subject areas is fairly complex. Every subject area is really a grouping of dimension folders (with attributes) and measures, also known as facts. The dimension folders may contain hundreds of attributes and are connected to fact tables via foreign keys, an arrangement commonly referred to as a star schema. Engineers, business analysts and quality assurance teams struggle to ensure that business users do not encounter unpleasant surprises in the form of SQL errors when they create and execute analyses from the subject areas.
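One way to approximate this kind of check outside the BI tool is to generate a simple aggregate query for every (dimension attribute, measure) pair in the star schema and record which ones fail. The Python sketch below does that against physical tables with sqlite3. Real OBIEE analyses go through the BI Server's logical layer, so this is only an analogy, and any table or column names you pass in are your own.

```python
import sqlite3
from itertools import product
from typing import Dict, List, Sequence, Tuple


def smoke_test_star_schema(
    conn: sqlite3.Connection,
    fact: str,
    measures: Sequence[str],
    dimensions: Dict[str, Tuple[str, Sequence[str]]],
) -> List[Tuple[str, str]]:
    """Run one aggregate query per (dimension attribute, measure) pair.

    `dimensions` maps a dimension table to (join key, attributes). Returns
    (sql, error message) for every query that failed.
    """
    failures = []
    for (dim, (key, attributes)), measure in product(dimensions.items(), measures):
        for attr in attributes:
            sql = (
                f"SELECT d.{attr}, SUM(f.{measure}) FROM {fact} f "
                f"JOIN {dim} d ON f.{key} = d.{key} GROUP BY d.{attr}"
            )
            try:
                conn.execute(sql).fetchall()
            except sqlite3.Error as exc:
                failures.append((sql, str(exc)))
    return failures
```

The number of queries grows as attributes times measures, so for large subject areas it can help to sample attributes or run the queries in parallel.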
Every few years, BI vendors ship a major release that forces the companies using their products to undertake a complex BI upgrade project, for example an OBIEE 11g to 12c upgrade or a Business Objects 4.1 to 4.2 upgrade. Testing is an important aspect of these upgrade projects. This blog explains how customers can leverage BI Validator to test such upgrades.
1. Upgrade Test Plan: This can be used to compare reports across the pre- and post-upgrade environments and ensure that they match as expected. If BI Validator identifies any differences in the data sets, it marks the test plan with a "Warning" status.
Note: Since the UI may look very different between versions (e.g. OBIEE 11g and 12c), compare the report data rather than using the PDF option.
2. As part of the above test plan, BI Validator can also ensure that the reports are found in the post-upgrade environment and that there is no degradation in performance.
3. The Stress Test Plan can simulate concurrent loads (e.g. 10, 100 or 500 concurrent users) on reports and dashboards to ensure that performance does not degrade when more users access the BI system.
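The comparison behind items 1 and 2 can be sketched in a few lines of Python. Here run_pre and run_post are hypothetical callables that fetch a report's rows from the pre- and post-upgrade environments and return None when the report is missing. Rows are compared as multisets so that row order does not matter; treating a slowdown as a "Warning", a missing report as a "Failure", and the default slowdown thresholds are all assumptions, not BI Validator's logic.

```python
import time
from collections import Counter
from typing import Callable, List, Optional, Sequence, Tuple

Rows = Optional[Sequence[Sequence[object]]]


def _timed(fetch: Callable[[], Rows]) -> Tuple[Rows, float]:
    """Call fetch() and return its result together with the elapsed seconds."""
    start = time.perf_counter()
    rows = fetch()
    return rows, time.perf_counter() - start


def compare_report(
    run_pre: Callable[[], Rows],
    run_post: Callable[[], Rows],
    max_slowdown: float = 1.5,
    min_extra_secs: float = 1.0,
) -> Tuple[str, List[str]]:
    """Compare one report between the pre- and post-upgrade environments."""
    pre_rows, pre_secs = _timed(run_pre)
    post_rows, post_secs = _timed(run_post)
    if post_rows is None:
        return "Failure", ["report not found after the upgrade"]
    details = []
    # Compare as multisets so that a different row order is not reported as a difference.
    if Counter(map(tuple, pre_rows or [])) != Counter(map(tuple, post_rows)):
        details.append("data differs")
    # Flag a slowdown only if it is both relatively and absolutely significant.
    if post_secs > max(pre_secs * max_slowdown, pre_secs + min_extra_secs):
        details.append(f"slower: {pre_secs:.2f}s before, {post_secs:.2f}s after")
    return ("Warning" if details else "Success"), details
```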
All of the above can be achieved with zero programming in just a couple of clicks. Try BI Validator now and see in a few minutes how it can help you during your upgrades!
If you are also upgrading your ETL as part of the upgrade, you may want to look at ETL Validator as well.
Recently, I stumbled upon a relatively old TDWI article on data migration that explains what it is, how it differs from other integration patterns, the popular techniques, and the typical challenges associated with this pattern.
Although the article is more than five years old, it still aligns with what we see in the field today. Most customers use ETL technologies for their migration projects, so the problems they encounter are very similar to the ones we see in data warehousing.
One of our customers is planning to roll out new BI functionality to 2,000 additional users. They were concerned about whether their current BI infrastructure could support this kind of load, so they asked us to perform stress testing using BI Validator with the following two key objectives:
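The basic shape of such a stress test can be sketched with Python's concurrent.futures: a pool of threads, each standing in for one user, repeatedly runs a report and records response times and errors. run_report is a hypothetical callable (for example, one that requests a dashboard over HTTP). A real load test would also ramp users up gradually and run from machines other than the BI server.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean
from typing import Callable, Dict, List, Tuple


def stress_test(run_report: Callable[[], object], users: int, iterations: int = 1) -> Dict[str, float]:
    """Simulate `users` concurrent users, each running the report `iterations` times."""

    def one_user(_user_id: int) -> Tuple[List[float], int]:
        timings, errors = [], 0
        for _ in range(iterations):
            start = time.perf_counter()
            try:
                run_report()
            except Exception:
                errors += 1  # count failures instead of aborting the whole test
            timings.append(time.perf_counter() - start)
        return timings, errors

    with ThreadPoolExecutor(max_workers=users) as pool:
        results = list(pool.map(one_user, range(users)))

    timings = [t for user_timings, _ in results for t in user_timings]
    return {
        "requests": len(timings),
        "errors": sum(errors for _, errors in results),
        "avg_secs": mean(timings),
        "max_secs": max(timings),
    }
```

Running it with increasing values of users (for example 10, 100, 500) and comparing avg_secs and errors shows where response times start to degrade.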
At a recent event, one of the attendees came over to our booth and asked me to demonstrate an interesting use case. I received a similar request from another prospect recently and thought it would be a good idea to blog about it and show how it can be done using ETL Validator.
Problem: There is a source table and a target table. The attendee wanted to find the difference in a numeric field between the two tables. He also wanted to specify an acceptable variance and define a rule on it: if the difference is within the limit, the test case should be marked as a success; if the difference exceeds the variance, it should be marked as a failure. In just a few minutes, we were able to demonstrate this use case using version 3.4 of ETL Validator.
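The rule boils down to comparing each value across the two tables and checking the difference against a tolerance. The Python sketch below assumes the values have already been fetched into dictionaries keyed by a shared key column (with sortable keys) and uses an absolute tolerance; a percentage-based variance would be a small change. It illustrates the logic, not how ETL Validator implements the rule.

```python
from typing import Dict, Hashable, List, Tuple


def variance_check(
    source: Dict[Hashable, float], target: Dict[Hashable, float], tolerance: float
) -> Tuple[str, List[Tuple[Hashable, object]]]:
    """Return ("Success", []) if every value matches within `tolerance`,
    otherwise ("Failure", [(key, difference or "missing"), ...])."""
    failures = []
    for key in sorted(source.keys() | target.keys()):
        if key not in source or key not in target:
            failures.append((key, "missing"))  # row present on only one side
            continue
        difference = abs(source[key] - target[key])
        if difference > tolerance:
            failures.append((key, difference))
    return ("Failure" if failures else "Success"), failures
```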
As most of us in the BI community are aware, in-place upgrades are fairly common when clients go through minor upgrades (e.g. an OBIEE minor version upgrade, or Business Objects 4.1 SP5 to 4.1 SP7). However, there are a number of risks that need to be carefully addressed in such scenarios. Below are a few examples of what can go wrong and how you can leverage BI Validator to validate in-place upgrades.
I am not sure if you have ever written Python, but it is one of the coolest languages out there: easy to learn, easy to write and, yes, easy to read! Writing Python code is something I enjoy when I want a break from my routine of putting together slides or preparing for upcoming demos.
Over the last few years, Python usage has grown dramatically, and testing automation is one such area. With very few lines of code, you can achieve remarkable things. For example, some time back I had to compare the data in two CSV files (tens of thousands of rows) and then spit out the differences. The code looked somewhat like this:
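For illustration, here is a small sketch of that kind of comparison (not the author's original code). It assumes both files have a header row and a key column whose values uniquely identify each row; the key column name is a parameter you supply.

```python
import csv
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, str]


def diff_csv_rows(
    lines_a: Iterable[str], lines_b: Iterable[str], key_column: str
) -> Tuple[List[str], List[str], Dict[str, Tuple[Row, Row]]]:
    """Compare two CSV inputs keyed on `key_column` (assumed unique in each file)."""
    a = {row[key_column]: row for row in csv.DictReader(lines_a)}
    b = {row[key_column]: row for row in csv.DictReader(lines_b)}
    only_in_a = sorted(a.keys() - b.keys())
    only_in_b = sorted(b.keys() - a.keys())
    changed = {k: (a[k], b[k]) for k in sorted(a.keys() & b.keys()) if a[k] != b[k]}
    return only_in_a, only_in_b, changed


def diff_csv_files(path_a: str, path_b: str, key_column: str):
    """Open both files (newline="" as the csv module recommends) and compare them."""
    with open(path_a, newline="") as fa, open(path_b, newline="") as fb:
        return diff_csv_rows(fa, fb, key_column)
```

Usage might look like only_a, only_b, changed = diff_csv_files("before.csv", "after.csv", "id"), where the file names and the "id" column are hypothetical. Loading both files into dictionaries keeps the comparison linear in the number of rows, which is comfortable for tens of thousands of rows.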
ETL testing applies to data warehouse or data integration projects, while database testing applies to any database holding data (typically transactional systems). Here are the high-level tests done in each:
ETL Testing: The primary goal is to check whether the data moved as expected.
Database Testing: The primary goal is to check whether the data follows the rules/standards defined in the data model.
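To illustrate the database-testing side, the Python sketch below runs rule queries against a sqlite3 database, where each query returns the rows that violate a rule. The DEPARTMENT table, the DEPT_NO foreign key and the specific rules are hypothetical examples of constraints a data model might define.

```python
import sqlite3
from typing import Dict, List

# Hypothetical rules for an EMPLOYEE table with a DEPT_NO foreign key to DEPARTMENT.
# Each query returns the rows that violate the rule.
EXAMPLE_RULES: Dict[str, str] = {
    "SSN is mandatory": "SELECT EMP_NO FROM EMPLOYEE WHERE SSN IS NULL",
    "SSN is unique": (
        "SELECT SSN FROM EMPLOYEE WHERE SSN IS NOT NULL "
        "GROUP BY SSN HAVING COUNT(*) > 1"
    ),
    "DEPT_NO references DEPARTMENT": (
        "SELECT e.EMP_NO FROM EMPLOYEE e "
        "LEFT JOIN DEPARTMENT d ON e.DEPT_NO = d.DEPT_NO "
        "WHERE e.DEPT_NO IS NOT NULL AND d.DEPT_NO IS NULL"
    ),
}


def run_rule_checks(conn: sqlite3.Connection, rules: Dict[str, str]) -> Dict[str, List[tuple]]:
    """Run each rule query and return {rule: violating rows} for the rules that failed."""
    failures = {}
    for rule, sql in rules.items():
        rows = conn.execute(sql).fetchall()
        if rows:
            failures[rule] = rows
    return failures
```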
Copyright © 2016 datagaps inc. All rights reserved.