Over the last few years, we have been working with customers across industries such as Life Sciences, Financial Services and Higher Education. The problems related to data testing are similar across industries, and JSON and flat files are common everywhere, but there are many differences in the file formats in use. In some industries Avro is popular, while in others, depending on the use case, Parquet is more common.
At Datagaps, one of our key challenges is empowering customers to test any of these formats with ease. We could build a native connector for each format, but that is not always the best option. Instead, we try to leverage open standards and open source frameworks to support our customers. This is where Apache Drill (https://drill.apache.org/) comes into play.
Over the last 6 months, we have been drilling around a bit and absolutely love the speed and flexibility that Apache Drill provides. As of version 3.4.5, we use Drill as the interface between ETL Validator and all of the file formats mentioned above except flat files, which are common enough to deserve native connectors. In this blog, I want to take a few minutes to explain how easy it is to get started with Drill and integrate it with ETL Validator.
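To try Drill on its own before wiring it into ETL Validator, you can send it SQL over its REST endpoint (POST /query.json). The minimal Python sketch below assumes a Drill instance on localhost with its web server on the default port 8047; the file path in the example comment is hypothetical, and this is not how ETL Validator itself connects to Drill.

```python
import json
import urllib.request

# Assumption: Drill runs locally with its web server on the default port 8047.
DRILL_QUERY_URL = "http://localhost:8047/query.json"


def build_drill_request(sql: str, url: str = DRILL_QUERY_URL) -> urllib.request.Request:
    """Build a POST request carrying the SQL in the JSON body Drill's REST API expects."""
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_drill_query(sql: str, url: str = DRILL_QUERY_URL, timeout: float = 30.0) -> list:
    """Send the query and return the "rows" list from Drill's JSON response."""
    with urllib.request.urlopen(build_drill_request(sql, url), timeout=timeout) as resp:
        result = json.loads(resp.read().decode("utf-8"))
    return result.get("rows", [])


# Example (hypothetical path; requires a running Drill instance):
# rows = run_drill_query("SELECT * FROM dfs.`/data/orders.parquet` LIMIT 10")
```

The same query shape works for JSON or Avro files by changing the path, assuming the corresponding format is enabled in the dfs storage plugin.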
Over the last few years, the Salesforce platform has become a major force in the market. The most obvious use case is, of course, CRM. In addition, many organizations have started using the Force.com platform to build and deploy custom applications in the cloud at an incredibly fast pace.
While this journey is truly exciting, there is always an underlying need to test the data and ensure that it is as expected. In this blog, I want to highlight a few use cases and how ETL Validator can help you address them.
Oracle E-Business Suite (EBS) R12 is a significant new version with valuable new features and capabilities. Although there is an upgrade path from EBS 11i to R12, most companies reimplement R12 and migrate the data from their 11i instance. Reimplementation can be a complex project, but it also gives them the opportunity to improve their implementation.
When moving from EBS R12.1 to R12.2, companies generally perform an in-place upgrade. One of our customers was upgrading from EBS R12.1 to R12.2 and wanted to verify that the upgrade did not cause any issues with the data in their data warehouse. While testing the data warehouse and the dashboards can help identify data issues caused by the upgrade, it is also important to test the data in the EBS R12 instance directly from the backend. This type of testing is called database testing.
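One simple backend check for an in-place upgrade is to snapshot row counts for key tables before the upgrade and compare them with a snapshot taken afterwards. The sketch below is an illustration of that idea, not the approach used on the customer project; it works with any DB-API 2.0 connection, and the table names in the example are only examples.

```python
from typing import Dict, Iterable, Optional, Tuple


def row_counts(conn, tables: Iterable[str]) -> Dict[str, int]:
    """Count rows per table through a DB-API 2.0 connection.

    Table names are interpolated into the SQL, so they must come from a
    trusted, hard-coded list (identifiers cannot be bind parameters).
    """
    counts = {}
    cur = conn.cursor()
    for table in tables:
        cur.execute(f"SELECT COUNT(*) FROM {table}")
        counts[table] = cur.fetchone()[0]
    cur.close()
    return counts


def compare_snapshots(
    before: Dict[str, int], after: Dict[str, int]
) -> Dict[str, Tuple[Optional[int], Optional[int]]]:
    """Return {table: (count_before, count_after)} for tables whose counts differ.

    A table present in only one snapshot is reported with None on the missing side.
    """
    return {
        table: (before.get(table), after.get(table))
        for table in sorted(before.keys() | after.keys())
        if before.get(table) != after.get(table)
    }
```

Usage: call row_counts(conn, ["AP_INVOICES_ALL", "GL_JE_LINES"]) before and after the upgrade and pass both results to compare_snapshots; an empty dict means the counts match. Matching counts do not prove the values are unchanged, so this is a first filter rather than a complete test.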
Recently, I stumbled upon a relatively old TDWI article on data migration that explains what it is, how it differs from other integration patterns, popular techniques, and the typical challenges associated with this pattern.
Though the article is more than 5 years old, it still aligns with what we see in the field today. Most customers use ETL technologies for their migration projects, so the problems they encounter are very similar to the ones we see in data warehousing.
At a recent event, one of the attendees came over to our booth and asked me to demonstrate an interesting use case. I got a similar request from another prospect recently and thought it would be a good idea to blog about it and show how it can be done using ETL Validator.
Problem: There is a source table and a target table. The attendee wanted to find the difference in a numeric field between the two tables. He also wanted to specify an acceptable variance and define a rule based on it: if the difference is within the limit, the test case should be marked as a success, and if it exceeds the variance, the test case should be marked as a failure. In just a few minutes, we were able to demonstrate this use case using version 3.4 of ETL Validator.
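The rule itself boils down to a join on the key, an absolute difference, and a threshold. Here is a minimal Python sketch of that logic using an in-memory SQLite database; it is not how ETL Validator implements the rule, and the table names, the id key and the amount column are hypothetical.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical schema: both tables keyed by id with a numeric amount column.
VARIANCE_SQL = """
SELECT s.id, s.amount AS source_amount, t.amount AS target_amount,
       ABS(s.amount - t.amount) AS diff
FROM source_table AS s
JOIN target_table AS t ON t.id = s.id
WHERE ABS(s.amount - t.amount) > ?
ORDER BY s.id
"""


def variance_check(conn: sqlite3.Connection, tolerance: float) -> Tuple[bool, List[tuple]]:
    """Pass if every matched row differs by at most `tolerance`; also return offending rows."""
    cur = conn.cursor()
    cur.execute(VARIANCE_SQL, (tolerance,))
    offending = cur.fetchall()
    cur.close()
    return len(offending) == 0, offending


def build_variance_demo_db() -> sqlite3.Connection:
    """Sample data so the check can be tried without a real source and target."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE source_table (id INTEGER PRIMARY KEY, amount REAL)")
    conn.execute("CREATE TABLE target_table (id INTEGER PRIMARY KEY, amount REAL)")
    conn.executemany("INSERT INTO source_table VALUES (?, ?)",
                     [(1, 100.0), (2, 200.0), (3, 300.0)])
    conn.executemany("INSERT INTO target_table VALUES (?, ?)",
                     [(1, 100.5), (2, 200.0), (3, 310.0)])
    return conn
```

With the demo data, a tolerance of 1.0 fails on id 3 (difference 10.0), while a tolerance of 20.0 passes. Because of the inner join, rows missing from either table are not checked here; a separate count or key comparison would cover that. Comparing SUM(amount) on each side instead would turn this into an aggregate-level variance rule.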
I am not sure if you have ever written Python, but it is one of the coolest languages out there. Easy to learn, easy to write and, yes, easy to read! Writing code in Python is something I enjoy when I want a break from my routine job of putting together slides or preparing for upcoming demos.
Over the last few years, Python usage has grown dramatically, and test automation is one area where it shines. With very few lines of code, you can achieve remarkable things. As an example, some time back I had to compare the data in two CSV files (tens of thousands of rows) and then spit out the differences, and only a handful of lines of Python were needed.
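As a rough illustration of that kind of comparison (not the original script), here is a minimal sketch using the standard csv module. It assumes both files have a header row and share a unique key column, called id in the example; a real comparison might also need to normalize whitespace or number formats before comparing.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple


def load_rows(f: TextIO, key: str) -> Dict[str, dict]:
    """Index CSV rows by the key column (assumed unique; a later duplicate overwrites an earlier one)."""
    return {row[key]: dict(row) for row in csv.DictReader(f)}


def diff_csv(source: TextIO, target: TextIO, key: str) -> Tuple[List[str], List[str], List[str]]:
    """Return keys only in source, keys only in target, and keys whose rows differ."""
    src = load_rows(source, key)
    tgt = load_rows(target, key)
    only_in_source = sorted(src.keys() - tgt.keys())
    only_in_target = sorted(tgt.keys() - src.keys())
    changed = sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k])
    return only_in_source, only_in_target, changed


# Example with in-memory files; for real files use open(path, newline="") instead.
source = io.StringIO("id,name\n1,a\n2,b\n3,c\n")
target = io.StringIO("id,name\n1,a\n2,x\n4,d\n")
print(diff_csv(source, target, "id"))  # (['3'], ['4'], ['2'])
```

Loading each file into a dictionary keyed by the id column makes every lookup a hash lookup, which is comfortably fast for tens of thousands of rows. Keys are compared as strings, so they sort in string order ("10" before "9").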
ETL testing applies to data warehouse and data integration projects, while database testing applies to any database holding data (typically transactional systems). Here are the high-level tests done in each:
ETL Testing: The primary goal is to check that the data moved as expected.
Database Testing: The primary goal is to check that the data follows the rules and standards defined in the data model.
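To make the database-testing side concrete, two common data-model rules are NOT NULL columns and foreign-key relationships. The Python sketch below counts violations of each through a DB-API connection, with SQLite demo data; the customers and orders tables are hypothetical, and these are only two of many rules a data model can define.

```python
import sqlite3


def count_nulls(conn, table: str, column: str) -> int:
    """Rows violating a NOT NULL rule. Identifiers must come from a trusted list."""
    cur = conn.cursor()
    cur.execute(f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL")
    (violations,) = cur.fetchone()
    cur.close()
    return violations


def count_orphans(conn, child: str, fk: str, parent: str, pk: str) -> int:
    """Child rows whose non-null foreign key has no matching parent row."""
    cur = conn.cursor()
    cur.execute(
        f"SELECT COUNT(*) FROM {child} c LEFT JOIN {parent} p ON c.{fk} = p.{pk} "
        f"WHERE c.{fk} IS NOT NULL AND p.{pk} IS NULL"
    )
    (violations,) = cur.fetchone()
    cur.close()
    return violations


def build_model_demo_db() -> sqlite3.Connection:
    """Sample data with one NULL customer name and one order pointing at a missing customer."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
        INSERT INTO customers VALUES (1, 'Acme'), (2, NULL);
        INSERT INTO orders VALUES (10, 1, 5.0), (11, 3, 7.0), (12, NULL, 1.0);
    """)
    return conn
```

On the demo data, count_nulls(conn, "customers", "name") finds one violation and count_orphans(conn, "orders", "customer_id", "customers", "id") finds one orphan (order 11). Order 12 has a NULL customer_id, which the orphan check deliberately skips because a nullable foreign key is a separate rule.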
Copyright © 2016 datagaps inc. All rights reserved.