Over the last few years, we have worked with customers across a range of industries, such as Life Sciences, Financial Services, and Higher Education. The data testing problems are similar from one industry to the next, and JSON and flat files are common everywhere, but the file formats themselves vary a great deal. Avro is popular in some industries, while in others Parquet is more common, depending on the use case.
At Datagaps, one of our key challenges is enabling customers to test any of these formats with ease. We could, of course, build a native connector for each format, but that is not always the best approach. Wherever we can, we leverage open standards and open-source frameworks to support our customers. This is where Apache Drill (https://drill.apache.org/) comes into play.
Over the last 6 months, we have been drilling around a bit, and we love the speed and flexibility that Apache Drill provides. Starting with ETL Validator version 3.4.5, we use Drill as the interface between ETL Validator and the file formats mentioned above. The exception is flat files, which are common enough to deserve native connectors. In this post, I want to take a few minutes to explain how easy it is to get started with Drill and integrate it with ETL Validator.