Data Quality Testing
The purpose of Data Quality tests is to verify the accuracy and quality of the data. Data profiling is generally used to identify data quality issues in production systems once the application has been live for some time; the goal of database testing, by contrast, is to automate these data quality checks during the testing phase.
Duplicate Data Checks
Look for duplicate rows that share the same unique key column or the same unique combination of columns, as defined by the business requirement.
Example: The business requirement states that the combination of First Name, Last Name, Middle Name and Date of Birth should be unique.
Sample query to identify duplicates
SELECT fst_name, lst_name, mid_name, date_of_birth, count(*) FROM Customer GROUP BY fst_name, lst_name, mid_name, date_of_birth HAVING count(*) > 1
Data Validation Rules
Many database fields can contain a range of values that cannot be enumerated. However, there are reasonable constraints or rules that can be applied to detect situations where the data is clearly wrong. Instances of fields containing values that violate the defined validation rules represent a quality gap that can impact ETL processing.
Example: Date of birth (DOB). This is defined with the DATE datatype and can hold any valid date. However, a DOB in the future, or more than 100 years in the past, is probably invalid. Also, the date of birth of a child should not be earlier than that of their parents.
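A sample query for the first two rules, assuming an Oracle database and a hypothetical customer_id column on the Customer table used above:
SELECT customer_id, date_of_birth FROM Customer WHERE date_of_birth > SYSDATE OR date_of_birth < ADD_MONTHS(SYSDATE, -1200)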
Data Integrity Checks
This measurement addresses "keyed" relationships of entities within a domain. The goal of these checks is to identify orphan records in the child entity whose foreign key has no matching record in the parent entity.
1. Count of records with null foreign key values in the child table
2. Count of invalid foreign key values in the child table that do not have a corresponding primary key in the parent table
Example: In an ERP scenario, the Order Line table has a foreign key to the Order Header table. Check for orphan Order Line records without a corresponding Order Header record.
1. Count of nulls in the Order Header foreign key column in the Order Line table:
SELECT count(*) FROM order_lines WHERE order_header_id IS NULL
2. Invalid foreign key values in the Order Line table that have no matching Order Header record:
SELECT order_header_id FROM order_lines
MINUS
SELECT order_header_id FROM order_header
Automate data quality testing using ETL Validator
ETL Validator comes with a Data Rules Test Plan and a Foreign Key Test Plan for automating data quality testing.
Data Rules Test Plan: Define data rules and execute them on a periodic basis to check for data that violates them.
Foreign Key Test Plan: Define data joins and identify data integrity issues without writing any SQL queries.
Reference Data Testing
Many database fields can only contain a limited set of enumerated values. Instances of fields containing values not found in the valid set represent a quality gap that can impact processing.
Verify that data conforms to reference data standards
Data model standards dictate that the values in certain columns should adhere to the values defined in a domain.
Example: Values in the country_code column should have a valid country code from a Country Code domain.
SELECT DISTINCT country_code FROM address
MINUS
SELECT country_code FROM country
Compare domain values across environments
One of the challenges in maintaining reference data is verifying that all the reference data values from the development environment have been migrated properly to the test and production environments.
Example: Compare Country Codes between development, test and production environments.
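A sample query, assuming an Oracle database and a hypothetical database link named test_env pointing to the test environment, that lists country codes present in development but missing in test:
SELECT country_code FROM country
MINUS
SELECT country_code FROM country@test_env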
Track reference data changes
Baseline reference data and compare it with the latest reference data so that the changes can be validated.
Example: A new country code was added and an existing country code was marked as deleted in the development environment without the approval of, or notification to, the data steward.
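A minimal sketch of this baseline-and-compare approach in plain SQL, assuming a hypothetical country_name column and a one-time snapshot table named country_baseline:
-- capture the baseline once
CREATE TABLE country_baseline AS SELECT country_code, country_name FROM country;
-- reference data added or changed since the baseline
SELECT country_code, country_name FROM country
MINUS
SELECT country_code, country_name FROM country_baseline;
-- reference data removed since the baseline
SELECT country_code, country_name FROM country_baseline
MINUS
SELECT country_code, country_name FROM country;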
Automate reference data testing using ETL Validator
ETL Validator comes with a Baseline & Compare Wizard and a Data Rules test plan for automatically capturing and comparing reference data.
Baseline reference data and compare with the latest copy to track changes to reference data.
Define data rules to verify that the data conforms to the domain values.
Database Procedure Testing
It is quite common to have database procedures with business logic in an application. As part of white box testing, examine the database procedure structure and derive test data from the program logic/code.
Database Procedure unit testing
Unit testing of the database procedures is similar to the unit testing process followed by development/QA teams for testing of code written in other languages such as Java and C#.
The steps to be followed are listed below:
Review Design: Review the database procedures to understand the code and design specifications to come up with the Test Cases.
Setup Test Data: Insert test data into the database tables for the test.
Execute Test: Run the database procedure passing the appropriate input parameters. Compare data in the tables and the procedure output with the expected results.
Teardown Test Data: Delete or modify the test data in the tables to restore their original state.
Example: In a financial company, the interest earned on a savings account depends on the daily balances in the account for the month. A database procedure was written in the application to calculate the interest earned as part of the month-end process (a sample test script follows the steps below).
1. Review the requirement and design for calculating the interest and come up with the test cases.
2. Set up test accounts and the corresponding daily balance records.
3. Execute the procedure for calculating the interest, passing the corresponding account and month details. Verify that the interest output from the procedure matches the expected value.
4. Clean up the test account and daily balance data from the database tables.
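A minimal sketch of steps 2-4 as an Oracle SQL/PL/SQL script. The accounts and daily_balances tables, the calculate_monthly_interest procedure and the expected value are all hypothetical and would be replaced by the actual application objects and a hand-computed expectation:
-- 2. Setup: insert a test account and its daily balance records
INSERT INTO accounts (account_id, account_type) VALUES (9001, 'SAVINGS');
INSERT INTO daily_balances (account_id, balance_date, balance) VALUES (9001, DATE '2023-01-01', 1000);
-- 3. Execute: call the procedure and compare its output with the expected value
DECLARE
  v_interest NUMBER;
BEGIN
  calculate_monthly_interest(p_account_id => 9001, p_month => '2023-01', p_interest => v_interest);
  -- expected interest computed by hand from the test balances and the documented rate
  IF v_interest <> 0.85 THEN
    RAISE_APPLICATION_ERROR(-20001, 'Interest mismatch: got ' || v_interest);
  END IF;
END;
/
-- 4. Teardown: remove the test data
DELETE FROM daily_balances WHERE account_id = 9001;
DELETE FROM accounts WHERE account_id = 9001;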
Database Regression Testing
The goal of Database Regression testing is to identify any issues that might arise due to changes in the database metadata or procedures, or due to system upgrades.
Automated Database Testing
Automating database testing is key to regression testing of the database, particularly in an agile development environment.
Organizing test cases into test plans (or test suites) and executing them automatically as and when needed can reduce the time and effort needed to perform the regression testing.
Automating database testing also eliminates the human errors that can occur during manual checks.
Changes to Metadata
Track changes to table metadata such as the addition or dropping of columns, new constraints and new tables. Database metadata changes are often not communicated to the QA and development teams, resulting in application failures. Maintaining a history of DDL changes will help narrow down the tests that need to be run.
Example 1: The length of a comments column was increased in the application UI, but the corresponding increase was not made in the database table.
Example 2: One of the indexes in the database was dropped accidentally, which resulted in application performance issues.
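One simple way to maintain such a history in an Oracle database is to snapshot the data dictionary (here only the column definitions) and diff it later; a sketch, assuming a baseline table named table_metadata_baseline:
-- capture the baseline once
CREATE TABLE table_metadata_baseline AS
SELECT table_name, column_name, data_type, data_length FROM user_tab_columns;
-- columns added or altered since the baseline
SELECT table_name, column_name, data_type, data_length FROM user_tab_columns
MINUS
SELECT table_name, column_name, data_type, data_length FROM table_metadata_baseline;
-- columns dropped since the baseline
SELECT table_name, column_name, data_type, data_length FROM table_metadata_baseline
MINUS
SELECT table_name, column_name, data_type, data_length FROM user_tab_columns;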
Automate Database regression testing using ETL Validator
ETL Validator comes with a Metadata Compare Wizard that helps track changes to Table metadata over a period of time. This helps ensure that the QA and development teams are aware of the changes to table metadata.
ETL Validator also has the capability to Baseline and Compare the output of database procedures so that any changes in the output can be validated.
Database Integration Testing
When an application is tested from the UI, the database procedures and tables are accessed and tested as well. The goal of integration testing is to validate that the database tables are populated with the expected data based on the input provided in the application.
End-to-End Data Testing
Integration testing of the database and the related applications involves the following steps:
Review the application UI to database attribute mapping document. Prepare one if missing.
Run tests in the application UI tracking the test data input.
Verify that the data loaded into the database tables matches the input test data.
Example: An optional postal address field in the UI was not getting saved to the database because of an application defect. The issue was only identified after the application went live, when some of the mail was returned.
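A hedged sketch of the verification step for this scenario, assuming a hypothetical address table keyed by the customer_id of a known test record; the returned values should match exactly what was entered in the UI:
SELECT address_line1, address_line2, city, postal_code FROM address WHERE customer_id = 12345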
Max. Data Length Testing
The focus of this test is to validate that the data is not getting truncated while being stored in the database:
Review the application UI to database attribute mapping document. Prepare one if missing.
Run tests in the application UI by entering the maximum allowed test data length from the UI.
Verify that the data loaded into the database tables matches the input test data without any truncation.
Example: The application UI allowed the user to enter comments longer than 2000 characters, while the corresponding database column was defined as VARCHAR(2000). Any user comments longer than 2000 characters were truncated by the database procedure during the insert.
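A simple follow-up check for this example, assuming the comments are stored in a hypothetical customer_comments table; values that exactly fill the column are a common symptom of silent truncation:
SELECT comment_id, LENGTH(comments) FROM customer_comments WHERE LENGTH(comments) >= 2000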