Test data – be careful what you ask for

August 19, 2020 By Hoofer

By Nynke Hogeveen

When you ask for soda, you might get Fanta even though you prefer Coca-Cola. You don’t get exactly what you wanted, but you got soda – right? The same goes for test data. Initially, testers or QA engineers may request ‘just 100 rows of test data’ to test an application. But later on they come across a data-related defect because the data does not contain the edge case they were supposed to test. So the whole thing was a waste of their test time. They needed Coca-Cola, not ‘just soda’. If they had asked for Coca-Cola right away, they would all have saved themselves a lot of time. In short: data-related defects are a waste of your test time! It’s time to end this waste once and for all.

“Data-related defects are a waste of your test time!”

Production-quality test data is key

Just getting 100 rows of generic data for your test is easy. But it will not give you a proper test result. That’s why testers prefer production-quality test data above all: it offers the most reliable test results. But how do you provide it? You cannot just hand out copies of production, of course. Firstly, this data contains privacy-sensitive information, and secondly, you would run into a huge storage problem. Imagine giving every dev, QA or test team a copy of 5 TB of production data… Although using copies of production data for testing purposes is still common, it is also one of the biggest bottlenecks in testing processes. In most cases, the privacy sensitivity of the data and its total size cause a lot of trouble.

Test data as an accelerator

Instead of test data being a bottleneck in testing (containing privacy-sensitive data and causing storage problems), it could (and should!) be an accelerator. When your test database is representative and quickly available (preferably through automation), it will help speed up your testing processes (and the entire software development process). The key considerations are how to anonymize the database in a credible, usable manner and how to make it quickly available. The picture below shows that after masking and deploying the first database (Test Data Master), it can act as a source for subsequent copies within the lower environments.
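To make ‘credible, usable’ masking a bit more concrete, here is a minimal Python sketch of one common approach: deterministic masking with a keyed hash. It is an illustration under stated assumptions, not a description of how any particular tool works. The column names (`name`, `email`), the name lists and the `MASKING_KEY` value are all hypothetical. The point it shows is that the same real value always maps to the same fake value, so masked data stays consistent across tables and refreshes while still looking realistic.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be kept outside source control.
MASKING_KEY = b"example-masking-key"

# Hypothetical replacement values; a real setup would use much larger lists.
FIRST_NAMES = ["Alex", "Sam", "Robin", "Kim", "Jamie", "Charlie"]
LAST_NAMES = ["Jansen", "Smith", "Bakker", "Visser", "Meyer", "Brown"]


def _digest(value: str) -> int:
    """Map a value to a stable integer using a keyed hash (HMAC-SHA256)."""
    mac = hmac.new(MASKING_KEY, value.encode("utf-8"), hashlib.sha256)
    return int.from_bytes(mac.digest()[:8], "big")


def mask_name(real_name: str) -> str:
    """Replace a real name with a realistic-looking fake one."""
    n = _digest(real_name)
    first = FIRST_NAMES[n % len(FIRST_NAMES)]
    last = LAST_NAMES[(n // len(FIRST_NAMES)) % len(LAST_NAMES)]
    return f"{first} {last}"


def mask_email(real_email: str) -> str:
    """Replace a real address with a fake one on a reserved example domain."""
    return f"user{_digest(real_email) % 10**8:08d}@example.com"


def mask_row(row: dict) -> dict:
    """Return a copy of the row with the sensitive columns masked."""
    masked = dict(row)
    masked["name"] = mask_name(row["name"])
    masked["email"] = mask_email(row["email"])
    return masked
```

Because the mapping is deterministic, a customer who appears in several tables gets the same masked name everywhere, which keeps the masked Test Data Master usable for testing.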

But for test data to act as an accelerator, you can’t have full copies for QA and DEV. You need small data sets containing the specifically required test cases: you need a subset of the Test Data Master. After all, smaller databases (subsets) reduce testing time by their very nature. They lend themselves to fast refreshes and are easily adapted to evolving test data requirements.

We may assume that testers have a good understanding of the test data (cases) they need in a particular subset. If not, do they have the knowledge and ability to prove the system they have to test? But creating a subset is more than just picking some test cases out of production. The subsets need to be referentially intact, meaning that all related data in other tables, databases and systems is present as well. Doing this manually is practically impossible for any realistic data model, so you’ll need a subset tool to do it for you.
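To illustrate what ‘referentially intact’ means, here is a minimal Python sketch using the standard `sqlite3` module. The three-table schema (`customers`, `orders`, `order_lines`) and the `build_subset` function are hypothetical and only serve as an example. The sketch copies the selected customers together with every order and order line that belongs to them, so no child row ends up pointing at a missing parent.

```python
import sqlite3

# Hypothetical schema: customers -> orders -> order_lines.
SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id)
);
CREATE TABLE order_lines (
    id INTEGER PRIMARY KEY,
    order_id INTEGER NOT NULL REFERENCES orders(id),
    product TEXT
);
"""


def build_subset(source: sqlite3.Connection, customer_ids: list[int]) -> sqlite3.Connection:
    """Copy the chosen customers plus all their orders and order lines."""
    target = sqlite3.connect(":memory:")
    target.execute("PRAGMA foreign_keys = ON")  # let SQLite enforce the references
    target.executescript(SCHEMA)

    marks = ",".join("?" * len(customer_ids))
    # Parents first, then children, so foreign keys are never violated.
    steps = [
        ("customers", f"SELECT * FROM customers WHERE id IN ({marks})"),
        ("orders", f"SELECT * FROM orders WHERE customer_id IN ({marks})"),
        ("order_lines",
         f"SELECT l.* FROM order_lines l JOIN orders o ON o.id = l.order_id "
         f"WHERE o.customer_id IN ({marks})"),
    ]
    for table, query in steps:
        rows = source.execute(query, customer_ids).fetchall()
        if rows:
            slots = ",".join("?" * len(rows[0]))
            target.executemany(f"INSERT INTO {table} VALUES ({slots})", rows)
    target.commit()
    return target
```

Even in this toy case the copy order and the join path have to be written by hand for every table. With dozens of tables spread over several databases and systems that quickly becomes unmanageable, which is why a subset tool is the practical choice.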

“We may assume that testers have a good understanding of the test data they need. If not, do they have the knowledge and ability to prove the system they have to test?”


Be careful when you ask, or get asked, for ‘just some generic data’ for testing. It seems like an easy way to do some quick testing, but it typically results in data-related defects, which are a pure waste of your time. Instead, start the conversation about how you can use test data as an accelerator rather than a bottleneck. Creating (masked) subsets of test data will save you a lot of time and money, which you can spend on more useful work and on improving your products and processes even more.


Photo by Afif Kusuma on Unsplash

About the Author


Nynke Hogeveen is communications adviser at DATPROF, a leading Test Data Management solutions provider. By sharing knowledge and offering the right tools she wants to make Test Data Management more accessible to every organization. The main goal is to simplify getting the right test data in the right place at the right time.