Real data is too complex to handle
We all know the problems with data retrieved from a live system. It is hard to get, it is complex, it all looks identical (although it is not), there is a lot of it, it does not contain what we need, and it is almost impossible to manipulate.
However, to perform our tests we need specific data, or files containing specific values, and we need them in huge volumes. In addition, the files may need unique identifiers, which means that once a file has been used it is useless for further testing or retesting. And because of the volume and complexity of the data, it is a hell of a job to manipulate the files without making mistakes.
