See Synthetic Data.
I've updated the repository with a "Noisy Data" feature.
This will generate bulk data with invalid field values.
It helps test ETL pipelines: you can confirm they scale to the expected volumes and still cope with invalid values along the way.
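
To make the idea concrete, here is a minimal standard-library sketch of what "noisy" bulk data looks like from the pipeline's side. It does not use the DataSynthTool API. The names `synth_rows`, `is_valid`, and `BAD_VALUES`, the `age` field, and the 5% noise rate are all hypothetical, chosen only for illustration.

```python
import random

# Hypothetical invalid values for an "age" field; not taken from DataSynthTool.
BAD_VALUES = ["", "N/A", "-1", "12x", "999999"]


def synth_rows(count, noise=0.05, seed=42):
    """Yield `count` rows; roughly a `noise` fraction have an invalid age."""
    rng = random.Random(seed)  # seeded so test runs are repeatable
    for row_id in range(count):
        if rng.random() < noise:
            age = rng.choice(BAD_VALUES)
        else:
            age = str(rng.randint(18, 90))
        yield {"id": str(row_id), "age": age}


def is_valid(row):
    """Toy field check standing in for a real ETL validation step."""
    return row["age"].isdigit() and 0 <= int(row["age"]) <= 120


rows = list(synth_rows(10_000, noise=0.05))
rejected = [row for row in rows if not is_valid(row)]
print(f"{len(rows)} rows generated, {len(rejected)} rejected")
```

A pipeline test can then assert that every row is either loaded or rejected, and that nothing crashes or gets silently dropped at volume.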
Clone https://github.com/slott56/DataSynthTool
Read https://slott56.github.io/DataSynthTool/_build/html/index.html