Testing with PySpark

This isn't about details of pySpark. This is about the philosophy of testing when working with a large, complex framework, like pySpark, pandas, numpy, or whatever.

BLUF

Use data subsets.

Write unit tests for the functions that process the data.

Don't test pyspark itself. Test the code you write.

Some …

more ...