Testing with PySpark

This isn't about details of pySpark. This is about the philosophy of testing when working with a large, complex framework, like pySpark, pandas, numpy, or whatever.

BLUF

Use data subsets.

Write unit tests for the functions that process the data.

Don't test pyspark itself. Test the code you write.

Some …

more ...



Fighting Against Over-Engineering

I've been trying to help some folks who have a "search" algorithm that's slow.

They know it's slow -- that's pretty obvious.

They're -- unfortunately -- sure that asyncio will help. That's not an obvious conclusion. It involves no useful research. Indeed, that's a kind of magical thinking. Which leads me to consider …

more ...






Bashing the Bash -- The shell is awful and what you can do about it

A presentation I did recently.

https://github.com/slott56/bashing-the-bash

Folks were polite and didn't have too many questions. I guess they fundamentally agreed: the shell is awful, we can use it for a few things.

Safe Shell Scripts Stay Simple: Set the environment, Start the application. ---------------------------------------------------------------------------

The Seven S's …

more ...