FAERIE DUST™

Here's how to recognize a Faerie Dust request:

  1. We have identified a problem. It can involve almost anything: scalability, reliability, auditability, or any other Quality Measure.
  2. We're pursuing a specific technology, typically the one that has the least impact on our architecture.
  3. We can't address anything other than this specific technology variation …

Testing with PySpark

This isn't about the details of PySpark. It's about the philosophy of testing when working with a large, complex framework like PySpark, pandas, NumPy, or whatever.

BLUF

Use data subsets.

Write unit tests for the functions that process the data.

Don't test PySpark itself. Test the code you write.

Some …
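Here is a minimal sketch of the BLUF advice, assuming the processing logic has been factored into a plain Python function over individual rows (modeled as dicts) so it can be tested without starting a Spark session. The function `normalize_reading`, its field names, and the sample rows are hypothetical illustrations. In a real job the function would be wired into Spark (for example through `rdd.map` or a UDF); that wiring is PySpark's job, and the test deliberately doesn't exercise it.

```python
import unittest


def normalize_reading(row: dict) -> dict:
    """Hypothetical transformation: convert Fahrenheit to Celsius and
    flag rows that are missing a temperature."""
    fahrenheit = row.get("temp_f")
    if fahrenheit is None:
        return {"sensor": row["sensor"], "temp_c": None, "valid": False}
    celsius = round((fahrenheit - 32) * 5 / 9, 1)
    return {"sensor": row["sensor"], "temp_c": celsius, "valid": True}


# A small, hand-picked data subset covering the interesting cases,
# instead of the full production dataset.
SAMPLE_ROWS = [
    {"sensor": "a", "temp_f": 32.0},
    {"sensor": "b", "temp_f": 212.0},
    {"sensor": "c", "temp_f": None},
]


class TestNormalizeReading(unittest.TestCase):
    # These tests check our conversion logic, not Spark's machinery.
    def test_freezing_point(self):
        self.assertEqual(normalize_reading(SAMPLE_ROWS[0])["temp_c"], 0.0)

    def test_boiling_point(self):
        self.assertEqual(normalize_reading(SAMPLE_ROWS[1])["temp_c"], 100.0)

    def test_missing_value_is_flagged(self):
        result = normalize_reading(SAMPLE_ROWS[2])
        self.assertFalse(result["valid"])
        self.assertIsNone(result["temp_c"])
```

Saved as, say, `test_readings.py` (a hypothetical file name), this runs with `python -m unittest test_readings` in well under a second, with no cluster involved.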

Fighting Against Over-Engineering

I've been trying to help some folks who have a "search" algorithm that's slow.

They know it's slow -- that's pretty obvious.

They're, unfortunately, sure that asyncio will help. That's not an obvious conclusion, and it isn't backed by any useful research. Indeed, it's a kind of magical thinking. Which leads me to consider …
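A small sketch of the kind of measurement that could test the asyncio hunch, assuming the search is CPU-bound (no network or disk I/O). Coroutines share one thread, so a search with no `await` points runs each call to completion in turn, whether it's called from a loop or from `asyncio.gather`. The `linear_search` function and the data sizes are hypothetical stand-ins, not the actual algorithm in question. Timings vary by machine, so the sketch prints them rather than claiming specific numbers.

```python
import asyncio
import time


def linear_search(items: list[int], target: int) -> bool:
    """Hypothetical CPU-bound search: scans elements, does no I/O."""
    for item in items:
        if item == target:
            return True
    return False


async def search_async(items: list[int], target: int) -> bool:
    # Declaring the function async adds no concurrency: there's no
    # await inside, so each call holds the event loop until it finishes.
    return linear_search(items, target)


async def search_all(items: list[int], targets: list[int]) -> list[bool]:
    # gather() runs the coroutines on one event loop in one thread,
    # so the CPU-bound searches still execute one after another.
    return await asyncio.gather(*(search_async(items, t) for t in targets))


def compare(size: int = 200_000, repeats: int = 10) -> tuple[float, float]:
    items = list(range(size))
    targets = [-1] * repeats  # worst case: the target is never found

    start = time.perf_counter()
    sync_results = [linear_search(items, t) for t in targets]
    sync_seconds = time.perf_counter() - start

    start = time.perf_counter()
    async_results = asyncio.run(search_all(items, targets))
    async_seconds = time.perf_counter() - start

    assert sync_results == async_results  # same answers either way
    return sync_seconds, async_seconds


if __name__ == "__main__":
    sync_s, async_s = compare()
    print(f"plain loop: {sync_s:.3f}s   asyncio.gather: {async_s:.3f}s")
```

On a typical machine the two timings come out roughly the same. The point isn't the specific numbers; it's that checking the hunch takes a few lines of measurement, which is exactly the research the asyncio conclusion skipped.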