Sagan-esque Data Volumes

About once a week a question shows up on Stack Overflow that involves loading a database with truly epic volumes of data. For example, "billions of rows in a single table for a month".

Billions of rows per month means a minimum insert rate of about 385 rows per second (one billion rows spread evenly over a 30-day month).
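The arithmetic behind that figure can be checked in a few lines. This is only a back-of-the-envelope sketch: the function name `min_insert_rate` and the 30-day month are assumptions made for illustration, and it treats the load as perfectly even around the clock.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def min_insert_rate(rows, days=30):
    """Average rows per second needed to load `rows` within `days` days."""
    return rows / (days * SECONDS_PER_DAY)


# One billion rows is the smallest volume that counts as "billions".
rate = min_insert_rate(1_000_000_000)
print(f"{rate:.1f} rows/second")
```

Any downtime, batch window, or index rebuild pushes the required peak rate above this average.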

Also …

more ...


Python in the News

Tags #python

Making the rounds: "Droopy: easy file receiving". Apparently, there were some widely-read blog posts about this. Google "Droopy: A Tiny Web Server That Makes Receiving Files a Snap" to see the buzz.

The point here is that 750 lines of Python code can go a long way. It's a complete …

more ...

Technology Adoption and the "No"-gates

Let's say you've found some new, good way to do business.

JSON, for example. Or Agile Methods in general. Or TDD specifically. Or use of an ORM.

You read up on it. You build a spike solution to show that it's more efficient.

The First No-Gate

You make The Essential …

more ...

A Limit to Reuse

We do a lot of bulk loads. A lot.

So many that we have some standard ETL-like modules for generic "Validate", "Load", "Load_Dimension", "Load_Fact" and those sorts of obvious patterns.

Mostly our business processes amount to a "dimensional conformance and fact load", followed by extracts, followed by a different "dimensional …

more ...




Yet More Praise for Unit Tests

I can't say enough good things about TDD.

But I'll try.

Due to an epic failure to read the documentation (this, specifically) I couldn't get our RESTful web services to work in Apache.

The entire application system has pretty good test coverage. I use Python's unittest module to do integration …

more ...

REST and HTTP Digest Authentication

It seems so simple: use HTTP Digest Authentication with the Quality of Protection (qop) set to "auth".

It's an easy algorithm. A nonce that encodes a timestamp lets the server reject stale requests, so no one can cache and reuse credentials. It's potentially very, very nice.
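To make "easy" concrete, here is a minimal sketch of the qop="auth" response calculation as RFC 2617 describes it, plus one possible way to build a nonce that carries a timestamp. The helper names (`md5_hex`, `digest_response`, `make_nonce`, `nonce_is_fresh`), the nonce layout, and the 300-second lifetime are assumptions made for illustration, not necessarily what the application described here does.

```python
import hashlib
import time


def md5_hex(text):
    """Hex MD5 digest of a string, as used throughout RFC 2617."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri,
                    nonce, nc, cnonce, qop="auth"):
    """Compute the 'response' value a client sends for qop=auth."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")
    ha2 = md5_hex(f"{method}:{uri}")
    return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")


def make_nonce(secret, now=None):
    """Hypothetical nonce layout: '<timestamp>:<md5 of timestamp and secret>'."""
    timestamp = str(int(time.time() if now is None else now))
    signature = md5_hex(f"{timestamp}:{secret}")
    return f"{timestamp}:{signature}"


def nonce_is_fresh(nonce, secret, max_age=300, now=None):
    """Accept a nonce only if this server issued it and it has not expired."""
    timestamp, _, signature = nonce.partition(":")
    if signature != md5_hex(f"{timestamp}:{secret}"):
        return False  # not issued with this secret, or tampered with
    current = time.time() if now is None else now
    return 0 <= current - int(timestamp) <= max_age
```

The server recomputes `digest_response` from its stored credentials and compares the result with what the client sent. Because the nonce is an input to the hash, a response captured earlier stops being valid as soon as `nonce_is_fresh` rejects the old nonce.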

Except for one thing: Apache …

more ...