S.Lott -- Software Architect

Denormalization or "What did you mean by that?"

I use the word denormalization heavily, to make a point to a certain class of developers. Other developers object to the term, since it doesn't have a precise meaning.

The point I often have to make is this:

3rd Normal Form is for Updates.
Data Warehousing is about Insert and Select …

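To make "3NF is for Updates" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The tables (customer, orders, orders_flat) and their data are hypothetical. Renaming a customer touches one row in the normalized design but every copied row in the denormalized one. The denormalized table, in exchange, answers the same summary query without a join, which is what an insert-and-select warehouse wants.

```python
import sqlite3

def build(conn):
    """Create both designs for one hypothetical customer with three orders."""
    conn.executescript("""
        -- Normalized (3NF): the customer's name is stored exactly once.
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY,
                             customer_id INTEGER REFERENCES customer(id),
                             amount REAL);
        -- Denormalized: the name is copied onto every order row.
        CREATE TABLE orders_flat (id INTEGER PRIMARY KEY,
                                  customer_name TEXT, amount REAL);
    """)
    conn.execute("INSERT INTO customer VALUES (1, 'Acme')")
    for order_id, amount in enumerate([10.0, 20.0, 30.0], start=1):
        conn.execute("INSERT INTO orders VALUES (?, 1, ?)", (order_id, amount))
        conn.execute("INSERT INTO orders_flat VALUES (?, 'Acme', ?)",
                     (order_id, amount))

def rename(conn, old_name, new_name):
    """Rename the customer in both designs; return rows touched by each."""
    normalized = conn.execute(
        "UPDATE customer SET name = ? WHERE name = ?",
        (new_name, old_name)).rowcount
    flat = conn.execute(
        "UPDATE orders_flat SET customer_name = ? WHERE customer_name = ?",
        (new_name, old_name)).rowcount
    return normalized, flat

conn = sqlite3.connect(":memory:")
build(conn)
print(rename(conn, "Acme", "Acme Corp"))  # (1, 3): one row versus every copy

# Reads go the other way: the normalized design needs a join,
# while the denormalized table is a plain select.
joined = conn.execute("""
    SELECT c.name, SUM(o.amount) FROM orders o
    JOIN customer c ON c.id = o.customer_id GROUP BY c.name""").fetchall()
flat = conn.execute("""
    SELECT customer_name, SUM(amount) FROM orders_flat
    GROUP BY customer_name""").fetchall()
print(joined == flat)  # True
```

With three orders the difference is trivial; with millions of fact rows, the update cost is why a warehouse reloads rather than updates.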

Genius Move -- Characteristic Functions

The comment was eaten by Haloscan, but here's the text...

You need to read Rozenshtein on characteristic functions.

select
sum(case when a < .5 then 1 else 0 end) 'A'
,sum(case when a >= .5 and a < .75 then 1 else 0 end) 'B'
,sum(case when a >= .75 then …

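The technique in the comment can be sketched as follows with Python's sqlite3 module. The table name samples, its rows, and the third bucket's alias "C" are assumptions, since the comment is cut off at that point. Each CASE expression is a characteristic function (1 inside the bucket, 0 outside), so SUM of it counts the rows in that bucket, and one pass over the table yields every count as its own column.

```python
import sqlite3

# Hypothetical table with one numeric column "a".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (a REAL)")
conn.executemany("INSERT INTO samples VALUES (?)",
                 [(0.1,), (0.4,), (0.5,), (0.6,), (0.75,), (0.9,), (0.99,)])

# Each CASE is a characteristic function for one bucket; summing it counts
# the rows that fall in that bucket. The alias "C" is an assumption.
query = """
SELECT
    SUM(CASE WHEN a < .5 THEN 1 ELSE 0 END) AS "A",
    SUM(CASE WHEN a >= .5 AND a < .75 THEN 1 ELSE 0 END) AS "B",
    SUM(CASE WHEN a >= .75 THEN 1 ELSE 0 END) AS "C"
FROM samples
"""
counts = conn.execute(query).fetchone()
print(counts)  # (2, 2, 3)
```

The alternative, three COUNT(*) queries with three WHERE clauses, scans the table three times and returns the counts as three separate results.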

My Query Is Slow -- What To Do? Or Dumb-As-A-Post SQL (Revised)

First, let me point out that the Data Cartel ("DBA" means Don't Bother Asking) won't release all the information I requested, so some of this is a guess.

We'll look at a number of dumb-as-a-post SQL techniques. This is proof -- if any were needed -- that bad SQL is worse than …


The Django World-View: Model+Admin First; Built-in Transparency and Trustworthiness

See Michael Hugos' "Think about screens and the data on them to simplify system development" for some helpful insight on what an "application" really is -- access to data. Simple transparency is held up as a critical value for software.

I liked the "If you don't believe it could be this …


Database Design and UML - What was the question again?

One issue in creating a database design is working around the limitations inherent in the SQL data model. I'm going to call it the SQL model because you can make the case that the entity-relationship (ER) model is an abstraction and could have a far more expressive implementation. I'm going …


Another Dimensional Model Implementation

The Cubulus project and Alexandru Toth's page describe an "OLAP Aggregation Engine". It is very nice to see advanced work done on the dimensional model.

The cited research dates from 1999 (V. Markl, F. Ramsak, R. Bayer, "Improving OLAP Performance by Multidimensional Hierarchical Clustering", Proceedings of the Intl. Database …


Just for a moment, I thought I'd found something SQLAlchemy doesn't do perfectly.

After having written a number of application-specific object-relational mappers, I have been on the prowl for an elegant, enduring solution. I had started to come to grips with Django, and liked much of the approach. Django has a tiny infrastructure feature (the settings.py file) which made it unpleasant to …


Python and Reverse Engineering, Part 5

Python is a top-shelf toolset for creating sample data to do performance testing.

Let's say that you need to validate a data warehouse design, and you need a million facts that join with thousands of dimension entities across a half-dozen dimensions. You'll be generating data for seven different tables, and …

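A minimal sketch of that kind of generator, using only the standard library. The six dimension names, their sizes, and the measure's range are assumptions for illustration. Every foreign key is drawn from keys that actually exist in its dimension, so every fact joins, and a fixed random seed makes the data repeatable between test runs.

```python
import random

def make_dimension(name, size):
    """Hypothetical dimension rows: (surrogate key, descriptive label)."""
    return [(key, f"{name}-{key:05d}") for key in range(1, size + 1)]

def make_facts(dimensions, count, rng):
    """Lazily yield fact rows: (fact id, one key per dimension, measure).

    Each foreign key is chosen from keys that exist in its dimension,
    so every fact row joins with every dimension table.
    """
    tables = list(dimensions.values())
    for fact_id in range(1, count + 1):
        keys = tuple(rng.choice(rows)[0] for rows in tables)
        yield (fact_id,) + keys + (round(rng.uniform(1.0, 100.0), 2),)

rng = random.Random(42)  # fixed seed: identical test data on every run

# Six hypothetical dimensions; the sizes are illustrative assumptions.
sizes = {"customer": 5000, "product": 2000, "store": 300,
         "day": 3650, "promotion": 150, "channel": 10}
dimensions = {name: make_dimension(name, n) for name, n in sizes.items()}

# Six dimension tables plus this fact table make seven tables in all.
# Use count=1_000_000 for the real test; 10_000 keeps the demo quick.
facts = list(make_facts(dimensions, 10_000, rng))
print(len(dimensions) + 1, "tables,", len(facts), "facts")
```

Because make_facts is a generator, the million-row case can be streamed straight into csv.writer(...).writerows(...) or a database bulk loader without holding every fact in memory.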

Python and Reverse Engineering, Part 2

A stored procedure isn't really very easy to understand. There's a profound fascination with triggers and stored procedures, and they're both really bad ideas. I can't say enough bad things about stored procedures. See PL/SQL vs. Java - Which is REALLY faster? and Over-Solving the Problem or When your architect …


Dejavu and Python-based Dimensional Analysis

Actually, the code looks like a clever expansion of my example in PyCon 2007 (Revised).

"But wait," you say. "Creating a pivot table in Python?"

Of course. Spreadsheets can create pivot tables from dimensionally normalized data. However, getting the data in this form is often challenging, and if there is …

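Creating a pivot table in Python takes very little code. This is a minimal sketch, not the Dejavu code or the PyCon 2007 example: it sums one measure from a list of fact records into a row-by-column table. The field names (region, quarter, sales) and the sample records are hypothetical.

```python
from collections import defaultdict

def pivot(facts, row_key, col_key, measure):
    """Sum `measure` into a {row: {column: total}} table.

    `facts` is an iterable of dicts; the three field names choose the
    row dimension, the column dimension, and the value to sum.
    """
    table = defaultdict(lambda: defaultdict(float))
    columns = set()
    for fact in facts:
        row, col = fact[row_key], fact[col_key]
        table[row][col] += fact[measure]
        columns.add(col)
    return table, sorted(columns)

def render(table, columns):
    """Format the pivot as fixed-width text, showing empty cells as 0.0."""
    lines = ["".join(f"{h:>10}" for h in ["", *columns])]
    for row in sorted(table):
        cells = [f"{table[row].get(col, 0.0):>10.1f}" for col in columns]
        lines.append(f"{row:>10}" + "".join(cells))
    return "\n".join(lines)

# Hypothetical fact records, already in dimensional form.
facts = [
    {"region": "East", "quarter": "Q1", "sales": 100.0},
    {"region": "East", "quarter": "Q2", "sales": 150.0},
    {"region": "West", "quarter": "Q1", "sales": 80.0},
    {"region": "East", "quarter": "Q1", "sales": 20.0},
]
table, columns = pivot(facts, "region", "quarter", "sales")
print(render(table, columns))
```

Swapping the row_key and col_key arguments transposes the pivot; no spreadsheet is involved once the facts are in dimensional form.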