Denormalization or "What did you mean by that?"

I use the word denormalization heavily, to make a point to a certain class of developers. Other developers object to the term, since it doesn't have a precise meaning.

The point I often have to make this:

  1. 3rd Normal Form is for Updates.
  2. Data Warehousing is about Insert and Select …
more ...

Genius Move -- Characteristic Functions

The comment was eaten by Haloscan, but here's the text...

You need to read Rozhenstein on characteristic functions.

select
sum(case when a < .5 then 1 else 0 end) 'A'
,sum(case when a >= .5 and a < .75 then 1 else 0 end) 'B'
,sum(case when a >= .75 then …
more ...



Database Design and UML - What was the question again?

One issue in creating a database design is working around the limitations inherent in the SQL data model. I'm going to call it the SQL model because you can make the case that the entity-relationship (ER) model is an abstraction and could have a far more expressive implementation. I'm going …

more ...

Another Dimensional Model Implementation

The Cubulus project and Alexandru Toth 's page describe an "OLAP Aggregation Engine". It is very nice to see advanced work done on the dimensional model.

The cited research dates from 1999 (V. Markl, F. Ramsak, R. Bayer, "Improving OLAP Performance by Multidimensional Hierarchical Clustering", Proceedings of the Intl. Database …

more ...


Python and Reverse Engineering, Part 5

Python is a top-shelf toolset for creating sample data to do performance testing.

Let's say that you need to validate a data warehouse design, and you need a million facts that join with thousands of dimension entities across a half-dozen dimensions. You'll be generating data for seven different tables, and …

more ...