Demonstration Code¶

Example 1¶

Some imports

from pathlib import Path
from csv import DictReader
from statistics import mean, stdev

from funcsql import *

Read the raw data file. This function simply returns the list of dictionaries from the csv.DictReader. The columns aren’t useful as shown.

DEFAULT_PATH = Path("tests") / "Anscombe_quartet_data.csv"


def get_series(path: Path = DEFAULT_PATH) -> list[dict[str, str]]:
    """
    Get all four series.

    :param path: source path
    :return: a list of dictionaries with keys "series", "x", and "y".
    """
    with open(path) as source_file:
        rdr = DictReader(source_file)
        return list(rdr)

The column names are x123, y1, y2, y3, x4, and y4, which require restructuring.

We want a table like the following:

@startuml
hide methods
hide circle

class Anscombe {
series: int
x: float
y: float
}
@enduml

There are four series, each use a subset of columns:

{series: 1, x: x123, y: y1}
{series: 2, x: x123, y: y2}
{series: 3, x: x123, y: y3}
{series: 4, x: x4, y: y4}

One way to restructure this is a UNION of four queries.

SELECT 1, x123 as x, y1 as y
FROM raw
UNION
SELECT 2, x123 as x, y2 as y
FROM raw
UNION
SELECT 3, x123 as x, y3 as y
FROM raw
UNION
SELECT 4, x4 as x, y4 as y
FROM raw

We can do this as follows without the overhead of creating and loading one table that we’ll used to create a second table.

def restructure(sample_rows: list[dict[str, str]]) -> list[Row]:
    """
    Transform the mish-mash of columns to a series number, and x, y values.

    :param samples: the raw results of the CSV ``DictReader``.
    :return: A list of ``Row`` instances.
    """
    samples = Table("samples", sample_rows)

    q1 = Select(
        series=lambda cr: 1,
        x=lambda cr: float(cr.samples.x123),
        y=lambda cr: float(cr.samples.y1),
    ).from_(samples)
    q2 = Select(
        series=lambda cr: 2,
        x=lambda cr: float(cr.samples.x123),
        y=lambda cr: float(cr.samples.y2),
    ).from_(samples)
    q3 = Select(
        series=lambda cr: 3,
        x=lambda cr: float(cr.samples.x123),
        y=lambda cr: float(cr.samples.y3),
    ).from_(samples)
    q4 = Select(
        series=lambda cr: 4,
        x=lambda cr: float(cr.samples.x4),
        y=lambda cr: float(cr.samples.y4),
    ).from_(samples)

    rows = (
        list(fetch(q1)) + list(fetch(q2))
        + list(fetch(q3)) + list(fetch(q4))
    )

The real goal is to compute some descriptive statistics after restructuring the data.

SELECT series, MEAN(x), MEAN(y)
FROM anscombe
GROUP BY series

The query is a bit longer, but (again) doesn’t involve the overhead of loading table. Or – in this example – loading one table with raw data and then inserting into another table to restructure the data.

We’ve highlighted the funcsql.Select that corresponds to the statistics query.

def main() -> None:
    data = restructure(get_series())
    anscombe = Table.from_rows("anscombe", data)

    print("Series I")
    query = (
        Select(x=lambda cr: cr.anscombe.x, y=lambda cr: cr.anscombe.y)
        .from_(anscombe)
        .where(lambda cr: cr.anscombe.series == 1)
    )
    for r in fetch(query):
        print(f"{r.x:6.2f}, {r.y:6.2f}")

    print("Means")
    stats_query = (
        Select(
            mean_x=Aggregate(mean, lambda cr: cr.anscombe.x),
            mean_y=Aggregate(mean, lambda cr: cr.anscombe.y)
        )
        .from_(anscombe)
        .group_by(series=lambda cr: cr.anscombe.series)
    )
    for r in fetch(stats_query):
        print(f"{r.series} {r.mean_x:.2f} {r.mean_y:.2f}")

This does the same processing without the conceptual overheads of table schema, or other SQL complications like connections and commits.

Example 2¶

TBD

Demonstration Code¶

Example 1¶

Example 2¶

Functional SQL

Navigation

Related Topics