funcsql Code

Here’s the content of the funcsql module.

@startuml

package funcsql {

    class Table
    class Row

    Table *- Row : produces

    class CompositeRow

    CompositeRow *-- Row

    class Select {
        from_()
        where()
        join()
        group_by()
        having()
    }

    class Aggregate
    class lambda << (F,orchid) >>

    Select o-- Aggregate
    Select o-- lambda

    class fetch << (F,orchid) >>

    fetch --- Select : executes
    fetch o--> Row : > produces
    fetch o-- Table : < consumes
    fetch o-- CompositeRow : uses

}

We’ll look at the funcsql.Table and funcsql.Row structures first. Then the funcsql.Select class, used to define a query.

The funcsql.fetch() function embodies the Essential SQL Algorithm. It executes the query object, consuming rows from tables and producing rows in a result. Along the way, it builds funcsql.CompositeRow objects which are provided to lambdas to evaluate expressions.

Table and Row

The funcsql.Table objects contain the data to be processed. The funcsql.Row is a wrapper that provides handy attribute name access for dictionary keys.

Table

class funcsql.Table(name: str, data: Iterable[dict[str, Any]] | None = None, schema: list[str] | None = None)

The foundational collection of data consumed by a query.

A Table produces funcsql.Row objects, but doesn’t necessarily contain them. The structure in this class is a list[dict[str, Any]]. From these, funcsql.Row objects are created.

A table has an internal name that’s used by the query. This should match the Python variable name, but that’s not a requirement.

The external Python variable is only used in the funcsql.Select.from_() method. The internal string name is used in all lambdas throughout the query.

Example

>>> t = Table("t", [
...     {'col1': 'row1-colum1', 'col2': 42},
...     {'col1': 'row2-colum1', 'col2': 43},
... ])
>>> list(t)
[Row('t', **{'col1': 'row1-colum1', 'col2': 42}), Row('t', **{'col1': 'row2-colum1', 'col2': 43})]

Important

Table names must be valid Python identifiers.

Column names must be valid Python identifiers.

The data source for a table can can be raw (external) data or it could be results of query execution.

Subclasses can support a wide variety of collections.

  • list[list[str]] supplied with a schema (i.e. CSV reader + headers).

  • list[JSONDoc] with JSONSchema schema definition from NDJSON files.

load(data: Iterable[dict[str, Any]]) None

Replace the underlying data with this data.

column_names() list[str]

Expose the column names. Useful for SELECT * constructs.

alias_iter(alias: str) Iterator[Row]

Apply a table alias.

classmethod from_rows(table_name: str, rows: Iterable[Row]) Table

Builds a table from a sequence of rows – possibly a UNION of multiple queries.

classmethod from_query(table_name: str, query: QueryBuilder) Table

Builds a table by fetching the results of the given query.

Row

class funcsql.Row(table: str, **columns: Any)

An immutable collection of column names and values.

Subclasses can be created to wrap a variety of collections, including dataclasses or Pydantic base_model instances.

_asdict() dict[str, Any]

Return the row as a dictionary.

_values() list[Any]

Rerturn only the values from this row.

Select query builder

The funcsql.Select class is used by build queries (and subqueries.) It’s used by the funcsql.fetch() function (among others) to consume rules from funcsql.Table objects and produce an iterable sequence of funcsql.Row results.

The result of a funcsql.Select construction is a data structure to be used by the funcsql.fetch() function. A funcsql.Select is iterable as a convenience.

Select

class funcsql.Select(star_: str | None = None, **name_expr: Callable[[CompositeRow], Any] | Aggregate)

Builds a query, starting with the SELECT clause.

For SELECT * queries, the star parameter must be STAR.

For all other query forms, use name=lambda cr: expr constructs. This is equivalent to expr AS name in SQL.

The expression can contain table.column references using cr.table.column. The full name must be provided: we don’t attempt to deduce the table that belongs with a name.

The expression can contain any other Python literal of function.

SELECT label, 3*val+1 as answer, 42 as literal
FROM table
Select(
    label=lambda cr:cr.table.label,
    answer=lambda cr: 3 * cr.table.val + 1,
    literal=lambda cr: 42
)

Each expression must be a function (or lambda or callable object) that accepts a single funcsql.CompositeRow object and returns a useful value.

from_(*tables: Table | str, **named_tables: Table | str) Self

The FROM clause: the tables to query. Table aliases aren’t required if all the table names are unique.

To do a self-Join, the tables require an alias. The alias will be used in CompositeRow objects.

SELECT e.name as employee, m.name as manager
FROM employees e, employees m
WHERE ...
(
    Select(
        employee=lambda cr: cr.e.name,
        manager=lambda cr: cr.m.name
    )
    .from_(e=employees, m=employees)
    .where(...)
)

This method also expands SELECT * into proper names.

For recursive queries in a funcsql.With context, a table is created by the initial query. This table recursively populated by a funcsql`Select.union() that references the named initial table using a string table name instead of a funcsql.Table object.

join(table: Table | None = None, *, on_: Callable[[CompositeRow], bool], **alias: Table) Self

The JOIN table ON condition clause: a table to query and the join condition for that table.

SELECT e.name as employee, m.name as manager
FROM employees e
JOIN employee m ON e.manager_id = m.employee_id
(
    Select(
        employee=lambda cr: cr.e.name,
        manager=lambda cr: cr.m.name
    )
    .from_(e=employees)
    .join(m=employee, on_=lambda cr: cr.e.manager_id == cr.m.employee_id)
)

In principle, this could lead to an optimization where pair-wise table joins are done to build the final CompositeRow objects.

This doesn’t handle any of the outer join operators.

Todo

[LEFT | RIGHT | FULL] OUTER? JOIN

An implicit union of non-matching rows. An additional “filterfalse()`` is required to provide NULL-filled missing rows.

Todo

USING("col1", "col2") builds labmda cr: cr.table.col1 == cr.?.col1

Based on left and right sides of join(table, using=("col1", "col2")).

where(function: Callable[[CompositeRow], bool]) Self

The WHERE clause: an expression used for joining and filtering.

SELECT e.name as employee, m.name as manager
FROM employees e, employees m
WHERE e.manager_id = m.employee_id
(
    Select(
        employee=lambda cr: cr.e.name,
        manager=lambda cr: cr.m.name
    )
    .from_(e=employees, m=employees)
    .where(lambda cr: cr.e.manager_id == cr.m.employee_id)
)
group_by(*name: str, **named_expr: Callable[[CompositeRow], Any]) Self

The GROUP BY clause: a list of names or a list of name=expr expressions.

If names are provided, they must be names computed by the SELECT clause.

If name=expr is used, the name=expr is added to the SELECT clause.

SELECT deparment_id, count(*)
FROM employees
GROUP BY department_id
(
    Select(
        department_id=lambda cr: cr.employees.department_id
        count=Aggregate(count, "*")
    )
    .from_(employees)
    .group_by("department_id")
)
having(function: Callable[[Row], bool]) Self

The HAVING clause: an expression to filter groups.

The expression will operate on funcsql.Row objects. It does not operate on CompositeRow objects.

SELECT deparment_id, count(*)
FROM employees
GROUP BY department_id
HAVING count(*) > 2
(
    Select(
        department_id=lambda cr: cr.employees.department_id
        count=Aggregate(count, "*")
    )
    .from_(employees)
    .group_by("department_id")
    .having(lambda row: row.count > 2)
)

The groups are not built from CompositeRow objects; they’re simpler funcsql.Row objects.

union(recursive_select: Select) Self

The UNION clause: A secondary SELECT.

There are two meanings for UNION:

  • Inside the Common Table Expression of a WITH statement, it’s a recursive join.

  • Elsewhere, it’s simply a concatenation of rows from two queries.

Aggregate

Aggregate comptuations are done after grouping, and must be delayed until then. To make this work out, they’re wrapped as an object that’s distinct from a simple lambda.

class funcsql.Aggregate(reduction_function: Callable[[Iterable[Any]], Any], name: str | Callable[[CompositeRow], Any])

Wrapper for an aggregate function to summarize.

There are three forms for using this:

  • m_x = Aggregate(mean, "x"), x = lambda jr: jr.table.column
    

    The x=expr defines the raw value to summarize. This value is then referred to by name in the Aggregate() constructor. (Yes, the x= is defined later, but name resolution doesn’t happen until the query is run.)

  • m_x = Aggregate(mean, lambda jr: jr.table.column)
    

    Same as above. A column name is made up for the expression.

  • count = Aggregate(count, "*")
    

    A special case.

Many SQL aggregates are first-class parts of Python.

AVG

statistics.mean()

COUNT

We define a special functsql.count() function to ignore None.

MAX

max()

MIN

min()

SUM

sum()

STDEV

statistics.stdev()

Todo

Offer DISTINCT variants to reduce to a set before computation.

CompositeRow

These objects move through the funcsql.fetch() function’s pipeline of steps.

class funcsql.CompositeRow(*table_rows: Row, context: CompositeRow | None = None)

A collection of Row instances from one or more named Tables. This provides attribute navigation through the columns of the tables in this composite.

>>> cr = CompositeRow(Row("t1", a=1), Row("t2", b=2))
>>> cr.t1.a
1
>>> cr.t2.b
2

SQL Algorithm

fetch

funcsql.fetch(query: QueryBuilder, *, context: CompositeRow | None = None, cte: dict[str, Table] | None = None) Iterator[Row]
funcsql.fetch(query: Select, *, context: CompositeRow | None = None, cte: dict[str, Table] | None = None) Iterator[Row]
funcsql.fetch(query: Values, *, context: CompositeRow | None = None, cte: dict[str, Table] | None = None) Iterator[Row]
funcsql.fetch(query: With, *, context: CompositeRow | None = None, cte: dict[str, Table] | None = None) Iterator[Row]

The essential SQL Algorithm.

For SELECT, the algorithm is this:

\[Q(T) = H \biggl( G \Bigl( S \left( W ( F(T) ) \right) \Bigr) \biggr)\]

For VALUES, the algorithm is much smaller:

\[Q() = S( \bot )\]

Where

Reading from inner to outer, the order of operations is:

  1. FROM – Creates table-like creature with CompositeRow rows.

  2. WHERE – filters CompositeRow rows.

  3. SELECT – creates a Table with simple Row instances, but no name.

  4. GROUP BY – creates a second table with Row instances and no name.

  5. HAVING – filters the second table

Parameters:
  • query – The QueryBuilder that provides Row instances.

  • context – A context used for subqueries that depend on a context query.

  • cte – The Common Table Expression tables, built by the preceeding WITH clause.

Returns:

Row instances.

Todo

Refactor to return Iterator | None or raise an exception.

It’s awkward to test an Iterator. It’s easier to test the class (None or Iterator) or handle an exception.

funcsql.exists(context: CompositeRow, query: Select) bool

Did a suquery fetch any rows? Implements the EXISTS() function.

Parameters:
  • context – A context used for subqueries that depend on a context query.

  • query – The Query that provides Row instances.

Returns:

True if any row was built by the query.

Note

The order of the arguments is reversed from funcsql.fetch() because this is used in a query builder where a context is generally required.

fetch values

funcsql.fetch_first_value(query: Select) Any

Fetches the first row’s first column value. Implements cases where subquery produces a scalar value.

funcsql.fetch_all_values(query: Select) Iterator[Any]

Fetches the all row’s first column values. Implements cases where subquery produces a set of values.

For example, WHERE col IN (SELECT ...).

component functions

funcsql.from_product(query: Select, *, context: CompositeRow | None = None, cte: dict[str, Table] | None = None) Iterable[CompositeRow]

FROM clause creates an iterable of funcsql.CompositeRow objects as a product of source tables.

Parameters:
  • query – The funcsql.Select object

  • context – An optional context used for subqueries.

Returns:

product of all rows of all tables

funcsql.where_filter(query: Select, composites: Iterable[CompositeRow]) Iterator[CompositeRow]

WHERE clause filters the iterable funcsql.CompositeRow objects.

Parameters:
Returns:

An iterator overn funcsql.CompositeRow objects.

Todo

Outer Joins are implemented here.

funcsql.select_map(query: Select | Values, composites: Iterable[CompositeRow]) Iterator[Row]

SELECT clause applies all non-aggregate computations to the funcsql.CompositeRow objects.

Parameters:
Returns:

an iterator over funcsql.Row instances.

funcsql.aggregate_map(select_clause_agg: dict[str, Aggregate], groups: dict[tuple[Any, ...], Iterable[Row]]) Iterator[Row]

Applies Aggregate computations to grouped rows.

Very similar to the way the funcsql.select_map() function works.

Parameters:
  • select_clause_agg – the aggregate functions from a funcsql.Select object

  • groups – the key: list[value] mapping with groups built from raw data.

Returns:

an iterator over funcsql.Row instances.

funcsql.group_reduce(query: Select, select_rows: Iterable[Row]) Iterator[Row]

GROUP BY clause either creates groups and summaries, or passes data through.

There are three cases:

  • Group by with no aggregate functions. This will be a collection of group keys.

  • Aggregate functions with no Group by. This is a single summary aggregate applied to all data. Think SELECT COUNT(*) FROM table.

  • Neither Group by nor aggregate functions. The data passes through.

Parameters:
Returns:

iterator over funcsql.Row objects

Todo

Avoid simple list() in order to cope with partitioned tables.

funcsql.having_filter(query: Select, group_rows: Iterable[Row]) Iterator[Row]

HAVING clause filters the groups.

This is similar to funcsql.where_filter().

Parameters:
Returns:

iterator over funcsql.Row objects

Other SQL

These perform the expected operation on the underlying funcsql.Table. It’s often better to reach into the Table directly. The point of the funcsql.Table class is to wrap an underlying Python data structure.

class funcsql.Insert

Adds rows to a Table.

Insert().into(table).values([{data}])
class funcsql.Update(table: Table)

Updates rows in a Table.

(
    Update(table)
    .set(col=lambda row: function)
    .where(lambda row: condition)
)
class funcsql.Delete

Deletes rows from a table.

Delete().from_(table).where(lambda row: condition)