Tutorial¶
There are several parts to this tutorial:
A more complete example is in Demonstration Code.
Queries¶
The essential SQL query has a number of clauses.
Here’s a short example of a SELECT
:
SELECT n.name, v.c2
FROM names_table n, values_table v
WHERE n.code = v.c1
We rewrite this into a Python funcsql.Select
object.
Each clause has unique rewrite rules, depending con what the SQL clause does.
SELECT
. Each variable or expression must becomes a lambda object, to be computed later. A target variable must be provided; this is in effect, a mandatoryAS
clause.Select(name=lambda cr: cr.n.name, value=lambda cr: cr.v.c2)
The
cr
parameter is a reminder that the lambda will be evaluated with a “composite row”. This row has the tables named in the from clause and the columns of those tables. Thecr.n.name
expression picks the table namedn
and the columnname
from that table.FROM
. Each table name is replaced by aTable
object. We’ll see how these are made in the Schema and Tables section..from_(n=names_table, v=values_table)
As with the
SELECT
clause, this provides the alias name and the table name. Table names can be listed without alias names, also. The tables must be loaded before building the query.WHERE
. The logical condition becomes a lambda, also..where(lambda cr: cr.n.code == cr.v.c1)
The .from_()
and .where()
syntax are methods to add details to the funcsql.Select
object.
There are other methods include .group_by()
and .having()
.
Generally, the whole thing must either be complete on one line, or wrapped in ()
‘s.
This seems best:
from funcsql import *
query = (
Select(name=lambda cr: cr.n.name, value=lambda cr: cr.v.c2)
.from_(n=names_table, v=values_table)
.where(lambda cr: cr.n.code == cr.v.c1)
)
Using wrapping ()
‘s makes the query object easier to work with.
We call the funcsql.Select
a “Query Builder”.
It doesn’t “do” anything.
It barely has any internal state changes.
It collects the various clauses an options of a query.
A separate funcsql.fetch()
function uses a query object to process data defined in funcsql.Table
objects.
Schema and Tables¶
A funcsql.Table
object has a name and a list of rows.
The list of rows need to be dictionary objects.
The keys of this dictionary are the column names.
We limit column names to valid Python identifiers. After all, this is pure Python code. Python identifiers, for example, are case sensitive, where in SQL, there’s no such rule.
How this data arrives on the scene is left wide open. For one example, consider a CSV-format file with a bunch of columns.
from csv import DictReader
from pathlib import Path
from funcsql import *
table_1_path = Path("table_1.csv")
with open(table_1_path) as source_file:
rdr = DictReader(source_file)
table_1 = Table("table_1", list(rdr))
There’s no formal SQL-schema with table definitions.
The table is defined by the column names.
The column names in the CSV file need to match the column names in the query
object.
That’s the only rule:
Important
Any source of list[dict[str, Any]]
data can build a funcsql.Table
object.
Also important, the funcsql.Row
objects will be created for you as needed.
Do not map source data to funcsql.Row
instances.
Here are two other examples of tables:
from funcsql import *
values_table = Table(
"values_table",
[
{"c1": 1, "c2": 42.0},
{"c1": 2, "c2": 3.14},
{"c1": 3, "c2": 2.72},
],
)
names_table = Table(
"names_table",
[
{"code": 1, "name": "Life"},
{"code": 2, "name": "Pi"},
{"code": 3, "name": "Ee"},
],
)
And, yes, the table name matches the variable name to which the funcsql.Table
object is assigned.
This isn’t a requirement, but debugging can be a nightmare if the variable names don’t match the table names.
Fetching Rows¶
To actually execute the query, use the funcsql.fetch()
function.
Load the tables, assigning them to variables.
Build the query with references to the table variables. Assign this to a variable.
Apply the
funcsql.fetch()
function the query. This is an iterable that returnsfuncsql.Row()
objects.
The idea is to be able to to SQL-like processing with minimal overhead.
The result is a sequence of funcsql.Row
objects.
These are similar to named tuples.
The attribute names come from the funcsql.Select
instance, which has parameters of the form name=lambda cr:...
.
from funcsql import *
names_table = Table("names_table", ...)
values_tables = Table("values_table", ...)
query = (Select...)
for row in fetch(query):
print(row)
The idea is the minimize the overheads. This doesn’t have a database connection. It doesn’t have any cursor management or locking or commits. It doesn’t create or require a SQL schema. It does SQL-like processing on Table-like objects.
Group By and Having¶
Here’s an example of a GROUP BY
clause:
SELECT group AS key, sum(value) AS total
FROM raw_table
GROUP BY key
Here’s the rewrite to use the funcsql.Select.group_by()
method to build the grouping.
This also uses a funcsql.Aggregate
object that wraps a function, the built-in sum()
,
and a lambda to compute the parameter values for this function.
The value
item, specifically, is used to extract the required column from the composite row.
from funcsql import *
query = (
Select(
key=lambda c: c.raw.group,
value=lambda c: c.raw.value,
total=Aggregate(sum, "value")
)
.from_(raw_table)
.group_by("key")
)
This has a strange-looking redundancy.
The value=lambda
creates a value for the total=Aggregate
.
This can be abbreviated to Aggregate(sum, value=lambda c: c.raw.value)
.
When parsing SQL, a database engine will make a number of optimizations when working out what a name might refer to. In contrast, this library works directly in Python; it doesn’t have access to the original variable names, table schema, or expressions.
Here’s the table used for the above query.
from funcsql import *
raw_table = Table(
"raw_table",
[
{"group": "1", "value": 1},
{"group": "1", "value": 1},
{"group": "2", "value": 2},
{"group": "2", "value": 3},
],
)
Subqueries¶
Here’s an example of a subquery:
SELECT first_name
FROM employees
WHERE department_id IN (
SELECT department_id FROM departments WHERE location_id>1500
);
Part of the query is the department_id
set created by the subquery.
Here’s an alternative formulation as pure Python.
from funcsql import *
department_ids = set(
fetch_all_values(
Select(department_id=lambda c: c.departments.department_id)
.from_(departments)
.where(lambda c: int(c.departments.location_id) > 1500)
)
)
c3 = (
Select(first_name=lambda c: c.employees.first_name)
.from_(employees)
.where(lambda c: c.employees.department_id in department_ids)
)
Note there are two queries. This reflects the meaning of the SQL, as well as the syntax. The queries aren’t syntactically nested.
First, the subquery SELECT department_id FROM departments ...
is executed to create a set of ids.
Second, the main query executes, using that set of ids.
It’s possible to combine them into a single construct.
This is – after all – pure Python.
The set(...)
expression that computes the value of department_ids
can replace the department_ids
variable in the lambda c: c.employees.department_id in department_ids
expression.
The lambda becomes very long, but, the subquery is textually part of the main query, if that’s important for reader comprehension.
In principle, a SQL database might optimize IN (SELECT x...)
to a one-row lookup using an index or something.
Since the Python set does lookups so quickly, it’s often faster to build the set of values, then do set membership tests.
The Exists Clause¶
The SQL EXISTS()
function generally contains a subquery that includes a reference to the containing query.
For example,
SELECT e.last_name
FROM employees e
WHERE EXISTS (
SELECT * FROM employees b
WHERE b.employee_id = e.manager_id AND b.last_name = 'King'
)
The subquery uses e.manager_id
to refer to a column in a row in the containing query.
Here’s the rewritten query.
from funcsql import *
c5 = (
Select(last_name=lambda c: c.e.last_name)
.from_(e=employees)
.where(
lambda c: exists(
c,
Select(STAR)
.from_(b=employees)
.where(lambda sq: sq.b.employee_id == sq.e.manager_id and sq.b.last_name == "King"),
)
)
)
An exists()
function has two parameters:
The context; the composite row from the parent query.
A Query, usually a
funcsql.Select
object.
Using the given context, the query is evaluated. All of the tables from the parent query are available in the child query. Table aliases are absolutely required to disambiguate references.
WITH
clause and Recursion¶
We have two choices for creating “common table expressions”:
Just like everything else – the With is a QueryBuilder implementation.
The Python with statement implementation. This comes close to the SQL
WITH
clause, but doesn’t do everything.
With is a QueryBuilder¶
Here’s a simple example:
(
With(
table_1=Select(...),
table_2=Select(...)
)
.query(Select(...).from_("table_1", "table_2"))
)
The funcsql.With
uses table_name = Select
to match the SQL table_name AS (SELECT...)
.
The funcsql.With.query()
method expects a funcsql.Select
object.
The table names are strings, since the tables aren’t part of the global namespace; they’re part of the context of the funcsql.With
object.
We can also write this as follows:
(
With(
table_1=Select(...),
table_2=Select(...)
)
.select(...).from_("table_1", "table_2"))
)
The use of the funcsql.With.select()
method more closely parallels the SQL syntax.
Use either the funcsql.With.query()
method with an separate funcsql.Select
,
or use the funcsql.With.select()
method.
Don’t use both, and don’t mix-and-match in your application.
The Python with
statement¶
The following is really unpleasant, but works:
from contextlib import nullcontext
from funcsql import *
with nullcontext(
fetch_table("the_codes", Select(code=lambda cr: cr.names.code).from_(names))
) as the_codes:
rows = fetch(Select(STAR).from_(the_codes))
The complication is a funcsql.Table
has an internal name, separate from the variable name.
Ideally, they match.
This is more pleasant:
from funcsql import *
with fetch_table("the_codes", Select(code=lambda cr: cr.names.code).from_(names)) as the_codes:
rows = fetch(Select(STAR).from_(the_codes))
This works because a Table
is a context manager that does almost nothing.
This doesn’t help specify recursive queries, since they’re actually part of the WITH
clause that creates the CTE used by the target query.
More on recursive query building¶
The examples in https://www.sqlite.org/lang_with.html are particularly good at showing the recursive query technique.
WITH RECURSIVE
works_for_alice(n) AS (
VALUES('Alice')
UNION
SELECT name FROM org, works_for_alice
WHERE org.boss=works_for_alice.n
)
SELECT avg(height) FROM org
WHERE org.name IN works_for_alice;
Is restated as follows:
from funcsql import *
query = (
With(
under_alice=Values(name=lambda cr: "Alice", level=lambda cr: 0).union(
Select(name=lambda cr: cr.org.name, level=lambda cr: cr.under_alice.level + 1)
.from_(org)
.join(table="under_alice", on_=lambda cr: cr.org.boss == cr.under_alice.name)
)
)
.select(line=lambda cr: ".........."[: cr.under_alice.level * 3] + cr.under_alice.name)
.from_("under_alice")
)
Note the funcsql.Values
object,
which implements the SQL Values
clause.
The SQL UNION
becomes a funcsql.Values.union()
method in query construction.
A Select()...union(Select()...)
outside the WITH
context is an ordinary Union.
It’s the same as +
operator between the
data lists that make up a Table.
In a With(table=Select()...union(Select().from("table")...))
inside a WITH
context, the
union specifies recursive traverssal.
Note that this also uses string table name instead of a funcsql.Table
object.
The "under_alice"
table is the whole point being of this Common Table Expression.
The recursive query is building this table. It doesn’t really exist until after the recursive query is complete.