Components
The software architecture has four tiers:
Control. At the top are the monitor (run locally) and the handler (run as an AWS Lambda function).
Application. The top tier relies on the application modules: reader, writer, and filter. These modules are also stand-alone applications that can be executed individually from the command line.
Storage and Notification. The application modules rely on the model, storage, and notification modules.
Infrastructure. All of the components rely on the common module, which provides configuration details.
These rest on a number of dependencies, listed in requirements.in:
pydantic. Used to define the model classes in model.
requests. Used to capture RSS feeds in reader.
jinja2. Used to create HTML documents in writer and notification.
schedule. Used by the monitor to control a recurring task.
boto3 and botocore. Used to manage AWS resources in storage and notification.
We’ll look at each component in a little more detail.
Monitor
The monitor executes the feeder reader using its own internal scheduler. This is used when running locally.
To change the schedule, stop the application with Control-C and restart it. The monitor consumes very few resources and can be left running in a terminal window; it can be started or stopped as needed.
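The monitor's recurring loop can be sketched as follows. This is a minimal, stdlib-only sketch: the actual monitor uses the schedule package, and here a plain time.sleep loop stands in for it. The run_periodically name and its max_runs and sleep parameters are hypothetical, added so the loop can be exercised without running forever. Control-C raises KeyboardInterrupt, which ends the loop cleanly.

```python
import time
from typing import Callable, Optional


def run_periodically(
    task: Callable[[], None],
    interval_seconds: float,
    max_runs: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run ``task`` every ``interval_seconds`` until Control-C (or ``max_runs``).

    Hypothetical helper: the real monitor delegates scheduling to the
    ``schedule`` package; this plain loop only illustrates the idea.
    Returns the number of times the task ran.
    """
    runs = 0
    try:
        while max_runs is None or runs < max_runs:
            task()
            runs += 1
            sleep(interval_seconds)
    except KeyboardInterrupt:
        # Control-C stops the monitor; restart it to pick up a new schedule.
        print(f"monitor stopped after {runs} run(s)")
    return runs
```

For example, run_periodically(read_feeds, interval_seconds=3600) would call a (hypothetical) read_feeds function once an hour until interrupted.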
Handler
The handler is used to execute the feeder reader when it’s deployed as an AWS Lambda function. The Lambda function is triggered on a schedule to periodically perform the various tasks.
This is used when running in AWS, not when running locally.
To change the schedule, use the AWS console. Alternatively, a CloudFormation template (CFT) can define the resources and the schedule.
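AWS Lambda invokes a Python handler as a function taking (event, context). The sketch below assumes the handler simply runs the reader, filter, and writer in sequence on each scheduled trigger; run_reader, run_filter, run_writer, and the lambda_handler name are hypothetical placeholders, not the project's actual functions.

```python
import json
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)


# Hypothetical placeholders for the application modules; in the real
# project these would call into reader, filter, and writer.
def run_reader() -> None:
    logger.info("reader: capturing RSS feeds")


def run_filter() -> None:
    logger.info("filter: selecting matching docket items")


def run_writer() -> None:
    logger.info("writer: building the web site")


TASKS: list[tuple[str, Callable[[], None]]] = [
    ("reader", run_reader),
    ("filter", run_filter),
    ("writer", run_writer),
]


def lambda_handler(event: dict[str, Any], context: Any) -> dict[str, Any]:
    """Entry point invoked by AWS Lambda on each scheduled trigger."""
    completed = []
    for name, task in TASKS:
        task()
        completed.append(name)
    return {"statusCode": 200, "body": json.dumps({"completed": completed})}
```

The Lambda function's handler setting would then name this function (module.function); the scheduled event itself carries no data the sketch needs.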
Reader
The reader consumes data from USCourts RSS feeds and captures it in files.
The storage module uses either an AWS S3 bucket or local file storage.
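One way to let the application modules ignore where files live is to give both back ends the same small interface. This is a sketch under assumptions: the Storage, LocalStorage, and S3Storage names and their write_text/read_text methods are hypothetical. The S3 variant uses boto3's put_object and get_object calls and imports boto3 lazily, so local runs do not need it installed.

```python
from pathlib import Path
from typing import Protocol


class Storage(Protocol):
    """Hypothetical interface shared by both storage back ends."""

    def write_text(self, key: str, text: str) -> None: ...
    def read_text(self, key: str) -> str: ...


class LocalStorage:
    """Stores each key as a file under a base directory."""

    def __init__(self, base: Path) -> None:
        self.base = Path(base)

    def write_text(self, key: str, text: str) -> None:
        path = self.base / key
        path.parent.mkdir(parents=True, exist_ok=True)  # create day/hour dirs
        path.write_text(text, encoding="utf-8")

    def read_text(self, key: str) -> str:
        return (self.base / key).read_text(encoding="utf-8")


class S3Storage:
    """Stores each key as an object in an S3 bucket (requires boto3)."""

    def __init__(self, bucket: str) -> None:
        import boto3  # imported lazily so local runs don't need boto3

        self.bucket = bucket
        self.client = boto3.client("s3")

    def write_text(self, key: str, text: str) -> None:
        self.client.put_object(Bucket=self.bucket, Key=key, Body=text.encode("utf-8"))

    def read_text(self, key: str) -> str:
        response = self.client.get_object(Bucket=self.bucket, Key=key)
        return response["Body"].read().decode("utf-8")
```

Because S3 keys and relative file paths both use "/" separators, the same key string works for either back end.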
Here’s a more detailed view of the processing.
The resulting files have the following structure:
The files are decomposed by day to make it easy to clean up old files. Within a day, they’re decomposed by hour to make the files small and fast to process.
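The day/hour decomposition can be expressed as a key-building function plus a cleanup check that only looks at day directory names. The YYYYMMDD/HH/items.json layout, the item_key and expired_days names, and the keep_days retention parameter are assumptions for illustration; the project's actual naming may differ.

```python
from datetime import date, datetime, timedelta


def item_key(captured: datetime, filename: str = "items.json") -> str:
    """Build a hypothetical day/hour key, e.g. '20240315/09/items.json'."""
    return f"{captured:%Y%m%d}/{captured:%H}/{filename}"


def expired_days(day_dirs: list[str], today: date, keep_days: int) -> list[str]:
    """Return the day directories (YYYYMMDD) older than ``keep_days``.

    Because files are grouped by day, cleanup only needs to compare
    directory names, not open every file.
    """
    cutoff = today - timedelta(days=keep_days)
    return sorted(
        d for d in day_dirs if datetime.strptime(d, "%Y%m%d").date() < cutoff
    )
```

Deleting an expired day then removes every hourly file beneath it in one step.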
Within a JSON file (either items.json or filter.json), the saved structure is a sequence of USCourtItemDetail instances. See Model for more on this data structure.
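The capture step can be sketched with the standard library's XML parser. This sketch assumes the feeds are plain RSS 2.0 and keeps only the standard item elements (title, link, description, pubDate); plain dicts stand in for USCourtItemDetail, whose real fields may differ. The parse_rss and to_json names are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

# Standard RSS 2.0 item elements; USCourtItemDetail may carry more fields.
FIELDS = ("title", "link", "description", "pubDate")


def parse_rss(feed_text: str) -> list[dict[str, str]]:
    """Extract one dict per <item> in an RSS 2.0 document."""
    root = ET.fromstring(feed_text)
    items = []
    for item in root.iter("item"):
        # findtext returns None for a missing element; store "" instead.
        items.append({name: (item.findtext(name) or "").strip() for name in FIELDS})
    return items


def to_json(items: list[dict[str, str]]) -> str:
    """Serialize the items as a JSON sequence, like items.json."""
    return json.dumps(items, indent=2)
```

In the reader, the feed text would come from requests, for example parse_rss(requests.get(feed_url, timeout=30).text), where feed_url is a placeholder.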
Filter
The filter examines all of the USCourtItemDetail instances in the captured JSON files. Those that match the docket information are written to a separate file, filter.json.
Any change to the filter file is important and should be reported; the notification module provides the notification strategy.
Within these JSON files (either history.json or filter.json), the saved structure is a sequence of USCourtItemDetail instances. See Model for more on this data structure.
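The filter's two jobs, selecting matching items and detecting a change worth reporting, might look like the sketch below. It assumes each item has a "docket" field and that the watched dockets are a set of strings; both are hypothetical, as are the function names. Items are compared by their canonical JSON text, so any new or altered item counts as a change.

```python
import json
from typing import Any, Iterable

Item = dict[str, Any]


def filter_items(items: Iterable[Item], dockets: set[str]) -> list[Item]:
    """Keep items whose (hypothetical) 'docket' field is being watched."""
    return [item for item in items if item.get("docket") in dockets]


def new_matches(previous_json: str, current: list[Item]) -> list[Item]:
    """Return items in ``current`` that were not in the previous filter.json.

    A non-empty result is a change worth sending a notification about.
    """
    previous = json.loads(previous_json) if previous_json else []
    seen = {json.dumps(item, sort_keys=True) for item in previous}
    return [item for item in current if json.dumps(item, sort_keys=True) not in seen]
```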
Writer
The writer builds a web site from the captured files.
In the source JSON files (either history.json or filter.json), the saved structure is a sequence of USCourtItemDetail instances. See Model for more on this data structure.
The output files are created with Jinja templates. See Jinja Templates for more information.
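Before rendering, the writer has to organize items by subject area (court, docket, date). A minimal sketch of that grouping step follows; the group_by name and the field names are assumptions, and dicts stand in for USCourtItemDetail instances.

```python
from collections import defaultdict
from typing import Any

Item = dict[str, Any]


def group_by(items: list[Item], field: str) -> dict[str, list[Item]]:
    """Group items by one (hypothetical) field, with keys in sorted order."""
    groups: dict[str, list[Item]] = defaultdict(list)
    for item in items:
        groups[str(item.get(field, "unknown"))].append(item)
    return {key: groups[key] for key in sorted(groups)}
```

Calling it once per subject, for example {subject: group_by(items, subject) for subject in ("court", "docket", "date")}, yields the data each subject directory's templates would render.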
Notification
The notification choices include:
SMTP on a local computer.
SES when deployed in an AWS Lambda function.
A Text User Interface (TUI) application that shows status and notifications.
Other choices include a simple log file or AWS SNS.
The choice of notification strategy is part of the configuration handled by the common module.
The notification messages are also created with Jinja templates. See Jinja Templates for more information.
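The notification choices fit a strategy pattern: each choice implements the same notify method, and configuration selects one. This sketch implements only two strategies with the standard library (a log and local SMTP via smtplib and email.message.EmailMessage); SES, SNS, and the TUI would be further classes. The class names, the make_notifier factory, and the configuration keys (notifier, smtp_host, sender, recipient) are hypothetical. In the project the message body would presumably be HTML rendered from a Jinja template; plain text keeps the sketch short.

```python
import logging
import smtplib
from email.message import EmailMessage
from typing import Optional, Protocol


class Notifier(Protocol):
    def notify(self, subject: str, body: str) -> None: ...


class LogNotifier:
    """Writes notifications to a log (the 'simple log file' choice)."""

    def __init__(self, logger: Optional[logging.Logger] = None) -> None:
        self.logger = logger or logging.getLogger("notification")

    def notify(self, subject: str, body: str) -> None:
        self.logger.warning("%s: %s", subject, body)


class SMTPNotifier:
    """Sends an email through an SMTP server on the local computer."""

    def __init__(self, host: str, sender: str, recipient: str) -> None:
        self.host, self.sender, self.recipient = host, sender, recipient

    def build_message(self, subject: str, body: str) -> EmailMessage:
        msg = EmailMessage()
        msg["Subject"] = subject
        msg["From"] = self.sender
        msg["To"] = self.recipient
        msg.set_content(body)
        return msg

    def notify(self, subject: str, body: str) -> None:
        with smtplib.SMTP(self.host) as server:
            server.send_message(self.build_message(subject, body))


def make_notifier(config: dict[str, str]) -> Notifier:
    """Pick a strategy from (hypothetical) configuration keys."""
    kind = config.get("notifier", "log")
    if kind == "smtp":
        return SMTPNotifier(config["smtp_host"], config["sender"], config["recipient"])
    if kind == "log":
        return LogNotifier()
    raise ValueError(f"unknown notifier: {kind}")
```

The filter only calls notify, so switching from SMTP locally to SES in Lambda would be a configuration change rather than a code change.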
Model
The model classes are built using Pydantic, which validates the data and also handles serialization to and from JSON.
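As an illustration of what Pydantic provides here, the sketch below assumes Pydantic v2 and defines a hypothetical ExampleItem class; it is not the actual USCourtItemDetail definition, and its fields are invented. A TypeAdapter over list[ExampleItem] validates and serializes a whole JSON sequence, matching the shape of the saved JSON files.

```python
from datetime import datetime

from pydantic import BaseModel, TypeAdapter, ValidationError


class ExampleItem(BaseModel):
    """Hypothetical stand-in; not the actual USCourtItemDetail fields."""

    title: str
    link: str
    pub_date: datetime  # ISO 8601 strings are parsed into datetime objects


# Validates and (de)serializes a JSON array of items, like items.json.
ItemList = TypeAdapter(list[ExampleItem])
```

A malformed value, such as a pub_date that is not a date, raises ValidationError instead of silently producing a bad record.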
The USCourtItemDetail class is defined as follows:
Common
The common module gathers configuration data.
Generally, a Lambda deployment relies on environment variables that are part of the Lambda configuration.
When running locally, the configuration file is split to keep private information in a separate file that can be kept out of the Git repository.
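A minimal sketch of this two-source configuration, using configparser for the local files. Several details are assumptions: the INI format, the config.ini and private.ini file names, the [feeder] section, and the setting names. It detects Lambda through the AWS_LAMBDA_FUNCTION_NAME variable that the Lambda runtime sets. configparser reads files in order, so values in the private file override the public one, and missing files are skipped.

```python
import configparser
import os
from collections.abc import Mapping, Sequence
from pathlib import Path

# Hypothetical setting names shared by both configuration sources.
KEYS = ("bucket", "notifier", "recipient")


def load_config(
    environ: Mapping[str, str] = os.environ,
    paths: Sequence[Path] = (Path("config.ini"), Path("private.ini")),
) -> dict[str, str]:
    """Gather settings from the environment (Lambda) or local files."""
    if "AWS_LAMBDA_FUNCTION_NAME" in environ:
        # Deployed: values come from the Lambda's environment variables.
        return {key: environ[key.upper()] for key in KEYS if key.upper() in environ}
    # Local: the public file is read first, then the private file (kept out
    # of Git) overrides it; missing files are silently skipped.
    parser = configparser.ConfigParser()
    parser.read(paths)
    return dict(parser["feeder"]) if parser.has_section("feeder") else {}
```

Adding private.ini to .gitignore keeps credentials out of the repository while config.ini stays under version control.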
Jinja Templates
Jinja templates are used in two places: writer and notification.
HTML pages have a fair amount of boilerplate. Jinja facilitates this by permitting a sophisticated inheritance hierarchy among pages.
The diagram for the relationships in writer is similar to the one shown above for notification. However, instead of the single HTML_MESSAGE extension of HTML_BASE used for notification, it involves several page templates.
The overall index.html page is generated by the HTML_INDEX template. This template includes links to the other subject index pages.
Each of the subject areas (court, docket, date, and filtered) has a directory with an index page and a number of subject pages.
The court/index.html page, created by the HTML_SUBJECT_INDEX template, lists the index pages. Each of the court/index_xx.html pages is created by the HTML_SUBJECT_PAGE template and contains one page of items.
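Splitting a subject's items into pages can be sketched as a mapping from page name to items. The page_size default and the two-digit numbering that fills in the "xx" are assumptions, as is the paginate name. The keys of the result are what a subject index page would link to.

```python
from typing import Any, Sequence

Item = dict[str, Any]


def paginate(
    items: Sequence[Item], subject: str, page_size: int = 50
) -> dict[str, list[Item]]:
    """Map hypothetical page names like 'court/index_01.html' to their items."""
    if page_size < 1:
        raise ValueError("page_size must be positive")
    pages = {}
    for number, start in enumerate(range(0, len(items), page_size), start=1):
        pages[f"{subject}/index_{number:02d}.html"] = list(items[start : start + page_size])
    return pages
```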
Template expansion works by “evaluating” a page template. Each page template extends the base template, which provides a consistent set of content. The base template includes blocks that are replaced by content defined in the extension templates.
The template language includes for statements, allowing template content to be repeated for each item in a collection.
Additionally, macro definitions allow pieces of template content to be injected consistently in multiple places within a page.
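The mechanisms described above (extends, blocks, for loops, and macros) can be seen together in a small, self-contained example using jinja2's DictLoader. The template names base.html and subject_page.html are hypothetical stand-ins for HTML_BASE and HTML_SUBJECT_PAGE, and their content is invented for illustration. The macro is defined inside the content block so it is certainly in scope where it is called.

```python
from jinja2 import DictLoader, Environment

TEMPLATES = {
    # Base template: shared boilerplate with blocks for extensions to fill.
    "base.html": (
        "<html><head><title>{% block title %}Feeder{% endblock %}</title></head>"
        "<body>{% block content %}{% endblock %}</body></html>"
    ),
    # Page template: extends the base, loops over items, and reuses a macro.
    "subject_page.html": (
        "{% extends 'base.html' %}"
        "{% block title %}{{ subject }}{% endblock %}"
        "{% block content %}"
        "{% macro item_link(item) %}<a href=\"{{ item.link }}\">{{ item.title }}</a>{% endmacro %}"
        "<ul>{% for item in items %}<li>{{ item_link(item) }}</li>{% endfor %}</ul>"
        "{% endblock %}"
    ),
}

# autoescape=True escapes item text such as '&' for safe HTML output.
env = Environment(loader=DictLoader(TEMPLATES), autoescape=True)
```

Rendering env.get_template("subject_page.html").render(subject="court", items=items) produces a full page whose title and body come from the extension, while the surrounding HTML comes from the base template.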