Components

We have the following software architecture.

https://www.plantuml.com/plantuml/svg/RP9T2iCW48JVznHUe3r2oLKaRbgfFy5OIYczUvS4N7TD7aASpvqPejDMRQcl601MEp_j7Ss2wB0KjFGsNp3qp3ckEM6g9cQ7uRf-b4nVCQTHtsekv3c2bAjV_6ohFnixmaTzyQ6UwU_YJtNgvAd121uskN1C_01uPIwNMvUk0CShmFd0aSfB0VPIeFDgUtQWVnaqxnstav5JJMw-KbOktO59dmv8fmw0ghXplXgyKm5PWNNPW7LPW7MOC3cwoty=

Note the four tiers:

  • Control. At the top are monitor (to run locally) and handler (to run as an AWS lambda).

  • Application. The top tier relies on the applications: reader, writer, filter. The application modules are also stand-alone applications and can be executed individually from the command line.

  • Storage and Notification. The application modules rely on storage and notification: model, storage, notification.

  • Infrastructure. All of the components rely on the common module, which provides configuration details.

These rest on a number of dependencies, listed in the requirements.in file.

  • pydantic. Used to define the model class in model.

  • requests. Used to capture the RSS feeds in reader.

  • jinja2. Used to create HTML documents in writer and notification.

  • schedule. Used by the monitor to control a recurring task.

  • boto3 and botocore. Used to manage AWS resources in storage and notification.

We’ll look at each component in a little more detail.

Monitor

The monitor executes the feeder reader using its own internal scheduler. This is used when running locally.

https://www.plantuml.com/plantuml/svg/LOzB2iCm34JtEeMMxHMSJLOzG2zGN7lS21mxA4wpABbx7KFw6H28Ds8OzNEnMfOn4hMDo2YiPvTJa0VT5ucUPpV0Bn4TaMA2BMpg7DHpaN7tk6gg6L8a9xu07dgxrZGelvgxoxW8cw3TbsYx-G51Ola3gye7R4UBTP08FeMiU4BFH3sIhw-y0G==

To change the schedule, use control-c to stop the application. It consumes very few resources, and can be left running in a terminal window. It can be started or stopped as needed.

Handler

The handler is used to execute the feeder reader when it’s deployed as an AWS lambda. The AWS lambda is triggered by a lambda scheduler to periodically perform the various tasks.

This is used when running in AWS, not when running locally.

https://www.plantuml.com/plantuml/svg/LOvD2eDG38JtEKMMxHLquwe7w0NAmpTY83ubnheelNlZW_uDX0dVp6AwrPFvDYbH_OWC2v9p4xVs8_AcoEmoeINYq18jSPaBNu0CkrsHmlXHqqDDhqW5rdw9rSuF64Jz3-mc7_1yhzX7KV1fc0rts9ceGyanzWK=

To change the schedule, use the AWS console. A CloudFormation template (CFT) can define the resources and the schedule.
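As a rough sketch of how a handler might chain the application steps, consider the following. The function names run_reader, run_filter, and run_writer are hypothetical stand-ins, not the real entry points of the reader, filter, and writer modules. Only the (event, context) signature follows the shape AWS Lambda expects of a Python handler.

```python
from typing import Any


# Hypothetical stand-ins for the reader, filter, and writer applications.
# Each returns a count so the handler has something to report.
def run_reader() -> int:
    """Pretend to capture RSS items; returns the number captured."""
    return 0


def run_filter() -> int:
    """Pretend to match items against dockets; returns the number matched."""
    return 0


def run_writer() -> int:
    """Pretend to build the web site; returns the number of pages written."""
    return 0


def lambda_handler(event: dict[str, Any], context: Any) -> dict[str, Any]:
    """Run the three application steps in order and summarize the results."""
    captured = run_reader()
    matched = run_filter()
    pages = run_writer()
    return {"captured": captured, "matched": matched, "pages": pages}
```

The scheduled trigger supplies the event. This sketch ignores its content and simply runs each step once.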

Reader

The reader consumes data from USCourts RSS feeds and captures it locally.

https://www.plantuml.com/plantuml/svg/JP2x3i8m34LtVuMLFGoC2A4YEdH1I5GCZAOngAXDIXu3GlrtKXhQUeYJOryxTkOyMH_Q1g0oRMSqQu-MIR5T0EUDfGV7dCO4XVlJfJpW1p3QlMuOpsKvF-mQoQ68J40FeCZJZHpZSFcBJ1CRu3NQyPZdYIuYHR5WK-NQS-jMJsVq5EjXF6EZyLTRAMKQ03SeNe7jMq02KkwcMPB_iRdmeAceyy6o0CXOgxYRBm==

The storage module will either use an AWS S3 bucket or it will use local file storage.
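One way to picture the choice between S3 and local files is a small common interface with interchangeable implementations. The sketch below is an assumption about the shape of such an interface, not the actual storage module. The names Storage, LocalStorage, and round_trip are hypothetical, and only the local variant is shown.

```python
from pathlib import Path
from typing import Protocol


class Storage(Protocol):
    """Hypothetical interface that a local or an S3-backed class could share."""

    def write_text(self, key: str, text: str) -> None: ...

    def read_text(self, key: str) -> str: ...


class LocalStorage:
    """Stores each key as a file below a base directory."""

    def __init__(self, base: Path) -> None:
        self.base = base

    def write_text(self, key: str, text: str) -> None:
        path = self.base / key
        # Create intermediate directories on demand.
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text, encoding="utf-8")

    def read_text(self, key: str) -> str:
        return (self.base / key).read_text(encoding="utf-8")


def round_trip(storage: Storage, key: str, text: str) -> str:
    """Write and read back through whichever storage was supplied."""
    storage.write_text(key, text)
    return storage.read_text(key)
```

An S3-backed class would provide the same two methods, so the application modules would not need to know which one they were given.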

Here’s a more detailed view of the processing.

https://www.plantuml.com/plantuml/svg/PP31JWGX38RlVOeUTrzXvh96BoPwCYPUMQOuDGnbeSGOi-zkXsmcet8WmFV_vMzFLLtHjaI0VWB8jhEo9FIHNAA2tp0KSXDwIO7VHu2X4vGuOU18TOJhthUAk-slhr0cfpW4AKZEpnY89dj7MuEfQbi8TIPyUtx2Agrukj5_JbPQRgoxpuqjpUlBlRdqqU03gUE8SluWhpib1poqn9T6n_MhjDqsHQztbpEb2LLmvgfIldgduoOooE5Nji_P17XOf51FLuzRmtkqjcpekivRCi39AjduVm0=

The resulting files have the following structure:

https://www.plantuml.com/plantuml/svg/SoWkIImgAStDuU9Aoyz9IIrII4aiILIevb800gX8913u-hguG4MHP3myaCJ0ufavgGgP9QcvnNfPnVbvSBbQT9qP6KOAYSKAIXuUIbYDWCWYJ5B2PvGnNgEC2HXLouNisKeWoCrDIO4u0EPr9MnAZD6KSC7j118bgKMG800Qxv2Qbm8COW0=

The files are decomposed by day to make it easy to clean up old files. Within a day, they’re decomposed by hour to make the files small and fast to process.
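The day and hour decomposition can be illustrated with a small helper. The exact directory naming is shown in the diagram above. The YYYYMMDD/HH pattern used here is only an assumption for illustration, and item_key and expired_days are hypothetical names.

```python
from datetime import date, datetime, timedelta


def item_key(when: datetime, name: str = "items.json") -> str:
    """Build a day/hour/file key; the YYYYMMDD/HH layout is an assumption."""
    return f"{when:%Y%m%d}/{when:%H}/{name}"


def expired_days(day_names: list[str], today: date, keep_days: int) -> list[str]:
    """Return the day directories that are older than the retention window."""
    cutoff = today - timedelta(days=keep_days)
    return [
        name
        for name in day_names
        if datetime.strptime(name, "%Y%m%d").date() < cutoff
    ]
```

Because each day is its own directory, cleaning up old files amounts to removing whole day directories.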

Within a JSON file (either an items.json or filter.json) the structure saved is a sequence of USCourtItemDetail instances. See Model for more on this data structure.

Filter

The filter examines all of the USCourtItemDetail instances in the captured JSON files. The items that match the docket information are written to a separate file, filter.json.

https://www.plantuml.com/plantuml/svg/PP0x3i8m38PtdyBg7Ww04EhGeKiFm1Y8YuBoG8aH0-hToQEqaV3Ws9_zsrRwo3Bmong0cxTFxyWnpjemXGP4za7U2K19bbSP2NE07y1aipUA0bwIJTCmhBIfZ6F32jU6K3FPU7X4xxRQ0hilFVMGshG0r1HFGcmNhMzX-qGyOXT8gkud4UBy9yciMP0rxv1cT00zEPN-v0i=

Any changes to the filter file are important. A notification strategy is provided in the notification module.
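A minimal sketch of the matching and change detection follows. It uses plain dictionaries in place of USCourtItemDetail instances. The "docket" field name and the functions select_matching and filter_changed are assumptions made for illustration only.

```python
import json


def select_matching(items: list[dict], dockets: set[str]) -> list[dict]:
    """Keep only the items whose (assumed) docket field is being watched."""
    return [item for item in items if item.get("docket") in dockets]


def filter_changed(previous_json: str, matching: list[dict]) -> bool:
    """Compare the previously saved filter content with the new matches."""
    return json.loads(previous_json) != matching
```

When filter_changed returns True, the new content would be saved and a notification sent.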

Within these JSON files (either history.json or filter.json) the structure saved is a sequence of USCourtItemDetail instances. See Model for more on this data structure.

Writer

The writer builds a web site from the captured files.

https://www.plantuml.com/plantuml/svg/ZL913e8m4BplApQzmmCy60uUkFW0ZvKkYa9BscqqCVpTWe154iCUscPcPoUPRUPOMlPD3L2OfZMQDEF3Lams0XmRAoy4e2JBitI4_GDVWA5AKokoU0frF1uE8nenUHvX0rxLXC6YSUdD6Jrp7NpEBZ8odblFIwl2UCknJp-lg50w59LMz4oltDG2lZD7eJB5dLPAjKKfcLOwtY0CdI7BNsdvLpd4CrDz7D-mC4cI_RGGn1qAIr89c92U7VVtkJ3gNtIDoOO9GwC0o4Yhy1NU

Within the source JSON files (either history.json or filter.json) the structure saved is a sequence of USCourtItemDetail instances. See Model for more on this data structure.

The output files are created with Jinja templates. See Jinja Templates for more information.

Notification

The notification choices involve:

  • SMTP on a local computer.

  • SES when deployed in an AWS lambda.

  • A fancy Text User Interface (TUI) application to show status and notifications.

Other choices include a simple log file or using AWS SNS for notifications.

https://www.plantuml.com/plantuml/svg/ROyn3i8m34LtdyBA14Clm82wiB7r12P9gqY9NRNEXBXxdGf49V1a_V_y9FUhqgArP00lPHQEhCYYQKoUD0aVOjkA1N2iiUkkxmBnWcwAZUCnJNNP0MVYB3NW4z3cQnjk0xm0JcTSYyv_h0OquvtA8v3xxTlP3eYdxA3XdnYWZnpigmq=

The decision of which notification to use is a feature of the common module.

The output files are created with Jinja templates. See Jinja Templates for more information.
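Selecting a notification strategy from configuration might look like the sketch below. The "notifier" configuration key, the strategy names, and the functions are all hypothetical. The stand-in functions only record what they would have sent, so no mail is actually delivered.

```python
from typing import Callable

# Records what each stand-in strategy would have sent.
sent: list[str] = []


def notify_smtp(message: str) -> None:
    """Stand-in for sending through a local SMTP server."""
    sent.append(f"smtp: {message}")


def notify_ses(message: str) -> None:
    """Stand-in for sending through AWS SES."""
    sent.append(f"ses: {message}")


def notify_log(message: str) -> None:
    """Stand-in for appending to a simple log file."""
    sent.append(f"log: {message}")


NOTIFIERS: dict[str, Callable[[str], None]] = {
    "smtp": notify_smtp,
    "ses": notify_ses,
    "log": notify_log,
}


def get_notifier(config: dict[str, str]) -> Callable[[str], None]:
    """Pick a strategy by name, defaulting to the log stand-in."""
    return NOTIFIERS[config.get("notifier", "log")]
```

With this arrangement, the caller asks for a notifier once and then uses it without knowing which strategy was configured.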

Model

The model classes are built using Pydantic. This does data validation, and also handles serialization to and from JSON format.

The USCourtItemDetail class is defined as follows:

https://www.plantuml.com/plantuml/svg/bLBBQiCm4BplLoovf2qreLSJIcbw2QINbfo3OcqR4JsCj8Q6DFzUMNxK9Y4G7GGQPcPdntu9B3nNHWCKpfJEacKi3r9OLWKU0UCfj0W1NqMWToT9msd8DJKKYT2mbaocbR5YJJa6zkcfbTtdkFvOfPUaK7XLidBsdr9MsuqK74NrpSeG9FmOGyOu9_popcnOVD_z67qVS_IPSFwRPMbu5sIn4zQcP5UptEJVUW9qvWyHd3mzcibezFfpuIhFWogaA_R4WqLZvuf20zmKOU0Da2QSmsSfteY5wgfG4SX7WlIDcutoViThani_WE6YyqiKd4qnovv7ZKndVNsNDVLfTHqnQk9FjrEIa4p91pHajkJ37m==

Common

The common module gathers configuration data.

https://www.plantuml.com/plantuml/svg/bL6nQaCn3Dpz5I9BfkGJIg3G35tRkJYnNfx1bemi1qBflrT-IS4aMUh9Efzqf_EkgXTjEJbtv5oOa1JibfcScs92AsYAfwrovop852J8ruWJAyA1rGhWwGa1xBnZKU2cDGQ4VLGGB5oZibos2-6pDf_I1NH6Q1LbNM7cZ12YuF5AGmhGnPn9sfIPgyBtqjmfdavcRuLqZiAK-ofdBz4V4jOL-0hsOe3xkJTym__bibXklVSrSGlmnfAzHYRum5oILnpDNWE5pURbJsf0nfm6-354jxE9zbM_

Generally, a lambda deployment relies on environment variables that are part of the lambda configuration.

When running locally, the configuration file is split to keep private information in a separate file that’s not easily put into a Git repository.
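The two configuration sources can be sketched as two small functions. The FDRDR_ prefix, the key names, and the functions lambda_config and local_config are assumptions for illustration. The real variable names and file format are defined by the common module.

```python
from typing import Mapping


def lambda_config(environ: Mapping[str, str], prefix: str = "FDRDR_") -> dict[str, str]:
    """Collect settings from environment variables that share a prefix."""
    return {
        name[len(prefix):].lower(): value
        for name, value in environ.items()
        if name.startswith(prefix)
    }


def local_config(public: Mapping[str, str], private: Mapping[str, str]) -> dict[str, str]:
    """Merge the shareable settings with the private ones kept out of Git."""
    return {**public, **private}
```

In a lambda, the first function would be given os.environ. Locally, the two mappings would come from parsing the two configuration files.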

Jinja Templates

Jinja templates are fairly complex pieces of functionality used in two places.

HTML pages have a fair amount of boilerplate. Jinja facilitates this by permitting a sophisticated inheritance hierarchy among pages.

https://www.plantuml.com/plantuml/svg/RP7FJiCm3CRlUGfh9q1q3u2cQaDKxB2TkljaKiEyvK-L60GQl3jfDR6a8WSVFdznVlcyZ891tgrHgaTzmZU28xiZmbb1EjfWZD2u3mxUMNAIgK-iPUEnep2AcGdkgYfl_ro7Eo_yoXg5lGECCCk5UVyrAnvUxxQR_UEgRM2ns97j3GhLDLmympaGTW_mOhm-_Md2OcWgQkyaeKtbG2uHtd2gdtk7bkw1XMoy7Hq9V0ApRTfszJpXEL_CGysGlrCA-QKXXsigOqX5-x_UC4cf8hMggekDg0DtL09f3jgGKilqkxJRNm==

The diagram for the relationships in writer is similar to the one shown above for notification. It involves more than a single HTML_MESSAGE extension to HTML_BASE.

https://www.plantuml.com/plantuml/svg/bPBB2i8m44Nt-Oh1bOhw0LB4L_61Yc05TnBJmOQcQKaYAkg_6wkegRfm5Y5SpjovJDBMXYgpImGX6MKntIBi5JUeW6eetQ-Dx5Y24m5RJ52jOHXC9-jkP-63vmMOP88QRBNWmTmgGXesOIDI5Zyrmut0eiXIQL2QegnGXgZAt9w7jQG9ri0cINGb9owa66Oqw_ih9qrEl0Kzr-jlw8V1OjaT-xtW7oEdvQVXl3Fm__ExTP0NTKzK_JxrIRdvF-ZSqwujuY6wZyQuOW89QTkDsszm1G==

The overall index.html page is generated by the HTML_INDEX template. This template includes links to the other subject index pages. Each of the subject areas – court, docket, date, filtered – has a directory with an index page and a number of subject pages. The court/index.html has the list of index pages, created by the HTML_SUBJECT_INDEX template. Each of the court/index_xx.html pages is created by the HTML_SUBJECT_PAGE template, and contains one page of items.
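The split of one subject area into numbered pages can be sketched as follows. The two-digit page numbering for index_xx.html and the paginate function are assumptions. The page size is an arbitrary parameter.

```python
def paginate(items: list[str], page_size: int) -> dict[str, list[str]]:
    """Group items into pages keyed by an assumed index_NN.html file name."""
    pages: dict[str, list[str]] = {}
    for start in range(0, len(items), page_size):
        number = start // page_size + 1
        pages[f"index_{number:02d}.html"] = items[start:start + page_size]
    return pages
```

The keys of the result are what a subject index page would link to, and each value is the content handed to one subject page.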

Template expansion works by “evaluating” a page template. Each page template extends the base template, which provides a consistent set of content. The base template includes blocks that are replaced by content defined in the extension templates.

The template language includes “for” commands, allowing template and content to be repeated for each item in a collection. Additionally, “macro” definitions allow for pieces of template content to be injected consistently in multiple places within a page.