###########
Components
###########

We have the following software architecture.

.. plantuml::

    @startuml
    package feederreader {
        component model.py
        component common.py
        component storage.py
        component notification.py
        component reader.py
        component filter.py
        component writer.py
        component monitor.py
        component handler.py
    }
    handler.py --> reader.py
    handler.py --> filter.py
    handler.py --> writer.py
    monitor.py --> reader.py
    monitor.py --> filter.py
    monitor.py --> writer.py
    filter.py --> notification.py
    reader.py --> model.py
    filter.py --> model.py
    writer.py --> model.py
    reader.py --> storage.py
    filter.py --> storage.py
    writer.py --> storage.py
    notification.py --> common.py
    storage.py --> common.py
    reader.py --> common.py
    filter.py --> common.py
    writer.py --> common.py
    @enduml

Note the four tiers:

- **Control**. At the top are monitor (to run locally) and handler
  (to run as an AWS Lambda).

- **Application**. The top tier relies on the applications: reader, writer, filter.
  The application modules are also stand-alone applications and can be executed
  individually from the command line.

- **Storage and Notification**. The application modules rely on the
  model, storage, and notification modules.

- **Infrastructure**. All of the components rely on the common module,
  which provides configuration details.

These rest on a number of dependencies, listed in ``requirements.in``.

- pydantic. Used to define the model class in :py:mod:`model`.

- requests. Used to capture RSS feeds in :py:mod:`reader`.

- jinja2. Used to create HTML documents in :py:mod:`writer` and :py:mod:`notification`.

- schedule. Used by the :py:mod:`monitor` to control a recurring task.

- boto3 and botocore. Used to manage AWS resources in :py:mod:`storage` and :py:mod:`notification`.

We'll look at each component in a little more detail.

Monitor
=======

The monitor executes the feeder reader using its own internal scheduler.
This is used when running locally.

.. plantuml::

    @startuml
    start
    :read config build schedule;
    repeat
        :at scheduled time;
        partition Feeder_Reader {
            :cleaner;
            :reader;
            :filter;
            :writer;
        }
    repeatwhile (forever)
    @enduml

To change the schedule, use **control-c** to stop the application, then restart it.

The monitor consumes very few resources, and can be left running in a terminal window.
It can be started or stopped as needed.

Handler
=======

The handler executes the feeder reader when it's deployed as an AWS Lambda.
The Lambda is triggered by a Lambda scheduler to periodically perform the various tasks.
This is used when running in AWS.

.. plantuml::

    @startuml
    start
    repeat
        :at scheduled time;
        partition Feeder_Reader {
            :cleaner;
            :reader;
            :filter;
            :writer;
        }
    repeatwhile (forever)
    @enduml

To change the schedule, use the AWS console.
A CloudFormation template (CFT) can define the resources and the schedule.

Reader
======

The reader consumes data from USCourts RSS feeds and captures it locally.

.. plantuml::

    @startuml
    component reader
    component storage
    cloud AOUSC {
        file RSS
        file r2 as "RSS"
    }
    database data {
        folder YYYYMMDD {
            folder HH {
                file items.json
            }
        }
    }
    AOUSC <--- reader : "query"
    reader --> storage
    storage --> items.json : "capture"
    @enduml

The ``storage`` module uses either an AWS S3 bucket or local file storage.

Here's a more detailed view of the processing.

.. plantuml::

    @startuml
    start
    partition cleaner {
        :read config;
        :clean old files;
    }
    partition reader {
        :read config;
        repeat :for each feed;
            :consume the XML source;
            repeat :for each item;
                :build USCourtItemDetail object;
                if (unique) then (unique)
                    :Append to history file;
                endif
            repeat while (more items)
        repeat while (more feeds)
    }
    stop
    @enduml

The resulting files have the following structure:

.. plantuml::

    @startuml
    folder data {
        folder YYYYMMDD {
            folder HH {
                file items.json
            }
            folder hh2 as "HH" {
                file i2 as "items.json"
            }
            folder hh3 as "HH" {
                file i3 as "items.json"
            }
        }
        folder yyyymmdd2 as "YYYYMMDD"
        folder yyyymmdd3 as "YYYYMMDD"
        file filter.json
    }
    @enduml

The files are decomposed by day to make it easy to clean up old files.
Within a day, they're decomposed by hour to keep the files small and fast to process.

Within a JSON file (either an ``items.json`` or ``filter.json``)
the structure saved is a sequence of ``USCourtItemDetail`` instances.
See `Model`_ for more on this data structure.

Filter
======

The filter reads the captured JSON files and examines all of the ``USCourtItemDetail`` instances.
Those that match the docket information are written to a separate file, ``filter.json``.

.. plantuml::

    @startuml
    component filter
    component storage
    database data {
        folder YYYYMMDD {
            folder HH {
                file items.json
            }
        }
        file filter.json
    }
    filter <-- storage
    storage <-- items.json : "read"
    filter --> storage
    storage --> filter.json : "write"
    @enduml

Any changes to the filter file are important.
A notification strategy is provided in the :py:mod:`notification` module.

Within these JSON files (either ``items.json`` or ``filter.json``)
the structure saved is a sequence of ``USCourtItemDetail`` instances.
See `Model`_ for more on this data structure.

Writer
======

The writer builds a web site from the captured files.

.. plantuml::

    @startuml
    component writer
    component storage
    database data {
        folder YYYYMMDD {
            folder HH {
                file items.json
            }
        }
    }
    database output {
        folder court {
            file ci as "index.html"
        }
        folder docket {
            file di as "index.html"
        }
        folder date {
            file dti as "index.html"
        }
        folder filter {
            file fi as "index.html"
        }
        file index.html
    }
    items.json ---> storage : "read"
    storage --> writer
    writer --> storage
    storage --> output : "write"
    @enduml

Within the source JSON files (either ``items.json`` or ``filter.json``)
the structure saved is a sequence of ``USCourtItemDetail`` instances.
See `Model`_ for more on this data structure.

The output files are created with Jinja templates.
See `Jinja Templates`_ for more information.

Notification
============

Choices include:

- SMTP on a local computer.

- SES when deployed in an AWS Lambda.

- A fancy Text User Interface (TUI) application to show status and notifications.

Other choices include a simple log file or AWS SNS for notifications.

.. plantuml::

    @startuml
    component notification
    component common
    notification --> common : "get configuration"
    cloud {
        component SES
    }
    () smtp
    notification --> SES : "Lambda"
    notification --> smtp : "Local"
    SES --> smtp
    @enduml

Which notification mechanism to use is determined by the ``common`` module.

The notification content is created with Jinja templates.
See `Jinja Templates`_ for more information.

Model
=====

The model classes are built using **Pydantic**.
This does data validation, and also handles serialization to and from JSON format.

The ``USCourtItemDetail`` class is defined as follows:

.. note: Check the source directory to be sure the line numbers are correct.

.. include:: ../src/model.py
    :start-line: 4
    :end-line: 55

Common
======

The common module gathers configuration data.

.. plantuml::

    @startuml
    component common
    file config.toml
    file fdrdr_config.toml
    note bottom of fdrdr_config.toml
        Kept outside Git repository.
        Usually the user's home directory.
    end note
    node host {
        rectangle "environment variables" as envvar
    }
    note bottom of envvar
        Kept outside Git repository.
        Usually part of the CloudFormation template.
    end note
    common --> envvar
    common --> config.toml
    common --> fdrdr_config.toml
    @enduml

Generally, a Lambda deployment relies on environment variables that are part of the Lambda configuration.

When running locally, the configuration is split across two files so that private information
lives in a separate file that's not easily put into a Git repository.

Jinja Templates
===============

Jinja templates are fairly complex pieces of functionality used in two places
(:py:mod:`writer` and :py:mod:`notification`).

HTML pages have a fair amount of boilerplate.
Jinja helps manage this by permitting a sophisticated inheritance hierarchy among pages.

.. plantuml::

    @startuml
    package jinja {
        class Environment
        class DictLoader
        Environment *-- DictLoader
    }
    package notification as "notification.py" {
        class HTML_BASE <