Code

Infrastructure

The overall configuration and infrastructure are implemented by the Common module.

Common

Common functions and classes used throughout Feeder Reader.

There are three functions in this module to get configuration information from a variety of places.

https://www.plantuml.com/plantuml/svg/SoWkIImgAStDuU9ApaaiBbP8Jos9Xx1iR1Gqt7GKyekICp9JD1NSIlEIIpBpynJixA1AK_FAuaiIat9I2Ii5cvgVbvQPZazjVb9MQdA9WekhcsEeJqWm5nUIDbrTEuI5N0XNOX57Q42O1BGhLI4rjw2aKY4tDJSfjQYOYyiXDIy5w7C0

There’s a hierarchy to the configuration data:

  • Environment variables.

    • All of the AWS_ environment variables

    • All of the LAMBDA_ environment variables

    • All of the FDRDR_ environment variables

  • The user’s private ~/fdrdr_config.toml.

  • The local config.toml.

  • The built-in defaults in this module.

See https://docs.aws.amazon.com/lambda/latest/dg/configuration-envvars.html#configuration-envvars-runtime for standard AWS environment variable names.

common.get_class(cls)

Map a generic class to a specific implementation class, if there are choices. This generally applies to storage.Storage and notification.Notification.

The CONFIG mapping dictionary holds the environment-to-class mappings.

Parameters:

cls (type) – An abstract super class with an environment-specific implementation.

Returns:

an implementation class, based on the environment.

Return type:

type
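The environment-based dispatch might be sketched as follows. This is a hypothetical stand-in, not the module's actual implementation: the CONFIG contents, the `get_class_name` helper, and the use of Lambda's standard AWS_LAMBDA_FUNCTION_NAME variable to detect AWS are all assumptions.

```python
import os

# Hypothetical contents; the real CONFIG mapping may differ.
CONFIG = {
    "local": {"Storage": "LocalFileStorage", "Notification": "LogNote"},
    "AWS": {"Storage": "S3Storage", "Notification": "SNSNote"},
}

def get_class_name(cls_name: str, environ=os.environ) -> str:
    # Assume Lambda's standard AWS_LAMBDA_FUNCTION_NAME variable marks AWS.
    env = "AWS" if "AWS_LAMBDA_FUNCTION_NAME" in environ else "local"
    return CONFIG[env][cls_name]
```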

common.env_table()

Get environment variables and create TOML-like tables.

AWS_REGION = "us-east-1" is treated like the following TOML

[AWS]
    REGION = "us-east-1"

The envvar name is decomposed into a table and a name within the table.

This looks for variable names that begin with "AWS_", "LAMBDA_", or "FDRDR_".

Return type:

dict[str, Any]

common.get_config(**section)

Read the config files, if present, and merge in the results of the env_table() function.

Here are the tiers of overrides:

  • Any overrides from the section parameter are seen first. These should come from the command line.

  • The ~/fdrdr_config.toml. This usually has a [[notifier.smtp]] table with user credentials.

  • The config.toml from the current working directory.

  • The DEFAULTS in this module.

Use a call like this to override the config file with command-line options:

common.get_config(writer = {'format': 'csv'})

Parameters:

section (dict[str, Any]) – A specific table within the TOML file to inject overrides.

Returns:

The complete configuration.

Return type:

dict[str, Any]
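The tiered override behavior amounts to a deep merge in which earlier tiers win. A minimal sketch of that precedence rule (merge is a hypothetical helper, not part of the module's API):

```python
from typing import Any

def merge(*tiers: dict[str, Any]) -> dict[str, Any]:
    """Merge configuration tiers; earlier tiers take precedence,
    and nested dictionaries merge recursively."""
    result: dict[str, Any] = {}
    for tier in reversed(tiers):  # apply lowest-precedence tier first
        for key, value in tier.items():
            if isinstance(value, dict) and isinstance(result.get(key), dict):
                result[key] = merge(value, result[key])
            else:
                result[key] = value
    return result
```

For example, a command-line override of `writer.format` leaves other keys of the `writer` table from lower tiers intact.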

Storage and Notification

The structure of RSS entries is defined by the Model module.

Storage has a number of implementations in the Storage module.

Messaging, also, has a number of implementations in the Notification module.

Model

Data Model for the Feeder Reader.

https://www.plantuml.com/plantuml/svg/bLBBQiCm4BplLuXSqfQQqAi99JIz5DfBIyv1iRQ9Y1v6MaD3cd_lMdxK9Y4G7GmQPcPdnrf4ae4gQm0ARqllq94e3qewqeKuWk1J61cZU94HFxn20wSXrzOG4w5XlH7QLaQQ3EOYKaXNezq-5wlFztHJ68kWIzBU_LFAQhDMabXKVqbRX8GVEIJ7EOdluEsQvFRknvfxpj5d4lFVB4l3ko3BdNGk9RkQvIR_rXEWCtzWSFBqQ2UZqQklfQy23QfGljWZ3HRbpZN61hWfmi0R84quXyzIl299tLMX6SX7WlIFcutoViThani_WE6YyqiKCPkAB7iUDJATzVLTrjIdrNL4rl2Zssc50cr91s7HRZ14W0KwnQ_t3m==

Note

A docket number may be composed of a number or letter indicating the court, a two-digit number to identify the year, the case type (either CV/cv for civil cases or CR/cr for criminal cases), a four- or five-digit case number, and the judge’s initials.

For example, 1:21-cv-5678-MW is the docket number for the 5,678th civil case filed in the year 2021 and assigned to court number 1 and the Honorable Martha Washington.
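The structure described above can be captured with a regular expression. This pattern is a sketch derived from the description, not the project's actual parsing code:

```python
import re

# Hypothetical pattern: court, two-digit year, case type (cv/cr),
# four- or five-digit case number, optional judge's initials.
DOCKET = re.compile(
    r"(?P<court>\w+):(?P<year>\d{2})-(?P<type>cv|cr)-(?P<number>\d{4,5})"
    r"(?:-(?P<judge>[A-Za-z-]+))?",
    re.IGNORECASE,
)

m = DOCKET.match("1:21-cv-5678-MW")
```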

class model.Channel(*, title, link)

The channel definition from the RSS source.

For example,

<rss version="2.0">
<channel>
<title>Eastern District of New York Filings Entries on cases</title>
<link>https://ecf.nyed.uscourts.gov</link>
<description>Public Filings in the last 24 Hours</description>
<lastBuildDate>Thu, 28 Dec 2023 21:20:01 GMT</lastBuildDate>
...
</channel>
</rss>

The from_tag() method only preserves the title and link information.

Parameters:
  • title (str) –

  • link (Url) –

title: str

RSS feed title

link: Url

RSS feed link

classmethod from_tag(tag)

Extracts the individual field values from the XML tag.

Parameters:

tag (Element) – The XML tag for a channel.

Returns:

New Channel instance.

Return type:

Channel

model_config: ClassVar[ConfigDict] = {'frozen': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'link': FieldInfo(annotation=Url, required=True), 'title': FieldInfo(annotation=str, required=True)}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

class model.Item(*, title, link, description, text_pub_date)

One item from within a channel extracted from the RSS source.

For example,

<item>
<title>
<![CDATA[ 2:23-cv-09491-PKC-ST Sookra v. Berkeley Carroll School et al ]]>
</title>
<pubDate>Thu, 28 Dec 2023 21:18:55 GMT</pubDate>
<author/>
<guid isPermaLink="true">https://ecf.nyed.uscourts.gov/cgi-bin/DktRpt.pl?508001</guid>
<description>
<![CDATA[ [Quality Control Check - Summons] Sookra v. Berkeley Carroll School et al ]]>
</description>
<link>https://ecf.nyed.uscourts.gov/cgi-bin/DktRpt.pl?508001</link>
</item>
Parameters:
  • title (str) –

  • link (Url) –

  • description (str) –

  • text_pub_date (str) –

title: str

RSS feed title

link: Url

RSS feed link

description: str

RSS feed description

text_pub_date: str

RSS feed publication date

property pub_date: datetime

Parse the publication date.
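The `<pubDate>` values are RFC 822 timestamps, which the standard library parses directly. A sketch of what this property might do:

```python
from email.utils import parsedate_to_datetime

# An RFC 822 timestamp, as seen in the <pubDate> example above.
pub_date = parsedate_to_datetime("Thu, 28 Dec 2023 21:18:55 GMT")
```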

classmethod parse(tag)

Parse the individual fields to create an interim mapping.

Parameters:

tag (Element) – the XML tag

Returns:

a dictionary with fields and values.

Return type:

dict[str, Any]

classmethod from_tag(tag)

Extracts the individual field values from the XML tag.

Parameters:

tag (Element) – The XML tag for an item.

Returns:

New Item instance.

Return type:

Item

model_config: ClassVar[ConfigDict] = {'frozen': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'description': FieldInfo(annotation=str, required=True), 'link': FieldInfo(annotation=Url, required=True), 'text_pub_date': FieldInfo(annotation=str, required=True), 'title': FieldInfo(annotation=str, required=True)}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

class model.USCourtItem(*, title, link, description, text_pub_date, docket, parties)

Decompose an Item tag content to get Docket and Parties from the Title.

Generally the title has a docket string followed by the parties.

2:23-cv-09491-PKC-ST Sookra v. Berkeley Carroll School et al

The 2:23-cv-09491-PKC-ST is the docket. The remaining portion names the parties.

Parameters:
  • title (str) –

  • link (Url) –

  • description (str) –

  • text_pub_date (str) –

  • docket (str | None) –

  • parties (str | None) –

docket: str | None

Docket extracted from the title

parties: str | None

Parties extracted from the title.

classmethod from_tag(tag)

Extracts the individual field values from the XML tag.

Parameters:

tag (Element) – The XML tag for an item.

Returns:

New USCourtItem instance.

Return type:

USCourtItem

model_config: ClassVar[ConfigDict] = {'frozen': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'description': FieldInfo(annotation=str, required=True), 'docket': FieldInfo(annotation=Union[str, NoneType], required=True), 'link': FieldInfo(annotation=Url, required=True), 'parties': FieldInfo(annotation=Union[str, NoneType], required=True), 'text_pub_date': FieldInfo(annotation=str, required=True), 'title': FieldInfo(annotation=str, required=True)}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

class model.USCourtItemDetail(*, item, channel)

An item and the channel for this item. The two are bound together to make sure the overall court information is kept with each item.

Parameters:
  • item (USCourtItem) –

  • channel (Channel) –

item: USCourtItem

The item from the RSS feed.

channel: Channel

The channel to which the item belongs.

model_config: ClassVar[ConfigDict] = {'frozen': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'channel': FieldInfo(annotation=Channel, required=True), 'item': FieldInfo(annotation=USCourtItem, required=True)}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

Storage

Storage for the Feeder Reader.

The idea is to provide a common abstract class with two concrete implementations:

  • Local File system

  • AWS S3

https://www.plantuml.com/plantuml/svg/TLB1QiCm3BtxAtHSnz9BRsMKhXk5Gc47CtOOOxYEsDpQKR4LLjZwxqljr4bMgekntjFJaz0yiF9SEok49hUd2Yk6mEJL9yAFW2RK9h1Nca5R5GB80NWAF9Z0uMQl-6i4KrWGsOmvjSE4vDItDumVsq1L1Ew0DblKt14yghAFnxwZQSlxfyDuu2iwjh5L6f-DhPl_s-dpthLocH1pHn6VDEcQjB9BOx4TEKBdyUz_Y-AIqQkMawjb7NJihzrHYRRTQz9uaOzKUrunkYTaPMmii5INyF0g1rGvMTHwXpn8FQUD9GSj0MwZWnPgzB8RA8fPfoFELp0Upv8r8UPObSsj_04=
class storage.Storage(base)

Abstract class with operations to persist data.

Note that a path is represented as a tuple[str, ...], not as a “/”-delimited string.

Parameters:

base (Path) –

textify(content)

Transforms a Writeable into a JSON-formatted representation.

  • str objects are left intact, no changes. Presumably, these were already in JSON format.

  • Any pydantic BaseModel instance is serialized using its model_dump_json() method.

  • Any other class is provided to the pydantic.to_json function.

Parameters:

content (str | BaseModel | Iterable[BaseModel]) – Something to persist

Returns:

String, ready to write.

Return type:

str
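The three-way dispatch described above might look like the following sketch, using stdlib json in place of pydantic's serializers and duck typing in place of an isinstance check against BaseModel:

```python
import json
from typing import Any

def textify(content: Any) -> str:
    """Sketch of the dispatch: str passes through, models serialize
    themselves, everything else goes through a generic JSON encoder."""
    if isinstance(content, str):
        return content  # assumed to already be JSON text
    if hasattr(content, "model_dump_json"):
        return content.model_dump_json()  # pydantic BaseModel
    return json.dumps(content)  # anything else, usually a list
```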

abstract exists(local_path)

Does the path exist?

Parameters:

local_path (str | tuple[str, ...]) – nodes along the path.

Return type:

bool

abstract make(local_path, exist_ok=False)

Make a new directory.

Parameters:
  • local_path (str | tuple[str, ...]) – nodes along the path.

  • exist_ok (bool) –

Return type:

None

abstract write_json(local_path, content)

Serialize an object or list of objects in JSON. Most collections should be lists.

Parameters:
  • local_path (str | tuple[str, ...]) – nodes along the path.

  • content (str | BaseModel | Iterable[BaseModel]) – An object to be converted to JSON and persisted. Usually a list.

Return type:

None

abstract read_json(local_path, target)

Deserialize a list of objects from JSON to Python. Most collections are lists.

Parameters:
  • local_path (str | tuple[str, ...]) – nodes along the path.

  • target (type[BaseModel]) – Subclass of pydantic.BaseModel.

Return type:

list[BaseModel]

abstract write_text(local_path, content)

Write text. Often HTML. Possibly Markdown or CSV.

Parameters:
  • local_path (str | tuple[str, ...]) – nodes along the path.

  • content (str) – A string to write.

Return type:

None

abstract open_nljson(local_path)

Open a path to append objects in Newline Delimited JSON format. The path must exist. Use exists() and make() as needed.

Parameters:

local_path (str | tuple[str, ...]) – nodes along the path.

Return type:

None

abstract write_nljson(content)

Append an object in Newline Delimited JSON format to the currently open NLJSON file.

Parameters:

content (str | BaseModel | Iterable[BaseModel]) – An object to be converted to JSON and persisted.

Return type:

None

abstract close_nljson()

Close path after appending objects in Newline Delimited JSON format.

Return type:

None

abstract listdir(local_path)

List the contents of a path.

Parameters:

local_path (str | tuple[str, ...]) – nodes along the path.

Return type:

Iterator[tuple[str, …]]

abstract rmdir(local_path)

Remove all objects with a given path.

Parameters:

local_path (str | tuple[str, ...]) – nodes along the path.

Return type:

None

validate(document, cls)

Uses pydantic model validation to validate JSON and create objects. Since most collections are lists, this builds a list of instances, even when given a single dictionary built from a JSON source.

Parameters:
  • document (dict[str, Any]) – a dictionary recovered from parsing JSON text.

  • cls (type[BaseModel]) – A subclass of pydantic.BaseModel.

Returns:

an instance of the given class

Raises:

Validation errors if the document cannot be validated.

Return type:

list[BaseModel]
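A sketch of the normalize-then-validate behavior, assuming pydantic v2's model_validate() classmethod (the wrapping logic here is an illustration, not the module's exact code):

```python
from typing import Any

def validate(document: Any, cls) -> list:
    """Sketch: ensure a list, then validate each entry with the model
    class's model_validate() classmethod."""
    docs = document if isinstance(document, list) else [document]
    return [cls.model_validate(d) for d in docs]
```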

class storage.LocalFileStorage(base)

Concrete class with operations to persist data in the local filesystem.

Note that a path is represented as a tuple[str, ...], not as a “/”-delimited string.

Parameters:

base (Path) –

pathify(local_path)

Builds a local Path from a tuple of strings.

Parameters:

local_path (str | tuple[str, ...]) – nodes along the path.

Returns:

Path instance.

Return type:

Path

exists(local_path)

Does the path exist?

Parameters:

local_path (str | tuple[str, ...]) – nodes along the path.

Return type:

bool

make(local_path, exist_ok=False)

Make a new directory.

Parameters:
  • local_path (str | tuple[str, ...]) – nodes along the path.

  • exist_ok (bool) –

Return type:

None

write_json(local_path, content)

Serialize an object or list of objects in JSON. Most collections should be lists.

Parameters:
  • local_path (str | tuple[str, ...]) – nodes along the path.

  • content (str | BaseModel | Iterable[BaseModel]) – An object to be converted to JSON and persisted. Usually a list.

Return type:

None

read_json(local_path, cls)

Deserialize a list of objects from JSON to Python. Most collections are lists.

Parameters:
  • local_path (str | tuple[str, ...]) – nodes along the path.

  • cls (type[BaseModel]) – Subclass of pydantic.BaseModel.

Return type:

list[BaseModel]

write_text(local_path, content)

Write text. Often HTML. Possibly Markdown or CSV.

Parameters:
  • local_path (str | tuple[str, ...]) – nodes along the path.

  • content (str) – A string to write.

Return type:

None

open_nljson(local_path)

Open a path to append objects in Newline Delimited JSON format. The path must exist. Use exists() and make() as needed.

Parameters:

local_path (str | tuple[str, ...]) – nodes along the path.

Return type:

None

write_nljson(content)

Append an object in Newline Delimited JSON format to the currently open NLJSON file.

Parameters:

content (str | BaseModel | Iterable[BaseModel]) – An object to be converted to JSON and persisted.

Return type:

None

close_nljson()

Close path after appending objects in Newline Delimited JSON format.

Return type:

None

listdir(local_path)

List the contents of a path.

Parameters:

local_path (str | tuple[str, ...]) – nodes along the path.

Return type:

Iterator[tuple[str, …]]

rmdir(local_path)

Remove all objects with a given path.

Parameters:

local_path (str | tuple[str, ...]) – nodes along the path.

Return type:

None

class storage.S3Storage(base)

Concrete class with operations to persist data in an S3 bucket.

Note that a path is represented as a tuple[str, ...], not as a “/”-delimited string.

Within an S3 bucket, there aren’t proper directories; the “path” is merely a long key that identifies an object in the bucket. This means the make() method doesn’t really need to do anything.

Parameters:

base (Path) –

pathify(local_path)

Builds an S3 key string from a tuple of strings.

Parameters:

local_path (str | tuple[str, ...]) –

Return type:

str

exists(local_path)

Does the path exist?

Parameters:

local_path (str | tuple[str, ...]) – nodes along the path.

Return type:

bool

make(local_path, exist_ok=False)

Make a new directory.

Parameters:
  • local_path (str | tuple[str, ...]) – nodes along the path.

  • exist_ok (bool) –

Return type:

None

write_json(local_path, content)

Serialize an object or list of objects in JSON. Most collections should be lists.

Parameters:
  • local_path (str | tuple[str, ...]) – nodes along the path.

  • content (str | BaseModel | Iterable[BaseModel]) – An object to be converted to JSON and persisted. Usually a list.

Return type:

None

read_json(local_path, cls)

Deserialize a list of objects from JSON to Python. Most collections are lists.

Parameters:
  • local_path (str | tuple[str, ...]) – nodes along the path.

  • cls (type[BaseModel]) – Subclass of pydantic.BaseModel.

Return type:

list[BaseModel]

write_text(local_path, content)

Write text. Often HTML. Possibly Markdown or CSV.

Parameters:
  • local_path (str | tuple[str, ...]) – nodes along the path.

  • content (str) – A string to write.

Return type:

None

open_nljson(local_path)

Open a path to append objects in Newline Delimited JSON format. The path must exist. Use exists() and make() as needed.

Parameters:

local_path (str | tuple[str, ...]) – nodes along the path.

Return type:

None

write_nljson(content)

Append an object in Newline Delimited JSON format to the currently open NLJSON file.

Parameters:

content (str | BaseModel | Iterable[BaseModel]) – An object to be converted to JSON and persisted.

Return type:

None

close_nljson()

Close path after appending objects in Newline Delimited JSON format.

Return type:

None

listdir(local_path)

List the contents of a path.

Parameters:

local_path (str | tuple[str, ...]) – nodes along the path.

Return type:

Iterator[tuple[str, …]]

rmdir(local_path)

Remove all objects with a given path.

Parameters:

local_path (str | tuple[str, ...]) – nodes along the path.

Return type:

None

Notification

Notification for the Feeder Reader.

The idea is to provide a common abstract class with several concrete implementations:

  • Stored log files.

  • AWS SNS.

  • Local SMTP.

  • AWS SES – better than local SMTP.

The notifiers use Jinja to format documents. This module has the document templates.

https://www.plantuml.com/plantuml/svg/dLDTIyCm57tFhyZZXdO_GCQOw0Q25g4J7qGaIxCk8nyZkHLaxh-RD4tNj4yp3sroxpadzojB2yH0BGL2LCkZLam1LXhuZbC2N2jyGjWjihLW20LC1R4MvsE4Nv9PIofcxx3W5ZxUYWTT6tW29XyP42u-E-HDkEHdCi9CEqo6c-0cVSif1dB6EwwutRVUCsf-8RfdNRa0MHjODaJwRvB0_3VB8gclKxniNgyNYgn4AI_-8HH8YSwgM4bNf2k5MXOwzxiiTScYK50VzI8bM0b7mRS9nISxG84sRWPILB2bGBUJtVG4NCNWYsgrunMUp_5aVOkreNjUJl6wLhH9QB58LGvS7KWYibBVt6WbdRTdtJ1v50H234BN9Rv_BoAF6ogOBrDop0iFJXx3RBQuAEL3JrDH57ljBLwZwrbZS7V4yMpERMMwKMkClgWZzKjn45eoLRf7-mS=

A notifier is a context manager. It accumulates details and sends them when the context exits. If no details were accumulated, it does nothing.

note_class = common.get_class(Notification)
with note_class(storage, nfr_config) as notifier:
    # process items
    notifier.notify(item)

At the end of the with statement, the context manager will make sure all items are handled.

class notification.Notification(storage=None)

Abstract class with operations to notify of changes.

Parameters:

storage (Storage | None) –

start()

Start accumulating notifications.

Return type:

None

notify(message)

Accumulate an item for notification.

Parameters:

message (USCourtItem) – An item to include in the notification.

Return type:

None

abstract finish()

End accumulating notifications; finalize and send.

Return type:

None
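The context-manager behavior described above can be sketched like this. The `__enter__`/`__exit__` wiring is an illustration of the documented contract (accumulate, then send at exit only if something accumulated), not the class's verbatim code:

```python
class Notification:
    """Sketch of the notifier context-manager protocol."""

    def __init__(self, storage=None):
        self.storage = storage
        self.messages = []

    def __enter__(self):
        self.start()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Only finalize and send when something was accumulated.
        if exc_type is None and self.messages:
            self.finish()
        return False  # never suppress exceptions

    def start(self):
        self.messages = []

    def notify(self, message):
        self.messages.append(message)

    def finish(self):
        raise NotImplementedError
```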

class notification.LogNote(storage)

Accumulates an HTML file with notifications.

The storage parameter is required.

Parameters:

storage (Storage) –

finish()

End accumulating notifications; finalize and send.

Return type:

None

class notification.SMTPNote(storage=None)

Sends an email message using Python smtplib.

The storage parameter must be None, which is the default.

The configuration file must provide the parameters for accessing the SMTP server. See common.get_config().

Parameters:

storage (Storage | None) –

finish()

End accumulating notifications; finalize and send.

Return type:

None

class notification.SNSNote(storage=None)

Sends a message using AWS SNS; presumably with an email subscription configured.

The topic’s ARN is provided as an environment variable FDRDR_SNS_TOPIC.

The storage parameter must be None, which is the default.

Parameters:

storage (Storage | None) –

finish()

End accumulating notifications; finalize and send.

Return type:

None

class notification.SESEmail(storage=None)

Sends an email message using AWS SES.

The sending email address must have been verified with AWS.

https://docs.aws.amazon.com/ses/latest/dg/creating-identities.html

The Identity ARN is provided as an environment variable FDRDR_IDENTITY_ARN. The configuration file must provide the parameters for accessing the SES Email. See common.get_config().

The storage parameter must be None, which is the default.

Parameters:

storage (Storage | None) –

finish()

End accumulating notifications; finalize and send.

Return type:

None

Applications

There are three top-level applications.

The Reader application parses the RSS feed and saves history. This application includes a Cleaner function to erase old objects from history.

The Filter application examines the history for any of the interesting dockets. These are used for notification.

The Writer application converts the history into HTML pages. When an S3 bucket is used, these can be read with a browser.

Reader

Reader application of the Feeder Reader.

https://www.plantuml.com/plantuml/svg/ZPDDImCn48Rl-HN3NggbI_4gfGYLWg0N5y5RYTbf6yX7IR984V-xizkidRMrcbEIUUQTDxEpTp79a1fhX38oHbWZrYkTCC1i1bVB2VYWjhg-XcNZLDBUmNoUKILtgGduDnXGrlyID2ZTPL0eRtVUeKiGamzoW_0XWDP1eJhSYKyXSj6odBCttKAnzW5GgYqyLrKNAJYQtoIoezUdHrxI-XhDeE1BjV1DI0y9xJqcgStv6BjTRALpQ4HtbGymBaieRYQsE4bYPvWHTL8GfgFEEGT7qu7w-RnsvTmUf3pg73swCzGxxS_Ssh6bRNvBK6lRFiQ2nJmN9s2lqPmXk7L6YNx0JhwXyYus5edfaPyw-OqjIwuIHriopyctQP2kXlUWEx0FZd_Nz8wshlEOPvQfyecXdQN9vIEpmRGYWRNUN1hquGGVSmO5sRKW1GIqkxG7Y_ODGnJY3brDl_ON

The reader() uses common to get parameters with the list of RSS feeds. For each feed, it gets the current RSS document, and parses this to extract the USCourtItemDetail details.

Important

Feeds have only the previous 24 hours of content.

This should be run at least once each day, twice to be sure nothing is missed.

Todo

This needs to be more cautious about reading from storage.

We can’t reread stored history for every individual item.

reader.feed_iter(document)

Iterates through the <channel><item>...</item></channel> structure of the <rss> tag describing a feed.

This emits a sequence of model.Channel and model.USCourtItem objects from the XML. No effort is made to combine the Channel and USCourtItem items; the data is emitted as a denormalized sequence.

Parameters:

document (Element) – the XML source.

Returns:

an iterable of Channel and USCourtItem

Return type:

Iterator[USCourtItem | Channel]
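The traversal might be sketched with the stdlib XML parser as follows. The sample feed and the (kind, title) tuples are illustrative; the real feed_iter() builds model.Channel and model.USCourtItem instances instead:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed for illustration.
RSS = """\
<rss version="2.0"><channel>
<title>Example District Filings</title>
<link>https://example.invalid/ecf</link>
<item><title>1:21-cv-5678-MW Example v. Example</title></item>
<item><title>1:21-cv-5679-MW Other v. Other</title></item>
</channel></rss>"""

def feed_iter(document):
    """Sketch: walk each <channel>, then its <item> tags, emitting a
    denormalized sequence of (kind, title) tuples."""
    for channel in document.iter("channel"):
        yield ("channel", channel.findtext("title"))
        for item in channel.iter("item"):
            yield ("item", item.findtext("title"))

events = list(feed_iter(ET.fromstring(RSS)))
```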

reader.capture(storage, feed)

Save this feed into a JSON file in storage at {item.date}/{item.time.hour}/items.json.

This transforms the publication date into a YYYYMMDD/HH path to a file. If the file exists, the new items are added to its content.

If the file doesn’t exist, it’s created.

Parameters:
  • storage (Storage) – The storage to write to.

  • feed – The parsed feed items to save.

Return type:

None
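The YYYYMMDD/HH path derivation described above might look like this (a sketch; the "items.json" leaf name is taken from the description):

```python
from email.utils import parsedate_to_datetime

# Derive the storage path from an item's RFC 822 publication date.
pub_date = parsedate_to_datetime("Thu, 28 Dec 2023 21:18:55 GMT")
local_path = (pub_date.strftime("%Y%m%d"), pub_date.strftime("%H"), "items.json")
# local_path == ("20231228", "21", "items.json")
```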

reader.reader()

Reads all RSS feeds and saves the items. Uses common.get_config() to get the list of feeds and the storage class. Uses feed_iter() to parse XML. Uses capture() to preserve the model.USCourtItemDetail items.

Return type:

None

reader.cleaner()

Removes all old files. The configuration file provides the window size.

Uses common.get_config() to get the list of feeds and the storage class.

Return type:

None

Filter

Filter application of the Feeder Reader.

https://www.plantuml.com/plantuml/svg/VP712i8m38RlUOhWIK5VG6HuykZ51v3QPREmjAF9a25xTwkRs7bOJw7vykV_oL7KHEsu0TiO4MIZzdvrIYnuEE3wjCKGxTsL6pnrtgeB7ejYWYlNACKb3cpWDdZoYg8Xcfhm2PZUA70P_s38E1_KLSwQZqGEdxF6R29L_CVMf5SRVENg9NS3G9vtjok-Zd2FgqYcb1N0bFBTpUb4f0tICIKeOtARR-YJfvyXsP97J6fpWPztW6IPaBZL7fduHb40ZkJBTB0N

The filter() uses common to get the list of interesting docket numbers.

It scans the entire downloaded collection of items looking for “interesting” dockets. It compares those with the history cache to see if there have been changes. When changes are found, the cache is updated and notifications are sent.

filter.match_items(storage, path, targets, history, counts)

Examine all the items in storage, looking for interesting dockets.

The implementation is a sequence of filters…

Sadly, we’ve pre-empted the name filter() in this module. Here’s the alternative design.

source = cast(list[USCourtItemDetail], storage.read_json(path, USCourtItemDetail))
has_docket = filter(lambda item: item.item.docket, source)
has_target = filter(lambda item: any(d in item.item.docket.lower() for d in targets), has_docket)
novel = filter(lambda item: item not in history, has_target)
Parameters:
  • storage (Storage) – The persistent storage from which we can read data.

  • path (tuple[str, ...]) – The path in storage to find model.USCourtItemDetail instances.

  • targets (list[str]) – The various dockets that are interesting.

  • history (set[USCourtItemDetail]) – The previous state of the history, to see if this is new.

  • counts (Counter[str]) – A counter to update when something new is found.

Return type:

Iterator[USCourtItemDetail]

filter.filter()

Reads the RSS data. Uses common.get_config() to get the list of dockets, the storage class, and the notifier class. Uses match_items() to get items on the interesting dockets. Uses the given notifier to notify of changes.

Return type:

None

Writer

Writer application of the Feeder Reader.

https://www.plantuml.com/plantuml/svg/ZP712i8m38RlVOgmauA-W8onYti1lO_InfcvfgFj9YA-kuLDw2vpEONa-_yafFH1kZ2OJgXEUW-TbRecrgJGGvx3hZg0TUfEUW_Lm2gGPIYAcNuw2bWhF_v7UzaGoqW_yBNPEQ3fkoZQoBirJYXhUil80NomJZO8Rm4n9eFBL1EVX8kPYU8KQp3KolADchLPgKbBtCg5nvb7SfPDx2RLwtnh58l55ux7uhSTd21pABNFX10biewF-m8=

The writer() uses common to get the target output format.

This reads all the unique items from the data cache, organizes them, and then emits all of them using the selected template.

It also looks for a filter.json with the dockets considered interesting by the filter.

writer.load_indices(storage)

Reads all of the data files to organize items by court, docket, and date, as well as those found by the filter application.

Here’s the transformation done by the load_indices() function.

https://www.plantuml.com/plantuml/svg/ZL8zQyCm4DtrAmvDgI6TCWKS0lNW39bAXr9Aa2M5Af6bu3iFfVI_zoJRMbfeD9pktdsykvFsI6gWiVT2nEvYDS4hd-EnQHQxRHTDEh8zFbum1WaAmIXIGYpnKAtNARXjivQTIMJhvVafZWCTk41ZIJccqruX_dR0bm0-Eg5PCr5VxQJUnkx49QtuBjrESLb2_8i8jk0TKacOoqtNdIc9Cedxmi_Eansi4OloJrglRY2hoWNZjCMgohkTBTx6D1ilSJSFExVE24FBEAcxX2GejLkuCyXewOftbv-bVgCFfw_18jQRztfNABSWQL01pt7ekK3t2JugSvhTgmSfnDu6mt_c1m==
Parameters:

storage (Storage) – Storage for all items.

Returns:

a dictionary with a collection of mappings used to organize the presentation.

Return type:

dict[str, dict[str, list[USCourtItemDetail]]]

writer.write_csv(indices)

Writes a CSV-formatted extract of the “court” index. This is written to standard output.

Parameters:

indices (dict[str, dict[str, list[USCourtItemDetail]]]) – Indices mapping created by load_indices()

Return type:

None

writer.paginate_keys(keys, page_size)

Decomposes the keys in an index into pages. Returns a list of (page number, (start, end)) values that can be used to group keys into pages.

Parameters:
  • keys (list[Any]) – Keys to an index created by load_indices()

  • page_size (int) – Page size. A value of zero stops pagination and returns a [(1, (0, len))] list of page numbers and start values.

Returns:

List of tuple[page number, tuple[start, end]] values.

Return type:

list[tuple[int, tuple[int, int]]]
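A sketch matching the documented contract, including the page_size of zero case:

```python
from typing import Any

def paginate_keys(keys: list[Any], page_size: int) -> list[tuple[int, tuple[int, int]]]:
    """Sketch: produce (page number, (start, end)) tuples covering keys."""
    if page_size == 0:
        # Pagination disabled: one page covering everything.
        return [(1, (0, len(keys)))]
    return [
        (page + 1, (start, min(start + page_size, len(keys))))
        for page, start in enumerate(range(0, len(keys), page_size))
    ]
```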

writer.write_template(logger, output, format, page_size, indices)

Writes the index.html and index_page.html files for the expected four keys in the output from the load_indices() function.

Parameters:
  • logger (Logger) – A place to log interesting info.

  • output (Storage) – The Storage to which the files will be written.

  • format (str) – The format, “html” or “md”

  • page_size (int) – The page size

  • indices (dict[str, dict[str, list[USCourtItemDetail]]]) – The output from the load_indices() function

Return type:

None

writer.writer()

Captures the RSS state as HTML or MD files. Uses common.get_config() to get the format, page size, and the Storage class. Uses load_indices() to gather the items. Uses write_template() to write HTML or MD output. Uses write_csv() to write CSV output.

Return type:

None

Control

There are two overall control implementations.

  • Local running is implemented via the monitor module.

  • Lambda execution remains TBD. A handler module is planned.

Monitor

The top-level Feeder Reader controller when run in local mode.

This will define and execute a periodic schedule of FeederReader work.

It leverages reader, filter, and writer to actually do the work.

monitor.feeder_reader()

The FeederReader Processing.

  1. reader.cleaner()

  2. reader.reader()

  3. filter.filter()

  4. writer.writer()

Return type:

None

monitor.sched_builder(logger, options)

Builds the schedule from the configuration options.

Parameters:
  • logger (Logger) – A logger to write info

  • options (dict[str, Any]) – The options for the monitor.

Return type:

None

monitor.sched_runner()

Run the schedule.

Return type:

None

monitor.main()

Main program for the monitor.

Get the configuration. Build the schedule. Run the schedule.

Return type:

None