Expansions
The core user story said nothing about transforming the data.
The following general story framework summarizes a family of essential features:
As a journalist, I want to parse titles and descriptions of RSS feeds to extract additional data encoded there so I can make better use of the available information.
This story is too vague to implement as written. See https://xp123.com/articles/invest-in-good-stories-and-smart-tasks/ for help decomposing this generic meta-story into specific stories, and https://agileforall.com/patterns-for-splitting-user-stories/ for more patterns for splitting complex stories.
In practice, the rss_status.title_transform() function is one concrete instance of this general story pattern: it decomposes the title into a docket and the parties. There are more nuggets of goodness buried in the description property of each RSS item.
There’s a tiny problem with this function: it isn’t easy to extend. We’ll look at two ways to add transformations. The first is simple and applies when a few very simple things need to be done. Emphasis on “few”.
The second is a more sophisticated rewrite that might be in order when there are more.
Simple Expansion
How do we expand the transformation steps? Making a few simple changes has several parts:

1. Write the additional transform steps inside the rss_status.title_transform() function. Right now, the code looks like this:

   ```python
   docket, _, parties_title = row.title.partition(' ')
   _, _, real_docket = docket.partition(":")
   ```

   This partitions the title on the first space, and then partitions the docket on the first ":".

2. Test the function in isolation. This is best done with examples, as shown in the rss_status.title_transform() function. The Python doctest tool can confirm that the examples are correct. Here’s the OS-level command that runs doctest:

   ```shell
   slott$ python3 -m doctest -v rss_status.py
   ```

   Here are the last two lines of the output:

   ```text
   7 passed and 0 failed.
   Test passed.
   ```

   All of the examples worked as predicted.

3. Expand the rss_status.ExpandedRSS class to include the additional attributes built by the new transform function(s).
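The first two steps can be sketched in one small file. Here split_title() is a hypothetical stand-in for rss_status.title_transform(), whose full signature isn’t shown in this section, and the sample titles are invented. The docstring examples are what doctest checks; the second one records what partition() does when there is no ":" to split on.

```python
import doctest
from typing import Tuple


def split_title(title: str) -> Tuple[str, str]:
    """Return (parties_title, docket) for a hypothetical RSS title.

    >>> split_title("1:23-cv-00123 Smith v. Jones")
    ('Smith v. Jones', '23-cv-00123')

    With no ":" in the first word, partition() leaves the docket empty.

    >>> split_title("NoColonHere Smith v. Jones")
    ('Smith v. Jones', '')
    """
    # Split on the first space: docket first, parties after.
    docket, _, parties_title = title.partition(' ')
    # Keep only the text after the first ":" in the docket.
    _, _, real_docket = docket.partition(":")
    return parties_title, real_docket


if __name__ == "__main__":
    # Run this module's docstring examples, much as
    # `python3 -m doctest -v` does for a file.
    doctest.testmod(verbose=True)
```

Writing the edge case as a doctest means the behavior of an odd title is documented and checked at the same time.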
Because the structure of the rss_status.ExpandedRSS class has changed, the next daily feed will appear to be entirely new: every item now carries the new data attributes, so none of them match what was seen before the change.
There are ways to limit the scope of the comparison when checking for newness, but the problem of
“it appears all new after I made a change” doesn’t seem large enough for the additional complexity to pay off.
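To see why a structural change makes everything look new, consider a newness check that compares whole records. The sketch below is hypothetical: the OldRSS and NewRSS types, the sample values, and the choice of link as a comparison key are assumptions for illustration, not code from rss_status.py.

```python
from typing import NamedTuple


class OldRSS(NamedTuple):
    """Hypothetical record saved before the class changed."""
    title: str
    link: str


class NewRSS(NamedTuple):
    """Hypothetical record built after the change, with one added field."""
    title: str
    link: str
    docket: str


yesterday = [
    OldRSS("1:23-cv-00123 Smith v. Jones", "https://example.com/1"),
]
today = [
    NewRSS("1:23-cv-00123 Smith v. Jones", "https://example.com/1", "23-cv-00123"),
    NewRSS("1:23-cv-00456 Doe v. Roe", "https://example.com/2", "23-cv-00456"),
]

# Whole-record comparison: a 2-field tuple never equals a 3-field tuple,
# so every item in today's feed looks new, even the repeated one.
seen_records = set(yesterday)
new_by_record = [item for item in today if item not in seen_records]

# Key-based comparison: only the (assumed unique) link is compared,
# so the added field does not affect the result.
seen_links = {item.link for item in yesterday}
new_by_link = [item for item in today if item.link not in seen_links]

print(len(new_by_record), len(new_by_link))  # prints: 2 1
```

The key-based check is the extra complexity in question: it is more logic to maintain, and it would also miss items whose other fields changed while the link stayed the same.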
More Sophistication
The rss_status.title_transform() function does two things:

- It builds a list of rss_status.ExpandedRSS items.
- It applies a series of transformational steps to each rss_status.SourceRSS item to create the resulting rss_status.ExpandedRSS items.
We can refactor the function to break it into pieces and make it quite a bit easier to expand.
First, the list-building step reduces to this:

```python
todays_data = list(transform(xml_reader(url)))
```

This depends on a new transform() function:
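```python
from typing import Iterable, Iterator

# SourceRSS and ExpandedRSS come from rss_status; title_parse() is shown below.


def transform(source: Iterable[SourceRSS]) -> Iterator[ExpandedRSS]:
    """Yield one ExpandedRSS item for each SourceRSS item."""
    for row in source:
        # Derive the extra attributes from the original title.
        parties_title, real_docket = title_parse(row.title)
        # other extraction goes here
        yield ExpandedRSS(
            # Copy the original RSS fields unchanged.
            title=row.title,
            link=row.link,
            description=row.description,
            pubDate=row.pubDate,
            # Add the derived fields.
            docket=real_docket,
            parties_title=parties_title,
            # Other field values go here
        )
```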
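The title_parse() function looks like this:

```python
from typing import Tuple


def title_parse(title: str) -> Tuple[str, str]:
    """Return (parties_title, docket) parsed from an RSS item title."""
    # Split on the first space: the docket comes first, the parties follow.
    docket, _, parties_title = title.partition(' ')
    # Keep only the part of the docket after the first ":".
    _, _, real_docket = docket.partition(":")
    return parties_title, real_docket
```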
We can now easily add more functions like title_parse().
Since each one is small, it’s easy to test in isolation. We can write many
functions like this and bundle them into the overall transform() function
to add more derived attributes to each item.
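One possible way to bundle several such functions is sketched below. It is a variation on the transform() above, not the rss_status.py code: each hypothetical extractor returns a dict of named fields, and transform() merges them into the result. SourceRSS and ExpandedRSS are stand-in NamedTuple classes with assumed fields, description_fields() assumes a made-up "[Type] text" description format, and the sample item is invented.

```python
from typing import Callable, Dict, Iterable, Iterator, List, NamedTuple


class SourceRSS(NamedTuple):
    """Stand-in for rss_status.SourceRSS; field names assumed."""
    title: str
    link: str
    description: str
    pubDate: str


class ExpandedRSS(NamedTuple):
    """Stand-in for rss_status.ExpandedRSS with one hypothetical extra field."""
    title: str
    link: str
    description: str
    pubDate: str
    docket: str
    parties_title: str
    doc_type: str  # hypothetical field filled by description_fields()


# Each extractor takes a source item and returns a dict of derived fields.
Extractor = Callable[[SourceRSS], Dict[str, str]]


def title_fields(row: SourceRSS) -> Dict[str, str]:
    """Same partition logic as title_parse(), returned as named fields."""
    docket, _, parties_title = row.title.partition(' ')
    _, _, real_docket = docket.partition(":")
    return {"docket": real_docket, "parties_title": parties_title}


def description_fields(row: SourceRSS) -> Dict[str, str]:
    """Hypothetical rule: a description starting with "[Type]" names a type."""
    if row.description.startswith("["):
        head, sep, _ = row.description[1:].partition("]")
        if sep:
            return {"doc_type": head.strip()}
    return {"doc_type": ""}


EXTRACTORS: List[Extractor] = [title_fields, description_fields]


def transform(source: Iterable[SourceRSS]) -> Iterator[ExpandedRSS]:
    for row in source:
        derived: Dict[str, str] = {}
        for extract in EXTRACTORS:
            derived.update(extract(row))
        # Copy the original fields, then add every derived field.
        yield ExpandedRSS(**row._asdict(), **derived)


sample = [
    SourceRSS("1:23-cv-00123 Smith v. Jones", "https://example.com/1",
              "[Order] Motion granted", "Mon, 02 Jan 2023 10:00:00 GMT"),
]
todays_data = list(transform(sample))
print(todays_data[0].docket, todays_data[0].doc_type)  # prints: 23-cv-00123 Order
```

With this shape, adding an attribute means writing one extractor, appending it to EXTRACTORS, and adding the field to ExpandedRSS. The cost is that a misspelled or missing field name is caught only when ExpandedRSS is constructed (as a TypeError), not by a type checker.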