pyWeb Literate Programming 3.2

Yet Another Literate Programming Tool

Introduction

Literate programming was pioneered by Knuth as a method for developing readable, understandable presentations of programs. These would present a program in a literate fashion for people to read and understand; this would be in parallel with presentation as source text for a compiler to process and both would be generated from a common source file.

One intent is to synchronize the program source with the documentation about that source. If the program and the documentation have a common origin, then the traditional gaps between intent (expressed in the documentation) and action (expressed in the working program) are significantly reduced.

py-web-lp is a literate programming tool that combines the actions of weaving a document with tangling source files. It is independent of any source language. While is designed to work with RST document markup, it should be amenable to any other flavor of markup. It uses a small set of markup tags to define chunks of code and documentation.

Background

The following is an almost verbatim quote from Briggs’ nuweb documentation, and provides an apt summary of Literate Programming.

In 1984, Knuth introduced the idea of literate programming and described a pair of tools to support the practise (Donald E. Knuth, “Literate Programming”, The Computer Journal 27 (1984), no. 2, 97-111.) His approach was to combine Pascal code with TeX documentation to produce a new language, WEB, that offered programmers a superior approach to programming. He wrote several programs in WEB, including weave and tangle, the programs used to support literate programming. The idea was that a programmer wrote one document, the web file, that combined documentation written in TeX (Donald E. Knuth, TeX book, Computers and Typesetting, 1986) with code (written in Pascal).

Running tangle on the web file would produce a complete Pascal program, ready for compilation by an ordinary Pascal compiler. The primary function of tangle is to allow the programmer to present elements of the program in any desired order, regardless of the restrictions imposed by the programming language. Thus, the programmer is free to present his program in a top-down fashion, bottom-up fashion, or whatever seems best in terms of promoting understanding and maintenance.

Running weave on the web file would produce a TeX file, ready to be processed by TeX. The resulting document included a variety of automatically generated indices and cross-references that made it much easier to navigate the code. Additionally, all of the code sections were automatically prettyprinted, resulting in a quite impressive document.

Knuth also wrote the programs for TeX and METAFONT entirely in WEB, eventually publishing them in book form. These are probably the largest programs ever published in a readable form.

Other Tools

Numerous tools have been developed based on Knuth’s initial work. A relatively complete survey is available at sites like Literate Programming, and the OASIS XML Cover Pages: Literate Programming with SGML and XML.

The immediate predecessors to this py-web-lp tool are FunnelWeb, noweb and nuweb. The ideas lifted from these other tools created the foundation for py-web-lp.

There are several Python-oriented literate programming tools. These include LEO, interscript, lpy, py2html, PyLit-3

The FunnelWeb tool is independent of any programming language and only mildly dependent on TeX. It has 19 commands, many of which duplicate features of HTML or LaTeX.

The noweb tool was written by Norman Ramsey. This tool uses a sophisticated multi-processing framework, via Unix pipes, to permit flexible manipulation of the source file to tangle and weave the programming language and documentation markup files.

The nuweb Simple Literate Programming Tool was developed by Preston Briggs (preston@tera.com). His work was supported by ARPA, through ONR grant N00014-91-J-1989. It is written in C, and very focused on producing LaTeX documents. It can produce HTML, but this is clearly added after the fact. It cannot be easily extended, and is not object-oriented.

The LEO tool is a structured GUI editor for creating source. It uses XML and noweb-style chunk management. It is more than a simple weave and tangle tool.

The interscript tool is very large and sophisticated, but doesn’t gracefully tolerate HTML markup in the document. It can create a variety of markup languages from the interscript source, making it suitable for creating HTML as well as LaTeX.

The lpy tool can produce very complex HTML representations of a Python program. It works by locating documentation markup embedded in Python comments and docstrings. This is called “inverted literate programming”.

The py2html tool does very sophisticated syntax coloring.

The PyLit-3 tool is perhaps the very best approach to Literate programming, since it leverages an existing lightweight markup language and it’s output formatting. However, it’s limited in the presentation order, making it difficult to present a complex Python module out of the proper Python required presentation.

py-web-lp

py-web-lp works with any programming language. It can work with any markup language, but is currently configured to work with RST. This philosophy comes from FunnelWeb noweb, nuweb and interscript. The primary differences between py-web-lp and other tools are the following.

  • py-web-lp is object-oriented, permitting easy extension. noweb extensions are separate processes that communicate through a sophisticated protocol. nuweb is not easily extended without rewriting and recompiling the C programs.

  • py-web-lp is built in the very portable Python programming language. This allows it to run anywhere that Python 3.3 runs, with only the addition of docutils. This makes it a useful tool for programmers in any language.

  • py-web-lp is much simpler than FunnelWeb, LEO or Interscript. It has a very limited selection of commands, but can still produce complex programs and HTML documents.

  • py-web-lp does not invent a complex markup language like Interscript. Because Iterscript has its own markup, it can generate LaTeX or HTML or other output formats from a unique input format. While powerful, it seems simpler to avoid inventing yet another sophisticated markup language. The language py-web-lp uses is very simple, and the author’s use their preferred markup language almost exclusively.

  • py-web-lp supports the forward literate programming philosophy, where a source document creates programming language and markup language. The alternative, deriving the document from markup embedded in program comments (“inverted literate programming”), seems less appealing. The disadvantage of inverted literate programming is that the final document can’t reflect the original author’s preferred order of exposition, since that informtion generally isn’t part of the source code.

  • py-web-lp also specifically rejects some features of nuweb and FunnelWeb. These include the macro capability with parameter substitution, and multiple references to a chunk. These two capabilities can be used to grow object-like applications from non-object programming languages (e.g. C or Pascal). Since most modern languages (Python, Java, C++) are object-oriented, this macro capability is more of a problem than a help.

  • Since py-web-lp is built in the Python interpreter, a source document can include Python expressions that are evaluated during weave operation to produce time stamps, source file descriptions or other information in the woven or tangled output.

py-web-lp works with any programming language; it can work with any markup language. The initial release supports RST via simple templates.

The following is extensively quoted from Briggs’ nuweb documentation, and provides an excellent background in the advantages of the very simple approach started by nuweb and adopted by py-web-lp.

The need to support arbitrary programming languages has many consequences:

No prettyprinting:

Both WEB and CWEB are able to prettyprint the code sections of their documents because they understand the language well enough to parse it. Since we want to use any language, we’ve got to abandon this feature. However, we do allow particular individual formulas or fragments of LaTeX or HTML code to be formatted and still be part of the output files.

Limited index of identifiers:

Because WEB knows about Pascal, it is able to construct an index of all the identifiers occurring in the code sections (filtering out keywords and the standard type identifiers). Unfortunately, this isn’t as easy in our case. We don’t know what an identifier looks like in each language and we certainly don’t know all the keywords. We provide a mechanism to mark identifiers, and we use a pretty standard pattern for recognizing identifiers almost most programming languages.

Of course, we’ve got to have some compensation for our losses or the whole idea would be a waste. Here are the advantages I [Briggs] can see:

Simplicity:

The majority of the commands in WEB are concerned with control of the automatic prettyprinting. Since we don’t prettyprint, many commands are eliminated. A further set of commands is subsumed by LaTeX and may also be eliminated. As a result, our set of commands is reduced to only about seven members (explained in the next section). This simplicity is also reflected in the size of this tool, which is quite a bit smaller than the tools used with other approaches.

No prettyprinting:

Everyone disagrees about how their code should look, so automatic formatting annoys many people. One approach is to provide ways to control the formatting. Our approach is simpler – we perform no automatic formatting and therefore allow the programmer complete control of code layout.

Control:

We also offer the programmer reasonably complete control of the layout of his output files (the files generated during tangling). Of course, this is essential for languages that are sensitive to layout; but it is also important in many practical situations, e.g., debugging.

Speed:

Since [py-web-lp] doesn’t do too much, it runs very quickly. It combines the functions of tangle and weave into a single program that performs both functions at once.

Chunk numbers:

Inspired by the example of noweb, [py-web-lp] refers to all program code chunks by a simple, ascending sequence number through the file. This becomes the HTML anchor name, also.

Multiple file output:

The programmer may specify more than one output file in a single [py-web-lp] source file. This is required when constructing programs in a combination of languages (say, Fortran and C). It’s also an advantage when constructing very large programs.

Acknowledgements

This application is very directly based on (derived from?) work that

preceded this, particularly the following:

Also, after using John Skaller’s interscript http://interscript.sourceforge.net/ for two large development efforts, I finally understood the feature set I really wanted.

Jason Fruit and others contributed to the previous version.

Installing

This requires Python 3.10.

This is not (currently) hosted in PyPI. Instead of installing it with PIP, clone the GitHub repository or download the distribution kit.

After downloading, install pyweb “manually” using the provided setup.py.

python setup.py install

This will install the pyweb module.

This depends on Jinja2 templates. The Jinja components should be installed when setup.py uses requirements.txt to install the required components.

Using

py-web-lp supports two use cases, Tangle Source Files and Weave Documentation. These are often combined to both tangle and weave an application and it’s documentation. The work starts with creating a WEB file with documentation and code.

Create WEB File

See The py-web-lp Markup Language for more details on the language. For a simple example, we’ll use the following WEB file: examples/hw.w.

###########
Hello World
###########

This file has a small example.

@d The Body Of The Script @{
print("Hello, World!")
@}

The Python module includes a small script.

@o hw.py @{
@<The Body...@>
@}

This example has RST markup document, that includes some @d and @o chunks to define code blocks. The @d is the definition of a named chunk, The Body Of The Script. The @o defines an output file to be tangled. This file has a reference to the The Body Of The Script chunk.

When tangling, the code will be used to build the file(s) in the @o chunk(s). In this example, it will write the hw.py file by tangling the referenced chunk.

When weaving, the @d and @o chunks will have some additional RST markup inserted into the document. The output file will have a name based on the source WEB document. In this case it will be hw.rst.

Tangle Source Files

A user initiates this process when they have a complete .w file that contains a description of source files. These source files are described with @o commands in the WEB file.

The use case is successful when the source files are produced.

The use case is a failure when the source files cannot be produced, due to errors in the .w file. These must be corrected based on information in log messages.

A typical command to tangle (without weaving) is:

python -m pyweb -xw examples/hw.w -o examples

The outputs will be defined by the @o commands in the source. The -o option writes the resulting tangled files to the named directory.

Weave Documentation

A user initiates this process when they have a .w file that contains a description of a document to produce. The document is described by the entire WEB file. The default is to use ReSTructured Text (RST) markup. The output file will have the .rst suffix.

The use case is successful when the documentation file is produced.

The use case is a failure when the documentation file cannot be produced, due to errors in the .w file. These must be corrected based on information in log messages.

A typical command to weave (without tangling) is:

python -m pyweb -xt examples/hw.w -o examples

The output will be named examples/hw.rst. The -o option made sure the file was written to the examples directory.

Running py-web-lp to Tangle and Weave

Assuming that you have marked pyweb.py as executable, you do the following:

python -m pyweb examples/hw.w -o examples

This will tangle the @o commands in examples/hw.w It will also weave the output, and create examples/hw.rst. This can be processed by docutils to create an HTML file.

Command Line Options

Currently, the following command line options are accepted.

-v:

Verbose logging.

-s:

Silent operation.

-c x:

Change the command character from @ to *x*.

-w weaver:

Choose a particular documentation weaver template. Currently the choices are rst, tex, and html.

-xw:

Exclude weaving. This does tangling of source program files only.

-xt:

Exclude tangling. This does weaving of the document file only.

-p command:

Permit errors in the given list of commands. The most common version is -pi to permit errors in locating an include file. This is done in the following scenario: pass 1 uses -xw -pi to exclude weaving and permit include-file errors; the tangled program is run to create test results; pass 2 uses -xt to exclude tangling and include the test results.

-o directory:

The directory to which to write output files.

Bootstrapping

py-web-lp is written using py-web-lp. The distribution includes the original .w files as well as a .py module.

The bootstrap procedure is to run a “known good” pyweb to transform a working copy into a new version of pyweb. We provide the previous release in the bootstrap directory.

python bootstrap/pyweb.py pyweb.w
rst2html.py pyweb.rst pyweb.html

The resulting pyweb.html file is the updated documentation. The pyweb.py is the updated candidate release of py-web-lp.

Similarly, the tests built from a .w files.

python pyweb.py tests/pyweb_test.w -o tests
PYTHONPATH=.. pytest
rst2html.py tests/pyweb_test.rst tests/pyweb_test.html

Dependencies

py-web-lp requires Python 3.10 or newer.

The following components are listed in the requirements.txt file. These can be loaded via

python -m pip install -r requirements.txt

This lp uses Jinja for template processing.

The tomli library is used to parse configuration files for older Python that lack a tomllib in the standard library.

If you create RST output, you’ll want to use either docutils or Sphinx to translate the RST to HTML or LaTeX or any of the other formats supported by docutils or Sphinx. This is not a proper requirement to run the tool. It’s a common part of an overall document production tool chain.

The overview contains PlantUML diagrams. See https://plantuml.com/ for more information. The PlantUML for Sphinx plug-in can be used to render the diagrams automatically.

For development, additional components like pytest, tox, and mypy are also used for development.

More Advanced Usage

Here are two more advanced use cases.

Tangle, Test, and Weave with Test Results

A user initiates this process when the final document should include test output from the source files created by the tangle operation. This is an extension to the example shown earlier.

###########
Hello World
###########

This file has a small example.

@d The Body Of The Script @{
print("Hello, World!")
@}

The Python module includes a small script.

@o hw.py @{
@<The Body...@>
@}

Example Output
==============

@i examples/hw_output.log

The use case is successful when the documentation file is produced, including current test output.

The use case is a failure when the documentation file cannot be produced, due to errors in the .w file. These must be corrected based on information in log messages.

The use case is a failure when the documentation file does not include current test output.

The sequence is as follows:

python -m pyweb -xw -pi examples/hw.w -o examples
python examples/hw.py >examples/hw_output.log
python -m pyweb -xt examples/hw.w -o examples

The first step uses -xw to excludes document weaving. The -pi option will permits errors on the @i command. This is necessary in the event that the log file does not yet exist.

The second step runs the test, creating a log file.

The third step weaves the final document, including the test output file. The -xt option excludes tangling, since output file had already been produced.

Template Changes

The woven document is based – primarily – on the text in the source WEB file. This is processed using a small set of Jinja2 macros to modify behavior. To fine-tune the results, we can adjust the templates used by this application.

The easiest way to do this is to work with the weave.py script which shows how to create a customized subclass of Weaver. The Handy Scripts and Other Files section shows this script and how it’s build from a few pyweb components.

The py-web-lp Markup Language

The essence of literate programming is a markup language that includes both code from documentation. For tangling, the code is relevant. For weaving, both code and documentation are relevant.

The source document is a “Web” documentation that includes the code. It’s important to see the .w file as the final documentation. The code is tangled out of the source web.

The py-web-lp tool parses the .w file, and performs the tangle and weave operations. It tangles each individual output file from the program source chunks. It weaves the final documentation file file from the entire sequence of chunks provided, mixing the author’s original documentation with some markup around the embedded program source.

Concepts

The .w file has two tiers of markup in it.

  • At the top, it has py-web-lp markup to distinguish documentation chunks from code chunks.

  • Within the documentation chunks, there can be markup for the target publication tool chain. This might be RST, LaTeX, HTML, or some other markup language.

The py-web-lp markup decomposes the source document a sequence of Chunks.

object web
object chunk
object documentation
object "source code" as code

web *-- chunk
chunk *-- documentation
chunk *-- code

The Web chunks have the following two overall sets of features:

  • Program source code to be tangled and woven. There are two important varieties: the “defined” chunks that are named, and the “output” chunks that define a file to be written. Program code chunks can have references to other defined code chunks. This permits created output files that tangled into a compiler-friendly order, separate from the presentation.

  • Documentation to be woven. These are the blocks of text between commands.

The bulk of the file is typically documentation chunks that describe the program in some publication-oriented markup language like RST, HTML, or LaTeX.

py-web-lp markup surrounds the code with “commands.” Everything else is documentation.

The code chunks have two transformations applied.

  • When Tangling, the indentation is adjusted to match the context in which they were originally defined. This assures that Python (which relies on indentation) parses correctly. For other languages, proper indentation is expected but not required.

  • When Weaving, selected characters can be quoted so they don’t break the publication tool. For HTML, &, <, > are quoted properly. For LaTeX, a few escapes are used to avoid problems with the fancyvrb environment.

The non-code, documentation chunks are not transformed up in any way. Everything that’s not explicitly a code chunk is output without modification.

All of the py-web-lp tags begin with @. This is sometimes called the command prefix. (This can be changed.) The tags were historically referred to as “commands.” For Python decorators in particular, the symbol must be doubled, @@, because all @ symbols are commands, irrespective of context.

The Structural tags (historically called “major commands”) partition the input and define the various chunks. The Inline tags are (called “minor commands”) are used to control the woven and tangled output from the defined chunks. There are Content tags which generate summary cross-reference content in woven files.

Boilerplate

There is some mandatory “boilerplate” required to make a working document. Requirements vary by markup language.

LaTeX

The LaTeX templates use \\fancyvrb. The following is required.

\\usepackage{fancyvrb}

Some minimal boilerplate document looks like this:

documentclass{article}
usepackage{fancyvrb}
title{ Title }
author{ Author }

begin{document}

maketitle
tableofcontents

Your Document Starts Here

end{document}
HTML

There’s often a fairly large amount of HTML boilerplate. Currently, the templates used do not provide any CSS classes. For more sophisticated HTML documents, it may be necessary to provide customized templates with CSS classes to make the document look good.

Structural Tags

There are two definitional tags; these define the various chunks in an input file.

@o file @{ text @}

The @o (output) command defines a named output file chunk. The text is tangled to the named file with no alteration. It is woven into the document in an appropriate fixed-width font.

There are options available to specify comment conventions for the tangled output; this allows inclusion of source line numbers.

@d name @{ text @}

The @d (define) command defines a named chunk of program source. This text is tangled or woven when it is referenced by the reference inline tag.

There are options available to specify the indentation for this particular chunk. In rare cases, it can be helpful to override the indentation context.

Each @o and @d tag is followed by a chunk which is delimited by @{ and @} tags. At the end of that chunk, there is an optional “major” tag.

@|

A chunk may define user identifiers. The list of defined identifiers is placed in the chunk, separated by the @| separator.

Additionally, these tags provide for the inclusion of additional input files. This is necessary for decomposing a long document into easy-to-edit sections.

@i file

The @i (include) command includes another file. The previous chunk is ended. The file is processed completely, then a new chunk is started for the text after the @i command.

All material that is not explicitly in a @o or @d named chunk is implicitly collected into a sequence of anonymous document source chunks. These anonymous chunks form the backbone of the document that is woven. The anonymous chunks are never tangled into output program source files. They are woven into the document without any alteration.

Note that white space (line breaks ('\n'), tabs and spaces) have no effect on the input parsing. They are completely preserved on output.

The following example has three chunks:

Some RST-format documentation that describes the following piece of the
program.

@o myFile.py
@{
import math
print( math.pi )
@| math math.pi
@}

Some more RST documentation.

This starts with an anonymous chunk of documentation. It includes a named output chunk which will write to myFile.py. It ends with an anonymous chunk of documentation.

Inline Tags

There are several tags that are replaced by content in the woven output.

@@

The @@ command creates a single @ in the output file. This is replaced in tangled as well as woven output.

@<name@>

The name references a named chunk. When tangling, the referenced chunk replaces the reference command. When weaving, a reference marker is used. For example, in RST, this can be replaced with RST `reference`_ markup. Note that the indentation prior to the @< tag is preserved for the tangled chunk that replaces the tag.

@(Python expression@)

The Python expression is evaluated and the result is tangled or woven in place. A few global variables and modules are available. These are described in Expression Context.

Content Tags

There are three index creation tags that are replaced by content in the woven output.

@f

The @f command inserts a file cross reference. This lists the name of each file created by an @o command, and all of the various chunks that are concatenated to create this file.

@m

The @m command inserts a named chunk (“macro”) cross reference. This lists the name of each chunk created by a @d command, and all of the various chunks that are concatenated to create the complete chunk.

@u

The @u command inserts a user identifier cross reference. This index lists the name of each chunk created by an @d command or @|, and all of the various chunks that are concatenated to create the complete chunk.

Additional Features

Sequence Numbers. The named chunks (from both @o and @d commands) are assigned unique sequence numbers to simplify cross references.

Case Sensitive. Chunk names and file names are case sensitive.

Abbreviations. Chunk names can be abbreviated. A partial name can have a trailing ellipsis (…), this will be resolved to the full name. The most typical use for this is shown in the following example:

Some RST-format documentation.

@o myFile.py
@{
@<imports of the various packages used@>
print(math.pi,time.time())
@}

Some notes on the packages used.

@d imports...
@{
import math,time
@| math time
@}

Some more RST-format documentation.

This example shows five chunks.

  1. An anonymous chunk of documentation.

  2. A named chunk that tangles the myFile.py output. It has a reference to the imports of the various packages used chunk. Note that the full name of the chunk is essentially a line of documentation, traditionally done as a comment line in a non-literate programming environment.

  3. An anonymous chunk of documentation.

  4. A named chunk with an abbreviated name. The imports... matches the name imports of the various packages used. Set off after the @| separator is the list of user-specified identifiers defined in this chunk.

  5. An anonymous chunk of documentation.

Note that the first time a name appears (in a reference or definition), it must be the full name. All subsequent uses can be elisions. Also not that ambiguous elision is an annoying problem when you first start creating a document.

Concatenation. Named chunks are concatenated from their various pieces. This allows a named chunk to be broken into several pieces, simplifying the description. This is most often used when producing fairly complex output files.

An anonymous chunk with some RST documentation.

@o myFile.py
@{
import math, time
@}

Some notes on the packages used.

@o myFile.py
@{
print(math.pi, time.time())
@}

Some more HTML documentation.

This example shows five chunks.

  1. An anonymous chunk of documentation.

  2. A named chunk that tangles the myFile.py output. It has the first part of the file. In the woven document this is marked with "=".

  3. An anonymous chunk of documentation.

  4. A named chunk that also tangles the myFile.py output. This chunk’s content is appended to the first chunk. In the woven document this is marked with "+=".

  5. An anonymous chunk of documentation.

Newline Preservation. Newline characters are preserved on input. Because of this the output may appear to have excessive newlines. In all of the above examples, each named chunk was defined with the following.

@{
import math, time
@}

This puts a newline character before and after the import line.

Controlling Indentation

We have two choices in indentation:

  • Context-Sensitive.

  • Consistent.

If we have context-sensitive indentation, then the indentation of a chunk reference is applied to the entire chunk when expanded in place of the reference. This makes it simpler to prepare source for languages (like Python) where indentation is important.

There are cases, however, when this is not desirable. There are some places in Python where we want to create long, triple-quoted strings with indentation that does not follow the prevailing indentations of the surrounding code.

Here’s how the context-sensitive indentation works.

@o myFile.py
@{
def aFunction(a, b):
    @<body of aFunction@>
@| aFunction @}

@d body...
@{
"""doc string"""
return a + b
@}

The tangled output from this will look like the following. All of the newline characters are preserved, and the reference to body of the aFunction is indented to match the prevailing indent where it was referenced. In the following example, explicit line markers of ~ are provided to make the blank lines more obvious.

~
~def aFunction(a, b):
~
~    """doc string"""
~    return a + b
~

[The @| command shows that this chunk defines the identifier aFunction.]

This leads to a difficult design choice.

  • Do we use context-sensitive indentation without any exceptions? This is the current implementation.

  • Do we use consistent indentation and require the author to get it right? This seems to make Python awkward, since we might indent our outdent a @< name @> command, expecting the chunk to indent properly.

  • Do we use context-sensitive indentation with an exception indicator? This seems to go against the utter simplicity we’re cribbing from noweb. However, it makes a great deal of sense to add an option for @d chunks to supersede context-sensitive indentation. The author must then get it right.

    The syntax to define a section looks like this:

@d -noindent some chunk name
@{First partial line
More that uses """
@}

We might reference such a section like this.

@d some bigger chunk...
@{code
    @<some chunk name@>
@}

This will include the -noindent section by resetting the contextual indentation to zero. The First partial line line will be output after the four spaces provided by the some bigger chunk context.

After the first newline (More that uses “””) will be at the left margin.

Tracking Source Line Numbers

Since the tangled output files are – well – tangled, it can be difficult to trace back from a Python error stack to the original line in the .w file that needs to be fixed.

To facilitate this, there is a two-step operation to get more detailed information on how tangling worked.

  1. Use the -n command-line option to get line numbers.

  2. Include comment indicators on the @o commands that define output files.

The expanded syntax for @o looks like this.

@o -start /* -end */ page-layout.css
@{
Some CSS code
@}

We’ve added two options: -start /* and -end */ which define comment start and end syntax. This will lead to comments embedded in the tangled output which contain source line numbers for every (every!) chunk.

Expression Context

There are two possible implementations for evaluation of a Python expression in the input.

  1. Create an ExpressionCommand, and append this to the current Chunk. This will allow evaluation during weave processing and during tangle processing. This makes the entire weave (or tangle) context available to the expression, including completed cross reference information.

  2. Evaluate the expression during input parsing, and append the resulting text as a TextCommand to the current Chunk. This provides a common result available to both weave and parse, but the only context available is the WebReader and the incomplete Web, built up to that point.

In this implementation, we adopt the latter approach, and evaluate expressions immediately. A global context is created with the following variables defined.

os.path:

This is the standard os.path module.

os.getcwd:

The complete os module is not available. Just this function.

datetime:

This is the standard datetime module.

time:

The standard time module.

platform:

This is the standard platform module.

__builtins__:

Most of the built-ins are available, too. Not all. exec(), eval(), open() and __import__() aren’t available.

theLocation:

A tuple with the file name, first line number and last line number for the original expression’s location.

theWebReader:

The WebReader instance doing the parsing.

theFile:

The .w file being processed.

thisApplication:

The name of the running py-web-lp application. It may not be pyweb.py, if some other script is being used.

__version__:

The version string in the py-web-lp application.

Architecture and Design Overview

This application breaks the overall problem of literate programming into the following sub-problems.

  1. Representation of the WEB document as Chunks and Commands

  2. Reading and parsing the input WEB document.

  3. Weaving a document file.

  4. Tangling the desired program source files.

Here’s the overall Context Diagram for this.

left to right direction
skinparam actorStyle awesome

actor "Developer" as Dev
rectangle PyWeb {
    usecase "Tangle Source" as UC_Tangle
    usecase "Weave Document" as UC_Weave
}
rectangle IDE {
    usecase "Create WEB" as UC_Create
    usecase "Run Tests" as UC_Test
    usecase "Build Documentation" as UC_Doc
    usecase "Build Application" as UC_App
}
database WEB
component App
folder Documentation

Dev --> UC_Create
Dev --> UC_Test
Dev --> UC_Doc
Dev --> UC_App

UC_Create --> WEB
WEB --> UC_Tangle
WEB --> UC_Weave

UC_Tangle --> App
UC_Weave --> Documentation

UC_Test ..> UC_Tangle
UC_Doc ..> UC_Weave
UC_App ..> UC_Tangle

The idea here is a central WEB document contains both the application source code and the documentation that describes the code. The documentation can present information in an order that’s meaningful and helpful to people; the tangling operation orders this for the benefit of compilers and tools.

Since this is often part of an Integrated Development Environment (IDE), the container for all of these software components is the developer’s desktop. (We don’t need a diagram for that.)

Here’s a summary of the application-level components. These are the most visible libraries and command-line applications.

component pyweb
package jinja
pyweb ..> jinja

package templates
pyweb *-- templates
jinja ..> templates

component weave
weave ..> pyweb

component tangle
tangle ..> pyweb

The weave and tangle are convenient scripts that import and customize the underlying pyweb application. We’ve used the dotted “depends-on” arrow to depict this. The pyweb application depends on Jinja2 to define the various templates for weaving the output documents. The pyweb application contains the templates; this is shown with a solid line.

We can modify the templates to alter the look and feel. The supplied weave.py script shows how to do this.

In many cases, the final production will multiple steps, as shown below:

database WEB
component pyweb
artifact ".rst File" as RST
component sphinx
artifact ".html File" as HTML

WEB --> pyweb
pyweb --> RST
RST --> sphinx
sphinx --> HTML

We can use pyweb-lp to create an .rst file with the documentation. This is then processed by Sphinx to inject a Sphinx theme and necessary CSS to make responsive web document(s).

This is often automated with a Makefile.

Overall Structure

Generally, the code breaks into three functional areas

  • The core representation of a WEB.

  • A parser to read the source WEB.

  • The emitters to produce woven and tangled output, which include weavers and tanglers.

We could depict it as follows:

folder core {
    class Web
    class Chunk
    abstract class Command
    Web *-- "1..*" Chunk
    Chunk *-- "1..*" Command
}
folder parser {
    class WebReader
    WebReader --> Web
}
folder emitters {
    abstract class Emitter
    class Tangler
    class Weaver
    Emitter <|-- Tangler
    Emitter <|-- Weaver
    Emitter --> Web
}

We’ll look at the core model, first.

Core WEB Representation

The basic structure has three layers, as shown in the following diagram:

class Web << dataclass >> {
    chunks: list[Chunk]
}
class Chunk {
    name: str
    commands: list[Command]
}
abstract class Command

Web *-- "1..*" Chunk
Chunk *-- "1..*" Command

class CodeChunk
Chunk <|-- CodeChunk

class NamedChunk
Chunk <|-- NamedChunk

class OutputChunk
Chunk <|-- OutputChunk

class NamedCodeChunk
Chunk <|-- NamedCodeChunk

class TextCommand
Command <|-- TextCommand

class CodeCommand
Command <|-- CodeCommand

class ReferenceCommand
Command <|-- ReferenceCommand

class XRefCommand
Command <|-- XRefCommand

class FileXRefCommand
XRefCommand <|-- FileXRefCommand

class MacroXRefCommand
XRefCommand <|-- MacroXRefCommand

class UseridXRefCommand
XRefCommand <|-- UseridXRefCommand

The source document is transformed into a Web, which is the overall container. The source is decomposed into a sequence of Chunk instances. Each Chunk is a sequence of Commands.

Chunk objects and Command objects cannot be nested, leading to delightful simplification.

The overall Web includes both the original sequence of Chunk objects as well as an index for the named Chunk instances.

Note that a named chunk may be created through a number of @d commands. This means that each named Chunk may be a sequence of definitions sharing a common name. They are concatenated in order to permit decomposing a single concept into sequentially described pieces.

The various layers of Web, Chunk, and Command each have attributes designed to be usable by a Jinja template when weaving output. When tangling, however, the only attribute that matters is the text contained in the @{ and @} brackets. This makes tangling somewhat simpler than weaving.

There is a small interaction between a Tangler and each Chunk to work out the indentation. based in the context in which a @< name @> reference occurs.

Reading and Parsing

class Web
class WebReader {
    parse(source) : Web
}
WebReader ..> Web
class Tokenizer
WebReader ..> Tokenizer

class OptionParser

class OptionDef

OptionParser *-- OptionDef

WebReader ..> OptionParser

A solution to the reading and parsing problem depends on a convenient tool for breaking up the input stream and a representation for the chunks of input and the sequence of commands. Input decomposition is done with something we might call the Splitter design pattern.

The Splitter pattern is widely used in text processing, and has a long legacy in a variety of languages and libraries. A Splitter decomposes a string into a sequence of strings using some split pattern. There are many variant implementations. For example, one variant locates only a single occurence (usually the left-most); this is commonly implemented as a Find or Search string function. Another variant locates all occurrences of a specific string or character, and discards the matching string or character.

The variation on Splitter in this application creates each element in the resulting sequence as either (1) an instance of the split regular expression or (2) the text between split patterns.

We define our splitting pattern with the regular expression '@.|\n'. This will split on either of these patterns:

  • @ followed by a single character,

  • or, a newline.

For the most part, \n is only text, and as almost no special significance. The exception is the @i filename command, which ends at the end of the line, making the \n significant syntax in this case.

We could be more specific with the following as a split pattern: '@[doOifmu\|<>(){}\[\]]|\n'. This would silently ignore unknown commands, merging them in with the surrounding text. This would leave the '@@' sequences completely alone, allowing us to replace '@@' with '@' in every text chunk. It’s not clear this additional level of detail is helpful.

Within the @d and @o commands, there is a name and options. These follow the syntax rules for Tcl or the shell. Optional fields are prefaced with -. All options must come before all positional arguments. The positional arguments provide the name being defined. In effect, the name is ' '.join(args.split(' '); this means multiple adjacent spaces in a name will be collapsed to a single space.

Emitters

There are two possible outputs:

  • A woven document.

  • One or more tangled source files.

The overall structure of the classes is shown in the following diagram.

class Web

abstract class Emitter {
    emit(web)
}

Emitter ..> Web

class Weaver
Emitter <|-- Weaver

class Tangler
Emitter <|-- Tangler

class TanglerMake
Tangler <|-- TanglerMake

abstract class ReferenceStyle
Weaver --> ReferenceStyle

class Simple
ReferenceStyle <|-- Simple

class Transitive
ReferenceStyle <|-- Transitive

class Template
Weaver --> Template

class "Jinja Macro" as macro
Template *-- macro

We’ll look at the weaving activity first, then the tangling activity.

Weaving

The weaving activity depends on having a target document markup language. There are several approaches to this problem.

  • We can use a markup language unique to py-web-lp. This would hide the final target markup language. It would mean that py-web-lp would be equivalent to a tool like Pandoc, producing a variety of target markup languages from a single, common source.

  • We can use any of the existing markup languages (HTML, RST, Markdown, LaTeX, etc.) expand snippets of markup into author-supplied markup to create the target woven document.

The problem with the first method is defining yet-another-markup-language. This seems needlessly complex.

The problem with the second method is the source WEB file is a mixture of the following two things:

  • The background document in some standard markup and

  • The code elements.

The code elements must be set off from the background text via some markup. In languages like RST and Markdown, there’s a small textual wrapper around code samples. In languages like HTML, the wrapper can be much more complex. Also, certain code characters may need to be properly escaped if the code sample happens to contain markup that should not be processed, but treated as literal text.

The author should not be foreced to repeat the wrappers around each code examples. This should be delegated to the literate programming tool. Further, the author should not be narrowly constrained by the markup injected by the weaving process; the weaver should be extensible to add features.

This leads to using the Facade design pattern. The weaver is a Facade over the Jinja template engine. The tool provides default templates in RST, HTML, and LaTeX. These can be replaced; new templates can be added. The templates used to wrap code sections can be tweaked relatively easily.

Tangling

The tangling activity produces output files. In other tools, some care was taken to understand the source code context for tangling, and provide a correct indentation. This required a command-line parameter to turn off indentation for languages like Fortran, where identation is not used.

In py-web-lp, there are two options. The default behavior is that the indent of a @< name @> command is used to set the indent of the material is expanded in place of this reference. If all @< commands are presented at the left margin, no indentation will be done. This is helpful simplification, particularly for users of Python, where indentation is significant.

In rare cases, we might need both, and a @d chunk can override the indentation rule to force the material to be placed at the left margin.

Application

The overall application has the following layers to it:

  • An Action class hierarchy that includes the actions of Load, Tangle, and Weave.

  • An overall Application class that executes the actions.

  • A top-level main function parses the command line, creates and configures the actions, and executes the sequence of actions.

The idea is that the Weaver Action should be visible to tools like PyInvoke. We want Weave("someFile.w") to be a sensible task.

abstract class Action

class ActionSequence
Action <|-- ActionSequence
ActionSequence *-- "2..m" Action

class LoadAction
Action <|-- LoadAction

class WeaveAction
Action <|-- WeaveAction

class TangleAction
Action <|-- TangleAction

class Application

Application *-- Action

This shows the essential structure of the top-level classes.

Implementation

The implementation is contained in a single Python module defining the all of the classes and functions, as well as an overall main() function. The main() function uses these base classes to weave and tangle the output files.

The broad outline of the presentation is as follows:

  • Base Classes that define a model for the .w file.

    • Web Class contains the overall Web of Chunks. A Web is a sequence of Chunk objects. It’s also a mapping from chunk name to definition.

    • Chunk Class Hierarchy are pieces of the source document, built into a Web. A Chunk is a collection of Command instances. This can be either an anonymous chunk that will be sent directly to the output, or a named chunks delimited by the structural @d or @o commands.

    • Command Class Hierarchy are the items within a Chunk. The text and the inline @<name@> references are the principle command classes. Additionally, there are some cross reference commands (@f, @m, or @u).

  • Output Serialization. This is the Emitter class hierarchy writes various kinds of files. These decompose into two subclasses:

    • A Tangler creates source code.

    • A Weaver creates documentation. The various Jinja-based templates are part of weaving.

  • Input Parsing covers deserialization from the source .w file to the base model of Web, Chunk, and Command.

  • Other application components:

    • Error Class defines an application-specific exception. This covers all of the various kinds of problems that might arise.

    • Action class hierarchy defines things this program does.

    • The Application class. This is an overall class definition that includes command line parsing, picking an Action, configuring and executing the Action. It could be a set of related functions, but we’ve bound them into a class.

    • Logging setup. This includes a simple context manager for logging.

    • The Main Function.

    • pyWeb Module File defines the final module file that contains the application.

We’ll start with the base classes that define the data model for the source WEB of chunks.

Base Classes

Here are some of the base classes that define the structure and meaning of a .w source file.

Base Class Definitions (1) =

Command class hierarchy -- used to describe individual commands in a chunk (10)Chunk class hierarchy -- used to describe individual chunks (8)Web class -- describes the overall "web" of chunks (3)

Base Class Definitions (1). Used by → pyweb.py (79).

The above order is reasonably helpful for Python and minimizes forward references. The Chunk, Command, and Web instances do have a circular relationship, making a strict ordering a bit complex.

We’ll start at the central collection of information, the Web class of objects.

Web Class

The overall web of chunks is contained in a single instance of the Web class that is the principle parameter for the weaving and tangling actions. Broadly, the functionality of a Web can be separated into the folloowing areas:

  • It is constructed by a WebReader.

  • It also supports “enrichment” of the web, once all the Chunk instances are known. This is a stateful update to the web. Each Chunk is updated with references it makes as well as references to it.

  • It supports Chunk cross-reference methods that traverse this enriched data. This includes a kind of validity check to be sure that everything is used once and once only.

Fundamentally, a Web is a hybrid list+mapping. It as the following features:

  • It’s a Sequence to retain all Chunk instances in order.

  • It’s a mapping of name-to-Chunk that also offers a moderately sophisticated lookup, including exact match for a Chunk name and an approximate match for a an abbreviated name.

The Web is built by the parser by loading the sequence of Chunk instances.

Note that the WEB source language has a “mixed content model”. This means the code chunks have specific tags with names. The text, on the other hand, is interspersed among the code chunks. The text belongs to implicit, unnamed text chunks.

A web instance has a number of attributes.

chunks:

the sequence of Chunk instances as seen in the input file. To support anonymous chunks, and to assure that the original input document order is preserved, we keep all chunks in a master sequential list.

files:

the @o named OutputChunk chunks. Each element of this dictionary is a sequence of chunks that have the same name. The first is the initial definition (marked with “=”), all others a second definitions (marked with “+=”).

macros:

the @d named NamedChunk chunks. Each element of this dictionary is a sequence of chunks that have the same name. The first is the initial definition (marked with “=”), all others a second definitions (marked with “+=”).

userids:

the cross reference of chunks referenced by commands in other chunks.

This relies on the way a @dataclass does post-init processing. One the raw sequence of Chunks has been presented, some additional processing is done to link each Chunk to the web. This permits the full_name property to expand abbreviated names to full names, and, consequently, chunk references.

Imports (2) =

from collections import defaultdict
from collections.abc import Iterator
from dataclasses import dataclass, field
from functools import cache
import logging
from pathlib import Path
from types import SimpleNamespace
from typing import Any, Optional, Literal, ClassVar, Union
from weakref import ref, ReferenceType

Imports (2). Used by → pyweb.py (79).

The class defines one visible element of a Web instance, the chunks list of Chunk instances. From this list of Chunk objects, the remaining internal objects are built. These include the following:

  • chunk_map has the mapping of chunk names to list of chunks that provide the definition for the chunk.

  • userid_map has the mapping of user-defined names to the list of chunks that define the name.

  • references is the set of all referenced chunks.

Additionally there are attributes to contain a logger, a reference to the WEB file path, used to evaluate expressions, and a “strict-match” option that can report errors during name resolution. Disabling this will allow documents to be tangled that are potentially incomplete.

Generally, a parser will create a list of Chunk objects. From this, the parser can creates the final Web.

Web class – describes the overall “web” of chunks (3) =

@dataclass
class Web:
    chunks: list["Chunk"]  #: The source sequence of chunks.

    # The ``@d`` chunk names and locations where they're defined.
    chunk_map: dict[str, list["Chunk"]] = field(init=False)

    # The ``@|`` defined names and chunks with which they're associated.
    userid_map: defaultdict[str, list["Chunk"]] = field(init=False)

    logger: logging.Logger = field(init=False, default=logging.getLogger("Web"))

    web_path: Path = field(init=False)  #: Source WEB file; set by ```WebParse``

    strict_match: ClassVar[bool] = True  #: Report ... names without a definition.

Web class – describes the overall “web” of chunks (3). Used by → Base Class Definitions (1).

The __post_init__() special method populates the detailed structure of the WEB document. There are several passes through the WEB to digest the data:

  1. Set all Chunk and Command back references to the Web container. This is required so a Chunk with a ReferenceCommand instance can properly refer to a chunk elsewhere in the Web container. There are all weak references to faciliate garbgage collection.

  2. Locate the unabbreviated names in chunks and references to chunks. Names can found in two places. The @d command provides a name. A @<name@> command can also provide a reference to a name. The unabbreviated names define the structure. Unambiguous abbreviations can be used freely, since full names are located first.

  3. Accumulate chunk lists, output lists, and name definition lists. This pass does two things. First any user-defined name after a @| command is accumulated. Second, any abbreviated name is resolved to the full name, and the complete mapping from chunk name to a sequence of defining chunks is completed.

  4. Set the referencedBy attribute of a Chunk instance with all of the commands that point to it. The idea here is that a top-level Chunk instance may have references to other Chunk isntances. This forms a kind of tree. Any given low-level Chunk object is named by a sequence of parent Chunk objects.

Once the initialization is complete, the Web instance can be woven or tangled.

Web class – describes the overall “web” of chunks (4) +=

    def __post_init__(self) -> None:
        """
        Populate weak references throughout the web to make full_name properties work.
        Then. Locate all macro definitions and userid references.
        """
        # Pass 1 -- set all Chunk and Command back references.
        for c in self.chunks:
            c.web = ref(self)
            for cmd in c.commands:
                cmd.web = ref(self)

        # Named Chunks = Union of macro_iter and file_iter
        named_chunks = list(filter(lambda c: c.name is not None, self.chunks))

        # Pass 2 -- locate the unabbreviated names in chunks and references to chunks.
        self.chunk_map = {}
        for seq, c in enumerate(named_chunks, start=1):
            c.seq = seq
            if not c.path:
                # Use ``@d name`` chunks (reject ``@o`` and text)
                if c.name and not c.name.endswith('...'):
                    self.logger.debug(f"__post_init__ 2a {c.name=!r}")
                    self.chunk_map.setdefault(c.name, [])
            for cmd in c.commands:
                # Find ``@< name @>`` in ``@d name`` chunks or ``@o`` chunks
                if cmd.has_name:
                    if not cast(ReferenceCommand, cmd).name.endswith('...'):
                        self.logger.debug(f"__post_init__ 2b {cast(ReferenceCommand, cmd).name=!r}")
                        self.chunk_map.setdefault(cast(ReferenceCommand, cmd).name, [])

        # Pass 3 -- accumulate chunk lists, output lists, and name definition lists.
        self.userid_map = defaultdict(list)
        for c in named_chunks:
            for name in c.def_names:
                self.userid_map[name].append(c)
            if not c.path:
                # Named ``@d name`` chunks
                if full_name := c.full_name:
                    c.initial = len(self.chunk_map[full_name]) == 0
                    self.chunk_map[full_name].append(c)
                    self.logger.debug(f"__post_init__ 3 {c.name=!r} -> {c.full_name=!r}")
            else:
                # Output ``@o`` and anonymous chunks.
                # Assume all @o chunks are unique. If they're not, they overwrite each other.
                # Also, there's not ``full_name`` for these chunks.
                c.initial = True

            # TODO: Accumulate all chunks that contribute to a named file...

        # Pass 4 -- set referencedBy a command in a chunk.
        # ONLY set this in references embedded in named chunk or output chunk.
        # In a generic Chunk (which is text) there's no anchor to refer to.
        # NOTE: Assume single references *only*
        # We should raise an exception when updating a non-None referencedBy value.
        # Or incrementing ref_chunk.references > 1.
        for c in named_chunks:
            for cmd in c.commands:
                if cmd.has_name:
                    ref_to_list = self.resolve_chunk(cast(ReferenceCommand, cmd).name)
                    for ref_chunk in ref_to_list:
                        ref_chunk.referencedBy = c
                        ref_chunk.references += 1

Web class – describes the overall “web” of chunks (4). Used by → Base Class Definitions (1).

The representation of a Web instance is a sequence of Chunk instances. This can be long and difficult to read. It is, however, complete, and can be used to build instances of Web objects from a variety of sources.

Web class – describes the overall “web” of chunks (5) +=

    def __repr__(self) -> str:
        NL = ",\n"
        return (
            f"{self.__class__.__name__}("
            f"{NL.join(repr(c) for c in self.chunks)}"
            f")"
        )

Web class – describes the overall “web” of chunks (5). Used by → Base Class Definitions (1).

Name and Chunk resolution are similar. Name resolution provides only the expanded name. Chunk resolution provides the list of chunks that define a name. Chunk resolution expands on the basic features of Name resolution.

The complex target.endswith('...') processing only happens once during __post_init__() processing. After the initalization is complete, all ReferenceCommand objects will have a full_name attribute that avoids the complication of resolving a name with a ... ellipsis.

Web class – describes the overall “web” of chunks (6) +=

    def resolve_name(self, target: str) -> str:
        """Map short names to full names, if possible."""
        if target in self.chunk_map:
            # self.logger.debug(f"resolve_name {target=} in self.chunk_map")
            return target
        elif target.endswith('...'):
            # The ... is equivalent to regular expression .*
            matches = list(
                c_name
                for c_name in self.chunk_map
                if c_name.startswith(target[:-3])
            )
            match : str
            # self.logger.debug(f"resolve_name {target=} {matches=} in self.chunk_map")
            match matches:
                case []:
                    if self.strict_match:
                        raise Error(f"No full name for {target!r}")
                    else:
                        self.logger.warning(f"resolve_name {target=} unknown")
                        self.chunk_map[target] = []
                    match = target
                case [head]:
                    match = head
                case [head, *tail]:
                    message = f"Ambiguous abbreviation {target!r}, matches {[head] + tail!r}"
                    raise Error(message)
            return match
        else:
            self.logger.warning(f"resolve_name {target=} unknown")
            self.chunk_map[target] = []
            return target

    def resolve_chunk(self, target: str) -> list["Chunk"]:
        """Map name (short or full) to the defining sequence of chunks."""
        full_name = self.resolve_name(target)
        chunk_list = self.chunk_map[full_name]
        self.logger.debug(f"resolve_chunk {target=!r} -> {full_name=!r} -> {chunk_list=}")
        return chunk_list

Web class – describes the overall “web” of chunks (6). Used by → Base Class Definitions (1).

The point of the Web object is to be able to manage a variety of structures. These iterator methods and properties provide the list of @o chunks, @d chunks, and the usernames after @| in a chunk.

Additionally, we can confirm the overall structure by asserting that each @d name has one reference. A name with no references indicates an omission, a name with multiple references suggests a spelling or ellipsis problem.

Web class – describes the overall “web” of chunks (7) +=

    def file_iter(self) -> Iterator[OutputChunk]:
        return (cast(OutputChunk, c) for c in self.chunks if c.type_is("OutputChunk"))

    def macro_iter(self) -> Iterator[NamedChunk]:
        return (cast(NamedChunk, c) for c in self.chunks if c.type_is("NamedChunk"))

    def userid_iter(self) -> Iterator[SimpleNamespace]:
        yield from (SimpleNamespace(def_name=n, chunk=c) for c in self.file_iter() for n in c.def_names)
        yield from (SimpleNamespace(def_name=n, chunk=c) for c in self.macro_iter() for n in c.def_names)

    @property
    def files(self) -> list["OutputChunk"]:
        return list(self.file_iter())

    @property
    def macros(self) -> list[SimpleNamespace]:
        """
        The chunk_map has the list of Chunks that comprise a macro definition.
        We separate those to make it slightly easier to format the first definition.
        """
        first_list = (
            (self.chunk_map[name][0], self.chunk_map[name])
            for name in sorted(self.chunk_map)
            if self.chunk_map[name]
        )
        macro_list = list(
            SimpleNamespace(name=first_def.name, full_name=first_def.full_name, seq=first_def.seq, def_list=def_list)
            for first_def, def_list in first_list
        )
        # self.logger.debug(f"macros: {defs}")
        return macro_list

    @property
    def userids(self) -> list[SimpleNamespace]:
        userid_list = list(
            SimpleNamespace(userid=userid, ref_list=self.userid_map[userid])
            for userid in sorted(self.userid_map)
        )
        # self.logger.debug(f"userids: {userid_list}")
        return userid_list

    def no_reference(self) -> list[Chunk]:
        return list(filter(lambda c: c.name and not c.path and c.references == 0, self.chunks))

    def multi_reference(self) -> list[Chunk]:
        return list(filter(lambda c: c.name and not c.path and c.references > 1, self.chunks))

Web class – describes the overall “web” of chunks (7). Used by → Base Class Definitions (1).

A Web instance is built by a WebReader. It’s used by an Emitter, including a Weaver as well as a Tangler. A Web is composed of individual Chunk instances.

Chunk Class Hierarchy

A Chunk is a piece of the input file. It is a collection of Command instances. A Chunk can be woven or tangled to create output.

class Chunk {
    name: str
    seq: int
    commands: list[Command]
    options: list[str]
    def_names: list[str]
    initial: bool
}

class OutputChunk
Chunk <|-- OutputChunk

class NamedChunk
Chunk <|-- NamedChunk

These subclasss reflect three kinds of content in the WEB source document:

  • Chunk is the anonymous text context.

    Text in the body generally becomes a TextCommand. Also, the various XREF commands (@m, @f, @u) can only appear here. In principle, a @< reference @> can appear in text. It must name a @d name @[...@] NamedDocumentChunk, which is expanded in place, not linked.

  • OutputChunk is the @o context.

    Text in the body becomes a CodeCommand. Any @< reference @> will be expanded when tangling, but become a link when weaving. This defines an output file.

  • NamedChunk is the @d context.

    Text in the body becomes a CodeCommand. Any @< reference @> will be expanded when tangling, but become a link when weaving.

Most of the attributes are pushed up to the superclass. This makes type checking the complex WEB tree much simpler.

The attributes are visible to the Jinja templates. In particular the sequence number, seq, and the initial definition indicator, initial, are often used to customize presentation of the woven content.

A type_is() method is used to discern the various subtypes. This slightly simplifies the work done by a template. It’s not easy to rely on proper inheritance because the templates are implemented in a separate language with their own processing rules.

Chunk class hierarchy – used to describe individual chunks (8) =

@dataclass
class Chunk:
    """Superclass for OutputChunk, NamedChunk, NamedDocumentChunk.
    """
    #: Short name of the chunk.
    name: str | None = None

    #: Unique sequence number of chunk in the WEB.
    seq: int | None = None

    #: Sequence of commands inside this chunk.
    commands: list["Command"] = field(default_factory=list)

    #: Parsed options for @d and @o chunks.
    options: list[str] = field(default_factory=list)

    #: Names defined after ``@|`` in this chunk.
    def_names: list[str] = field(default_factory=list)

    #: Is this the first use of a given Chunk name?
    initial: bool = False

    #: If injecting location details whenm tangling, this is the comment prefix.
    comment_start: str | None = None

    #: If injecting location details, this is the comment suffix.
    comment_end: str | None = None

    #: Count of references to this Chunk.
    references: int = field(init=False, default=0)

    #: The immediate reference to this chunk.
    referencedBy: Optional["Chunk"] = field(init=False, default=None)

    #: Weak reference to the ``Web`` containing this ``Chunk``.
    web: ReferenceType["Web"] = field(init=False, repr=False)

    #: Logger for any chunk-specific messages.
    logger: logging.Logger = field(init=False, default=logging.getLogger("Chunk"))

    @property
    def full_name(self) -> str | None:
        if self.name:
            return cast(Web, self.web()).resolve_name(self.name)
        else:
            return None

    @property
    def path(self) -> Path | None:
        return None

    @property
    def location(self) -> tuple[str, int]:
        return self.commands[0].location

    @property
    def transitive_referencedBy(self) -> list["Chunk"]:
        if self.referencedBy:
            return [self.referencedBy] + self.referencedBy.transitive_referencedBy
        else:
            return []

    def add_text(self, text: str, location: tuple[str, int]) -> "Chunk":
        if self.commands and self.commands[-1].typeid.TextCommand:
            cast(HasText, self.commands[-1]).text += text
        else:
            # Empty list OR previous command was not ``TextCommand``
            self.commands.append(TextCommand(text, location))
        return self

    def type_is(self, name: str) -> bool:
        """
        Instead of type name matching, we could check for these features:
        - has_code() (i.e., NamedChunk and OutputChunk)
        - has_text() (i.e., Chunk and NamedDocumentChunk)
        This is for template rendering, where proper Liskov
        Substitution is irrelevant.
        """
        return self.__class__.__name__ == name

Chunk class hierarchy – used to describe individual chunks (8). Used by → Base Class Definitions (1).

The subclasses do little more than partition thd Chunks in a way that permits customization in the template rendering process.

An OutputChunk is distinguished from a NamedChunk by having a path property and not having a full_name property.

Chunk class hierarchy – used to describe individual chunks (9) +=

class OutputChunk(Chunk):
    """An output file."""
    @property
    def path(self) -> Path | None:
        if self.name:
            return Path(self.name)
        else:
            return None

    @property
    def full_name(self) -> str | None:
        return None

    def add_text(self, text: str, location: tuple[str, int]) -> Chunk:
        if self.commands and self.commands[-1].typeid.CodeCommand:
            cast(HasText, self.commands[-1]).text += text
        else:
            # Empty list OR previous command was not ``CodeCommand``
            self.commands.append(CodeCommand(text, location))
        return self

class NamedChunk(Chunk):
    """A defined name with code."""
    def add_text(self, text: str, location: tuple[str, int]) -> Chunk:
        if self.commands and self.commands[-1].typeid.CodeCommand:
            cast(HasText, self.commands[-1]).text += text
        else:
            # Empty list OR previous command was not ``CodeCommand``
            self.commands.append(CodeCommand(text, location))
        return self

class NamedChunk_Noindent(Chunk):
    """A defined name with code and the -noIndent option."""
    pass

class NamedDocumentChunk(Chunk):
    """A defined name with text."""
    pass

Chunk class hierarchy – used to describe individual chunks (9). Used by → Base Class Definitions (1).

Command Class Hierarchy

A Chunk is a sequence of Command instances. For the generic Chunk superclass, the commands are – mostly – the TextCommand subclass of Command. These are blocks of text. A Chunk may also include some XRefCommand instances which expand to cross-reference material for an index.

For the CodeChunk and NamedChunk subclasses, the commands are CodeCommand instances intermixed with ReferenceCommand instances. A CodeCommand has a wrapper when weaving. Additionally, it will tangled into the output. A ReferenceCommand becomes a link when weaving, and expands to it’s full body when being tangled.

class Chunk {
    name: str
    commands: list[Command]
}
abstract class Command {
    {static} has_name: bool
    {static} has_text: bool
    {static} typeid: TypeId
    text: str
    tangle(Tangler, Target)
}

Chunk *-- "1..*" Command

abstract HasText
Command <|-- HasText

class TextCommand
HasText <|-- TextCommand

class CodeCommand
HasText <|-- CodeCommand

class ReferenceCommand
Command <|-- ReferenceCommand

abstract XRefCommand
Command <|-- XRefCommand

class FileXRefCommand
XRefCommand <|-- FileXRefCommand

class MacroXRefCommand
XRefCommand <|-- MacroXRefCommand

class UseridXRefCommand
XRefCommand <|-- UseridXRefCommand

class TypeId {
    __getattr__(str) : bool
}

Command -- TypeId

Each of these variants has the possibility of distinct processing when weaving the final document. The type information must be visibile to the Jinja template processing. This is done through an instance of the TypeId class attached to each of these classes.

The input stream is broken into individual commands, based on the various @x strings in the file. There are several subclasses of Command, each used to describe a different command or block of text in the input.

All instances of the Command class are created by the WebReader instance. In this case, a WebReader can be thought of as a factory for Command instances. Each Command instance is appended to the sequence of commands that belong to a Chunk.

This model permits two kinds of serialization:

  • Weaving a document from the WEB source file. This uses the various attributes of the various subclasses.

  • Tangling target documents with code. This relies on a tangle() method in each subclass.

We’ll address the run-time type identification first, the the definitions of the various Command subclasses.

Command class hierarchy – used to describe individual commands in a chunk (10) =

The TypeId Class -- to help the template engine (12)The Command Abstract Base Class (13)The HasText Type Hint -- used instead of another abstract class (14)The TextCommand Class (15)The CodeCommand Class (16)The ReferenceCommand Class (17)The XrefCommand Subclasses -- files, macros, and user names (18)

Command class hierarchy – used to describe individual commands in a chunk (10). Used by → Base Class Definitions (1).

The TypeId Class

The TypeId class provides run-time type identification to the Jinja templates. The idea is object.typeid.AClass is equivalent to isinstance(object, pyweb.AClass). It has simpler syntax and works better with Jinja templates. It helps sort out the various nodes of the AST built from the source WEB document.

There are three parts to the TypeId implementation:

  • A TypeId class definition to handle the attribute access. A reference to object.typeid.Name evaluates __getattr__(object, 'Name').

  • A metaclass definition, TypeIdMeta, to inject the new typeid attribute into each class.

  • The normal class initialization process, which evaluates __set_name__() for each attribute of a class that defines the method. This provides the containing class to the TypeId instance.

The idea of run-time type identification is – in a way – a failure to properly define the classes to follow the Liskov Substitution design principle. A better design would check for specific features of a subclass of Command. This becomes awkwardly complex in the Jinja templates, because the templates exist outside the class hierarchy. We rely on the typeid to map classes to macros appropriate to the class.

Imports (11) +=

from typing import TypeGuard, TypeVar, Generic

Imports (11). Used by → pyweb.py (79).

The TypeId Class – to help the template engine (12) =

_T = TypeVar("_T")

class TypeId:
    """
    This makes a given class name into an attribute with a
    True value. Any other attribute reference will return False.

    >>> class A:
    ...     typeid = TypeId()
    >>> a = A()
    >>> a.typeid.A
    True
    >>> a.typeid.B
    False
    """
    def __set_name__(self, owner: type[_T], name: str) -> "TypeId":
        self.my_class = owner
        return self

    def __getattr__(self, item: str) -> TypeGuard[_T]:
        return self.my_class.__name__ == item

from collections.abc import Mapping

class TypeIdMeta(type):
    """Inject the ``typeid`` attribute into a class definition."""
    @classmethod
    def __prepare__(metacls, name: str, bases: tuple[type, ...], **kwds: Any) -> Mapping[str, object]:  # type: ignore[override]
        return {"typeid": TypeId()}

The TypeId Class – to help the template engine (12). Used by → Command class hierarchy – used to describe individual commands in a chunk (10).

The TypeIdMeta metaclass sets the typeid attribute of each class defined by this metaclass. The ordinary class preparation will invoke the __set_name__() special method to provide details to the attribute.

Once set, any reference to c.typeid.name will be evaluated as __getattr__(c, 'name'). This permits the typeid to compare the name provided by __set_name__() with the name being inquired about.

The Command Class

The Command class is abstract, and describes most of the features of the various subclasses.

The Command Abstract Base Class (13) =

class Command(metaclass=TypeIdMeta):
    typeid: TypeId
    has_name: TypeGuard["ReferenceCommand"] = False
    has_text: TypeGuard[Union["CodeCommand", "TextCommand"]] = False

    def __init__(self, location: tuple[str, int]) -> None:
        self.location = location  #: The (filename, line number)
        self.logger = logging.getLogger(self.__class__.__name__)
        self.web: ReferenceType["Web"]
        self.text: str  #: The body of this command

    def __repr__(self) -> str:
        return f"{self.__class__.__name__}(location={self.location!r})"

    @abc.abstractmethod
    def tangle(self, aTangler: "Tangler", target: TextIO) -> None:
        ...

The Command Abstract Base Class (13). Used by → Command class hierarchy – used to describe individual commands in a chunk (10).

The HasText Classes

A type hint summarizes some of the subclass relationships.

The HasText Type Hint – used instead of another abstract class (14) =

HasText = Union["CodeCommand", "TextCommand"]

The HasText Type Hint – used instead of another abstract class (14). Used by → Command class hierarchy – used to describe individual commands in a chunk (10).

We don’t formalize this as proper subclass definitions. We probably should, but it doesn’t seem to add any clarity.

The TextCommand Class

The TextCommand class describes all of the text outside the @d and @o chunks. These are not tangled, and an exception is raised.

The TextCommand Class (15) =

class TextCommand(Command):
    """Text outside any other command."""
    has_text: TypeGuard[Union["CodeCommand", "TextCommand"]] = True

    def __init__(self, text: str, location: tuple[str, int]) -> None:
        super().__init__(location)
        self.text = text  #: The text

    def tangle(self, aTangler: "Tangler", target: TextIO) -> None:
        message = f"attempt to tangle a text block {self.location} {shorten(self.text, 32)!r}"
        self.logger.error(message)
        raise Error(message)

    def __repr__(self) -> str:
        return f"{self.__class__.__name__}(text={self.text!r}, location={self.location!r})"
The CodeCommand Class

The CodeCommand class describes the text inside the @d and @o chunks. These are tangled without change.

The CodeCommand Class (16) =

class CodeCommand(Command):
    """Code inside a ``@o``, or ``@d`` command."""
    has_text: TypeGuard[Union["CodeCommand", "TextCommand"]] = True

    def __init__(self, text: str, location: tuple[str, int]) -> None:
        super().__init__(location)
        self.text = text  #: The text

    def tangle(self, aTangler: "Tangler", target: TextIO) -> None:
        self.logger.debug(f"tangle {self.text=!r}")
        aTangler.codeBlock(target, self.text)

    def __repr__(self) -> str:
        return f"{self.__class__.__name__}(text={self.text!r}, location={self.location!r})"
The ReferenceCommand Class

The ReferenceCommand class describes a @< name @> construct inside a chunk. When tangled, these lead to inserting the referenced chunk’s content. Because this a reference to another chunk, the properties provide the values for the other chunk.

The ReferenceCommand Class (17) =

class ReferenceCommand(Command):
    """
    Reference to a ``NamedChunk`` in code, a ``@< name @>`` construct.
    In a CodeChunk or OutputChunk, it tangles to the definition from a ``NamedChunk``.
    In text, it can weave to the text of a ``NamedDocumentChunk``.
    """
    has_name: TypeGuard["ReferenceCommand"] = True

    def __init__(self, name: str, location: tuple[str, int]) -> None:
        super().__init__(location)
        self.name = name  #: The name that is referenced.

    @property
    def full_name(self) -> str:
        return cast(Web, self.web()).resolve_name(self.name)

    @property
    def seq(self) -> int | None:
        return cast(Web, self.web()).resolve_chunk(self.name)[0].seq

    def tangle(self, aTangler: "Tangler", target: TextIO) -> None:
        """Expand this reference.
        The starting position is the indentation for all **subsequent** lines.
        Provide the indent before ``@<``, in ``tangler.fragment`` back to the tangler.
        """
        self.logger.debug(f"tangle reference to {self.name=}, context: {aTangler.fragment=}")
        chunk_list = cast(Web, self.web()).resolve_chunk(self.name)
        if len(chunk_list) == 0:
            message = f"Attempt to tangle an undefined Chunk, {self.name!r}"
            self.logger.error(message)
            raise Error(message)
        aTangler.reference_names.add(self.name)
        aTangler.addIndent(len(aTangler.fragment))
        aTangler.fragment = ""

        for chunk in chunk_list:
            # TODO: if chunk.options includes '-indent': do a setIndent before tangling.
            for command in chunk.commands:
                command.tangle(aTangler, target)

        aTangler.clrIndent()

    def __repr__(self) -> str:
        return f"{self.__class__.__name__}(name={self.name!r}, location={self.location!r})"
The XrefCommand Classes

The XRefCommand classes describes a @f, @m, and @u constructs inside a chunk. These are not Tangled. They’re only woven.

Each offers a unique property that can be used by the template rending to get data about the WEB content.

The XrefCommand Subclasses – files, macros, and user names (18) =

class FileXrefCommand(Command):
    """The ``@f`` command."""
    def __init__(self, location: tuple[str, int]) -> None:
        super().__init__(location)

    @property
    def files(self) -> list["OutputChunk"]:
        return cast(Web, self.web()).files

    def tangle(self, aTangler: "Tangler", target: TextIO) -> None:
        raise Error('Illegal tangling of a cross reference command.')

class MacroXrefCommand(Command):
    """The ``@m`` command."""
    def __init__(self, location: tuple[str, int]) -> None:
        super().__init__(location)

    @property
    def macros(self) -> list[SimpleNamespace]:
        return cast(Web, self.web()).macros

    def tangle(self, aTangler: "Tangler", target: TextIO) -> None:
        raise Error('Illegal tangling of a cross reference command.')

class UserIdXrefCommand(Command):
    """The ``@u`` command."""
    def __init__(self, location: tuple[str, int]) -> None:
        super().__init__(location)

    @property
    def userids(self) -> list[SimpleNamespace]:
        return cast(Web, self.web()).userids

    def tangle(self, aTangler: "Tangler", target: TextIO) -> None:
        raise Error('Illegal tangling of a cross reference command.')

The XrefCommand Subclasses – files, macros, and user names (18). Used by → Command class hierarchy – used to describe individual commands in a chunk (10).

Output Serialization

The Emitter class hierarchy writes the output from the source Web instance. An Emitter instance is responsible for control of an output file format. This includes the necessary file naming, opening, writing and closing operations.

abstract class Emitter {
    output: Path
    emit(Web)
}

class Web
Emitter ..> Web

class Weaver
Emitter <|-- Weaver

class Tangler
Emitter <|-- Tangler
class TanglerMake
Tangler <|-- TanglerMake

package jinja {
    class Environment
}

Weaver --> Environment

object template

Weaver *-- template
Environment --> template

Here’s how the definitions are provided in the application. The two reference class definitions are used by by the Emitter class, and needs to be defined first. We’ll look at them later, since they’re a tiny strategy change in how cross-references are displayed.

Base Class Definitions (19) +=

Emitter Superclass (21)Weaver Subclass -- Uses Jinja templates to weave documentation (22)Tangler Subclass -- emits the output files (28)TanglerMake Subclass -- extends Tangler to avoid touching files that didn't change (32)

Base Class Definitions (19). Used by → pyweb.py (79).

Imports (20) +=

import abc
from textwrap import dedent, shorten
from jinja2 import Environment, DictLoader, select_autoescape

Imports (20). Used by → pyweb.py (79).

The Emitter class is an abstraction, used to check the consistency of the subclasses.

Emitter Superclass (21) =

class Emitter(abc.ABC):
    def __init__(self, output: Path):
        self.logger = logging.getLogger(self.__class__.__qualname__)
        self.log_indent = logging.getLogger("indent." + self.__class__.__qualname__)
        self.output = output

    @abc.abstractmethod
    def emit(self, web: Web) -> None:
        pass

Emitter Superclass (21). Used by → Base Class Definitions (19).

The Weaver Subclass

The Weaver is a Facade that wraps Jinja template processing.

The job is to build the necessary environment, locate the templates, and then evaluate the template’s generate() method to fill the values into the template to create the woven document.

There’s “base_weaver” template that contains the essential structure of the output document. This creates the needed macros, and then weaves the various chunks, in order.

Each unique markup language has macros that provide the unique markup required for the various chunks. This permits customization of the markup.

We have an interesting wrinkle with RST-formatted output. There are two variants that may be important:

  • When used with Sphinx, the “small” caption at the end of a code block uses ..  rst-class:: small.

  • When used without Sphinx, i.e., native docutils, the the “small” caption at the end of a code block uses ..  class:: small.

This is a minor change to the template being used. The question is how to make that distinction in the weaver? One view is to use subclasses of Weaver for this. However, the templates are found by name in the template_map within the Weaver. The --weaver command-line option provides the string (e.g., rst or html) used to build a key into the template map.

We can, therefore, use the --weaver command-line option to provide an expanded set of names for RST processing.

  • -w rst is the Sphinx option.

  • -w rst-sphinx is an alias for rst. The dictionary key points to the same templates as rst.

  • -w rst-nosphinx is the “pure-docutils” version, using .. class::.

  • -w rst-docutils is an alias for the nosphinx option.

While this works out nicely, it turns out that the ..  container:: small is, perhaps, a better markup that ..  class:: small. This work in docutils and Sphinx.

Weaver Subclass – Uses Jinja templates to weave documentation (22) =

Debug Templates -- these display debugging information (24)RST Templates -- the default weave output (25)HTML Templates -- emit HTML weave output (26)LaTeX Templates -- emit LaTeX weave output (27)Common base template -- this is used for ALL weaving (23)

class Weaver(Emitter):
    template_map = {
        "debug_defaults": debug_weaver_template, "debug_macros": "",
        "rst_defaults": rst_weaver_template, "rst_macros": rst_overrides_template,
        "html_defaults": html_weaver_template, "html_macros": html_overrides_template,
        "tex_defaults": latex_weaver_template, "tex_macros": tex_overrides_template,

        "rst-sphinx_defaults": rst_weaver_template, "rst-sphinx_macros": rst_overrides_template,
        "rst-nosphinx_defaults": rst_weaver_template, "rst-nosphinx_macros": rst_nosphinx_template,
        "rst-docutils_defaults": rst_weaver_template, "rst-docutils_macros": rst_nosphinx_template,
    }

    quote_rules = {
        "rst": rst_quote_rules,
        "html": html_quote_rules,
        "tex": latex_quote_rules,
        "debug": debug_quote_rules,
    }

    def __init__(self, output: Path = Path.cwd()) -> None:
        super().__init__(output)
        # Summary
        self.linesWritten = 0

    def set_markup(self, markup: str = "rst") -> "Weaver":
        self.markup = markup
        return self

    def emit(self, web: Web) -> None:
        self.target_path = (self.output / web.web_path.name).with_suffix(f".{self.markup}")
        self.logger.info("Weaving %s using %s markup", self.target_path, self.markup)
        with self.target_path.open('w') as target_file:
            for text in self.generate_text(web):
                self.linesWritten += text.count("\n")
                target_file.write(text)

    def generate_text(self, web: Web) -> Iterator[str]:
        self.env = Environment(
            loader=DictLoader(
                self.template_map |
                {'base_weaver': base_template,}
            ),
            autoescape=select_autoescape()
        )
        self.env.filters |= {
            "quote_rules": self.quote_rules[self.markup]
        }
        defaults = self.env.get_template(f"{self.markup}_defaults")
        macros = self.env.get_template(f"{self.markup}_macros")
        template = self.env.get_template("base_weaver")
        return template.generate(web=web, macros=macros, defaults=defaults)

Weaver Subclass – Uses Jinja templates to weave documentation (22). Used by → Base Class Definitions (19).

There are several strategy plug-ins. Each is unique for a particular flavort of markup. These include the quoting function used to escape markup characters, and the templates used.

The objective is to have a generic “weaver” template which includes three levels of template definition:

  1. Defaults.

  2. Configured overrides, perhaps from pyweb.toml.

  3. Document overrides from the .w file in @t name @{...@} commands.

This means there is a two-step binding between document and macros.

  1. The base weaver document should import three generic template definitions:

    {%- from 'markup' import * %}

    {%- from 'configured' import * %}

    {%- from 'document' import * %}

  2. These names map (somehow) to specific templates based on markup language.

    markup -> rst/markup, etc.

This allows us to provide all templates and make a final binding at weave time. We can use a prefix loader with a given prefix. Some kind of “import rst/markup as markup” would be ideal.

Jinja, however, doesn’t seem to support this the same way Python does. There’s no import as construct allowing very late binding.

The alternative is to create the environment very late in the process, once we have all the information available. We can then pick the templates to put into a DictLoader to support the standard weaving structure.

The quoting rules apply to the various template languages. The idea is that a few characters must be escaped for proper presentation in the code sample sections.

Common Base Template

The common base template expands each chunk and each command in order. This involves some special case processing for OutputChunk and NamedChunk which have a “wrapper” woven around the chunk’s sequence of commands.

This relies on a number of individual macros:

  • text(command). Emits a block of text – this should do nothing with the text. The author’s original markup passes through untouched.

  • begin_code(chunk). Starts a block of code, either @d or @o.

  • code(command). Emits a block fo code. This may require escapes for special characters that would break the markup being used.

  • ref(command). Emits a reference to a named block of code.

  • end_code(chunk). Ends a block of code.

  • file_xref(command). Emit the full @f output, usually some kind of definition list.

  • macro_xref(command). Emit the full @m output, usually some kind of definition list.

  • userid_xref(command). Emit the full @u output, usually some kind of definition list.

The ref() macro can also be used in the XREF output macros. It can also be used in the end_code() macro. After a block of code, some tools (like Interscript) will show where the block was referenced. The point of using the ref() macro in multiple places is to make all of them look identical.

There are a variety of optional formatting considerations. First is cross-references, second is a variety of begin_code() options.

There are four styles for the “referencedBy” information in a Chunk.

  • Nothing.

  • The immediate @<name@> Chunk.

  • The entire transitive sequence of parents for the @<name@> Chunk. There are two forms for this:

    • Top-down path. Named (1) / Sub-Named (2) / Sub-Sub-Named (3).

    • Bottom-up path. Sub-Sub-Named (3) Sub-Named (2) Named (1).

These require four distinct versions of the end_code() macro. This macro uses the transitive_referencedBy propery of a Chunk producing a sequence of ref() values.

We need

Common base template – this is used for ALL weaving (23) =

base_template = dedent("""\
    {%- from macros import text, begin_code, code, ref, end_code, file_xref, macro_xref, userid_xref -%}
    {%- if not text is defined %}{%- from defaults import text -%}{%- endif -%}
    {%- if not begin_code is defined %}{%- from defaults import begin_code -%}{%- endif -%}
    {%- if not code is defined %}{%- from defaults import code -%}{%- endif -%}
    {%- if not ref is defined %}{%- from defaults import ref -%}{%- endif -%}
    {%- if not end_code is defined %}{%- from defaults import end_code -%}{%- endif -%}
    {%- if not file_xref is defined %}{%- from defaults import file_xref -%}{%- endif -%}
    {%- if not macro_xref is defined %}{%- from defaults import macro_xref -%}{%- endif -%}
    {%- if not userid_xref is defined %}{%- from defaults import userid_xref -%}{%- endif -%}
    {% for chunk in web.chunks -%}
        {%- if chunk.type_is('OutputChunk') or chunk.type_is('NamedChunk') -%}
            {{begin_code(chunk)}}
            {%- for command in chunk.commands -%}
                {%- if command.typeid.CodeCommand -%}{{code(command)}}
                {%- elif command.typeid.ReferenceCommand -%}{{ref(command)}}
                {%- endif -%}
            {%- endfor -%}
            {{end_code(chunk)}}
        {%- elif chunk.type_is('Chunk') -%}
            {%- for command in chunk.commands -%}
                {%- if command.typeid.TextCommand %}{{text(command)}}
                {%- elif command.typeid.ReferenceCommand %}{{ref(command)}}
                {%- elif command.typeid.FileXrefCommand %}{{file_xref(command)}}
                {%- elif command.typeid.MacroXrefCommand %}{{macro_xref(command)}}
                {%- elif command.typeid.UserIdXrefCommand %}{{userid_xref(command)}}
                {%- endif -%}
            {%- endfor -%}
        {%- endif -%}
    {%- endfor %}
""")

Common base template – this is used for ALL weaving (23). Used by → Weaver Subclass – Uses Jinja templates to weave documentation (22).

TODO: Need to more gracefully handle the case where an output chunk has multiple definitions.

@o x.y
@{
... part 1 ...
@}

@o x.y
@{
... part 2 ...
@}

The above should have the same output as the follow (more complex) alternative:

@o x.y
@{
@<part 1@>
@<part 2@>
@}

@d part 1
@{
... part 1 ...
@}

@d part 2
@{
... part 2 ...
@}

Currently, we casually treat the first instance as the “definition”, and don’t provide references to the additional parts of the definition.

Debug Template

Debug Templates – these display debugging information (24) =

def debug_quote_rules(text: str) -> str:
    return repr(text)

debug_weaver_template = dedent("""\
    {%- macro text(command) -%}
    text: {{command}}
    {%- endmacro -%}

    {%- macro begin_code(chunk) %}
    begin_code: {{chunk}}
    {%- endmacro -%}

    {%- macro code(command) %}
    code: {{command}}
    {%- endmacro -%}

    {%- macro ref(id) %}
    ref: {{id}}
    {%- endmacro -%}

    {%- macro end_code(chunk) %}
    end_code: {{chunk}}
    {% endmacro -%}

    {%- macro file_xref(command) -%}
    file_xref {{command.files}}
    {%- endmacro -%}

    {%- macro macro_xref(command) -%}
    macro_xref {{command.macros}}
    {%- endmacro -%}

    {%- macro userid_xref(command) -%}
    userid_xref {{command.userids}}
    {%- endmacro -%}
    """)

Debug Templates – these display debugging information (24). Used by → Weaver Subclass – Uses Jinja templates to weave documentation (22).

RST Template

The RST Templates produce ReStructuredText for the various web commands. Note that code lines must be indented when using this markup.

RST Templates – the default weave output (25) =

def rst_quote_rules(text: str) -> str:
    quoted_chars = [
        ('\\', r'\\'), # Must be first.
        ('`', r'\`'),
        ('_', r'\_'),
        ('*', r'\*'),
        ('|', r'\|'),
    ]
    clean = text
    for from_, to_ in quoted_chars:
        clean = clean.replace(from_, to_)
    return clean

rst_weaver_template = dedent("""
    {%- macro text(command) -%}
    {{command.text}}
    {%- endmacro -%}

    {%- macro begin_code(chunk) %}
    ..  _`{{chunk.full_name or chunk.name}} ({{chunk.seq}})`:
    ..  rubric:: {{chunk.full_name or chunk.name}} ({{chunk.seq}}) {% if chunk.initial %}={% else %}+={% endif %}
    ..  parsed-literal::
        :class: code

    {% endmacro -%}

    {# For RST, each line must be indented. #}
    {%- macro code(command) %}{% for line in command.text.splitlines() %}    {{line | quote_rules}}
    {% endfor -%}{% endmacro -%}

    {%- macro ref(id) %}    \N{RIGHTWARDS ARROW} `{{id.full_name or id.name}} ({{id.seq}})`_{% endmacro -%}

    {# When using Sphinx, this *could* be rst-class::, pure docutils uses container::#}
    {%- macro end_code(chunk) %}
    ..

    ..  container:: small

        \N{END OF PROOF} *{{chunk.full_name or chunk.name}} ({{chunk.seq}})*.
        {% if chunk.referencedBy %}Used by {{ref(chunk.referencedBy)}}.{% endif %}

    {% endmacro -%}

    {%- macro file_xref(command) -%}
    {% for file in command.files -%}
    :{{file.name}}:
        \N{RIGHTWARDS ARROW} `{{file.name}} ({{file.seq}})`_
    {%- endfor %}
    {%- endmacro -%}

    {%- macro macro_xref(command) -%}
    {% for macro in command.macros -%}
    :{{macro.full_name}}:
        {% for d in macro.def_list -%}\N{RIGHTWARDS ARROW} `{{d.full_name or d.name}} ({{d.seq}})`_{% if loop.last %}{% else %}, {% endif %}{%- endfor %}

    {% endfor %}
    {%- endmacro -%}

    {%- macro userid_xref(command) -%}
    {% for userid in command.userids -%}
    :{{userid.userid}}:
        {% for r in userid.ref_list -%}\N{RIGHTWARDS ARROW} `{{r.full_name or r.name}} ({{r.seq}})`_{% if loop.last %}{% else %}, {% endif %}{%- endfor %}

    {% endfor %}
    {%- endmacro -%}
    """)

rst_overrides_template = dedent("""\
    """)

rst_nosphinx_template = dedent("""\
    {%- macro end_code(chunk) %}
    ..

    ..  class:: small

        \N{END OF PROOF} *{{chunk.full_name or chunk.name}} ({{chunk.seq}})*

    {% endmacro -%}
    """)

RST Templates – the default weave output (25). Used by → Weaver Subclass – Uses Jinja templates to weave documentation (22).

HTML Template

The HTML templates use a relatively simple markup, avoiding any CSS names. A slightly more flexible approach might be to name specific CSS styles, and provide generic definitions for those styles. This would make it easier to tailor HTML output via CSS changes, avoiding any HTML modifications.

HTML Templates – emit HTML weave output (26) =

def html_quote_rules(text: str) -> str:
    quoted_chars = [
        ("&", "&amp;"),  # Must be first
        ("<", "&lt;"),
        (">", "&gt;"),
        ('"', "&quot;"),  # Only applies inside tags...
    ]
    clean = text
    for from_, to_ in quoted_chars:
        clean = clean.replace(from_, to_)
    return clean

html_weaver_template = dedent("""\
    {%- macro text(command) -%}
    {{command.text}}
    {%- endmacro -%}

    {%- macro begin_code(chunk) %}
    <a name="pyweb_{{chunk.seq}}"></a>
    <!--line number {{chunk.location}}-->
    <p><em>{{chunk.full_name or chunk.name}} ({{chunk.seq}})</em> {% if chunk.initial %}={% else %}+={% endif %}</p>
    <pre><code>
    {%- endmacro -%}

    {%- macro code(command) -%}{{command.text | quote_rules}}{%- endmacro -%}

    {%- macro ref(id) %}&rarr;<a href="#pyweb_{{id.seq}}"><em>{{id.full_name or id.name}} ({{id.seq}})</em></a>{% endmacro -%}

    {%- macro end_code(chunk) %}
    </code></pre>
    <p>&#8718; <em>{{chunk.full_name or chunk.name}} ({{chunk.seq}})</em>.
    {% if chunk.referencedBy %}Used by {{ref(chunk.referencedBy)}}.{% endif %}
    </p>
    {% endmacro -%}

    {%- macro file_xref(command) %}
    <dl>
    {% for file in command.files -%}
      <dt>{{file.name}}</dt><dd>{{ref(file)}}</dd>
    {%- endfor %}
    </dl>
    {% endmacro -%}

    {%- macro macro_xref(command) %}
    <dl>
    {% for macro in command.macros -%}
      <dt>{{macro.full_name}}<dt>
      <dd>{% for d in macro.def_list -%}{{ref(d)}}{% if loop.last %}{% else %}, {% endif %}{%- endfor %}</dd>
    {% endfor %}
    </dl>
    {% endmacro -%}

    {%- macro userid_xref(command) %}
    <dl>
    {% for userid in command.userids -%}
      <dt>{{userid.userid}}</dt>
      <dd>{% for r in userid.ref_list -%}{{ref(r)}}{% if loop.last %}{% else %}, {% endif %}{%- endfor %}</dd>
    {% endfor %}
    </dl>
    {% endmacro -%}
    """)

html_overrides_template = dedent("""\
    """)

HTML Templates – emit HTML weave output (26). Used by → Weaver Subclass – Uses Jinja templates to weave documentation (22).

LaTeX Template

The LaTeX templates use a markup focused in the verbatim environment. Common alternatives include listings and minted.

LaTeX Templates – emit LaTeX weave output (27) =

def latex_quote_rules(text: str) -> str:
    quoted_strings = [
        ("\\end{Verbatim}", "\\end\\,{Verbatim}"),  # Allow \end{Verbatim} in a Verbatim context
        ("\\{", "\\\\,{"), # Prevent unexpected commands in Verbatim
        ("$", "\\$"), # Prevent unexpected math in Verbatim
    ]
    clean = text
    for from_, to_ in quoted_strings:
        clean = clean.replace(from_, to_)
    return clean

latex_weaver_template = dedent("""\
    {%- macro text(command) -%}
    {{command.text}}
    {%- endmacro -%}

    {%- macro begin_code(chunk) %}
    \\label{pyweb-{{chunk.seq}}}
    \\begin{flushleft}
    \\textit{Code example {{chunk.full_name or chunk.name}} ({{chunk.seq}})}
    \\begin{Verbatim}[commandchars=\\\\\\{\\},codes={\\catcode`$$=3\\catcode`^=7},frame=single]
    {%- endmacro -%}

    {%- macro code(command) -%}{{command.text | quote_rules}}{%- endmacro -%}

    {%- macro ref(id) %}$$\\rightarrow$$ Code Example {{id.full_name or id.name}} ({{id.seq}}){% endmacro -%}

    {%- macro end_code(chunk) %}
    \\end{Verbatim}
    \\end{flushleft}
    {% endmacro -%}

    {%- macro file_xref(command) %}
    \\begin{itemize}
    {% for file in command.files -%}
      \\item {{file.name}}: {{ref(file)}}
    {%- endfor %}
    \\end{itemize}
    {% endmacro -%}

    {%- macro macro_xref(command) %}
    \\begin{itemize}
    {% for macro in command.macros -%}
      \\item {{macro.full_name}} \\\\
            {% for d in macro.def_list -%}{{ref(d)}}{% if loop.last %}{% else %}, {% endif %}{%- endfor %}
    {% endfor %}
    \\end{itemize}
    {% endmacro -%}

    {%- macro userid_xref(command) %}
    \\begin{itemize}
    {% for userid in command.userids -%}
      \\item {{userid.userid}} \\\\
            {% for r in userid.ref_list -%}{{ref(r)}}{% if loop.last %}{% else %}, {% endif %}{%- endfor %}
    {% endfor %}
    \\end{itemize}
    {% endmacro -%}
    """)

tex_overrides_template = dedent("""\
    """)

LaTeX Templates – emit LaTeX weave output (27). Used by → Weaver Subclass – Uses Jinja templates to weave documentation (22).

The Tangler Subclasses

Tangling is a variation on emitting that includes all the code in the order defined by the @o file commands. This is not necessarily the order they’re presented in the document.

The whole point of Weaving and Tangling is to write a document in an order that’s sensible for people to understand. The tangled output is for compilers and run-time environments.

Each file is individually tangled, unrelated to the order of the source WEB document. The emit() process, therefore, iterates through all of the files defined in the WEB.

There’s a complex interplay between Tangler and CodeCommand to maintain the indentations.

participant Tangler

participant ReferenceCommand

participant Command

Tangler --> ReferenceCommand : tangle()
ReferenceCommand --> Tangler : get len(fragment)
ReferenceCommand --> Tangler : addIndent(i)
group [for all] commands in the referenced chunk
    ReferenceCommand --> Command : tangle()
end
ReferenceCommand --> Tangler : clrIndent()

This approach can preserves the indentation in front of a @< reference @> command.

Tangler Subclass – emits the output files (28) =

class Tangler(Emitter):
    code_indent = 0  #: Initial indent

    def __init__(self, output: Path = Path.cwd()) -> None:
        super().__init__(output)
        self.context: list[int] = []  #: Indentations
        self.fragment = ""  # Nothing written yet.
        # Create context and initial lastIndent values
        self.resetIndent(self.code_indent)
        # Summaries
        self.reference_names: set[str] = set()
        self.linesWritten = 0
        self.totalFiles = 0
        self.totalLines = 0

    def emit(self, web: Web) -> None:
        for file_chunk in web.files:
            self.logger.info("Tangling %s", file_chunk.name)
            self.emit_file(web, file_chunk)

    def emit_file(self, web: Web, file_chunk: Chunk) -> None:
        target_path = self.output / (file_chunk.name or "Untitled.out")
        self.logger.debug("Writing %s", target_path)
        self.logger.debug("Chunk %r", file_chunk)
        with target_path.open("w") as target:
            # An initial command to provide indentations.
            for command in file_chunk.commands:
                command.tangle(self, target)


→ Emitter write a block of code with proper indents (29)Emitter indent control: set, clear and reset (30)

Tangler Subclass – emits the output files (28). Used by → Base Class Definitions (19).

The codeBlock() method is used by each block of code tangled into a document. There are two sources of indentation:

  • A Chunk can provide an indent setting as an option. This is provided by the indent attribute of the tangle context. If specified, this is the indentation.

  • A @< name @> ReferenceCommand may be indented. This will be in a Chunk as the following three commands:

    1. A CodeCommand with only spaces and no trailing \n. The indent is buffered – not written – and the fragment attribute is set.

    2. The ReferenceCommand. This interpolates text from a NamedChunk using the prevailing indent. The tangle() method uses addIndent() and clrIndent() to mark this. The processing depends on this tangler’s fragment attribute to provide the pending indentation; the addIndent() must consume the fragment to prevent confusion with subsequent indentations.

    3. A CodeCommand with a trailing \n. (Often it’s only the newline.) If the fragment attribute is set, there’s a pending indentation that hasn’t yet been written. This can happen with there’s a @@ command at the left end of a line; often a Python decorator. The fragment is written and the fragment attribute cleared. No addIdent() will have been done to consume the fragment.

While the WEB language permits multiple @<name@> @<name@> on a single line, this is odd and potentially confusing. It isn’t clear how the second reference should be indented.

The ReferenceCommand tangle() implementation handles much of this. The following two rules apply:

  • A line of text that does not end with a newline, sets a new prevailing indent for the following command(s).

  • A line of text ending with a newline resets the prevailing indent.

This a stack, maintained by the Tangler.

Emitter write a block of code with proper indents (29) =

def codeBlock(self, target: TextIO, text: str) -> None:
    """Indented write of text in a ``CodeCommand``.
    Counts lines and saves position to indent to when expanding ``@<...@>`` references.

    The ``fragment`` is the prevailing indent used in reference expansion.
    """
    for line in text.splitlines(keepends=True):
        self.logger.debug("codeBlock(%r)", line)
        indent = self.context[-1]
        if len(line) == 0:
            # Degenerate case of empty CodeText command. Should not occur.
            pass
        elif not line.endswith('\n'):
            # Possible start of indentation prior to a ``@<name@>``
            target.write(indent*' ')
            wrote = target.write(line)
            self.fragment = ' ' * wrote
            # May be used by a ``ReferenceCommand``, if needed.
        elif line.endswith('\n'):
            target.write(indent*' ')
            target.write(line)
            self.linesWritten += 1
        else:
            raise RuntimeError("Non-exhaustive if statement.")

Emitter write a block of code with proper indents (29). Used by → Tangler Subclass – emits the output files (28).

The addIndent() increments the indent. Used by @<name@> to set a prevailing indent.

The setIndent() pushes a fixed indent instead adding an increment. Used by a Chunk with an -indent option.

The clrIndent() method discards the most recent indent from the context stack. This is used when finished tangling a source chunk. This restores the indent to the prevailing indent.

The resetIndent() method removes all indent context information and resets the indent to a default.

Emitter indent control: set, clear and reset (30) =

def addIndent(self, increment: int) -> None:
    self.lastIndent = self.context[-1]+increment
    self.context.append(self.lastIndent)
    self.log_indent.debug("addIndent %d: %r", increment, self.context)
    self.fragment = ""

def setIndent(self, indent: int) -> None:
    self.context.append(indent)
    self.lastIndent = self.context[-1]
    self.log_indent.debug("setIndent %d: %r", indent, self.context)
    self.fragment = ""

def clrIndent(self) -> None:
    if len(self.context) > 1:
        self.context.pop()
    self.lastIndent = self.context[-1]
    self.log_indent.debug("clrIndent %r", self.context)
    self.fragment = ""

def resetIndent(self, indent: int = 0) -> None:
    """Resets the indentation context."""
    self.lastIndent = indent
    self.context = [self.lastIndent]
    self.log_indent.debug("resetIndent %d: %r", indent, self.context)

Emitter indent control: set, clear and reset (30). Used by → Tangler Subclass – emits the output files (28).

An extension to the Tangler class that only updates a file if the content has changed. This tangles to a temporary file. If the content is identical, the temporary file is quietly disposed of. Otherwise, the temporary file is linked to the original name.

Files are compared with the filecmp module.

Imports (31) +=

import filecmp
import tempfile
import os

Imports (31). Used by → pyweb.py (79).

TanglerMake Subclass – extends Tangler to avoid touching files that didn’t change (32) =

class TanglerMake(Tangler):
    def emit_file(self, web: Web, file_chunk: Chunk) -> None:
        target_path = self.output / (file_chunk.name or "Untitled.out")
        self.logger.debug("Writing %s via a temp file", target_path)
        self.logger.debug("Chunk %r", file_chunk)

        fd, tempname = tempfile.mkstemp(dir=os.curdir)
        with os.fdopen(fd, "w") as target:
            for command in file_chunk.commands:
                command.tangle(self, target)

        try:
            same = filecmp.cmp(tempname, target_path)
        except OSError as e:
            same = False  # Doesn't exist. (Could check for errno.ENOENT)

        if same:
            self.logger.info("Unchanged '%s'", target_path)
            os.remove(tempname)
        else:
            # Windows requires the original file name be removed first.
            try:
                target_path.unlink()
            except OSError as e:
                pass  # Doesn't exist. (Could check for errno.ENOENT)
            target_path.parent.mkdir(parents=True, exist_ok=True)
            target_path.hardlink_to(tempname)
            os.remove(tempname)
            self.logger.info("Wrote %d lines to %s", self.linesWritten, target_path)

TanglerMake Subclass – extends Tangler to avoid touching files that didn’t change (32). Used by → Base Class Definitions (19).

Input Parsing

There are three tiers to the input parsing:

  • A base tokenizer.

= Additionally, a separate parser is used for options in @d and @o commands.

  • The overall WebReader class.

class WebReader {
    load(path) : Web
    parse_source()
}

class Tokenizer <<Iterator>> {
    __next__(self) : str
}

WebReader --> Tokenizer
WebReader --> WebReader : "parent"
WebReader --> argparse.ArgumentParser

We’ll start with the WebReader class definition

Base Class Definitions (33) +=

Tokenizer class - breaks input into tokens (51)WebReader class - parses the input file, building the Web structure (34)

Base Class Definitions (33). Used by → pyweb.py (79).

The WebReader Class

There are two forms of the constructor for a WebReader. The initial WebReader instance is created with code like the following:

p = WebReader()
p.command = options.commandCharacter

This will define the command character; usually provided as a command-line parameter to the application.

When processing an include file (with the @i command), a child WebReader instance is created with code like the following:

c = WebReader(parent=parentWebReader)

This will inherit the configuration from the parent WebReader. This will also include a reference from child to parent so that embedded Python expressions can view the entire input context.

The WebReader class parses the input file into command blocks. These are assembled into Chunks, and the Chunks are assembled into the document Web. Once this input pass is complete, the resulting Web can be tangled or woven.

The commands have three general types:

  • “Structural” commands define the structure of the Chunks. The structural commands are @d and @o, as well as the @{, @}, @[, @] brackets, and the @i command to include another file.

  • “Inline” commands are inline within a Chunk: they define internal Commands. Blocks of text are minor commands, as well as the @<name@> references. The @@ escape is also handled here so that all further processing is independent of any parsing.

  • “Content” commands generate woven content. These include the various cross-reference commands (@f, @m and @u).

There are two class-level argparse.ArgumentParser instances used by this class.

output_option_parser:

An argparse.ArgumentParser used to parse the @o command’s options.

definition_option_parser:

An argparse.ArgumentParser used to parse the @d command’s options.

The class has the following attributes:

parent:

is the outer WebReader when processing a @i command.

command:

is the command character; a WebReader will use the parent command character if the parent is not None. Default is @.

permitList:

is the list of commands that are permitted to fail. This is generally an empty list or ('@i',).

_source:

The open file-like object being used by load().

filePath:

The path being processed; this provides a visible file name.

tokenizer:

An instance of Tokenizer used to parse the input. This is built when load() is called.

totalLines:

totalFiles:

errors:

Summary counts.

WebReader class - parses the input file, building the Web structure (34) =

class WebReader:
    """Parse an input file, creating Chunks and Commands."""

    # Configuration
    #: The command prefix, default ``@``.
    command: str
    #: Permitted errors, usually @i commands
    permitList: list[str]
    #: Working directory to resolve @i commands
    base_path: Path
    #: The tokenizer used to find commands
    tokenizer: Tokenizer

    # State of the reader
    #: Parent context for @i commands
    parent: Optional["WebReader"]
    #: Input Path
    filePath: Path
    #: Input file-like object, default is self.filePath.open()
    _source: TextIO
    #: The sequence of Chunk instances being built
    content: list[Chunk]

    def __init__(self, parent: Optional["WebReader"] = None) -> None:
        self.logger = logging.getLogger(self.__class__.__qualname__)

        self.output_option_parser = argparse.ArgumentParser(add_help=False, exit_on_error=False)
        self.output_option_parser.add_argument("-start", dest='start', type=str, default=None)
        self.output_option_parser.add_argument("-end", dest='end', type=str, default="")
        self.output_option_parser.add_argument("argument", type=str, nargs="*")

        # TODO: Allow a numeric argument value in ``-indent``
        self.definition_option_parser = argparse.ArgumentParser(add_help=False, exit_on_error=False)
        self.definition_option_parser.add_argument("-indent", dest='indent', action='store_true', default=False)
        self.definition_option_parser.add_argument("-noindent", dest='noindent', action='store_true', default=False)
        self.definition_option_parser.add_argument("argument", type=str, nargs="*")

        # Configuration comes from the parent or defaults if there is no parent.
        self.parent = parent
        if self.parent:
            self.command = self.parent.command
            self.permitList = self.parent.permitList
        else: # Defaults until overridden
            self.command = '@'
            self.permitList = []

        # Summary
        self.totalLines = 0
        self.totalFiles = 0
        self.errors = 0


→ WebReader command literals (49)

    def __str__(self) -> str:
        return self.__class__.__name__


→ WebReader location in the input stream (46)WebReader load the web (48)WebReader handle a command string (35)

WebReader class - parses the input file, building the Web structure (34). Used by → Base Class Definitions (33).

The reader maintains a context into which constructs are added. The Web contains Chunk instances in self.web.chunks. The current chunk is self.web.chunks[-1]. Each Chunk, similarly, has a command context in chunk.commands[-1].

This works because the language is “flat”: there are no nested @d or @o chunks.

Command recognition is done via a Chain of Command-like design. There are two conditions: the command string is recognized or it is not recognized. If the command is recognized, handleCommand() will do one of the following:

  • For “structural” commands, it will attach the current Chunk (self.aChunk) to the current Web (self.aWeb), and start a new Chunk. This becomes the context for processing commands. By default an anonymous Chunk used to accumulate text is available for all of the content outside named chunks.

  • For “inline” and “content” commands, create a Command, attach it to the current Chunk (self.aChunk).

If the command is not recognized, handleCommand() returns false, and this is a syntax error.

A subclass can override handleCommand() to

  1. Evaluate this superclass version;

  2. If the command is unknown to the superclass, then the subclass can process it;

  3. If the command is unknown to both classes, then return False. Either a subclass will handle it, or the default activity taken by load() is to treat the command as a syntax error.

The handleCommand() implementation is a massive match statement. It might be a good idea to decompose this into a number of separate methods. This would make the match statement shorter and easier to understand.

WebReader handle a command string (35) =

def handleCommand(self, token: str) -> bool:
    self.logger.debug("Reading %r", token)
    new_chunk: Optional[Chunk] = None
    match token[:2]:
        case self.cmdo:

→ start an OutputChunk, adding it to the web (36)
        case self.cmdd:

→ start a NamedChunk or NamedDocumentChunk, adding it to the web (37)
        case self.cmdi:

→ include another file (38)
        case self.cmdrcurl | self.cmdrbrak:

→ finish a chunk, start a new Chunk adding it to the web (39)
        case self.cmdpipe:

→ assign user identifiers to the current chunk (40)
        case self.cmdf:
            self.content[-1].commands.append(FileXrefCommand(self.location()))
        case self.cmdm:
            self.content[-1].commands.append(MacroXrefCommand(self.location()))
        case self.cmdu:
            self.content[-1].commands.append(UserIdXrefCommand(self.location()))
        case self.cmdlangl:

→ add a reference command to the current chunk (41)
        case self.cmdlexpr:

→ add an expression command to the current chunk (43)
        case self.cmdcmd:

→ double at-sign replacement, append this character to previous TextCommand (44)
        case self.cmdlcurl | self.cmdlbrak:
            # These should have been consumed as part of @o and @d parsing
            self.logger.error("Extra %r (possibly missing chunk name) near %r", token, self.location())
            self.errors += 1
        case _:
            return False  # did not recogize the command
    return True  # did recognize the command

WebReader handle a command string (35). Used by → WebReader class - parses the input file, building the Web structure (34).

An output chunk has the form @o name @{ content @}. We use the first two tokens to name the OutputChunk. We expect the @{ separator. We then attach all subsequent commands to this chunk while waiting for the final @} token to end the chunk.

We’ll use an ArgumentParser to locate the optional parameters. This will then let us build an appropriate instance of OutputChunk.

With some small additional changes, we could use OutputChunk(**options).

start an OutputChunk, adding it to the web (36) =

arg_str = next(self.tokenizer)
self.expect({self.cmdlcurl})
options = self.output_option_parser.parse_args(shlex.split(arg_str))
new_chunk = OutputChunk(
    name=' '.join(options.argument),
    comment_start=options.start if '-start' in options else "# ",
    comment_end=options.end if '-end' in options else "",
)
self.content.append(new_chunk)
# capture an OutputChunk up to @}

start an OutputChunk, adding it to the web (36). Used by → WebReader handle a command string (35).

A named chunk has the form @d name @{ content @} for code and @d name @[ content @] for document source. We use the first two tokens to name the NamedChunk or NamedDocumentChunk. We expect either the @{ or @[ separator, and use the actual token found to choose which subclass of Chunk to create. We then attach all subsequent commands to this chunk while waiting for the final @} or @] token to end the chunk.

We’ll use an ArgumentParser to locate the optional parameter of -noindent.

TODO: Extend this to support -indent number

Then we can use the options value to create an appropriate subclass of NamedChunk.

If `"-indent" is in options, this is the default. If both are in the options, we should provide a warning.

TODO: Add a warning for conflicting options.

start a NamedChunk or NamedDocumentChunk, adding it to the web (37) =

arg_str = next(self.tokenizer)
brack = self.expect({self.cmdlcurl, self.cmdlbrak})
options = self.definition_option_parser.parse_args(shlex.split(arg_str))
name = ' '.join(options.argument)

if brack == self.cmdlbrak:
    new_chunk = NamedDocumentChunk(name)
elif brack == self.cmdlcurl:
    if 'noindent' in options and options.noindent:
        new_chunk = NamedChunk_Noindent(name)
    else:
        new_chunk = NamedChunk(name)
elif brack == None:
    new_chunk = None
    pass  # Error already noted by ``expect()``
else:
    raise RuntimeError("Design Error")

if new_chunk:
    self.content.append(new_chunk)
# capture a NamedChunk up to @} or @]

start a NamedChunk or NamedDocumentChunk, adding it to the web (37). Used by → WebReader handle a command string (35).

An import command has the unusual form of @i name, with no trailing separator. When we encounter the @i token, the next token will start with the file name, but may continue with an anonymous chunk. To avoid confusion, we require that all @i commands occur at the end of a line, The break on the '\n' which ends the file name. This permits file names with embedded spaces. It also permits arguments and options, if really necessary.

Once we have split the file name away from the rest of the following anonymous chunk, we push the following token (a \n) back into the token stream, so that it will be the first token examined at the top of the load() loop.

We create a child WebReader instance to process the included file. The entire file is loaded into the current Web instance. A new, empty Chunk is created at the end of the file so that processing can resume with an anonymous Chunk.

The reader has a permitList attribute. This lists any commands where failure is permitted. Currently, only the @i command can be set to permit failure; this allows a .w to include a file that does not yet exist.

The primary use case for this permitted error feature is when weaving test output. A first use of the py-web-lp can be used to tangle the program source files, ignoring a missing test output file, named in an @i command. The application can then be run to create the missing test output file. After this, a second use of the py-web-lp can weave the test output file into a final, complete document.

include another file (38) =

incPath = Path(next(self.tokenizer).strip())
try:
    include = WebReader(parent=self)
    if not incPath.is_absolute():
        incPath = self.base_path / incPath
    self.logger.info("Including '%s'", incPath)
    self.content.extend(include.load(incPath))
    self.totalLines += include.tokenizer.lineNumber
    self.totalFiles += include.totalFiles
    if include.errors:
        self.errors += include.errors
        self.logger.error("Errors in included file '%s', output is incomplete.", incPath)
except Error as e:
    self.logger.error("Problems with included file '%s', output is incomplete.", incPath)
    self.errors += 1
except IOError as e:
    self.logger.error("Problems finding included file '%s', output is incomplete.", incPath)
    # Discretionary -- sometimes we want to continue
    if self.cmdi in self.permitList: pass
    else: raise  # Seems heavy-handed, but, the file wasn't found!
# Start a new context for text or commands *after* the ``@i``.
self.content.append(Chunk())

include another file (38). Used by → WebReader handle a command string (35).

When a @} or @] are found, this finishes a named chunk. The next text is therefore part of an anonymous chunk.

Note that no check is made to assure that the previous Chunk was indeed a named chunk or output chunk started with @{ or @[. To do this, an attribute would be needed for each Chunk subclass that indicated if a trailing bracket was necessary. For the base Chunk class, this would be false, but for all other subclasses of Chunk, this would be true.

finish a chunk, start a new Chunk adding it to the web (39) =

# Start a new context for text or commands *after* this command.
self.content.append(Chunk())

finish a chunk, start a new Chunk adding it to the web (39). Used by → WebReader handle a command string (35).

User identifiers occur after a @| command inside a NamedChunk.

Note that no check is made to assure that the previous Chunk was indeed a named chunk or output chunk started with @{. To do this, an attribute would be needed for each Chunk subclass that indicated if user identifiers are permitted. For the base Chunk class, this would be false, but for the NamedChunk class and OutputChunk class, this would be true.

User identifiers are name references at the end of a NamedChunk These are accumulated and expanded by @u reference

assign user identifiers to the current chunk (40) =

try:
    names = next(self.tokenizer).strip().split()
    self.content[-1].def_names.extend(names)
except AttributeError:
    # Out of place @| user identifier command
    self.logger.error("Unexpected references near %r: %r", self.location(), token)
    self.errors += 1

assign user identifiers to the current chunk (40). Used by → WebReader handle a command string (35).

A reference command has the form @<name@>. We accept three tokens from the input, the middle token is the referenced name.

add a reference command to the current chunk (41) =

# get the name, introduce into the named Chunk dictionary
name = next(self.tokenizer).strip()
closing = self.expect({self.cmdrangl})
self.content[-1].commands.append(ReferenceCommand(name, self.location()))
self.logger.debug("Reading %r %r", name, closing)

add a reference command to the current chunk (41). Used by → WebReader handle a command string (35).

An expression command has the form @(Python Expression@). We accept three tokens from the input, the middle token is the expression.

There are two alternative semantics for an embedded expression.

  • Deferred Execution. This requires definition of a new subclass of Command, ExpressionCommand, and appends it into the current Chunk. At weave and tangle time, this expression is evaluated. The insert might look something like this: aChunk.append(ExpressionCommand(expression, self.location())).

  • Immediate Execution. This simply creates a context and evaluates the Python expression. The output from the expression becomes a TextCommand, and is append to the current Chunk.

We use the Immediate Execution semantics – the expression is immediately appended to the current chunk’s text.

We provide a few elements of the os module. We provide os.path library. The os.getcwd() could be changed to os.path.realpath('.'), but that seems too long-winded.

Imports (42) +=

import builtins
import sys
import platform

Imports (42). Used by → pyweb.py (79).

TODO: Appening the text should be a method of a Chunk – either append text, or append a command.

add an expression command to the current chunk (43) =

# get the Python expression, create the expression result
expression = next(self.tokenizer)
self.expect({self.cmdrexpr})
try:
    # Build Context
    # **TODO:** Parts of this are static and can be built as part of ``__init__()``.
    dangerous = {
        'breakpoint', 'compile', 'eval', 'exec', 'execfile', 'globals', 'help', 'input',
        'memoryview', 'open', 'print', 'super', '__import__'
    }
    safe = types.SimpleNamespace(**dict(
        (name, obj)
        for name,obj in builtins.__dict__.items()
        if name not in dangerous
    ))
    globals = dict(
        __builtins__=safe,
        os=types.SimpleNamespace(path=os.path, getcwd=os.getcwd, name=os.name),
        time=time,
        datetime=datetime,
        platform=platform,
        theWebReader=self,
        theFile=self.filePath,
        thisApplication=sys.argv[0],
        __version__=__version__,  # Legacy compatibility. Deprecated.
        version=__version__,
        theLocation=str(self.location()),  # The only thing that's dynamic
        )
    # Evaluate
    result = str(eval(expression, globals))
except Exception as exc:
    self.logger.error('Failure to process %r: exception is %r', expression, exc)
    self.errors += 1
    result = f"@({expression!r}: Error {exc!r}@)"
self.content[-1].add_text(result, self.location())

add an expression command to the current chunk (43). Used by → WebReader handle a command string (35).

A double command sequence ('@@', when the command is an '@') has the usual meaning of '@' in the input stream. We do this by appending text to the last command in the current Chunk. This will append the character on the end of the most recent TextCommand or CodeCommand`; if this fails, it will create a new, empty TextCommand or CodeCommand.

TODO: This should be a method of a Chunk – either append text, or append a command.

double at-sign replacement, append this character to previous TextCommand (44) =

self.logger.debug(f"double-command: {self.content[-1]=}")
self.content[-1].add_text(self.command, self.location())

double at-sign replacement, append this character to previous TextCommand (44). Used by → WebReader handle a command string (35).

The expect() method examines the next token to see if it is the expected item. '\n' are absorbed. If this is not found, a standard type of error message is raised. This is used by handleCommand().

WebReader handle a command string (45) +=

def expect(self, tokens: set[str]) -> str | None:
    """Compare next token with expectation, quietly skipping whitespace (i.e., ``\n``)."""
    try:
        t = next(self.tokenizer)
        while t == '\n':
            t = next(self.tokenizer)
    except StopIteration:
        self.logger.error("At %r: end of input, %r not found", self.location(), tokens)
        self.errors += 1
        return None
    if t in tokens:
        return t
    else:
        self.logger.error("At %r: expected %r, found %r", self.location(), tokens, t)
        self.errors += 1
        return None

WebReader handle a command string (45). Used by → WebReader class - parses the input file, building the Web structure (34).

The location() provides the file name and line number. This allows error messages as well as tangled or woven output to correctly reference the original input files.

WebReader location in the input stream (46) =

def location(self) -> tuple[str, int]:
    return (str(self.filePath), self.tokenizer.lineNumber+1)

WebReader location in the input stream (46). Used by → WebReader class - parses the input file, building the Web structure (34).

The load() method reads the entire input file as a sequence of tokens, split up by the Tokenizer. Each token that appears to be a command is passed to the handleCommand() method. If the handleCommand() method returns a True result, the command was recognized and placed in the Web. If handleCommand() returns a False result, the command was unknown, and we write a warning but treat it as text.

The load() method is used recursively to handle the @i command. The issue is that it’s always loading a single top-level web.

Imports (47) +=

from typing import TextIO, cast

Imports (47). Used by → pyweb.py (79).

WebReader load the web (48) =

def load(self, filepath: Path, source: TextIO | None = None) -> list[Chunk]:
    """Returns a flat list of chunks to be made into a Web.
    Also used to expand ``@i`` included files.
    """
    self.filePath = filepath
    self.base_path = self.filePath.parent

    if source:
        self._source = source
        self.parse_source()
    else:
        with self.filePath.open() as self._source:
            self.parse_source()
    return self.content

def parse_source(self) -> None:
    """Builds a sequence of Chunks."""
    self.tokenizer = Tokenizer(self._source, self.command)
    self.totalFiles += 1

    # Initial anonymous chunk.
    self.content = [Chunk()]

    for token in self.tokenizer:
        if len(token) >= 2 and token.startswith(self.command):
            if self.handleCommand(token):
                continue
            else:
                self.logger.error('Unknown @-command in input: %r near %r', token, self.location())
                self.content[-1].add_text(token, self.location())

        elif token:
            # Accumulate a non-empty block of text in the current chunk.
            self.content[-1].add_text(token, self.location())

        else:
            # Whitespace
            pass
    self.logger.debug("parse_source: [")
    for c in self.content:
        self.logger.debug("  %r", c)
    self.logger.debug("]")

WebReader load the web (48). Used by → WebReader class - parses the input file, building the Web structure (34).

The command character can be changed to permit some flexibility when working with languages that make extensive use of the @ symbol, i.e., PERL. The initialization of the WebReader is based on the selected command character.

WebReader command literals (49) =

# Structural ("major") commands
self.cmdo = self.command+'o'
self.cmdd = self.command+'d'
self.cmdlcurl = self.command+'{'
self.cmdrcurl = self.command+'}'
self.cmdlbrak = self.command+'['
self.cmdrbrak = self.command+']'
self.cmdi = self.command+'i'

# Inline ("minor") commands
self.cmdlangl = self.command+'<'
self.cmdrangl = self.command+'>'
self.cmdpipe = self.command+'|'
self.cmdlexpr = self.command+'('
self.cmdrexpr = self.command+')'
self.cmdcmd = self.command+self.command

# Content "minor" commands
self.cmdf = self.command+'f'
self.cmdm = self.command+'m'
self.cmdu = self.command+'u'

WebReader command literals (49). Used by → WebReader class - parses the input file, building the Web structure (34).

The Tokenizer Class

The WebReader requires a tokenizer. The tokenizer breaks the input text into a stream of tokens. There are two broad classes of tokens:

  • r'@.' command tokens, including the structural, inline, and content commands.

  • r'\n'. Inside text, these matter. Within structural command tokens, these don’t matter. Except after the filename after an @i command, where it ends the command.

  • The remaining text; neither newlines nor commands.

The tokenizer works by reading the entire file and splitting on r'@.' patterns. The re.split() function will separate the input and preserve the actual character sequence on which the input was split. This breaks the input into blocks of text separated by the r'@.' characters.

For example:

>>> pat.split( "@{hi mom@}")
['', '@{', 'hi mom', '@}', '']

This tokenizer splits the input using r'@.|\n'. The idea is that we locate commands, newlines and the interstitial text as three classes of tokens. We can then assemble each Command instance from a short sequence of tokens. The core TextCommand and CodeCommand will be a line of text ending with the \n.

The tokenizer counts newline characters for us, so that error messages can include a line number. Also, we can tangle extract comments into a file to reveal source line numbers.

Since the tokenizer is a proper iterator, we can use tokens = iter(Tokenizer(source)) and next(tokens) to step through the sequence of tokens until we raise a StopIteration exception.

Imports (50) +=

import re
from collections.abc import Iterator, Iterable

Imports (50). Used by → pyweb.py (79).

Tokenizer class - breaks input into tokens (51) =

class Tokenizer(Iterator[str]):
    def __init__(self, stream: TextIO, command_char: str='@') -> None:
        self.command = command_char
        self.parsePat = re.compile(f'({self.command}.|\\n)')
        self.token_iter = (t for t in self.parsePat.split(stream.read()) if len(t) != 0)
        self.lineNumber = 0

    def __next__(self) -> str:
        token = next(self.token_iter)
        self.lineNumber += token.count('\n')
        return token

    def __iter__(self) -> Iterator[str]:
        return self

Tokenizer class - breaks input into tokens (51). Used by → Base Class Definitions (33).

Other Application Components

There are a number of other components:

  • Error class Defines a uniform exception for this module.

  • Action Class Hierarchy defines the actions the application can perform. This includes loading the WEB file, weaving, tangling, and doing combinations of actions.

  • The Application Class is a high-level definition of the py-web-lp application as a whole.

  • Logging setup defines a handy context manager to configure and shut down logging.

  • The Main Function is a top-level function to create an instance of Application, and execute it with either supplied arguments or (by default) the actual command-line arguments. This makes it easy to import and reuse main() in other applications.

Error class

An Error is raised whenever processing cannot continue. Since it is a subclass of Exception, it takes an arbitrary number of arguments. The first should be the basic message text. Subsequent arguments provide additional details. We will try to be sure that all of our internal exceptions reference a specific chunk, if possible. This means either including the chunk as an argument, or catching the exception and appending the current chunk to the exception’s arguments.

The Python raise statement takes an instance of Error and passes it to the enclosing try/except statement for processing.

The typical creation is as follows:

raise Error(f"No full name for {chunk.name!r}", chunk)

A typical exception-handling suite might look like this:

try:
    ...something that may raise an Error or Exception...
except Error as e:
    print(e.args) # this is a pyWeb internal Error
except Exception as w:
    print(w.args) # this is some other Python Exception

The Error class is a subclass of Exception used to differentiate application-specific exceptions from other Python exceptions. It does no additional processing, but merely creates a distinct class to facilitate writing except statements.

Error class defines the errors raised (52) =

class Error(Exception): pass

Error class defines the errors raised (52). Used by → pyweb.py (79).

Action Class Hierarchy

This application performs three major actions: loading the document web, weaving and tangling. Generally, the use case is to perform a load, weave and tangle. However, a less common use case is to first load and tangle output files, run a regression test and then load and weave a result that includes the test output file.

The -x option excludes one of the two output actions. The -xw excludes the weave pass, doing only the tangle action. The -xt excludes the tangle pass, doing the weave action.

This two pass action might be embedded in the following type of Python program.

import pyweb, os, runpy, sys, pathlib, contextlib
log = pathlib.Path("source.log")
Tangler().emit(web)
with log.open("w") as target:
    with contextlib.redirect_stdout(target):
        # run the app, capturing the output
        runpy.run_path('source.py')
        # Alternatives include using pytest or doctest
Weaver().emit(web, "something.rst")

The first step runs py-web-lp , excluding the final weaving pass. The second step runs the tangled program, source.py, and produces test results in some log file, source.log. The third step runs py-web-lp excluding the tangle pass. This produces a final document that includes the source.log test results.

To accomplish this, we provide a class hierarchy that defines the various actions of the py-web-lp application. This class hierarchy defines an extensible set of fundamental actions. This gives us the flexibility to create a simple sequence of actions and execute any combination of these. It eliminates the need for a forest of if-statements to determine precisely what will be done.

Each action has the potential to update the state of the overall application. A partner with this command hierarchy is the Application class that defines the application options, inputs and results.

Action class hierarchy used to describe actions of the application (53) =

Action superclass has common features of all actions (54)ActionSequence subclass that holds a sequence of other actions (57)WeaveAction subclass initiates the weave action (60)TangleAction subclass initiates the tangle action (63)LoadAction subclass loads the document web (66)

Action class hierarchy used to describe actions of the application (53). Used by → pyweb.py (79).

The Action class embodies the basic operations of py-web-lp . The intent of this hierarchy is to both provide an easily expanded method of adding new actions, but an easily specified list of actions for a particular run of py-web-lp .

The overall process of the application is defined by an instance of Action. This instance may be the WeaveAction instance, the TangleAction instance or a ActionSequence instance.

The instance is constructed during parsing of the input parameters. Then the Action class perform() method is called to actually perform the action. There are three standard Action instances available: an instance that is a macro and does both tangling and weaving, an instance that excludes tangling, and an instance that excludes weaving. These correspond to the command-line options.

anOp = SomeAction("name")
anOp(argparse.Namespace)

The Action is the superclass for all actions. An Action has a number of common attributes.

name:

A name for this action.

options:

The argparse.Namespace object. A LoadAction will update this with the Web object that was loaded.

!start:

The time at which the action started.

Action superclass has common features of all actions (54) =

class Action:
    """An action performed by pyWeb."""
    start: float
    options: argparse.Namespace

    def __init__(self, name: str) -> None:
        self.name = name
        self.logger = logging.getLogger(self.__class__.__qualname__)

    def __str__(self) -> str:
        return f"{self.name!s} [{self.options.web!s}]"


→ Action call method actually does the real work (55)Action final summary of what was done (56)

Action superclass has common features of all actions (54). Used by → Action class hierarchy used to describe actions of the application (53).

The __call__() method does the real work of the action. For the superclass, it merely logs a message. This is overridden by a subclass.

Action call method actually does the real work (55) =

def __call__(self, options: argparse.Namespace) -> None:
    self.logger.info("Starting %s", self.name)
    self.options = options
    self.start = time.process_time()

Action call method actually does the real work (55). Used by → Action superclass has common features of all actions (54).

The summary() method returns some basic processing statistics for this action.

Action final summary of what was done (56) =

def duration(self) -> float:
    """Return duration of the action."""
    return (self.start and time.process_time()-self.start) or 0

def summary(self) -> str:
    return f"{self.name!s} in {self.duration():0.3f} sec."

Action final summary of what was done (56). Used by → Action superclass has common features of all actions (54).

ActionSequence Class

A ActionSequence defines a composite action; it is a sequence of other actions. When the macro is performed, it delegates to the sub-actions.

The instance is created during parsing of input parameters. An instance of this class is one of the three standard actions available; it generally is the default, “do everything” action.

This class overrides the perform() method of the superclass. It also adds an append() method that is used to construct the sequence of actions.

ActionSequence subclass that holds a sequence of other actions (57) =

class ActionSequence(Action):
    """An action composed of a sequence of other actions."""
    def __init__(self, name: str, opSequence: list[Action] | None = None) -> None:
        super().__init__(name)
        if opSequence: self.opSequence = opSequence
        else: self.opSequence = []

    def __str__(self) -> str:
        return "; ".join([str(x) for x in self.opSequence])


→ ActionSequence call method delegates the sequence of ations (58)ActionSequence summary summarizes each step (59)

ActionSequence subclass that holds a sequence of other actions (57). Used by → Action class hierarchy used to describe actions of the application (53).

Since the macro __call__() method delegates to other Actions, it is possible to short-cut argument processing by using the Python *args construct to accept all arguments and pass them to each sub-action.

ActionSequence call method delegates the sequence of ations (58) =

def __call__(self, options: argparse.Namespace) -> None:
    super().__call__(options)
    for o in self.opSequence:
        o(self.options)

ActionSequence call method delegates the sequence of ations (58). Used by → ActionSequence subclass that holds a sequence of other actions (57).

The summary() method returns some basic processing statistics for each step of this action.

ActionSequence summary summarizes each step (59) =

def summary(self) -> str:
    return ", ".join([o.summary() for o in self.opSequence])

ActionSequence summary summarizes each step (59). Used by → ActionSequence subclass that holds a sequence of other actions (57).

WeaveAction Class

The WeaveAction defines the action of weaving. This action logs a message, and invokes the weave() method of the Web instance. This method also includes the basic decision on which weaver to use. If a Weaver was specified on the command line, this instance is used. Otherwise, the first few characters are examined and a weaver is selected.

This class overrides the __call__() method of the superclass.

If the options include theWeaver, that Weaver instance will be used. Otherwise, the web.language() method function is used to guess what weaver to use.

WeaveAction subclass initiates the weave action (60) =

class WeaveAction(Action):
    """Weave the final document."""
    def __init__(self) -> None:
        super().__init__("Weave")

    def __str__(self) -> str:
        return f"{self.name!s} [{self.options.web!s}, {self.options.theWeaver!s}]"


→ WeaveAction call method to pick the language (61)WeaveAction summary of language choice (62)

WeaveAction subclass initiates the weave action (60). Used by → Action class hierarchy used to describe actions of the application (53).

The language is picked just prior to weaving. It is either (1) the language specified on the command line, or, (2) if no language was specified, a language is selected based on the first few characters of the input.

Weaving can only raise an exception when there is a reference to a chunk that is never defined.

WeaveAction call method to pick the language (61) =

def __call__(self, options: argparse.Namespace) -> None:
    super().__call__(options)
    if not self.options.weaver:
        # Examine first few chars of first chunk of web to determine language
        self.options.weaver = self.options.web.language()
        self.logger.info("Using %s", self.options.theWeaver)
    self.options.theWeaver.output = self.options.output
    try:
        self.options.theWeaver.set_markup(self.options.weaver)
        self.options.theWeaver.emit(self.options.web)
        self.logger.info("Finished Normally")
    except Error as e:
        self.logger.error("Problems weaving document from %r (weave file is faulty).", self.options.web.web_path)
        #raise

WeaveAction call method to pick the language (61). Used by → WeaveAction subclass initiates the weave action (60).

The summary() method returns some basic processing statistics for the weave action.

WeaveAction summary of language choice (62) =

def summary(self) -> str:
    if self.options.theWeaver and self.options.theWeaver.linesWritten > 0:
        return (
            f"{self.name!s} {self.options.theWeaver.linesWritten:d} lines in {self.duration():0.3f} sec."
        )
    return f"did not {self.name!s}"

WeaveAction summary of language choice (62). Used by → WeaveAction subclass initiates the weave action (60).

TangleAction Class

The TangleAction defines the action of tangling. This operation logs a message, and invokes the weave() method of the Web instance. This method also includes the basic decision on which weaver to use. If a Weaver was specified on the command line, this instance is used. Otherwise, the first few characters are examined and a weaver is selected.

This class overrides the __call__() method of the superclass.

The options must include theTangler, with the Tangler instance to be used.

TangleAction subclass initiates the tangle action (63) =

class TangleAction(Action):
    """Tangle source files."""
    def __init__(self) -> None:
        super().__init__("Tangle")


→ TangleAction call method does tangling of the output files (64)TangleAction summary method provides total lines tangled (65)

TangleAction subclass initiates the tangle action (63). Used by → Action class hierarchy used to describe actions of the application (53).

Tangling can only raise an exception when a cross reference request (@f, @m or @u) occurs in a program code chunk. Program code chunks are defined with any of @d or @o and use @{ @} brackets.

TangleAction call method does tangling of the output files (64) =

def __call__(self, options: argparse.Namespace) -> None:
    super().__call__(options)
    self.options.theTangler.include_line_numbers = self.options.tangler_line_numbers
    self.options.theTangler.output = self.options.output
    try:
        self.options.theTangler.emit(self.options.web)
    except Error as e:
        self.logger.error("Problems tangling outputs from %r (tangle files are faulty).", self.options.web.web_path)
        #raise

TangleAction call method does tangling of the output files (64). Used by → TangleAction subclass initiates the tangle action (63).

The summary() method returns some basic processing statistics for the tangle action.

TangleAction summary method provides total lines tangled (65) =

def summary(self) -> str:
    if self.options.theTangler and self.options.theTangler.linesWritten > 0:
        return (
            f"{self.name!s} {self.options.theTangler.totalLines:d} lines in {self.duration():0.3f} sec."
        )
    return f"did not {self.name!r}"

TangleAction summary method provides total lines tangled (65). Used by → TangleAction subclass initiates the tangle action (63).

LoadAction Class

The LoadAction defines the action of loading the web structure. This action uses the application’s webReader to actually do the load.

An instance is created during parsing of the input parameters. An instance of this class is part of any of the weave, tangle and “do everything” action.

This class overrides the __call__() method of the superclass.

The options must include webReader, with the WebReader instance to be used.

LoadAction subclass loads the document web (66) =

class LoadAction(Action):
    """Load the source web."""
    def __init__(self) -> None:
        super().__init__("Load")

    def __str__(self) -> str:
        return f"Load [{self.webReader!s}, {self.options.web!s}]"


→ LoadAction call method loads the input files (67)LoadAction summary provides lines read (68)

LoadAction subclass loads the document web (66). Used by → Action class hierarchy used to describe actions of the application (53).

Trying to load the web involves two steps, either of which can raise exceptions due to incorrect inputs.

  1. The WebReader class load() method can raise exceptions for a number of syntax errors as well as OS errors.

    • Missing closing brackets (@}, @] or @>).

    • Missing opening bracket (@{ or @[) after a chunk name (@d or @o).

    • Extra brackets (@{, @[, @}, @]).

    • Extra @|.

    • The input file does not exist or is not readable.

  2. The Web class createUsedBy() method can raise an exception when a chunk reference cannot be resolved to a named chunk.

LoadAction call method loads the input files (67) =

def __call__(self, options: argparse.Namespace) -> None:
    super().__call__(options)
    self.webReader = self.options.webReader
    self.webReader.command = self.options.command
    self.webReader.permitList = self.options.permitList
    self.logger.debug("Reader Class %s", self.webReader.__class__.__name__)

    error = f"Problems with source file {self.options.source_path!r}, no output produced."
    try:
        chunks = self.webReader.load(self.options.source_path)
        if self.webReader.errors != 0:
            raise Error("Syntax Errors in the Web")
        self.logger.debug("Read %d Chunks", len(chunks))
        self.options.web = Web(chunks)
        self.options.web.web_path = self.options.source_path
        self.logger.debug("Web contains %3d chunks", len(self.options.web.chunks))
        self.logger.debug("Web defines  %3d files", len(self.options.web.files))
        self.logger.debug("Web defines  %3d macros", len(self.options.web.macros))
        self.logger.debug("Web defines  %3d names", len(self.options.web.userids))
    except Error as e:
        self.logger.error(error)
        raise  # Could not be parsed or built.
    except IOError as e:
        self.logger.error(error)
        raise

LoadAction call method loads the input files (67). Used by → LoadAction subclass loads the document web (66).

The summary() method returns some basic processing statistics for the load action.

LoadAction summary provides lines read (68) =

def summary(self) -> str:
    return (
        f"{self.name!s} {self.webReader.totalLines:d} lines from {self.webReader.totalFiles:d} files in {self.duration():0.3f} sec."
    )

LoadAction summary provides lines read (68). Used by → LoadAction subclass loads the document web (66).

The Application Class

The Application class is provided so that the Action instances have an overall application to update. This allows the WeaveAction to provide the selected Weaver instance to the application. It also provides a central location for the various options and alternatives that might be accepted from the command line.

The constructor creates a default argparse.Namespace with values suitable for weaving and tangling.

The parseArgs() method uses the sys.argv sequence to parse the command line arguments and update the options. This allows a program to pre-process the arguments, passing other arguments to this module.

The process() method processes a list of files. This is either the list of files passed as an argument, or it is the list of files parsed by the parseArgs() method.

The parseArgs() and process() functions are separated so that another application can import pyweb, bypass command-line parsing, yet still perform the basic actionss simply and consistently. For example:

import pyweb, argparse

p = argparse.ArgumentParser()
argument definition
config = p.parse_args()

a = pyweb.Application()
Configure the Application based on options
a.process(config)

The main() function creates an Application instance and calls the parseArgs() and process() methods to provide the expected default behavior for this module when it is used as the main program.

The configuration can be either a types.SimpleNamespace or an argparse.Namespace instance.

Imports (69) +=

import argparse
import shlex

Imports (69). Used by → pyweb.py (79).

Application Class for overall CLI operation (70) =

class Application:
    def __init__(self, base_config: dict[str, Any] | None = None) -> None:
        self.logger = logging.getLogger(self.__class__.__qualname__)

→ Application default options (71)Application parse command line (72)Application class process all files (73)

Application Class for overall CLI operation (70). Used by → pyweb.py (79).

The first part of parsing the command line is setting default values that apply when parameters are omitted. The default values are set as follows:

defaults:

A default configuration.

webReader:

is the WebReader instance created for the current input file.

doWeave:

instance of Action that does weaving only.

doTangle:

instance of Action that does tangling only.

theAction:

is an instance of Action that describes the default overall action: load, tangle and weave. This is the default unless overridden by an option.

Here are the configuration values. These are attributes of the argparse.namespace default as well as the updated namespace returned by parseArgs().

verbosity:

Either logging.INFO, logging.WARN or logging.DEBUG

command:

is set to @ as the default command introducer.

permit:

The raw list of permitted command characters, perhaps 'i'.

permitList:

provides a list of commands that are permitted to fail. Typically this is empty, or contains @i to allow the include command to fail.

files:

is the final list of argument files from the command line; these will be processed unless overridden in the call to process().

!skip:

a list of steps to skip: perhaps 'w' or 't' to skip weaving or tangling.

weaver:

the short name of the weaver.

theTangler:

is set to a TanglerMake instance to create the output files.

theWeaver:

is set to an instance of a subclass of Weaver based on weaver

Application default options (71) =

self.defaults = argparse.Namespace(
    verbosity=logging.INFO,
    command='@',
    weaver='rst',
    skip='',  # Don't skip any steps
    permit='',  # Don't tolerate missing includes
    reference='s',  # Simple references
    tangler_line_numbers=False,
    output=Path.cwd(),
    )

# Primitive Actions
self.loadOp = LoadAction()
self.weaveOp = WeaveAction()
self.tangleOp = TangleAction()

# Composite Actions
self.doWeave = ActionSequence("load and weave", [self.loadOp, self.weaveOp])
self.doTangle = ActionSequence("load and tangle", [self.loadOp, self.tangleOp])
self.theAction = ActionSequence("load, tangle and weave", [self.loadOp, self.tangleOp, self.weaveOp])

Application default options (71). Used by → Application Class for overall CLI operation (70).

The algorithm for parsing the command line parameters uses the built in argparse module. We have to build a parser, define the options, and the parse the command-line arguments, updating the default namespace.

We further expand on the arguments. This transforms simple strings into object instances.

Application parse command line (72) =

def parseArgs(self, argv: list[str]) -> argparse.Namespace:
    p = argparse.ArgumentParser()
    p.add_argument("-v", "--verbose", dest="verbosity", action="store_const", const=logging.INFO)
    p.add_argument("-s", "--silent", dest="verbosity", action="store_const", const=logging.WARN)
    p.add_argument("-d", "--debug", dest="verbosity", action="store_const", const=logging.DEBUG)
    p.add_argument("-c", "--command", dest="command", action="store")
    p.add_argument("-w", "--weaver", dest="weaver", action="store")
    p.add_argument("-x", "--except", dest="skip", action="store", choices=('w', 't'))
    p.add_argument("-p", "--permit", dest="permit", action="store")
    p.add_argument("-n", "--linenumbers", dest="tangler_line_numbers", action="store_true")
    p.add_argument("-o", "--output", dest="output", action="store", type=Path)
    p.add_argument("-V", "--Version", action='version', version=f"py-web-lp pyweb.py {__version__}")
    p.add_argument("files", nargs='+', type=Path)
    config = p.parse_args(argv, namespace=self.defaults)
    self.expand(config)
    return config

def expand(self, config: argparse.Namespace) -> argparse.Namespace:
    """Translate the argument values from simple text to useful objects.
    Weaver. Tangler. WebReader.
    """
    # Weaver & Tangler
    config.theWeaver = Weaver(config.output)
    config.theTangler = TanglerMake(config.output)

    if config.permit:
        # save permitted errors, usual case is ``-pi`` to permit ``@i`` include errors
        config.permitList = [f'{config.command!s}{c!s}' for c in config.permit]
    else:
        config.permitList = []

    config.webReader = WebReader()

    return config

Application parse command line (72). Used by → Application Class for overall CLI operation (70).

The process() function uses the current Application settings to process each file as follows:

  1. Create a new WebReader for the Application, providing the parameters required to process the input file.

  2. Create a Web instance, w and set the Web’s sourceFileName from the WebReader’s filePath.

  3. Perform the given command, typically a ActionSequence, which does some combination of load, tangle the output files and weave the final document in the target language; if necessary, examine the Web to determine the documentation language.

  4. Print a performance summary line that shows lines processed per second.

In the event of failure in any of the major processing steps, a summary message is produced, to clarify the state of the output files, and the exception is reraised. The re-raising is done so that all exceptions are handled by the outermost main program.

Application class process all files (73) =

def process(self, config: argparse.Namespace) -> None:
    root = logging.getLogger()
    root.setLevel(config.verbosity)
    self.logger.debug("Setting root log level to %r", logging.getLevelName(root.getEffectiveLevel()))

    if config.command:
        self.logger.debug("Command character %r", config.command)

    if config.skip:
        if config.skip.lower().startswith('w'):  # not weaving == tangling
            self.theAction = self.doTangle
        elif config.skip.lower().startswith('t'):  # not tangling == weaving
            self.theAction = self.doWeave
        else:
            raise Exception(f"Unknown -x option {config.skip!r}")

    for f in config.files:
        self.logger.info("%s %s %r", self.theAction.name, __version__, f)
        config.source_path = f
        self.theAction(config)
        self.logger.info(self.theAction.summary())

Application class process all files (73). Used by → Application Class for overall CLI operation (70).

Logging Setup

We’ll create a logging context manager. This allows us to wrap the main() function in an explicit with statement that assures that logging is configured and cleaned up politely.

Imports (74) +=

import logging
import logging.config

Imports (74). Used by → pyweb.py (79).

This has two configuration approaches. If a positional argument is given, that dictionary is used for logging.config.dictConfig. Otherwise, keyword arguments are provided to logging.basicConfig.

A subclass might properly load a dictionary encoded in YAML and use that with logging.config.dictConfig.

Logging Setup (75) =

class Logger:
    def __init__(self, dict_config: dict[str, Any] | None = None, **kw_config: Any) -> None:
        self.dict_config = dict_config
        self.kw_config = kw_config

    def __enter__(self) -> "Logger":
        if self.dict_config:
            logging.config.dictConfig(self.dict_config)
        else:
            logging.basicConfig(**self.kw_config)
        return self

    def __exit__(self, *args: Any) -> Literal[False]:
        logging.shutdown()
        return False

Logging Setup (75). Used by → pyweb.py (79).

Here’s a sample logging setup. This creates a simple console handler and a formatter that matches the basicConfig formatter.

This configuration dictionary defines the root logger plus some overrides for class loggers that might be used to gather additional information.

Logging Setup (76) +=

default_logging_config = {
    'version': 1,
    'disable_existing_loggers': False, # Allow pre-existing loggers to work.
    'style': '{',
    'handlers': {
        'console': {
            'class': 'logging.StreamHandler',
            'stream': 'ext://sys.stderr',
            'formatter': 'basic',
        },
    },
    'formatters': {
        'basic': {
            'format': "{levelname}:{name}:{message}",
            'style': "{",
        }
    },

    'root': {'handlers': ['console'], 'level': logging.INFO,},

    # For specific debugging support...
    'loggers': {
        'Weaver': {'level': logging.INFO},
        'WebReader': {'level': logging.INFO},
        'Tangler': {'level': logging.INFO},
        'TanglerMake': {'level': logging.INFO},
        'indent.TanglerMake': {'level': logging.INFO},
        'Web': {'level': logging.INFO},
        # Unit test requires this...
        'ReferenceCommand': {'level': logging.INFO},
    },
}

Logging Setup (76). Used by → pyweb.py (79).

The above is wired into the application as a default. Exposing this via a configuration file is better.

pyweb.toml (77) =

[pyweb]
# PyWeb options go here.

[logging]
version = 1
disable_existing_loggers = false

[logging.root]
handlers = [ "console",]
level = "INFO"

[logging.handlers.console]
class = "logging.StreamHandler"
stream = "ext://sys.stderr"
formatter = "basic"

[logging.formatters.basic]
format = "{levelname}:{name}:{message}"
style = "{"

[logging.loggers.Weaver]
level = "INFO"

[logging.loggers.WebReader]
level = "INFO"

[logging.loggers.Tangler]
level = "INFO"

[logging.loggers.TanglerMake]
level = "INFO"

[logging.loggers.indent.TanglerMake]
level = "INFO"

[logging.loggers.ReferenceCommand]
# Unit test requires this...
level = "INFO"

pyweb.toml (77).

We can load this with something like the following:

config_path = Path("pyweb.toml")
with config_path.open('rb') as config_file:
    config = toml.load(config_file)
log_config = config.get('logging', {'version': 1, level=logging.INFO})

This makes it slightly easier to add and change debuging alternatives. Rather then use the -v and -d options, the pyweb.toml provides a complete logging config.

Also, we might want a decorator to define loggers more consistently for each class definition.

The Main Function

The top-level interface is the main() function. This function creates an Application instance.

The Application object parses the command-line arguments. Then the Application object does the requested processing. This two-step process allows for some dependency injection to customize argument processing.

We might also want to parse a logging configuration file, as well as a weaver template configuration file.

Interface Functions (78) =

def main(argv: list[str] = sys.argv[1:], base_config: dict[str, Any] | None=None) -> None:
    a = Application(base_config)
    config = a.parseArgs(argv)
    a.process(config)

Interface Functions (78). Used by → pyweb.py (79).

pyWeb Module File

The pyWeb application file is shown below:

pyweb.py (79) =

Overheads (81)Imports (2)Error class defines the errors raised (52)Base Class Definitions (1)Action class hierarchy used to describe actions of the application (53)Application Class for overall CLI operation (70)Logging Setup (75)Interface Functions (78)

if __name__ == "__main__":
    config_paths = Path("pyweb.toml"), Path.home()/"pyweb.toml"
    base_config: dict[str, Any] = {}
    for cp in config_paths:
        if cp.exists():
            with cp.open('rb') as config_file:
                base_config = toml.load(config_file)
            break
    log_config = base_config.get('logging', default_logging_config)
    with Logger(log_config):
        main(base_config=base_config.get('pyweb', {}))

pyweb.py (79).

The Overheads are described below, they include things like:

  • shell escape

  • doc string

  • __version__ setting

Python Library Imports are actually scattered in various places in this description.

The more important elements are described in separate sections:

  • Base Class Definitions

  • Application Class and Main Functions

  • Interface Functions

Python Library Imports

Numerous Python library modules are used by this application.

A few are listed here because they’re used widely. Others are listed closer to where they’re referenced.

  • The os module provide os-specific file and path manipulations; it is used to transform the input file name into the output file name as well as track down file modification times.

  • The time module provides a handy current-time string; this is used to by the HTML Weaver to write a closing timestamp on generated HTML files, as well as log messages.

  • The datetime module is used to format times, phasing out use of time.

  • The types module is used to get at SimpleNamespace for configuration.

Imports (80) +=

import os
import time
import datetime
import sys
import types

if sys.version_info[:2] <= (3, 10):
    import tomli as toml
else:
    import tomllib as toml

Imports (80). Used by → pyweb.py (79).

Note that os.path, time, datetime and platform` are provided in the expression context.

Overheads

The shell escape is provided so that the user can define this file as executable, and launch it directly from their shell. The shell reads the first line of a file; when it finds the '#!' shell escape, the remainder of the line is taken as the path to the binary program that should be run. The shell runs this binary, providing the file as standard input.

Overheads (81) =

#!/usr/bin/env python

Overheads (81). Used by → pyweb.py (79).

A Python __doc__ string provides a standard vehicle for documenting the module or the application program. The usual style is to provide a one-sentence summary on the first line. This is followed by more detailed usage information.

Overheads (82) +=

"""py-web-lp Literate Programming.

Yet another simple literate programming tool derived from **nuweb**,
implemented entirely in Python.
With a suitable configuration, this weaves documents with any markup language,
and tangles source files for any programming language.
"""

Overheads (82). Used by → pyweb.py (79).

The keyword cruft is a standard way of placing version control information into a Python module so it is preserved. See PEP (Python Enhancement Proposal) #8 for information on recommended styles.

We also sneak in a “DO NOT EDIT” warning that belongs in all generated application source files.

Overheads (83) +=

__version__ = """3.2"""

### DO NOT EDIT THIS FILE!
### It was created by pyweb.py, __version__='3.2'.
### From source impl.w modified Tue Jul 11 09:03:19 2023.
### In working directory '/Users/slott/Documents/Projects/py-web-tool/src'.

Overheads (83). Used by → pyweb.py (79).

This can be extended by doing something like the following.

  1. Subclass Weaver create a subclass with different templates.

  2. Update the pyweb.weavers dictionary.

  3. Call pyweb.main() to run the existing main program with extra classes available to it.

import pyweb
class MyWeaver(HTML):
   Any template changes

pyweb.weavers['myweaver']= MyWeaver()
pyweb.main()

This will create a variant on py-web-lp that will handle a different weaver via the command-line option -w myweaver.

Unit Tests

The tests directory includes pyweb_test.w, which will create a complete test suite.

This source will weaves a pyweb_test.html file. See tests/pyweb_test.html.

This source will tangle several test modules: test.py, test_tangler.py, test_weaver.py, test_loader.py, test_unit.py, and test_scripts.py.

Use pytest to discover and run all 80+ test cases.

Here’s a script that works out well for running this without disturbing the development environment. The PYTHONPATH setting is essential to support importing pyweb.

python pyweb.py -o tests tests/pyweb_test.w
PYTHONPATH=$(PWD) pytest

Note that the last line really does set an environment variable and run the pytest tool on a single line.

Handy Scripts and Other Files

Two aditional scripts, tangle.py and weave.py, are provided as examples which can be customized and extended.

tangle.py Script

This script shows a simple version of Tangling. This has a permitted error for ‘@i’ commands to allow an include file (for example test results) to be omitted from the tangle operation.

Note the general flow of this top-level script.

  1. Create the logging context.

  2. Create the options. This hard-coded object is a stand-in for parsing command-line options.

  3. Create the web object.

  4. For each action (LoadAction and TangleAction in this example) Set the web, set the options, execute the callable action, and write a summary.

tangle.py (84) =

#!/usr/bin/env python3
"""Sample tangle.py script."""
import argparse
import logging
from pathlib import Path
import pyweb

def main(source: Path) -> None:
    with pyweb.Logger(pyweb.default_logging_config):
        logger = logging.getLogger(__file__)

        options = argparse.Namespace(
            source_path=source,
            output=source.parent,
            verbosity=logging.INFO,
            command='@',
            permitList=['@i'],
            tangler_line_numbers=False,
            webReader=pyweb.WebReader(),
            theTangler=pyweb.TanglerMake(),
        )

        for action in pyweb.LoadAction(), pyweb.TangleAction():
            action(options)
            logger.info(action.summary())

if __name__ == "__main__":
    main(Path("examples/test_rst.w"))

tangle.py (84).

weave.py Script

This script shows a simple version of Weaving. This shows how to define a customized set of templates for a different markup language.

A customized weaver generally has three parts.

weave.py (85) =

weave.py overheads for correct operation of a script (86)weave.py custom weaver definition to customize the Weaver being used (87)weaver.py processing: load and weave the document (88)

weave.py (85).

weave.py overheads for correct operation of a script (86) =

#!/usr/bin/env python3
"""Sample weave.py script."""
import argparse
import logging
import string
from pathlib import Path
from textwrap import dedent

import pyweb

weave.py overheads for correct operation of a script (86). Used by → weave.py (85).

To override templates, a class needs to provide a text definition of the Jinja {% macro %} definitions. This is used to update the superclass template_map.

Something like the following sets the macros in use.

self.template_map['html_macros'] = my_templates

Any macro not defined gets a default implementation.

weave.py custom weaver definition to customize the Weaver being used (87) =

class MyHTML(pyweb.Weaver):
    bootstrap_html = dedent("""
    {%- macro begin_code(chunk) %}
    <div class="card">
      <div class="card-header">
        <a type="button" class="btn btn-primary" name="pyweb_{{chunk.seq}}"></a>
        <!--line number {{chunk.location}}-->
        <p class="small"><em>{{chunk.full_name or chunk.name}} ({{chunk.seq}})</em> {% if chunk.initial %}={% else %}+={% endif %}</p>
       </div>
      <div class="card-body">
        <pre><code>
    {%- endmacro -%}

    {%- macro end_code(chunk) %}
        </code></pre>
      </div>
    <div class="card-footer">
      <p>&#8718; <em>{{chunk.full_name or chunk.name}} ({{chunk.seq}})</em>.
      </p>
    </div>
    </div>
    {% endmacro -%}
    """)

    def __init__(self, output: Path = Path.cwd()) -> None:
        super().__init__(output)
        self.template_map = pyweb.Weaver.template_map | {"html_macros": self.bootstrap_html}

weave.py custom weaver definition to customize the Weaver being used (87). Used by → weave.py (85).

weaver.py processing: load and weave the document (88) =

def main(source: Path) -> None:
    with pyweb.Logger(pyweb.default_logging_config):
        logger = logging.getLogger(__file__)

        options = argparse.Namespace(
            source_path=source,
            output=source.parent,
            verbosity=logging.INFO,
            weaver="html",
            command='@',
            permitList=[],
            tangler_line_numbers=False,
            webReader=pyweb.WebReader(),

            theWeaver=MyHTML(),  # Customize with a specific Weaver subclass
        )

        for action in pyweb.LoadAction(), pyweb.WeaveAction():
            action(options)
            logger.info(action.summary())

if __name__ == "__main__":
    main(Path("examples/test_rst.w"))

weaver.py processing: load and weave the document (88). Used by → weave.py (85).

To Do

  1. Change the comment start and comment end options to use Jinja template fragments. There needs to be an -add '# {{chunk.position}}' which overrides the default of '' and injects this into each Tangled chunk. Indented appropriately.

  1. Implement all four alternative references in the end_code() macro.

    • Nothing.

    • The immediate reference.

    • The two variants on full paths:

      • top-down Named (1) / Sub-Named (2) / Sub-Sub-Named (3)

      • bottom-up Sub-Sub-Named (3) Sub-Named (2) Named (1)

  2. Implement @h command and HiddenOutputChunk. This may be sufficient. Tangling can then include non-woven output files. Viewed the other way, Weaving can exclude the non-woven file. The use case is a book chapter with test cases that are not woven into the text. See https://github.com/slott56/py-web-tool/wiki/Tangle-only-Output.

  3. Permit selecting a specific begin_code() template for a named block. A book may have distinct formatting for REPL examples and code examples. It may be as small as CSS classes for RST or HTML. It may be a more elaborate set of differences in LaTeX.

  4. Update the -indent option on @d chunks to accept a numeric argument with the specific indentation value. This becomes a kind of “noindent” with a given value. The -noindent would then be the same as -indent 0. Currently, -indent and -noindent are true/false flags.

  5. We might want to decompose the impl.w file: it’s huge. The four major sections – base model, output, input parsing, other components – make sense. However, since the output is a single .rst file, it doesn’t change much to do this.

  6. We might want to interleave code and test into a document that presents both side-by-side. We can route the tangled code to multiple files. It can be awkward to create tangled files in multiple directories, however. We’d have to use ../tests/whatever.py, assuming we were always using -o src.

  7. Rename the module from pyweb to pylpweb to avoid name squatting issues. Rename the project from py-web-lp to py-lpweb.

  8. Offer a basic XHTML template that uses CDATA sections instead of quoting. Does require the standard quoting for the CDATA end tag.

  9. Note that the overall Web is a bit like a NamedChunk that contains Chunks. This similarity could be factored out. While this will create a more proper Composition pattern implementation, it leads to the question of why nest @d or @o chunks in the first place?

Change Log

Changes for 3.2

  • Rename to py-web-lp to limit collisions in PyPI.

  • Replaced toml with tomli or built-in tomllib.

  • Fiddle with pyproject.toml and tox.ini to eliminate setup.py.

  • Replaced home-brewed OptionParser with argparse.ArgumentParser. Now the options for @d and @o can be much more flexible.

  • Added the transitive path of references as a Chunk property. Removes SimpleReference and TransitiveReference classes, and the associated “reference_style” attribute of a Weaver. To show transitive references, a revised code_end template is required.

  • Added a TOML-based configuration file with a [logging] section.

  • Incorporated Sphinx and PlantUML into the documentation. Support continues for `rst2html.py. It’s used for test documentaion. See https://github.com/slott56/py-web-tool/wiki/PlantUML-support-for-RST-%5BCompleted%5D

  • Replaced weaving process with Jinja templates. See https://github.com/slott56/py-web-tool/wiki/Jinja-Template-for-Weaving-%5BCompleted%5D

  • Dramatic redesign to Class, Chunk, and Command class hierarchies.

  • Dramatic redesign to Emitters to switch to using Jinja templates. By stepping away from the string.Template, we can incorporate list-processing {% for %}...{% endfor %} construct that pushes some processing into the template.

  • Removed '\N{LOZENGE}' (borrowed from Interscript) and use the '\N{END OF PROOF}' symbol instead.

  • Created a better weave.py example that shows how to incorporate bootstrap CSS into HTML overrides. This also requires designing a more easily extended Weaver class.

Changes for 3.1

  • Change to Python 3.10 as the supported version.

  • Add type hints, f-strings, pathlib, abc.ABC.

  • Replace some complex elif blocks with match statements.

  • Use pytest as a test runner.

  • Add a Makefile, pyproject.toml, requirements.txt and requirements-dev.txt.

  • Implement -o dir option to write output to a directory of choice, simplifying tox setup.

  • Add bootstrap directory with a snapshot of a previous working release to simplify development.

  • Add Test cases for weave.py and tangle.py

  • Replace hand-build mock classes with unittest.mock.Mock objects

  • Separate the projec into src, tests, examples. Cleanup Makefile, pyproject.toml, etc.

  • Silence the ERROR-level logging during testing.

  • Clean up the examples

Changes for 3.0

  • Move to GitHub

Changes for 2.3.2.

  • Fix all {:s} format strings to be {!s}.

Changes for 2.3.1.

  • Cleanup some stray comment errors.

  • Revise the documentation structure and organization.

  • Tweak the error messages.

Changes for 2.3.

  • Changed to Python 3.3 – Fixed except, raise and %.

  • Removed doWrite() and simplified doOpen() and doClose().

  • Cleaned up RST output to be much nicer.

  • Change the baseline pyweb.w file to be RST instead of HTML. docutils required to produce HTML from the woven output.

  • Removed the unconstrained eval() function. Provided a slim set of globals. os is really just os.path. Any os.getcwd() can be changed to os.path.realpath('.'). time was removed and replaced with datetime. Any time.asctime() must be datetime.datetime.now().ctime().

  • Resolved a small dispute between weaveReferenceTo() (wrong) and tangle() (right). for NamedChunks. The issue was one of failure to understand the differences between weaving – where indentation is localized – and tangling – where indentation must be tracked globally. Root cause was a huge problem in codeBlock() which didn’t really weave properly at all.

  • Fix the tokenizer and parsing. Stop using a complex tokenizer and use a simpler iterator over the tokens with StopIteration exception handling.

  • Replace optparse with argparse.

  • Get rid of the global logger variable.

  • Remove the filename as part of Web() initial creation. A basename comes from the initial .w file loaded by the WebReader.

  • Fix the Action class hierarchy so that composite actions are simpler.

  • Change references to return Chunk objects, not (name,sequence) pairs.

  • Make the ref list separator in Weaver reference summary... a proper template feature, not a hidden punctuation mark in the code.

  • Configure Web.reference_style properly so that simple or transitive references can be included as a command-line option. The default is Simple. Add the -r option so that -rt includes transitive references.

  • Reduce the “hard-coded” punctuation. For example, the ", " in @d Web weave... weaveChunk(). This was moved into a template.

  • Add an __enter__() and __exit__() to make an Emitter into a proper Context Manager that can be used with a with statement.

  • Add the -n option to include tangler line numbers if the @o includes the comment characters.

  • Cleanup the TanglerMake unit tests to remove the sleep() used to assure that the timestamps really are different.

  • Cleanup the syntax for adding a comment template to @o. Use -start and -end before the filename.

  • Cleanup the syntax for noindent named chunks. Use -noindent before the chunk name. This creates a distinct NamedChunk_Noindent instance that handles indentation differently from other Chunk subclasses.

  • Cleanup the TangleAction summary.

  • Clean up the error messages. Raising an exception seems heavy-handed and confusing. Count errors instead.

Changes since version 1.4

  • Removed home-brewed logger.

  • Replaced getopt with optparse.

  • Replaced LaTeX markup.

  • Corrected significant problems in cross-reference resolution.

  • Replaced all HTML and LaTeX-specific features with a much simpler template engine which applies a template to a Chunk. The Templates are separate configuration items. The big issue with templates are conditional processing and the use of loops to handle multiple references in a transitive closure. While it’s nice to depend on Jinja2, it’s also nice to be totally stand-alone. Sigh. Choices include the no-logic string.Template in the standard library or the Templite+ Recipe 576663.

  • Looked at SCons API. Renamed “Operation” to “Action”; renamed “perform” to “__call__”. Consider having “__call__” which does logging, then call “execute”.

  • Eliminated the EmitterFactory; replace this with simple injection of the proper template configuration.

  • Removed the @O command; it was essentially a variant template for LaTeX.

  • Disentangled indentation and quoting in the codeBlock. Indentation rules vary between Tangling and Weaving. Quoting is unique to a woven codeBlock. Fix referenceTo() to write indented without code quoting.

  • Offer a basic RST template. Note that colorizing may be easier to handle with an RST template. The weaving markup template degenerates to ..   parsed-literal:: and indent. By doing this, the RST output from pyWeb can be run through DocUtils rst2html.py or perhaps Sphix to create final HTML. The hard part is the indent.

  • Tweaked (but didn’t fix) ReferenceCommand tangle and all setIndent/clrIndent operations. Only a ReferenceCommand actually cares about indentation. And that indentation is totally based on the “context” plus the text in the Command immediate in front of the ReferenceCommand.

Indices

Files

pyweb.toml:

pyweb.toml (77):pyweb.py: → pyweb.py (79):tangle.py: → tangle.py (84):weave.py: → weave.py (85)

Macros

Action call method actually does the real work:

Action call method actually does the real work (55)

Action class hierarchy used to describe actions of the application:

Action class hierarchy used to describe actions of the application (53)

Action final summary of what was done:

Action final summary of what was done (56)

Action superclass has common features of all actions:

Action superclass has common features of all actions (54)

ActionSequence call method delegates the sequence of ations:

ActionSequence call method delegates the sequence of ations (58)

ActionSequence subclass that holds a sequence of other actions:

ActionSequence subclass that holds a sequence of other actions (57)

ActionSequence summary summarizes each step:

ActionSequence summary summarizes each step (59)

Application Class for overall CLI operation:

Application Class for overall CLI operation (70)

Application class process all files:

Application class process all files (73)

Application default options:

Application default options (71)

Application parse command line:

Application parse command line (72)

Base Class Definitions:

Base Class Definitions (1), → Base Class Definitions (19), → Base Class Definitions (33)

Chunk class hierarchy – used to describe individual chunks:

Chunk class hierarchy – used to describe individual chunks (8), → Chunk class hierarchy – used to describe individual chunks (9)

Command class hierarchy – used to describe individual commands in a chunk:

Command class hierarchy – used to describe individual commands in a chunk (10)

Common base template – this is used for ALL weaving:

Common base template – this is used for ALL weaving (23)

Debug Templates – these display debugging information:

Debug Templates – these display debugging information (24)

Emitter Superclass:

Emitter Superclass (21)

Emitter indent control:

set, clear and reset: → Emitter indent control: set, clear and reset (30)

Emitter write a block of code with proper indents:

Emitter write a block of code with proper indents (29)

Error class defines the errors raised:

Error class defines the errors raised (52)

HTML Templates – emit HTML weave output:

HTML Templates – emit HTML weave output (26)

Imports:

Imports (2), → Imports (11), → Imports (20), → Imports (31), → Imports (42), → Imports (47), → Imports (50), → Imports (69), → Imports (74), → Imports (80)

Interface Functions:

Interface Functions (78)

LaTeX Templates – emit LaTeX weave output:

LaTeX Templates – emit LaTeX weave output (27)

LoadAction call method loads the input files:

LoadAction call method loads the input files (67)

LoadAction subclass loads the document web:

LoadAction subclass loads the document web (66)

LoadAction summary provides lines read:

LoadAction summary provides lines read (68)

Logging Setup:

Logging Setup (75), → Logging Setup (76)

Overheads:

Overheads (81), → Overheads (82), → Overheads (83)

RST Templates – the default weave output:

RST Templates – the default weave output (25)

TangleAction call method does tangling of the output files:

TangleAction call method does tangling of the output files (64)

TangleAction subclass initiates the tangle action:

TangleAction subclass initiates the tangle action (63)

TangleAction summary method provides total lines tangled:

TangleAction summary method provides total lines tangled (65)

Tangler Subclass – emits the output files:

Tangler Subclass – emits the output files (28)

TanglerMake Subclass – extends Tangler to avoid touching files that didn’t change:

TanglerMake Subclass – extends Tangler to avoid touching files that didn’t change (32)

The CodeCommand Class:

The CodeCommand Class (16)

The Command Abstract Base Class:

The Command Abstract Base Class (13)

The HasText Type Hint – used instead of another abstract class:

The HasText Type Hint – used instead of another abstract class (14)

The ReferenceCommand Class:

The ReferenceCommand Class (17)

The TextCommand Class:

The TextCommand Class (15)

The TypeId Class – to help the template engine:

The TypeId Class – to help the template engine (12)

The XrefCommand Subclasses – files, macros, and user names:

The XrefCommand Subclasses – files, macros, and user names (18)

Tokenizer class - breaks input into tokens:

Tokenizer class - breaks input into tokens (51)

WeaveAction call method to pick the language:

WeaveAction call method to pick the language (61)

WeaveAction subclass initiates the weave action:

WeaveAction subclass initiates the weave action (60)

WeaveAction summary of language choice:

WeaveAction summary of language choice (62)

Weaver Subclass – Uses Jinja templates to weave documentation:

Weaver Subclass – Uses Jinja templates to weave documentation (22)

Web class – describes the overall “web” of chunks:

Web class – describes the overall “web” of chunks (3), → Web class – describes the overall “web” of chunks (4), → Web class – describes the overall “web” of chunks (5), → Web class – describes the overall “web” of chunks (6), → Web class – describes the overall “web” of chunks (7)

WebReader class - parses the input file, building the Web structure:

WebReader class - parses the input file, building the Web structure (34)

WebReader command literals:

WebReader command literals (49)

WebReader handle a command string:

WebReader handle a command string (35), → WebReader handle a command string (45)

WebReader load the web:

WebReader load the web (48)

WebReader location in the input stream:

WebReader location in the input stream (46)

add a reference command to the current chunk:

add a reference command to the current chunk (41)

add an expression command to the current chunk:

add an expression command to the current chunk (43)

assign user identifiers to the current chunk:

assign user identifiers to the current chunk (40)

double at-sign replacement, append this character to previous TextCommand:

double at-sign replacement, append this character to previous TextCommand (44)

finish a chunk, start a new Chunk adding it to the web:

finish a chunk, start a new Chunk adding it to the web (39)

include another file:

include another file (38)

start a NamedChunk or NamedDocumentChunk, adding it to the web:

start a NamedChunk or NamedDocumentChunk, adding it to the web (37)

start an OutputChunk, adding it to the web:

start an OutputChunk, adding it to the web (36)

weave.py custom weaver definition to customize the Weaver being used:

weave.py custom weaver definition to customize the Weaver being used (87)

weave.py overheads for correct operation of a script:

weave.py overheads for correct operation of a script (86)

weaver.py processing:

load and weave the document: → weaver.py processing: load and weave the document (88)

User Identifiers

Action:

Action superclass has common features of all actions (54)

ActionSequence:

ActionSequence subclass that holds a sequence of other actions (57)

Application:

Application Class for overall CLI operation (70)

Chunk:

Chunk class hierarchy – used to describe individual chunks (9)

Error:

Error class defines the errors raised (52)

LoadAction:

LoadAction subclass loads the document web (66)

NamedChunk:

Chunk class hierarchy – used to describe individual chunks (9)

NamedChunk_Noindent:

Chunk class hierarchy – used to describe individual chunks (9)

NamedDocumentChunk:

Chunk class hierarchy – used to describe individual chunks (9)

OutputChunk:

Chunk class hierarchy – used to describe individual chunks (9)

TangleAction:

TangleAction subclass initiates the tangle action (63)

Tokenizer:

Tokenizer class - breaks input into tokens (51)

TypeId:

The TypeId Class – to help the template engine (12)

TypeIdMeta:

The TypeId Class – to help the template engine (12)

WeaveAction:

WeaveAction subclass initiates the weave action (60)

Web:

Web class – describes the overall “web” of chunks (3), → Web class – describes the overall “web” of chunks (7)

WebReader:

WebReader class - parses the input file, building the Web structure (34)

__version__:

Overheads (83)

addIndent:

Emitter indent control: set, clear and reset (30)

argparse:

Imports (69)

builtins:

Imports (42)

clrIndent:

Emitter indent control: set, clear and reset (30)

codeBlock:

Emitter write a block of code with proper indents (29)

datetime:

Imports (80)

duration:

Action final summary of what was done (56)

expand:

Application parse command line (72)

expect:

WebReader handle a command string (45)

handleCommand:

WebReader handle a command string (35)

load:

WebReader load the web (48)

location:

WebReader location in the input stream (46)

logging:

Imports (74)

logging.config:

Imports (74)

os:

Imports (80)

parse:

WebReader load the web (48)

parseArgs:

Application parse command line (72)

perform:

Action call method actually does the real work (55), → ActionSequence call method delegates the sequence of ations (58), → WeaveAction call method to pick the language (61), → TangleAction call method does tangling of the output files (64), → LoadAction call method loads the input files (67)

platform:

Imports (42)

process:

Application class process all files (73)

re:

Imports (50)

resetIndent:

Emitter indent control: set, clear and reset (30)

setIndent:

Emitter indent control: set, clear and reset (30)

shlex:

Imports (69)

summary:

Action final summary of what was done (56), → ActionSequence summary summarizes each step (59), → WeaveAction summary of language choice (62), → TangleAction summary method provides total lines tangled (65), → LoadAction summary provides lines read (68)

sys:

Imports (42)

time:

Imports (80)

toml:

Imports (80)

types:

Imports (80)


class small

Created by pyweb.py at Tue Jul 11 09:15:02 2023.

Source pyweb.w modified Sun Jul 3 13:07:49 2022.

pyweb.__version__ ‘3.2’.

Working directory ‘/Users/slott/Documents/Projects/py-web-tool/src’.