############################## pyWeb Literate Programming 3.2 ############################## ================================================= Yet Another Literate Programming Tool ================================================= .. contents:: .. py-web-tool/src/intro.w Introduction ============ Literate programming was pioneered by Knuth as a method for developing readable, understandable presentations of programs. These would present a program in a literate fashion for people to read and understand; this would be in parallel with presentation as source text for a compiler to process and both would be generated from a common source file. One intent is to synchronize the program source with the documentation about that source. If the program and the documentation have a common origin, then the traditional gaps between intent (expressed in the documentation) and action (expressed in the working program) are significantly reduced. **py-web-lp** is a literate programming tool that combines the actions of *weaving* a document with *tangling* source files. It is independent of any source language. While is designed to work with RST document markup, it should be amenable to any other flavor of markup. It uses a small set of markup tags to define chunks of code and documentation. Background ----------- The following is an almost verbatim quote from Briggs' *nuweb* documentation, and provides an apt summary of Literate Programming. In 1984, Knuth introduced the idea of *literate programming* and described a pair of tools to support the practise (Donald E. Knuth, "Literate Programming", *The Computer Journal* 27 (1984), no. 2, 97-111.) His approach was to combine Pascal code with T\ :sub:`e`\ X documentation to produce a new language, ``WEB``, that offered programmers a superior approach to programming. He wrote several programs in ``WEB``, including ``weave`` and ``tangle``, the programs used to support literate programming. The idea was that a programmer wrote one document, the web file, that combined documentation written in T\ :sub:`e`\ X (Donald E. Knuth, T\ :sub:`e`\ X book, Computers and Typesetting, 1986) with code (written in Pascal). Running ``tangle`` on the web file would produce a complete Pascal program, ready for compilation by an ordinary Pascal compiler. The primary function of ``tangle`` is to allow the programmer to present elements of the program in any desired order, regardless of the restrictions imposed by the programming language. Thus, the programmer is free to present his program in a top-down fashion, bottom-up fashion, or whatever seems best in terms of promoting understanding and maintenance. Running ``weave`` on the web file would produce a T\ :sub:`e`\ X file, ready to be processed by T\ :sub:`e`\ X. The resulting document included a variety of automatically generated indices and cross-references that made it much easier to navigate the code. Additionally, all of the code sections were automatically prettyprinted, resulting in a quite impressive document. Knuth also wrote the programs for T\ :sub:`e`\ X and ``METAFONT`` entirely in ``WEB``, eventually publishing them in book form. These are probably the largest programs ever published in a readable form. Other Tools ------------ Numerous tools have been developed based on Knuth's initial work. A relatively complete survey is available at sites like `Literate Programming `_, and the OASIS `XML Cover Pages: Literate Programming with SGML and XML `_. The immediate predecessors to this **py-web-lp** tool are `FunnelWeb `_, `noweb `_ and `nuweb `_. The ideas lifted from these other tools created the foundation for **py-web-lp**. There are several Python-oriented literate programming tools. These include `LEO `_, `interscript `_, `lpy `_, `py2html `_, `PyLit-3 `_ The *FunnelWeb* tool is independent of any programming language and only mildly dependent on T\ :sub:`e`\ X. It has 19 commands, many of which duplicate features of HTML or L\ :sub:`a`\ T\ :sub:`e`\ X. The *noweb* tool was written by Norman Ramsey. This tool uses a sophisticated multi-processing framework, via Unix pipes, to permit flexible manipulation of the source file to tangle and weave the programming language and documentation markup files. The *nuweb* Simple Literate Programming Tool was developed by Preston Briggs (preston@tera.com). His work was supported by ARPA, through ONR grant N00014-91-J-1989. It is written in C, and very focused on producing L\ :sub:`a`\ T\ :sub:`e`\ X documents. It can produce HTML, but this is clearly added after the fact. It cannot be easily extended, and is not object-oriented. The *LEO* tool is a structured GUI editor for creating source. It uses XML and *noweb*\ -style chunk management. It is more than a simple weave and tangle tool. The *interscript* tool is very large and sophisticated, but doesn't gracefully tolerate HTML markup in the document. It can create a variety of markup languages from the interscript source, making it suitable for creating HTML as well as L\ :sub:`a`\ T\ :sub:`e`\ X. The *lpy* tool can produce very complex HTML representations of a Python program. It works by locating documentation markup embedded in Python comments and docstrings. This is called "inverted literate programming". The *py2html* tool does very sophisticated syntax coloring. The *PyLit-3* tool is perhaps the very best approach to Literate programming, since it leverages an existing lightweight markup language and it's output formatting. However, it's limited in the presentation order, making it difficult to present a complex Python module out of the proper Python required presentation. **py-web-lp** --------------- **py-web-lp** works with any programming language. It can work with any markup language, but is currently configured to work with RST. This philosophy comes from *FunnelWeb* *noweb*, *nuweb* and *interscript*. The primary differences between **py-web-lp** and other tools are the following. - **py-web-lp** is object-oriented, permitting easy extension. *noweb* extensions are separate processes that communicate through a sophisticated protocol. *nuweb* is not easily extended without rewriting and recompiling the C programs. - **py-web-lp** is built in the very portable Python programming language. This allows it to run anywhere that Python 3.3 runs, with only the addition of docutils. This makes it a useful tool for programmers in any language. - **py-web-lp** is much simpler than *FunnelWeb*, *LEO* or *Interscript*. It has a very limited selection of commands, but can still produce complex programs and HTML documents. - **py-web-lp** does not invent a complex markup language like *Interscript*. Because *Iterscript* has its own markup, it can generate L\ :sub:`a`\ T\ :sub:`e`\ X or HTML or other output formats from a unique input format. While powerful, it seems simpler to avoid inventing yet another sophisticated markup language. The language **py-web-lp** uses is very simple, and the author's use their preferred markup language almost exclusively. - **py-web-lp** supports the forward literate programming philosophy, where a source document creates programming language and markup language. The alternative, deriving the document from markup embedded in program comments ("inverted literate programming"), seems less appealing. The disadvantage of inverted literate programming is that the final document can't reflect the original author's preferred order of exposition, since that informtion generally isn't part of the source code. - **py-web-lp** also specifically rejects some features of *nuweb* and *FunnelWeb*. These include the macro capability with parameter substitution, and multiple references to a chunk. These two capabilities can be used to grow object-like applications from non-object programming languages (*e.g.* C or Pascal). Since most modern languages (Python, Java, C++) are object-oriented, this macro capability is more of a problem than a help. - Since **py-web-lp** is built in the Python interpreter, a source document can include Python expressions that are evaluated during weave operation to produce time stamps, source file descriptions or other information in the woven or tangled output. **py-web-lp** works with any programming language; it can work with any markup language. The initial release supports RST via simple templates. The following is extensively quoted from Briggs' *nuweb* documentation, and provides an excellent background in the advantages of the very simple approach started by *nuweb* and adopted by **py-web-lp**. The need to support arbitrary programming languages has many consequences: :No prettyprinting: Both ``WEB`` and ``CWEB`` are able to prettyprint the code sections of their documents because they understand the language well enough to parse it. Since we want to use *any* language, we've got to abandon this feature. However, we do allow particular individual formulas or fragments of L\ :sub:`a`\ T\ :sub:`e`\ X or HTML code to be formatted and still be part of the output files. :Limited index of identifiers: Because ``WEB`` knows about Pascal, it is able to construct an index of all the identifiers occurring in the code sections (filtering out keywords and the standard type identifiers). Unfortunately, this isn't as easy in our case. We don't know what an identifier looks like in each language and we certainly don't know all the keywords. We provide a mechanism to mark identifiers, and we use a pretty standard pattern for recognizing identifiers almost most programming languages. Of course, we've got to have some compensation for our losses or the whole idea would be a waste. Here are the advantages I [Briggs] can see: :Simplicity: The majority of the commands in ``WEB`` are concerned with control of the automatic prettyprinting. Since we don't prettyprint, many commands are eliminated. A further set of commands is subsumed by L\ :sub:`a`\ T\ :sub:`e`\ X and may also be eliminated. As a result, our set of commands is reduced to only about seven members (explained in the next section). This simplicity is also reflected in the size of this tool, which is quite a bit smaller than the tools used with other approaches. :No prettyprinting: Everyone disagrees about how their code should look, so automatic formatting annoys many people. One approach is to provide ways to control the formatting. Our approach is simpler -- we perform no automatic formatting and therefore allow the programmer complete control of code layout. :Control: We also offer the programmer reasonably complete control of the layout of his output files (the files generated during tangling). Of course, this is essential for languages that are sensitive to layout; but it is also important in many practical situations, *e.g.*, debugging. :Speed: Since [**py-web-lp**] doesn't do too much, it runs very quickly. It combines the functions of ``tangle`` and ``weave`` into a single program that performs both functions at once. :Chunk numbers: Inspired by the example of **noweb**, [**py-web-lp**] refers to all program code chunks by a simple, ascending sequence number through the file. This becomes the HTML anchor name, also. :Multiple file output: The programmer may specify more than one output file in a single [**py-web-lp**] source file. This is required when constructing programs in a combination of languages (say, Fortran and C). It's also an advantage when constructing very large programs. Acknowledgements ---------------- This application is very directly based on (derived from?) work that preceded this, particularly the following: - Ross N. Williams' *FunnelWeb* http://www.ross.net/funnelweb/ - Norman Ramsey's *noweb* http://www.eecs.harvard.edu/~nr/noweb/ - Preston Briggs' *nuweb* http://sourceforge.net/projects/nuweb/ Currently supported by Charles Martin and Marc W. Mengel Also, after using John Skaller's *interscript* http://interscript.sourceforge.net/ for two large development efforts, I finally understood the feature set I really wanted. Jason Fruit and others contributed to the previous version. .. py-web-tool/src/usage.w Installing ========== This requires Python 3.10. This is not (currently) hosted in PyPI. Instead of installing it with PIP, clone the GitHub repository or download the distribution kit. After downloading, install pyweb "manually" using the provided ``setup.py``. :: python setup.py install This will install the ``pyweb`` module. This depends on Jinja2 templates. The Jinja components should be installed when ``setup.py`` uses ``requirements.txt`` to install the required components. Using ===== **py-web-lp** supports two use cases, `Tangle Source Files`_ and `Weave Documentation`_. These are often combined to both tangle and weave an application and it's documentation. The work starts with creating a WEB file with documentation and code. Create WEB File ---------------- See `The py-web-lp Markup Language`_ for more details on the language. For a simple example, we'll use the following WEB file: ``examples/hw.w``. .. parsed-literal:: ########### Hello World ########### This file has a *small* example. @d The Body Of The Script @{ print("Hello, World!") @} The Python module includes a small script. @o hw.py @{ @ @} This example has RST markup document, that includes some ``@d`` and ``@o`` chunks to define code blocks. The ``@d`` is the definition of a named chunk, ``The Body Of The Script``. The ``@o`` defines an output file to be tangled. This file has a reference to the ``The Body Of The Script`` chunk. When tangling, the code will be used to build the file(s) in the ``@o`` chunk(s). In this example, it will write the ``hw.py`` file by tangling the referenced chunk. When weaving, the ``@d`` and ``@o`` chunks will have some additional RST markup inserted into the document. The output file will have a name based on the source WEB document. In this case it will be ``hw.rst``. Tangle Source Files ------------------- A user initiates this process when they have a complete ``.w`` file that contains a description of source files. These source files are described with ``@o`` commands in the WEB file. The use case is successful when the source files are produced. The use case is a failure when the source files cannot be produced, due to errors in the ``.w`` file. These must be corrected based on information in log messages. A typical command to tangle (without weaving) is: .. parsed-literal:: python -m pyweb -xw examples/hw.w -o examples The outputs will be defined by the ``@o`` commands in the source. The ``-o`` option writes the resulting tangled files to the named directory. Weave Documentation ------------------- A user initiates this process when they have a ``.w`` file that contains a description of a document to produce. The document is described by the entire WEB file. The default is to use ReSTructured Text (RST) markup. The output file will have the ``.rst`` suffix. The use case is successful when the documentation file is produced. The use case is a failure when the documentation file cannot be produced, due to errors in the ``.w`` file. These must be corrected based on information in log messages. A typical command to weave (without tangling) is: .. parsed-literal:: python -m pyweb -xt examples/hw.w -o examples The output will be named ``examples/hw.rst``. The ``-o`` option made sure the file was written to the ``examples`` directory. Running **py-web-lp** to Tangle and Weave ------------------------------------------- Assuming that you have marked ``pyweb.py`` as executable, you do the following: .. code:: bash python -m pyweb examples/hw.w -o examples This will tangle the ``@o`` commands in ``examples/hw.w`` It will also weave the output, and create ``examples/hw.rst``. This can be processed by docutils to create an HTML file. Command Line Options ~~~~~~~~~~~~~~~~~~~~~ Currently, the following command line options are accepted. :-v: Verbose logging. :-s: Silent operation. :-c *x*: Change the command character from ``@`` to ``*x*``. :-w *weaver*: Choose a particular documentation weaver template. Currently the choices are ``rst``, ``tex``, and ``html``. :-xw: Exclude weaving. This does tangling of source program files only. :-xt: Exclude tangling. This does weaving of the document file only. :-p *command*: Permit errors in the given list of commands. The most common version is ``-pi`` to permit errors in locating an include file. This is done in the following scenario: pass 1 uses ``-xw -pi`` to exclude weaving and permit include-file errors; the tangled program is run to create test results; pass 2 uses ``-xt`` to exclude tangling and include the test results. :-o *directory*: The directory to which to write output files. Bootstrapping -------------- **py-web-lp** is written using **py-web-lp**. The distribution includes the original ``.w`` files as well as a ``.py`` module. The bootstrap procedure is to run a "known good" ``pyweb`` to transform a working copy into a new version of ``pyweb``. We provide the previous release in the ``bootstrap`` directory. .. parsed-literal:: python bootstrap/pyweb.py pyweb.w rst2html.py pyweb.rst pyweb.html The resulting ``pyweb.html`` file is the updated documentation. The ``pyweb.py`` is the updated candidate release of **py-web-lp**. Similarly, the tests built from a ``.w`` files. .. code:: bash python pyweb.py tests/pyweb_test.w -o tests PYTHONPATH=.. pytest rst2html.py tests/pyweb_test.rst tests/pyweb_test.html Dependencies ------------- **py-web-lp** requires Python 3.10 or newer. The following components are listed in the ``requirements.txt`` file. These can be loaded via .. code:: bash python -m pip install -r requirements.txt This lp uses `Jinja `_ for template processing. The `tomli `_ library is used to parse configuration files for older Python that lack a ``tomllib`` in the standard library. If you create RST output, you'll want to use either `docutils `_ or `Sphinx `_ to translate the RST to HTML or LaTeX or any of the other formats supported by docutils or Sphinx. This is not a proper requirement to run the tool. It's a common part of an overall document production tool chain. The overview contains PlantUML diagrams. See https://plantuml.com/ for more information. The `PlantUML for Sphinx `_ plug-in can be used to render the diagrams automatically. For development, additional components like ``pytest``, ``tox``, and ``mypy`` are also used for development. More Advanced Usage =================== Here are two more advanced use cases. Tangle, Test, and Weave with Test Results ----------------------------------------- A user initiates this process when the final document should include test output from the source files created by the tangle operation. This is an extension to the example shown earlier. .. parsed-literal:: ########### Hello World ########### This file has a *small* example. @d The Body Of The Script @{ print("Hello, World!") @} The Python module includes a small script. @o hw.py @{ @ @} Example Output ============== @i examples/hw_output.log The use case is successful when the documentation file is produced, including current test output. The use case is a failure when the documentation file cannot be produced, due to errors in the ``.w`` file. These must be corrected based on information in log messages. The use case is a failure when the documentation file does not include current test output. The sequence is as follows: .. parsed-literal:: python -m pyweb -xw -pi examples/hw.w -o examples python examples/hw.py >examples/hw_output.log python -m pyweb -xt examples/hw.w -o examples The first step uses ``-xw`` to excludes document weaving. The ``-pi`` option will permits errors on the ``@i`` command. This is necessary in the event that the log file does not yet exist. The second step runs the test, creating a log file. The third step weaves the final document, including the test output file. The ``-xt`` option excludes tangling, since output file had already been produced. Template Changes ---------------- The woven document is based -- primarily -- on the text in the source WEB file. This is processed using a small set of Jinja2 macros to modify behavior. To fine-tune the results, we can adjust the templates used by this application. The easiest way to do this is to work with the ``weave.py`` script which shows how to create a customized subclass of ``Weaver``. The `Handy Scripts and Other Files`_ section shows this script and how it's build from a few ``pyweb`` components. .. py-web-tool/src/language.w The **py-web-lp** Markup Language ========================================== The essence of literate programming is a markup language that includes both code from documentation. For tangling, the code is relevant. For weaving, both code and documentation are relevant. The source document is a "Web" documentation that includes the code. It's important to see the ``.w`` file as the final documentation. The code is tangled out of the source web. The **py-web-lp** tool parses the ``.w`` file, and performs the tangle and weave operations. It *tangles* each individual output file from the program source chunks. It *weaves* the final documentation file file from the entire sequence of chunks provided, mixing the author's original documentation with some markup around the embedded program source. Concepts --------- The ``.w`` file has two tiers of markup in it. - At the top, it has **py-web-lp** markup to distinguish documentation chunks from code chunks. - Within the documentation chunks, there can be markup for the target publication tool chain. This might be RST, LaTeX, HTML, or some other markup language. The **py-web-lp** markup decomposes the source document a sequence of *Chunks*. .. uml:: object web object chunk object documentation object "source code" as code web *-- chunk chunk *-- documentation chunk *-- code The Web chunks have the following two overall sets of features: - Program source code to be *tangled* and *woven*. There are two important varieties: the "defined" chunks that are named, and the "output" chunks that define a file to be written. Program code chunks can have references to other defined code chunks. This permits created output files that tangled into a compiler-friendly order, separate from the presentation. - Documentation to be *woven*. These are the blocks of text between commands. The bulk of the file is typically documentation chunks that describe the program in some publication-oriented markup language like RST, HTML, or LaTeX. **py-web-lp** markup surrounds the code with "commands." Everything else is documentation. The code chunks have two transformations applied. - When Tangling, the indentation is adjusted to match the context in which they were originally defined. This assures that Python (which relies on indentation) parses correctly. For other languages, proper indentation is expected but not required. - When Weaving, selected characters can be quoted so they don't break the publication tool. For HTML, ``&``, ``<``, ``>`` are quoted properly. For LaTeX, a few escapes are used to avoid problems with the ``fancyvrb`` environment. The non-code, documentation chunks are not transformed up in any way. Everything that's not explicitly a code chunk is output without modification. All of the **py-web-lp** tags begin with ``@``. This is sometimes called the command prefix. (This can be changed.) The tags were historically referred to as "commands." For Python decorators in particular, the symbol must be doubled, ``@@``, because all ``@`` symbols are commands, irrespective of context. The *Structural* tags (historically called "major commands") partition the input and define the various chunks. The *Inline* tags are (called "minor commands") are used to control the woven and tangled output from the defined chunks. There are *Content* tags which generate summary cross-reference content in woven files. Boilerplate ----------- There is some mandatory "boilerplate" required to make a working document. Requirements vary by markup language. LaTeX ~~~~~ The LaTeX templates use ``\\fancyvrb``. The following is required. :: \\usepackage{fancyvrb} Some minimal boilerplate document looks like this: .. parsed-literal:: \documentclass{article} \usepackage{fancyvrb} \title{ *Title* } \author{ *Author* } \begin{document} \maketitle \tableofcontents *Your Document Starts Here* \end{document} HTML ~~~~ There's often a fairly large amount of HTML boilerplate. Currently, the templates used do **not** provide any CSS classes. For more sophisticated HTML documents, it may be necessary to provide customized templates with CSS classes to make the document look good. Structural Tags --------------- There are two definitional tags; these define the various chunks in an input file. ``@o`` *file* ``@{`` *text* ``@}`` The ``@o`` (output) command defines a named output file chunk. The text is tangled to the named file with no alteration. It is woven into the document in an appropriate fixed-width font. There are options available to specify comment conventions for the tangled output; this allows inclusion of source line numbers. ``@d`` *name* ``@{`` *text* ``@}`` The ``@d`` (define) command defines a named chunk of program source. This text is tangled or woven when it is referenced by the *reference* inline tag. There are options available to specify the indentation for this particular chunk. In rare cases, it can be helpful to override the indentation context. Each ``@o`` and ``@d`` tag is followed by a chunk which is delimited by ``@{`` and ``@}`` tags. At the end of that chunk, there is an optional "major" tag. ``@|`` A chunk may define user identifiers. The list of defined identifiers is placed in the chunk, separated by the ``@|`` separator. Additionally, these tags provide for the inclusion of additional input files. This is necessary for decomposing a long document into easy-to-edit sections. ``@i`` *file* The ``@i`` (include) command includes another file. The previous chunk is ended. The file is processed completely, then a new chunk is started for the text after the ``@i`` command. All material that is not explicitly in a ``@o`` or ``@d`` named chunk is implicitly collected into a sequence of anonymous document source chunks. These anonymous chunks form the backbone of the document that is woven. The anonymous chunks are never tangled into output program source files. They are woven into the document without any alteration. Note that white space (line breaks (``'\n'``), tabs and spaces) have no effect on the input parsing. They are completely preserved on output. The following example has three chunks: .. parsed-literal:: Some RST-format documentation that describes the following piece of the program. @o myFile.py @{ import math print( math.pi ) @| math math.pi @} Some more RST documentation. This starts with an anonymous chunk of documentation. It includes a named output chunk which will write to ``myFile.py``. It ends with an anonymous chunk of documentation. Inline Tags --------------- There are several tags that are replaced by content in the woven output. ``@@`` The ``@@`` command creates a single ``@`` in the output file. This is replaced in tangled as well as woven output. ``@<``\ *name*\ ``@>`` The *name* references a named chunk. When tangling, the referenced chunk replaces the reference command. When weaving, a reference marker is used. For example, in RST, this can be replaced with RST ```reference`_`` markup. Note that the indentation prior to the ``@<`` tag is preserved for the tangled chunk that replaces the tag. ``@(``\ *Python expression*\ ``@)`` The *Python expression* is evaluated and the result is tangled or woven in place. A few global variables and modules are available. These are described in `Expression Context`_. Content Tags --------------- There are three index creation tags that are replaced by content in the woven output. ``@f`` The ``@f`` command inserts a file cross reference. This lists the name of each file created by an ``@o`` command, and all of the various chunks that are concatenated to create this file. ``@m`` The ``@m`` command inserts a named chunk ("macro") cross reference. This lists the name of each chunk created by a ``@d`` command, and all of the various chunks that are concatenated to create the complete chunk. ``@u`` The ``@u`` command inserts a user identifier cross reference. This index lists the name of each chunk created by an ``@d`` command or ``@|``, and all of the various chunks that are concatenated to create the complete chunk. Additional Features ------------------- **Sequence Numbers**. The named chunks (from both ``@o`` and ``@d`` commands) are assigned unique sequence numbers to simplify cross references. **Case Sensitive**. Chunk names and file names are case sensitive. **Abbreviations**. Chunk names can be abbreviated. A partial name can have a trailing ellipsis (...), this will be resolved to the full name. The most typical use for this is shown in the following example: .. parsed-literal:: Some RST-format documentation. @o myFile.py @{ @ print(math.pi,time.time()) @} Some notes on the packages used. @d imports... @{ import math,time @| math time @} Some more RST-format documentation. This example shows five chunks. 1. An anonymous chunk of documentation. 2. A named chunk that tangles the ``myFile.py`` output. It has a reference to the ``imports of the various packages used`` chunk. Note that the full name of the chunk is essentially a line of documentation, traditionally done as a comment line in a non-literate programming environment. 3. An anonymous chunk of documentation. 4. A named chunk with an abbreviated name. The ``imports...`` matches the name ``imports of the various packages used``. Set off after the ``@|`` separator is the list of user-specified identifiers defined in this chunk. 5. An anonymous chunk of documentation. Note that the first time a name appears (in a reference or definition), it **must** be the full name. All subsequent uses can be elisions. Also not that ambiguous elision is an annoying problem when you first start creating a document. **Concatenation**. Named chunks are concatenated from their various pieces. This allows a named chunk to be broken into several pieces, simplifying the description. This is most often used when producing fairly complex output files. .. parsed-literal:: An anonymous chunk with some RST documentation. @o myFile.py @{ import math, time @} Some notes on the packages used. @o myFile.py @{ print(math.pi, time.time()) @} Some more HTML documentation. This example shows five chunks. 1. An anonymous chunk of documentation. 2. A named chunk that tangles the ``myFile.py`` output. It has the first part of the file. In the woven document this is marked with ``"="``. 3. An anonymous chunk of documentation. 4. A named chunk that also tangles the ``myFile.py`` output. This chunk's content is appended to the first chunk. In the woven document this is marked with ``"+="``. 5. An anonymous chunk of documentation. **Newline Preservation**. Newline characters are preserved on input. Because of this the output may appear to have excessive newlines. In all of the above examples, each named chunk was defined with the following. .. parsed-literal:: @{ import math, time @} This puts a newline character before and after the import line. Controlling Indentation ----------------------- We have two choices in indentation: - Context-Sensitive. - Consistent. If we have context-sensitive indentation, then the indentation of a chunk reference is applied to the entire chunk when expanded in place of the reference. This makes it simpler to prepare source for languages (like Python) where indentation is important. There are cases, however, when this is not desirable. There are some places in Python where we want to create long, triple-quoted strings with indentation that does not follow the prevailing indentations of the surrounding code. Here's how the context-sensitive indentation works. .. parsed-literal:: @o myFile.py @{ def aFunction(a, b): @ @| aFunction @} @d body... @{ """doc string""" return a + b @} The tangled output from this will look like the following. All of the newline characters are preserved, and the reference to *body of the aFunction* is indented to match the prevailing indent where it was referenced. In the following example, explicit line markers of ``~`` are provided to make the blank lines more obvious. .. parsed-literal:: ~ ~def aFunction(a, b): ~ ~ """doc string""" ~ return a + b ~ [The ``@|`` command shows that this chunk defines the identifier ``aFunction``.] This leads to a difficult design choice. - Do we use context-sensitive indentation without any exceptions? This is the current implementation. - Do we use consistent indentation and require the author to get it right? This seems to make Python awkward, since we might indent our outdent a ``@<`` *name* ``@>`` command, expecting the chunk to indent properly. - Do we use context-sensitive indentation with an exception indicator? This seems to go against the utter simplicity we're cribbing from **noweb**. However, it makes a great deal of sense to add an option for ``@d`` chunks to supersede context-sensitive indentation. The author must then get it right. The syntax to define a section looks like this: .. parsed-literal:: @d -noindent some chunk name @{*First partial line* *More that uses """* @} We might reference such a section like this. .. parsed-literal:: @d some bigger chunk... @{*code* @ @} This will include the ``-noindent`` section by resetting the contextual indentation to zero. The *First partial line* line will be output after the four spaces provided by the ``some bigger chunk`` context. After the first newline (*More that uses """*) will be at the left margin. Tracking Source Line Numbers ---------------------------- Since the tangled output files are -- well -- tangled, it can be difficult to trace back from a Python error stack to the original line in the ``.w`` file that needs to be fixed. To facilitate this, there is a two-step operation to get more detailed information on how tangling worked. 1. Use the -n command-line option to get line numbers. 2. Include comment indicators on the ``@o`` commands that define output files. The expanded syntax for ``@o`` looks like this. .. parsed-literal:: @o -start /* -end \*/ page-layout.css @{ *Some CSS code* @} We've added two options: ``-start /*`` and ``-end */`` which define comment start and end syntax. This will lead to comments embedded in the tangled output which contain source line numbers for every (every!) chunk. Expression Context ------------------- There are two possible implementations for evaluation of a Python expression in the input. 1. Create an ``ExpressionCommand``, and append this to the current ``Chunk``. This will allow evaluation during weave processing and during tangle processing. This makes the entire weave (or tangle) context available to the expression, including completed cross reference information. 2. Evaluate the expression during input parsing, and append the resulting text as a ``TextCommand`` to the current ``Chunk``. This provides a common result available to both weave and parse, but the only context available is the ``WebReader`` and the incomplete ``Web``, built up to that point. In this implementation, we adopt the latter approach, and evaluate expressions immediately. A global context is created with the following variables defined. :os.path: This is the standard ``os.path`` module. :os.getcwd: The complete ``os`` module is not available. Just this function. :datetime: This is the standard ``datetime`` module. :time: The standard ``time`` module. :platform: This is the standard ``platform`` module. :__builtins__: Most of the built-ins are available, too. Not all. ``exec()``, ``eval()``, ``open()`` and ``__import__()`` aren't available. :theLocation: A tuple with the file name, first line number and last line number for the original expression's location. :theWebReader: The ``WebReader`` instance doing the parsing. :theFile: The ``.w`` file being processed. :thisApplication: The name of the running **py-web-lp** application. It may not be pyweb.py, if some other script is being used. :__version__: The version string in the **py-web-lp** application. .. py-web-tool/src/overview.w Architecture and Design Overview ================================ This application breaks the overall problem of literate programming into the following sub-problems. 1. Representation of the WEB document as Chunks and Commands 2. Reading and parsing the input WEB document. 3. Weaving a document file. 4. Tangling the desired program source files. Here's the overall Context Diagram for this. .. uml:: left to right direction skinparam actorStyle awesome actor "Developer" as Dev rectangle PyWeb { usecase "Tangle Source" as UC_Tangle usecase "Weave Document" as UC_Weave } rectangle IDE { usecase "Create WEB" as UC_Create usecase "Run Tests" as UC_Test usecase "Build Documentation" as UC_Doc usecase "Build Application" as UC_App } database WEB component App folder Documentation Dev --> UC_Create Dev --> UC_Test Dev --> UC_Doc Dev --> UC_App UC_Create --> WEB WEB --> UC_Tangle WEB --> UC_Weave UC_Tangle --> App UC_Weave --> Documentation UC_Test ..> UC_Tangle UC_Doc ..> UC_Weave UC_App ..> UC_Tangle The idea here is a central WEB document contains both the application source code and the documentation that describes the code. The documentation can present information in an order that's meaningful and helpful to people; the tangling operation orders this for the benefit of compilers and tools. Since this is often part of an Integrated Development Environment (IDE), the container for all of these software components is the developer's desktop. (We don't need a diagram for that.) Here's a summary of the application-level components. These are the most visible libraries and command-line applications. .. uml:: component pyweb package jinja pyweb ..> jinja package templates pyweb *-- templates jinja ..> templates component weave weave ..> pyweb component tangle tangle ..> pyweb The ``weave`` and ``tangle`` are convenient scripts that import and customize the underlying ``pyweb`` application. We've used the dotted "depends-on" arrow to depict this. The ``pyweb`` application depends on Jinja2 to define the various templates for weaving the output documents. The ``pyweb`` application contains the templates; this is shown with a solid line. We can modify the templates to alter the look and feel. The supplied ``weave.py`` script shows how to do this. In many cases, the final production will multiple steps, as shown below: .. uml:: database WEB component pyweb artifact ".rst File" as RST component sphinx artifact ".html File" as HTML WEB --> pyweb pyweb --> RST RST --> sphinx sphinx --> HTML We can use **pyweb-lp** to create an ``.rst`` file with the documentation. This is then processed by Sphinx to inject a Sphinx theme and necessary CSS to make responsive web document(s). This is often automated with a ``Makefile``. Overall Structure ----------------- Generally, the code breaks into three functional areas - The core representation of a WEB. - A parser to read the source WEB. - The emitters to produce woven and tangled output, which include weavers and tanglers. We could depict it as follows: .. uml:: folder core { class Web class Chunk abstract class Command Web *-- "1..*" Chunk Chunk *-- "1..*" Command } folder parser { class WebReader WebReader --> Web } folder emitters { abstract class Emitter class Tangler class Weaver Emitter <|-- Tangler Emitter <|-- Weaver Emitter --> Web } We'll look at the core model, first. Core WEB Representation ----------------------- The basic structure has three layers, as shown in the following diagram: .. uml:: class Web << dataclass >> { chunks: list[Chunk] } class Chunk { name: str commands: list[Command] } abstract class Command Web *-- "1..*" Chunk Chunk *-- "1..*" Command class CodeChunk Chunk <|-- CodeChunk class NamedChunk Chunk <|-- NamedChunk class OutputChunk Chunk <|-- OutputChunk class NamedCodeChunk Chunk <|-- NamedCodeChunk class TextCommand Command <|-- TextCommand class CodeCommand Command <|-- CodeCommand class ReferenceCommand Command <|-- ReferenceCommand class XRefCommand Command <|-- XRefCommand class FileXRefCommand XRefCommand <|-- FileXRefCommand class MacroXRefCommand XRefCommand <|-- MacroXRefCommand class UseridXRefCommand XRefCommand <|-- UseridXRefCommand The source document is transformed into a ``Web``, which is the overall container. The source is decomposed into a sequence of ``Chunk`` instances. Each ``Chunk`` is a sequence of ``Commands``. ``Chunk`` objects and ``Command`` objects cannot be nested, leading to delightful simplification. The overall ``Web`` includes both the original sequence of ``Chunk`` objects as well as an index for the named ``Chunk`` instances. Note that a named chunk may be created through a number of ``@d`` commands. This means that each named ``Chunk`` may be a sequence of definitions sharing a common name. They are concatenated in order to permit decomposing a single concept into sequentially described pieces. The various layers of ``Web``, ``Chunk``, and ``Command`` each have attributes designed to be usable by a Jinja template when weaving output. When tangling, however, the only attribute that matters is the text contained in the ``@{`` and ``@}`` brackets. This makes tangling somewhat simpler than weaving. There is a small interaction between a ``Tangler`` and each ``Chunk`` to work out the indentation. based in the context in which a ``@< name @>`` reference occurs. Reading and Parsing -------------------- .. uml:: class Web class WebReader { parse(source) : Web } WebReader ..> Web class Tokenizer WebReader ..> Tokenizer class OptionParser class OptionDef OptionParser *-- OptionDef WebReader ..> OptionParser A solution to the reading and parsing problem depends on a convenient tool for breaking up the input stream and a representation for the chunks of input and the sequence of commands. Input decomposition is done with something we might call the **Splitter** design pattern. The **Splitter** pattern is widely used in text processing, and has a long legacy in a variety of languages and libraries. A **Splitter** decomposes a string into a sequence of strings using some split pattern. There are many variant implementations. For example, one variant locates only a single occurence (usually the left-most); this is commonly implemented as a Find or Search string function. Another variant locates all occurrences of a specific string or character, and discards the matching string or character. The variation on **Splitter** in this application creates each element in the resulting sequence as either (1) an instance of the split regular expression or (2) the text between split patterns. We define our splitting pattern with the regular expression ``'@.|\n'``. This will split on either of these patterns: - ``@`` followed by a single character, - or, a newline. For the most part, ``\n`` is only text, and as almost no special significance. The exception is the ``@i`` *filename* command, which ends at the end of the line, making the ``\n`` significant syntax in this case. We could be more specific with the following as a split pattern: ``'@[doOifmu\|<>(){}\[\]]|\n'``. This would silently ignore unknown commands, merging them in with the surrounding text. This would leave the ``'@@'`` sequences completely alone, allowing us to replace ``'@@'`` with ``'@'`` in every text chunk. It's not clear this additional level of detail is helpful. Within the ``@d`` and ``@o`` commands, there is a name and options. These follow the syntax rules for Tcl or the shell. Optional fields are prefaced with ``-``. All options must come before all positional arguments. The positional arguments provide the name being defined. In effect, the name is ``' '.join(args.split(' ')``; this means multiple adjacent spaces in a name will be collapsed to a single space. Emitters -------- There are two possible outputs: - A woven document. - One or more tangled source files. The overall structure of the classes is shown in the following diagram. .. uml:: class Web abstract class Emitter { emit(web) } Emitter ..> Web class Weaver Emitter <|-- Weaver class Tangler Emitter <|-- Tangler class TanglerMake Tangler <|-- TanglerMake abstract class ReferenceStyle Weaver --> ReferenceStyle class Simple ReferenceStyle <|-- Simple class Transitive ReferenceStyle <|-- Transitive class Template Weaver --> Template class "Jinja Macro" as macro Template *-- macro We'll look at the weaving activity first, then the tangling activity. Weaving --------- The weaving activity depends on having a target document markup language. There are several approaches to this problem. - We can use a markup language unique to **py-web-lp**. This would hide the final target markup language. It would mean that **py-web-lp** would be equivalent to a tool like Pandoc, producing a variety of target markup languages from a single, common source. - We can use any of the existing markup languages (HTML, RST, Markdown, LaTeX, etc.) expand snippets of markup into author-supplied markup to create the target woven document. The problem with the first method is defining yet-another-markup-language. This seems needlessly complex. The problem with the second method is the source WEB file is a mixture of the following two things: - The background document in some standard markup and - The code elements. The code elements must be set off from the background text via some markup. In languages like RST and Markdown, there's a small textual wrapper around code samples. In languages like HTML, the wrapper can be much more complex. Also, certain code characters may need to be properly escaped if the code sample happens to contain markup that should **not** be processed, but treated as literal text. The author should not be foreced to repeat the wrappers around each code examples. This should be delegated to the literate programming tool. Further, the author should not be narrowly constrained by the markup injected by the weaving process; the weaver should be extensible to add features. This leads to using the **Facade** design pattern. The weaver is a **Facade** over the Jinja template engine. The tool provides default templates in RST, HTML, and LaTeX. These can be replaced; new templates can be added. The templates used to wrap code sections can be tweaked relatively easily. Tangling ---------- The tangling activity produces output files. In other tools, some care was taken to understand the source code context for tangling, and provide a correct indentation. This required a command-line parameter to turn off indentation for languages like Fortran, where identation is not used. In **py-web-lp**, there are two options. The default behavior is that the indent of a ``@< name @>`` command is used to set the indent of the material is expanded in place of this reference. If all ``@<`` commands are presented at the left margin, no indentation will be done. This is helpful simplification, particularly for users of Python, where indentation is significant. In rare cases, we might need both, and a ``@d`` chunk can override the indentation rule to force the material to be placed at the left margin. Application ------------ The overall application has the following layers to it: - An ``Action`` class hierarchy that includes the actions of Load, Tangle, and Weave. - An overall ``Application`` class that executes the actions. - A top-level main function parses the command line, creates and configures the actions, and executes the sequence of actions. The idea is that the Weaver Action should be visible to tools like `PyInvoke `_. We want ``Weave("someFile.w")`` to be a sensible task. .. uml:: abstract class Action class ActionSequence Action <|-- ActionSequence ActionSequence *-- "2..m" Action class LoadAction Action <|-- LoadAction class WeaveAction Action <|-- WeaveAction class TangleAction Action <|-- TangleAction class Application Application *-- Action This shows the essential structure of the top-level classes. .. py-web-tool/src/impl.w Implementation ============== The implementation is contained in a single Python module defining the all of the classes and functions, as well as an overall ``main()`` function. The ``main()`` function uses these base classes to weave and tangle the output files. The broad outline of the presentation is as follows: - `Base Classes`_ that define a model for the ``.w`` file. - `Web Class`_ contains the overall Web of Chunks. A Web is a sequence of `Chunk` objects. It's also a mapping from chunk name to definition. - `Chunk Class Hierarchy`_ are pieces of the source document, built into a Web. A ``Chunk`` is a collection of ``Command`` instances. This can be either an anonymous chunk that will be sent directly to the output, or a named chunks delimited by the structural ``@d`` or ``@o`` commands. - `Command Class Hierarchy`_ are the items within a ``Chunk``. The text and the inline ``@`` references are the principle command classes. Additionally, there are some cross reference commands (``@f``, ``@m``, or ``@u``). - `Output Serialization`_. This is the ``Emitter`` class hierarchy writes various kinds of files. These decompose into two subclasses: - A ``Tangler`` creates source code. - A ``Weaver`` creates documentation. The various Jinja-based templates are part of weaving. - `Input Parsing`_ covers deserialization from the source ``.w`` file to the base model of ``Web``, ``Chunk``, and ``Command``. - `The WebReader class`_ which parses the Web structure. - `The Tokenizer class`_ which tokenizes the raw input. - Other application components: - `Error Class`_ defines an application-specific exception. This covers all of the various kinds of problems that might arise. - `Action class hierarchy`_ defines things this program does. - `The Application class`_. This is an overall class definition that includes command line parsing, picking an Action, configuring and executing the Action. It could be a set of related functions, but we've bound them into a class. - `Logging setup`_. This includes a simple context manager for logging. - `The Main Function`_. - `pyWeb Module File`_ defines the final module file that contains the application. We'll start with the base classes that define the data model for the source WEB of chunks. Base Classes ------------- Here are some of the base classes that define the structure and meaning of a ``.w`` source file. .. _`Base Class Definitions (1)`: .. rubric:: Base Class Definitions (1) = .. parsed-literal:: :class: code → `Command class hierarchy -- used to describe individual commands in a chunk (10)`_ → `Chunk class hierarchy -- used to describe individual chunks (8)`_ → `Web class -- describes the overall "web" of chunks (3)`_ .. .. container:: small ∎ *Base Class Definitions (1)*. Used by → `pyweb.py (79)`_. The above order is reasonably helpful for Python and minimizes forward references. The ``Chunk``, ``Command``, and ``Web`` instances do have a circular relationship, making a strict ordering a bit complex. We'll start at the central collection of information, the ``Web`` class of objects. Web Class ~~~~~~~~~ The overall web of chunks is contained in a single instance of the ``Web`` class that is the principle parameter for the weaving and tangling actions. Broadly, the functionality of a Web can be separated into the folloowing areas: - It is constructed by a ``WebReader``. - It also supports "enrichment" of the web, once all the ``Chunk`` instances are known. This is a stateful update to the web. Each ``Chunk`` is updated with references it makes as well as references to it. - It supports ``Chunk`` cross-reference methods that traverse this enriched data. This includes a kind of validity check to be sure that everything is used once and once only. Fundamentally, a ``Web`` is a hybrid list+mapping. It as the following features: - It's a ``Sequence`` to retain all ``Chunk`` instances in order. - It's a mapping of name-to-Chunk that also offers a moderately sophisticated lookup, including exact match for a ``Chunk`` name and an approximate match for a an abbreviated name. The ``Web`` is built by the parser by loading the sequence of ``Chunk`` instances. Note that the WEB source language has a "mixed content model". This means the code chunks have specific tags with names. The text, on the other hand, is interspersed among the code chunks. The text belongs to implicit, unnamed text chunks. A web instance has a number of attributes. :chunks: the sequence of ``Chunk`` instances as seen in the input file. To support anonymous chunks, and to assure that the original input document order is preserved, we keep all chunks in a master sequential list. :files: the ``@o`` named ``OutputChunk`` chunks. Each element of this dictionary is a sequence of chunks that have the same name. The first is the initial definition (marked with "="), all others a second definitions (marked with "+="). :macros: the ``@d`` named ``NamedChunk`` chunks. Each element of this dictionary is a sequence of chunks that have the same name. The first is the initial definition (marked with "="), all others a second definitions (marked with "+="). :userids: the cross reference of chunks referenced by commands in other chunks. This relies on the way a ``@dataclass`` does post-init processing. One the raw sequence of ``Chunks`` has been presented, some additional processing is done to link each ``Chunk`` to the web. This permits the ``full_name`` property to expand abbreviated names to full names, and, consequently, chunk references. .. _`Imports (2)`: .. rubric:: Imports (2) = .. parsed-literal:: :class: code from collections import defaultdict from collections.abc import Iterator from dataclasses import dataclass, field from functools import cache import logging from pathlib import Path from types import SimpleNamespace from typing import Any, Optional, Literal, ClassVar, Union from weakref import ref, ReferenceType .. .. container:: small ∎ *Imports (2)*. Used by → `pyweb.py (79)`_. The class defines one visible element of a ``Web`` instance, the ``chunks`` list of ``Chunk`` instances. From this list of ``Chunk`` objects, the remaining internal objects are built. These include the following: - ``chunk_map`` has the mapping of chunk names to list of chunks that provide the definition for the chunk. - ``userid_map`` has the mapping of user-defined names to the list of chunks that define the name. - ``references`` is the set of all referenced chunks. Additionally there are attributes to contain a logger, a reference to the WEB file path, used to evaluate expressions, and a "strict-match" option that can report errors during name resolution. Disabling this will allow documents to be tangled that are potentially incomplete. Generally, a parser will create a list of ``Chunk`` objects. From this, the parser can creates the final ``Web``. .. _`Web class -- describes the overall "web" of chunks (3)`: .. rubric:: Web class -- describes the overall "web" of chunks (3) = .. parsed-literal:: :class: code @dataclass class Web: chunks: list["Chunk"] #: The source sequence of chunks. # The \`\`@d\`\` chunk names and locations where they're defined. chunk\_map: dict[str, list["Chunk"]] = field(init=False) # The \`\`@\|\`\` defined names and chunks with which they're associated. userid\_map: defaultdict[str, list["Chunk"]] = field(init=False) logger: logging.Logger = field(init=False, default=logging.getLogger("Web")) web\_path: Path = field(init=False) #: Source WEB file; set by \`\`\`WebParse\`\` strict\_match: ClassVar[bool] = True #: Report ... names without a definition. .. .. container:: small ∎ *Web class -- describes the overall "web" of chunks (3)*. Used by → `Base Class Definitions (1)`_. The ``__post_init__()`` special method populates the detailed structure of the WEB document. There are several passes through the WEB to digest the data: 1. Set all ``Chunk`` and ``Command`` back references to the ``Web`` container. This is required so a ``Chunk`` with a ``ReferenceCommand`` instance can properly refer to a chunk elsewhere in the ``Web`` container. There are all weak references to faciliate garbgage collection. 2. Locate the unabbreviated names in chunks and references to chunks. Names can found in two places. The ``@d`` command provides a name. A ``@`` command can also provide a reference to a name. The unabbreviated names define the structure. Unambiguous abbreviations can be used freely, since full names are located first. 3. Accumulate chunk lists, output lists, and name definition lists. This pass does two things. First any user-defined name after a ``@|`` command is accumulated. Second, any abbreviated name is resolved to the full name, and the complete mapping from chunk name to a sequence of defining chunks is completed. 4. Set the ``referencedBy`` attribute of a ``Chunk`` instance with all of the commands that point to it. The idea here is that a top-level ``Chunk`` instance may have references to other ``Chunk`` isntances. This forms a kind of tree. Any given low-level ``Chunk`` object is named by a sequence of parent ``Chunk`` objects. Once the initialization is complete, the ``Web`` instance can be woven or tangled. .. _`Web class -- describes the overall "web" of chunks (4)`: .. rubric:: Web class -- describes the overall "web" of chunks (4) += .. parsed-literal:: :class: code def \_\_post\_init\_\_(self) -> None: """ Populate weak references throughout the web to make full\_name properties work. Then. Locate all macro definitions and userid references. """ # Pass 1 -- set all Chunk and Command back references. for c in self.chunks: c.web = ref(self) for cmd in c.commands: cmd.web = ref(self) # Named Chunks = Union of macro\_iter and file\_iter named\_chunks = list(filter(lambda c: c.name is not None, self.chunks)) # Pass 2 -- locate the unabbreviated names in chunks and references to chunks. self.chunk\_map = {} for seq, c in enumerate(named\_chunks, start=1): c.seq = seq if not c.path: # Use \`\`@d name\`\` chunks (reject \`\`@o\`\` and text) if c.name and not c.name.endswith('...'): self.logger.debug(f"\_\_post\_init\_\_ 2a {c.name=!r}") self.chunk\_map.setdefault(c.name, []) for cmd in c.commands: # Find \`\`@< name @>\`\` in \`\`@d name\`\` chunks or \`\`@o\`\` chunks if cmd.has\_name: if not cast(ReferenceCommand, cmd).name.endswith('...'): self.logger.debug(f"\_\_post\_init\_\_ 2b {cast(ReferenceCommand, cmd).name=!r}") self.chunk\_map.setdefault(cast(ReferenceCommand, cmd).name, []) # Pass 3 -- accumulate chunk lists, output lists, and name definition lists. self.userid\_map = defaultdict(list) for c in named\_chunks: for name in c.def\_names: self.userid\_map[name].append(c) if not c.path: # Named \`\`@d name\`\` chunks if full\_name := c.full\_name: c.initial = len(self.chunk\_map[full\_name]) == 0 self.chunk\_map[full\_name].append(c) self.logger.debug(f"\_\_post\_init\_\_ 3 {c.name=!r} -> {c.full\_name=!r}") else: # Output \`\`@o\`\` and anonymous chunks. # Assume all @o chunks are unique. If they're not, they overwrite each other. # Also, there's not \`\`full\_name\`\` for these chunks. c.initial = True # TODO: Accumulate all chunks that contribute to a named file... # Pass 4 -- set referencedBy a command in a chunk. # ONLY set this in references embedded in named chunk or output chunk. # In a generic Chunk (which is text) there's no anchor to refer to. # NOTE: Assume single references \*only\* # We should raise an exception when updating a non-None referencedBy value. # Or incrementing ref\_chunk.references > 1. for c in named\_chunks: for cmd in c.commands: if cmd.has\_name: ref\_to\_list = self.resolve\_chunk(cast(ReferenceCommand, cmd).name) for ref\_chunk in ref\_to\_list: ref\_chunk.referencedBy = c ref\_chunk.references += 1 .. .. container:: small ∎ *Web class -- describes the overall "web" of chunks (4)*. Used by → `Base Class Definitions (1)`_. The representation of a ``Web`` instance is a sequence of ``Chunk`` instances. This can be long and difficult to read. It is, however, complete, and can be used to build instances of ``Web`` objects from a variety of sources. .. _`Web class -- describes the overall "web" of chunks (5)`: .. rubric:: Web class -- describes the overall "web" of chunks (5) += .. parsed-literal:: :class: code def \_\_repr\_\_(self) -> str: NL = ",\\n" return ( f"{self.\_\_class\_\_.\_\_name\_\_}(" f"{NL.join(repr(c) for c in self.chunks)}" f")" ) .. .. container:: small ∎ *Web class -- describes the overall "web" of chunks (5)*. Used by → `Base Class Definitions (1)`_. Name and Chunk resolution are similar. Name resolution provides only the expanded name. Chunk resolution provides the list of chunks that define a name. Chunk resolution expands on the basic features of Name resolution. The complex ``target.endswith('...')`` processing only happens once during ``__post_init__()`` processing. After the initalization is complete, all ``ReferenceCommand`` objects will have a ``full_name`` attribute that avoids the complication of resolving a name with a ``...`` ellipsis. .. _`Web class -- describes the overall "web" of chunks (6)`: .. rubric:: Web class -- describes the overall "web" of chunks (6) += .. parsed-literal:: :class: code def resolve\_name(self, target: str) -> str: """Map short names to full names, if possible.""" if target in self.chunk\_map: # self.logger.debug(f"resolve\_name {target=} in self.chunk\_map") return target elif target.endswith('...'): # The ... is equivalent to regular expression .\* matches = list( c\_name for c\_name in self.chunk\_map if c\_name.startswith(target[:-3]) ) match : str # self.logger.debug(f"resolve\_name {target=} {matches=} in self.chunk\_map") match matches: case []: if self.strict\_match: raise Error(f"No full name for {target!r}") else: self.logger.warning(f"resolve\_name {target=} unknown") self.chunk\_map[target] = [] match = target case [head]: match = head case [head, \*tail]: message = f"Ambiguous abbreviation {target!r}, matches {[head] + tail!r}" raise Error(message) return match else: self.logger.warning(f"resolve\_name {target=} unknown") self.chunk\_map[target] = [] return target def resolve\_chunk(self, target: str) -> list["Chunk"]: """Map name (short or full) to the defining sequence of chunks.""" full\_name = self.resolve\_name(target) chunk\_list = self.chunk\_map[full\_name] self.logger.debug(f"resolve\_chunk {target=!r} -> {full\_name=!r} -> {chunk\_list=}") return chunk\_list .. .. container:: small ∎ *Web class -- describes the overall "web" of chunks (6)*. Used by → `Base Class Definitions (1)`_. The point of the ``Web`` object is to be able to manage a variety of structures. These iterator methods and properties provide the list of ``@o`` chunks, ``@d`` chunks, and the usernames after ``@|`` in a chunk. Additionally, we can confirm the overall structure by asserting that each ``@d`` name has one reference. A name with no references indicates an omission, a name with multiple references suggests a spelling or ellipsis problem. .. _`Web class -- describes the overall "web" of chunks (7)`: .. rubric:: Web class -- describes the overall "web" of chunks (7) += .. parsed-literal:: :class: code def file\_iter(self) -> Iterator[OutputChunk]: return (cast(OutputChunk, c) for c in self.chunks if c.type\_is("OutputChunk")) def macro\_iter(self) -> Iterator[NamedChunk]: return (cast(NamedChunk, c) for c in self.chunks if c.type\_is("NamedChunk")) def userid\_iter(self) -> Iterator[SimpleNamespace]: yield from (SimpleNamespace(def\_name=n, chunk=c) for c in self.file\_iter() for n in c.def\_names) yield from (SimpleNamespace(def\_name=n, chunk=c) for c in self.macro\_iter() for n in c.def\_names) @property def files(self) -> list["OutputChunk"]: return list(self.file\_iter()) @property def macros(self) -> list[SimpleNamespace]: """ The chunk\_map has the list of Chunks that comprise a macro definition. We separate those to make it slightly easier to format the first definition. """ first\_list = ( (self.chunk\_map[name][0], self.chunk\_map[name]) for name in sorted(self.chunk\_map) if self.chunk\_map[name] ) macro\_list = list( SimpleNamespace(name=first\_def.name, full\_name=first\_def.full\_name, seq=first\_def.seq, def\_list=def\_list) for first\_def, def\_list in first\_list ) # self.logger.debug(f"macros: {defs}") return macro\_list @property def userids(self) -> list[SimpleNamespace]: userid\_list = list( SimpleNamespace(userid=userid, ref\_list=self.userid\_map[userid]) for userid in sorted(self.userid\_map) ) # self.logger.debug(f"userids: {userid\_list}") return userid\_list def no\_reference(self) -> list[Chunk]: return list(filter(lambda c: c.name and not c.path and c.references == 0, self.chunks)) def multi\_reference(self) -> list[Chunk]: return list(filter(lambda c: c.name and not c.path and c.references > 1, self.chunks)) .. .. container:: small ∎ *Web class -- describes the overall "web" of chunks (7)*. Used by → `Base Class Definitions (1)`_. A ``Web`` instance is built by a ``WebReader``. It's used by an ``Emitter``, including a ``Weaver`` as well as a ``Tangler``. A ``Web`` is composed of individual ``Chunk`` instances. Chunk Class Hierarchy ~~~~~~~~~~~~~~~~~~~~~ A ``Chunk`` is a piece of the input file. It is a collection of ``Command`` instances. A ``Chunk`` can be woven or tangled to create output. .. uml:: class Chunk { name: str seq: int commands: list[Command] options: list[str] def_names: list[str] initial: bool } class OutputChunk Chunk <|-- OutputChunk class NamedChunk Chunk <|-- NamedChunk These subclasss reflect three kinds of content in the WEB source document: - ``Chunk`` is the anonymous text context. Text in the body generally becomes a ``TextCommand``. Also, the various XREF commands (``@m``, ``@f``, ``@u``) can *only* appear here. In principle, a ``@< reference @>`` can appear in text. It must name a ``@d name @[...@]`` NamedDocumentChunk, which is expanded in place, not linked. - ``OutputChunk`` is the ``@o`` context. Text in the body becomes a ``CodeCommand``. Any ``@< reference @>`` will be expanded when tangling, but become a link when weaving. This defines an output file. - ``NamedChunk`` is the ``@d`` context. Text in the body becomes a ``CodeCommand``. Any ``@< reference @>`` will be expanded when tangling, but become a link when weaving. Most of the attributes are pushed up to the superclass. This makes type checking the complex WEB tree much simpler. The attributes are visible to the Jinja templates. In particular the sequence number, ``seq``, and the initial definition indicator, ``initial``, are often used to customize presentation of the woven content. A ``type_is()`` method is used to discern the various subtypes. This slightly simplifies the work done by a template. It's not easy to rely on proper inheritance because the templates are implemented in a separate language with their own processing rules. .. _`Chunk class hierarchy -- used to describe individual chunks (8)`: .. rubric:: Chunk class hierarchy -- used to describe individual chunks (8) = .. parsed-literal:: :class: code @dataclass class Chunk: """Superclass for OutputChunk, NamedChunk, NamedDocumentChunk. """ #: Short name of the chunk. name: str \| None = None #: Unique sequence number of chunk in the WEB. seq: int \| None = None #: Sequence of commands inside this chunk. commands: list["Command"] = field(default\_factory=list) #: Parsed options for @d and @o chunks. options: list[str] = field(default\_factory=list) #: Names defined after \`\`@\|\`\` in this chunk. def\_names: list[str] = field(default\_factory=list) #: Is this the first use of a given Chunk name? initial: bool = False #: If injecting location details whenm tangling, this is the comment prefix. comment\_start: str \| None = None #: If injecting location details, this is the comment suffix. comment\_end: str \| None = None #: Count of references to this Chunk. references: int = field(init=False, default=0) #: The immediate reference to this chunk. referencedBy: Optional["Chunk"] = field(init=False, default=None) #: Weak reference to the \`\`Web\`\` containing this \`\`Chunk\`\`. web: ReferenceType["Web"] = field(init=False, repr=False) #: Logger for any chunk-specific messages. logger: logging.Logger = field(init=False, default=logging.getLogger("Chunk")) @property def full\_name(self) -> str \| None: if self.name: return cast(Web, self.web()).resolve\_name(self.name) else: return None @property def path(self) -> Path \| None: return None @property def location(self) -> tuple[str, int]: return self.commands[0].location @property def transitive\_referencedBy(self) -> list["Chunk"]: if self.referencedBy: return [self.referencedBy] + self.referencedBy.transitive\_referencedBy else: return [] def add\_text(self, text: str, location: tuple[str, int]) -> "Chunk": if self.commands and self.commands[-1].typeid.TextCommand: cast(HasText, self.commands[-1]).text += text else: # Empty list OR previous command was not \`\`TextCommand\`\` self.commands.append(TextCommand(text, location)) return self def type\_is(self, name: str) -> bool: """ Instead of type name matching, we could check for these features: - has\_code() (i.e., NamedChunk and OutputChunk) - has\_text() (i.e., Chunk and NamedDocumentChunk) This is for template rendering, where proper Liskov Substitution is irrelevant. """ return self.\_\_class\_\_.\_\_name\_\_ == name .. .. container:: small ∎ *Chunk class hierarchy -- used to describe individual chunks (8)*. Used by → `Base Class Definitions (1)`_. The subclasses do little more than partition thd Chunks in a way that permits customization in the template rendering process. An ``OutputChunk`` is distinguished from a ``NamedChunk`` by having a ``path`` property and not having a ``full_name`` property. .. _`Chunk class hierarchy -- used to describe individual chunks (9)`: .. rubric:: Chunk class hierarchy -- used to describe individual chunks (9) += .. parsed-literal:: :class: code class OutputChunk(Chunk): """An output file.""" @property def path(self) -> Path \| None: if self.name: return Path(self.name) else: return None @property def full\_name(self) -> str \| None: return None def add\_text(self, text: str, location: tuple[str, int]) -> Chunk: if self.commands and self.commands[-1].typeid.CodeCommand: cast(HasText, self.commands[-1]).text += text else: # Empty list OR previous command was not \`\`CodeCommand\`\` self.commands.append(CodeCommand(text, location)) return self class NamedChunk(Chunk): """A defined name with code.""" def add\_text(self, text: str, location: tuple[str, int]) -> Chunk: if self.commands and self.commands[-1].typeid.CodeCommand: cast(HasText, self.commands[-1]).text += text else: # Empty list OR previous command was not \`\`CodeCommand\`\` self.commands.append(CodeCommand(text, location)) return self class NamedChunk\_Noindent(Chunk): """A defined name with code and the -noIndent option.""" pass class NamedDocumentChunk(Chunk): """A defined name with text.""" pass .. .. container:: small ∎ *Chunk class hierarchy -- used to describe individual chunks (9)*. Used by → `Base Class Definitions (1)`_. Command Class Hierarchy ~~~~~~~~~~~~~~~~~~~~~~~ A ``Chunk`` is a sequence of ``Command`` instances. For the generic ``Chunk`` superclass, the commands are -- mostly -- the ``TextCommand`` subclass of ``Command``. These are blocks of text. A ``Chunk`` may also include some ``XRefCommand`` instances which expand to cross-reference material for an index. For the ``CodeChunk`` and ``NamedChunk`` subclasses, the commands are ``CodeCommand`` instances intermixed with ``ReferenceCommand`` instances. A ``CodeCommand`` has a wrapper when weaving. Additionally, it will tangled into the output. A ``ReferenceCommand`` becomes a link when weaving, and expands to it's full body when being tangled. .. uml:: class Chunk { name: str commands: list[Command] } abstract class Command { {static} has_name: bool {static} has_text: bool {static} typeid: TypeId text: str tangle(Tangler, Target) } Chunk *-- "1..*" Command abstract HasText Command <|-- HasText class TextCommand HasText <|-- TextCommand class CodeCommand HasText <|-- CodeCommand class ReferenceCommand Command <|-- ReferenceCommand abstract XRefCommand Command <|-- XRefCommand class FileXRefCommand XRefCommand <|-- FileXRefCommand class MacroXRefCommand XRefCommand <|-- MacroXRefCommand class UseridXRefCommand XRefCommand <|-- UseridXRefCommand class TypeId { __getattr__(str) : bool } Command -- TypeId Each of these variants has the possibility of distinct processing when weaving the final document. The type information must be visibile to the Jinja template processing. This is done through an instance of the ``TypeId`` class attached to each of these classes. The input stream is broken into individual commands, based on the various ``@``\ *x* strings in the file. There are several subclasses of ``Command``, each used to describe a different command or block of text in the input. All instances of the ``Command`` class are created by the ``WebReader`` instance. In this case, a ``WebReader`` can be thought of as a factory for ``Command`` instances. Each ``Command`` instance is appended to the sequence of commands that belong to a ``Chunk``. This model permits two kinds of serialization: - Weaving a document from the WEB source file. This uses the various attributes of the various subclasses. - Tangling target documents with code. This relies on a ``tangle()`` method in each subclass. We'll address the run-time type identification first, the the definitions of the various ``Command`` subclasses. .. _`Command class hierarchy -- used to describe individual commands in a chunk (10)`: .. rubric:: Command class hierarchy -- used to describe individual commands in a chunk (10) = .. parsed-literal:: :class: code → `The TypeId Class -- to help the template engine (12)`_ → `The Command Abstract Base Class (13)`_ → `The HasText Type Hint -- used instead of another abstract class (14)`_ → `The TextCommand Class (15)`_ → `The CodeCommand Class (16)`_ → `The ReferenceCommand Class (17)`_ → `The XrefCommand Subclasses -- files, macros, and user names (18)`_ .. .. container:: small ∎ *Command class hierarchy -- used to describe individual commands in a chunk (10)*. Used by → `Base Class Definitions (1)`_. The TypeId Class **************** The ``TypeId`` class provides run-time type identification to the Jinja templates. The idea is ``object.typeid.AClass`` is equivalent to ``isinstance(object, pyweb.AClass)``. It has simpler syntax and works better with Jinja templates. It helps sort out the various nodes of the AST built from the source WEB document. There are three parts to the ``TypeId`` implementation: - A ``TypeId`` class definition to handle the attribute access. A reference to ``object.typeid.Name`` evaluates ``__getattr__(object, 'Name')``. - A metaclass definition, ``TypeIdMeta``, to inject the new ``typeid`` attribute into each class. - The normal class initialization process, which evaluates ``__set_name__()`` for each attribute of a class that defines the method. This provides the containing class to the ``TypeId`` instance. The idea of run-time type identification is -- in a way -- a failure to properly define the classes to follow the Liskov Substitution design principle. A better design would check for specific features of a subclass of ``Command``. This becomes awkwardly complex in the Jinja templates, because the templates exist outside the class hierarchy. We rely on the ``typeid`` to map classes to macros appropriate to the class. .. _`Imports (11)`: .. rubric:: Imports (11) += .. parsed-literal:: :class: code from typing import TypeGuard, TypeVar, Generic .. .. container:: small ∎ *Imports (11)*. Used by → `pyweb.py (79)`_. .. _`The TypeId Class -- to help the template engine (12)`: .. rubric:: The TypeId Class -- to help the template engine (12) = .. parsed-literal:: :class: code \_T = TypeVar("\_T") class TypeId: """ This makes a given class name into an attribute with a True value. Any other attribute reference will return False. >>> class A: ... typeid = TypeId() >>> a = A() >>> a.typeid.A True >>> a.typeid.B False """ def \_\_set\_name\_\_(self, owner: type[\_T], name: str) -> "TypeId": self.my\_class = owner return self def \_\_getattr\_\_(self, item: str) -> TypeGuard[\_T]: return self.my\_class.\_\_name\_\_ == item from collections.abc import Mapping class TypeIdMeta(type): """Inject the \`\`typeid\`\` attribute into a class definition.""" @classmethod def \_\_prepare\_\_(metacls, name: str, bases: tuple[type, ...], \*\*kwds: Any) -> Mapping[str, object]: # type: ignore[override] return {"typeid": TypeId()} .. .. container:: small ∎ *The TypeId Class -- to help the template engine (12)*. Used by → `Command class hierarchy -- used to describe individual commands in a chunk (10)`_. The ``TypeIdMeta`` metaclass sets the ``typeid`` attribute of each class defined by this metaclass. The ordinary class preparation will invoke the ``__set_name__()`` special method to provide details to the attribute. Once set, any reference to ``c.typeid.name`` will be evaluated as ``__getattr__(c, 'name')``. This permits the typeid to compare the name provided by ``__set_name__()`` with the name being inquired about. The Command Class ******************** The ``Command`` class is abstract, and describes most of the features of the various subclasses. .. _`The Command Abstract Base Class (13)`: .. rubric:: The Command Abstract Base Class (13) = .. parsed-literal:: :class: code class Command(metaclass=TypeIdMeta): typeid: TypeId has\_name: TypeGuard["ReferenceCommand"] = False has\_text: TypeGuard[Union["CodeCommand", "TextCommand"]] = False def \_\_init\_\_(self, location: tuple[str, int]) -> None: self.location = location #: The (filename, line number) self.logger = logging.getLogger(self.\_\_class\_\_.\_\_name\_\_) self.web: ReferenceType["Web"] self.text: str #: The body of this command def \_\_repr\_\_(self) -> str: return f"{self.\_\_class\_\_.\_\_name\_\_}(location={self.location!r})" @abc.abstractmethod def tangle(self, aTangler: "Tangler", target: TextIO) -> None: ... .. .. container:: small ∎ *The Command Abstract Base Class (13)*. Used by → `Command class hierarchy -- used to describe individual commands in a chunk (10)`_. The HasText Classes ******************* A type hint summarizes some of the subclass relationships. .. _`The HasText Type Hint -- used instead of another abstract class (14)`: .. rubric:: The HasText Type Hint -- used instead of another abstract class (14) = .. parsed-literal:: :class: code HasText = Union["CodeCommand", "TextCommand"] .. .. container:: small ∎ *The HasText Type Hint -- used instead of another abstract class (14)*. Used by → `Command class hierarchy -- used to describe individual commands in a chunk (10)`_. We don't formalize this as proper subclass definitions. We probably should, but it doesn't seem to add any clarity. The TextCommand Class ********************* The ``TextCommand`` class describes all of the text **outside** the ``@d`` and ``@o`` chunks. These are **not** tangled, and an exception is raised. .. _`The TextCommand Class (15)`: .. rubric:: The TextCommand Class (15) = .. parsed-literal:: :class: code class TextCommand(Command): """Text outside any other command.""" has\_text: TypeGuard[Union["CodeCommand", "TextCommand"]] = True def \_\_init\_\_(self, text: str, location: tuple[str, int]) -> None: super().\_\_init\_\_(location) self.text = text #: The text def tangle(self, aTangler: "Tangler", target: TextIO) -> None: message = f"attempt to tangle a text block {self.location} {shorten(self.text, 32)!r}" self.logger.error(message) raise Error(message) def \_\_repr\_\_(self) -> str: return f"{self.\_\_class\_\_.\_\_name\_\_}(text={self.text!r}, location={self.location!r})" .. .. container:: small ∎ *The TextCommand Class (15)*. Used by → `Command class hierarchy -- used to describe individual commands in a chunk (10)`_. The CodeCommand Class ********************* The ``CodeCommand`` class describes the text **inside** the ``@d`` and ``@o`` chunks. These are tangled without change. .. _`The CodeCommand Class (16)`: .. rubric:: The CodeCommand Class (16) = .. parsed-literal:: :class: code class CodeCommand(Command): """Code inside a \`\`@o\`\`, or \`\`@d\`\` command.""" has\_text: TypeGuard[Union["CodeCommand", "TextCommand"]] = True def \_\_init\_\_(self, text: str, location: tuple[str, int]) -> None: super().\_\_init\_\_(location) self.text = text #: The text def tangle(self, aTangler: "Tangler", target: TextIO) -> None: self.logger.debug(f"tangle {self.text=!r}") aTangler.codeBlock(target, self.text) def \_\_repr\_\_(self) -> str: return f"{self.\_\_class\_\_.\_\_name\_\_}(text={self.text!r}, location={self.location!r})" .. .. container:: small ∎ *The CodeCommand Class (16)*. Used by → `Command class hierarchy -- used to describe individual commands in a chunk (10)`_. The ReferenceCommand Class ************************** The ``ReferenceCommand`` class describes a ``@< name @>`` construct inside a chunk. When tangled, these lead to inserting the referenced chunk's content. Because this a reference to another chunk, the properties provide the values for the other chunk. .. _`The ReferenceCommand Class (17)`: .. rubric:: The ReferenceCommand Class (17) = .. parsed-literal:: :class: code class ReferenceCommand(Command): """ Reference to a \`\`NamedChunk\`\` in code, a \`\`@< name @>\`\` construct. In a CodeChunk or OutputChunk, it tangles to the definition from a \`\`NamedChunk\`\`. In text, it can weave to the text of a \`\`NamedDocumentChunk\`\`. """ has\_name: TypeGuard["ReferenceCommand"] = True def \_\_init\_\_(self, name: str, location: tuple[str, int]) -> None: super().\_\_init\_\_(location) self.name = name #: The name that is referenced. @property def full\_name(self) -> str: return cast(Web, self.web()).resolve\_name(self.name) @property def seq(self) -> int \| None: return cast(Web, self.web()).resolve\_chunk(self.name)[0].seq def tangle(self, aTangler: "Tangler", target: TextIO) -> None: """Expand this reference. The starting position is the indentation for all \*\*subsequent\*\* lines. Provide the indent before \`\`@<\`\`, in \`\`tangler.fragment\`\` back to the tangler. """ self.logger.debug(f"tangle reference to {self.name=}, context: {aTangler.fragment=}") chunk\_list = cast(Web, self.web()).resolve\_chunk(self.name) if len(chunk\_list) == 0: message = f"Attempt to tangle an undefined Chunk, {self.name!r}" self.logger.error(message) raise Error(message) aTangler.reference\_names.add(self.name) aTangler.addIndent(len(aTangler.fragment)) aTangler.fragment = "" for chunk in chunk\_list: # TODO: if chunk.options includes '-indent': do a setIndent before tangling. for command in chunk.commands: command.tangle(aTangler, target) aTangler.clrIndent() def \_\_repr\_\_(self) -> str: return f"{self.\_\_class\_\_.\_\_name\_\_}(name={self.name!r}, location={self.location!r})" .. .. container:: small ∎ *The ReferenceCommand Class (17)*. Used by → `Command class hierarchy -- used to describe individual commands in a chunk (10)`_. The XrefCommand Classes ************************** The ``XRefCommand`` classes describes a ``@f``, ``@m``, and ``@u`` constructs inside a chunk. These are **not** Tangled. They're only woven. Each offers a unique property that can be used by the template rending to get data about the WEB content. .. _`The XrefCommand Subclasses -- files, macros, and user names (18)`: .. rubric:: The XrefCommand Subclasses -- files, macros, and user names (18) = .. parsed-literal:: :class: code class FileXrefCommand(Command): """The \`\`@f\`\` command.""" def \_\_init\_\_(self, location: tuple[str, int]) -> None: super().\_\_init\_\_(location) @property def files(self) -> list["OutputChunk"]: return cast(Web, self.web()).files def tangle(self, aTangler: "Tangler", target: TextIO) -> None: raise Error('Illegal tangling of a cross reference command.') class MacroXrefCommand(Command): """The \`\`@m\`\` command.""" def \_\_init\_\_(self, location: tuple[str, int]) -> None: super().\_\_init\_\_(location) @property def macros(self) -> list[SimpleNamespace]: return cast(Web, self.web()).macros def tangle(self, aTangler: "Tangler", target: TextIO) -> None: raise Error('Illegal tangling of a cross reference command.') class UserIdXrefCommand(Command): """The \`\`@u\`\` command.""" def \_\_init\_\_(self, location: tuple[str, int]) -> None: super().\_\_init\_\_(location) @property def userids(self) -> list[SimpleNamespace]: return cast(Web, self.web()).userids def tangle(self, aTangler: "Tangler", target: TextIO) -> None: raise Error('Illegal tangling of a cross reference command.') .. .. container:: small ∎ *The XrefCommand Subclasses -- files, macros, and user names (18)*. Used by → `Command class hierarchy -- used to describe individual commands in a chunk (10)`_. Output Serialization -------------------- The ``Emitter`` class hierarchy writes the output from the source ``Web`` instance. An ``Emitter`` instance is responsible for control of an output file format. This includes the necessary file naming, opening, writing and closing operations. .. uml:: abstract class Emitter { output: Path emit(Web) } class Web Emitter ..> Web class Weaver Emitter <|-- Weaver class Tangler Emitter <|-- Tangler class TanglerMake Tangler <|-- TanglerMake package jinja { class Environment } Weaver --> Environment object template Weaver *-- template Environment --> template Here's how the definitions are provided in the application. The two reference class definitions are used by by the ``Emitter`` class, and needs to be defined first. We'll look at them later, since they're a tiny strategy change in how cross-references are displayed. .. _`Base Class Definitions (19)`: .. rubric:: Base Class Definitions (19) += .. parsed-literal:: :class: code → `Emitter Superclass (21)`_ → `Weaver Subclass -- Uses Jinja templates to weave documentation (22)`_ → `Tangler Subclass -- emits the output files (28)`_ → `TanglerMake Subclass -- extends Tangler to avoid touching files that didn't change (32)`_ .. .. container:: small ∎ *Base Class Definitions (19)*. Used by → `pyweb.py (79)`_. .. _`Imports (20)`: .. rubric:: Imports (20) += .. parsed-literal:: :class: code import abc from textwrap import dedent, shorten from jinja2 import Environment, DictLoader, select\_autoescape .. .. container:: small ∎ *Imports (20)*. Used by → `pyweb.py (79)`_. The ``Emitter`` class is an abstraction, used to check the consistency of the subclasses. .. _`Emitter Superclass (21)`: .. rubric:: Emitter Superclass (21) = .. parsed-literal:: :class: code class Emitter(abc.ABC): def \_\_init\_\_(self, output: Path): self.logger = logging.getLogger(self.\_\_class\_\_.\_\_qualname\_\_) self.log\_indent = logging.getLogger("indent." + self.\_\_class\_\_.\_\_qualname\_\_) self.output = output @abc.abstractmethod def emit(self, web: Web) -> None: pass .. .. container:: small ∎ *Emitter Superclass (21)*. Used by → `Base Class Definitions (19)`_. The Weaver Subclass ~~~~~~~~~~~~~~~~~~~~ The Weaver is a **Facade** that wraps Jinja template processing. The job is to build the necessary environment, locate the templates, and then evaluate the template's ``generate()`` method to fill the values into the template to create the woven document. There's "base_weaver" template that contains the essential structure of the output document. This creates the needed macros, and then weaves the various chunks, in order. Each unique markup language has macros that provide the unique markup required for the various chunks. This permits customization of the markup. We have an interesting wrinkle with RST-formatted output. There are two variants that may be important: - When used with Sphinx, the "small" caption at the end of a code block uses ``.. rst-class:: small``. - When used without Sphinx, i.e., native docutils, the the "small" caption at the end of a code block uses ``.. class:: small``. This is a minor change to the template being used. The question is how to make that distinction in the weaver? One view is to use subclasses of :py:class:`Weaver` for this. However, the templates are found by name in the ``template_map`` within the ``Weaver``. The ``--weaver`` command-line option provides the string (e.g., ``rst`` or ``html``) used to build a key into the template map. We can, therefore, use the ``--weaver`` command-line option to provide an expanded set of names for RST processing. - ``-w rst`` is the Sphinx option. - ``-w rst-sphinx`` is an alias for ``rst``. The dictionary key points to the same templates as ``rst``. - ``-w rst-nosphinx`` is the "pure-docutils" version, using ``.. class::``. - ``-w rst-docutils`` is an alias for the nosphinx option. While this works out nicely, it turns out that the ``.. container:: small`` is, perhaps, a better markup that ``.. class:: small``. This work in docutils **and** Sphinx. .. _`Weaver Subclass -- Uses Jinja templates to weave documentation (22)`: .. rubric:: Weaver Subclass -- Uses Jinja templates to weave documentation (22) = .. parsed-literal:: :class: code → `Debug Templates -- these display debugging information (24)`_ → `RST Templates -- the default weave output (25)`_ → `HTML Templates -- emit HTML weave output (26)`_ → `LaTeX Templates -- emit LaTeX weave output (27)`_ → `Common base template -- this is used for ALL weaving (23)`_ class Weaver(Emitter): template\_map = { "debug\_defaults": debug\_weaver\_template, "debug\_macros": "", "rst\_defaults": rst\_weaver\_template, "rst\_macros": rst\_overrides\_template, "html\_defaults": html\_weaver\_template, "html\_macros": html\_overrides\_template, "tex\_defaults": latex\_weaver\_template, "tex\_macros": tex\_overrides\_template, "rst-sphinx\_defaults": rst\_weaver\_template, "rst-sphinx\_macros": rst\_overrides\_template, "rst-nosphinx\_defaults": rst\_weaver\_template, "rst-nosphinx\_macros": rst\_nosphinx\_template, "rst-docutils\_defaults": rst\_weaver\_template, "rst-docutils\_macros": rst\_nosphinx\_template, } quote\_rules = { "rst": rst\_quote\_rules, "html": html\_quote\_rules, "tex": latex\_quote\_rules, "debug": debug\_quote\_rules, } def \_\_init\_\_(self, output: Path = Path.cwd()) -> None: super().\_\_init\_\_(output) # Summary self.linesWritten = 0 def set\_markup(self, markup: str = "rst") -> "Weaver": self.markup = markup return self def emit(self, web: Web) -> None: self.target\_path = (self.output / web.web\_path.name).with\_suffix(f".{self.markup}") self.logger.info("Weaving %s using %s markup", self.target\_path, self.markup) with self.target\_path.open('w') as target\_file: for text in self.generate\_text(web): self.linesWritten += text.count("\\n") target\_file.write(text) def generate\_text(self, web: Web) -> Iterator[str]: self.env = Environment( loader=DictLoader( self.template\_map \| {'base\_weaver': base\_template,} ), autoescape=select\_autoescape() ) self.env.filters \|= { "quote\_rules": self.quote\_rules[self.markup] } defaults = self.env.get\_template(f"{self.markup}\_defaults") macros = self.env.get\_template(f"{self.markup}\_macros") template = self.env.get\_template("base\_weaver") return template.generate(web=web, macros=macros, defaults=defaults) .. .. container:: small ∎ *Weaver Subclass -- Uses Jinja templates to weave documentation (22)*. Used by → `Base Class Definitions (19)`_. There are several strategy plug-ins. Each is unique for a particular flavort of markup. These include the quoting function used to escape markup characters, and the templates used. The objective is to have a generic "weaver" template which includes three levels of template definition: 1. Defaults. 2. Configured overrides, perhaps from ``pyweb.toml``. 3. Document overrides from the ``.w`` file in ``@t name @{...@}`` commands. This means there is a two-step binding between document and macros. 1. The base weaver document should import three generic template definitions: ``{%- from 'markup' import * %}`` ``{%- from 'configured' import * %}`` ``{%- from 'document' import * %}`` 2. These names map (*somehow*) to specific templates based on markup language. ``markup`` -> ``rst/markup``, etc. This allows us to provide all templates and make a final binding at weave time. We can use a prefix loader with a given prefix. Some kind of "import rst/markup as markup" would be ideal. Jinja, however, doesn't seem to support this the same way Python does. There's no ``import as`` construct allowing very late binding. The alternative is to create the environment very late in the process, once we have all the information available. We can then pick the templates to put into a DictLoader to support the standard weaving structure. The quoting rules apply to the various template languages. The idea is that a few characters must be escaped for proper presentation in the code sample sections. Common Base Template ******************** The common base template expands each chunk and each command in order. This involves some special case processing for ``OutputChunk`` and ``NamedChunk`` which have a "wrapper" woven around the chunk's sequence of commands. This relies on a number of individual macros: - text(command). Emits a block of text -- this should do *nothing* with the text. The author's original markup passes through untouched. - begin_code(chunk). Starts a block of code, either ``@d`` or ``@o``. - code(command). Emits a block fo code. This may require escapes for special characters that would break the markup being used. - ref(command). Emits a reference to a named block of code. - end_code(chunk). Ends a block of code. - file_xref(command). Emit the full ``@f`` output, usually some kind of definition list. - macro_xref(command). Emit the full ``@m`` output, usually some kind of definition list. - userid_xref(command). Emit the full ``@u`` output, usually some kind of definition list. The ``ref()`` macro can also be used in the XREF output macros. It can also be used in the ``end_code()`` macro. After a block of code, some tools (like Interscript) will show where the block was referenced. The point of using the ``ref()`` macro in multiple places is to make all of them look identical. There are a variety of optional formatting considerations. First is cross-references, second is a variety of ``begin_code()`` options. There are four styles for the "referencedBy" information in a ``Chunk``. - Nothing. - The immediate ``@`` Chunk. - The entire transitive sequence of parents for the ``@`` Chunk. There are two forms for this: - Top-down path. ``→ Named (1) / → Sub-Named (2) / → Sub-Sub-Named (3)``. - Bottom-up path. ``→ Sub-Sub-Named (3) ∈ → Sub-Named (2) ∈ → Named (1)``. These require four distinct versions of the ``end_code()`` macro. This macro uses the ``transitive_referencedBy`` propery of a ``Chunk`` producing a sequence of ``ref()`` values. We need .. _`Common base template -- this is used for ALL weaving (23)`: .. rubric:: Common base template -- this is used for ALL weaving (23) = .. parsed-literal:: :class: code base\_template = dedent("""\\ {%- from macros import text, begin\_code, code, ref, end\_code, file\_xref, macro\_xref, userid\_xref -%} {%- if not text is defined %}{%- from defaults import text -%}{%- endif -%} {%- if not begin\_code is defined %}{%- from defaults import begin\_code -%}{%- endif -%} {%- if not code is defined %}{%- from defaults import code -%}{%- endif -%} {%- if not ref is defined %}{%- from defaults import ref -%}{%- endif -%} {%- if not end\_code is defined %}{%- from defaults import end\_code -%}{%- endif -%} {%- if not file\_xref is defined %}{%- from defaults import file\_xref -%}{%- endif -%} {%- if not macro\_xref is defined %}{%- from defaults import macro\_xref -%}{%- endif -%} {%- if not userid\_xref is defined %}{%- from defaults import userid\_xref -%}{%- endif -%} {% for chunk in web.chunks -%} {%- if chunk.type\_is('OutputChunk') or chunk.type\_is('NamedChunk') -%} {{begin\_code(chunk)}} {%- for command in chunk.commands -%} {%- if command.typeid.CodeCommand -%}{{code(command)}} {%- elif command.typeid.ReferenceCommand -%}{{ref(command)}} {%- endif -%} {%- endfor -%} {{end\_code(chunk)}} {%- elif chunk.type\_is('Chunk') -%} {%- for command in chunk.commands -%} {%- if command.typeid.TextCommand %}{{text(command)}} {%- elif command.typeid.ReferenceCommand %}{{ref(command)}} {%- elif command.typeid.FileXrefCommand %}{{file\_xref(command)}} {%- elif command.typeid.MacroXrefCommand %}{{macro\_xref(command)}} {%- elif command.typeid.UserIdXrefCommand %}{{userid\_xref(command)}} {%- endif -%} {%- endfor -%} {%- endif -%} {%- endfor %} """) .. .. container:: small ∎ *Common base template -- this is used for ALL weaving (23)*. Used by → `Weaver Subclass -- Uses Jinja templates to weave documentation (22)`_. **TODO:** Need to more gracefully handle the case where an output chunk has multiple definitions. .. parsed-literal:: @o x.y @{ ... part 1 ... @} @o x.y @{ ... part 2 ... @} The above should have the same output as the follow (more complex) alternative: .. parsed-literal:: @o x.y @{ @ @ @} @d part 1 @{ ... part 1 ... @} @d part 2 @{ ... part 2 ... @} Currently, we casually treat the first instance as the "definition", and don't provide references to the additional parts of the definition. Debug Template *************** .. _`Debug Templates -- these display debugging information (24)`: .. rubric:: Debug Templates -- these display debugging information (24) = .. parsed-literal:: :class: code def debug\_quote\_rules(text: str) -> str: return repr(text) debug\_weaver\_template = dedent("""\\ {%- macro text(command) -%} text: {{command}} {%- endmacro -%} {%- macro begin\_code(chunk) %} begin\_code: {{chunk}} {%- endmacro -%} {%- macro code(command) %} code: {{command}} {%- endmacro -%} {%- macro ref(id) %} ref: {{id}} {%- endmacro -%} {%- macro end\_code(chunk) %} end\_code: {{chunk}} {% endmacro -%} {%- macro file\_xref(command) -%} file\_xref {{command.files}} {%- endmacro -%} {%- macro macro\_xref(command) -%} macro\_xref {{command.macros}} {%- endmacro -%} {%- macro userid\_xref(command) -%} userid\_xref {{command.userids}} {%- endmacro -%} """) .. .. container:: small ∎ *Debug Templates -- these display debugging information (24)*. Used by → `Weaver Subclass -- Uses Jinja templates to weave documentation (22)`_. RST Template *************** The RST Templates produce ReStructuredText for the various web commands. Note that code lines must be indented when using this markup. .. _`RST Templates -- the default weave output (25)`: .. rubric:: RST Templates -- the default weave output (25) = .. parsed-literal:: :class: code def rst\_quote\_rules(text: str) -> str: quoted\_chars = [ ('\\\\', r'\\\\'), # Must be first. ('\`', r'\\\`'), ('\_', r'\\\_'), ('\*', r'\\\*'), ('\|', r'\\\|'), ] clean = text for from\_, to\_ in quoted\_chars: clean = clean.replace(from\_, to\_) return clean rst\_weaver\_template = dedent(""" {%- macro text(command) -%} {{command.text}} {%- endmacro -%} {%- macro begin\_code(chunk) %} .. \_\`{{chunk.full\_name or chunk.name}} ({{chunk.seq}})\`: .. rubric:: {{chunk.full\_name or chunk.name}} ({{chunk.seq}}) {% if chunk.initial %}={% else %}+={% endif %} .. parsed-literal:: :class: code {% endmacro -%} {# For RST, each line must be indented. #} {%- macro code(command) %}{% for line in command.text.splitlines() %} {{line \| quote\_rules}} {% endfor -%}{% endmacro -%} {%- macro ref(id) %} \\N{RIGHTWARDS ARROW} \`{{id.full\_name or id.name}} ({{id.seq}})\`\_{% endmacro -%} {# When using Sphinx, this \*could\* be rst-class::, pure docutils uses container::#} {%- macro end\_code(chunk) %} .. .. container:: small \\N{END OF PROOF} \*{{chunk.full\_name or chunk.name}} ({{chunk.seq}})\*. {% if chunk.referencedBy %}Used by {{ref(chunk.referencedBy)}}.{% endif %} {% endmacro -%} {%- macro file\_xref(command) -%} {% for file in command.files -%} :{{file.name}}: \\N{RIGHTWARDS ARROW} \`{{file.name}} ({{file.seq}})\`\_ {%- endfor %} {%- endmacro -%} {%- macro macro\_xref(command) -%} {% for macro in command.macros -%} :{{macro.full\_name}}: {% for d in macro.def\_list -%}\\N{RIGHTWARDS ARROW} \`{{d.full\_name or d.name}} ({{d.seq}})\`\_{% if loop.last %}{% else %}, {% endif %}{%- endfor %} {% endfor %} {%- endmacro -%} {%- macro userid\_xref(command) -%} {% for userid in command.userids -%} :{{userid.userid}}: {% for r in userid.ref\_list -%}\\N{RIGHTWARDS ARROW} \`{{r.full\_name or r.name}} ({{r.seq}})\`\_{% if loop.last %}{% else %}, {% endif %}{%- endfor %} {% endfor %} {%- endmacro -%} """) rst\_overrides\_template = dedent("""\\ """) rst\_nosphinx\_template = dedent("""\\ {%- macro end\_code(chunk) %} .. .. class:: small \\N{END OF PROOF} \*{{chunk.full\_name or chunk.name}} ({{chunk.seq}})\* {% endmacro -%} """) .. .. container:: small ∎ *RST Templates -- the default weave output (25)*. Used by → `Weaver Subclass -- Uses Jinja templates to weave documentation (22)`_. HTML Template *************** The HTML templates use a relatively simple markup, avoiding any CSS names. A slightly more flexible approach might be to name specific CSS styles, and provide generic definitions for those styles. This would make it easier to tailor HTML output via CSS changes, avoiding any HTML modifications. .. _`HTML Templates -- emit HTML weave output (26)`: .. rubric:: HTML Templates -- emit HTML weave output (26) = .. parsed-literal:: :class: code def html\_quote\_rules(text: str) -> str: quoted\_chars = [ ("&", "&"), # Must be first ("<", "<"), (">", ">"), ('"', """), # Only applies inside tags... ] clean = text for from\_, to\_ in quoted\_chars: clean = clean.replace(from\_, to\_) return clean html\_weaver\_template = dedent("""\\ {%- macro text(command) -%} {{command.text}} {%- endmacro -%} {%- macro begin\_code(chunk) %}

{{chunk.full\_name or chunk.name}} ({{chunk.seq}}) {% if chunk.initial %}={% else %}+={% endif %}


        {%- endmacro -%}
        
        {%- macro code(command) -%}{{command.text \| quote\_rules}}{%- endmacro -%}
        
        {%- macro ref(id) %}→{{id.full\_name or id.name}} ({{id.seq}}){% endmacro -%}
        
        {%- macro end\_code(chunk) %}
        

{{chunk.full\_name or chunk.name}} ({{chunk.seq}}). {% if chunk.referencedBy %}Used by {{ref(chunk.referencedBy)}}.{% endif %}

{% endmacro -%} {%- macro file\_xref(command) %}
{% for file in command.files -%}
{{file.name}}
{{ref(file)}}
{%- endfor %}
{% endmacro -%} {%- macro macro\_xref(command) %}
{% for macro in command.macros -%}
{{macro.full\_name}}
{% for d in macro.def\_list -%}{{ref(d)}}{% if loop.last %}{% else %}, {% endif %}{%- endfor %}
{% endfor %}
{% endmacro -%} {%- macro userid\_xref(command) %}
{% for userid in command.userids -%}
{{userid.userid}}
{% for r in userid.ref\_list -%}{{ref(r)}}{% if loop.last %}{% else %}, {% endif %}{%- endfor %}
{% endfor %}
{% endmacro -%} """) html\_overrides\_template = dedent("""\\ """) .. .. container:: small ∎ *HTML Templates -- emit HTML weave output (26)*. Used by → `Weaver Subclass -- Uses Jinja templates to weave documentation (22)`_. LaTeX Template *************** The LaTeX templates use a markup focused in the ``verbatim`` environment. Common alternatives include ``listings`` and ``minted``. .. _`LaTeX Templates -- emit LaTeX weave output (27)`: .. rubric:: LaTeX Templates -- emit LaTeX weave output (27) = .. parsed-literal:: :class: code def latex\_quote\_rules(text: str) -> str: quoted\_strings = [ ("\\\\end{Verbatim}", "\\\\end\\\\,{Verbatim}"), # Allow \\end{Verbatim} in a Verbatim context ("\\\\{", "\\\\\\\\,{"), # Prevent unexpected commands in Verbatim ("$", "\\\\$"), # Prevent unexpected math in Verbatim ] clean = text for from\_, to\_ in quoted\_strings: clean = clean.replace(from\_, to\_) return clean latex\_weaver\_template = dedent("""\\ {%- macro text(command) -%} {{command.text}} {%- endmacro -%} {%- macro begin\_code(chunk) %} \\\\label{pyweb-{{chunk.seq}}} \\\\begin{flushleft} \\\\textit{Code example {{chunk.full\_name or chunk.name}} ({{chunk.seq}})} \\\\begin{Verbatim}[commandchars=\\\\\\\\\\\\{\\\\},codes={\\\\catcode\`$$=3\\\\catcode\`^=7},frame=single] {%- endmacro -%} {%- macro code(command) -%}{{command.text \| quote\_rules}}{%- endmacro -%} {%- macro ref(id) %}$$\\\\rightarrow$$ Code Example {{id.full\_name or id.name}} ({{id.seq}}){% endmacro -%} {%- macro end\_code(chunk) %} \\\\end{Verbatim} \\\\end{flushleft} {% endmacro -%} {%- macro file\_xref(command) %} \\\\begin{itemize} {% for file in command.files -%} \\\\item {{file.name}}: {{ref(file)}} {%- endfor %} \\\\end{itemize} {% endmacro -%} {%- macro macro\_xref(command) %} \\\\begin{itemize} {% for macro in command.macros -%} \\\\item {{macro.full\_name}} \\\\\\\\ {% for d in macro.def\_list -%}{{ref(d)}}{% if loop.last %}{% else %}, {% endif %}{%- endfor %} {% endfor %} \\\\end{itemize} {% endmacro -%} {%- macro userid\_xref(command) %} \\\\begin{itemize} {% for userid in command.userids -%} \\\\item {{userid.userid}} \\\\\\\\ {% for r in userid.ref\_list -%}{{ref(r)}}{% if loop.last %}{% else %}, {% endif %}{%- endfor %} {% endfor %} \\\\end{itemize} {% endmacro -%} """) tex\_overrides\_template = dedent("""\\ """) .. .. container:: small ∎ *LaTeX Templates -- emit LaTeX weave output (27)*. Used by → `Weaver Subclass -- Uses Jinja templates to weave documentation (22)`_. The Tangler Subclasses ~~~~~~~~~~~~~~~~~~~~~~ Tangling is a variation on emitting that includes all the code in the order defined by the ``@o`` file commands. This is not necessarily the order they're presented in the document. The whole point of Weaving and Tangling is to write a document in an order that's sensible for people to understand. The tangled output is for compilers and run-time environments. Each file is individually tangled, unrelated to the order of the source WEB document. The ``emit()`` process, therefore, iterates through all of the files defined in the WEB. There's a complex interplay between ``Tangler`` and ``CodeCommand`` to maintain the indentations. .. uml:: participant Tangler participant ReferenceCommand participant Command Tangler --> ReferenceCommand : tangle() ReferenceCommand --> Tangler : get len(fragment) ReferenceCommand --> Tangler : addIndent(i) group [for all] commands in the referenced chunk ReferenceCommand --> Command : tangle() end ReferenceCommand --> Tangler : clrIndent() This approach can preserves the indentation in front of a ``@< reference @>`` command. .. _`Tangler Subclass -- emits the output files (28)`: .. rubric:: Tangler Subclass -- emits the output files (28) = .. parsed-literal:: :class: code class Tangler(Emitter): code\_indent = 0 #: Initial indent def \_\_init\_\_(self, output: Path = Path.cwd()) -> None: super().\_\_init\_\_(output) self.context: list[int] = [] #: Indentations self.fragment = "" # Nothing written yet. # Create context and initial lastIndent values self.resetIndent(self.code\_indent) # Summaries self.reference\_names: set[str] = set() self.linesWritten = 0 self.totalFiles = 0 self.totalLines = 0 def emit(self, web: Web) -> None: for file\_chunk in web.files: self.logger.info("Tangling %s", file\_chunk.name) self.emit\_file(web, file\_chunk) def emit\_file(self, web: Web, file\_chunk: Chunk) -> None: target\_path = self.output / (file\_chunk.name or "Untitled.out") self.logger.debug("Writing %s", target\_path) self.logger.debug("Chunk %r", file\_chunk) with target\_path.open("w") as target: # An initial command to provide indentations. for command in file\_chunk.commands: command.tangle(self, target) → `Emitter write a block of code with proper indents (29)`_ → `Emitter indent control: set, clear and reset (30)`_ .. .. container:: small ∎ *Tangler Subclass -- emits the output files (28)*. Used by → `Base Class Definitions (19)`_. The ``codeBlock()`` method is used by each block of code tangled into a document. There are two sources of indentation: - A ``Chunk`` can provide an indent setting as an option. This is provided by the ``indent`` attribute of the tangle context. If specified, this is the indentation. - A ``@< name @>`` ``ReferenceCommand`` may be indented. This will be in a ``Chunk`` as the following three commands: 1. A ``CodeCommand`` with only spaces and no trailing ``\n``. The indent is buffered -- not written -- and the ``fragment`` attribute is set. 2. The ``ReferenceCommand``. This interpolates text from a ``NamedChunk`` using the prevailing indent. The ``tangle()`` method uses ``addIndent()`` and ``clrIndent()`` to mark this. The processing depends on this tangler's ``fragment`` attribute to provide the pending indentation; the ``addIndent()`` must consume the fragment to prevent confusion with subsequent indentations. 3. A ``CodeCommand`` with a trailing ``\n``. (Often it's only the newline.) If the ``fragment`` attribute is set, there's a pending indentation that hasn't yet been written. This can happen with there's a ``@@`` command at the left end of a line; often a Python decorator. The fragment is written and the ``fragment`` attribute cleared. No ``addIdent()`` will have been done to consume the fragment. While the WEB language permits multiple ``@ @`` on a single line, this is odd and potentially confusing. It isn't clear how the second reference should be indented. The ``ReferenceCommand`` ``tangle()`` implementation handles much of this. The following two rules apply: - A line of text that does not end with a newline, sets a new prevailing indent for the following command(s). - A line of text ending with a newline resets the prevailing indent. This a stack, maintained by the Tangler. .. _`Emitter write a block of code with proper indents (29)`: .. rubric:: Emitter write a block of code with proper indents (29) = .. parsed-literal:: :class: code def codeBlock(self, target: TextIO, text: str) -> None: """Indented write of text in a \`\`CodeCommand\`\`. Counts lines and saves position to indent to when expanding \`\`@<...@>\`\` references. The \`\`fragment\`\` is the prevailing indent used in reference expansion. """ for line in text.splitlines(keepends=True): self.logger.debug("codeBlock(%r)", line) indent = self.context[-1] if len(line) == 0: # Degenerate case of empty CodeText command. Should not occur. pass elif not line.endswith('\\n'): # Possible start of indentation prior to a \`\`@\`\` target.write(indent\*' ') wrote = target.write(line) self.fragment = ' ' \* wrote # May be used by a \`\`ReferenceCommand\`\`, if needed. elif line.endswith('\\n'): target.write(indent\*' ') target.write(line) self.linesWritten += 1 else: raise RuntimeError("Non-exhaustive if statement.") .. .. container:: small ∎ *Emitter write a block of code with proper indents (29)*. Used by → `Tangler Subclass -- emits the output files (28)`_. The ``addIndent()`` increments the indent. Used by ``@`` to set a prevailing indent. The ``setIndent()`` pushes a fixed indent instead adding an increment. Used by a ``Chunk`` with an ``-indent`` option. The ``clrIndent()`` method discards the most recent indent from the context stack. This is used when finished tangling a source chunk. This restores the indent to the prevailing indent. The ``resetIndent()`` method removes all indent context information and resets the indent to a default. .. _`Emitter indent control: set, clear and reset (30)`: .. rubric:: Emitter indent control: set, clear and reset (30) = .. parsed-literal:: :class: code def addIndent(self, increment: int) -> None: self.lastIndent = self.context[-1]+increment self.context.append(self.lastIndent) self.log\_indent.debug("addIndent %d: %r", increment, self.context) self.fragment = "" def setIndent(self, indent: int) -> None: self.context.append(indent) self.lastIndent = self.context[-1] self.log\_indent.debug("setIndent %d: %r", indent, self.context) self.fragment = "" def clrIndent(self) -> None: if len(self.context) > 1: self.context.pop() self.lastIndent = self.context[-1] self.log\_indent.debug("clrIndent %r", self.context) self.fragment = "" def resetIndent(self, indent: int = 0) -> None: """Resets the indentation context.""" self.lastIndent = indent self.context = [self.lastIndent] self.log\_indent.debug("resetIndent %d: %r", indent, self.context) .. .. container:: small ∎ *Emitter indent control: set, clear and reset (30)*. Used by → `Tangler Subclass -- emits the output files (28)`_. An extension to the ``Tangler`` class that only updates a file if the content has changed. This tangles to a temporary file. If the content is identical, the temporary file is quietly disposed of. Otherwise, the temporary file is linked to the original name. Files are compared with the ``filecmp`` module. .. _`Imports (31)`: .. rubric:: Imports (31) += .. parsed-literal:: :class: code import filecmp import tempfile import os .. .. container:: small ∎ *Imports (31)*. Used by → `pyweb.py (79)`_. .. _`TanglerMake Subclass -- extends Tangler to avoid touching files that didn't change (32)`: .. rubric:: TanglerMake Subclass -- extends Tangler to avoid touching files that didn't change (32) = .. parsed-literal:: :class: code class TanglerMake(Tangler): def emit\_file(self, web: Web, file\_chunk: Chunk) -> None: target\_path = self.output / (file\_chunk.name or "Untitled.out") self.logger.debug("Writing %s via a temp file", target\_path) self.logger.debug("Chunk %r", file\_chunk) fd, tempname = tempfile.mkstemp(dir=os.curdir) with os.fdopen(fd, "w") as target: for command in file\_chunk.commands: command.tangle(self, target) try: same = filecmp.cmp(tempname, target\_path) except OSError as e: same = False # Doesn't exist. (Could check for errno.ENOENT) if same: self.logger.info("Unchanged '%s'", target\_path) os.remove(tempname) else: # Windows requires the original file name be removed first. try: target\_path.unlink() except OSError as e: pass # Doesn't exist. (Could check for errno.ENOENT) target\_path.parent.mkdir(parents=True, exist\_ok=True) target\_path.hardlink\_to(tempname) os.remove(tempname) self.logger.info("Wrote %d lines to %s", self.linesWritten, target\_path) .. .. container:: small ∎ *TanglerMake Subclass -- extends Tangler to avoid touching files that didn't change (32)*. Used by → `Base Class Definitions (19)`_. Input Parsing ------------- There are three tiers to the input parsing: - A base tokenizer. = Additionally, a separate parser is used for options in ``@d`` and ``@o`` commands. - The overall ``WebReader`` class. .. uml:: class WebReader { load(path) : Web parse_source() } class Tokenizer <> { __next__(self) : str } WebReader --> Tokenizer WebReader --> WebReader : "parent" WebReader --> argparse.ArgumentParser We'll start with the ``WebReader`` class definition .. _`Base Class Definitions (33)`: .. rubric:: Base Class Definitions (33) += .. parsed-literal:: :class: code → `Tokenizer class - breaks input into tokens (51)`_ → `WebReader class - parses the input file, building the Web structure (34)`_ .. .. container:: small ∎ *Base Class Definitions (33)*. Used by → `pyweb.py (79)`_. The WebReader Class ~~~~~~~~~~~~~~~~~~~ There are two forms of the constructor for a ``WebReader``. The initial ``WebReader`` instance is created with code like the following: .. parsed-literal:: p = WebReader() p.command = options.commandCharacter This will define the command character; usually provided as a command-line parameter to the application. When processing an include file (with the ``@i`` command), a child ``WebReader`` instance is created with code like the following: .. parsed-literal:: c = WebReader(parent=parentWebReader) This will inherit the configuration from the parent ``WebReader``. This will also include a reference from child to parent so that embedded Python expressions can view the entire input context. The ``WebReader`` class parses the input file into command blocks. These are assembled into ``Chunks``, and the ``Chunks`` are assembled into the document ``Web``. Once this input pass is complete, the resulting ``Web`` can be tangled or woven. The commands have three general types: - "Structural" commands define the structure of the ``Chunks``. The structural commands are ``@d`` and ``@o``, as well as the ``@{``, ``@}``, ``@[``, ``@]`` brackets, and the ``@i`` command to include another file. - "Inline" commands are inline within a ``Chunk``: they define internal ``Commands``. Blocks of text are minor commands, as well as the ``@<``\ *name*\ ``@>`` references. The ``@@`` escape is also handled here so that all further processing is independent of any parsing. - "Content" commands generate woven content. These include the various cross-reference commands (``@f``, ``@m`` and ``@u``). There are two class-level ``argparse.ArgumentParser`` instances used by this class. :output_option_parser: An ``argparse.ArgumentParser`` used to parse the ``@o`` command's options. :definition_option_parser: An ``argparse.ArgumentParser`` used to parse the ``@d`` command's options. The class has the following attributes: :parent: is the outer ``WebReader`` when processing a ``@i`` command. :command: is the command character; a WebReader will use the parent command character if the parent is not ``None``. Default is ``@``. :permitList: is the list of commands that are permitted to fail. This is generally an empty list or ``('@i',)``. :_source: The open file-like object being used by ``load()``. :filePath: The path being processed; this provides a visible file name. :tokenizer: An instance of ``Tokenizer`` used to parse the input. This is built when ``load()`` is called. :totalLines: :totalFiles: :errors: Summary counts. .. _`WebReader class - parses the input file, building the Web structure (34)`: .. rubric:: WebReader class - parses the input file, building the Web structure (34) = .. parsed-literal:: :class: code class WebReader: """Parse an input file, creating Chunks and Commands.""" # Configuration #: The command prefix, default \`\`@\`\`. command: str #: Permitted errors, usually @i commands permitList: list[str] #: Working directory to resolve @i commands base\_path: Path #: The tokenizer used to find commands tokenizer: Tokenizer # State of the reader #: Parent context for @i commands parent: Optional["WebReader"] #: Input Path filePath: Path #: Input file-like object, default is self.filePath.open() \_source: TextIO #: The sequence of Chunk instances being built content: list[Chunk] def \_\_init\_\_(self, parent: Optional["WebReader"] = None) -> None: self.logger = logging.getLogger(self.\_\_class\_\_.\_\_qualname\_\_) self.output\_option\_parser = argparse.ArgumentParser(add\_help=False, exit\_on\_error=False) self.output\_option\_parser.add\_argument("-start", dest='start', type=str, default=None) self.output\_option\_parser.add\_argument("-end", dest='end', type=str, default="") self.output\_option\_parser.add\_argument("argument", type=str, nargs="\*") # TODO: Allow a numeric argument value in \`\`-indent\`\` self.definition\_option\_parser = argparse.ArgumentParser(add\_help=False, exit\_on\_error=False) self.definition\_option\_parser.add\_argument("-indent", dest='indent', action='store\_true', default=False) self.definition\_option\_parser.add\_argument("-noindent", dest='noindent', action='store\_true', default=False) self.definition\_option\_parser.add\_argument("argument", type=str, nargs="\*") # Configuration comes from the parent or defaults if there is no parent. self.parent = parent if self.parent: self.command = self.parent.command self.permitList = self.parent.permitList else: # Defaults until overridden self.command = '@' self.permitList = [] # Summary self.totalLines = 0 self.totalFiles = 0 self.errors = 0 → `WebReader command literals (49)`_ def \_\_str\_\_(self) -> str: return self.\_\_class\_\_.\_\_name\_\_ → `WebReader location in the input stream (46)`_ → `WebReader load the web (48)`_ → `WebReader handle a command string (35)`_ .. .. container:: small ∎ *WebReader class - parses the input file, building the Web structure (34)*. Used by → `Base Class Definitions (33)`_. The reader maintains a context into which constructs are added. The ``Web`` contains ``Chunk`` instances in ``self.web.chunks``. The current chunk is ``self.web.chunks[-1]``. Each ``Chunk``, similarly, has a command context in ``chunk.commands[-1]``. This works because the language is "flat": there are no nested ``@d`` or ``@o`` chunks. Command recognition is done via a **Chain of Command**-like design. There are two conditions: the command string is recognized or it is not recognized. If the command is recognized, ``handleCommand()`` will do one of the following: - For "structural" commands, it will attach the current ``Chunk`` (*self.aChunk*) to the current ``Web`` (*self.aWeb*), and start a new ``Chunk``. This becomes the context for processing commands. By default an anonymous ``Chunk`` used to accumulate text is available for all of the content outside named chunks. - For "inline" and "content" commands, create a ``Command``, attach it to the current ``Chunk`` (*self.aChunk*). If the command is not recognized, ``handleCommand()`` returns false, and this is a syntax error. A subclass can override ``handleCommand()`` to (1) Evaluate this superclass version; (2) If the command is unknown to the superclass, then the subclass can process it; (3) If the command is unknown to both classes, then return ``False``. Either a subclass will handle it, or the default activity taken by ``load()`` is to treat the command as a syntax error. The ``handleCommand()`` implementation is a massive ``match`` statement. It might be a good idea to decompose this into a number of separate methods. This would make the ``match`` statement shorter and easier to understand. .. _`WebReader handle a command string (35)`: .. rubric:: WebReader handle a command string (35) = .. parsed-literal:: :class: code def handleCommand(self, token: str) -> bool: self.logger.debug("Reading %r", token) new\_chunk: Optional[Chunk] = None match token[:2]: case self.cmdo: → `start an OutputChunk, adding it to the web (36)`_ case self.cmdd: → `start a NamedChunk or NamedDocumentChunk, adding it to the web (37)`_ case self.cmdi: → `include another file (38)`_ case self.cmdrcurl \| self.cmdrbrak: → `finish a chunk, start a new Chunk adding it to the web (39)`_ case self.cmdpipe: → `assign user identifiers to the current chunk (40)`_ case self.cmdf: self.content[-1].commands.append(FileXrefCommand(self.location())) case self.cmdm: self.content[-1].commands.append(MacroXrefCommand(self.location())) case self.cmdu: self.content[-1].commands.append(UserIdXrefCommand(self.location())) case self.cmdlangl: → `add a reference command to the current chunk (41)`_ case self.cmdlexpr: → `add an expression command to the current chunk (43)`_ case self.cmdcmd: → `double at-sign replacement, append this character to previous TextCommand (44)`_ case self.cmdlcurl \| self.cmdlbrak: # These should have been consumed as part of @o and @d parsing self.logger.error("Extra %r (possibly missing chunk name) near %r", token, self.location()) self.errors += 1 case \_: return False # did not recogize the command return True # did recognize the command .. .. container:: small ∎ *WebReader handle a command string (35)*. Used by → `WebReader class - parses the input file, building the Web structure (34)`_. An output chunk has the form ``@o`` *name* ``@{`` *content* ``@}``. We use the first two tokens to name the ``OutputChunk``. We expect the ``@{`` separator. We then attach all subsequent commands to this chunk while waiting for the final ``@}`` token to end the chunk. We'll use an ``ArgumentParser`` to locate the optional parameters. This will then let us build an appropriate instance of ``OutputChunk``. With some small additional changes, we could use ``OutputChunk(**options)``. .. _`start an OutputChunk, adding it to the web (36)`: .. rubric:: start an OutputChunk, adding it to the web (36) = .. parsed-literal:: :class: code arg\_str = next(self.tokenizer) self.expect({self.cmdlcurl}) options = self.output\_option\_parser.parse\_args(shlex.split(arg\_str)) new\_chunk = OutputChunk( name=' '.join(options.argument), comment\_start=options.start if '-start' in options else "# ", comment\_end=options.end if '-end' in options else "", ) self.content.append(new\_chunk) # capture an OutputChunk up to @} .. .. container:: small ∎ *start an OutputChunk, adding it to the web (36)*. Used by → `WebReader handle a command string (35)`_. A named chunk has the form ``@d`` *name* ``@{`` *content* ``@}`` for code and ``@d`` *name* ``@[`` *content* ``@]`` for document source. We use the first two tokens to name the ``NamedChunk`` or ``NamedDocumentChunk``. We expect either the ``@{`` or ``@[`` separator, and use the actual token found to choose which subclass of ``Chunk`` to create. We then attach all subsequent commands to this chunk while waiting for the final ``@}`` or ``@]`` token to end the chunk. We'll use an ``ArgumentParser`` to locate the optional parameter of ``-noindent``. **TODO:** Extend this to support ``-indent`` *number* Then we can use the ``options`` value to create an appropriate subclass of ``NamedChunk``. If ```"-indent"`` is in options, this is the default. If both are in the options, we should provide a warning. **TODO:** Add a warning for conflicting options. .. _`start a NamedChunk or NamedDocumentChunk, adding it to the web (37)`: .. rubric:: start a NamedChunk or NamedDocumentChunk, adding it to the web (37) = .. parsed-literal:: :class: code arg\_str = next(self.tokenizer) brack = self.expect({self.cmdlcurl, self.cmdlbrak}) options = self.definition\_option\_parser.parse\_args(shlex.split(arg\_str)) name = ' '.join(options.argument) if brack == self.cmdlbrak: new\_chunk = NamedDocumentChunk(name) elif brack == self.cmdlcurl: if 'noindent' in options and options.noindent: new\_chunk = NamedChunk\_Noindent(name) else: new\_chunk = NamedChunk(name) elif brack == None: new\_chunk = None pass # Error already noted by \`\`expect()\`\` else: raise RuntimeError("Design Error") if new\_chunk: self.content.append(new\_chunk) # capture a NamedChunk up to @} or @] .. .. container:: small ∎ *start a NamedChunk or NamedDocumentChunk, adding it to the web (37)*. Used by → `WebReader handle a command string (35)`_. An import command has the unusual form of ``@i`` *name*, with no trailing separator. When we encounter the ``@i`` token, the next token will start with the file name, but may continue with an anonymous chunk. To avoid confusion, we require that all ``@i`` commands occur at the end of a line, The break on the ``'\n'`` which ends the file name. This permits file names with embedded spaces. It also permits arguments and options, if really necessary. Once we have split the file name away from the rest of the following anonymous chunk, we push the following token (a ``\n``) back into the token stream, so that it will be the first token examined at the top of the ``load()`` loop. We create a child ``WebReader`` instance to process the included file. The entire file is loaded into the current ``Web`` instance. A new, empty ``Chunk`` is created at the end of the file so that processing can resume with an anonymous ``Chunk``. The reader has a ``permitList`` attribute. This lists any commands where failure is permitted. Currently, only the ``@i`` command can be set to permit failure; this allows a ``.w`` to include a file that does not yet exist. The primary use case for this permitted error feature is when weaving test output. A first use of the **py-web-lp** can be used to tangle the program source files, ignoring a missing test output file, named in an ``@i`` command. The application can then be run to create the missing test output file. After this, a second use of the **py-web-lp** can weave the test output file into a final, complete document. .. _`include another file (38)`: .. rubric:: include another file (38) = .. parsed-literal:: :class: code incPath = Path(next(self.tokenizer).strip()) try: include = WebReader(parent=self) if not incPath.is\_absolute(): incPath = self.base\_path / incPath self.logger.info("Including '%s'", incPath) self.content.extend(include.load(incPath)) self.totalLines += include.tokenizer.lineNumber self.totalFiles += include.totalFiles if include.errors: self.errors += include.errors self.logger.error("Errors in included file '%s', output is incomplete.", incPath) except Error as e: self.logger.error("Problems with included file '%s', output is incomplete.", incPath) self.errors += 1 except IOError as e: self.logger.error("Problems finding included file '%s', output is incomplete.", incPath) # Discretionary -- sometimes we want to continue if self.cmdi in self.permitList: pass else: raise # Seems heavy-handed, but, the file wasn't found! # Start a new context for text or commands \*after\* the \`\`@i\`\`. self.content.append(Chunk()) .. .. container:: small ∎ *include another file (38)*. Used by → `WebReader handle a command string (35)`_. When a ``@}`` or ``@]`` are found, this finishes a named chunk. The next text is therefore part of an anonymous chunk. Note that no check is made to assure that the previous ``Chunk`` was indeed a named chunk or output chunk started with ``@{`` or ``@[``. To do this, an attribute would be needed for each ``Chunk`` subclass that indicated if a trailing bracket was necessary. For the base ``Chunk`` class, this would be false, but for all other subclasses of ``Chunk``, this would be true. .. _`finish a chunk, start a new Chunk adding it to the web (39)`: .. rubric:: finish a chunk, start a new Chunk adding it to the web (39) = .. parsed-literal:: :class: code # Start a new context for text or commands \*after\* this command. self.content.append(Chunk()) .. .. container:: small ∎ *finish a chunk, start a new Chunk adding it to the web (39)*. Used by → `WebReader handle a command string (35)`_. User identifiers occur after a ``@|`` command inside a ``NamedChunk``. Note that no check is made to assure that the previous ``Chunk`` was indeed a named chunk or output chunk started with ``@{``. To do this, an attribute would be needed for each ``Chunk`` subclass that indicated if user identifiers are permitted. For the base ``Chunk`` class, this would be false, but for the ``NamedChunk`` class and ``OutputChunk`` class, this would be true. User identifiers are name references at the end of a NamedChunk These are accumulated and expanded by ``@u`` reference .. _`assign user identifiers to the current chunk (40)`: .. rubric:: assign user identifiers to the current chunk (40) = .. parsed-literal:: :class: code try: names = next(self.tokenizer).strip().split() self.content[-1].def\_names.extend(names) except AttributeError: # Out of place @\| user identifier command self.logger.error("Unexpected references near %r: %r", self.location(), token) self.errors += 1 .. .. container:: small ∎ *assign user identifiers to the current chunk (40)*. Used by → `WebReader handle a command string (35)`_. A reference command has the form ``@<``\ *name*\ ``@>``. We accept three tokens from the input, the middle token is the referenced name. .. _`add a reference command to the current chunk (41)`: .. rubric:: add a reference command to the current chunk (41) = .. parsed-literal:: :class: code # get the name, introduce into the named Chunk dictionary name = next(self.tokenizer).strip() closing = self.expect({self.cmdrangl}) self.content[-1].commands.append(ReferenceCommand(name, self.location())) self.logger.debug("Reading %r %r", name, closing) .. .. container:: small ∎ *add a reference command to the current chunk (41)*. Used by → `WebReader handle a command string (35)`_. An expression command has the form ``@(``\ *Python Expression*\ ``@)``. We accept three tokens from the input, the middle token is the expression. There are two alternative semantics for an embedded expression. - **Deferred Execution**. This requires definition of a new subclass of ``Command``, ``ExpressionCommand``, and appends it into the current ``Chunk``. At weave and tangle time, this expression is evaluated. The insert might look something like this: ``aChunk.append(ExpressionCommand(expression, self.location()))``. - **Immediate Execution**. This simply creates a context and evaluates the Python expression. The output from the expression becomes a ``TextCommand``, and is append to the current ``Chunk``. We use the **Immediate Execution** semantics -- the expression is immediately appended to the current chunk's text. We provide a few elements of the ``os`` module. We provide ``os.path`` library. The ``os.getcwd()`` could be changed to ``os.path.realpath('.')``, but that seems too long-winded. .. _`Imports (42)`: .. rubric:: Imports (42) += .. parsed-literal:: :class: code import builtins import sys import platform .. .. container:: small ∎ *Imports (42)*. Used by → `pyweb.py (79)`_. **TODO:** Appening the text should be a method of a Chunk -- either append text, or append a command. .. _`add an expression command to the current chunk (43)`: .. rubric:: add an expression command to the current chunk (43) = .. parsed-literal:: :class: code # get the Python expression, create the expression result expression = next(self.tokenizer) self.expect({self.cmdrexpr}) try: # Build Context # \*\*TODO:\*\* Parts of this are static and can be built as part of \`\`\_\_init\_\_()\`\`. dangerous = { 'breakpoint', 'compile', 'eval', 'exec', 'execfile', 'globals', 'help', 'input', 'memoryview', 'open', 'print', 'super', '\_\_import\_\_' } safe = types.SimpleNamespace(\*\*dict( (name, obj) for name,obj in builtins.\_\_dict\_\_.items() if name not in dangerous )) globals = dict( \_\_builtins\_\_=safe, os=types.SimpleNamespace(path=os.path, getcwd=os.getcwd, name=os.name), time=time, datetime=datetime, platform=platform, theWebReader=self, theFile=self.filePath, thisApplication=sys.argv[0], \_\_version\_\_=\_\_version\_\_, # Legacy compatibility. Deprecated. version=\_\_version\_\_, theLocation=str(self.location()), # The only thing that's dynamic ) # Evaluate result = str(eval(expression, globals)) except Exception as exc: self.logger.error('Failure to process %r: exception is %r', expression, exc) self.errors += 1 result = f"@({expression!r}: Error {exc!r}@)" self.content[-1].add\_text(result, self.location()) .. .. container:: small ∎ *add an expression command to the current chunk (43)*. Used by → `WebReader handle a command string (35)`_. A double command sequence (``'@@'``, when the command is an ``'@'``) has the usual meaning of ``'@'`` in the input stream. We do this by appending text to the last command in the current ``Chunk``. This will append the character on the end of the most recent ``TextCommand`` or ``CodeCommand```; if this fails, it will create a new, empty ``TextCommand`` or ``CodeCommand``. **TODO:** This should be a method of a Chunk -- either append text, or append a command. .. _`double at-sign replacement, append this character to previous TextCommand (44)`: .. rubric:: double at-sign replacement, append this character to previous TextCommand (44) = .. parsed-literal:: :class: code self.logger.debug(f"double-command: {self.content[-1]=}") self.content[-1].add\_text(self.command, self.location()) .. .. container:: small ∎ *double at-sign replacement, append this character to previous TextCommand (44)*. Used by → `WebReader handle a command string (35)`_. The ``expect()`` method examines the next token to see if it is the expected item. ``'\n'`` are absorbed. If this is not found, a standard type of error message is raised. This is used by ``handleCommand()``. .. _`WebReader handle a command string (45)`: .. rubric:: WebReader handle a command string (45) += .. parsed-literal:: :class: code def expect(self, tokens: set[str]) -> str \| None: """Compare next token with expectation, quietly skipping whitespace (i.e., \`\`\\n\`\`).""" try: t = next(self.tokenizer) while t == '\\n': t = next(self.tokenizer) except StopIteration: self.logger.error("At %r: end of input, %r not found", self.location(), tokens) self.errors += 1 return None if t in tokens: return t else: self.logger.error("At %r: expected %r, found %r", self.location(), tokens, t) self.errors += 1 return None .. .. container:: small ∎ *WebReader handle a command string (45)*. Used by → `WebReader class - parses the input file, building the Web structure (34)`_. The ``location()`` provides the file name and line number. This allows error messages as well as tangled or woven output to correctly reference the original input files. .. _`WebReader location in the input stream (46)`: .. rubric:: WebReader location in the input stream (46) = .. parsed-literal:: :class: code def location(self) -> tuple[str, int]: return (str(self.filePath), self.tokenizer.lineNumber+1) .. .. container:: small ∎ *WebReader location in the input stream (46)*. Used by → `WebReader class - parses the input file, building the Web structure (34)`_. The ``load()`` method reads the entire input file as a sequence of tokens, split up by the ``Tokenizer``. Each token that appears to be a command is passed to the ``handleCommand()`` method. If the ``handleCommand()`` method returns a True result, the command was recognized and placed in the ``Web``. If ``handleCommand()`` returns a False result, the command was unknown, and we write a warning but treat it as text. The ``load()`` method is used recursively to handle the ``@i`` command. The issue is that it's always loading a single top-level web. .. _`Imports (47)`: .. rubric:: Imports (47) += .. parsed-literal:: :class: code from typing import TextIO, cast .. .. container:: small ∎ *Imports (47)*. Used by → `pyweb.py (79)`_. .. _`WebReader load the web (48)`: .. rubric:: WebReader load the web (48) = .. parsed-literal:: :class: code def load(self, filepath: Path, source: TextIO \| None = None) -> list[Chunk]: """Returns a flat list of chunks to be made into a Web. Also used to expand \`\`@i\`\` included files. """ self.filePath = filepath self.base\_path = self.filePath.parent if source: self.\_source = source self.parse\_source() else: with self.filePath.open() as self.\_source: self.parse\_source() return self.content def parse\_source(self) -> None: """Builds a sequence of Chunks.""" self.tokenizer = Tokenizer(self.\_source, self.command) self.totalFiles += 1 # Initial anonymous chunk. self.content = [Chunk()] for token in self.tokenizer: if len(token) >= 2 and token.startswith(self.command): if self.handleCommand(token): continue else: self.logger.error('Unknown @-command in input: %r near %r', token, self.location()) self.content[-1].add\_text(token, self.location()) elif token: # Accumulate a non-empty block of text in the current chunk. self.content[-1].add\_text(token, self.location()) else: # Whitespace pass self.logger.debug("parse\_source: [") for c in self.content: self.logger.debug(" %r", c) self.logger.debug("]") .. .. container:: small ∎ *WebReader load the web (48)*. Used by → `WebReader class - parses the input file, building the Web structure (34)`_. The command character can be changed to permit some flexibility when working with languages that make extensive use of the ``@`` symbol, i.e., PERL. The initialization of the ``WebReader`` is based on the selected command character. .. _`WebReader command literals (49)`: .. rubric:: WebReader command literals (49) = .. parsed-literal:: :class: code # Structural ("major") commands self.cmdo = self.command+'o' self.cmdd = self.command+'d' self.cmdlcurl = self.command+'{' self.cmdrcurl = self.command+'}' self.cmdlbrak = self.command+'[' self.cmdrbrak = self.command+']' self.cmdi = self.command+'i' # Inline ("minor") commands self.cmdlangl = self.command+'<' self.cmdrangl = self.command+'>' self.cmdpipe = self.command+'\|' self.cmdlexpr = self.command+'(' self.cmdrexpr = self.command+')' self.cmdcmd = self.command+self.command # Content "minor" commands self.cmdf = self.command+'f' self.cmdm = self.command+'m' self.cmdu = self.command+'u' .. .. container:: small ∎ *WebReader command literals (49)*. Used by → `WebReader class - parses the input file, building the Web structure (34)`_. The Tokenizer Class ~~~~~~~~~~~~~~~~~~~~ The ``WebReader`` requires a tokenizer. The tokenizer breaks the input text into a stream of tokens. There are two broad classes of tokens: - ``r'@.'`` command tokens, including the structural, inline, and content commands. - ``r'\n'``. Inside text, these matter. Within structural command tokens, these don't matter. Except after the filename after an ``@i`` command, where it ends the command. - The remaining text; neither newlines nor commands. The tokenizer works by reading the entire file and splitting on ``r'@.'`` patterns. The ``re.split()`` function will separate the input and preserve the actual character sequence on which the input was split. This breaks the input into blocks of text separated by the ``r'@.'`` characters. For example: .. parsed-literal:: >>> pat.split( "@{hi mom@}") ['', '@{', 'hi mom', '@}', ''] This tokenizer splits the input using ``r'@.|\n'``. The idea is that we locate commands, newlines and the interstitial text as three classes of tokens. We can then assemble each ``Command`` instance from a short sequence of tokens. The core ``TextCommand`` and ``CodeCommand`` will be a line of text ending with the ``\n``. The tokenizer counts newline characters for us, so that error messages can include a line number. Also, we can tangle extract comments into a file to reveal source line numbers. Since the tokenizer is a proper iterator, we can use ``tokens = iter(Tokenizer(source))`` and ``next(tokens)`` to step through the sequence of tokens until we raise a ``StopIteration`` exception. .. _`Imports (50)`: .. rubric:: Imports (50) += .. parsed-literal:: :class: code import re from collections.abc import Iterator, Iterable .. .. container:: small ∎ *Imports (50)*. Used by → `pyweb.py (79)`_. .. _`Tokenizer class - breaks input into tokens (51)`: .. rubric:: Tokenizer class - breaks input into tokens (51) = .. parsed-literal:: :class: code class Tokenizer(Iterator[str]): def \_\_init\_\_(self, stream: TextIO, command\_char: str='@') -> None: self.command = command\_char self.parsePat = re.compile(f'({self.command}.\|\\\\n)') self.token\_iter = (t for t in self.parsePat.split(stream.read()) if len(t) != 0) self.lineNumber = 0 def \_\_next\_\_(self) -> str: token = next(self.token\_iter) self.lineNumber += token.count('\\n') return token def \_\_iter\_\_(self) -> Iterator[str]: return self .. .. container:: small ∎ *Tokenizer class - breaks input into tokens (51)*. Used by → `Base Class Definitions (33)`_. Other Application Components ---------------------------- There are a number of other components: - `Error class`_ Defines a uniform exception for this module. - `Action Class Hierarchy`_ defines the actions the application can perform. This includes loading the WEB file, weaving, tangling, and doing combinations of actions. - `The Application Class`_ is a high-level definition of the **py-web-lp** application as a whole. - `Logging setup`_ defines a handy context manager to configure and shut down logging. - `The Main Function`_ is a top-level function to create an instance of ``Application``, and execute it with either supplied arguments or (by default) the actual command-line arguments. This makes it easy to import and reuse ``main()`` in other applications. Error class ~~~~~~~~~~~ An ``Error`` is raised whenever processing cannot continue. Since it is a subclass of Exception, it takes an arbitrary number of arguments. The first should be the basic message text. Subsequent arguments provide additional details. We will try to be sure that all of our internal exceptions reference a specific chunk, if possible. This means either including the chunk as an argument, or catching the exception and appending the current chunk to the exception's arguments. The Python ``raise`` statement takes an instance of ``Error`` and passes it to the enclosing ``try/except`` statement for processing. The typical creation is as follows: .. parsed-literal:: raise Error(f"No full name for {chunk.name!r}", chunk) A typical exception-handling suite might look like this: .. parsed-literal:: try: *...something that may raise an Error or Exception...* except Error as e: print(e.args) # this is a pyWeb internal Error except Exception as w: print(w.args) # this is some other Python Exception The ``Error`` class is a subclass of ``Exception`` used to differentiate application-specific exceptions from other Python exceptions. It does no additional processing, but merely creates a distinct class to facilitate writing ``except`` statements. .. _`Error class defines the errors raised (52)`: .. rubric:: Error class defines the errors raised (52) = .. parsed-literal:: :class: code class Error(Exception): pass .. .. container:: small ∎ *Error class defines the errors raised (52)*. Used by → `pyweb.py (79)`_. Action Class Hierarchy ~~~~~~~~~~~~~~~~~~~~~~ This application performs three major actions: loading the document web, weaving and tangling. Generally, the use case is to perform a load, weave and tangle. However, a less common use case is to first load and tangle output files, run a regression test and then load and weave a result that includes the test output file. The ``-x`` option excludes one of the two output actions. The ``-xw`` excludes the weave pass, doing only the tangle action. The ``-xt`` excludes the tangle pass, doing the weave action. This two pass action might be embedded in the following type of Python program. .. parsed-literal:: import pyweb, os, runpy, sys, pathlib, contextlib log = pathlib.Path("source.log") Tangler().emit(web) with log.open("w") as target: with contextlib.redirect_stdout(target): # run the app, capturing the output runpy.run_path('source.py') # Alternatives include using pytest or doctest Weaver().emit(web, "something.rst") The first step runs **py-web-lp** , excluding the final weaving pass. The second step runs the tangled program, ``source.py``, and produces test results in some log file, ``source.log``. The third step runs **py-web-lp** excluding the tangle pass. This produces a final document that includes the ``source.log`` test results. To accomplish this, we provide a class hierarchy that defines the various actions of the **py-web-lp** application. This class hierarchy defines an extensible set of fundamental actions. This gives us the flexibility to create a simple sequence of actions and execute any combination of these. It eliminates the need for a forest of ``if``-statements to determine precisely what will be done. Each action has the potential to update the state of the overall application. A partner with this command hierarchy is the Application class that defines the application options, inputs and results. .. _`Action class hierarchy used to describe actions of the application (53)`: .. rubric:: Action class hierarchy used to describe actions of the application (53) = .. parsed-literal:: :class: code → `Action superclass has common features of all actions (54)`_ → `ActionSequence subclass that holds a sequence of other actions (57)`_ → `WeaveAction subclass initiates the weave action (60)`_ → `TangleAction subclass initiates the tangle action (63)`_ → `LoadAction subclass loads the document web (66)`_ .. .. container:: small ∎ *Action class hierarchy used to describe actions of the application (53)*. Used by → `pyweb.py (79)`_. The ``Action`` class embodies the basic operations of **py-web-lp** . The intent of this hierarchy is to both provide an easily expanded method of adding new actions, but an easily specified list of actions for a particular run of **py-web-lp** . The overall process of the application is defined by an instance of ``Action``. This instance may be the ``WeaveAction`` instance, the ``TangleAction`` instance or a ``ActionSequence`` instance. The instance is constructed during parsing of the input parameters. Then the ``Action`` class ``perform()`` method is called to actually perform the action. There are three standard ``Action`` instances available: an instance that is a macro and does both tangling and weaving, an instance that excludes tangling, and an instance that excludes weaving. These correspond to the command-line options. .. parsed-literal:: anOp = SomeAction("name") anOp(*argparse.Namespace*) The ``Action`` is the superclass for all actions. An ``Action`` has a number of common attributes. :name: A name for this action. :options: The ``argparse.Namespace`` object. A LoadAction will update this with the ``Web`` object that was loaded. !start: The time at which the action started. .. _`Action superclass has common features of all actions (54)`: .. rubric:: Action superclass has common features of all actions (54) = .. parsed-literal:: :class: code class Action: """An action performed by pyWeb.""" start: float options: argparse.Namespace def \_\_init\_\_(self, name: str) -> None: self.name = name self.logger = logging.getLogger(self.\_\_class\_\_.\_\_qualname\_\_) def \_\_str\_\_(self) -> str: return f"{self.name!s} [{self.options.web!s}]" → `Action call method actually does the real work (55)`_ → `Action final summary of what was done (56)`_ .. .. container:: small ∎ *Action superclass has common features of all actions (54)*. Used by → `Action class hierarchy used to describe actions of the application (53)`_. The ``__call__()`` method does the real work of the action. For the superclass, it merely logs a message. This is overridden by a subclass. .. _`Action call method actually does the real work (55)`: .. rubric:: Action call method actually does the real work (55) = .. parsed-literal:: :class: code def \_\_call\_\_(self, options: argparse.Namespace) -> None: self.logger.info("Starting %s", self.name) self.options = options self.start = time.process\_time() .. .. container:: small ∎ *Action call method actually does the real work (55)*. Used by → `Action superclass has common features of all actions (54)`_. The ``summary()`` method returns some basic processing statistics for this action. .. _`Action final summary of what was done (56)`: .. rubric:: Action final summary of what was done (56) = .. parsed-literal:: :class: code def duration(self) -> float: """Return duration of the action.""" return (self.start and time.process\_time()-self.start) or 0 def summary(self) -> str: return f"{self.name!s} in {self.duration():0.3f} sec." .. .. container:: small ∎ *Action final summary of what was done (56)*. Used by → `Action superclass has common features of all actions (54)`_. ActionSequence Class ~~~~~~~~~~~~~~~~~~~~ A ``ActionSequence`` defines a composite action; it is a sequence of other actions. When the macro is performed, it delegates to the sub-actions. The instance is created during parsing of input parameters. An instance of this class is one of the three standard actions available; it generally is the default, "do everything" action. This class overrides the ``perform()`` method of the superclass. It also adds an ``append()`` method that is used to construct the sequence of actions. .. _`ActionSequence subclass that holds a sequence of other actions (57)`: .. rubric:: ActionSequence subclass that holds a sequence of other actions (57) = .. parsed-literal:: :class: code class ActionSequence(Action): """An action composed of a sequence of other actions.""" def \_\_init\_\_(self, name: str, opSequence: list[Action] \| None = None) -> None: super().\_\_init\_\_(name) if opSequence: self.opSequence = opSequence else: self.opSequence = [] def \_\_str\_\_(self) -> str: return "; ".join([str(x) for x in self.opSequence]) → `ActionSequence call method delegates the sequence of ations (58)`_ → `ActionSequence summary summarizes each step (59)`_ .. .. container:: small ∎ *ActionSequence subclass that holds a sequence of other actions (57)*. Used by → `Action class hierarchy used to describe actions of the application (53)`_. Since the macro ``__call__()`` method delegates to other Actions, it is possible to short-cut argument processing by using the Python ``*args`` construct to accept all arguments and pass them to each sub-action. .. _`ActionSequence call method delegates the sequence of ations (58)`: .. rubric:: ActionSequence call method delegates the sequence of ations (58) = .. parsed-literal:: :class: code def \_\_call\_\_(self, options: argparse.Namespace) -> None: super().\_\_call\_\_(options) for o in self.opSequence: o(self.options) .. .. container:: small ∎ *ActionSequence call method delegates the sequence of ations (58)*. Used by → `ActionSequence subclass that holds a sequence of other actions (57)`_. The ``summary()`` method returns some basic processing statistics for each step of this action. .. _`ActionSequence summary summarizes each step (59)`: .. rubric:: ActionSequence summary summarizes each step (59) = .. parsed-literal:: :class: code def summary(self) -> str: return ", ".join([o.summary() for o in self.opSequence]) .. .. container:: small ∎ *ActionSequence summary summarizes each step (59)*. Used by → `ActionSequence subclass that holds a sequence of other actions (57)`_. WeaveAction Class ~~~~~~~~~~~~~~~~~~ The ``WeaveAction`` defines the action of weaving. This action logs a message, and invokes the ``weave()`` method of the ``Web`` instance. This method also includes the basic decision on which weaver to use. If a ``Weaver`` was specified on the command line, this instance is used. Otherwise, the first few characters are examined and a weaver is selected. This class overrides the ``__call__()`` method of the superclass. If the options include ``theWeaver``, that ``Weaver`` instance will be used. Otherwise, the ``web.language()`` method function is used to guess what weaver to use. .. _`WeaveAction subclass initiates the weave action (60)`: .. rubric:: WeaveAction subclass initiates the weave action (60) = .. parsed-literal:: :class: code class WeaveAction(Action): """Weave the final document.""" def \_\_init\_\_(self) -> None: super().\_\_init\_\_("Weave") def \_\_str\_\_(self) -> str: return f"{self.name!s} [{self.options.web!s}, {self.options.theWeaver!s}]" → `WeaveAction call method to pick the language (61)`_ → `WeaveAction summary of language choice (62)`_ .. .. container:: small ∎ *WeaveAction subclass initiates the weave action (60)*. Used by → `Action class hierarchy used to describe actions of the application (53)`_. The language is picked just prior to weaving. It is either (1) the language specified on the command line, or, (2) if no language was specified, a language is selected based on the first few characters of the input. Weaving can only raise an exception when there is a reference to a chunk that is never defined. .. _`WeaveAction call method to pick the language (61)`: .. rubric:: WeaveAction call method to pick the language (61) = .. parsed-literal:: :class: code def \_\_call\_\_(self, options: argparse.Namespace) -> None: super().\_\_call\_\_(options) if not self.options.weaver: # Examine first few chars of first chunk of web to determine language self.options.weaver = self.options.web.language() self.logger.info("Using %s", self.options.theWeaver) self.options.theWeaver.output = self.options.output try: self.options.theWeaver.set\_markup(self.options.weaver) self.options.theWeaver.emit(self.options.web) self.logger.info("Finished Normally") except Error as e: self.logger.error("Problems weaving document from %r (weave file is faulty).", self.options.web.web\_path) #raise .. .. container:: small ∎ *WeaveAction call method to pick the language (61)*. Used by → `WeaveAction subclass initiates the weave action (60)`_. The ``summary()`` method returns some basic processing statistics for the weave action. .. _`WeaveAction summary of language choice (62)`: .. rubric:: WeaveAction summary of language choice (62) = .. parsed-literal:: :class: code def summary(self) -> str: if self.options.theWeaver and self.options.theWeaver.linesWritten > 0: return ( f"{self.name!s} {self.options.theWeaver.linesWritten:d} lines in {self.duration():0.3f} sec." ) return f"did not {self.name!s}" .. .. container:: small ∎ *WeaveAction summary of language choice (62)*. Used by → `WeaveAction subclass initiates the weave action (60)`_. TangleAction Class ~~~~~~~~~~~~~~~~~~~ The ``TangleAction`` defines the action of tangling. This operation logs a message, and invokes the ``weave()`` method of the ``Web`` instance. This method also includes the basic decision on which weaver to use. If a ``Weaver`` was specified on the command line, this instance is used. Otherwise, the first few characters are examined and a weaver is selected. This class overrides the ``__call__()`` method of the superclass. The options **must** include ``theTangler``, with the ``Tangler`` instance to be used. .. _`TangleAction subclass initiates the tangle action (63)`: .. rubric:: TangleAction subclass initiates the tangle action (63) = .. parsed-literal:: :class: code class TangleAction(Action): """Tangle source files.""" def \_\_init\_\_(self) -> None: super().\_\_init\_\_("Tangle") → `TangleAction call method does tangling of the output files (64)`_ → `TangleAction summary method provides total lines tangled (65)`_ .. .. container:: small ∎ *TangleAction subclass initiates the tangle action (63)*. Used by → `Action class hierarchy used to describe actions of the application (53)`_. Tangling can only raise an exception when a cross reference request (``@f``, ``@m`` or ``@u``) occurs in a program code chunk. Program code chunks are defined with any of ``@d`` or ``@o`` and use ``@{`` ``@}`` brackets. .. _`TangleAction call method does tangling of the output files (64)`: .. rubric:: TangleAction call method does tangling of the output files (64) = .. parsed-literal:: :class: code def \_\_call\_\_(self, options: argparse.Namespace) -> None: super().\_\_call\_\_(options) self.options.theTangler.include\_line\_numbers = self.options.tangler\_line\_numbers self.options.theTangler.output = self.options.output try: self.options.theTangler.emit(self.options.web) except Error as e: self.logger.error("Problems tangling outputs from %r (tangle files are faulty).", self.options.web.web\_path) #raise .. .. container:: small ∎ *TangleAction call method does tangling of the output files (64)*. Used by → `TangleAction subclass initiates the tangle action (63)`_. The ``summary()`` method returns some basic processing statistics for the tangle action. .. _`TangleAction summary method provides total lines tangled (65)`: .. rubric:: TangleAction summary method provides total lines tangled (65) = .. parsed-literal:: :class: code def summary(self) -> str: if self.options.theTangler and self.options.theTangler.linesWritten > 0: return ( f"{self.name!s} {self.options.theTangler.totalLines:d} lines in {self.duration():0.3f} sec." ) return f"did not {self.name!r}" .. .. container:: small ∎ *TangleAction summary method provides total lines tangled (65)*. Used by → `TangleAction subclass initiates the tangle action (63)`_. LoadAction Class ~~~~~~~~~~~~~~~~~~ The ``LoadAction`` defines the action of loading the web structure. This action uses the application's ``webReader`` to actually do the load. An instance is created during parsing of the input parameters. An instance of this class is part of any of the weave, tangle and "do everything" action. This class overrides the ``__call__()`` method of the superclass. The options **must** include ``webReader``, with the ``WebReader`` instance to be used. .. _`LoadAction subclass loads the document web (66)`: .. rubric:: LoadAction subclass loads the document web (66) = .. parsed-literal:: :class: code class LoadAction(Action): """Load the source web.""" def \_\_init\_\_(self) -> None: super().\_\_init\_\_("Load") def \_\_str\_\_(self) -> str: return f"Load [{self.webReader!s}, {self.options.web!s}]" → `LoadAction call method loads the input files (67)`_ → `LoadAction summary provides lines read (68)`_ .. .. container:: small ∎ *LoadAction subclass loads the document web (66)*. Used by → `Action class hierarchy used to describe actions of the application (53)`_. Trying to load the web involves two steps, either of which can raise exceptions due to incorrect inputs. 1. The ``WebReader`` class ``load()`` method can raise exceptions for a number of syntax errors as well as OS errors. - Missing closing brackets (``@}``, ``@]`` or ``@>``). - Missing opening bracket (``@{`` or ``@[``) after a chunk name (``@d`` or ``@o``). - Extra brackets (``@{``, ``@[``, ``@}``, ``@]``). - Extra ``@|``. - The input file does not exist or is not readable. 2. The ``Web`` class ``createUsedBy()`` method can raise an exception when a chunk reference cannot be resolved to a named chunk. .. _`LoadAction call method loads the input files (67)`: .. rubric:: LoadAction call method loads the input files (67) = .. parsed-literal:: :class: code def \_\_call\_\_(self, options: argparse.Namespace) -> None: super().\_\_call\_\_(options) self.webReader = self.options.webReader self.webReader.command = self.options.command self.webReader.permitList = self.options.permitList self.logger.debug("Reader Class %s", self.webReader.\_\_class\_\_.\_\_name\_\_) error = f"Problems with source file {self.options.source\_path!r}, no output produced." try: chunks = self.webReader.load(self.options.source\_path) if self.webReader.errors != 0: raise Error("Syntax Errors in the Web") self.logger.debug("Read %d Chunks", len(chunks)) self.options.web = Web(chunks) self.options.web.web\_path = self.options.source\_path self.logger.debug("Web contains %3d chunks", len(self.options.web.chunks)) self.logger.debug("Web defines %3d files", len(self.options.web.files)) self.logger.debug("Web defines %3d macros", len(self.options.web.macros)) self.logger.debug("Web defines %3d names", len(self.options.web.userids)) except Error as e: self.logger.error(error) raise # Could not be parsed or built. except IOError as e: self.logger.error(error) raise .. .. container:: small ∎ *LoadAction call method loads the input files (67)*. Used by → `LoadAction subclass loads the document web (66)`_. The ``summary()`` method returns some basic processing statistics for the load action. .. _`LoadAction summary provides lines read (68)`: .. rubric:: LoadAction summary provides lines read (68) = .. parsed-literal:: :class: code def summary(self) -> str: return ( f"{self.name!s} {self.webReader.totalLines:d} lines from {self.webReader.totalFiles:d} files in {self.duration():0.3f} sec." ) .. .. container:: small ∎ *LoadAction summary provides lines read (68)*. Used by → `LoadAction subclass loads the document web (66)`_. The Application Class ~~~~~~~~~~~~~~~~~~~~~ The ``Application`` class is provided so that the ``Action`` instances have an overall application to update. This allows the ``WeaveAction`` to provide the selected ``Weaver`` instance to the application. It also provides a central location for the various options and alternatives that might be accepted from the command line. The constructor creates a default ``argparse.Namespace`` with values suitable for weaving and tangling. The ``parseArgs()`` method uses the ``sys.argv`` sequence to parse the command line arguments and update the options. This allows a program to pre-process the arguments, passing other arguments to this module. The ``process()`` method processes a list of files. This is either the list of files passed as an argument, or it is the list of files parsed by the ``parseArgs()`` method. The ``parseArgs()`` and process() functions are separated so that another application can ``import pyweb``, bypass command-line parsing, yet still perform the basic actionss simply and consistently. For example: .. parsed-literal:: import pyweb, argparse p = argparse.ArgumentParser() *argument definition* config = p.parse_args() a = pyweb.Application() *Configure the Application based on options* a.process(config) The ``main()`` function creates an ``Application`` instance and calls the ``parseArgs()`` and ``process()`` methods to provide the expected default behavior for this module when it is used as the main program. The configuration can be either a ``types.SimpleNamespace`` or an ``argparse.Namespace`` instance. .. _`Imports (69)`: .. rubric:: Imports (69) += .. parsed-literal:: :class: code import argparse import shlex .. .. container:: small ∎ *Imports (69)*. Used by → `pyweb.py (79)`_. .. _`Application Class for overall CLI operation (70)`: .. rubric:: Application Class for overall CLI operation (70) = .. parsed-literal:: :class: code class Application: def \_\_init\_\_(self, base\_config: dict[str, Any] \| None = None) -> None: self.logger = logging.getLogger(self.\_\_class\_\_.\_\_qualname\_\_) → `Application default options (71)`_ → `Application parse command line (72)`_ → `Application class process all files (73)`_ .. .. container:: small ∎ *Application Class for overall CLI operation (70)*. Used by → `pyweb.py (79)`_. The first part of parsing the command line is setting default values that apply when parameters are omitted. The default values are set as follows: :defaults: A default configuration. :webReader: is the ``WebReader`` instance created for the current input file. :doWeave: instance of ``Action`` that does weaving only. :doTangle: instance of ``Action`` that does tangling only. :theAction: is an instance of ``Action`` that describes the default overall action: load, tangle and weave. This is the default unless overridden by an option. Here are the configuration values. These are attributes of the ``argparse.namespace`` default as well as the updated namespace returned by ``parseArgs()``. :verbosity: Either ``logging.INFO``, ``logging.WARN`` or ``logging.DEBUG`` :command: is set to ``@`` as the default command introducer. :permit: The raw list of permitted command characters, perhaps ``'i'``. :permitList: provides a list of commands that are permitted to fail. Typically this is empty, or contains ``@i`` to allow the include command to fail. :files: is the final list of argument files from the command line; these will be processed unless overridden in the call to ``process()``. !skip: a list of steps to skip: perhaps ``'w'`` or ``'t'`` to skip weaving or tangling. :weaver: the short name of the weaver. :theTangler: is set to a ``TanglerMake`` instance to create the output files. :theWeaver: is set to an instance of a subclass of ``Weaver`` based on ``weaver`` .. _`Application default options (71)`: .. rubric:: Application default options (71) = .. parsed-literal:: :class: code self.defaults = argparse.Namespace( verbosity=logging.INFO, command='@', weaver='rst', skip='', # Don't skip any steps permit='', # Don't tolerate missing includes reference='s', # Simple references tangler\_line\_numbers=False, output=Path.cwd(), ) # Primitive Actions self.loadOp = LoadAction() self.weaveOp = WeaveAction() self.tangleOp = TangleAction() # Composite Actions self.doWeave = ActionSequence("load and weave", [self.loadOp, self.weaveOp]) self.doTangle = ActionSequence("load and tangle", [self.loadOp, self.tangleOp]) self.theAction = ActionSequence("load, tangle and weave", [self.loadOp, self.tangleOp, self.weaveOp]) .. .. container:: small ∎ *Application default options (71)*. Used by → `Application Class for overall CLI operation (70)`_. The algorithm for parsing the command line parameters uses the built in ``argparse`` module. We have to build a parser, define the options, and the parse the command-line arguments, updating the default namespace. We further expand on the arguments. This transforms simple strings into object instances. .. _`Application parse command line (72)`: .. rubric:: Application parse command line (72) = .. parsed-literal:: :class: code def parseArgs(self, argv: list[str]) -> argparse.Namespace: p = argparse.ArgumentParser() p.add\_argument("-v", "--verbose", dest="verbosity", action="store\_const", const=logging.INFO) p.add\_argument("-s", "--silent", dest="verbosity", action="store\_const", const=logging.WARN) p.add\_argument("-d", "--debug", dest="verbosity", action="store\_const", const=logging.DEBUG) p.add\_argument("-c", "--command", dest="command", action="store") p.add\_argument("-w", "--weaver", dest="weaver", action="store") p.add\_argument("-x", "--except", dest="skip", action="store", choices=('w', 't')) p.add\_argument("-p", "--permit", dest="permit", action="store") p.add\_argument("-n", "--linenumbers", dest="tangler\_line\_numbers", action="store\_true") p.add\_argument("-o", "--output", dest="output", action="store", type=Path) p.add\_argument("-V", "--Version", action='version', version=f"py-web-lp pyweb.py {\_\_version\_\_}") p.add\_argument("files", nargs='+', type=Path) config = p.parse\_args(argv, namespace=self.defaults) self.expand(config) return config def expand(self, config: argparse.Namespace) -> argparse.Namespace: """Translate the argument values from simple text to useful objects. Weaver. Tangler. WebReader. """ # Weaver & Tangler config.theWeaver = Weaver(config.output) config.theTangler = TanglerMake(config.output) if config.permit: # save permitted errors, usual case is \`\`-pi\`\` to permit \`\`@i\`\` include errors config.permitList = [f'{config.command!s}{c!s}' for c in config.permit] else: config.permitList = [] config.webReader = WebReader() return config .. .. container:: small ∎ *Application parse command line (72)*. Used by → `Application Class for overall CLI operation (70)`_. The ``process()`` function uses the current ``Application`` settings to process each file as follows: 1. Create a new ``WebReader`` for the ``Application``, providing the parameters required to process the input file. 2. Create a ``Web`` instance, *w* and set the Web's *sourceFileName* from the WebReader's *filePath*. 3. Perform the given command, typically a ``ActionSequence``, which does some combination of load, tangle the output files and weave the final document in the target language; if necessary, examine the ``Web`` to determine the documentation language. 4. Print a performance summary line that shows lines processed per second. In the event of failure in any of the major processing steps, a summary message is produced, to clarify the state of the output files, and the exception is reraised. The re-raising is done so that all exceptions are handled by the outermost main program. .. _`Application class process all files (73)`: .. rubric:: Application class process all files (73) = .. parsed-literal:: :class: code def process(self, config: argparse.Namespace) -> None: root = logging.getLogger() root.setLevel(config.verbosity) self.logger.debug("Setting root log level to %r", logging.getLevelName(root.getEffectiveLevel())) if config.command: self.logger.debug("Command character %r", config.command) if config.skip: if config.skip.lower().startswith('w'): # not weaving == tangling self.theAction = self.doTangle elif config.skip.lower().startswith('t'): # not tangling == weaving self.theAction = self.doWeave else: raise Exception(f"Unknown -x option {config.skip!r}") for f in config.files: self.logger.info("%s %s %r", self.theAction.name, \_\_version\_\_, f) config.source\_path = f self.theAction(config) self.logger.info(self.theAction.summary()) .. .. container:: small ∎ *Application class process all files (73)*. Used by → `Application Class for overall CLI operation (70)`_. Logging Setup ~~~~~~~~~~~~~~~~~~~~~ We'll create a logging context manager. This allows us to wrap the ``main()`` function in an explicit ``with`` statement that assures that logging is configured and cleaned up politely. .. _`Imports (74)`: .. rubric:: Imports (74) += .. parsed-literal:: :class: code import logging import logging.config .. .. container:: small ∎ *Imports (74)*. Used by → `pyweb.py (79)`_. This has two configuration approaches. If a positional argument is given, that dictionary is used for ``logging.config.dictConfig``. Otherwise, keyword arguments are provided to ``logging.basicConfig``. A subclass might properly load a dictionary encoded in YAML and use that with ``logging.config.dictConfig``. .. _`Logging Setup (75)`: .. rubric:: Logging Setup (75) = .. parsed-literal:: :class: code class Logger: def \_\_init\_\_(self, dict\_config: dict[str, Any] \| None = None, \*\*kw\_config: Any) -> None: self.dict\_config = dict\_config self.kw\_config = kw\_config def \_\_enter\_\_(self) -> "Logger": if self.dict\_config: logging.config.dictConfig(self.dict\_config) else: logging.basicConfig(\*\*self.kw\_config) return self def \_\_exit\_\_(self, \*args: Any) -> Literal[False]: logging.shutdown() return False .. .. container:: small ∎ *Logging Setup (75)*. Used by → `pyweb.py (79)`_. Here's a sample logging setup. This creates a simple console handler and a formatter that matches the ``basicConfig`` formatter. This configuration dictionary defines the root logger plus some overrides for class loggers that might be used to gather additional information. .. _`Logging Setup (76)`: .. rubric:: Logging Setup (76) += .. parsed-literal:: :class: code default\_logging\_config = { 'version': 1, 'disable\_existing\_loggers': False, # Allow pre-existing loggers to work. 'style': '{', 'handlers': { 'console': { 'class': 'logging.StreamHandler', 'stream': 'ext://sys.stderr', 'formatter': 'basic', }, }, 'formatters': { 'basic': { 'format': "{levelname}:{name}:{message}", 'style': "{", } }, 'root': {'handlers': ['console'], 'level': logging.INFO,}, # For specific debugging support... 'loggers': { 'Weaver': {'level': logging.INFO}, 'WebReader': {'level': logging.INFO}, 'Tangler': {'level': logging.INFO}, 'TanglerMake': {'level': logging.INFO}, 'indent.TanglerMake': {'level': logging.INFO}, 'Web': {'level': logging.INFO}, # Unit test requires this... 'ReferenceCommand': {'level': logging.INFO}, }, } .. .. container:: small ∎ *Logging Setup (76)*. Used by → `pyweb.py (79)`_. The above is wired into the application as a default. Exposing this via a configuration file is better. .. _`pyweb.toml (77)`: .. rubric:: pyweb.toml (77) = .. parsed-literal:: :class: code [pyweb] # PyWeb options go here. [logging] version = 1 disable\_existing\_loggers = false [logging.root] handlers = [ "console",] level = "INFO" [logging.handlers.console] class = "logging.StreamHandler" stream = "ext://sys.stderr" formatter = "basic" [logging.formatters.basic] format = "{levelname}:{name}:{message}" style = "{" [logging.loggers.Weaver] level = "INFO" [logging.loggers.WebReader] level = "INFO" [logging.loggers.Tangler] level = "INFO" [logging.loggers.TanglerMake] level = "INFO" [logging.loggers.indent.TanglerMake] level = "INFO" [logging.loggers.ReferenceCommand] # Unit test requires this... level = "INFO" .. .. container:: small ∎ *pyweb.toml (77)*. We can load this with something like the following: .. parsed-literal:: config_path = Path("pyweb.toml") with config_path.open('rb') as config_file: config = toml.load(config_file) log_config = config.get('logging', {'version': 1, level=logging.INFO}) This makes it slightly easier to add and change debuging alternatives. Rather then use the ``-v`` and ``-d`` options, the ``pyweb.toml`` provides a complete logging config. Also, we might want a decorator to define loggers more consistently for each class definition. The Main Function ~~~~~~~~~~~~~~~~~~~~~ The top-level interface is the ``main()`` function. This function creates an ``Application`` instance. The ``Application`` object parses the command-line arguments. Then the ``Application`` object does the requested processing. This two-step process allows for some dependency injection to customize argument processing. We might also want to parse a logging configuration file, as well as a weaver template configuration file. .. _`Interface Functions (78)`: .. rubric:: Interface Functions (78) = .. parsed-literal:: :class: code def main(argv: list[str] = sys.argv[1:], base\_config: dict[str, Any] \| None=None) -> None: a = Application(base\_config) config = a.parseArgs(argv) a.process(config) .. .. container:: small ∎ *Interface Functions (78)*. Used by → `pyweb.py (79)`_. **pyWeb** Module File ------------------------ The **pyWeb** application file is shown below: .. _`pyweb.py (79)`: .. rubric:: pyweb.py (79) = .. parsed-literal:: :class: code → `Overheads (81)`_ → `Imports (2)`_ → `Error class defines the errors raised (52)`_ → `Base Class Definitions (1)`_ → `Action class hierarchy used to describe actions of the application (53)`_ → `Application Class for overall CLI operation (70)`_ → `Logging Setup (75)`_ → `Interface Functions (78)`_ if \_\_name\_\_ == "\_\_main\_\_": config\_paths = Path("pyweb.toml"), Path.home()/"pyweb.toml" base\_config: dict[str, Any] = {} for cp in config\_paths: if cp.exists(): with cp.open('rb') as config\_file: base\_config = toml.load(config\_file) break log\_config = base\_config.get('logging', default\_logging\_config) with Logger(log\_config): main(base\_config=base\_config.get('pyweb', {})) .. .. container:: small ∎ *pyweb.py (79)*. The `Overheads`_ are described below, they include things like: - shell escape - doc string - ``__version__`` setting `Python Library Imports`_ are actually scattered in various places in this description. The more important elements are described in separate sections: - Base Class Definitions - Application Class and Main Functions - Interface Functions Python Library Imports ~~~~~~~~~~~~~~~~~~~~~~~ Numerous Python library modules are used by this application. A few are listed here because they're used widely. Others are listed closer to where they're referenced. - The ``os`` module provide os-specific file and path manipulations; it is used to transform the input file name into the output file name as well as track down file modification times. - The ``time`` module provides a handy current-time string; this is used to by the HTML Weaver to write a closing timestamp on generated HTML files, as well as log messages. - The ``datetime`` module is used to format times, phasing out use of ``time``. - The ``types`` module is used to get at ``SimpleNamespace`` for configuration. .. _`Imports (80)`: .. rubric:: Imports (80) += .. parsed-literal:: :class: code import os import time import datetime import sys import types if sys.version\_info[:2] <= (3, 10): import tomli as toml else: import tomllib as toml .. .. container:: small ∎ *Imports (80)*. Used by → `pyweb.py (79)`_. Note that ``os.path``, ``time``, ``datetime`` and ``platform``` are provided in the expression context. Overheads ~~~~~~~~~~~~ The shell escape is provided so that the user can define this file as executable, and launch it directly from their shell. The shell reads the first line of a file; when it finds the ``'#!'`` shell escape, the remainder of the line is taken as the path to the binary program that should be run. The shell runs this binary, providing the file as standard input. .. _`Overheads (81)`: .. rubric:: Overheads (81) = .. parsed-literal:: :class: code #!/usr/bin/env python .. .. container:: small ∎ *Overheads (81)*. Used by → `pyweb.py (79)`_. A Python ``__doc__`` string provides a standard vehicle for documenting the module or the application program. The usual style is to provide a one-sentence summary on the first line. This is followed by more detailed usage information. .. _`Overheads (82)`: .. rubric:: Overheads (82) += .. parsed-literal:: :class: code """py-web-lp Literate Programming. Yet another simple literate programming tool derived from \*\*nuweb\*\*, implemented entirely in Python. With a suitable configuration, this weaves documents with any markup language, and tangles source files for any programming language. """ .. .. container:: small ∎ *Overheads (82)*. Used by → `pyweb.py (79)`_. The keyword cruft is a standard way of placing version control information into a Python module so it is preserved. See PEP (Python Enhancement Proposal) #8 for information on recommended styles. We also sneak in a "DO NOT EDIT" warning that belongs in all generated application source files. .. _`Overheads (83)`: .. rubric:: Overheads (83) += .. parsed-literal:: :class: code \_\_version\_\_ = """3.2""" ### DO NOT EDIT THIS FILE! ### It was created by pyweb.py, \_\_version\_\_='3.2'. ### From source impl.w modified Tue Jul 11 09:03:19 2023. ### In working directory '/Users/slott/Documents/Projects/py-web-tool/src'. .. .. container:: small ∎ *Overheads (83)*. Used by → `pyweb.py (79)`_. This can be extended by doing something like the following. 1. Subclass ``Weaver`` create a subclass with different templates. 2. Update the ``pyweb.weavers`` dictionary. 3. Call ``pyweb.main()`` to run the existing main program with extra classes available to it. .. parsed-literal:: import pyweb class MyWeaver(HTML): *Any template changes* pyweb.weavers['myweaver']= MyWeaver() pyweb.main() This will create a variant on **py-web-lp** that will handle a different weaver via the command-line option ``-w myweaver``. .. py-web-tool/src/test.w Unit Tests =========== The ``tests`` directory includes ``pyweb_test.w``, which will create a complete test suite. This source will weaves a ``pyweb_test.html`` file. See `tests/pyweb_test.html `_. This source will tangle several test modules: ``test.py``, ``test_tangler.py``, ``test_weaver.py``, ``test_loader.py``, ``test_unit.py``, and ``test_scripts.py``. Use **pytest** to discover and run all 80+ test cases. Here's a script that works out well for running this without disturbing the development environment. The ``PYTHONPATH`` setting is essential to support importing ``pyweb``. .. parsed-literal:: python pyweb.py -o tests tests/pyweb_test.w PYTHONPATH=$(PWD) pytest Note that the last line really does set an environment variable and run the ``pytest`` tool on a single line. .. py-web-tool/src/scripts.w Handy Scripts and Other Files ================================================= Two aditional scripts, ``tangle.py`` and ``weave.py``, are provided as examples which can be customized and extended. ``tangle.py`` Script --------------------- This script shows a simple version of Tangling. This has a permitted error for '@i' commands to allow an include file (for example test results) to be omitted from the tangle operation. Note the general flow of this top-level script. 1. Create the logging context. 2. Create the options. This hard-coded object is a stand-in for parsing command-line options. 3. Create the web object. 4. For each action (``LoadAction`` and ``TangleAction`` in this example) Set the web, set the options, execute the callable action, and write a summary. .. _`tangle.py (84)`: .. rubric:: tangle.py (84) = .. parsed-literal:: :class: code #!/usr/bin/env python3 """Sample tangle.py script.""" import argparse import logging from pathlib import Path import pyweb def main(source: Path) -> None: with pyweb.Logger(pyweb.default\_logging\_config): logger = logging.getLogger(\_\_file\_\_) options = argparse.Namespace( source\_path=source, output=source.parent, verbosity=logging.INFO, command='@', permitList=['@i'], tangler\_line\_numbers=False, webReader=pyweb.WebReader(), theTangler=pyweb.TanglerMake(), ) for action in pyweb.LoadAction(), pyweb.TangleAction(): action(options) logger.info(action.summary()) if \_\_name\_\_ == "\_\_main\_\_": main(Path("examples/test\_rst.w")) .. .. container:: small ∎ *tangle.py (84)*. ``weave.py`` Script --------------------- This script shows a simple version of Weaving. This shows how to define a customized set of templates for a different markup language. A customized weaver generally has three parts. .. _`weave.py (85)`: .. rubric:: weave.py (85) = .. parsed-literal:: :class: code → `weave.py overheads for correct operation of a script (86)`_ → `weave.py custom weaver definition to customize the Weaver being used (87)`_ → `weaver.py processing: load and weave the document (88)`_ .. .. container:: small ∎ *weave.py (85)*. .. _`weave.py overheads for correct operation of a script (86)`: .. rubric:: weave.py overheads for correct operation of a script (86) = .. parsed-literal:: :class: code #!/usr/bin/env python3 """Sample weave.py script.""" import argparse import logging import string from pathlib import Path from textwrap import dedent import pyweb .. .. container:: small ∎ *weave.py overheads for correct operation of a script (86)*. Used by → `weave.py (85)`_. To override templates, a class needs to provide a text definition of the Jinja ``{% macro %}`` definitions. This is used to update the superclass ``template_map``. Something like the following sets the macros in use. .. parsed-literal:: self.template_map['html_macros'] = my_templates Any macro **not** defined gets a default implementation. .. _`weave.py custom weaver definition to customize the Weaver being used (87)`: .. rubric:: weave.py custom weaver definition to customize the Weaver being used (87) = .. parsed-literal:: :class: code class MyHTML(pyweb.Weaver): bootstrap\_html = dedent(""" {%- macro begin\_code(chunk) %}

{{chunk.full\_name or chunk.name}} ({{chunk.seq}}) {% if chunk.initial %}={% else %}+={% endif %}


        {%- endmacro -%}
    
        {%- macro end\_code(chunk) %}
            
{% endmacro -%} """) def \_\_init\_\_(self, output: Path = Path.cwd()) -> None: super().\_\_init\_\_(output) self.template\_map = pyweb.Weaver.template\_map \| {"html\_macros": self.bootstrap\_html} .. .. container:: small ∎ *weave.py custom weaver definition to customize the Weaver being used (87)*. Used by → `weave.py (85)`_. .. _`weaver.py processing: load and weave the document (88)`: .. rubric:: weaver.py processing: load and weave the document (88) = .. parsed-literal:: :class: code def main(source: Path) -> None: with pyweb.Logger(pyweb.default\_logging\_config): logger = logging.getLogger(\_\_file\_\_) options = argparse.Namespace( source\_path=source, output=source.parent, verbosity=logging.INFO, weaver="html", command='@', permitList=[], tangler\_line\_numbers=False, webReader=pyweb.WebReader(), theWeaver=MyHTML(), # Customize with a specific Weaver subclass ) for action in pyweb.LoadAction(), pyweb.WeaveAction(): action(options) logger.info(action.summary()) if \_\_name\_\_ == "\_\_main\_\_": main(Path("examples/test\_rst.w")) .. .. container:: small ∎ *weaver.py processing: load and weave the document (88)*. Used by → `weave.py (85)`_. .. py-web-tool/src/todo.w To Do ======= 1. Change the comment start and comment end options to use Jinja template fragments. There needs to be an ``-add '# {{chunk.position}}'`` which overrides the default of ``''`` and injects this into each Tangled chunk. Indented appropriately. 1. Implement all four alternative references in the ``end_code()`` macro. - Nothing. - The immediate reference. - The two variants on full paths: - top-down ``→ Named (1) / → Sub-Named (2) / → Sub-Sub-Named (3)`` - bottom-up ``→ Sub-Sub-Named (3) ∈ → Sub-Named (2) ∈ → Named (1)`` #. Implement ``@h`` command and ``HiddenOutputChunk``. This may be sufficient. Tangling can then include non-woven output files. Viewed the other way, Weaving can exclude the non-woven file. The use case is a book chapter with test cases that are **not** woven into the text. See https://github.com/slott56/py-web-tool/wiki/Tangle-only-Output. #. Permit selecting a specific ``begin_code()`` template for a named block. A book may have distinct formatting for REPL examples and code examples. It may be as small as CSS classes for RST or HTML. It may be a more elaborate set of differences in LaTeX. #. Update the ``-indent`` option on ``@d`` chunks to accept a numeric argument with the specific indentation value. This becomes a kind of "noindent" with a given value. The ``-noindent`` would then be the same as ``-indent 0``. Currently, `-indent` and `-noindent` are true/false flags. #. We might want to decompose the ``impl.w`` file: it's huge. The four major sections -- base model, output, input parsing, other components -- make sense. However, since the output is a *single* .rst file, it doesn't change much to do this. #. We might want to interleave code and test into a document that presents both side-by-side. We can route the tangled code to multiple files. It can be awkward to create tangled files in multiple directories, however. We'd have to use ``../tests/whatever.py``, **assuming** we were always using ``-o src``. #. Rename the module from ``pyweb`` to ``pylpweb`` to avoid name squatting issues. Rename the project from ``py-web-lp`` to ``py-lpweb``. #. Offer a basic XHTML template that uses ``CDATA`` sections instead of quoting. Does require the standard quoting for the ``CDATA`` end tag. #. Note that the overall ``Web`` is a bit like a ``NamedChunk`` that contains ``Chunks``. This similarity could be factored out. While this will create a more proper **Composition** pattern implementation, it leads to the question of why nest ``@d`` or ``@o`` chunks in the first place? .. py-web-tool/src/done.w Change Log =========== Changes for 3.2 - Rename to ``py-web-lp`` to limit collisions in PyPI. - Replaced ``toml`` with ``tomli`` or built-in ``tomllib``. - Fiddle with ``pyproject.toml`` and ``tox.ini`` to eliminate ``setup.py``. - Replaced home-brewed ``OptionParser`` with ``argparse.ArgumentParser``. Now the options for ``@d`` and ``@o`` can be *much* more flexible. - Added the transitive path of references as a Chunk property. Removes ``SimpleReference`` and ``TransitiveReference`` classes, and the associated "reference_style" attribute of a ``Weaver``. To show transitive references, a revised ``code_end`` template is required. - Added a TOML-based configuration file with a ``[logging]`` section. - Incorporated Sphinx and PlantUML into the documentation. Support continues for ```rst2html.py``. It's used for test documentaion. See https://github.com/slott56/py-web-tool/wiki/PlantUML-support-for-RST-%5BCompleted%5D - Replaced weaving process with Jinja templates. See https://github.com/slott56/py-web-tool/wiki/Jinja-Template-for-Weaving-%5BCompleted%5D - Dramatic redesign to Class, Chunk, and Command class hierarchies. - Dramatic redesign to Emitters to switch to using Jinja templates. By stepping away from the ``string.Template``, we can incorporate list-processing ``{% for %}...{% endfor %}`` construct that pushes some processing into the template. - Removed ``'\N{LOZENGE}'`` (borrowed from Interscript) and use the ``'\N{END OF PROOF}'`` symbol instead. - Created a better ``weave.py`` example that shows how to incorporate bootstrap CSS into HTML overrides. This also requires designing a more easily extended ``Weaver`` class. Changes for 3.1 - Change to Python 3.10 as the supported version. - Add type hints, f-strings, ``pathlib``, ``abc.ABC``. - Replace some complex ``elif`` blocks with ``match`` statements. - Use **pytest** as a test runner. - Add a ``Makefile``, ``pyproject.toml``, ``requirements.txt`` and ``requirements-dev.txt``. - Implement ``-o dir`` option to write output to a directory of choice, simplifying **tox** setup. - Add ``bootstrap`` directory with a snapshot of a previous working release to simplify development. - Add Test cases for ``weave.py`` and ``tangle.py`` - Replace hand-build mock classes with ``unittest.mock.Mock`` objects - Separate the projec into ``src``, ``tests``, ``examples``. Cleanup ``Makefile``, ``pyproject.toml``, etc. - Silence the ERROR-level logging during testing. - Clean up the examples Changes for 3.0 - Move to GitHub Changes for 2.3.2. - Fix all ``{:s}`` format strings to be ``{!s}``. Changes for 2.3.1. - Cleanup some stray comment errors. - Revise the documentation structure and organization. - Tweak the error messages. Changes for 2.3. - Changed to Python 3.3 -- Fixed ``except``, ``raise`` and ``%``. - Removed ``doWrite()`` and simplified ``doOpen()`` and ``doClose()``. - Cleaned up RST output to be much nicer. - Change the baseline ``pyweb.w`` file to be RST instead of HTML. docutils required to produce HTML from the woven output. - Removed the unconstrained ``eval()`` function. Provided a slim set of globals. ``os`` is really just ``os.path``. Any ``os.getcwd()`` can be changed to ``os.path.realpath('.')``. ``time`` was removed and replaced with ``datetime``. Any ``time.asctime()`` must be ``datetime.datetime.now().ctime()``. - Resolved a small dispute between ``weaveReferenceTo()`` (wrong) and ``tangle()`` (right). for NamedChunks. The issue was one of failure to understand the differences between weaving -- where indentation is localized -- and tangling -- where indentation must be tracked globally. Root cause was a huge problem in ``codeBlock()`` which didn't really weave properly at all. - Fix the tokenizer and parsing. Stop using a complex tokenizer and use a simpler iterator over the tokens with ``StopIteration`` exception handling. - Replace ``optparse`` with ``argparse``. - Get rid of the global ``logger`` variable. - Remove the filename as part of ``Web()`` initial creation. A basename comes from the initial ``.w`` file loaded by the ``WebReader``. - Fix the Action class hierarchy so that composite actions are simpler. - Change references to return ``Chunk`` objects, not ``(name,sequence)`` pairs. - Make the ref list separator in ``Weaver reference summary...`` a proper template feature, not a hidden punctuation mark in the code. - Configure ``Web.reference_style`` properly so that simple or transitive references can be included as a command-line option. The default is Simple. Add the ``-r`` option so that ``-rt`` includes transitive references. - Reduce the "hard-coded" punctuation. For example, the ``", "`` in ``@d Web weave...`` ``weaveChunk()``. This was moved into a template. - Add an ``__enter__()`` and ``__exit__()`` to make an ``Emitter`` into a proper Context Manager that can be used with a ``with`` statement. - Add the ``-n`` option to include tangler line numbers if the ``@o`` includes the comment characters. - Cleanup the ``TanglerMake`` unit tests to remove the ``sleep()`` used to assure that the timestamps really are different. - Cleanup the syntax for adding a comment template to ``@o``. Use ``-start`` and ``-end`` before the filename. - Cleanup the syntax for noindent named chunks. Use ``-noindent`` before the chunk name. This creates a distinct ``NamedChunk_Noindent`` instance that handles indentation differently from other ``Chunk`` subclasses. - Cleanup the ``TangleAction`` summary. - Clean up the error messages. Raising an exception seems heavy-handed and confusing. Count errors instead. Changes since version 1.4 - Removed home-brewed logger. - Replaced ``getopt`` with ``optparse``. - Replaced LaTeX markup. - Corrected significant problems in cross-reference resolution. - Replaced all HTML and LaTeX-specific features with a much simpler template engine which applies a template to a Chunk. The Templates are separate configuration items. The big issue with templates are conditional processing and the use of loops to handle multiple references in a transitive closure. While it's nice to depend on Jinja2, it's also nice to be totally stand-alone. Sigh. Choices include the no-logic ``string.Template`` in the standard library or the ``Templite+`` Recipe 576663. - Looked at SCons API. Renamed "Operation" to "Action"; renamed "perform" to "__call__". Consider having "__call__" which does logging, then call "execute". - Eliminated the EmitterFactory; replace this with simple injection of the proper template configuration. - Removed the ``@O`` command; it was essentially a variant template for LaTeX. - Disentangled indentation and quoting in the codeBlock. Indentation rules vary between Tangling and Weaving. Quoting is unique to a woven codeBlock. Fix ``referenceTo()`` to write indented without code quoting. - Offer a basic RST template. Note that colorizing may be easier to handle with an RST template. The weaving markup template degenerates to ``.. parsed-literal::`` and indent. By doing this, the RST output from *pyWeb* can be run through DocUtils ``rst2html.py`` or perhaps *Sphix* to create final HTML. The hard part is the indent. - Tweaked (but didn't fix) ReferenceCommand tangle and all setIndent/clrIndent operations. Only a ReferenceCommand actually cares about indentation. And that indentation is totally based on the "context" plus the text in the Command immediate in front of the ReferenceCommand. Indices ======= Files ------ :pyweb.toml: → `pyweb.toml (77)`_:pyweb.py: → `pyweb.py (79)`_:tangle.py: → `tangle.py (84)`_:weave.py: → `weave.py (85)`_ Macros ------ :Action call method actually does the real work: → `Action call method actually does the real work (55)`_ :Action class hierarchy used to describe actions of the application: → `Action class hierarchy used to describe actions of the application (53)`_ :Action final summary of what was done: → `Action final summary of what was done (56)`_ :Action superclass has common features of all actions: → `Action superclass has common features of all actions (54)`_ :ActionSequence call method delegates the sequence of ations: → `ActionSequence call method delegates the sequence of ations (58)`_ :ActionSequence subclass that holds a sequence of other actions: → `ActionSequence subclass that holds a sequence of other actions (57)`_ :ActionSequence summary summarizes each step: → `ActionSequence summary summarizes each step (59)`_ :Application Class for overall CLI operation: → `Application Class for overall CLI operation (70)`_ :Application class process all files: → `Application class process all files (73)`_ :Application default options: → `Application default options (71)`_ :Application parse command line: → `Application parse command line (72)`_ :Base Class Definitions: → `Base Class Definitions (1)`_, → `Base Class Definitions (19)`_, → `Base Class Definitions (33)`_ :Chunk class hierarchy -- used to describe individual chunks: → `Chunk class hierarchy -- used to describe individual chunks (8)`_, → `Chunk class hierarchy -- used to describe individual chunks (9)`_ :Command class hierarchy -- used to describe individual commands in a chunk: → `Command class hierarchy -- used to describe individual commands in a chunk (10)`_ :Common base template -- this is used for ALL weaving: → `Common base template -- this is used for ALL weaving (23)`_ :Debug Templates -- these display debugging information: → `Debug Templates -- these display debugging information (24)`_ :Emitter Superclass: → `Emitter Superclass (21)`_ :Emitter indent control: set, clear and reset: → `Emitter indent control: set, clear and reset (30)`_ :Emitter write a block of code with proper indents: → `Emitter write a block of code with proper indents (29)`_ :Error class defines the errors raised: → `Error class defines the errors raised (52)`_ :HTML Templates -- emit HTML weave output: → `HTML Templates -- emit HTML weave output (26)`_ :Imports: → `Imports (2)`_, → `Imports (11)`_, → `Imports (20)`_, → `Imports (31)`_, → `Imports (42)`_, → `Imports (47)`_, → `Imports (50)`_, → `Imports (69)`_, → `Imports (74)`_, → `Imports (80)`_ :Interface Functions: → `Interface Functions (78)`_ :LaTeX Templates -- emit LaTeX weave output: → `LaTeX Templates -- emit LaTeX weave output (27)`_ :LoadAction call method loads the input files: → `LoadAction call method loads the input files (67)`_ :LoadAction subclass loads the document web: → `LoadAction subclass loads the document web (66)`_ :LoadAction summary provides lines read: → `LoadAction summary provides lines read (68)`_ :Logging Setup: → `Logging Setup (75)`_, → `Logging Setup (76)`_ :Overheads: → `Overheads (81)`_, → `Overheads (82)`_, → `Overheads (83)`_ :RST Templates -- the default weave output: → `RST Templates -- the default weave output (25)`_ :TangleAction call method does tangling of the output files: → `TangleAction call method does tangling of the output files (64)`_ :TangleAction subclass initiates the tangle action: → `TangleAction subclass initiates the tangle action (63)`_ :TangleAction summary method provides total lines tangled: → `TangleAction summary method provides total lines tangled (65)`_ :Tangler Subclass -- emits the output files: → `Tangler Subclass -- emits the output files (28)`_ :TanglerMake Subclass -- extends Tangler to avoid touching files that didn't change: → `TanglerMake Subclass -- extends Tangler to avoid touching files that didn't change (32)`_ :The CodeCommand Class: → `The CodeCommand Class (16)`_ :The Command Abstract Base Class: → `The Command Abstract Base Class (13)`_ :The HasText Type Hint -- used instead of another abstract class: → `The HasText Type Hint -- used instead of another abstract class (14)`_ :The ReferenceCommand Class: → `The ReferenceCommand Class (17)`_ :The TextCommand Class: → `The TextCommand Class (15)`_ :The TypeId Class -- to help the template engine: → `The TypeId Class -- to help the template engine (12)`_ :The XrefCommand Subclasses -- files, macros, and user names: → `The XrefCommand Subclasses -- files, macros, and user names (18)`_ :Tokenizer class - breaks input into tokens: → `Tokenizer class - breaks input into tokens (51)`_ :WeaveAction call method to pick the language: → `WeaveAction call method to pick the language (61)`_ :WeaveAction subclass initiates the weave action: → `WeaveAction subclass initiates the weave action (60)`_ :WeaveAction summary of language choice: → `WeaveAction summary of language choice (62)`_ :Weaver Subclass -- Uses Jinja templates to weave documentation: → `Weaver Subclass -- Uses Jinja templates to weave documentation (22)`_ :Web class -- describes the overall "web" of chunks: → `Web class -- describes the overall "web" of chunks (3)`_, → `Web class -- describes the overall "web" of chunks (4)`_, → `Web class -- describes the overall "web" of chunks (5)`_, → `Web class -- describes the overall "web" of chunks (6)`_, → `Web class -- describes the overall "web" of chunks (7)`_ :WebReader class - parses the input file, building the Web structure: → `WebReader class - parses the input file, building the Web structure (34)`_ :WebReader command literals: → `WebReader command literals (49)`_ :WebReader handle a command string: → `WebReader handle a command string (35)`_, → `WebReader handle a command string (45)`_ :WebReader load the web: → `WebReader load the web (48)`_ :WebReader location in the input stream: → `WebReader location in the input stream (46)`_ :add a reference command to the current chunk: → `add a reference command to the current chunk (41)`_ :add an expression command to the current chunk: → `add an expression command to the current chunk (43)`_ :assign user identifiers to the current chunk: → `assign user identifiers to the current chunk (40)`_ :double at-sign replacement, append this character to previous TextCommand: → `double at-sign replacement, append this character to previous TextCommand (44)`_ :finish a chunk, start a new Chunk adding it to the web: → `finish a chunk, start a new Chunk adding it to the web (39)`_ :include another file: → `include another file (38)`_ :start a NamedChunk or NamedDocumentChunk, adding it to the web: → `start a NamedChunk or NamedDocumentChunk, adding it to the web (37)`_ :start an OutputChunk, adding it to the web: → `start an OutputChunk, adding it to the web (36)`_ :weave.py custom weaver definition to customize the Weaver being used: → `weave.py custom weaver definition to customize the Weaver being used (87)`_ :weave.py overheads for correct operation of a script: → `weave.py overheads for correct operation of a script (86)`_ :weaver.py processing: load and weave the document: → `weaver.py processing: load and weave the document (88)`_ User Identifiers ---------------- :Action: → `Action superclass has common features of all actions (54)`_ :ActionSequence: → `ActionSequence subclass that holds a sequence of other actions (57)`_ :Application: → `Application Class for overall CLI operation (70)`_ :Chunk: → `Chunk class hierarchy -- used to describe individual chunks (9)`_ :Error: → `Error class defines the errors raised (52)`_ :LoadAction: → `LoadAction subclass loads the document web (66)`_ :NamedChunk: → `Chunk class hierarchy -- used to describe individual chunks (9)`_ :NamedChunk_Noindent: → `Chunk class hierarchy -- used to describe individual chunks (9)`_ :NamedDocumentChunk: → `Chunk class hierarchy -- used to describe individual chunks (9)`_ :OutputChunk: → `Chunk class hierarchy -- used to describe individual chunks (9)`_ :TangleAction: → `TangleAction subclass initiates the tangle action (63)`_ :Tokenizer: → `Tokenizer class - breaks input into tokens (51)`_ :TypeId: → `The TypeId Class -- to help the template engine (12)`_ :TypeIdMeta: → `The TypeId Class -- to help the template engine (12)`_ :WeaveAction: → `WeaveAction subclass initiates the weave action (60)`_ :Web: → `Web class -- describes the overall "web" of chunks (3)`_, → `Web class -- describes the overall "web" of chunks (7)`_ :WebReader: → `WebReader class - parses the input file, building the Web structure (34)`_ :__version__: → `Overheads (83)`_ :addIndent: → `Emitter indent control: set, clear and reset (30)`_ :argparse: → `Imports (69)`_ :builtins: → `Imports (42)`_ :clrIndent: → `Emitter indent control: set, clear and reset (30)`_ :codeBlock: → `Emitter write a block of code with proper indents (29)`_ :datetime: → `Imports (80)`_ :duration: → `Action final summary of what was done (56)`_ :expand: → `Application parse command line (72)`_ :expect: → `WebReader handle a command string (45)`_ :handleCommand: → `WebReader handle a command string (35)`_ :load: → `WebReader load the web (48)`_ :location: → `WebReader location in the input stream (46)`_ :logging: → `Imports (74)`_ :logging.config: → `Imports (74)`_ :os: → `Imports (80)`_ :parse: → `WebReader load the web (48)`_ :parseArgs: → `Application parse command line (72)`_ :perform: → `Action call method actually does the real work (55)`_, → `ActionSequence call method delegates the sequence of ations (58)`_, → `WeaveAction call method to pick the language (61)`_, → `TangleAction call method does tangling of the output files (64)`_, → `LoadAction call method loads the input files (67)`_ :platform: → `Imports (42)`_ :process: → `Application class process all files (73)`_ :re: → `Imports (50)`_ :resetIndent: → `Emitter indent control: set, clear and reset (30)`_ :setIndent: → `Emitter indent control: set, clear and reset (30)`_ :shlex: → `Imports (69)`_ :summary: → `Action final summary of what was done (56)`_, → `ActionSequence summary summarizes each step (59)`_, → `WeaveAction summary of language choice (62)`_, → `TangleAction summary method provides total lines tangled (65)`_, → `LoadAction summary provides lines read (68)`_ :sys: → `Imports (42)`_ :time: → `Imports (80)`_ :toml: → `Imports (80)`_ :types: → `Imports (80)`_ --------- .. class:: small Created by pyweb.py at Tue Jul 11 09:15:02 2023. Source pyweb.w modified Sun Jul 3 13:07:49 2022. pyweb.__version__ '3.2'. Working directory '/Users/slott/Documents/Projects/py-web-tool/src'.