Literate Programming

The Philosophy of Literate Programming

The philosophy of literate programming (LP) and the term itself were the creation of Donald Knuth, the author of The Art of Computer Programming, the TeX typesetting system, and other works of the programming art.

I believe that the time is ripe for significantly better documentation of programs, and that we can best achieve this by considering programs to be works of literature. Hence, my title: “Literate Programming.”

Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to humans what we want the computer to do.

—Donald E. Knuth, Literate Programming, 1984

The Literate Programming FAQ gives the following definition:

Literate programming is the combination of documentation and source together in a fashion suited for reading by human beings.

and the Wikipedia says:

Literate programming is a philosophy of computer programming based on the premise that a computer program should be written with human readability as a primary goal, similarly to a work of literature.

According to this philosophy, programmers should aim for a “literate” style in their programming just as writers aim for an intelligible and articulate style in their writing. This philosophy contrasts with the mainstream view that the programmer’s primary or sole objective is to create source code and that documentation should only be a secondary objective.

John Max Skaller put the philosophy of literate programming in one sentence

The idea is that you do not document programs (after the fact), but write documents that contain the programs.

—John Max Skaller in a Charming Python interview.

LP Technique and Systems

Donald Knuth not only conceived the philosophy of literate programming, he also wrote the first LP system WEB that implements the technique of literate programming:

WEB itself is chiefly a combination of two other languages:

  1. a document formatting language and
  2. a programming language.

My prototype WEB system uses TEX as the document formatting language and PASCAL as the programming language, but the same principles would apply equally well if other languages were substituted. […]. The main point is that WEB is inherently bilingual, and that such a combination of languages proves to be much more powerful than either single language by itself. WEB does not make the other languages obsolete; on the contrary, it enhances them.

—Donald E. Knuth, Literate Programming, 1984

There exists a famous statement of requirements for a LP system:

Literate programming systems have the following properties:

  1. Code and extended, detailed comments are intermingled.
  2. The code sections can be written in whatever order is best for people to understand, and are re-ordered automatically when the computer needs to run the program.
  3. The program and its documentation can be handsomely typeset into a single article that explains the program and how it works. Indices and cross-references are generated automatically.

—Mark-Jason Dominus POD is not Literate Programming

The article [contemporary-literate-programming] provides a survey on different implementations of the literate programming technique.

LP with reStructuredText

PyLit fails the statement of requirements for a LP system, as it does not provide for code re-ordering, automatic indices and cross-references.

According to the Wikipedia, PyLit qualifies as semi-literate (while POD is classified as embedded documentation):

It is the fact that documentation can be written freely whereas code must be marked in a special way that makes the difference between semi-literate programming and excessive documenting, where the documentation is embedded into the code as comments.

Wikipedia entry on literate programming

However, it is argued that PyLit is an educated choice as tool for the technique of literate programming:

  • PyLit is inherently bilingual. The document containing the program is a combination of:
    1. a document formatting language (reStructuredText) and
    2. a programming language.
  • Program and documentation can be typeset in a single article as well as converted to a hyperlinked HTML document representing the program’s “web-like” structure. For indexing and cross-references, PyLit relies on the rich features of the reStructuredText document formatting language. Auto-generating of cross references and indices is possible if the reStructuredText sources are converted with Sphinx

Dispensing with code re-ordering enables the dual-source concept. Code re-ordering is considered less important when LP is not seen as an alternative to structured programming but a technique orthogonal to structured, object oriented or functional programming styles:

The computer science and programming pioneers, like Wirth and Knuth, used another programming style than we recommend today. One notable difference is the use of procedure abstractions. When Wirth and Knuth wrote their influential books and programming, procedure calls were used with care, partly because they were relatively expensive. Consequently, they needed an extra level of structuring within a program or procedure. ‘Programming by stepwise refinement’ as well as ‘literate programming’ can be seen as techniques and representations to remedy the problem. The use of many small (procedure) abstractions goes in another direction. If you have attempted to use ‘literate programming’ on a program (maybe an object-oriented program) with lots of small abstractions, you will probably have realized that there is a misfit between, on the one hand, being forced to name literal program fragments (scraps) and on the other, named program abstractions. We do not want to leave the impression that this is a major conceptual problem, but it is the experience of the author that we need a variation of literate programming, which is well-suited to programs with many, small (named) abstractions.

—Kurt Nørmark: Literate Programming - Issues and Problems

Ordering code for optimal presentation to humans is still a problem in many “traditional” programming languages. Pythons dynamic name resolution allows a great deal of flexibility in the order of code.

Existing Frameworks

There are many LP systems available, both sophisticated and lightweight. This non-comprehensive list compares them to PyLit.

WEB and friends

The first published literate programming system was WEB by Donald Knuth. It uses Pascal as programming language and TeX as document formatting language.

CWEB is a newer version of WEB for the C programming language. Other implementations of the concept are the “multi-lingual” noweb and FunnelWeb. All of them use TeX as document formatting language.

Some of the problems with WEB-like systems are:

  • The steep learning curve for mastering TeX, special WEB commands and the actual programming language in parallel discourages potential literate programmers.

  • While a printout has the professional look of a TeX-typeset essay, the programmer will spend most time viewing and editing the source. A WEB source in a text editor is not “suited for reading by human beings”:

    In Knuth’s work, beautiful literate programs were woven from ugly mixes of markups in combined program and documentation files. The literate program was directly targeted towards paper and books. Of obvious reasons, programmers do not very often print their programs on paper. (And as argued above, few of us intend to publish our programs in a book). Within hours, such a program print out is no longer up-to-date. And furthermore, being based on a paper representation, we cannot use the computers dynamic potentials, such as outlining and searching.

    —Kurt Nørmark: Literate Programming - Issues and Problems

How does PyLit differ from the web-like LP frameworks (For a full discussion of PyLit’s concepts see the Features page.):

  • PyLit is not a complete LP system but a simple tool. It does not support re-arranging of named code chunks.

  • PyLit uses reStructuredText as document formatting language. The literate program source is legible in any text editor.

    Even the code source is “in a fashion suited for reading by human beings”, in opposition to source code produced by a full grown web system:

    Just consider the output of tangle to be object code: You can run it, but you don’t want to look at it.

    http://www.perl.com/pub/a/tchrist/litprog.html

  • PyLit is a bidirectional text <-> code converter.

    • You can start a literate program from a “normal” code source by converting to a text source and adding the documentation parts.
    • You can edit code in its “native” mode in any decent text editor or IDE.
    • You can debug and fix code with the standard tools using the code source Line numbers are the same in text and code format. Once ready, forward your changes to the text version with a code->text conversion.
    • PyLit cannot support export to several code files from one text source.

SourceBrowser

SourceBrowser is a complex tool generating a hyper-linked representation of complex software projects.

SourceBrowser is a documentation meta-programming tool that generates a Wiki representation of a source tree.

It provides a Source Annotation Language for writing a documentation meta-program and a tool for generating a static Wiki for the host program.

SourceBrowser home page

Interscript

Interscript is a complex framework that uses Python both for its implementation and to supply scripting services to client sourceware.

Interscript is different, because the fundamental paradigm is extended so that there are three kinds of LP sections in file:

  1. Code sections
  2. Documentation sections
  3. Scripting sections

where the scripting sections […] control and possibly generate both documentation and code.

Interscript homepage

Ly

Ly, the “lyterate programming thingy.” is an engine for Literate Programming. The design considerations look very much like the ones for PyLit:

So why would anybody use Ly instead of the established Literate Programming tools like WEB or noweb?

  • Elegance. I do not consider the source code of WEB or noweb to be particularly nice. I especially wanted a syntax that allows me to easily write a short paragraph, then two or three lines of code, then another short paragraph. Ly accomplishes this by distinguishing code from text through indentation.
  • HTML. WEB and CWEB are targeted at outputting TeX. This is a valid choice, but not mine, as I want to create documents with WWW links, and as I want to re-create documents each time I change something– both of these features don’t go well together with a format suited to printout, like TeX is. (Of course you can create HTML from LaTeX, but if my output format is HTML, I want to be able to write HTML directly.)
  • Python. I originally created Ly for a Python-based project, and therefore I wanted it to run everywhere where Python runs. WEB and noweb are mostly targeted at UNIX-like platforms.

Ly introduction

However, Ly introduces its own language with

  • a syntax for code chunks,
  • some special ways for entering HTML tags (“currently undocumented”).

whereas PyLit relies on the established reStructuredText for documentation markup and linking.

XMLTangle

XMLTangle is an generic tangle program written in Python. It differs from PyLit mainly by its choice of XML as documenting language (making the source document harder to read for a human being).

Lightweight literate programming

Lightweight literate programming is one more example of a Python based LP framework. One XML source is processed to either text or code. The XML source itself is rather hard to read for human beings.

pyreport

Pyreport is quite similar to PyLit in its focus on Python and the use of reStructuredText. It is a combination of documentation generator (processing embedded documentation) and report tool (processing Python output).

pyreport is a program that runs a python script and captures its output, compiling it to a pretty report in a PDF or an HTML file. It can display the output embedded in the code that produced it and can process special comments (literate comments) according to markup languages (rst or LaTeX) to compile a very readable document.

This allows for extensive literate programming in python, for generating reports out of calculations written in python, and for making nice tutorials.

However, it is not a tool to “write documents that contain programs”.

References

[contemporary-literate-programming]Vreda Pieterse, Derrick G. Kourie, and Andrew Boake: A case for contemporary literate programming, in SAICSIT, Vol. 75, Pages: 2-9, Stellenbosch, Western Cape, South Africa: 2004, available at: http://portal.acm.org/citation.cfm?id=1035054