Looking at COBOL

From a Pythonic Perspective

Steven F. Lott

Topics

  • What — exactly — is the problem?

  • What is the COBOL asset?

  • How hard is this to fix?

  • What can we do?

What is the problem?

Exactly

It's not that COBOL is bad

  • It is

  • But that's not the problem

It's not that COBOL skills are rare

  • They are

  • But that's not the problem

The problem is intransigence

If it ain't broke, don't fix it

Keep The Lights On (KTLO)

  • Creates Tech Debt

  • Petrifies Tech Debt into a permanent feature of the enterprise

The COBOL Asset

What part is valuable?

And what should we discard?

A Guiding Principle

Software Captures Knowledge

All general-purpose programming languages are Turing complete

COBOL ↦ Python

More formally

$$ \begin{aligned} C &\models A \\ P &\models A \end{aligned} $$

Both the COBOL program, $C$, and its Python rewrite, $P$, implement the same underlying algorithm, $A$, defined by its state changes.

COBOL and Knowledge

  • COBOL is a very simple language for knowledge capture

  • With some obscure and unpleasant features

Mainframes

  • While we think of mainframes as BIG

    • They weren't
  • An app written 30+ years ago

    • Target: 370/158 with 4MB RAM and a 3.2GB disk

    • A 24-bit (16MB) address space

  • These COBOL programs would easily run on your phone

Consequences

  • What we'd now call an "app" was a "system" of interconnected components

    • Hundreds of programs

    • Each program a few hundred lines of code

    • A few sprawled to thousands of lines

  • A few central design patterns

Design Patterns

  • Edit -- read and validate a batch of transactions

  • Update -- match-merge updating of master files from transactions

  • Report -- View the master files

Edit Programs

  • Read source records -- they're generally prepared manually

  • Check ranges, types, and internal consistency

  • Stage valid batches for update processing

  • Display details of errors in a batch so it can be repaired

In Python

with source_path.open() as source_file, \
        good_path.open("w") as good_file, \
        bad_path.open("w") as bad_file:
    for batch in batch_read(source_file):
        if valid(batch):  # The interesting part
            batch_write(good_file, batch)
        else:
            batch_write(bad_file, batch)
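The valid() call is where the edit rules live. Here is a minimal sketch of one, assuming a hypothetical batch layout: detail records holding an account number and an amount as raw text, plus a trailer carrying a control count and control total. The six-digit account rule and the amount limits are invented for illustration; the point is the three kinds of check: types, ranges, and internal consistency.

```python
from decimal import Decimal, InvalidOperation
from typing import NamedTuple


class Transaction(NamedTuple):
    # Hypothetical detail record: fields still raw text, as read from the file.
    account: str
    amount: str


class Batch(NamedTuple):
    # Hypothetical batch: detail records plus the trailer's control values.
    transactions: list[Transaction]
    control_count: int
    control_total: Decimal


def valid(batch: Batch) -> bool:
    """Return True if every record and the batch controls check out."""
    total = Decimal("0")
    for xact in batch.transactions:
        # Type check: account must be exactly 6 digits (assumed rule).
        if len(xact.account) != 6 or not xact.account.isdigit():
            return False
        # Type check: amount must parse as a decimal number.
        try:
            amount = Decimal(xact.amount)
        except InvalidOperation:
            return False
        # Range check: assumed limits for a single transaction.
        if not amount.is_finite() or not (
            Decimal("0.01") <= amount <= Decimal("99999.99")
        ):
            return False
        total += amount
    # Internal consistency: the trailer must agree with the details.
    return (
        len(batch.transactions) == batch.control_count
        and total == batch.control_total
    )
```

Each rule is one visible line, which is exactly the knowledge that tends to get buried in a COBOL edit program.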

COBOL Programs can be obscure

In Python, we can make the interesting part stand out very clearly.

In COBOL, it can be hard to track down.

Update Programs

  • Read edited, sorted transaction records

  • Match keys with sorted master file

  • Perform Add, Change, and Delete on Master File based on Transaction(s)

  • Write new master file (or rewrite records in place)

In Python

with xact_path.open() as xact_file, \
        old_path.open() as master_file, \
        new_path.open("w") as new_master_file:
    master = master_read(master_file)
    xact = xact_read(xact_file)
    while master and xact:
        if master.key < xact.key:
            master_write(new_master_file, master)
            master = master_read(master_file)
        elif master.key > xact.key:
            # Unmatched transaction (e.g., an Add); this skeleton skips it
            xact = xact_read(xact_file)
        else:
            update(master, xact)  # The interesting part
            xact = xact_read(xact_file)
    while master:
        master_write(new_master_file, master)
        master = master_read(master_file)
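The update loop above leaves update() abstract and skips transactions that have no matching master. A fuller sketch, over in-memory lists so it runs on its own, handles Add, Change, and Delete. It assumes a hypothetical transaction with a one-letter action code and is structured as what is often called the balance-line algorithm: take the lowest pending key from either file, apply every transaction for that key, then write whatever master record survives.

```python
from typing import Iterator, NamedTuple, Optional


class Master(NamedTuple):
    key: str
    balance: int


class Xact(NamedTuple):
    key: str
    action: str  # hypothetical codes: "A" add, "C" change, "D" delete
    amount: int


def match_merge(masters: list[Master], xacts: list[Xact]) -> Iterator[Master]:
    """Yield the new master file; both inputs must be sorted by key."""
    m_iter, x_iter = iter(masters), iter(xacts)
    master: Optional[Master] = next(m_iter, None)
    xact: Optional[Xact] = next(x_iter, None)
    while master is not None or xact is not None:
        # The "active" key is the lowest key still pending on either file.
        active = min(r.key for r in (master, xact) if r is not None)
        # Start from the existing master record, if one has this key.
        current: Optional[Master] = None
        if master is not None and master.key == active:
            current = master
            master = next(m_iter, None)
        # Apply every transaction for this key, in file order.
        while xact is not None and xact.key == active:
            if xact.action == "A" and current is None:
                current = Master(xact.key, xact.amount)
            elif xact.action == "C" and current is not None:
                current = current._replace(balance=current.balance + xact.amount)
            elif xact.action == "D" and current is not None:
                current = None
            # Anything else (e.g., a Change for a missing key) is an error;
            # this sketch drops it silently, a real program would report it.
            xact = next(x_iter, None)
        if current is not None:
            yield current
```

Because an Add creates a record that later transactions for the same key can change, the special cases stay in one place instead of being scattered across the loop.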

COBOL Programs can be obscure

In Python, we can make the interesting part stand out very clearly.

In COBOL, it can be hard to track down.

It's not so bad

  • The COBOL apps are (generally) straightforward

    • The design patterns are often clear
  • There are a LOT of them in an enterprise

    • There may be only a dozen "master file update" apps

    • There may be several dozen edit variants

    • There will be several dozen file copy-with-filter apps

    • Hundreds and hundreds of reports -- all can be replaced with pandas DataFrames
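Most of those reports follow one shape, the "control break": read records sorted by some key, print a subtotal each time the key changes, and print a grand total at the end. A minimal stdlib sketch, assuming hypothetical (region, account, amount) rows already sorted by region:

```python
from decimal import Decimal
from itertools import groupby
from operator import itemgetter


def control_break_report(rows: list[tuple[str, str, Decimal]]) -> list[str]:
    """Subtotal per region, then a grand total; rows must be sorted by region."""
    lines = []
    grand_total = Decimal("0")
    # groupby starts a new group whenever the region changes: the control break.
    for region, group in groupby(rows, key=itemgetter(0)):
        subtotal = sum((amount for _, _, amount in group), Decimal("0"))
        lines.append(f"{region:<10}{subtotal:>12}")
        grand_total += subtotal
    lines.append(f"{'TOTAL':<10}{grand_total:>12}")
    return lines
```

With pandas, the same summary is df.groupby("region")["amount"].sum(), and the data need not be pre-sorted.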

Optimization

A Very Necessary Evil

Remember the 370/158

  • Less than 4MB RAM

  • Caching is essential

  • But

    • COBOL has no associative store

    • It barely has arrays

The workaround for no dict[str, str]?

  • list[tuple[str, str]]

DATA DIVISION.
WORKING-STORAGE SECTION.
01  Some-Table.
    05  Places-Used PIC S9(3) COMP-3.
    05  Some-Record OCCURS 20 TIMES.
        10  Entry-Key PIC XXX.
        10  Entry-Value PIC X(32).
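To see what that layout costs, here is a sketch of the same structure transliterated into Python: a fixed array of 20 key/value entries, a count of places used, and a linear search. The class and method names are hypothetical; the refusal to overflow is this sketch's choice, not something the COBOL layout enforces.

```python
from typing import NamedTuple, Optional

TABLE_SIZE = 20  # mirrors OCCURS 20 TIMES


class Entry(NamedTuple):
    key: str    # PIC XXX
    value: str  # PIC X(32)


class LookupTable:
    """A fixed-size table searched linearly, like the COBOL layout."""

    def __init__(self) -> None:
        self.entries: list[Entry] = [Entry("", "")] * TABLE_SIZE
        self.places_used = 0

    def add(self, key: str, value: str) -> None:
        # This sketch refuses to run past the end of the fixed table.
        if self.places_used >= TABLE_SIZE:
            raise OverflowError("lookup table is full")
        self.entries[self.places_used] = Entry(key, value)
        self.places_used += 1

    def find(self, key: str) -> Optional[str]:
        # Linear search over the used slots: O(n) for every lookup.
        for entry in self.entries[: self.places_used]:
            if entry.key == key:
                return entry.value
        return None
```

In Python the whole class collapses to a dict, and table.get(key) replaces find(key) with a hashed lookup.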

Not Kidding

  • In Python terms, all COBOL has are fixed-size arrays, NamedTuple, str, and Decimal

  • No list, dict, or set

  • No classes (more modern COBOL added OO features)

  • No functions (generally)

    • They're part of the language, but often overlooked.
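That mapping is close enough to be mechanical. A sketch for one hypothetical record layout (the copybook is shown in the comments): each PIC X field becomes a str slice, and a numeric DISPLAY field with an implied decimal point (the V in PIC 9(5)V99) becomes a Decimal with its exponent shifted.

```python
from decimal import Decimal
from typing import NamedTuple


class CustomerRec(NamedTuple):
    # Hypothetical copybook this mirrors:
    #   05 CUST-ID    PIC X(6).
    #   05 CUST-NAME  PIC X(20).
    #   05 BALANCE    PIC 9(5)V99.   (V = implied decimal point)
    cust_id: str
    cust_name: str
    balance: Decimal


def parse_customer(line: str) -> CustomerRec:
    """Slice a 33-character DISPLAY record into typed fields."""
    cust_id = line[0:6]
    cust_name = line[6:26].rstrip()
    # Seven digits, the last two of which are implied decimal places.
    balance = Decimal(line[26:33]).scaleb(-2)
    return CustomerRec(cust_id, cust_name, balance)
```

Signed DISPLAY fields (with the sign overpunched on the last digit) and binary or packed fields need more decoding than a plain slice.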

The Compounding Obscurities

  • GOTO

    • Can make the code utterly opaque
  • REDEFINES

    • A "free union" of various data types
  • ALTER

    • Targets of GOTO can be changed at run-time

How hard is this to fix?

Setting hype aside

Does COBOL Map to Python?

  • In the abstract? Yes

    • Any program running in finite memory can be modeled as a Finite State Automaton
  • Pragmatically?

    • The states of a COBOL app may be opaque

      • One GOTO can make a real mess

Formally

Assume $F(C)$ is the Finite State Automaton (FSA) for some COBOL program, $C$.

Then $P(F(C))$, a Python implementation of that FSA, will be utterly opaque.

Knowledge Not Captured

If you've ever tried to read the output from yacc or lex, you know what this looks like.
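A tiny illustration of the difference: a hypothetical GO TO program that totals the positive amounts in a table, transliterated paragraph by paragraph into a Python state loop, next to the one line that states what it actually does. The paragraph names and the COBOL in the comments are invented; both functions compute the same result.

```python
def total_positive_fsa(amounts: list[int]) -> int:
    """Hypothetical GO TO paragraphs transliterated into a state loop."""
    state, idx, total = "P100-START", 0, 0
    while state != "P900-EXIT":
        if state == "P100-START":
            state = "P200-TEST-END"
        elif state == "P200-TEST-END":
            # IF IDX > MAX-IDX GO TO P900-EXIT.
            state = "P900-EXIT" if idx >= len(amounts) else "P300-TEST-SIGN"
        elif state == "P300-TEST-SIGN":
            # IF AMOUNT (IDX) NOT > ZERO GO TO P500-NEXT.
            state = "P500-NEXT" if amounts[idx] <= 0 else "P400-ADD"
        elif state == "P400-ADD":
            total += amounts[idx]
            state = "P500-NEXT"
        elif state == "P500-NEXT":
            idx += 1
            state = "P200-TEST-END"
    return total


def total_positive(amounts: list[int]) -> int:
    """The same knowledge, stated directly."""
    return sum(a for a in amounts if a > 0)
```

The first version is faithful and correct, yet the intent has to be reverse-engineered from the transitions; that is the knowledge a mechanical translation fails to capture.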

More important still

The COBOL optimizations -- caches

  • Called a "lookup table"

Each COBOL dev created their own unique dict[str, str] implementation

  • A testament to the "throw people at it" school of management

  • When schedule matters most, quality doesn't matter at all

A lot of inefficient list[tuple[str, str]] processing

Most important?

The architecture issue:

  • Special Cases, Exceptions

Where did they live?

  • Everywhere

  • Copy-pasta programming is often rampant

COBOL isn't bad

  • It has a "Lots of Little Programs" (LoLP) architecture

  • LoLP can exacerbate bad management choices

    • Overwhelming details

    • Lots of redundant special-case IF statements

      • With code rot -- they no longer match
  • KTLO means latent bugs everywhere

    • Some documented bugs are "features"

What Can We Do?

It's difficult. But...

Data is the most valuable thing

Preserve the data

  • Processing is secondary

    • Save example files to create test scenarios

    • Spell out the test scenarios in Gherkin

    • Then use Acceptance Test-Driven Development (ATDD) to rewrite the mainframe app
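A minimal sketch of that loop, assuming you have captured an input file and the matching output the mainframe produced from it. The Gherkin scenario appears as comments and the check is a plain pytest-style test; rewritten_program and the sample data are hypothetical stand-ins. Tools such as behave or pytest-bdd can bind real .feature files instead, and real captured files would be loaded with Path.read_text().

```python
# Scenario: the nightly edit keeps only well-formed records
#   Given the captured mainframe input file
#   When the rewritten edit program runs
#   Then its output matches the captured mainframe output exactly

CAPTURED_INPUT = "001 10.00\n002 BAD\n003 5.50\n"  # hypothetical sample
CAPTURED_GOOD_OUTPUT = "001 10.00\n003 5.50\n"      # hypothetical sample


def rewritten_program(source: str) -> str:
    """Hypothetical replacement: keep records whose amount is numeric."""
    good = []
    for line in source.splitlines():
        _, amount = line.split()
        if amount.replace(".", "", 1).isdigit():
            good.append(line)
    return "".join(f"{line}\n" for line in good)


def test_matches_mainframe_output() -> None:
    assert rewritten_program(CAPTURED_INPUT) == CAPTURED_GOOD_OUTPUT
```

The captured output is the acceptance criterion: the rewrite is done when every saved scenario passes.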

Python has EBCDIC codecs (cp037, cp500, and others), so it can decode the text fields of COBOL files directly; packed-decimal (COMP-3) fields need a little code
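A sketch of both halves, assuming the file uses code page 037 (US/Canada EBCDIC); the right code page for a given shop is an assumption to verify. Text fields are one decode() call. A COMP-3 field packs two decimal digits per byte, with the last nibble holding the sign; this sketch accepts only the preferred sign nibbles (0xC and 0xF positive, 0xD negative), and the scale, the number of implied decimal places, comes from the copybook's PIC clause.

```python
from decimal import Decimal


def decode_text(field: bytes) -> str:
    """Decode a DISPLAY (character) field with the cp037 EBCDIC code page."""
    return field.decode("cp037")


def unpack_comp3(field: bytes, scale: int = 0) -> Decimal:
    """Unpack a COMP-3 (packed decimal) field into a Decimal."""
    nibbles = []
    for byte in field:
        nibbles.extend((byte >> 4, byte & 0x0F))
    *digits, sign = nibbles
    if any(d > 9 for d in digits) or sign not in (0x0C, 0x0D, 0x0F):
        raise ValueError(f"not a valid packed decimal: {field.hex()}")
    value = int("".join(str(d) for d in digits))
    if sign == 0x0D:
        value = -value
    # Shift the implied decimal point, e.g. scale=2 for PIC S9(3)V99 COMP-3.
    return Decimal(value).scaleb(-scale)
```

For example, the three bytes 12 34 5C with a scale of 2 unpack to 123.45.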

Pragmatically

The data is an unholy mess

  • COBOL REDEFINES clauses

    • The data cannot "simply" be read

    • Code is required to disambiguate

  • Endless special cases and exceptions
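The disambiguating code usually keys off a record-type field. A sketch for a hypothetical layout where one 20-byte body is REDEFINES'd as either an address change or a payment, selected by a one-character code; the copybook in the comments and the type codes are invented for illustration. Each layout gets its own NamedTuple, and an unknown code is surfaced rather than guessed.

```python
from decimal import Decimal
from typing import NamedTuple, Union

# Hypothetical copybook:
#   01 XACT-REC.
#      05 CUST-ID     PIC X(6).
#      05 REC-TYPE    PIC X.
#      05 REC-BODY    PIC X(20).
#      05 ADDR-BODY REDEFINES REC-BODY.
#         10 STREET   PIC X(20).
#      05 PAY-BODY REDEFINES REC-BODY.
#         10 PAY-AMT  PIC 9(5)V99.
#         10 FILLER   PIC X(13).


class AddressChange(NamedTuple):
    cust_id: str
    street: str


class Payment(NamedTuple):
    cust_id: str
    amount: Decimal


def parse_xact(line: str) -> Union[AddressChange, Payment]:
    """Choose the REDEFINES layout from the record-type code."""
    cust_id, rec_type, body = line[0:6], line[6], line[7:27]
    if rec_type == "A":
        return AddressChange(cust_id, body.rstrip())
    if rec_type == "P":
        # PAY-AMT is the first 7 bytes of the body; the rest is FILLER.
        return Payment(cust_id, Decimal(body[0:7]).scaleb(-2))
    raise ValueError(f"unknown record type {rec_type!r} for {cust_id}")
```

In production data, expect the "unknown" branch to fire; each hit is a special case someone once handled silently.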

It may get worse

The COBOL Data Definition Entries (DDEs) used in production code

  • May not match all records in the file

Some records may be skipped

  • Filtering rules in place

  • And possibly inconsistent

A Path Forward

  1. Expose the COBOL source

  2. Expose the JCL that knit the apps together

  3. Work out the directed acyclic graph (DAG) of updates to master files

  4. Reason backwards from master file writes

    • find map() transformations and filter() exclusions

    • find REDEFINES discrimination

    • find exceptions and special cases
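Steps 2 and 3 can be partly mechanized. A sketch, assuming you have already pulled each job's input and output datasets out of the JCL (for example from its DD statements); the job and dataset names are hypothetical. A job depends on whichever job writes a dataset it reads, and Python's graphlib (3.9+) orders the resulting DAG, raising CycleError if the dependencies are not actually acyclic.

```python
from graphlib import TopologicalSorter

# Hypothetical inventory: job name -> (datasets read, datasets written)
JOBS: dict[str, tuple[set[str], set[str]]] = {
    "EDIT01": ({"RAW.XACTS"}, {"GOOD.XACTS", "BAD.XACTS"}),
    "SORT01": ({"GOOD.XACTS"}, {"SORTED.XACTS"}),
    "UPDT01": ({"SORTED.XACTS", "CUST.MASTER.OLD"}, {"CUST.MASTER.NEW"}),
    "RPT01": ({"CUST.MASTER.NEW"}, {"CUST.REPORT"}),
}


def job_order(jobs: dict[str, tuple[set[str], set[str]]]) -> list[str]:
    """Return jobs in an order where every writer runs before its readers."""
    writers: dict[str, str] = {}
    for job, (_, outputs) in jobs.items():
        for dataset in outputs:
            writers[dataset] = job
    # Map each job to its predecessors: the writers of its inputs.
    # Inputs with no writer (e.g., the old master) come from outside the graph.
    graph = {
        job: {writers[d] for d in inputs if d in writers}
        for job, (inputs, _) in jobs.items()
    }
    return list(TopologicalSorter(graph).static_order())
```

Reading the order backwards from the job that writes the new master file gives the chain of programs to examine for step 4.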

Conclusion

COBOL code is a repository of enterprise knowledge

Capture the data

Rewrite the code into a modern language