What — exactly — is the problem?
What is the COBOL asset?
How hard is this to fix?
What can we do?
It is
But that's not the problem
They are
But that's not the problem
COBOL is a very simple language for knowledge capture
With some obscure and unpleasant features
While we think of mainframes as BIG
An app written 30+ years ago
Target: 370/158 with 4MB RAM and a 3.2GB disk
24-bit address space
COBOL programs would easily run on your phone
What we now call an "app" was then a "system" of interconnected components
100s of programs
Each program a few hundred lines of code
A few dragged on to 1,000s of lines
A few central design patterns
Edit -- read and validate a batch of transactions
Update -- match-merge updating of master files from transactions
Report -- View the master files
Read source records -- they're generally prepared manually
Check ranges, types, and internal consistency
Stage valid batches for update processing
Display details of errors in a batch so it can be repaired
with source_path.open() as source_file, \
        good_path.open("w") as good_file, \
        bad_path.open("w") as bad_file:
    for batch in batch_read(source_file):
        if valid(batch):  # The interesting part
            batch_write(good_file, batch)
        else:
            batch_write(bad_file, batch)
In Python, we can make the interesting part stand out very clearly.
In COBOL, it can be hard to track down
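As a rough illustration of the Edit pattern above, here is a minimal runnable sketch that works on in-memory lines instead of files. The record layout (batch id, account, amount), the `Xact` class, and the specific range checks are all assumptions made for the example; only the names `batch_read` and `valid` come from the slide.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation
from itertools import groupby
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Xact:
    # Hypothetical three-field transaction layout.
    batch_id: str
    account: str
    amount: str  # kept as text until it has been validated


def batch_read(lines: Iterable[str]) -> Iterator[list[Xact]]:
    """Group comma-separated source lines into batches by batch id."""
    rows = (Xact(*line.strip().split(",")) for line in lines if line.strip())
    for _, group in groupby(rows, key=lambda x: x.batch_id):
        yield list(group)


def valid_record(x: Xact) -> bool:
    """Type and range checks for a single record (assumed rules)."""
    try:
        amount = Decimal(x.amount)
    except InvalidOperation:
        return False
    return x.account.isdigit() and Decimal("0") < amount <= Decimal("9999.99")


def valid(batch: list[Xact]) -> bool:
    """The interesting part: a batch is staged only if every record passes."""
    return all(valid_record(x) for x in batch)


source = ["B1,1001,25.00", "B1,1002,13.50", "B2,1003,abc", "B2,1004,7.25"]
good: list[list[Xact]] = []
bad: list[list[Xact]] = []
for batch in batch_read(source):
    (good if valid(batch) else bad).append(batch)
```

With this sample input, batch `B1` is staged and the whole of batch `B2` is rejected because one of its amounts is not numeric, which mirrors the "repair the batch" workflow.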
Read edited, sorted transaction records
Match keys with sorted master file
Perform Add, Change, and Delete on Master File based on Transaction(s)
Write new master file (or rewrite records in place)
with xact_path.open() as xact_file, \
        old_path.open() as master_file, \
        new_path.open("w") as new_master_file:
    master = master_read(master_file)
    xact = xact_read(xact_file)
    while master and xact:
        if master.key < xact.key:
            # No more transactions for this master: copy it forward
            master_write(new_master_file, master)
            master = master_read(master_file)
        elif xact.key < master.key:
            # No matching master: skip this transaction
            xact = xact_read(xact_file)
        else:
            update(master, xact)  # The interesting part
            xact = xact_read(xact_file)
    while master:
        master_write(new_master_file, master)
        master = master_read(master_file)
In Python, we can make the interesting part stand out very clearly.
In COBOL, it can be hard to track down
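The match-merge loop above can be exercised end to end with in-memory records. The sketch below is one possible reading of the pattern, not the original code: the `Master` and `Xact` classes, the action codes "A", "C", and "D", and the rule that a change adds to a balance are all assumptions chosen to show Add, Change, and Delete in one pass over two sorted inputs.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class Master:
    key: str
    balance: int


@dataclass
class Xact:
    key: str
    action: str  # hypothetical codes: "A" add, "C" change, "D" delete
    amount: int = 0


def update(master: Optional[Master], xact: Xact) -> Optional[Master]:
    """The interesting part: apply one transaction to the current master."""
    if xact.action == "A" and master is None:
        return Master(xact.key, xact.amount)
    if xact.action == "C" and master is not None:
        return Master(master.key, master.balance + xact.amount)
    if xact.action == "D" and master is not None:
        return None
    raise ValueError(f"cannot apply {xact} to {master}")


def match_merge(masters: Iterable[Master], xacts: Iterable[Xact]) -> Iterator[Master]:
    """Merge two key-sorted streams, yielding the new master file."""
    m_iter, x_iter = iter(masters), iter(xacts)
    master, xact = next(m_iter, None), next(x_iter, None)
    while master is not None or xact is not None:
        if xact is None or (master is not None and master.key < xact.key):
            yield master  # untouched master: copy it forward
            master = next(m_iter, None)
        else:
            key = xact.key
            current = master if master is not None and master.key == key else None
            if current is not None:
                master = next(m_iter, None)
            # Apply every transaction that shares this key.
            while xact is not None and xact.key == key:
                current = update(current, xact)
                xact = next(x_iter, None)
            if current is not None:
                yield current


old_master = [Master("A01", 100), Master("A02", 50), Master("A04", 10)]
transactions = [Xact("A02", "C", 25), Xact("A03", "A", 5), Xact("A04", "D")]
new_master = list(match_merge(old_master, transactions))
```

Here `A01` is copied unchanged, `A02` is changed, `A03` is added, and `A04` is deleted. All of the business knowledge sits in `update()`; the merge loop is boilerplate.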
The COBOL apps are (generally) straightforward
There are a LOT of them in an enterprise
There may be only a dozen "master file update" apps
There may be several dozen edit variants
There will be several dozen file copy-with-filter apps
Hundreds and hundreds of reports -- all can be replaced with Pandas data frames
Less than 4MB RAM
Caching is essential
But
COBOL has no associative store
It barely has arrays
dict[str, str]?
list[tuple[str, str]]
DATA DIVISION.
WORKING-STORAGE SECTION.
01  Some-Table.
    05  Places-Used PIC 99 COMP-3.
    05  Some-Record Occurs 20 Times.
        10  Item-Key   PIC XXX.
        10  Item-Value PIC X(32).
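To make the comparison concrete, here is a hedged Python rendering of what the Some-Table layout above forces a programmer to write by hand: a fixed block of 20 slots, a Places-Used counter, and a linear search. The class and method names (`SomeTable`, `put`, `get`) are invented for this sketch.

```python
from typing import NamedTuple, Optional


class SomeRecord(NamedTuple):
    key: str    # PIC XXX
    value: str  # PIC X(32)


TABLE_SIZE = 20  # Occurs 20 Times


class SomeTable:
    """A list[tuple[str, str]] pretending to be a dict[str, str]."""

    def __init__(self) -> None:
        self.places_used = 0
        self.records: list[Optional[SomeRecord]] = [None] * TABLE_SIZE

    def put(self, key: str, value: str) -> None:
        # Linear search for an existing key, then append if not found.
        for i in range(self.places_used):
            if self.records[i].key == key:
                self.records[i] = SomeRecord(key, value)
                return
        if self.places_used == TABLE_SIZE:
            raise OverflowError("table full")
        self.records[self.places_used] = SomeRecord(key, value)
        self.places_used += 1

    def get(self, key: str) -> Optional[str]:
        for i in range(self.places_used):
            if self.records[i].key == key:
                return self.records[i].value
        return None


table = SomeTable()
table.put("NYC", "New York")
table.put("CHI", "Chicago")
table.put("NYC", "New York City")

# The same knowledge with an associative store:
cache: dict[str, str] = {"NYC": "New York City", "CHI": "Chicago"}
```

Every lookup in the table version is a scan over the used slots, and the 20-slot limit is a latent failure once the 21st key arrives.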
In Python terms, all COBOL has are fixed-size arrays, NamedTuple, str, and Decimal
No list, dict, or set
No classes (more modern COBOL added OO features.)
No functions (generally)
GOTO
REDEFINES
ALTER
In the abstract? Yes
Pragmatically?
The states of a COBOL app may be opaque
Assume $F(C)$ is the Finite State Automaton (FSA) for some COBOL program.
Then $P(F(C))$ -- a Python implementation of the FSA -- will be utterly opaque
Knowledge Not Captured
If you've ever tried to read the output from yacc or lex, you know what this is
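A small, invented example may help show why $P(F(C))$ is opaque. The first function is what a mechanical state-by-state translation of a GOTO-style paragraph flow could look like; the paragraph names (`P100` and so on) and the logic are hypothetical. The second function is the same knowledge written directly.

```python
def p_of_f_of_c(amounts: list[int]) -> int:
    """Mechanical rendering of an FSA: one branch per COBOL paragraph."""
    state = "P100"
    index = 0
    total = 0
    while state != "EXIT":
        if state == "P100":    # end-of-data test
            state = "P900" if index >= len(amounts) else "P200"
        elif state == "P200":  # special-case IF
            state = "P300" if amounts[index] > 0 else "P400"
        elif state == "P300":  # accumulate
            total += amounts[index]
            state = "P400"
        elif state == "P400":  # advance to the next record
            index += 1
            state = "P100"
        elif state == "P900":  # wrap up
            state = "EXIT"
    return total


def captured_knowledge(amounts: list[int]) -> int:
    """What the program actually means: sum the positive amounts."""
    return sum(a for a in amounts if a > 0)
```

Both functions compute the same result, but only the second one tells a reader what the rule is.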
The COBOL optimizations -- caches
Each COBOL dev created their own unique dict[str, str] implementation
A testament to the "throw people at it" school of management
When schedule matters most, quality doesn't matter at all
A lot of inefficient list[tuple[str, str]] processing
The architecture issue:
Where did they live?
Everywhere
Copy-pasta programming is often rampant
It has a Lots of Little Programs (LoLP) architecture
LoLP can exacerbate bad management choices
Overwhelming details
Lots of redundant special-case IF statements
KTLO (Keep The Lights On) maintenance means latent bugs everywhere
The data is an unholy mess
COBOL REDEFINES clauses
The data cannot "simply" be read
Code is required to disambiguate
Endless special cases and exceptions
COBOL Data Definition Entry (DDE) used in production code
Some records may be skipped
Filtering rules in place
And possibly inconsistent
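The points above about REDEFINES can be sketched in Python. In this invented 15-character layout, the first byte is a record-type discriminator, the next four are an account number, and the last ten are one area that two different definitions share. The layout, the type codes "P" and "A", and the class names are all assumptions for illustration.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Union


@dataclass(frozen=True)
class Payment:
    account: str
    amount: Decimal


@dataclass(frozen=True)
class AddressChange:
    account: str
    city: str


def parse(record: str) -> Union[Payment, AddressChange, None]:
    """Code is required to disambiguate: pick a layout from the type byte."""
    rec_type, account, body = record[0], record[1:5], record[5:15]
    if rec_type == "P":
        # Body read as PIC 9(8)V99: ten digits, implied decimal point.
        return Payment(account, Decimal(body).scaleb(-2))
    if rec_type == "A":
        # The same ten bytes redefined as PIC X(10).
        return AddressChange(account, body.rstrip())
    return None  # unknown types are skipped, like a filtering rule


records = ["P10010000012550", "A1001Chicago   ", "X1001??????????"]
parsed = [parse(r) for r in records]
```

Nothing in the bytes themselves says which layout applies; that rule lives only in the code, which is why the data cannot "simply" be read.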
Expose the COBOL source
Expose the JCL that knit the apps together
Work out the directed acyclic graph (DAG) of updates to master files
Reason backwards from master file writes
Find map() transformations and filter() exclusions
Find REDEFINES discrimination
find exceptions and special cases
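One way to start on the DAG is sketched below, using the standard library's `graphlib`. It assumes an inventory of job steps has already been extracted from the JCL by some other means; the program and dataset names are hypothetical, and the sketch assumes each dataset has exactly one writer.

```python
from graphlib import TopologicalSorter

# Hypothetical inventory from JCL DD statements: program -> (inputs, outputs).
steps = {
    "EDIT01": ({"RAW.XACT"}, {"GOOD.XACT", "BAD.XACT"}),
    "SORT01": ({"GOOD.XACT"}, {"SORTED.XACT"}),
    "UPDT01": ({"SORTED.XACT", "OLD.MASTER"}, {"NEW.MASTER"}),
    "RPT01": ({"NEW.MASTER"}, {"REPORT"}),
}

# Which program writes each dataset (assumes a single writer per dataset).
writers = {ds: prog for prog, (_, outs) in steps.items() for ds in outs}

# A program depends on whichever programs write the datasets it reads.
depends_on = {
    prog: {writers[ds] for ds in ins if ds in writers}
    for prog, (ins, _) in steps.items()
}

# A valid execution order for the whole "system".
run_order = list(TopologicalSorter(depends_on).static_order())


def upstream(dataset: str) -> list[str]:
    """Reason backwards from a dataset to every program that feeds it."""
    found: list[str] = []
    pending = [writers[dataset]] if dataset in writers else []
    while pending:
        prog = pending.pop()
        if prog not in found:
            found.append(prog)
            pending.extend(depends_on[prog])
    return found


feeds_master = upstream("NEW.MASTER")
```

`feeds_master` lists the programs whose source needs to be read to understand a master file write, nearest first. `TopologicalSorter` raises `CycleError` if the inventory is not actually acyclic, which is itself a useful finding.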