stingray.estruct

estruct – Unpack bytes with EBCDIC encodings

The estruct module unpacks EBCDIC-encoded values. It is a big-endian version of the struct module. It uses two COBOL DDE clauses, USAGE and PIC, to describe the format of data represented by a sequence of bytes.

Unpacking and Sizing

The format string is a COBOL DDE. The USAGE and PIC (also spelled PICTURE) clauses are required; the rest of the DDE is quietly ignored. For example, 'USAGE DISPLAY PIC S999.99' is the minimum needed to describe a textual value that occupies 7 bytes.

The unpack() function uses the format string to unpack bytes into useful Python values. As with the built-in struct.unpack(), the result is always a tuple, even if it has a single value.

The calcsize() function uses the format string to compute the size of a value. This can be applied to each elementary item of a DDE to compute the offsets and positions of each field.
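The offset computation mentioned above amounts to a running sum of field sizes. The following is a minimal sketch only: the sizes are hard-coded stand-ins for what calcsize() would return for each elementary item, and the field names and the field_offsets() helper are hypothetical, not part of this module.

```python
from itertools import accumulate

# Hypothetical (name, size) pairs. In practice each size would come from
# calcsize() applied to that field's USAGE and PIC clauses.
fields = [
    ("CUSTOMER-NAME", 5),  # e.g. USAGE DISPLAY PIC X(5)
    ("BALANCE", 7),        # e.g. USAGE COMP-3 PIC S9(11)V9(2)
    ("RATE", 7),           # e.g. USAGE DISPLAY PIC S999.99
]


def field_offsets(fields):
    """Return {name: (offset, size)}; each offset is the sum of the preceding sizes."""
    sizes = [size for _, size in fields]
    starts = accumulate([0] + sizes[:-1])
    return {name: (start, size) for (name, size), start in zip(fields, starts)}


offsets = field_offsets(fields)
```

A field's bytes can then be sliced out of a record as record[offset:offset + size] before being handed to unpack().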

Note

Alternative Format Strings

The struct module uses a compact format string to describe data. This string is used to unpack text, int, and float values from a sequence of bytes. See https://docs.python.org/3/library/struct.html#format-characters. An alternative interface for this module could use similar single-letter codes.

For example:

  • 15x for display.

  • f and d for COMP-1 and COMP-2.

  • 9.2p for PIC 9(9)V99 packed decimal COMP-3.

  • 9.2n for zoned decimal text, DISPLAY instead of computational.

  • h, i, and l for COMP-4 variants.

This seems needless, but it is compact and somewhat more compatible with the struct module.

Examples:

>>> import stingray.estruct
>>> stingray.estruct.unpack("USAGE DISPLAY PIC S999V99", ' 12345'.encode("cp037"))
(Decimal('123.45'),)
>>> stingray.estruct.unpack("USAGE DISPLAY PIC X(5)", 'ABCDE'.encode("cp037"))
('ABCDE',)
>>> stingray.estruct.calcsize("USAGE COMP-3 PIC S9(11)V9(2)")
7
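The size of 7 in the last example follows from packed-decimal arithmetic: each digit takes one 4-bit nibble, the sign takes one more, and the total is rounded up to whole bytes. The sketch below shows that arithmetic and a matching decoder. It assumes the conventional COMP-3 layout with a trailing sign nibble (0xD or 0xB meaning negative); comp3_size() and unpack_comp3() are hypothetical helpers, not this module's API.

```python
from decimal import Decimal


def comp3_size(digits: int) -> int:
    """Bytes needed for `digits` decimal digits plus one trailing sign nibble."""
    return digits // 2 + 1


def unpack_comp3(buffer: bytes, scale: int = 0) -> Decimal:
    """Decode packed decimal bytes; `scale` is the number of digits after the V."""
    nibbles = []
    for byte in buffer:
        nibbles.extend((byte >> 4, byte & 0x0F))
    # The last nibble is the sign; everything before it is a digit.
    *digits, sign = nibbles
    if any(d > 9 for d in digits):
        raise ValueError(f"invalid packed decimal digits in {buffer!r}")
    magnitude = Decimal(int("".join(str(d) for d in digits))).scaleb(-scale)
    return -magnitude if sign in (0x0B, 0x0D) else magnitude


size = comp3_size(11 + 2)  # PIC S9(11)V9(2) has 13 digits
value = unpack_comp3(bytes.fromhex("12345C"), scale=2)
```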

File Reading

Reading an EBCDIC file can take advantage of its physical “Record Format” (RECFM). These classes implement a number of z/OS RECFM conversions. We recognize four actual RECFMs plus an additional special case.

  • F - Fixed. RECFM_F

  • FB - Fixed Blocked. RECFM_FB

  • V - Variable, each record is preceded by a 4-byte Record Description Word (RDW). RECFM_V

  • VB - Variable Blocked. Blocks have Block Description Word (BDW); each record within a block has a Record Description Word. RECFM_VB

  • N - Variable, but without BDW or RDW words. This involves some buffer management magic to recover the records properly. This is required to handle Occurs Depending On cases where there’s no V or VB header. This requires the consumer of bytes to announce how many bytes were consumed so the reader can advance an appropriate amount. RECFM_N

Each of these has a RECFM_Reader.record_iter() iterator that emits records stripped of header word(s).

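from stingray.estruct import RECFM_FB

# some_path is a pathlib.Path; process() is the application's own handler.
with some_path.open('rb') as source:
    for record in RECFM_FB(source, lrecl=80).record_iter():
        process(record)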

Note

IBM z/Architecture mainframes are all big-endian.
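In practice this means binary integer fields are read most significant byte first. A short illustration with the built-in struct module, where the > prefix selects big-endian byte order; the sample bytes are made up for the example:

```python
import struct

halfword = bytes([0x01, 0x00])              # two bytes, most significant first
fullword = bytes([0x00, 0x00, 0x01, 0x00])  # four bytes, most significant first

big_h, = struct.unpack(">h", halfword)      # big-endian signed 16-bit value: 256
little_h, = struct.unpack("<h", halfword)   # same bytes read little-endian: 1
big_i, = struct.unpack(">i", fullword)      # big-endian signed 32-bit value: 256
```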

COBOL Picture Parsing

The Representation object provides representation details based on COBOL syntax. This is used by the Struct Unpacker (schema_instance.Struct) as well as the EBCDIC Unpacker (schema_instance.EBCDIC).

In principle, this might be a separate thing, or might be part of the cobol_parser module. For now, it’s here and is reused by schema_instance.

Calcsize Function

stingray.estruct.calcsize(format: str) -> int

Compute the size, in bytes, of an elementary (non-group-level) COBOL DDE format specification.

Parameters:

format – The COBOL USAGE and PIC clauses.

Returns:

integer size of the item in bytes.

Unpack Function

stingray.estruct.unpack(format: str, buffer: bytes) -> tuple[Any, ...]

Unpack EBCDIC bytes given a COBOL DDE format specification and a buffer of bytes.

USAGE DISPLAY special case: “external decimal”, sometimes called “zoned decimal”. The PICTURE character-string of an external decimal item can contain only:

  • One or more of the symbol 9

  • The operational-sign, S

  • The assumed decimal point, V

  • One or more of the symbol P

External decimal items with USAGE DISPLAY are sometimes referred to as zoned decimal items. Each digit of a number is represented by a single byte. The 4 high-order bits of each byte are zone bits; the 4 high-order bits of the low-order byte represent the sign of the item. The 4 low-order bits of each byte contain the value of the digit.

Parameters:
  • format – A format string; a COBOL DDE.

  • buffer – A bytes object with a value to be unpacked.

Returns:

A tuple of Python objects. A single value is returned as a one-element tuple.
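The zone, sign, and digit layout described above can be decoded with a few bit operations. This is a minimal sketch of that layout only, with the sign carried in the zone bits of the low-order byte (0xD or 0xB meaning negative). It is not necessarily how unpack() is implemented, and unpack_zoned() is a hypothetical helper.

```python
from decimal import Decimal


def unpack_zoned(buffer: bytes, scale: int = 0) -> Decimal:
    """Decode zoned decimal bytes; `scale` is the number of digits after the V."""
    if not buffer:
        raise ValueError("empty buffer")
    # The 4 low-order bits of each byte hold one decimal digit.
    digits = [byte & 0x0F for byte in buffer]
    if any(d > 9 for d in digits):
        raise ValueError(f"invalid zoned decimal digits in {buffer!r}")
    # The 4 high-order bits of the low-order (last) byte hold the sign.
    sign_zone = buffer[-1] >> 4
    magnitude = Decimal(int("".join(str(d) for d in digits))).scaleb(-scale)
    return -magnitude if sign_zone in (0x0B, 0x0D) else magnitude


positive = unpack_zoned(bytes([0xF1, 0xF2, 0xF3, 0xF4, 0xC5]), scale=2)
negative = unpack_zoned(bytes([0xF1, 0xF2, 0xF3, 0xF4, 0xD5]), scale=2)
unsigned = unpack_zoned("12345".encode("cp037"), scale=2)
```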

RECFM_Reader

class stingray.estruct.RECFM_Reader(source: BinaryIO, lrecl: int | None = None)

Reads records based on a physical file format.

A subclass can handle details of the various kinds of Block and Record Descriptor Words (BDW, RDW) present in a specific format.

abstract record_iter() -> Iterator[bytes]

Yields each physical record, stripped of headers.

used(size: int) -> None

Used by a row to announce the number of bytes consumed. Supports the rare case of RECFM_N, where records are variable length with no RDW or BDW headers.

RECFM_F

class stingray.estruct.RECFM_F(source: BinaryIO, lrecl: int | None = None)

Read RECFM=F files.

The schema’s record size is the lrecl, logical record length.

rdw_iter() -> Iterator[bytes]
Yields:

records with an RDW injected; these look like RECFM_V records, which serve as the standard format.

record_iter() -> Iterator[bytes]
Yields:

physical records, stripped of headers.
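The two iterators can be pictured as follows. This is a sketch under assumptions, not the class's actual code: fixed_records() and with_rdw() are hypothetical names, and the injected RDW uses the conventional z/OS layout of a 2-byte big-endian length (which counts the 4 RDW bytes) followed by 2 zero bytes.

```python
import io
import struct
from typing import BinaryIO, Iterable, Iterator


def fixed_records(source: BinaryIO, lrecl: int) -> Iterator[bytes]:
    """Yield successive lrecl-byte records until the source is exhausted."""
    while record := source.read(lrecl):
        if len(record) != lrecl:
            raise ValueError(f"short record: {len(record)} of {lrecl} bytes")
        yield record


def with_rdw(records: Iterable[bytes]) -> Iterator[bytes]:
    """Prefix each record with a 4-byte RDW so it resembles a RECFM_V record."""
    for record in records:
        yield struct.pack(">HH", len(record) + 4, 0) + record


source = io.BytesIO(b"AAAABBBBCCCC")
plain = list(fixed_records(source, lrecl=4))
source.seek(0)
framed = list(with_rdw(fixed_records(source, lrecl=4)))
```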

RECFM_FB

stingray.estruct.RECFM_FB

alias of RECFM_F

RECFM_N

class stingray.estruct.RECFM_N(source: BinaryIO, lrecl: int | None = None)

Read variable-length records without RDW (or BDW).

In the case of Occurs Depending On, the schema doesn’t have a single, fixed size. The client of this class announces how many bytes were actually used.

record_iter() -> Iterator[bytes]

Provides the entire buffer. The first bytes are a record.

The used() method informs this object how many bytes were used. From this, the next record can be returned.

Yields:

blocks of bytes.
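The handshake between record_iter() and used() can be sketched with a small stand-in class. Everything here is illustrative: NoHeaderReader is hypothetical, it reads the whole source into memory for simplicity, and the toy record layout (a one-byte count followed by that many bytes) merely stands in for an Occurs Depending On structure.

```python
import io
from typing import BinaryIO, Iterator


class NoHeaderReader:
    """Hypothetical stand-in illustrating the record_iter()/used() handshake."""

    def __init__(self, source: BinaryIO) -> None:
        self.buffer = source.read()
        self.consumed = 0

    def record_iter(self) -> Iterator[bytes]:
        while self.buffer:
            self.consumed = 0
            # Offer everything that remains; the record starts at byte 0.
            yield self.buffer
            if self.consumed <= 0:
                raise RuntimeError("consumer must call used() with a positive size")
            # Advance past the bytes the consumer reported using.
            self.buffer = self.buffer[self.consumed:]

    def used(self, size: int) -> None:
        self.consumed = size


reader = NoHeaderReader(io.BytesIO(b"\x02AB\x03CDE\x01F"))
records = []
for buffer in reader.record_iter():
    count = buffer[0]                   # toy "depending on" value
    records.append(buffer[1:1 + count])
    reader.used(1 + count)              # announce how many bytes were consumed
```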

RECFM_V

class stingray.estruct.RECFM_V(source: BinaryIO, lrecl: int | None = None)

Read RECFM=V files.

The schema’s record size is irrelevant. Each record has a 4-byte Record Descriptor Word (RDW) followed by the data.

rdw_iter() -> Iterator[bytes]
Yields:

records which include the 4-byte RDW.

record_iter() -> Iterator[bytes]
Yields:

records, stripped of RDWs.
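Reading RDW-prefixed records reduces to a read-header, read-body loop. The sketch below assumes the conventional RDW layout, a 2-byte big-endian length that includes the 4 RDW bytes followed by 2 reserved bytes; rdw_records() is a hypothetical helper, not this class's implementation.

```python
import io
import struct
from typing import BinaryIO, Iterator


def rdw_records(source: BinaryIO) -> Iterator[bytes]:
    """Yield the data portion of each RDW-prefixed record."""
    while header := source.read(4):
        if len(header) != 4:
            raise ValueError("truncated RDW")
        length, _reserved = struct.unpack(">HH", header)
        # The RDW length counts the 4 header bytes themselves.
        data = source.read(length - 4)
        if len(data) != length - 4:
            raise ValueError("truncated record")
        yield data


raw = struct.pack(">HH", 7, 0) + b"ABC" + struct.pack(">HH", 6, 0) + b"DE"
records = list(rdw_records(io.BytesIO(raw)))
```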

RECFM_VB

class stingray.estruct.RECFM_VB(source: BinaryIO, lrecl: int | None = None)

Read RECFM=VB files.

The schema’s record size is irrelevant. Each record has a 4-byte Record Descriptor Word (RDW) followed by the data. Each block has a 4-byte Block Descriptor Word (BDW) followed by records.

bdw_iter() -> Iterator[bytes]
Yields:

blocks, which include the 4-byte BDW and records with 4-byte RDWs.

rdw_iter() -> Iterator[bytes]
Yields:

records which include the 4-byte RDW.

record_iter() -> Iterator[bytes]
Yields:

records, stripped of RDWs.
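The nesting of blocks and records can be sketched as two loops: an outer one over BDW-prefixed blocks and an inner one over the RDW-prefixed records inside each block. This assumes the standard (non-extended) descriptor layout, where both BDW and RDW start with a 2-byte big-endian length that includes the descriptor itself; vb_records() is a hypothetical helper.

```python
import io
import struct
from typing import BinaryIO, Iterator


def vb_records(source: BinaryIO) -> Iterator[bytes]:
    """Yield record data from BDW-prefixed blocks of RDW-prefixed records."""
    while bdw := source.read(4):
        if len(bdw) != 4:
            raise ValueError("truncated BDW")
        block_length, _reserved = struct.unpack(">HH", bdw)
        block = source.read(block_length - 4)
        if len(block) != block_length - 4:
            raise ValueError("truncated block")
        offset = 0
        while offset < len(block):
            record_length, _ = struct.unpack_from(">HH", block, offset)
            if record_length < 4:
                raise ValueError("invalid RDW length")
            # Skip the 4-byte RDW; the rest of the record is data.
            yield block[offset + 4:offset + record_length]
            offset += record_length


rec1 = struct.pack(">HH", 7, 0) + b"ABC"
rec2 = struct.pack(">HH", 6, 0) + b"DE"
rec3 = struct.pack(">HH", 5, 0) + b"F"
block1 = struct.pack(">HH", 4 + len(rec1) + len(rec2), 0) + rec1 + rec2
block2 = struct.pack(">HH", 4 + len(rec3), 0) + rec3
records = list(vb_records(io.BytesIO(block1 + block2)))
```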

Representation

class stingray.estruct.Representation(usage: str, picture_elements: list[dict[str, str]], picture_size: int)

COBOL Representation Details: Usage and Picture.

This is used internally by unpack() and calcsize().

>>> r = Representation.parse("USAGE DISPLAY PICTURE S9(5)V99")
>>> r
Representation(usage='DISPLAY', picture_elements=[{'sign': 'S'}, {'digit': '99999'}, {'decimal': 'V'}, {'digit': '99'}], picture_size=8)
>>> r.pattern
'[ +-]?\\d\\d\\d\\d\\d\\d\\d'
>>> r.digit_groups
['S', '99999', 'V', '99']

property digit_groups: list[str]

Parse the Picture into details: [sign, whole, separator, fraction] groups.

static normalize_picture(source: str) -> list[dict[str, str]]

Normalizes the PIC clause into a sequence of component details. This extracts sign, editing characters in char, the decimal place in decimal, any repeated picture characters with x(n), and any non-repeat-count picture characters.

The repeat count items are normalized into non-repeat-count. 9(5) becomes 99999.

Parameters:

source – The string value of a PICTURE clause

Returns:

a list of dictionaries that decomposes the picture
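The repeat-count normalization described above, where 9(5) becomes 99999, can be sketched with a single regular-expression substitution. This covers only that one step and returns a plain string rather than the list of dictionaries; expand_repeats() is a hypothetical helper, not the method's actual implementation.

```python
import re

# One picture symbol followed by a parenthesized repeat count, e.g. 9(5) or X(12).
REPEAT = re.compile(r"(.)\((\d+)\)")


def expand_repeats(picture: str) -> str:
    """Replace each symbol(n) with the symbol repeated n times."""
    return REPEAT.sub(lambda m: m.group(1) * int(m.group(2)), picture)


expanded = expand_repeats("S9(5)V99")
```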

classmethod parse(format: str) -> Representation

Parse the COBOL DDE information. Extract the USAGE and PICTURE details to create a Representation object.

Parameters:
  • cls – the class being created, a subclass of Representation

  • format – the format specification string

Returns:

An instance of the requested class.

property pattern: str

Summarize Picture Clause as a regexp to validate data.
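One way to picture this summary is a symbol-by-symbol translation of an already expanded picture string. The sketch below reproduces the pattern shown in the Representation example for S9(5)V99. The mapping for X is an assumption, other picture symbols are deliberately unsupported, and picture_pattern() is a hypothetical helper.

```python
import re

SYMBOL_PATTERNS = {
    "S": r"[ +-]?",  # optional sign position, as in the documented example
    "9": r"\d",      # one decimal digit
    "V": "",         # assumed decimal point occupies no character
    "X": ".",        # assumption: any single character
}


def picture_pattern(expanded: str) -> str:
    """Build a validation regexp from a picture with repeat counts expanded."""
    try:
        return "".join(SYMBOL_PATTERNS[symbol] for symbol in expanded)
    except KeyError as exc:
        raise ValueError(f"unsupported picture symbol {exc.args[0]!r}") from None


pattern = picture_pattern("S99999V99")
is_valid = re.fullmatch(pattern, " 1234567") is not None
```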

picture_elements: list[dict[str, str]]

The decomposed PIC clause, created by the normalize_picture() method.

picture_size: int

Summary sizing information.

usage: str

The usage text, words like DISPLAY or COMPUTATIONAL or any of the numerous variants.

property zoned_decimal: bool

Examine the digit groups to see if this is purely numeric.

DesignError

exception stingray.estruct.DesignError

This is a catastrophic design problem. A common root cause is a named REGEX capture clause that’s not properly handled by a class, method, or function.