stingray.estruct
estruct – Unpack bytes with EBCDIC encodings
The estruct module unpacks EBCDIC-encoded values. It is a big-endian version of the struct module.
It uses two COBOL DDE clauses, USAGE and PIC, to describe the format of data represented by a sequence of bytes.
Unpacking and Sizing
The format string is a COBOL DDE. The USAGE and PIC (also spelled PICTURE) clauses are required;
the rest of the DDE is quietly ignored.
For example, 'USAGE DISPLAY PIC S999.99' is the minimum needed to describe a textual value that occupies
7 bytes.
The unpack() function uses the format string to unpack bytes into useful Python values.
As with the built-in struct.unpack(), the result is always a tuple even if
it has a single value.
The calcsize() function uses the format string to compute the size of a value.
Applying it to each field of a DDE lets you compute the offset and position of every field.
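Field offsets are just a running total of field sizes. The sketch below illustrates that arithmetic without the stingray package; display_size() and field_offsets() are hypothetical helpers that handle only USAGE DISPLAY pictures. They count S as one byte, which matches the 7-byte figure given for S999.99 (in standard COBOL, a sign without SIGN SEPARATE is embedded in a digit and takes no byte).

```python
import re
from itertools import accumulate

# Matches a repeat count such as 9(5) or X(10): one symbol, then (n).
REPEAT = re.compile(r"(.)\((\d+)\)")

def display_size(picture: str) -> int:
    """Hypothetical byte count for a USAGE DISPLAY picture.

    Follows the convention of the 'S999.99' example: S and '.' each take a
    byte, while V (assumed decimal point) and P (scaling) take none.
    """
    expanded = REPEAT.sub(lambda m: m.group(1) * int(m.group(2)), picture)
    return sum(1 for symbol in expanded.upper() if symbol not in "VP")

def field_offsets(pictures: list[str]) -> list[tuple[int, int]]:
    """Return (offset, size) pairs for consecutive DISPLAY fields in a record."""
    sizes = [display_size(p) for p in pictures]
    starts = [0, *accumulate(sizes)][:-1]  # each field starts where the previous one ends
    return list(zip(starts, sizes))

print(field_offsets(["X(5)", "S999.99", "9(3)"]))  # [(0, 5), (5, 7), (12, 3)]
```

With the real module, calcsize() would take the place of display_size() and cover the other USAGE variants as well.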
Note
Alternative Format Strings
The struct module uses a compact format string to describe data.
This string is used to unpack text, int, and float values from a sequence of bytes.
See https://docs.python.org/3/library/struct.html#format-characters.
An alternative interface for this module could be to use single-letter codes.
For example:
- 15x for display.
- f and d for COMP-1 and COMP-2.
- 9.2p for PIC 9(9)V99 packed decimal, COMP-3.
- 9.2n for zoned decimal text, DISPLAY instead of computational.
- h, i, and l for COMP-4 variants.
This seems needless, but it is compact and somewhat more compatible with the
struct module.
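For binary items, the single-letter idea maps directly onto the standard struct module. This minimal sketch decodes COMP-4 fields with big-endian struct codes; it is not part of estruct, the size-to-code mapping is an assumption, it ignores any implied decimal places from V, and it uses q for 8-byte items because, with the > prefix, l is 4 bytes, the same as i.

```python
import struct

# Assumed mapping from COMP-4 item sizes to big-endian struct format codes.
# ">" selects big-endian byte order with standard sizes: h=2, i=4, q=8 bytes.
COMP4_FORMATS = {2: ">h", 4: ">i", 8: ">q"}

def unpack_comp4(data: bytes) -> int:
    """Unpack a signed, big-endian COMP-4 binary integer of 2, 4, or 8 bytes."""
    (value,) = struct.unpack(COMP4_FORMATS[len(data)], data)
    return value

print(unpack_comp4(bytes.fromhex("04D2")))      # 1234
print(unpack_comp4(bytes.fromhex("FFFFFFFE")))  # -2
```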
Examples:
>>> import stingray.estruct
>>> stingray.estruct.unpack("USAGE DISPLAY PIC S999V99", ' 12345'.encode("cp037"))
(Decimal('123.45'),)
>>> stingray.estruct.unpack("USAGE DISPLAY PIC X(5)", 'ABCDE'.encode("cp037"))
('ABCDE',)
>>> stingray.estruct.calcsize("USAGE COMP-3 PIC S9(11)V9(2)")
7
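The COMP-3 size of 7 in the calcsize() example follows from the packed-decimal layout: each byte holds two digit nibbles and the final nibble holds the sign, so 13 digits need 13 // 2 + 1 = 7 bytes. The sketch below is an independent illustration of that layout; comp3_size() and unpack_comp3() are hypothetical helpers, not estruct functions, and the scale argument stands in for the digits after V.

```python
from decimal import Decimal

def comp3_size(digits: int) -> int:
    """Bytes for a packed-decimal (COMP-3) item: two digits per byte plus a sign nibble."""
    return digits // 2 + 1

def unpack_comp3(data: bytes, scale: int = 0) -> Decimal:
    """Decode packed decimal: one digit per nibble, with the sign in the last nibble."""
    nibbles = [n for byte in data for n in (byte >> 4, byte & 0x0F)]
    *digits, sign = nibbles
    if any(d > 9 for d in digits):
        raise ValueError("invalid digit nibble in packed decimal")
    value = int("".join(map(str, digits)))
    # Sign nibbles B and D mean negative; A, C, E, and F mean positive.
    if sign in (0xB, 0xD):
        value = -value
    return Decimal(value).scaleb(-scale)

print(comp3_size(11 + 2))                         # 7, as for S9(11)V9(2)
print(unpack_comp3(bytes.fromhex("12345C"), 2))  # 123.45
print(unpack_comp3(bytes.fromhex("12345D"), 2))  # -123.45
```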
File Reading
An EBCDIC file can leverage physical “Record Format” (RECFM) assistance. These classes define a number of z/OS RECFM conversion functions. We recognize four actual RECFMs plus an additional special case.
- F – Fixed. Handled by RECFM_F.
- FB – Fixed Blocked. Handled by RECFM_FB.
- V – Variable; each record is preceded by a 4-byte Record Descriptor Word (RDW). Handled by RECFM_V.
- VB – Variable Blocked. Each block has a Block Descriptor Word (BDW); each record within a block has a Record Descriptor Word. Handled by RECFM_VB.
- N – Variable, but without BDW or RDW words. This involves some buffer management to recover the records properly. It is required to handle Occurs Depending On cases where there’s no V or VB header: the consumer of the bytes announces how many bytes were consumed so the reader can advance by the appropriate amount. Handled by RECFM_N.
Each of these has a RECFM_Reader.record_iter() iterator that emits records stripped of header word(s).
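from stingray.estruct import RECFM_FB

# some_path (a pathlib.Path) and process() are placeholders for your own file and logic.
with some_path.open('rb') as source:
    for record in RECFM_FB(source, lrecl=80).record_iter():
        process(record)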
Note
IBM z/Architecture mainframes are all big-endian.
COBOL Picture Parsing
The Representation object provides representation details
based on COBOL syntax. This is used by the Struct Unpacker (schema_instance.Struct) as well as the
EBCDIC Unpacker (schema_instance.EBCDIC).
In principle, this might be a separate module, or might be part of the cobol_parser module.
For now, it’s here and is reused by schema_instance.
Calcsize Function
- stingray.estruct.calcsize(format: str) → int
Compute the size, in bytes, of an elementary (non-group-level) COBOL DDE format specification.
- Parameters:
format – The COBOL USAGE and PIC clauses.
- Returns:
integer size of the item in bytes.
Unpack Function
- stingray.estruct.unpack(format: str, buffer: bytes) → tuple[Any, ...]
Unpack EBCDIC bytes given a COBOL DDE format specification and a buffer of bytes.
USAGE DISPLAY special case: “external decimal”, sometimes called “zoned decimal”. The PICTURE character-string of an external decimal item can contain only:
One or more of the symbol 9
The operational-sign, S
The assumed decimal point, V
One or more of the symbol P
Each digit of a number is represented by a single byte. The 4 high-order bits of each byte are zone bits; the 4 high-order bits of the low-order byte represent the sign of the item. The 4 low-order bits of each byte contain the value of the digit.
- Parameters:
format – A format string; a COBOL DDE.
buffer – A bytes object with a value to be unpacked.
- Returns:
A tuple of Python objects; as with struct.unpack(), a single value is returned as a one-element tuple.
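The embedded-sign zoned-decimal layout described above can be decoded with a few bit operations. This is a minimal sketch of that description only; unpack_zoned() is a hypothetical helper, not part of estruct, and its scale argument (the digits after V) is an assumption. The unpack() example near the top of this page passes a separate leading sign character instead, so estruct's own handling of S may differ.

```python
from decimal import Decimal

def unpack_zoned(data: bytes, scale: int = 0) -> Decimal:
    """Decode EBCDIC zoned decimal with the sign in the last byte's zone bits."""
    digits = []
    for byte in data:
        digit = byte & 0x0F          # low-order 4 bits: the digit value
        if digit > 9:
            raise ValueError(f"invalid digit in byte {byte:#04x}")
        digits.append(digit)
    value = int("".join(map(str, digits)))
    sign = data[-1] >> 4             # high-order 4 bits of the low-order byte
    if sign in (0xB, 0xD):           # D (or B) marks a negative value
        value = -value
    return Decimal(value).scaleb(-scale)

print(unpack_zoned("12345".encode("cp037"), 2))      # 123.45 (zone F: unsigned)
print(unpack_zoned(bytes.fromhex("F1F2F3F4D5"), 2))  # -123.45
```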
RECFM_Reader
- class stingray.estruct.RECFM_Reader(source: BinaryIO, lrecl: int | None = None)
Reads records based on a physical file format.
A subclass can handle the details of the various kinds of Block and Record Descriptor Words (BDW, RDW) present in a specific format.
- abstract record_iter() → Iterator[bytes]
Yields each physical record, stripped of headers.
- used(size: int) → None
Used by a row to announce the number of bytes consumed. Supports the rare case of RECFM_N, where records are variable length with no RDW or BDW headers.
RECFM_F
- class stingray.estruct.RECFM_F(source: BinaryIO, lrecl: int | None = None)
Read RECFM=F files.
The schema’s record size is the lrecl, the logical record length.
- rdw_iter() → Iterator[bytes]
- Yields:
records with an RDW injected, so they look like standard RECFM_V records.
- record_iter() → Iterator[bytes]
- Yields:
physical records, stripped of headers.
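A RECFM=F reader only needs to slice the input into lrecl-byte pieces, and injecting an RDW means prefixing each piece with its length. The functions below are a hypothetical, stand-alone sketch of those two ideas, not RECFM_F's implementation; raising an error on a short final record is an assumed policy.

```python
import io
from collections.abc import Iterator
from typing import BinaryIO

def fixed_records(source: BinaryIO, lrecl: int) -> Iterator[bytes]:
    """Yield successive lrecl-byte records, as a RECFM=F reader would."""
    while record := source.read(lrecl):
        if len(record) != lrecl:
            raise ValueError(f"short final record: {len(record)} of {lrecl} bytes")
        yield record

def with_rdw(records: Iterator[bytes]) -> Iterator[bytes]:
    """Prefix each record with an RDW: 2-byte big-endian length (RDW included), 2 zero bytes."""
    for record in records:
        yield (len(record) + 4).to_bytes(2, "big") + b"\x00\x00" + record

source = io.BytesIO(b"AAAABBBB")
print(list(with_rdw(fixed_records(source, lrecl=4))))
# [b'\x00\x08\x00\x00AAAA', b'\x00\x08\x00\x00BBBB']
```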
RECFM_FB
RECFM_N
- class stingray.estruct.RECFM_N(source: BinaryIO, lrecl: int | None = None)
Read variable-length records without RDW (or BDW).
In the case of Occurs Depending On, the schema doesn’t have a single, fixed size. The client of this class announces how many bytes were actually used.
- record_iter() → Iterator[bytes]
Provides the entire buffer. The first bytes are a record.
The used() method informs this object how many bytes were used. From this, the next record can be returned.
- Yields:
blocks of bytes.
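The record_iter()/used() handshake can be sketched as a generator that hands out the remaining bytes and advances only after the consumer reports how many it used. UsedBytesReader and the one-byte count layout are hypothetical stand-ins chosen for illustration; a real Occurs Depending On record would get its length from the schema.

```python
from collections.abc import Iterator

class UsedBytesReader:
    """Hypothetical RECFM_N-style reader: no RDW/BDW, so the consumer reports usage."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.offset = 0
        self._used = 0

    def record_iter(self) -> Iterator[bytes]:
        while self.offset < len(self.data):
            self._used = 0
            # Hand out everything that remains; the first bytes are the next record.
            yield self.data[self.offset:]
            if self._used <= 0:
                raise RuntimeError("used() was not called with a positive size")
            self.offset += self._used

    def used(self, size: int) -> None:
        self._used = size

# Hypothetical layout: a 1-byte count, then that many bytes (a stand-in for ODO).
reader = UsedBytesReader(b"\x02AB\x03CDE")
records = []
for buffer in reader.record_iter():
    count = buffer[0]
    records.append(buffer[1:1 + count])
    reader.used(1 + count)
print(records)  # [b'AB', b'CDE']
```

Because the generator resumes only when the loop asks for the next buffer, used() is always called before the offset advances.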
RECFM_V
- class stingray.estruct.RECFM_V(source: BinaryIO, lrecl: int | None = None)
Read RECFM=V files.
The schema’s record size is irrelevant. Each record has a 4-byte Record Descriptor Word (RDW) followed by the data.
- rdw_iter() → Iterator[bytes]
- Yields:
records which include the 4-byte RDW.
- record_iter() → Iterator[bytes]
- Yields:
records, stripped of RDWs.
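Reading RECFM=V data comes down to reading a 4-byte RDW, taking the big-endian length from its first two bytes (a length that includes the RDW itself), and then reading the rest of the record. rdw_records() below is a hypothetical sketch of that loop, not RECFM_V's code, and it does not handle spanned records.

```python
import io
from collections.abc import Iterator
from typing import BinaryIO

def rdw_records(source: BinaryIO) -> Iterator[bytes]:
    """Yield RECFM=V record bodies, stripping each 4-byte RDW."""
    while rdw := source.read(4):
        if len(rdw) != 4:
            raise ValueError("truncated RDW")
        # Bytes 0-1: big-endian record length, including the 4-byte RDW itself.
        length = int.from_bytes(rdw[:2], "big")
        if length < 4:
            raise ValueError(f"invalid RDW length {length}")
        body = source.read(length - 4)
        if len(body) != length - 4:
            raise ValueError("truncated record")
        yield body

data = b"\x00\x07\x00\x00ABC" + b"\x00\x05\x00\x00D"
print(list(rdw_records(io.BytesIO(data))))  # [b'ABC', b'D']
```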
RECFM_VB
- class stingray.estruct.RECFM_VB(source: BinaryIO, lrecl: int | None = None)
Read RECFM=VB files.
The schema’s record size is irrelevant. Each record has a 4-byte Record Descriptor Word (RDW) followed by the data. Each block has a 4-byte Block Descriptor Word (BDW) followed by records.
- bdw_iter() → Iterator[bytes]
- Yields:
blocks, which include the 4-byte BDW and records with 4-byte RDWs.
- rdw_iter() → Iterator[bytes]
- Yields:
records which include the 4-byte RDW.
- record_iter() → Iterator[bytes]
- Yields:
records, stripped of RDWs.
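RECFM=VB adds one more level: read a block using its BDW, then walk the RDWs inside that block. vb_records() is a hypothetical sketch of that two-level loop, assuming a 4-byte BDW whose first two bytes hold the big-endian block length (BDW included); it is not RECFM_VB's implementation.

```python
import io
from collections.abc import Iterator
from typing import BinaryIO

def vb_records(source: BinaryIO) -> Iterator[bytes]:
    """Yield RECFM=VB record bodies: read each block via its BDW, then split on RDWs."""
    while bdw := source.read(4):
        block_length = int.from_bytes(bdw[:2], "big")
        if len(bdw) != 4 or block_length < 4:
            raise ValueError("invalid or truncated BDW")
        block = source.read(block_length - 4)
        offset = 0
        while offset < len(block):
            # Each RDW's first two bytes give the record length, RDW included.
            record_length = int.from_bytes(block[offset:offset + 2], "big")
            if record_length < 4:
                raise ValueError(f"invalid RDW length {record_length}")
            yield block[offset + 4:offset + record_length]
            offset += record_length

records = b"\x00\x07\x00\x00ABC" + b"\x00\x05\x00\x00D"
bdw = (len(records) + 4).to_bytes(2, "big") + b"\x00\x00"
print(list(vb_records(io.BytesIO(bdw + records))))  # [b'ABC', b'D']
```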
Representation
- class stingray.estruct.Representation(usage: str, picture_elements: list[dict[str, str]], picture_size: int)
COBOL Representation Details: Usage and Picture.
This is used internally by unpack() and calcsize().
>>> r = Representation.parse("USAGE DISPLAY PICTURE S9(5)V99")
>>> r
Representation(usage='DISPLAY', picture_elements=[{'sign': 'S'}, {'digit': '99999'}, {'decimal': 'V'}, {'digit': '99'}], picture_size=8)
>>> r.pattern
'[ +-]?\\d\\d\\d\\d\\d\\d\\d'
>>> r.digit_groups
['S', '99999', 'V', '99']
- property digit_groups: list[str]
Parse the Picture into details: [sign, whole, separator, fraction] groups.
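For a simple numeric picture, the groups in the Representation example can be reproduced by expanding repeat counts and splitting the result into runs of the same symbol. This is only a sketch of the idea using a hypothetical digit_groups() function; it knows nothing about editing symbols, so a picture such as ZZ9.99 would produce groups that don't fit the [sign, whole, separator, fraction] shape.

```python
import re
from itertools import groupby

def digit_groups(picture: str) -> list[str]:
    """Hypothetical sketch: expand repeat counts, then split into runs of one symbol."""
    expanded = re.sub(r"(.)\((\d+)\)", lambda m: m.group(1) * int(m.group(2)), picture)
    return ["".join(run) for _, run in groupby(expanded)]

print(digit_groups("S9(5)V99"))  # ['S', '99999', 'V', '99']
```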
- static normalize_picture(source: str) → list[dict[str, str]]
Normalizes the PIC clause into a sequence of component details. This extracts the sign into sign, editing characters into char, the decimal place into decimal, any repeated picture characters written as x(n), and any non-repeat-count picture characters.
Repeat-count items are normalized into the non-repeat-count form: 9(5) becomes 99999.
- Parameters:
source – The string value of a PICTURE clause
- Returns:
a list of dictionaries that decomposes the picture
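The dictionary shape shown in the Representation example can be produced for the simplest pictures with two regular expressions: one to expand repeat counts, one to pick out runs. normalize_numeric_picture() is a hypothetical sketch restricted to S, 9, and V; the sign, digit, and decimal keys come from the example output, and everything else (editing characters, X, P) is deliberately rejected.

```python
import re

def normalize_numeric_picture(source: str) -> list[dict[str, str]]:
    """Sketch: decompose a picture built only from S, 9, and V into labelled runs."""
    # Expand repeat counts: 9(5) becomes 99999.
    expanded = re.sub(r"(.)\((\d+)\)", lambda m: m.group(1) * int(m.group(2)), source.upper())
    if not re.fullmatch(r"S?9*V?9*", expanded):
        raise ValueError(f"picture {source!r} is outside this sketch's S/9/V subset")
    labels = {"S": "sign", "9": "digit", "V": "decimal"}
    return [{labels[run[0]]: run} for run in re.findall(r"S|9+|V", expanded)]

print(normalize_numeric_picture("S9(5)V99"))
# [{'sign': 'S'}, {'digit': '99999'}, {'decimal': 'V'}, {'digit': '99'}]
```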
- classmethod parse(format: str) → Representation
Parse the COBOL DDE information. Extract the USAGE and PICTURE details to create a Representation object.
- Parameters:
cls – the class being created, a subclass of Representation
format – the format specification string
- Returns:
An instance of the requested class.
- property pattern: str
Summarize the picture clause as a regular expression used to validate data.
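A validation pattern like the one in the Representation example can be built symbol by symbol: an optional sign character for S, one digit for each 9, and nothing for V. picture_pattern() is a hypothetical sketch that handles only those three symbols; it reproduces the example's pattern for S9(5)V99 but is not the pattern property's implementation.

```python
import re

def picture_pattern(picture: str) -> str:
    """Sketch: turn a numeric DISPLAY picture into a validation regex."""
    expanded = re.sub(r"(.)\((\d+)\)", lambda m: m.group(1) * int(m.group(2)), picture.upper())
    parts = []
    for symbol in expanded:
        if symbol == "S":
            parts.append("[ +-]?")  # optional sign character
        elif symbol == "9":
            parts.append(r"\d")     # one digit
        elif symbol == "V":
            continue                # assumed decimal point: nothing in the data
        else:
            raise ValueError(f"symbol {symbol!r} is not handled in this sketch")
    return "".join(parts)

pattern = picture_pattern("S9(5)V99")
print(pattern)                                  # [ +-]?\d\d\d\d\d\d\d
print(bool(re.fullmatch(pattern, " 1234567")))  # True
```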
- picture_elements: list[dict[str, str]]
The decomposed PIC clause, created by the normalize_picture() method.
- picture_size: int
Summary sizing information.
- usage: str
The usage text: words like DISPLAY or COMPUTATIONAL, or any of the numerous variants.
- property zoned_decimal: bool
Examine the digit groups to see if this is purely numeric.
DesignError
- exception stingray.estruct.DesignError
This is a catastrophic design problem. A common root cause is a named regex capture group that’s not properly handled by a class, method, or function.