stingray.estruct¶
estruct – Unpack bytes with EBCDIC encodings
The estruct module unpacks EBCDIC-encoded values. It is a big-endian version of the struct module.
It uses two COBOL DDE clauses, USAGE and PIC, to describe the format of data represented by a sequence of bytes.
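As a rough illustration of what "EBCDIC-encoded" means, the sketch below uses only Python's standard cp037 codec. Code page 037 is one common EBCDIC code page; whether it matches a particular file is an assumption.

```python
# The same text has different byte values in EBCDIC (cp037) and ASCII.
text = "ABCDE"
ebcdic = text.encode("cp037")
ascii_bytes = text.encode("ascii")

print(ebcdic.hex())       # c1c2c3c4c5
print(ascii_bytes.hex())  # 4142434445

# Decoding with the matching code page recovers the original text.
decoded = ebcdic.decode("cp037")
print(decoded)            # ABCDE
```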
Unpacking and Sizing¶
The format string is a COBOL DDE. The USAGE and PIC (also spelled PICTURE) clauses are required; the rest of the DDE is quietly ignored.
For example, 'USAGE DISPLAY PIC S999.99' is the minimum needed to describe a textual value that occupies 7 bytes.
The unpack() function uses the format string to unpack bytes into useful Python values. As with the built-in struct.unpack(), the result is always a tuple, even if it has a single value.
The calcsize() function uses the format string to compute the size of a value.
This can be applied to a DDE to compute the offsets and positions of each field.
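A minimal sketch of that offset computation follows. The field names and sizes are hypothetical; in practice each size would come from calcsize() applied to the field's format string.

```python
from itertools import accumulate

# Hypothetical elementary fields with their sizes in bytes.
field_sizes = [("CUSTOMER-ID", 5), ("BALANCE", 7), ("STATUS", 1)]

sizes = [size for _, size in field_sizes]
# Each field starts where the previous one ended; the first starts at 0.
offsets = [0, *accumulate(sizes)][:-1]
layout = {
    name: (start, start + size)
    for (name, size), start in zip(field_sizes, offsets)
}

record = bytes(range(13))  # stand-in for one 13-byte record
balance_bytes = record[slice(*layout["BALANCE"])]

print(offsets)             # [0, 5, 12]
print(layout["BALANCE"])   # (5, 12)
print(len(balance_bytes))  # 7
```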
Note
Alternative Format Strings
The struct module uses a compact format string to describe data.
This string is used to unpack text, int, and float values from a sequence of bytes.
See https://docs.python.org/3/library/struct.html#format-characters.
An alternative interface for this module could be to use single-letter codes.
For example:
- 15x for display.
- f and d for COMP-1 and COMP-2.
- 9.2p for PIC 9(9)V99 packed decimal COMP-3.
- 9.2n for zoned decimal text, DISPLAY instead of computational.
- h, i, and l for COMP-4 variants.
This seems needless, but it is compact and somewhat more compatible with the
struct
module.
Examples:
>>> import stingray.estruct
>>> stingray.estruct.unpack("USAGE DISPLAY PIC S999V99", ' 12345'.encode("cp037"))
(Decimal('123.45'),)
>>> stingray.estruct.unpack("USAGE DISPLAY PIC X(5)", 'ABCDE'.encode("cp037"))
('ABCDE',)
>>> stingray.estruct.calcsize("USAGE COMP-3 PIC S9(11)V9(2)")
7
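The size of 7 in the last example is consistent with the usual packed-decimal layout: two digits per byte, with the final half-byte holding the sign. The sketch below illustrates that layout. The helper names packed_size and unpack_packed are hypothetical and are not part of this module.

```python
from decimal import Decimal

def packed_size(digits: int) -> int:
    """Bytes needed for `digits` decimal digits plus one sign nibble."""
    return digits // 2 + 1

def unpack_packed(buffer: bytes, scale: int) -> Decimal:
    """Decode packed-decimal bytes; `scale` is the digit count after the V."""
    # Split every byte into its high and low nibble.
    nibbles = [n for byte in buffer for n in (byte >> 4, byte & 0x0F)]
    *digits, sign = nibbles
    value = int("".join(str(d) for d in digits))
    # 0xD (preferred) and 0xB (alternate) mark negative values.
    if sign in (0x0B, 0x0D):
        value = -value
    return Decimal(value).scaleb(-scale)

print(packed_size(13))                            # 7
print(unpack_packed(bytes.fromhex("12345C"), 2))  # 123.45
print(unpack_packed(bytes.fromhex("12345D"), 2))  # -123.45
```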
File Reading¶
An EBCDIC file can leverage physical “Record Format” (RECFM) assistance. These classes define a number of z/OS RECFM conversion functions. We recognize four actual RECFM’s plus an additional special case.
F
- Fixed. See RECFM_F.
FB
- Fixed Blocked. See RECFM_FB.
V
- Variable, each record is preceded by a 4-byte Record Descriptor Word (RDW). See RECFM_V.
VB
- Variable Blocked. Blocks have a Block Descriptor Word (BDW); each record within a block has a Record Descriptor Word. See RECFM_VB.
N
- Variable, but without BDW or RDW words. This involves some buffer management magic to recover the records properly. This is required to handle Occurs Depending On cases where there’s no V or VB header. This requires the consumer of bytes to announce how many bytes were consumed so the reader can advance an appropriate amount. See RECFM_N.
Each of these has a RECFM_Reader.record_iter()
iterator that emits records stripped of header word(s).
with some_path.open('rb') as source:
    for record in RECFM_FB(source, lrecl=80).record_iter():
        process(record)
Note
IBM z/Architecture mainframes are all big-endian.
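To show why byte order matters for binary fields, this small sketch uses the standard struct module to interpret the same two bytes both ways. The sample bytes are made up for illustration.

```python
import struct

raw = bytes([0x01, 0x02])  # two bytes of a hypothetical binary halfword

big = struct.unpack(">h", raw)[0]     # big-endian, as on the mainframe
little = struct.unpack("<h", raw)[0]  # little-endian, as on x86

print(big, little)  # 258 513
```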
COBOL Picture Parsing¶
The Representation
object provides representation details
based on COBOL syntax. This is used by the Struct Unpacker (schema_instance.Struct
) as well as the
EBCDIC Unpacker (schema_instance.EBCDIC
).
In principle, this might be a separate thing, or might be part of the cobol_parser
module.
For now, it’s here and is reused by schema_instance
.
Calcsize Function¶
- stingray.estruct.calcsize(format: str) → int ¶
Compute the size, in bytes, of an elementary (non-group-level) COBOL DDE format specification.
- Parameters:
format – The COBOL USAGE and PIC clauses.
- Returns:
integer size of the item in bytes.
Unpack Function¶
- stingray.estruct.unpack(format: str, buffer: bytes) → tuple[Any, ...] ¶
Unpack EBCDIC bytes given a COBOL DDE format specification and a buffer of bytes.
USAGE DISPLAY special case: “external decimal” sometimes called “zoned decimal”. The PICTURE character-string of an external decimal item can contain only:
One or more of the symbol 9
The operational-sign, S
The assumed decimal point, V
One or more of the symbol P
External decimal items with USAGE DISPLAY are sometimes referred to as zoned decimal items. Each digit of a number is represented by a single byte. The 4 high-order bits of each byte are zone bits; the 4 high-order bits of the low-order byte represent the sign of the item. The 4 low-order bits of each byte contain the value of the digit.
- Parameters:
format – A format string; a COBOL DDE.
buffer – A bytes object with a value to be unpacked.
- Returns:
A tuple of Python values. As with struct.unpack(), a tuple is returned even for a single value.
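The zoned decimal layout described above can be sketched with a few bit operations. This follows the description literally, with the sign carried in the zone bits of the last byte. The helper name unpack_zoned is hypothetical, and the module's own handling (for example, the earlier example with a leading blank) may differ.

```python
from decimal import Decimal

def unpack_zoned(buffer: bytes, scale: int) -> Decimal:
    """Decode zoned-decimal bytes whose last byte carries the sign.

    Hypothetical helper; `scale` is the number of digits after the V.
    """
    # The 4 low-order bits of each byte hold the digit value.
    digits = "".join(str(byte & 0x0F) for byte in buffer)
    # The 4 high-order bits of the low-order (last) byte hold the sign.
    sign_zone = buffer[-1] >> 4
    value = int(digits)
    if sign_zone in (0x0B, 0x0D):  # 0xD preferred, 0xB alternate negative
        value = -value
    return Decimal(value).scaleb(-scale)

print(unpack_zoned(bytes.fromhex("F1F2F3F4C5"), 2))  # 123.45
print(unpack_zoned(bytes.fromhex("F1F2F3F4D5"), 2))  # -123.45
```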
RECFM_Reader¶
- class stingray.estruct.RECFM_Reader(source: BinaryIO, lrecl: int | None = None)¶
Reads records based on a physical file format.
A subclass can handle details of the various kinds of Block and Record Descriptor Words (BDW, RDW) present in a specific format.
- abstract record_iter() → Iterator[bytes] ¶
Yields each physical record, stripped of headers.
- used(size: int) → None ¶
Used by a row to announce the number of bytes consumed. Supports the rare case of RECFM_N, where records are variable length with no RDW or BDW headers.
RECFM_F¶
- class stingray.estruct.RECFM_F(source: BinaryIO, lrecl: int | None = None)¶
Read RECFM=F files.
The schema’s record size is the lrecl, logical record length.
- rdw_iter() → Iterator[bytes] ¶
- Yields:
records with RDW injected, these look like RECFM_V format as a standard.
- record_iter() → Iterator[bytes] ¶
- Yields:
physical records, stripped of headers.
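Reading fixed-length records amounts to slicing the byte stream into lrecl-sized pieces. The sketch below shows the idea with an in-memory stream. The function fixed_records is a hypothetical stand-in, not the class's actual code.

```python
from io import BytesIO
from typing import BinaryIO, Iterator

def fixed_records(source: BinaryIO, lrecl: int) -> Iterator[bytes]:
    """Yield consecutive lrecl-sized records until the stream is exhausted."""
    while True:
        record = source.read(lrecl)
        if not record:
            break
        # A short final record is passed through unchanged in this sketch.
        yield record

data = BytesIO(b"AAAABBBBCCCC")
records = list(fixed_records(data, lrecl=4))
print(records)  # [b'AAAA', b'BBBB', b'CCCC']
```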
RECFM_FB¶
RECFM_N¶
- class stingray.estruct.RECFM_N(source: BinaryIO, lrecl: int | None = None)¶
Read variable-length records without RDW (or BDW).
In the case of Occurs Depending On, the schema doesn’t have a single, fixed size. The client of this class announces how many bytes were actually used.
- record_iter() → Iterator[bytes] ¶
Provides the entire buffer. The first bytes are a record.
The used() method informs this object how many bytes were used. From this, the next record can be returned.
- Yields:
blocks of bytes.
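The interplay between record_iter() and used() can be sketched as follows. The class UndelimitedReader and the toy record format (a count byte followed by that many payload bytes) are hypothetical, chosen only to show the protocol. This is not the library's implementation.

```python
from io import BytesIO
from typing import BinaryIO, Iterator

class UndelimitedReader:
    """Hypothetical sketch of the used() protocol for header-less records."""

    def __init__(self, source: BinaryIO) -> None:
        self.buffer = source.read()
        self.consumed = 0

    def record_iter(self) -> Iterator[bytes]:
        while self.buffer:
            self.consumed = 0
            # Hand over everything left; the consumer finds the record end.
            yield self.buffer
            if self.consumed == 0:
                raise RuntimeError("consumer did not call used()")
            self.buffer = self.buffer[self.consumed:]

    def used(self, size: int) -> None:
        self.consumed = size

# Toy format: the first byte counts the payload bytes that follow it.
reader = UndelimitedReader(BytesIO(b"\x02ab\x03cde\x01f"))
records = []
for block in reader.record_iter():
    length = 1 + block[0]
    records.append(block[1:length])
    reader.used(length)  # announce how far the reader should advance

print(records)  # [b'ab', b'cde', b'f']
```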
RECFM_V¶
- class stingray.estruct.RECFM_V(source: BinaryIO, lrecl: int | None = None)¶
Read RECFM=V files.
The schema’s record size is irrelevant. Each record has a 4-byte Record Descriptor Word (RDW) followed by the data.
- rdw_iter() → Iterator[bytes] ¶
- Yields:
records which include the 4-byte RDW.
- record_iter() → Iterator[bytes] ¶
- Yields:
records, stripped of RDW’s.
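A minimal sketch of RDW handling, assuming the conventional layout in which the first two bytes of the RDW are a big-endian length that includes the RDW itself and the remaining two bytes are not needed for simple records. The helpers rdw_records and with_rdw are hypothetical and only build and read test data.

```python
import struct
from io import BytesIO
from typing import BinaryIO, Iterator

def rdw_records(source: BinaryIO) -> Iterator[bytes]:
    """Yield record payloads from RDW-prefixed data, without the RDW."""
    while True:
        rdw = source.read(4)
        if len(rdw) < 4:
            break
        # Big-endian halfword length (counts the RDW), then two more bytes.
        length, _ = struct.unpack(">HH", rdw)
        yield source.read(length - 4)

def with_rdw(payload: bytes) -> bytes:
    """Prefix a payload with a 4-byte RDW, for building sample data."""
    return struct.pack(">HH", len(payload) + 4, 0) + payload

data = BytesIO(with_rdw(b"short") + with_rdw(b"a longer record"))
records = list(rdw_records(data))
print(records)  # [b'short', b'a longer record']
```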
RECFM_VB¶
- class stingray.estruct.RECFM_VB(source: BinaryIO, lrecl: int | None = None)¶
Read RECFM=VB files.
The schema’s record size is irrelevant. Each record has a 4-byte Record Descriptor Word (RDW) followed by the data. Each block has a 4-byte Block Descriptor Word (BDW) followed by records.
- bdw_iter() → Iterator[bytes] ¶
- Yields:
blocks, which include 4-byte BDW and records with 4-byte RDW’s.
- rdw_iter() → Iterator[bytes] ¶
- Yields:
records which include the 4-byte RDW.
- record_iter() → Iterator[bytes] ¶
- Yields:
records, stripped of RDW’s.
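The nesting of blocks and records can be sketched in the same way. This assumes the non-extended BDW form, where the first two bytes are a big-endian block length that includes the BDW. The helper names are hypothetical and this is not the class's actual code.

```python
import struct
from io import BytesIO
from typing import BinaryIO, Iterator

def vb_records(source: BinaryIO) -> Iterator[bytes]:
    """Yield record payloads from BDW/RDW-structured data."""
    while True:
        bdw = source.read(4)
        if len(bdw) < 4:
            break
        block_length, _ = struct.unpack(">HH", bdw)
        block = source.read(block_length - 4)
        offset = 0
        while offset < len(block):
            record_length, _ = struct.unpack_from(">HH", block, offset)
            if record_length < 4:
                raise ValueError("invalid RDW length")
            # Skip the 4-byte RDW and emit only the data.
            yield block[offset + 4 : offset + record_length]
            offset += record_length

def make_record(payload: bytes) -> bytes:
    return struct.pack(">HH", len(payload) + 4, 0) + payload

def make_block(*payloads: bytes) -> bytes:
    body = b"".join(make_record(p) for p in payloads)
    return struct.pack(">HH", len(body) + 4, 0) + body

data = BytesIO(make_block(b"one", b"two") + make_block(b"three"))
records = list(vb_records(data))
print(records)  # [b'one', b'two', b'three']
```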
Representation¶
- class stingray.estruct.Representation(usage: str, picture_elements: list[dict[str, str]], picture_size: int)¶
COBOL Representation Details: Usage and Picture.
This is used internally by unpack() and calcsize().
>>> r = Representation.parse("USAGE DISPLAY PICTURE S9(5)V99")
>>> r
Representation(usage='DISPLAY', picture_elements=[{'sign': 'S'}, {'digit': '99999'}, {'decimal': 'V'}, {'digit': '99'}], picture_size=8)
>>> r.pattern
'[ +-]?\\d\\d\\d\\d\\d\\d\\d'
>>> r.digit_groups
['S', '99999', 'V', '99']
- property digit_groups: list[str]¶
Parse the Picture into details: [sign, whole, separator, fraction] groups.
- static normalize_picture(source: str) → list[dict[str, str]] ¶
Normalizes the PIC clause into a sequence of component details. This extracts sign, editing characters in char, the decimal place in decimal, any repeated picture characters with x(n), and any non-repeat-count picture characters.
The repeat count items are normalized into non-repeat-count. 9(5) becomes 99999.
- Parameters:
source – The string value of a PICTURE clause
- Returns:
a list of dictionaries that decomposes the picture
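The repeat-count normalization can be illustrated with a single regular-expression substitution. This is a sketch of the idea only. The function expand_repeats is hypothetical and returns a plain string rather than the list of dictionaries described above.

```python
import re

def expand_repeats(picture: str) -> str:
    """Replace each X(n) form with n copies of the character X."""
    return re.sub(
        r"(.)\((\d+)\)",
        lambda match: match.group(1) * int(match.group(2)),
        picture,
    )

print(expand_repeats("S9(5)V99"))  # S99999V99
print(expand_repeats("X(3)"))      # XXX
```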
- classmethod parse(format: str) → Representation ¶
Parse the COBOL DDE information. Extract the USAGE and PICTURE details to create a Representation object.
- Parameters:
cls – the class being created, a subclass of Representation
format – the format specification string
- Returns:
An instance of the requested class.
- property pattern: str¶
Summarize Picture Clause as a regexp to validate data.
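As an illustration of how such a pattern might be used, the sketch below applies the pattern shown earlier for PICTURE S9(5)V99 to decoded text. Using re.fullmatch here is an assumption; how the library applies the pattern is not stated.

```python
import re

# The pattern reported for PICTURE S9(5)V99: optional sign, then 7 digits.
pattern = r"[ +-]?\d\d\d\d\d\d\d"

good = " 1234567".encode("cp037").decode("cp037")  # round trip through EBCDIC
bad = "12A4567"

good_ok = re.fullmatch(pattern, good) is not None
bad_ok = re.fullmatch(pattern, bad) is not None
print(good_ok, bad_ok)  # True False
```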
- picture_elements: list[dict[str, str]]¶
The decomposed PIC clause, created by the normalize_picture() method.
- picture_size: int¶
Summary sizing information.
- usage: str¶
The usage text, words like DISPLAY or COMPUTATIONAL or any of the numerous variants.
- property zoned_decimal: bool¶
Examine the digit groups to see if this is purely numeric.
DesignError¶
- exception stingray.estruct.DesignError¶
This is a catastrophic design problem. A common root cause is a named REGEX capture clause that’s not properly handled by a class, method, or function.