The Stingray Schema-Based File Reader

The Stingray Reader tackles four fundamental issues in processing a file’s data:

  • How are the bytes organized? What is the Physical Format?

  • How are the data objects organized? What is the Logical Layout?

  • What do the bytes mean? What is the Conceptual Content?

  • How can we assure ourselves that our applications will work with this file?

These questions arise because a file’s schema is not always bound to the file, nor is a schema clearly bound to an application program. For example, a spreadsheet that lacks column titles is devoid of useful logical or conceptual schema information. While a physical spreadsheet file format has an internal schema, that schema is bound only to the spreadsheet processing tools, not to the application data presented.

An application that reads spreadsheet data can treat each row as a list of objects, and process items by index within the row. In this case a schema is implicit in the code, even if it’s uninformative.
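The contrast between an implicit and an explicit schema can be sketched with the standard library alone. This is a minimal illustration, not Stingray Reader's API: the sample data, the SCHEMA list, and both function names are hypothetical.

```python
import csv
import io

# Hypothetical sample data: a sheet exported without column titles.
RAW = "Ada,36,London\nGrace,45,New York\n"


def ages_by_index(text):
    """Implicit schema: the meaning of row[1] lives only in this code."""
    return [int(row[1]) for row in csv.reader(io.StringIO(text))]


# Explicit schema (an assumption for this sketch): column names and
# conversions are stated once, apart from the processing logic.
SCHEMA = [("name", str), ("age", int), ("city", str)]


def rows_by_schema(text, schema=SCHEMA):
    """Yield each row as a dict keyed by the schema's column names."""
    for row in csv.reader(io.StringIO(text)):
        yield {name: convert(cell) for (name, convert), cell in zip(schema, row)}


ages = [row["age"] for row in rows_by_schema(RAW)]
```

Both approaches produce the same ages, but only the second one records what each column means in a place a reader or a tool can inspect.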

One goal of good software is to cope reasonably well with variability of user-supplied inputs. Providing data by spreadsheet workbook is often a desirable choice for users. Since workbook data can be tweaked manually, it may not have a simple, fixed schema or logical layout.

Data can be encoded in a number of physical formats. These formats include workbooks, such as XLS, XLSX, and ODS files. They also include generic delimited files like CSV and JSON; CSV files come in a number of dialects. Finally, they include non-delimited files, such as those typically used by COBOL programs. We would like our applications to be independent of these physical formats, focusing on the logical layout.
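One way to picture independence from the physical format is to have each format produce the same row structure, so the application logic never sees the bytes. The following is a sketch using only Python's csv and json modules; the function names and sample data are hypothetical and do not reflect Stingray Reader's own classes.

```python
import csv
import io
import json


def rows_from_csv(text):
    """Physical format: comma-delimited text with a header row."""
    return list(csv.DictReader(io.StringIO(text)))


def rows_from_json(text):
    """Physical format: a JSON array of objects.

    Values are converted to strings so both readers yield the same layout.
    """
    return [{key: str(value) for key, value in obj.items()} for obj in json.loads(text)]


def total_quantity(rows):
    """Application logic: depends only on the logical layout of a row."""
    return sum(int(row["qty"]) for row in rows)


# Hypothetical sample data carrying the same content in two formats.
CSV_TEXT = "item,qty\nbolt,4\nnut,6\n"
JSON_TEXT = '[{"item": "bolt", "qty": 4}, {"item": "nut", "qty": 6}]'
```

Here total_quantity() works unchanged on the output of either reader, which is the kind of separation the paragraph above describes.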

Of course, data can suffer from quality issues. We need to be assured that a file actually conforms to the expected schema.
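A conformance check of this kind can be sketched as a function that reports every row that does not fit the expected columns and types. This is an illustrative assumption about what such a check might look like, not the validation mechanism Stingray Reader provides; EXPECTED and conformance_errors() are hypothetical names.

```python
# Hypothetical expected schema: column name mapped to a conversion function.
EXPECTED = {"item": str, "qty": int}


def conformance_errors(rows, expected=EXPECTED):
    """Return a list of messages describing rows that violate the schema."""
    errors = []
    for number, row in enumerate(rows, start=1):
        # A row must carry every expected column.
        missing = expected.keys() - row.keys()
        if missing:
            errors.append(f"row {number}: missing {sorted(missing)}")
            continue
        # Each value must survive conversion to its expected type.
        for name, convert in expected.items():
            try:
                convert(row[name])
            except (TypeError, ValueError):
                errors.append(f"row {number}: bad {name!r} value {row[name]!r}")
    return errors
```

An empty result means every row conformed; an application could refuse to process a file whenever the list is non-empty.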

Stingray Reader works well with common spreadsheets as well as COBOL files, allowing some uniformity in processing data from a variety of sources.

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
