analysis – Track Analysis Application

The analysis application performs voyage analysis. It builds a more useful log from manually prepared log data or from tracks dumped from OpenCPN (or iNavX). In particular, it adds the distance run and the elapsed time between waypoints.

This additional waypoint-to-waypoint time and distance allows for fine-grained analysis of sailing performance.

Here’s the structure of this application:

@startuml
component analysis {
    class LogEntry {
    }
    class LogEntry_Rhumb {
        point : LogEntry
    }
    LogEntry_Rhumb *-> LogEntry
}
@enduml

This module includes several groups of components.

  • The Input Parsing group contains the functions and classes that acquire input from a GPX or CSV file.

  • The Computations functions work out range and bearing, magnetic bearing, total distance run, and elapsed time in minutes and hours.

  • The Output Writing group contains the functions that write the CSV result.

  • Finally, the Command-Line Interface components are used to build a proper command-line application.

Input Parsing

The purpose of input parsing is to create LogEntry objects from input file sources.

Manually prepared data will be a CSV file in the following form:

Time,Lat,Lon,COG,SOG,Rig,Engine,windAngle,windSpeed,Location
9:21 AM,37 50.424N,076 16.385W,None,0,None,1200 RPM,,,Cockrell Creek
10:06 AM,37 47.988N,076 16.056W,None,6.6,None,1500 RPM,315,7.0,

This is essentially a deck log entry: time, lat, lon, course over ground, speed over ground, rig configuration, engine RPM, wind information, and any additional notes on the location.

The times for the manual entry are generally local.
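The manual log records positions as degrees and decimal minutes followed by a hemisphere letter, for example 37 50.424N. A minimal sketch of converting that notation to signed decimal degrees might look like the following. The function name parse_deg_min is hypothetical and is not part of this module.

```python
def parse_deg_min(text: str) -> float:
    """Convert 'DD MM.mmmH' (e.g. '37 50.424N') to signed decimal degrees."""
    text = text.strip()
    hemisphere = text[-1].upper()
    if hemisphere not in "NSEW":
        raise ValueError(f"missing hemisphere in {text!r}")
    degrees, minutes = text[:-1].split()
    value = int(degrees) + float(minutes) / 60
    # South latitudes and west longitudes are negative by convention.
    return -value if hemisphere in "SW" else value
```

With the sample rows above, parse_deg_min("37 50.424N") gives a positive latitude and parse_deg_min("076 16.385W") gives a negative longitude.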

An iNavX track has the following format for a CSV extract:

2011-06-04 13:12:32 +0000,37.549225,-76.330536,219,3.6,,,,,,
2011-06-04 13:12:43 +0000,37.549084,-76.330681,186,3.0,,,,,,

The columns with data include date, latitude, longitude, cog, sog, heading, speed, depth, windAngle, windSpeed, comment. Not all of these fields are populated unless OpenCPN (or iNavX) gets an instrument feed.

An iNavX track has the following format for a GPX extract:

<?xml version="1.0" encoding="utf-8"?>
<gpx version="1.1" creator="iNavX"
xmlns="http://www.topografix.com/GPX/1/1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd">
<trk>
<name>TRACK_041812</name>
<trkseg>
<trkpt lat="37.549225" lon="-76.330536">
<time>2011-06-04T13:12:32Z</time>
</trkpt>
<trkpt lat="37.549084" lon="-76.330681">
<time>2011-06-04T13:12:43Z</time>
</trkpt>
</trkseg>
</trk>
</gpx>

The iPhone iNavX can save track information via http://x-traverse.com/. These are standard GPX files, and are identical to the tracks created directly by iNavX.

In effect, the GPX file is a sequence of lat, lon, time triples. This is vaguely similar to the CSV file, but with a slightly different schema.

Date parsing

Parses input dates in a variety of formats.

For more complex situations (i.e., multi-day voyages with partial date-time stamps in the log) this function isn’t quite appropriate. This assumes a single date field is sufficient to fill in all attributes of a date. In some cases, we might have timestamps that roll past midnight without an obvious indicator of date change. Or, we might have some notation like “d1, d2” or “+1d, +2d”.

March through all of the known date formats until we find one that works.

Parameters
  • date – string in some known format

  • default – default date to use when only a time is given, otherwise “today()”

Returns

datetime
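The "march through the formats" idea can be sketched as follows. This is only an illustration under stated assumptions: the format list KNOWN_FORMATS and the name parse_date_sketch are hypothetical, chosen to cover the three sample timestamps shown earlier. The module's actual list of formats may differ.

```python
import datetime
from typing import Optional

# Hypothetical list of formats, tried in order.
KNOWN_FORMATS = [
    "%Y-%m-%d %H:%M:%S %z",  # iNavX CSV: 2011-06-04 13:12:32 +0000
    "%Y-%m-%dT%H:%M:%SZ",    # GPX: 2011-06-04T13:12:32Z
    "%I:%M %p",              # manual log, time only: 9:21 AM
]

def parse_date_sketch(
    date: str, default: Optional[datetime.date] = None
) -> datetime.datetime:
    """Try each known format in turn; return the first successful parse."""
    default = default or datetime.date.today()
    for fmt in KNOWN_FORMATS:
        try:
            parsed = datetime.datetime.strptime(date, fmt)
        except ValueError:
            continue  # this format did not match; try the next one
        if "%Y" not in fmt:
            # Time-only formats parse with the year 1900, so the
            # default date is substituted for the missing date part.
            parsed = datetime.datetime.combine(default, parsed.time())
        return parsed
    raise ValueError(f"cannot parse date: {date!r}")
```

Note that the first format yields a timezone-aware value while the others are naive. A real implementation would need to pick one convention before subtracting timestamps.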


Base Log Entry

A point on a track.

The source_row is the source data. For CSV files, it’s untouched. For GPX files, it’s lightly massaged to flatten out the attributes of the <trkpt> tag.

This is similar to a Waypoint in a plan. The differences are minor. Log Entries generally lack names; they’re not named points, they’re just a piece of data at a point in time. Consequently, log entries always have a timestamp.

CSV input parsing

There are two formats:

  • Standard (i.e., OpenCPN). These files have no header. This function returns the assumed header which must be provided to build a DictReader.

  • Manual. These files must include a heading row to be processed. This function returns True. A DictReader can then use the headers that are found.

A heading row must use labels drawn from this domain of known labels:

"date", "latitude", "longitude", "cog", "sog", "heading", "speed",
"depth", "windAngle", "windSpeed", "comment"

This leads to two separate CSV readers:

  • OpenCPN files without headers; a default header is assumed. See GPS_NAVX_HEADER.

  • Manual files with headers from the defined set of headers.

Parameters

source – Open File

Returns

DictReader instance with the headers present or a default set of headers
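One way to choose between the two readers is to peek at the first line. The sketch below is an assumption about how this could be done, not the module's code: make_reader and DEFAULT_HEADER are hypothetical names, and the heading test (first cell is "date" or "time") is a simplification.

```python
import csv
import io
from typing import TextIO

# Assumed column order for header-less track files, taken from the
# column list given earlier in this document.
DEFAULT_HEADER = [
    "date", "latitude", "longitude", "cog", "sog", "heading", "speed",
    "depth", "windAngle", "windSpeed", "comment",
]

def make_reader(source: TextIO) -> csv.DictReader:
    """Return a DictReader using the file's own heading row, or DEFAULT_HEADER."""
    first_line = source.readline()
    source.seek(0)  # rewind so the DictReader sees the whole file
    cells = next(csv.reader([first_line]), [])
    first_cell = cells[0].strip().lower() if cells else ""
    # Hypothetical test: a heading row starts with a label, not a timestamp.
    if first_cell in {"date", "time"}:
        return csv.DictReader(source)
    return csv.DictReader(source, fieldnames=DEFAULT_HEADER)

# Usage with one header-less row from the iNavX sample.
sample = io.StringIO("2011-06-04 13:12:32 +0000,37.549225,-76.330536,219,3.6,,,,,,\n")
first_row = next(make_reader(sample))
```

The source must be seekable for this approach to work, which is true of ordinary files opened for reading.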

Parses a CSV file to yield an iterable sequence of LogEntry objects.

Headers must be provided; otherwise GPS_NAVX_HEADER will be assumed. This subclass does a complex header-matching dance, which is not optimal. Here are the two cases:

  • GPS NavX doesn’t provide headers. The list GPS_NAVX_HEADER is used as an external schema.

  • Other sources may use [‘Time’, ‘Lat’, ‘Lon’, ‘COG’, ‘SOG’, ‘Rig’, ‘Engine’, ‘windAngle’, ‘windSpeed’, …

We look for headers that closely match these names, irrespective of case:

  • date or time,

  • something starting with lat,

  • something starting with lon.

The date (or time), latitude, and longitude columns are the minimum required to compute distance and duration.

A better approach would be to sniff the headers and then locate an appropriate subclass of LogEntryReader.

Parameters
  • reader – csv.DictReader with proper headers

  • date – datetime.datetime object used to fill in default values for incomplete dates. By default, it’s “now”.

Returns

An iterator over LogEntry objects.
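The case-insensitive header matching described above can be sketched like this. The function match_headers and the role names "time", "lat", and "lon" are hypothetical, introduced only for illustration.

```python
def match_headers(fieldnames: list[str]) -> dict[str, str]:
    """Map the roles 'time', 'lat', 'lon' to the actual column names found."""
    found: dict[str, str] = {}
    for name in fieldnames:
        key = name.strip().lower()
        if key in ("date", "time"):
            found.setdefault("time", name)  # keep the first match only
        elif key.startswith("lat"):
            found.setdefault("lat", name)
        elif key.startswith("lon"):
            found.setdefault("lon", name)
    missing = {"time", "lat", "lon"} - found.keys()
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")
    return found
```

Both the manual headings (Time, Lat, Lon) and the assumed track headings (date, latitude, longitude) resolve to the same three roles, so the rest of the parsing does not need to know which source was used.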

GPX input parsing

Generates LogEntry objects from a GPX document. These should perhaps be called “TrackPoints” to better match the GPX tags.

We assume a minimal schema:

  • <trk> contains

    • <trkseg> contains

      • <trkpt lat="" lon=""> contains

        • <time> ISO format timestamp

        • Any other tag values and attributes are preserved as the “source row”

Parameters

source – an open XML file.

Returns

An iterator over LogEntry objects.
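A minimal sketch of walking this schema with the standard library's xml.etree.ElementTree follows. It yields plain (lat, lon, time) tuples rather than LogEntry objects, and the name track_points is hypothetical. It also assumes every timestamp ends in a literal Z, as in the sample.

```python
import datetime
import io
import xml.etree.ElementTree as ET
from typing import Iterator, TextIO

# GPX 1.1 elements live in this namespace, so every path step needs the prefix.
GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}

def track_points(source: TextIO) -> Iterator[tuple[float, float, datetime.datetime]]:
    """Yield (lat, lon, time) for each <trkpt>, in document order."""
    tree = ET.parse(source)
    for pt in tree.iterfind("gpx:trk/gpx:trkseg/gpx:trkpt", GPX_NS):
        time_text = pt.findtext("gpx:time", namespaces=GPX_NS)
        time = datetime.datetime.strptime(time_text, "%Y-%m-%dT%H:%M:%SZ")
        yield float(pt.get("lat")), float(pt.get("lon")), time

# Usage with a trimmed copy of the sample document.
SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<gpx version="1.1" creator="iNavX" xmlns="http://www.topografix.com/GPX/1/1">
<trk><trkseg>
<trkpt lat="37.549225" lon="-76.330536"><time>2011-06-04T13:12:32Z</time></trkpt>
<trkpt lat="37.549084" lon="-76.330681"><time>2011-06-04T13:12:43Z</time></trkpt>
</trkseg></trk>
</gpx>"""
points = list(track_points(io.StringIO(SAMPLE)))
```

Forgetting the namespace is the usual reason a GPX search returns nothing, which is why the mapping is passed to every lookup.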

Computations

There aren’t many: the work is essentially deducing range and bearing from successive log entries. These calculations are part of navigation.

Log Entry With Derived Details

The raw point plus the distance, bearing, and delta-time to the next waypoint.

As a special case, a final waypoint will have no additional distance, bearing, or delta-time.

LogEntry_Rhumb is a named tuple with four fields. The standard named-tuple helpers are available: returning a new dict which maps field names to their values, making a new LogEntry_Rhumb object from a sequence or iterable, and returning a new LogEntry_Rhumb object with specified fields replaced by new values.

Computing Details

Transforms a sequence of LogEntry instances into LogEntry_Rhumb instances. The rhumb line distance, bearing, and delta-time are added to each entry.

Each point has range and bearing to the next point. The last point has no range and bearing.

Parameters

log_entry_iter – iterable sequence of LogEntry instances. This can be produced by gpx_to_LogEntry() or csv_to_LogEntry().

Returns

iterable sequence of LogEntry_Rhumb instances.
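The pairing of each point with its successor can be sketched as below. Everything here is an assumption made for illustration: Point and Leg are hypothetical stand-ins for LogEntry and LogEntry_Rhumb, the field names are guesses, and the rhumb-line arithmetic is the common textbook formula on a sphere rather than this package's own navigation routine.

```python
import datetime
import math
from typing import Iterable, Iterator, NamedTuple, Optional

EARTH_RADIUS_NM = 3440.065  # mean Earth radius in nautical miles (approximate)

class Point(NamedTuple):  # hypothetical stand-in for LogEntry
    time: datetime.datetime
    lat: float  # decimal degrees, north positive
    lon: float  # decimal degrees, east positive

class Leg(NamedTuple):  # hypothetical stand-in for LogEntry_Rhumb
    point: Point
    distance: Optional[float]  # nautical miles
    bearing: Optional[float]   # degrees true
    delta_time: Optional[datetime.timedelta]

def rhumb(p1: Point, p2: Point) -> tuple[float, float]:
    """Rhumb-line distance (nm) and true bearing (degrees) from p1 to p2."""
    lat1, lat2 = math.radians(p1.lat), math.radians(p2.lat)
    d_lat = lat2 - lat1
    d_lon = math.radians(p2.lon - p1.lon)
    # Take the short way around when the track crosses the antimeridian.
    d_lon = (d_lon + math.pi) % (2 * math.pi) - math.pi
    # Difference in Mercator-projected latitude.
    d_psi = math.log(
        math.tan(math.pi / 4 + lat2 / 2) / math.tan(math.pi / 4 + lat1 / 2)
    )
    # On an east-west course d_psi is zero, so fall back to cos(lat).
    q = d_lat / d_psi if abs(d_psi) > 1e-12 else math.cos(lat1)
    distance = math.hypot(d_lat, q * d_lon) * EARTH_RADIUS_NM
    bearing = math.degrees(math.atan2(d_lon, d_psi)) % 360
    return distance, bearing

def legs(points: Iterable[Point]) -> Iterator[Leg]:
    """Pair each point with the next; the final point gets None details."""
    it = iter(points)
    try:
        here = next(it)
    except StopIteration:
        return  # empty input produces empty output
    for there in it:
        distance, bearing = rhumb(here, there)
        yield Leg(here, distance, bearing, there.time - here.time)
        here = there
    yield Leg(here, None, None, None)
```

Written as a generator, this handles a long track without holding it all in memory, and the final point falls out naturally as the special case with no range, bearing, or delta-time.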

Output Writing

The waypoints with range and bearing information are written to a CSV file.

Returns a rounded value, properly honoring None objects.

Writes a sequence of LogEntry_Rhumb objects to a given target file. The objects are usually built by the gen_rhumb() function.

Since the source data has a poorly-defined set of columns, we emit just a few additional attributes joined onto the original, untouched row.

Parameters
  • target – File to which to write the analyzed rows.

  • log_entry_rhumb_iter – iterable sequence of LogEntry_Rhumb instances. This is often the output of gen_rhumb().

  • source_headers – Headers from source to which additional details are added.

Note that we apply some rounding rules to these values before writing them to a CSV file. The distances are rounded to \(10^{-5}\), which is about an inch, or 2 cm: more accurate than the GPS position. The bearing is rounded to 0 decimal places.
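Rounding that honors None can be sketched in a few lines. The name round_or_none is hypothetical. The point is only that a missing value (such as the final waypoint's distance) passes through unchanged instead of raising an error.

```python
from typing import Optional

def round_or_none(value: Optional[float], digits: int) -> Optional[float]:
    """Round value to the given number of places; pass None through unchanged."""
    return None if value is None else round(value, digits)
```

For example, a distance would be written with round_or_none(distance, 5) and a bearing with round_or_none(bearing, 0).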

Command-Line Interface

Typical use cases for this module include the following:

  • Command Line:

    python -m navtools.analysis '../../Sailing/Cruise History/2011 Reedville/reedville.csv'
    
  • Within a Python Script:

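    import datetime
    from pathlib import Path
    from navtools.analysis import analyze
    history = Path("/path/to/history")
    # The second argument is the default date for incomplete date-time fields.
    analyze(history/"2011 Reedville"/"jackson.csv", datetime.datetime(2011, 6, 4))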
    

The analyze() application

Analyze a log file, writing a new log file with additional calculated values.

The gen_rhumb() calculation is applied to each row.

Parameters
  • log_filepath – Path of a log file to analyze. If the input is some_name.csv or some_name.gpx the output will be some_name Distance.csv.

  • date – Default date to use when incomplete date-time fields are present in the input.
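The output naming rule can be sketched with pathlib. The helper name distance_path is hypothetical. It assumes the output is written alongside the input file.

```python
from pathlib import Path

def distance_path(log_filepath: Path) -> Path:
    """some_name.csv or some_name.gpx -> 'some_name Distance.csv', same directory."""
    return log_filepath.with_name(f"{log_filepath.stem} Distance.csv")
```

Because only the stem is kept, a .gpx input and a .csv input with the same base name would produce the same output file.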

The main() CLI

Parse command-line arguments to get the log file names and the default date to use for partial date strings.

Then use analyze() to process each file, creating an output file named some_name Distance.csv that contains the detailed analysis.