~mil/mobsql

v0.9.0 27 days ago .tar.gz browse log

This release was focused largely on documentation updates for the Go
library & CLI. In addition, a few small updates involving loading
& caching were made in order to address allowing users to get more
up-to-date GTFS data. Direct agency URLs as specified by the Mobility
Database can now be used in imports (as opposed to using Mobility
Database mirrored GTFS URLs). Note, the database schema has changed
and its recommended to clear your cache between version updates - see
(5). See below for a detailed list of changes for this release:

1. Library & CLI Documentation
The major update in this release is the rework of both the user-facing
CLI documentation and Go library documentation. For the Go library
side of things: all public types, structs, & functions in the top
level git.sr.ht/~mil/mobsql package now have proper docstrings. And
additionally, example usages have been added to the `examples/`
folder. Meanwhile, for the CLI - the userguide has been split from the
CLI doc. Wherein the former provides walkthrough style CLI examples and
the latter provides the CLI full documentation as you get from the `-h`
flag for each subcommand on the `mobsql` CLI.

Additionally, important to note: a new documentation website for the
larger Mobroute project is available and Mobsql's documentation is
published to this site at: https://mr.lrdu.org/mobsql

Go documentation is available at:
  - Go doc: https://pkg.go.dev/git.sr.ht/~mil/mobsql?tab=versions
  - Go examples: https://git.sr.ht/~mil/mobsql/tree/master/item/examples

And updated CLI documentation is available at:
  - CLI userguide: https://mr.lrdu.org/mobsql/doc/userguide_cli/
  - CLI doc: https://mr.lrdu.org/mobsql/doc/doc_cli/

2. Downloads File Caching Rework: Timestamps & Partial Download Bugfix
The filecaching system for HTTP / GTFS downloads has been reworked to
incorporate timestamp support and now defaults to a file-expiry of 1
week. This means past 1 week, attempting to load the Mobility Database CSV
and/or GTFS archives will automatically trigger a redownload to ensure
the most up to date data. This change provides a good pragmatic default
for end CLI & library users and supplements additional new functionality
which allows users to also explicitly update both the Mobility Database &
GTFS archive from source in load (see next section).

In addition to the newly added timestamp support for filecaching;
a longstanding bug related to interrupting filedownloads has been
resolved. Previously if you killed the mobsql process in the middle of
a file download, attempting to download that same file would throw an
error like: `zip: not a valid zip file`. The reason for this was that
the destination file was written to directly and thus the caching system
would think the file was already downloaded (and thus try to load an
invalid incomplete downloaded zip file). This logic has been replaced
and now a partial file is downloaded to and upon download completion a
simple rename / move is executed.

3. FeedLoadMDBGTFS Rename & New Flags: update & agencyDirectUrl
The former FeedLoad function has been renamed to FeedLoadMDBGTFS
to match and have parity with FeedLoadCustomGTFS. Also, in addition
the FeedLoadMDBGTFS function has had two new flags added: update and
agencyDirectURL.

The update flag allows the user to explicitly request that both the
Mobility Database CSV is downloaded on request (rather then using the
system once-weekly updated Mobility Database CSV) and also the GTFS
data for the feed is redownloaded (ignoring once-weekly caching). This
flag is very helpful for feed updates to ensure the latest GTFS data as
indicated by Mobility Database's catalog is being utilized.

And secondly, the agencyDirectURL flag allows the user to indicate that
direct 'agency' URLs should be used in downloads for GTFS feeds. By
default Mobility Database's "latest" URLs are used which point to cloud
bucket CI-generated links. These links are periodically updated but
can occasionally go stale. Allowing the user to use direct agency URLs
can thus bypass the CI-generated links mechanism and allow the user to
get more up-to-date GTFS data (at the cost of agencies having differing
stability guarantees and usage policies). See the go doc or CLI doc for
more details.

4. Removal of Native Go CSV Loading Logic
In the v0.8.0 release, the native Go CSV loading logic was replaced by
logic using SQLite's CSV (virtual table) extension; however the code
for the original native Go CSV logic remained in place. In this update,
the Go CSV loading logic has been *completely* removed and now its only
possible to load CSV data via SQLite's CSV (virtual table) extension
mechanism. This ensure consistent UX & was a small bit of technical debt
which is now removed.

5. Schema Updates
A few small schema updates have been applied for naming & code structure
consistency. It's recommended to clear your database & cache (rm
`~/.cache/mobroute`) before updating from v0.8.* to v0.9.0. Apologies
for the manual user-intervention; long-term I am considering investing
in a standard SQL migration process.

v0.8.3 2 months ago .tar.gz browse log

Minor patch release to address two issues:
  - Add index on stops (feed_id, parent_station, location_type)
  - Add index on agency (agency_timezone); used in core connections loading

v0.8.2 2 months ago .tar.gz browse log

Minor patch release to fix one bug:
  - Fixes bug where H:MM:SS timestamps in stop_times parsed incorrectly
  - Adds unit tests for the same (both HH:MM:SS / H:MM:SS format)

v0.8.1 2 months ago .tar.gz browse log

Minor release to fix one bug:
  - Change convention for _vfeedsmetadata to check for computed tables as _ct*

v0.8.0 2 months ago .tar.gz browse log

This release of Mobsql focuses on a few tooling/usability updates and
also dramatically improves load performance for the CSV-to-SQLite import
process. See below for a detailed list of changes:

1. Show Size of GTFS Archives before Fetching in Logs
Before fetching GTFS archives, Mobsql now explicitly submits a HTTP
HEAD request to the origin GTFS URL to determine Content-Length; and
subsequently logs the archive size of the origin URL GTFS feed filesize
in Mb. This should help users better understand how long they may have
to wait for downloads to complete. In the future this logic could be
used to create a progress bar or similar in the download process.

2. Daterange Metadata for Imported GTFS Archives exposed in FeedStatus
Daterange metadata based on the GTFS calendar and calendar_dates tables
has been added to allow determining the min/max valid routing dates for
loaded GTFS archives. This allows users to quickly determine if a feed
is out-of-date (in which case the user should update) or if the feed is
up-to-date for valid routing. See the new fields min_date and max_date
in the FeedStatus result. For feeds that have not been loaded to the
database yet, FeedStatus will just omit these properties.

3. Computed Tables SQL Init & Deinit Logic Support
The computed tables mechanism (which allows end library users to define
arbitrary SQL select per GTFS feed ID to map to a table) has been modified
to allow instantiation (init) and destruction (deinit) logic. This allows
for computed tables to not just be directly mapped from SQL views but
also utilize temp & virtual tables in creation. Using init/deinit logic
enables performance optimizations not possible by just using SQL selects /
direct view mapping.

4. SQLite CSV Loading via Virtual Table / Performance Improvement
The core logic which loads GTFS source archive's CSVs to the SQLite
database has been wholesale reworked to use SQLite's CSV extension. This
extension allows exposing source CSV files as virtual tables. As such,
imports per archive per table take place now in a single & elegant `insert
into targettable select * from virtualtable` type of statement. This is
opposed to the previous logic which used a *large* transaction and many
insert statements for bulk loading CSV data.

The reason for this update is that it is vastly more efficient to
use the SQLite CSV extension then multiple inserts / batched into a
transaction. There is a conservative performance improvement estimate of
2-3x speedup for loading CSVs to the DB with this mechanism. Also note: as
of this release, sqlite-dev is added as a build dependency as the headers
are required for compilation for the CSV extension. Under the hood, the
CSV extension is built via CGO.  The former of CSV implementation exists
in the codebase for testing purposes but is considered deprecated and
will be removed in a future release. All tests in both Mobsql and Mobroute
downstream are passing with the new CSV loading logic all the same.

5. Ability to Set Customizable Logger in Runtime Initialization
In initializing a Mobsql runtime, the user may now specify a custom
logger (log.Logger) to override the default logger (which prints logs
to STDERR). This should help with downstream consumers & applications
which wish to inspect logs periodically between library calls.

v0.7.0 4 months ago .tar.gz browse log

This release of Mobsql adds custom GTFS loading support and additionally
reworks the CLI's overall UX. See below for a detailed list of changes.

1. Custom GTFS Loading Support
Custom GTFS archives load support has been added in the form the dedicated
FeedLoadCustom function and the CLI subcommand loadcustomgtfs. While
previously there was custom GTFS support for tests (by way of stubbing
the Mobility Database CSV); this new FeedLoadCustom command lets you
explicitly load a custom GTFS zip archive without any stubbing. You
can simply pass the location (either a file:// URI or http:// URI) to
the GTFS zip and the rest is covered. A negative feed ID is required to
be used to distinguish between Mobility Database feeds (feed IDs > 0)
and custom GTFS feeds (feed IDs < 0). All other functionality operating
on a specific feed ID (such as purge, status, compute, and similar)
work identically with either Mobility Database sourced GTFS feeds or
custom GTFS feeds.

2. CLI UX Rework & Documentation
The CLI has been wholesale reworked to be in the form of `mobsql
subcommand` format where the valid subcommands are: load, loadcustomgtfs,
purge, status, and feedsearch. This change matches the same convention on
the Mobroute CLI for consistency and provides a better UX. Previously a
'feed filter specification' was required to be passed on each CLI command;
now the feed filter to feed IDs functionality is just confined to the new
'feedsearch' subcommand and for all other GTFS-feed specific commands,
you pass directly feed IDs as is the case with the library usage itself.

Additionally, the CLI is much better documented now with examples usages and
documentation about each subcommand. Passing no arguments to the CLI for
each subcommand will provide documentation for the given subcommand.

See the updated CLI userguide for general usage:
https://git.sr.ht/~mil/mobsql/tree/master/doc/userguide_cli.md

v0.6.0 6 months ago .tar.gz browse log

This release of Mobsql was focused on a few small tooling & consistency
updates to aide in the release of Mobroute 0.6. See below for a detailed
list of changes.

1. License Passthrough
The license for each GTFS feed indicated by the Mobility Database is
now passed through as URLLicense (matching in convention the renamed
URLGTFS field) in the FeedStatusInfo struct. This change was motivated
by downstream consumers of Mobsql data (such as Transito) needing to
present license(s) to end-users so they may be aware of this information
explicitly before utilizing particular GTFS feeds.

2. MDBID & Source References Renamed to FeedID
All references to 'MDBID' and 'source' have been replaced with the
term 'FeedID'. The previously used MDBID and source terms were used
interchangeably and created ambiguity as to what the difference between
these two terms were. Replacing all references should make it clearer
both from the database & in the Go code what exactly is being referred
to. As well the FeedID term is more self-explanatory to end users then
either of the former terms.

3. Removal of YAML CLI Functionality
A small final update is that the CLI no longer accepts YAML buffers for
feed search filters (via -f) but rather only JSON. This matches a similar
change in the Mobroute CLI, provides consistency, and should make it
easier to test / validate requests in the future. This also removes the
'double-annotations' in the FeedsearchFilter struct and other parts of
the code which was required by providing both JSON & YAML compatibility.

v0.5.0 8 months ago .tar.gz browse log

This release of Mobsql added a few performance-related changes to aide
in the release of Mobroute 0.5. The largest change in this release is
a performance update to change loading GTFS data to use transactions
which improves load performance for large tables (and thus helps with
large GTFS feed archives & aggregates). Details on changes below:

1. Transaction-Based Insertions
The logic for loading GTFS data has been updated to perform multiple
inserts per transaction (as opposed to a single compound insert per
transaction). This improves performance considerably for larger tables
such as the stop_times tables which on some large feeds can be in the
millions of rows. This is one change in a series of related changes to
address (#19) to improve overall performance for importing GTFS data
for large sources and aggregates such as the Netherlands aggregate
(MDBID 1077) and similar.

2. Indexes Cleanup
Several indexes have been modified to allow for better performance
in downstream Mobroute. In the future it may make sense to move the
'specification' of which indexes should be created to Mobroute itself,
and make 'index specification' a part of Mobsql's API.. but for now,
these changes will proceed on a per-release basis in Mobsql.

3. Correct Null agency_id Bug
A small bug related to the agencies table for GTFS specification was
corrected. This bug effected the Calgary transit feed (MDBID 712)
preventing import of data. The way the GTFS specification was coded in
Mobsql's spec was incorrect and mistakenly disallowed agencies with a
null agency_id field. The specification is that agency_id can be null
in case there is a single row; and this edgecase has now been corrected.

4. Allow Arbitrary SQL Initialization in SchemaExtra
As part of the API for Mobsql, the SchemaExtra field now allows specifying
the InitExec field which can run arbitrary SQL at initialization. This
allows for end consumers to execute pragmas & similar performance related
tuning adjustments at boot.

v0.4.0 11 months ago .tar.gz browse log

This release of Mobsql saw: (1) a large API rework, (2) parameterization
of view & computed table logic, and (3) the addition of the ability to
use custom Mobility DB CSV & GTFS archives. See details on changes below.

1. API Rework: Mobsql Runtime & Filter-to-Sourceset Logic API Split
The API for Mobsql overall has been vastly reworked to allow more
flexibility from end-consumers perspective (the main consumer of the
Mobsql library's API being Mobroute currently). Two major changes to the
API were made: (A) Firstly, the API is now 'multistage' in that all API
functions (related to load, compute, purge, and status) now depend on a
`MobsqlRuntime`; which may be initialized with the new `InitializeRuntime`
function. The `MobsqlRuntime` allows the end user to store database
connection & configuration parameters related to usage. (B) Secondly,
filter-to-sourceset logic has been split into its own API (`apifilter`)
and compute/load/purge/status functionality now directly takes MDBIDs
rather then a filter as a parameter. Overall with these two changes (A)
and (B) the end-user API is substantially cleaner & more flexible for
consumers of Mobsql.

2. Parameterization of View & Computed Table Logic / Removal of Mobroute-Specific-Logic
A longstanding wart in Mobsql was that the codebase contained SQL views
(and computed tables) purely related to Mobroute's routing API (_vconnd,
_vtransferstbl, _vtransfersgen, etc.). This has been corrected in that all
SQL-view logic related to Mobroute alone have been moved to Mobroute's
codebase proper.  This change was facilitated by the addition of a new
`SchemaExtra` property definable on Mobsql's runtime initialization. This
changes allows for much cleaner integration with Mobroute and potentially
other applications which may want to build on Mobsql & use Mobsql as a
library in the future.

3. Allow Local GTFS Zip Imports (Parameterized Mobility DB CSV)
Previously Mobsql used a hardcoded URL to the Mobility Database's
catalogs CSV. This URL has been modified to be configurable via Mobsql's
Runtime initialization.  The URL used for the Mobility Database CSV,
now customizable, may be either a `http://` or `file://` URI. Passing
a `file://` URI you can use a mock Mobility Database URI on your
local filesystem. This functionality is present both in the CLI &
API. Additionally, adding on `file://` URI support for the Mobility
Database CSV location; you may also now pass `file://` URIs for GTFS zip
archive locations (effectively allowing loading arbitrary non-Mobility
Database sourced GTFS Zip archives). Full documentation for this
functionality will follow in future updates - for now see Mobroute's
routing API tests which depend on this newly built functionality.

v0.3.0 1 year, 24 days ago .tar.gz browse log

This release of Mobsql was focused on performance & compatibility updates
to support associated changes in Mobroute & Transito. Changes include
the addition of a view to map calendar dates to services, SQLite tuning
updates, misc performance & usability updates, and the addition of CI.

1. Addition of _vcaltoservice view to map calendar dates to services
The new _vcaltoservice view has been added which maps individual calendar
dates to the service(s) which are valid for the given day. This takes
into account both the raw GTFS calendar & calendar_dates tables as both
an either/or if one is present and the other missing, or both are present.
This is used by Mobroute's new routing API loading logic which only loads
connections with associated valid service id's for the given input time
in order to produce service respecting / accurate routes.

2. SQLite Tuning: Version pinning &  Performance updates (WAL & memory store)
A number of small SQLite tuning updates were made largely to improve
compatibility and performance on mobile / Android devices. Firstly the
SQLite required version has been pinned explicitly to target 3.28.0. A
query is executed in Mobsql's initialization logic and will cause Mobsql
to now explicitly error out if used with a SQLite version <3.28.0. This
does mean Android 11 / R is required as a minimum version for Mobsql,
Mobroute, and Transito to be functional. This version was chosen as a
minimum requirement as this is the first SQLite version to support window
functions which are made heavy use of in routing timetable calculations.
Other then that strict tables support was dropped as that would require
atleast SQLite 3.37.0.

And secondly two performance related updates have been made, namely
enabling WAL mode by default and using memory for the temp store rather
then disk. Using memory for the temp store fixes issues on mobile devices
with small amounts of disk / tempdir space in creating computed tables;
and WAL mode enables general better IO performance overall.

3. Misc Performance & Usability Updates: Indexes, Active field, Checksums
Several other minor performance & usability related updates have been
applied. Firstly, with respect to performance, indexes used by Mobroute's
routing API have been reexamined to optimize speed of extraction and as
much as possible avoid full table scans. There is more work to be done
on indexes, especially in regards to generated transfers, however the
work done in this pass for overall timetable index optimization is a
good starting point.

Secondly with respect to usability, the 'active' field from Mobility
Database is now passed through, in the filter specification, which
enables Transito downstream to only show end users actively updated
sources. Also this is a helpful filter for end-users to determine which
feeds are 'good' / likely contain 'valid' GTFS data.

And finally with respect to usability, table checksums now include
megabytes in size (of the source tables imported from) which is quite
helpful in recomputing tables and debugging failed or slow operating
imports.

4. Continuous Integration
Continuous integration via srht has been added to the repository by way of
a checked in .build.yml file. Currently the build simply runs some simple
unit tests added earlier on in development, however in the future more
unit tests, automated benchmarks, and integration tests may also be added.
1 / 2

Branches

fs-timestamp-caching
computedtables-select
computed-table-sqlinitdeinit
sqlite-extension-csv
sql-transaction-insert
purge-computedgtfs-split
agency-id-default-value
functional-refactor
custom-mdb-catalog-gtfs