This release was focused largely on documentation updates for the Go library & CLI. In addition, a few small updates involving loading & caching were made in order to address allowing users to get more up-to-date GTFS data. Direct agency URLs as specified by the Mobility Database can now be used in imports (as opposed to using Mobility Database mirrored GTFS URLs). Note, the database schema has changed and its recommended to clear your cache between version updates - see (5). See below for a detailed list of changes for this release: 1. Library & CLI Documentation The major update in this release is the rework of both the user-facing CLI documentation and Go library documentation. For the Go library side of things: all public types, structs, & functions in the top level git.sr.ht/~mil/mobsql package now have proper docstrings. And additionally, example usages have been added to the `examples/` folder. Meanwhile, for the CLI - the userguide has been split from the CLI doc. Wherein the former provides walkthrough style CLI examples and the latter provides the CLI full documentation as you get from the `-h` flag for each subcommand on the `mobsql` CLI. Additionally, important to note: a new documentation website for the larger Mobroute project is available and Mobsql's documentation is published to this site at: https://mr.lrdu.org/mobsql Go documentation is available at: - Go doc: https://pkg.go.dev/git.sr.ht/~mil/mobsql?tab=versions - Go examples: https://git.sr.ht/~mil/mobsql/tree/master/item/examples And updated CLI documentation is available at: - CLI userguide: https://mr.lrdu.org/mobsql/doc/userguide_cli/ - CLI doc: https://mr.lrdu.org/mobsql/doc/doc_cli/ 2. Downloads File Caching Rework: Timestamps & Partial Download Bugfix The filecaching system for HTTP / GTFS downloads has been reworked to incorporate timestamp support and now defaults to a file-expiry of 1 week. This means past 1 week, attempting to load the Mobility Database CSV and/or GTFS archives will automatically trigger a redownload to ensure the most up to date data. This change provides a good pragmatic default for end CLI & library users and supplements additional new functionality which allows users to also explicitly update both the Mobility Database & GTFS archive from source in load (see next section). In addition to the newly added timestamp support for filecaching; a longstanding bug related to interrupting filedownloads has been resolved. Previously if you killed the mobsql process in the middle of a file download, attempting to download that same file would throw an error like: `zip: not a valid zip file`. The reason for this was that the destination file was written to directly and thus the caching system would think the file was already downloaded (and thus try to load an invalid incomplete downloaded zip file). This logic has been replaced and now a partial file is downloaded to and upon download completion a simple rename / move is executed. 3. FeedLoadMDBGTFS Rename & New Flags: update & agencyDirectUrl The former FeedLoad function has been renamed to FeedLoadMDBGTFS to match and have parity with FeedLoadCustomGTFS. Also, in addition the FeedLoadMDBGTFS function has had two new flags added: update and agencyDirectURL. The update flag allows the user to explicitly request that both the Mobility Database CSV is downloaded on request (rather then using the system once-weekly updated Mobility Database CSV) and also the GTFS data for the feed is redownloaded (ignoring once-weekly caching). This flag is very helpful for feed updates to ensure the latest GTFS data as indicated by Mobility Database's catalog is being utilized. And secondly, the agencyDirectURL flag allows the user to indicate that direct 'agency' URLs should be used in downloads for GTFS feeds. By default Mobility Database's "latest" URLs are used which point to cloud bucket CI-generated links. These links are periodically updated but can occasionally go stale. Allowing the user to use direct agency URLs can thus bypass the CI-generated links mechanism and allow the user to get more up-to-date GTFS data (at the cost of agencies having differing stability guarantees and usage policies). See the go doc or CLI doc for more details. 4. Removal of Native Go CSV Loading Logic In the v0.8.0 release, the native Go CSV loading logic was replaced by logic using SQLite's CSV (virtual table) extension; however the code for the original native Go CSV logic remained in place. In this update, the Go CSV loading logic has been *completely* removed and now its only possible to load CSV data via SQLite's CSV (virtual table) extension mechanism. This ensure consistent UX & was a small bit of technical debt which is now removed. 5. Schema Updates A few small schema updates have been applied for naming & code structure consistency. It's recommended to clear your database & cache (rm `~/.cache/mobroute`) before updating from v0.8.* to v0.9.0. Apologies for the manual user-intervention; long-term I am considering investing in a standard SQL migration process.
Minor patch release to address two issues: - Add index on stops (feed_id, parent_station, location_type) - Add index on agency (agency_timezone); used in core connections loading
Minor patch release to fix one bug: - Fixes bug where H:MM:SS timestamps in stop_times parsed incorrectly - Adds unit tests for the same (both HH:MM:SS / H:MM:SS format)
Minor release to fix one bug: - Change convention for _vfeedsmetadata to check for computed tables as _ct*
This release of Mobsql focuses on a few tooling/usability updates and also dramatically improves load performance for the CSV-to-SQLite import process. See below for a detailed list of changes: 1. Show Size of GTFS Archives before Fetching in Logs Before fetching GTFS archives, Mobsql now explicitly submits a HTTP HEAD request to the origin GTFS URL to determine Content-Length; and subsequently logs the archive size of the origin URL GTFS feed filesize in Mb. This should help users better understand how long they may have to wait for downloads to complete. In the future this logic could be used to create a progress bar or similar in the download process. 2. Daterange Metadata for Imported GTFS Archives exposed in FeedStatus Daterange metadata based on the GTFS calendar and calendar_dates tables has been added to allow determining the min/max valid routing dates for loaded GTFS archives. This allows users to quickly determine if a feed is out-of-date (in which case the user should update) or if the feed is up-to-date for valid routing. See the new fields min_date and max_date in the FeedStatus result. For feeds that have not been loaded to the database yet, FeedStatus will just omit these properties. 3. Computed Tables SQL Init & Deinit Logic Support The computed tables mechanism (which allows end library users to define arbitrary SQL select per GTFS feed ID to map to a table) has been modified to allow instantiation (init) and destruction (deinit) logic. This allows for computed tables to not just be directly mapped from SQL views but also utilize temp & virtual tables in creation. Using init/deinit logic enables performance optimizations not possible by just using SQL selects / direct view mapping. 4. SQLite CSV Loading via Virtual Table / Performance Improvement The core logic which loads GTFS source archive's CSVs to the SQLite database has been wholesale reworked to use SQLite's CSV extension. This extension allows exposing source CSV files as virtual tables. As such, imports per archive per table take place now in a single & elegant `insert into targettable select * from virtualtable` type of statement. This is opposed to the previous logic which used a *large* transaction and many insert statements for bulk loading CSV data. The reason for this update is that it is vastly more efficient to use the SQLite CSV extension then multiple inserts / batched into a transaction. There is a conservative performance improvement estimate of 2-3x speedup for loading CSVs to the DB with this mechanism. Also note: as of this release, sqlite-dev is added as a build dependency as the headers are required for compilation for the CSV extension. Under the hood, the CSV extension is built via CGO. The former of CSV implementation exists in the codebase for testing purposes but is considered deprecated and will be removed in a future release. All tests in both Mobsql and Mobroute downstream are passing with the new CSV loading logic all the same. 5. Ability to Set Customizable Logger in Runtime Initialization In initializing a Mobsql runtime, the user may now specify a custom logger (log.Logger) to override the default logger (which prints logs to STDERR). This should help with downstream consumers & applications which wish to inspect logs periodically between library calls.
This release of Mobsql adds custom GTFS loading support and additionally reworks the CLI's overall UX. See below for a detailed list of changes. 1. Custom GTFS Loading Support Custom GTFS archives load support has been added in the form the dedicated FeedLoadCustom function and the CLI subcommand loadcustomgtfs. While previously there was custom GTFS support for tests (by way of stubbing the Mobility Database CSV); this new FeedLoadCustom command lets you explicitly load a custom GTFS zip archive without any stubbing. You can simply pass the location (either a file:// URI or http:// URI) to the GTFS zip and the rest is covered. A negative feed ID is required to be used to distinguish between Mobility Database feeds (feed IDs > 0) and custom GTFS feeds (feed IDs < 0). All other functionality operating on a specific feed ID (such as purge, status, compute, and similar) work identically with either Mobility Database sourced GTFS feeds or custom GTFS feeds. 2. CLI UX Rework & Documentation The CLI has been wholesale reworked to be in the form of `mobsql subcommand` format where the valid subcommands are: load, loadcustomgtfs, purge, status, and feedsearch. This change matches the same convention on the Mobroute CLI for consistency and provides a better UX. Previously a 'feed filter specification' was required to be passed on each CLI command; now the feed filter to feed IDs functionality is just confined to the new 'feedsearch' subcommand and for all other GTFS-feed specific commands, you pass directly feed IDs as is the case with the library usage itself. Additionally, the CLI is much better documented now with examples usages and documentation about each subcommand. Passing no arguments to the CLI for each subcommand will provide documentation for the given subcommand. See the updated CLI userguide for general usage: https://git.sr.ht/~mil/mobsql/tree/master/doc/userguide_cli.md
This release of Mobsql was focused on a few small tooling & consistency updates to aide in the release of Mobroute 0.6. See below for a detailed list of changes. 1. License Passthrough The license for each GTFS feed indicated by the Mobility Database is now passed through as URLLicense (matching in convention the renamed URLGTFS field) in the FeedStatusInfo struct. This change was motivated by downstream consumers of Mobsql data (such as Transito) needing to present license(s) to end-users so they may be aware of this information explicitly before utilizing particular GTFS feeds. 2. MDBID & Source References Renamed to FeedID All references to 'MDBID' and 'source' have been replaced with the term 'FeedID'. The previously used MDBID and source terms were used interchangeably and created ambiguity as to what the difference between these two terms were. Replacing all references should make it clearer both from the database & in the Go code what exactly is being referred to. As well the FeedID term is more self-explanatory to end users then either of the former terms. 3. Removal of YAML CLI Functionality A small final update is that the CLI no longer accepts YAML buffers for feed search filters (via -f) but rather only JSON. This matches a similar change in the Mobroute CLI, provides consistency, and should make it easier to test / validate requests in the future. This also removes the 'double-annotations' in the FeedsearchFilter struct and other parts of the code which was required by providing both JSON & YAML compatibility.
This release of Mobsql added a few performance-related changes to aide in the release of Mobroute 0.5. The largest change in this release is a performance update to change loading GTFS data to use transactions which improves load performance for large tables (and thus helps with large GTFS feed archives & aggregates). Details on changes below: 1. Transaction-Based Insertions The logic for loading GTFS data has been updated to perform multiple inserts per transaction (as opposed to a single compound insert per transaction). This improves performance considerably for larger tables such as the stop_times tables which on some large feeds can be in the millions of rows. This is one change in a series of related changes to address (#19) to improve overall performance for importing GTFS data for large sources and aggregates such as the Netherlands aggregate (MDBID 1077) and similar. 2. Indexes Cleanup Several indexes have been modified to allow for better performance in downstream Mobroute. In the future it may make sense to move the 'specification' of which indexes should be created to Mobroute itself, and make 'index specification' a part of Mobsql's API.. but for now, these changes will proceed on a per-release basis in Mobsql. 3. Correct Null agency_id Bug A small bug related to the agencies table for GTFS specification was corrected. This bug effected the Calgary transit feed (MDBID 712) preventing import of data. The way the GTFS specification was coded in Mobsql's spec was incorrect and mistakenly disallowed agencies with a null agency_id field. The specification is that agency_id can be null in case there is a single row; and this edgecase has now been corrected. 4. Allow Arbitrary SQL Initialization in SchemaExtra As part of the API for Mobsql, the SchemaExtra field now allows specifying the InitExec field which can run arbitrary SQL at initialization. This allows for end consumers to execute pragmas & similar performance related tuning adjustments at boot.
This release of Mobsql saw: (1) a large API rework, (2) parameterization of view & computed table logic, and (3) the addition of the ability to use custom Mobility DB CSV & GTFS archives. See details on changes below. 1. API Rework: Mobsql Runtime & Filter-to-Sourceset Logic API Split The API for Mobsql overall has been vastly reworked to allow more flexibility from end-consumers perspective (the main consumer of the Mobsql library's API being Mobroute currently). Two major changes to the API were made: (A) Firstly, the API is now 'multistage' in that all API functions (related to load, compute, purge, and status) now depend on a `MobsqlRuntime`; which may be initialized with the new `InitializeRuntime` function. The `MobsqlRuntime` allows the end user to store database connection & configuration parameters related to usage. (B) Secondly, filter-to-sourceset logic has been split into its own API (`apifilter`) and compute/load/purge/status functionality now directly takes MDBIDs rather then a filter as a parameter. Overall with these two changes (A) and (B) the end-user API is substantially cleaner & more flexible for consumers of Mobsql. 2. Parameterization of View & Computed Table Logic / Removal of Mobroute-Specific-Logic A longstanding wart in Mobsql was that the codebase contained SQL views (and computed tables) purely related to Mobroute's routing API (_vconnd, _vtransferstbl, _vtransfersgen, etc.). This has been corrected in that all SQL-view logic related to Mobroute alone have been moved to Mobroute's codebase proper. This change was facilitated by the addition of a new `SchemaExtra` property definable on Mobsql's runtime initialization. This changes allows for much cleaner integration with Mobroute and potentially other applications which may want to build on Mobsql & use Mobsql as a library in the future. 3. Allow Local GTFS Zip Imports (Parameterized Mobility DB CSV) Previously Mobsql used a hardcoded URL to the Mobility Database's catalogs CSV. This URL has been modified to be configurable via Mobsql's Runtime initialization. The URL used for the Mobility Database CSV, now customizable, may be either a `http://` or `file://` URI. Passing a `file://` URI you can use a mock Mobility Database URI on your local filesystem. This functionality is present both in the CLI & API. Additionally, adding on `file://` URI support for the Mobility Database CSV location; you may also now pass `file://` URIs for GTFS zip archive locations (effectively allowing loading arbitrary non-Mobility Database sourced GTFS Zip archives). Full documentation for this functionality will follow in future updates - for now see Mobroute's routing API tests which depend on this newly built functionality.
This release of Mobsql was focused on performance & compatibility updates to support associated changes in Mobroute & Transito. Changes include the addition of a view to map calendar dates to services, SQLite tuning updates, misc performance & usability updates, and the addition of CI. 1. Addition of _vcaltoservice view to map calendar dates to services The new _vcaltoservice view has been added which maps individual calendar dates to the service(s) which are valid for the given day. This takes into account both the raw GTFS calendar & calendar_dates tables as both an either/or if one is present and the other missing, or both are present. This is used by Mobroute's new routing API loading logic which only loads connections with associated valid service id's for the given input time in order to produce service respecting / accurate routes. 2. SQLite Tuning: Version pinning & Performance updates (WAL & memory store) A number of small SQLite tuning updates were made largely to improve compatibility and performance on mobile / Android devices. Firstly the SQLite required version has been pinned explicitly to target 3.28.0. A query is executed in Mobsql's initialization logic and will cause Mobsql to now explicitly error out if used with a SQLite version <3.28.0. This does mean Android 11 / R is required as a minimum version for Mobsql, Mobroute, and Transito to be functional. This version was chosen as a minimum requirement as this is the first SQLite version to support window functions which are made heavy use of in routing timetable calculations. Other then that strict tables support was dropped as that would require atleast SQLite 3.37.0. And secondly two performance related updates have been made, namely enabling WAL mode by default and using memory for the temp store rather then disk. Using memory for the temp store fixes issues on mobile devices with small amounts of disk / tempdir space in creating computed tables; and WAL mode enables general better IO performance overall. 3. Misc Performance & Usability Updates: Indexes, Active field, Checksums Several other minor performance & usability related updates have been applied. Firstly, with respect to performance, indexes used by Mobroute's routing API have been reexamined to optimize speed of extraction and as much as possible avoid full table scans. There is more work to be done on indexes, especially in regards to generated transfers, however the work done in this pass for overall timetable index optimization is a good starting point. Secondly with respect to usability, the 'active' field from Mobility Database is now passed through, in the filter specification, which enables Transito downstream to only show end users actively updated sources. Also this is a helpful filter for end-users to determine which feeds are 'good' / likely contain 'valid' GTFS data. And finally with respect to usability, table checksums now include megabytes in size (of the source tables imported from) which is quite helpful in recomputing tables and debugging failed or slow operating imports. 4. Continuous Integration Continuous integration via srht has been added to the repository by way of a checked in .build.yml file. Currently the build simply runs some simple unit tests added earlier on in development, however in the future more unit tests, automated benchmarks, and integration tests may also be added.