2024-03-24 Update: It appears Lyft has changed the way they distribute the datasets in their S3 bucket. As such, the scraper
script is currently out of order until I have some time/energy to update it. If you're thinking about using this tool feel free to send an email to ~vesto/feedon@lists.sr.ht and maybe I can speed up the timeline :).
This repository is a collection of utilities for working with Citibike data. It allows you to easily download all of Citibike's ride history archives, transform them as you see fit, and throw them into a SQLite database for easy querying.
This repository is what I use to build the SQLite database used in Citibike Explorer. It is also potentially useful if you don't feel like writing your own scraper to download, unzip, and load trip history archives into a pd.DataFrame.
Clone the repository, cd into the directory, and run:
$ python -m virtualenv .venv
$ source .venv/bin/activate
$ pip install -r ./requirements.txt
Once requirements are installed, you can use ./bin/scraper to download the trip archives individually or all in one swoop. See ./bin/scraper --help for details.
There is also ./bin/hourly-volume-rollup, which parses all available archives and rolls up the trip data into an hourly timeseries. Note that this requires provisioning a SQLite database, which can be done by running yoyo apply.
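For a rough idea of what an hourly rollup looks like, here is a minimal sketch using only the standard library. It is not the code behind ./bin/hourly-volume-rollup: the table name trips, the column name started_at, and the sample rows are all assumptions made up for illustration.

```python
import sqlite3

# Made-up sample rows. The `trips` table and `started_at` column are
# assumptions, not the schema this repository actually provisions.
SAMPLE_TRIPS = [
    ("2024-01-01 08:05:12",),
    ("2024-01-01 08:47:30",),
    ("2024-01-01 09:15:00",),
]


def hourly_rollup(conn):
    """Return (hour, trip_count) rows, one per hour that has trips."""
    query = """
        SELECT strftime('%Y-%m-%d %H:00:00', started_at) AS hour,
               COUNT(*) AS trip_count
        FROM trips
        GROUP BY hour
        ORDER BY hour
    """
    return conn.execute(query).fetchall()


# Build a throwaway in-memory database and roll it up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (started_at TEXT)")
conn.executemany("INSERT INTO trips VALUES (?)", SAMPLE_TRIPS)

rollup = hourly_rollup(conn)
print(rollup)
```

Truncating each timestamp to the top of the hour with strftime and grouping on the result keeps the whole rollup inside SQLite, so nothing has to be loaded into Python first.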
If you're just looking to load an archive into pandas, here's the code snippet you're looking for:
import forerad.scrapers.historical as historical
archives = historical.HistoricalTripArchive.list_cached()
df = archives[0].fetch_df()
print(df)
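If you then want that DataFrame in SQLite for querying, something along these lines should work. This is a sketch, not how this repository loads its own database: the DataFrame below is a hand-built stand-in for whatever fetch_df() returns, and the column names and the trips table name are assumptions.

```python
import sqlite3

import pandas as pd

# Stand-in for the result of fetch_df(); the columns and values here are
# invented for illustration and may not match the real archives.
df = pd.DataFrame(
    {
        "started_at": ["2024-01-01 08:05:12", "2024-01-01 09:15:00"],
        "start_station_name": ["Station A", "Station B"],
    }
)

# Write the frame to a table, replacing it if it already exists.
conn = sqlite3.connect(":memory:")
df.to_sql("trips", conn, if_exists="replace", index=False)

# Query it back with plain SQL.
count = conn.execute("SELECT COUNT(*) FROM trips").fetchone()[0]
print(count)
```

Swap ":memory:" for a file path to keep the database around between runs.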
I originally wanted to build a forecast of daily trip volume but ended up scaling back my ambitions (maybe just for now). Fore is for forecast, rad is for das Fahrrad, the German word for bike.