A simple toolkit for scraping and manipulating Citibike trip history in Python
This repository is a collection of utilities for working with Citibike data. It lets you download all of Citibike's trip history archives, transform them as you see fit, and load them into a SQLite database for querying.

This repository is what I use to build the SQLite database behind Citibike Explorer. It may also be useful if you'd rather not write your own scraper to download, unzip, and load trip history archives into a `pd.DataFrame`.
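For a sense of what the unzip-and-load step involves, here is a minimal sketch using only `zipfile` and pandas. It is not this repository's implementation: `load_trip_archive` is a hypothetical helper, and it assumes each archive holds one or more CSV files that share the same columns.

```python
import io
import zipfile

import pandas as pd


def load_trip_archive(zip_bytes: bytes) -> pd.DataFrame:
    """Read every CSV inside a zipped trip archive into one DataFrame.

    Hypothetical helper: assumes the archive contains one or more CSV files
    with matching columns (real archives may be laid out differently).
    """
    frames = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            # Skip directories, non-CSV files, and macOS metadata (__MACOSX/...)
            if not name.lower().endswith(".csv") or name.startswith("__MACOSX"):
                continue
            with zf.open(name) as fh:
                frames.append(pd.read_csv(fh))
    if not frames:
        raise ValueError("archive contains no CSV files")
    # Stack all CSVs into a single frame with a fresh 0..n-1 index
    return pd.concat(frames, ignore_index=True)
```

You would pass it the raw bytes of an archive you have already downloaded, e.g. `load_trip_archive(open(path, "rb").read())`. Reading from memory avoids writing the extracted CSVs to disk.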

# Installation and usage

Clone the repository, `cd` into the directory, and run:

```console
$ python -m virtualenv .venv
$ source .venv/bin/activate
$ pip install -r ./requirements.txt
```

Once the requirements are installed, you can use `./bin/scraper` to download the trip archives individually or all at once. See `./bin/scraper --help` for details.

There is also `./bin/hourly-volume-rollup`, which parses all available archives and rolls the trip data up into an hourly time series. Note that this requires provisioning a SQLite database first, which you can do by running `yoyo apply`.
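To illustrate what an hourly rollup means, here is a minimal pandas sketch. It is not the code behind `./bin/hourly-volume-rollup`: `hourly_volume` is a hypothetical helper, and it assumes one row per trip with a start-timestamp column named `starttime` (adjust `time_col` for other schemas).

```python
import pandas as pd


def hourly_volume(trips: pd.DataFrame, time_col: str = "starttime") -> pd.DataFrame:
    """Roll individual trips up into an hourly trip-count time series.

    Hypothetical helper; assumes each row is one trip and time_col holds
    the trip's start timestamp.
    """
    starts = pd.DatetimeIndex(pd.to_datetime(trips[time_col]))
    # One "1" per trip, bucketed by hour; hours with no trips sum to 0,
    # so the result is a gap-free hourly series.
    counts = pd.Series(1, index=starts).sort_index().resample(pd.offsets.Hour()).sum()
    return counts.rename_axis("hour").reset_index(name="trip_count")
```

Using `resample` rather than `value_counts` means empty hours show up explicitly as zeros, which is usually what you want for a time series.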

If you're just looking to load an archive into pandas, here's the code snippet you're looking for:

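```python
import forerad.scrapers.historical as historical

# Trip archives that have already been downloaded to the local cache
archives = historical.HistoricalTripArchive.list_cached()
# Load the first cached archive into a pandas DataFrame
df = archives[0].fetch_df()
```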
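If you want to query a loaded DataFrame with SQL without setting up the full database, a minimal sketch with pandas and the standard-library `sqlite3` module looks like this. The tiny stand-in DataFrame, its column names, and the `trips` table name are assumptions for illustration, not the schema this repository uses.

```python
import sqlite3

import pandas as pd

# Stand-in for a DataFrame returned by fetch_df(); these column names are
# assumptions for illustration, not the real archive schema.
trips = pd.DataFrame(
    {
        "start_station": ["A", "A", "B"],
        "tripduration": [300, 600, 900],
    }
)

# An in-memory database keeps the example self-contained; pass a file path
# to sqlite3.connect() instead to persist the table.
conn = sqlite3.connect(":memory:")
trips.to_sql("trips", conn, index=False)

# Plain SQL: trip count and mean duration per start station.
summary = pd.read_sql_query(
    "SELECT start_station, COUNT(*) AS n_trips, AVG(tripduration) AS avg_duration "
    "FROM trips GROUP BY start_station ORDER BY start_station",
    conn,
)
conn.close()
```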



# What's with the stupid name?

I originally wanted to build a forecast of daily trip volume but ended up scaling back my ambitions (maybe just for now). "Fore" is for forecast, and "rad" is for *das Fahrrad*, the German word for bike.