A simple toolkit for scraping and manipulating Citibike trip history in Python


This repository is a collection of utilities for working with Citibike data. It allows you to easily download all of Citibike's ride history archives, transform them as you see fit, and throw them into a SQLite database for easy querying.

This repository is what I use to build the SQLite database used in Citibike Explorer. It is also potentially useful if you don't feel like writing your own scraper to download, unzip, and load trip history archives into a pandas DataFrame.

#Installation and usage

Clone the repository, cd into the directory, and run:

$ python -m virtualenv .venv
$ source .venv/bin/activate
$ pip install -r ./requirements.txt

Once the requirements are installed, you can use ./bin/scraper to download the trip archives individually or all in one go. See ./bin/scraper --help for details.
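For a sense of what handling a downloaded archive involves, here's a minimal standard-library sketch. It assumes an archive is a zip file holding one or more CSV files; the `read_trip_rows` helper and the column names are made up for illustration and are not part of this repository.

```python
import csv
import io
import zipfile


def read_trip_rows(zip_bytes):
    """Yield one dict per CSV row from every .csv file inside a zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip anything in the archive that isn't a CSV file
            with zf.open(name) as raw:
                # Decode the binary member as text so csv can parse it
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                yield from csv.DictReader(text)


# Build a tiny archive in memory so the example runs without network access.
# The column names below are hypothetical.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(
        "trips.csv",
        "started_at,ended_at\n2021-06-01 08:05:00,2021-06-01 08:20:00\n",
    )

rows = list(read_trip_rows(buf.getvalue()))
```

Yielding rows one at a time keeps memory use flat, which matters when an archive holds a whole month of trips.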

There is also ./bin/hourly-volume-rollup, which parses all available archives and rolls up the trip data into an hourly time series. Note that this requires provisioning a SQLite database first, which can be done by running yoyo apply.
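To illustrate the idea of an hourly rollup (not the actual implementation in ./bin/hourly-volume-rollup), here's a minimal sketch using only the standard library. The `hourly_rollup` function, the `hourly_volume` table, and the sample timestamps are all assumptions made for the example; the real schema comes from the yoyo migrations.

```python
import sqlite3
from collections import Counter
from datetime import datetime


def hourly_rollup(start_times):
    """Count trips per hour, given an iterable of trip start datetimes."""
    counts = Counter(
        # Truncate each timestamp to the top of its hour
        ts.replace(minute=0, second=0, microsecond=0)
        for ts in start_times
    )
    return sorted(counts.items())


# Made-up sample trips: two in the 08:00 hour, one in the 09:00 hour
starts = [
    datetime(2021, 6, 1, 8, 5),
    datetime(2021, 6, 1, 8, 40),
    datetime(2021, 6, 1, 9, 15),
]
rollup = hourly_rollup(starts)

# Store the rollup in an in-memory SQLite database (hypothetical table name)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hourly_volume (hour TEXT PRIMARY KEY, trip_count INTEGER)"
)
conn.executemany(
    "INSERT INTO hourly_volume VALUES (?, ?)",
    [(hour.isoformat(), n) for hour, n in rollup],
)
result = conn.execute(
    "SELECT hour, trip_count FROM hourly_volume ORDER BY hour"
).fetchall()
```

Note that hours with no trips simply don't appear in this sketch; a real time series would likely want to fill those gaps with zeros.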

If you're just looking to load an archive into pandas, here's the code snippet you're looking for:

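import forerad.scrapers.historical as historical

# List the trip archives that are already cached locally
archives = historical.HistoricalTripArchive.list_cached()
# Load the first archive into a pandas DataFrame
df = archives[0].fetch_df()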



#What's with the stupid name?

I originally wanted to build a forecast of daily trip volume but ended up scaling back my ambitions (maybe just for now). Fore is for forecast, rad is for das Fahrrad, the German word for bike.