6056b2f4cf36a6c0e756942def2d1df47b1bbd24 — Steve Gattuso 1 year, 3 months ago 5c8240f
add citibike methodology
1 files changed, 42 insertions(+), 0 deletions(-)

A citibike-methodology.md
A citibike-methodology.md => citibike-methodology.md +42 -0
@@ 0,0 1,42 @@
# Citibike topography methodology

This document outlines the methodology and data sources used in _[Visualizing the topography of Citibike](https://www.stevegattuso.me/2021/11/28/citibike-topography.html)_.

## Data Sources
* Used [this Python script](https://gist.github.com/stevenleeg/c9815da685ea0736f77557032b222d48) to download all Citibike stations.
* Used [NYC Neighborhood Tabulation Area](https://www1.nyc.gov/site/planning/data-maps/open-data/census-download-metadata.page?tab=2) (NTA) geography files.
* Used [MapPLUTO](https://www1.nyc.gov/site/planning/data-maps/open-data/dwn-pluto-mappluto.page) data for household calculations.
* Fetched 2015-2019 ACS data and census tract geographies from the National Historical Geographic Information System [data portal](https://www.nhgis.org/).
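
As a rough illustration of the station download step, the sketch below flattens a station list into the kind of CSV that the later steps import. It assumes the stations arrive as a GBFS-style `station_information` JSON payload; the linked script may work differently. The sample stations, the `FIELDS` column list, and the `stations_to_csv` helper are all hypothetical.

```python
import csv
import io
import json

# Hypothetical sample shaped like a GBFS station_information payload.
# The station names, coordinates, and capacities are made up for illustration.
SAMPLE_FEED = json.dumps({
    "data": {
        "stations": [
            {"station_id": "1", "name": "Example St & 1 Ave",
             "lat": 40.75, "lon": -73.99, "capacity": 55},
            {"station_id": "2", "name": "Example St & 2 Ave",
             "lat": 40.72, "lon": -74.0, "capacity": 33},
        ]
    }
})

# Columns kept in the CSV (an assumption; the real script may keep others).
FIELDS = ["station_id", "name", "lat", "lon", "capacity"]


def stations_to_csv(feed_text: str) -> str:
    """Convert a station_information JSON string into CSV text."""
    stations = json.loads(feed_text)["data"]["stations"]
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=FIELDS, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for station in stations:
        # Keys outside FIELDS are ignored; missing keys become empty cells.
        writer.writerow(station)
    return buf.getvalue()


csv_text = stations_to_csv(SAMPLE_FEED)
print(csv_text)
```

The `lat`, `lon`, and `capacity` columns are the ones the steps below rely on: the coordinates for buffering and the capacity for the per-neighborhood sums.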

## Steps
1. Areas within 0.5km of a Citibike station
	* Imported the Citibike station CSV produced by the Python script
	* Reprojected to the New York/Long Island CRS
	* Created a buffer of 0.5km around each station
	* Dissolved all buffers into a single polygon
	* Clipped the polygon using the NTA polygons
2. Households served (within the 0.5km range)
	* Imported MapPLUTO data
	* Clipped using the 0.5km station buffers
	* Ran the `Basic statistics` operation on...
		* Unclipped MapPLUTO data to get the total number of households
		* Clipped MapPLUTO data to get the total number of households within 0.5km of a station
	* Calculated percentages based on these values
3. Neighborhood station capacity
	* Imported Citibike station CSV from Python script
	* Ran `Join attributes by location (summary)` operation
		* Summed up `capacity` column of each station per neighborhood
	* Created a new column: `capacity_count / $area * 100` to generate `capacity_per_100sqkm`
	* Visualized the column onto the NTA map
4. Neighborhood station capacity in NTAs below the poverty line
	* Fetched and imported census tract geographies sourced from NHGIS
	* Fetched and joined NHGIS 2015-2019 ACS median household income data
	* Used NTA geography files
	* Generated centroids of each census tract polygon
	* Ran `Join attributes by location (summary)` operation to merge ACS data into NTA polygons
		* Used the median of the median income field
	* Filtered to keep only NTAs whose median household income is below the poverty line of $35k
	* Ran `Join attributes by location (summary)` to merge station data with NTA polygons
		* Summed up `capacity` column
	* Created a new column: `capacity_count / $area * 100` to generate `capacity_per_100sqkm`
	* Visualized the column onto the map along with station locations
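
The arithmetic behind steps 2 through 4 can be sketched in plain Python. This is not the QGIS workflow itself, only a minimal illustration of the calculations, assuming each neighborhood's area is already expressed in square kilometers. The `Neighborhood` class, the helper functions, and all sample values are hypothetical.

```python
from dataclasses import dataclass
from statistics import median

POVERTY_LINE = 35_000  # the $35k threshold used in step 4


@dataclass
class Neighborhood:
    name: str
    area_sqkm: float            # assumed to be in square kilometers
    tract_median_incomes: list  # median household income of each joined tract
    capacity_count: int         # summed `capacity` of stations in the NTA

    @property
    def median_income(self) -> float:
        # Step 4 takes the median of the tract-level medians.
        return median(self.tract_median_incomes)


def household_share(total_households: int, served_households: int) -> float:
    """Percentage of households within 0.5km of a station (step 2)."""
    return served_households / total_households * 100


def capacity_per_100sqkm(n: Neighborhood) -> float:
    """Station capacity per 100 square kilometers (steps 3 and 4)."""
    return n.capacity_count / n.area_sqkm * 100


# Made-up neighborhoods for illustration only.
neighborhoods = [
    Neighborhood("Example NTA A", 2.0, [28_000, 30_000, 41_000], 50),
    Neighborhood("Example NTA B", 4.0, [70_000, 80_000, 95_000], 200),
    Neighborhood("Example NTA C", 5.0, [31_000, 34_000], 0),
]

# Keep only the NTAs whose median income is below the poverty line.
below_line = [n for n in neighborhoods if n.median_income < POVERTY_LINE]
densities = {n.name: capacity_per_100sqkm(n) for n in below_line}

print(household_share(1000, 250))
print(densities)
```

Taking a median of tract medians is only an approximation of a neighborhood's true median income, since it ignores how many households each tract contains.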