~etalab/codegouvfr-fetch-data

[EN] Fetch the source data for code.gouv.fr



#Presentation

The code in this repository collects data from forges (github.com, gitlab.com and GitLab instances) about accounts (GitHub organizations or GitLab groups) and their repositories.

For example, given this list of account URLs and this CSV file of supported platforms, we collect the data we need for code.gouv.fr.

#Installation and configuration

  1. Clone this repository: git clone https://git.sr.ht/~etalab/codegouvfr-fetch-data && cd codegouvfr-fetch-data
  2. Install Python dependencies: pip install -r requirements.txt
  3. Create a GitHub Token
  4. Create an account on libraries.io and create an API key on your account page.
  5. Set the following environment variables: GITHUB_TOKEN and LIBRARIESIO_API_KEY. For example: export GITHUB_TOKEN="your github token" ; export LIBRARIESIO_API_KEY="your libraries.io api key"
  6. Create the folders that will receive the output data: mkdir -p data/organizations/csv && mkdir -p data/organizations/json && mkdir -p data/repositories/csv && mkdir -p data/repositories/json && mkdir -p data/libraries/csv && mkdir -p data/libraries/json
  7. Check the content of the platforms.csv file and update it if needed.
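
Steps 5 and 6 above can also be checked from Python before running the script. The following is a minimal sketch, not part of the repository: the function names `missing_env_vars` and `create_output_dirs` are hypothetical, and only the variable names and folder layout come from the steps above.

```python
import os
from pathlib import Path

# Names taken from steps 5 and 6 of the installation instructions.
REQUIRED_ENV_VARS = ("GITHUB_TOKEN", "LIBRARIESIO_API_KEY")
OUTPUT_KINDS = ("organizations", "repositories", "libraries")
OUTPUT_FORMATS = ("csv", "json")


def missing_env_vars(environ=os.environ):
    """Return the required variables that are unset or empty (hypothetical helper)."""
    return [name for name in REQUIRED_ENV_VARS if not environ.get(name)]


def create_output_dirs(root="data"):
    """Create data/<kind>/<format> folders, like the mkdir -p chain in step 6."""
    created = []
    for kind in OUTPUT_KINDS:
        for fmt in OUTPUT_FORMATS:
            path = Path(root) / kind / fmt
            # parents=True and exist_ok=True mirror the behaviour of mkdir -p.
            path.mkdir(parents=True, exist_ok=True)
            created.append(path)
    return created
```

Calling `missing_env_vars()` returns an empty list when both variables are set, and `create_output_dirs()` can be run repeatedly without error because existing folders are left untouched.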

#Generate JSON and CSV files

Launch the script with python fetch.py. The output files will be available in the subfolders of data.
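
To see what a run produced, the subfolders of data can be walked from Python. This is a small sketch under one assumption: that the generated files carry `.csv` and `.json` extensions, which matches the folder names but is not stated here. The helper name `list_output_files` is hypothetical.

```python
from pathlib import Path


def list_output_files(root="data"):
    """Return the .csv and .json files found under root, sorted by path."""
    root_path = Path(root)
    return sorted(
        path
        for path in root_path.rglob("*")
        # Assumption: output files use the .csv and .json extensions.
        if path.is_file() and path.suffix in {".csv", ".json"}
    )
```

An empty list after running the script suggests that the output folders were missing or that the script stopped before writing anything.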

#Todo

We aim to collect data from more forges.

SourceHut is our priority because Etalab hosts some of its source code here.

If you are familiar with the SourceHut GraphQL APIs and would like to help, feel free to send a patch to ~etalab/codegouvfr-devel@lists.sr.ht or to reach us directly.

#Data models

We use Table Schema files.

Please refer to the schema files in this directory.
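
As an illustration of how a Table Schema descriptor relates to a CSV file, the sketch below checks that a CSV header matches the field names declared in a schema. The descriptor and the CSV content are hypothetical examples, not the actual schema files of this repository; only the `fields` / `name` / `type` layout comes from the Table Schema specification.

```python
import csv
import io
import json

# Hypothetical descriptor: the real field names live in the schema files.
EXAMPLE_SCHEMA = """
{
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "stars", "type": "integer"}
  ]
}
"""

# Hypothetical CSV content matching the descriptor above.
EXAMPLE_CSV = "name,stars\nexample-repo,12\n"


def schema_field_names(schema_text):
    """Return the field names declared in a Table Schema descriptor."""
    descriptor = json.loads(schema_text)
    return [field["name"] for field in descriptor["fields"]]


def header_matches_schema(csv_text, schema_text):
    """Check that the CSV header lists the schema fields in the same order."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    return header == schema_field_names(schema_text)
```

This only compares column names. Checking types and constraints would need a full Table Schema validator.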

#Get the data

  • Organizations data as CSV and JSON
  • Repositories data as CSV and JSON

#Contributing

Contributions are welcome!

To send bug reports or patches, or to share ideas, please write to the public mailing list: ~etalab/codegouvfr-devel@lists.sr.ht

#License

The source code of this repository is published under the MIT license.

2018-2021 DINSIC, DINUM, Etalab, Antoine Augusti, Bastien Guerry.

2018-2021 Other contributors, as readable in the history of this repository.