[EN] Fetch the source data for code.gouv.fr


The code in this repository collects data from forges (github.com, gitlab.com and GitLab instances) about accounts (GitHub organizations or GitLab groups) and their repositories.

For example, given this list of account URLs and this CSV file of supported platforms, we collect the data we need for code.gouv.fr.
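To make the idea of matching an account URL against a supported platform concrete, here is a minimal sketch. It is not the logic used in fetch.py: the `PLATFORMS` mapping and the `split_account_url` helper are hypothetical, and the real platforms.csv file may use different columns and values.

```python
from urllib.parse import urlparse

# Hypothetical mapping from forge host to platform type. The real
# platforms.csv in this repository may be structured differently.
PLATFORMS = {
    "github.com": "GitHub",
    "gitlab.com": "GitLab",
}


def split_account_url(url, platforms=PLATFORMS):
    """Return (platform, host, account) for an account URL.

    Raises ValueError when the host is not a known platform or when
    the URL does not contain an account path.
    """
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    # Keep nested paths such as "group/subgroup" for GitLab groups.
    account = parsed.path.strip("/")
    if host not in platforms:
        raise ValueError(f"unsupported platform: {host!r}")
    if not account:
        raise ValueError(f"no account in URL: {url!r}")
    return platforms[host], host, account


print(split_account_url("https://github.com/etalab"))
```

A self-hosted GitLab instance would need its own entry in the mapping, which is presumably the role played by the platforms file.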

#Installation and configuration

  1. Clone this repository: git clone https://git.sr.ht/~etalab/codegouvfr-fetch-data && cd codegouvfr-fetch-data
  2. Install Python dependencies: pip install -r requirements.txt
  3. Create a GitHub token.
  4. Create an account on libraries.io and create an API key on your account page.
  5. Set the GITHUB_TOKEN and LIBRARIESIO_API_KEY environment variables. For example: export GITHUB_TOKEN="your github token" ; export LIBRARIESIO_API_KEY="your libraries.io api key"
  6. Create the folders that will receive the output data: mkdir -p data/organizations/csv && mkdir -p data/organizations/json && mkdir -p data/repositories/csv && mkdir -p data/repositories/json && mkdir -p data/libraries/csv && mkdir -p data/libraries/json
  7. Check the content of the platforms.csv file and update it if needed.
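Steps 5 and 6 above can also be checked and performed from Python. The following is a minimal sketch, not part of this repository: the helper names `missing_variables` and `create_output_dirs` are hypothetical, while the variable names and folder layout are the ones listed in the steps.

```python
import os
from pathlib import Path

# Environment variables named in step 5.
REQUIRED_VARS = ("GITHUB_TOKEN", "LIBRARIESIO_API_KEY")

# Output folders named in step 6.
OUTPUT_DIRS = [
    Path("data") / kind / fmt
    for kind in ("organizations", "repositories", "libraries")
    for fmt in ("csv", "json")
]


def missing_variables(environ=os.environ):
    """Return the required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]


def create_output_dirs(root="."):
    """Create the output folders under root, like `mkdir -p` does."""
    created = []
    for relative in OUTPUT_DIRS:
        path = Path(root) / relative
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


# Report missing variables without touching the file system.
print("Missing variables:", missing_variables() or "none")
```

Calling `create_output_dirs()` from the root of the clone creates the six folders. Running it again is harmless because existing folders are left untouched.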

#Generate JSON and CSV files

Launch the script with python fetch.py. The output files will be available in the subfolders of data.


We aim to collect data from more forges:

SourceHut is our priority because Etalab hosts some of its source code here.

If you are familiar with the SourceHut GraphQL APIs and would like to contribute, feel free to send a patch to ~etalab/codegouvfr-devel@lists.sr.ht or to contact us directly.

#Data models

We use Table Schema files.

Please refer to the schema files in this directory.
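As an illustration of how a Table Schema descriptor relates to a CSV file, here is a minimal sketch that compares a CSV header with the field names declared in a schema. The descriptor and the CSV sample are hypothetical placeholders, not the actual schema files or output of this repository, and `header_mismatches` is a helper written for this example only.

```python
import csv
import io
import json

# Hypothetical descriptor: these field names are placeholders, not the
# fields declared in this repository's schema files.
SCHEMA_JSON = """
{
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "url", "type": "string"},
    {"name": "stars", "type": "integer"}
  ]
}
"""

# Hypothetical CSV content matching the descriptor above.
CSV_TEXT = "name,url,stars\nexample,https://example.org/example,3\n"


def header_mismatches(schema, csv_file):
    """Compare a CSV header with the field names of a Table Schema.

    Returns (missing, unexpected): fields declared in the schema but
    absent from the header, and header columns not declared in the schema.
    """
    expected = [field["name"] for field in schema["fields"]]
    header = next(csv.reader(csv_file), [])
    missing = [name for name in expected if name not in header]
    unexpected = [name for name in header if name not in expected]
    return missing, unexpected


schema = json.loads(SCHEMA_JSON)
print(header_mismatches(schema, io.StringIO(CSV_TEXT)))
```

This only checks column names. A dedicated Table Schema validator would also check field types and constraints.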

#Get the data

  • Organizations data as CSV and JSON
  • Repositories data as CSV and JSON


Contributions are welcome!

To send bug reports or patches, or to share ideas, please write to the public mailing list: ~etalab/codegouvfr-devel@lists.sr.ht


The source code of this repository is published under the MIT license.

2018-2021 DINSIC, DINUM, Etalab, Antoine Augusti, Bastien Guerry.

2018-2021 Other contributors, as readable in the history of this repository.