~etalab/codegouvfr-fetch-data

[EN] Fetch the source data for code.gouv.fr



# Presentation

The code in this repository collects data from forges (github.com, gitlab.com and GitLab instances) about accounts (GitHub organizations or GitLab groups), their repositories and libraries.

Given the list of account URLs and the CSV file of platforms (`platforms.csv`), we collect the data needed for code.gouv.fr.
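To make the collection step more concrete, here is a minimal sketch of how an account URL could be mapped to the forge API endpoint that lists its repositories. The helper `repos_api_url` is hypothetical and is not taken from `fetch.py`. It assumes that every github.com account is an organization and that every other host is a GitLab instance serving the v4 REST API under `/api/v4`.

```python
from urllib.parse import quote, urlparse


def repos_api_url(account_url: str) -> str:
    """Return the REST endpoint listing the repositories of a forge account.

    Assumption (hypothetical helper): github.com accounts are organizations,
    and any other host is a GitLab instance exposing /api/v4.
    """
    parsed = urlparse(account_url)
    path = parsed.path.strip("/")
    if not path:
        raise ValueError(f"no account path in {account_url!r}")
    if parsed.netloc == "github.com":
        # GitHub REST API: GET /orgs/{org}/repos
        return f"https://api.github.com/orgs/{path}/repos"
    # GitLab REST API: GET /groups/:id/projects, where :id may be the
    # URL-encoded full path of the group (so subgroups work too).
    group_id = quote(path, safe="")
    return f"{parsed.scheme}://{parsed.netloc}/api/v4/groups/{group_id}/projects"
```

For example, `repos_api_url("https://gitlab.com/group/subgroup")` returns `https://gitlab.com/api/v4/groups/group%2Fsubgroup/projects`. Both endpoints are paginated, so a real collector would also follow the pagination links.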

# Installation and configuration

  1. Clone this repository: `git clone https://git.sr.ht/~etalab/codegouvfr-fetch-data && cd codegouvfr-fetch-data`
  2. Install the Python dependencies: `pip install -r requirements.txt`
  3. Create a GitHub personal access token.
  4. Create an account on libraries.io and create an API key on your account page.
  5. Set the `GITHUB_TOKEN` and `LIBRARIESIO_API_KEY` environment variables, e.g. `export GITHUB_TOKEN="your GitHub token"; export LIBRARIESIO_API_KEY="your libraries.io API key"`
  6. Create the folders that will receive the output data: `mkdir -p data/organizations/csv data/organizations/json data/repositories/csv data/repositories/json data/libraries/csv data/libraries/json`
  7. Check the content of `platforms.csv` and update it if needed.
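Steps 5 and 6 can also be scripted as a preflight check before a run. This is a hypothetical helper, not part of the repository, written with the standard library only; the folder layout mirrors the `mkdir -p` command of step 6.

```python
from __future__ import annotations

import os
from pathlib import Path

# Environment variables required by the installation steps.
REQUIRED_ENV_VARS = ("GITHUB_TOKEN", "LIBRARIESIO_API_KEY")
# Output folders follow the data/<kind>/<format> layout.
KINDS = ("organizations", "repositories", "libraries")
FORMATS = ("csv", "json")


def missing_env_vars(environ=os.environ) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_ENV_VARS if not environ.get(name)]


def create_output_dirs(root: Path = Path("data")) -> list[Path]:
    """Create every output folder (like `mkdir -p`) and return their paths."""
    created = []
    for kind in KINDS:
        for fmt in FORMATS:
            folder = root / kind / fmt
            # parents/exist_ok make the call safe to repeat.
            folder.mkdir(parents=True, exist_ok=True)
            created.append(folder)
    return created
```

Calling `missing_env_vars()` first and stopping when it returns a non-empty list avoids starting a long run that would fail on its first API call.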

# Generate JSON and CSV files

Run the script with `python fetch.py`. The output files are written to the subfolders of `data/`.
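A quick way to see what a run produced is to list the files in each output subfolder. The `summarize_outputs` helper below is hypothetical. It relies only on the `data/<kind>/<format>` layout created during installation and makes no assumption about the names or contents of the files.

```python
from __future__ import annotations

from pathlib import Path


def summarize_outputs(root: Path = Path("data")) -> dict[str, list[str]]:
    """Map each output subfolder (e.g. "repositories/csv") to its sorted file names."""
    if not root.is_dir():
        return {}
    summary = {}
    # Only look two levels deep: data/<kind>/<format>.
    for folder in sorted(p for p in root.glob("*/*") if p.is_dir()):
        key = folder.relative_to(root).as_posix()
        summary[key] = sorted(f.name for f in folder.iterdir() if f.is_file())
    return summary
```

For example, `for subfolder, files in summarize_outputs().items(): print(subfolder, len(files))` prints one line per subfolder, which makes an empty output folder easy to spot.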

# Todo

We aim to collect data from more forges.

SourceHut is our priority because Etalab hosts some of its source code there.

If you are familiar with the SourceHut GraphQL APIs and can help, feel free to send a patch to ~etalab/codegouvfr-devel@lists.sr.ht or to contact us directly.
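As a possible starting point for such a patch, here is a hedged sketch of a request to the git.sr.ht GraphQL API that lists a user's repositories. The endpoint `https://git.sr.ht/query`, the `user` and `repositories` fields, and the cursor-based pagination reflect our reading of the public schema and should be checked against it. `build_repositories_request` is a hypothetical helper, and the token is assumed to be a SourceHut personal access token sent as a Bearer token.

```python
from __future__ import annotations

import json
import urllib.request

# Assumed location of the git.sr.ht GraphQL endpoint.
SOURCEHUT_GIT_GRAPHQL = "https://git.sr.ht/query"

# Field names follow the git.sr.ht schema as we understand it;
# verify them before relying on this query.
REPOSITORIES_QUERY = """
query Repositories($username: String!, $cursor: Cursor) {
  user(username: $username) {
    repositories(cursor: $cursor) {
      results { name description visibility updated }
      cursor
    }
  }
}
"""


def build_repositories_request(
    username: str, token: str, cursor: str | None = None
) -> urllib.request.Request:
    """Build (without sending) a POST request for one page of repositories."""
    payload = {
        "query": REPOSITORIES_QUERY,
        # Assumption: usernames are passed without the leading "~".
        "variables": {"username": username.lstrip("~"), "cursor": cursor},
    }
    return urllib.request.Request(
        SOURCEHUT_GIT_GRAPHQL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen` should return a JSON document; assuming the schema above, the value of `data.user.repositories.cursor` would be passed back as `cursor` to fetch the next page, until it is `null`.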

# Data models

We describe the output data with Table Schema files.

Please refer to the schema files in this directory.
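To illustrate what a Table Schema file constrains, here is a minimal standard-library sketch that checks an output CSV against a parsed schema (the dict obtained with `json.load` on a schema file). It is not the project's validation: `check_csv_against_schema` is a hypothetical helper covering only two rules, namely that the header lists the schema's `fields` names in order and that fields with `"required": true` in their `constraints` are not empty. A full validator such as the Frictionless Framework (successor of goodtables) also checks types, formats and the other constraints.

```python
from __future__ import annotations

import csv
import io


def check_csv_against_schema(csv_text: str, schema: dict) -> list[str]:
    """Return human-readable problems found in csv_text (empty if none).

    Only the header/field-name match and "required" constraints are checked.
    """
    fields = schema["fields"]
    expected = [field["name"] for field in fields]
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    if header != expected:
        return [f"header {header} does not match schema fields {expected}"]
    required = [
        i for i, field in enumerate(fields)
        if field.get("constraints", {}).get("required")
    ]
    problems = []
    # Row 1 is the header, so data rows are numbered from 2.
    for row_no, row in enumerate(reader, start=2):
        for i in required:
            if i >= len(row) or row[i] == "":
                problems.append(f"row {row_no}: '{expected[i]}' is required")
    return problems
```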

# Get the data

# Contributing

# License

The source code of this repository is published under the MIT license.

2018-2022 DINSIC, DINUM, Etalab, Antoine Augusti, Bastien Guerry.

2018-2022 Other contributors, as readable in the history of this repository.