decentralized access to scientific literature
It's like Popcorn Time for science.
scihut is a proof-of-concept to show that it is possible to decentralize access to Sci-Hub (in case of emergency or otherwise).
scihut decentralizes Sci-Hub "effortlessly" by simply tapping into the torrents that are published by Library Genesis and seeded by the community. It requires no changes to the servers or to the seeders in order to work.
Nearly all research papers have at least one DOI (Digital Object Identifier) that uniquely identifies them.
DOIs have the following format:

10.DDD/...

where D is a digit. Beware that DOIs are case-insensitive.

The part before the (first) forward slash (10.DDD) is called the prefix, and the part after the (first) forward slash (...) is called the suffix.
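For illustration, splitting a DOI into its prefix and suffix (and normalizing its case) could look roughly like this; the helper below is just a sketch, not code from this repository:

```go
// Split a DOI into its prefix and suffix and normalize its case.
package main

import (
	"fmt"
	"strings"
)

func splitDOI(doi string) (prefix, suffix string, ok bool) {
	// DOIs are case-insensitive, so compare them lowercased.
	doi = strings.ToLower(strings.TrimSpace(doi))
	prefix, suffix, ok = strings.Cut(doi, "/") // split at the FIRST forward slash
	return
}

func main() {
	prefix, suffix, _ := splitDOI("10.0001/AbCd")
	fmt.Println(prefix) // 10.0001
	fmt.Println(suffix) // abcd
}
```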
Library Genesis publishes torrents of Sci-Hub's research paper collection:
A new torrent is released for every 100,000 research papers, grouped by a monotonically increasing integer ID.
Every torrent consists of 100 zip files grouped (again) by ID, each containing 1,000 papers.
All the zip files have the same directory structure. Imagine three papers with the DOIs 10.0001/abcd, 10.0001/efgh, and 10.0002/x+yz; they would be laid out as follows:

```
10.0001/
├── abcd.pdf
└── efgh.pdf
10.0002/
└── x%2Byz.pdf
```
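Judging from that layout (note the + in 10.0002/x+yz becoming %2B), a paper's path inside its ZIP file seems to be the prefix, a slash, the percent-encoded suffix, and a .pdf extension. Here is a sketch of that mapping; the encoding scheme is an assumption inferred from the example above:

```go
// Map a DOI onto its (assumed) path inside the ZIP files.
package main

import (
	"fmt"
	"net/url"
	"strings"
)

func zipPath(doi string) string {
	doi = strings.ToLower(doi)
	prefix, suffix, _ := strings.Cut(doi, "/")
	// Percent-encode the suffix, matching the "x%2Byz.pdf" example above.
	return prefix + "/" + url.QueryEscape(suffix) + ".pdf"
}

func main() {
	fmt.Println(zipPath("10.0002/x+yz")) // 10.0002/x%2Byz.pdf
}
```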
Library Genesis also publishes a compact dump of their database (scimag_files.tsv.bz2).
Given a DOI, scihut finds the integer ID of the paper using the database built from the dump mentioned above.

Given the ID, scihut can easily locate the torrent where the paper can be found, as torrents are grouped by ID; e.g. the paper with ID 12,345,678 lives in the torrent covering IDs 12,300,000 to 12,399,999.

The metadata of the torrent is fetched using the DHT (the Distributed Hash Table, the same mechanism that powers magnet links).

Given the metadata of the torrent, scihut can easily locate the ZIP file where the paper can be found, as ZIP files are grouped by ID too; e.g. the same paper lives in the ZIP covering IDs 12,345,000 to 12,345,999 (see the sketch after these steps).
Both github.com/anacrolix/torrent (the BitTorrent library that powers scihut) and archive/zip support the Reader (or ReaderAt) interfaces, which give scihut "random access" to the torrent and to the ZIP file inside it, and hence let it download just the requested PDF with little overhead (sketched below).
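To make the grouping concrete, here is a small sketch of the ID arithmetic behind the two "e.g." steps above; the ranges follow directly from the 100,000-per-torrent and 1,000-per-ZIP grouping, while the actual torrent and ZIP file names come from the asset list and the torrent metadata:

```go
// Locate the torrent and ZIP ID ranges that contain a given paper ID.
package main

import "fmt"

const (
	papersPerTorrent = 100_000
	papersPerZip     = 1_000
)

func locate(paperID int64) (torrentFirst, torrentLast, zipFirst, zipLast int64) {
	torrentFirst = paperID / papersPerTorrent * papersPerTorrent
	torrentLast = torrentFirst + papersPerTorrent - 1
	zipFirst = paperID / papersPerZip * papersPerZip
	zipLast = zipFirst + papersPerZip - 1
	return
}

func main() {
	tf, tl, zf, zl := locate(12_345_678)
	fmt.Printf("torrent covers IDs %d-%d\n", tf, tl) // 12300000-12399999
	fmt.Printf("zip covers IDs %d-%d\n", zf, zl)     // 12345000-12345999
}
```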
On a modest VPS, it takes around 12 seconds to download a paper. The majority of that time is spent discovering and contacting peers.
The BitTorrent protocol is piece-oriented, so data is downloaded and verified in whole pieces rather than arbitrary byte ranges of files. Therefore, regardless of the size of the PDF you request, scihut will (likely) end up downloading 32 MiB (2 x 16 MiB pieces): one piece for the data and another for the metadata (the ZIP central directory).
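For illustration, here is a rough sketch of that random-access path, assuming a recent version of github.com/anacrolix/torrent (in particular that File.NewReader is available). The magnet link, ZIP file name, and member path are placeholders, and the small adapter is just one way to give archive/zip the io.ReaderAt it needs:

```go
package main

import (
	"archive/zip"
	"io"
	"os"
	"sync"

	"github.com/anacrolix/torrent"
)

// seekReaderAt adapts a seekable torrent reader into the io.ReaderAt that
// archive/zip expects; reads are serialized with a mutex.
type seekReaderAt struct {
	mu sync.Mutex
	rs io.ReadSeeker
}

func (r *seekReaderAt) ReadAt(p []byte, off int64) (int, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, err := r.rs.Seek(off, io.SeekStart); err != nil {
		return 0, err
	}
	return io.ReadFull(r.rs, p)
}

func main() {
	client, err := torrent.NewClient(torrent.NewDefaultClientConfig())
	if err != nil {
		panic(err)
	}
	defer client.Close()

	// Fetch the torrent's metadata over the DHT, just like a magnet link would.
	t, err := client.AddMagnet("magnet:?xt=urn:btih:<info-hash of the torrent>")
	if err != nil {
		panic(err)
	}
	<-t.GotInfo()

	// Pick the ZIP file that covers the paper's ID (placeholder name).
	var zf *torrent.File
	for _, f := range t.Files() {
		if f.Path() == "12345000.zip" {
			zf = f
		}
	}
	if zf == nil {
		panic("ZIP file not found in torrent")
	}

	// Open the ZIP lazily: only the central directory and the pieces backing
	// the requested member are actually downloaded.
	zr, err := zip.NewReader(&seekReaderAt{rs: zf.NewReader()}, zf.Length())
	if err != nil {
		panic(err)
	}
	for _, member := range zr.File {
		if member.Name == "10.0001/abcd.pdf" { // placeholder DOI-derived path
			rc, err := member.Open()
			if err != nil {
				panic(err)
			}
			io.Copy(os.Stdout, rc)
			rc.Close()
		}
	}
}
```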
Run make bin/scihut to build scihut.
It is strongly recommended that you run scihut from the root of its repository (i.e. as ./bin/scihut), as scihut expects its assets (the assets/ directory) to be relative to its current working directory; likewise for the maintenance utilities.
You need to build the database before using scihut; it is not included in the repository due to its size. See the maintenance section below for details. You can use the DOI 10.1002/(sici)1097-4628(19960425)60:4<531::aid-app6>3.0.co;2-p for testing if you have not built the database yet.
Usage:

```
$ scihut <doi> <output>
```

where <doi> is the DOI of a paper and <output> is the path where the result will be saved (use - for stdout).
To protect our privacy, we have decided not to maintain scihut, in the hope that one or more forks shall prevail. Think of scihut more as a publicity stunt to prove that it is perfectly possible to decentralize Sci-Hub to ensure its longevity. You may contribute to frrad/skyhub, which aims to be more than a proof-of-concept (but is at a very early stage), or you may fork scihut as your starting point.
Care has been taken to document all the steps for building, developing, using, and updating scihut and/or its assets so that it can survive. It should however be noted that although scihut does not rely on Library Genesis to work, Library Genesis is the only source of updates.
Repository layout:

- assets/: the list of torrents and the database
- the entry point of the program
- bin/: the binaries
- helper functions that can be used by other programs too
- utilities for the maintenance of scihut, as explained below
Run make update_torrents to update the list of torrents.
The database allows scihut to map DOIs to integer IDs. Unfortunately, it is not possible to update the database incrementally, so it has to be rebuilt from scratch every time you want to update it.
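For illustration, a DOI-to-ID lookup against such a database might look roughly like this; the SQLite driver, the database path, and the table and column names below are assumptions for the sketch, not the actual schema:

```go
package main

import (
	"database/sql"
	"fmt"
	"strings"

	_ "github.com/mattn/go-sqlite3" // assumed SQLite driver
)

// lookupID maps a DOI to its integer ID using a hypothetical "papers" table.
func lookupID(db *sql.DB, doi string) (int64, error) {
	var id int64
	// DOIs are case-insensitive, so they are assumed to be stored lowercased.
	err := db.QueryRow("SELECT id FROM papers WHERE doi = ?", strings.ToLower(doi)).Scan(&id)
	return id, err
}

func main() {
	db, err := sql.Open("sqlite3", "assets/database.sqlite3") // hypothetical path
	if err != nil {
		panic(err)
	}
	defer db.Close()

	id, err := lookupID(db, "10.0001/abcd")
	if err != nil {
		panic(err)
	}
	fmt.Println("paper ID:", id)
}
```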
The TSV dump is also malformed, so SQLite's own tab-separated import (.mode tabs with .import) cannot be used; instead, a Go program parses the TSV leniently and imports it into the database.
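A rough sketch of what such a lenient import pass could look like; the bzip2 decompression follows from the dump's .bz2 extension, but the column positions are assumptions, and the actual SQLite insertion is left out:

```go
package main

import (
	"bufio"
	"compress/bzip2"
	"fmt"
	"os"
	"strconv"
	"strings"
)

func main() {
	f, err := os.Open("scimag_files.tsv.bz2")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	sc := bufio.NewScanner(bzip2.NewReader(f))
	sc.Buffer(make([]byte, 0, 1<<20), 1<<20) // allow long rows

	skipped := 0
	for sc.Scan() {
		fields := strings.Split(sc.Text(), "\t")
		if len(fields) < 2 { // malformed row: tolerate it and move on
			skipped++
			continue
		}
		id, err := strconv.ParseInt(fields[0], 10, 64) // assumed: ID in column 0
		if err != nil {
			skipped++
			continue
		}
		doi := strings.ToLower(fields[1]) // assumed: DOI in column 1
		_ = id
		_ = doi
		// ... INSERT the (doi, id) pair into the SQLite database here ...
	}
	if err := sc.Err(); err != nil {
		panic(err)
	}
	fmt.Println("skipped malformed rows:", skipped)
}
```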
Run make update_database to update the database. It might take a while.
Afterwards, you may remove scimag_files.tsv.bz2 if you wish.
scihut - decentralized access to scientific literature
Copyright (C) 2020 scihut developers

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License along with this program. If not, see <https://www.gnu.org/licenses/>.