StarDict is offline dictionary software developed by Hu Zheng. It has a GUI version and a terminal version, SDCV (StarDict Console Version). Hu Zheng hosted a large archive of dictionaries for download at http://download.huzheng.org/. The website stopped being reachable around November 2023.
To save the archive, the data was scraped from the Wayback Machine, downloaded and parsed into maintainable text files that a bash script converts to HTML files. The project's intention is to mirror the dictionary files and to be easy to migrate and deploy on any webserver.
Its repository can be found at https://git.sr.ht/~rostiger/stardict/
To generate the website from the source files, a bash script is run that requires bash 4.0 or newer, as well as rsync (tested with 3.2.7).
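A quick sanity check along these lines can confirm the prerequisites before running the scripts (a minimal sketch, not part of the repository):

```sh
#!/usr/bin/env bash
# Sketch: verify the prerequisites mentioned above before running the scripts.
if (( BASH_VERSINFO[0] < 4 )); then
    echo "bash 4.0 or newer is required" >&2
    exit 1
fi
command -v rsync >/dev/null 2>&1 || { echo "rsync is required" >&2; exit 1; }
```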
There are a total of four scripts:
getDirs.sh
extract.sh
convert.sh
generate
getDirs connects to the Wayback Machine and downloads the index.html files from the directories supplied in dirs.txt. These files contain the original HTML code as well as the names of the linked dictionary files. It also mirrors the directory structure of the original website.
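A minimal sketch of such a download loop, assuming dirs.txt lists one directory path per line and using a placeholder Wayback Machine timestamp (the real getDirs.sh may work differently):

```sh
#!/usr/bin/env bash
# Sketch: fetch the archived index.html for every directory listed in dirs.txt.
# "2023" is a placeholder snapshot timestamp; the Wayback Machine redirects it
# to the nearest capture of the original URL.
while read -r dir; do
    mkdir -p "$dir"                  # recreate the original directory structure
    wget -O "$dir/index.html" \
        "https://web.archive.org/web/2023/http://download.huzheng.org/$dir"
done < dirs.txt
```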
After the index.html files have been downloaded and the directory structure has been created, you can use extract to download the dictionary files from the Wayback Machine. This might take multiple attempts, as the archive totals about 5 GB and the connection is flaky.
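A sketch of how the dictionary files could be re-downloaded with resume and retry support; the href pattern and URL layout are assumptions, not necessarily what extract.sh actually does:

```sh
#!/usr/bin/env bash
# Sketch: download the dictionary archives referenced by the saved index.html files.
find . -name index.html | while read -r index; do
    dir=$(dirname "$index"); dir=${dir#./}
    grep -o 'href="[^"]*\.tar\.[a-z0-9]*"' "$index" | cut -d'"' -f2 |
    while read -r file; do
        # -c resumes interrupted downloads; --tries retries over the flaky connection
        wget -c --tries=10 -P "$dir" \
            "https://web.archive.org/web/2023/http://download.huzheng.org/$dir/$file"
    done
done
```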
Furthermore, the convert script can be executed to convert the index.html files to stripped-down index_converted.md files.
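As a rough illustration of the kind of stripping involved (the actual conversion rules in convert.sh are likely more involved):

```sh
#!/usr/bin/env bash
# Sketch: reduce each saved index.html to a plain index_converted.md next to it.
# A naive tag-stripping pass; the real script's output format is an assumption here.
find . -name index.html | while read -r index; do
    sed -e 's/<[^>]*>//g' -e '/^[[:space:]]*$/d' "$index" \
        > "$(dirname "$index")/index_converted.md"
done
```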
The index.md files used to generate the website were written with the index_converted.md files as a starting point; everything else was written by hand.
generate uses the index.md files, builds them into static HTML files and puts them in the dst directory. Use generate -s to sync the dictionary files in the src directory with the dst directory. Use generate -d to deploy the dst directory to the webserver (make sure to adjust the webserver details for your ssh setup).
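A typical workflow, assuming the scripts are run from the repository root, might look like this:

```sh
./generate        # build the static HTML files from the index.md files into dst/
./generate -s     # sync the dictionary files from src/ into dst/ via rsync
./generate -d     # deploy dst/ to the webserver over ssh
```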
Note: you will likely only ever need to use the generate script, as the repository already contains all the scraped data and the manually edited index.md files. The other scripts and files are there in case the archive needs to be scraped from the Wayback Machine again.