#Stardict Dictionary Files Mirror

StarDict is offline dictionary software developed by Hu Zheng. It has a GUI version and a terminal version, SDCV (StarDict Console Version). Hu Zheng hosted a large archive of dictionaries for download at http://download.huzheng.org/. The website has been unreachable since around November 2023.

To preserve the archive, the data was scraped from the Wayback Machine, downloaded and parsed into maintainable text files that a bash script converts to HTML files. The project's intention is to mirror the dictionary files and to be easy to migrate and deploy on any webserver.

Its repository can be found at https://git.sr.ht/~rostiger/stardict/

#Dependencies

The website is generated from the source files by a bash script that requires bash 4.0 or later, as well as rsync (tested with 3.2.7).
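
A quick way to check that suitable versions are installed (assuming a typical GNU/Linux environment where both tools are on the PATH):

    # print the installed bash and rsync versions
    bash --version | head -n 1
    rsync --version | head -n 1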

#Scripts

There are four scripts in total:

  • getDirs.sh
  • extract.sh
  • convert.sh
  • generate

getDirs connects to the Wayback Machine and downloads the index.html files for the directories listed in dirs.txt. These files contain the original HTML code as well as the names of the linked dictionary files. The script also mirrors the directory structure of the original website.
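
The following is a minimal sketch of that idea, not the actual getDirs.sh: the one-path-per-line dirs.txt format, the snapshot prefix and the Wayback Machine URL layout are assumptions for illustration.

    #!/usr/bin/env bash
    # For every directory listed in dirs.txt (assumed: one path per line),
    # fetch its index.html from the Wayback Machine and mirror the
    # directory structure locally under src/.
    set -euo pipefail

    SNAPSHOT="2023"                       # assumed snapshot prefix
    BASE="http://download.huzheng.org"

    while IFS= read -r dir; do
        [ -z "$dir" ] && continue         # skip empty lines
        mkdir -p "src/$dir"               # mirror the directory structure
        curl -fL -o "src/$dir/index.html" \
            "https://web.archive.org/web/${SNAPSHOT}/${BASE}/${dir}/"
    done < dirs.txt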

After the index.html files have been downloaded and the directory structure has been created, you can use extract to download the dictionary files from the Wayback Machine. This might take multiple attempts, as the archive totals about 5 GB and the connection is flaky. The convert script can then be executed to turn the index.html files into stripped-down index_converted.md files. The index.md files used to generate the website were written with the index_converted.md files as a starting point; everything else was written by hand.
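
As a minimal sketch of the re-run idea behind extract (not the actual script; the files.txt list of dictionary paths is hypothetical, as are the snapshot prefix and URL layout):

    #!/usr/bin/env bash
    # Download each dictionary file that is still missing, so the script
    # can simply be re-run until the whole archive is present.
    set -euo pipefail

    SNAPSHOT="2023"                       # assumed snapshot prefix
    BASE="http://download.huzheng.org"

    while IFS= read -r file; do
        [ -s "src/$file" ] && continue    # already present, skip
        curl -fL --retry 3 -o "src/$file" \
            "https://web.archive.org/web/${SNAPSHOT}/${BASE}/${file}" || true
    done < files.txt                      # hypothetical list of file paths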

generate uses the index.md files to build static HTML files and puts them in the dst directory. Use generate -s to sync the dictionary files in the src directory with the dst directory. Use generate -d to deploy the dst directory to the webserver (make sure to update the webserver details in the script to match your SSH setup). Note: you will likely only ever need the generate script, as the repository already contains all the scraped data and the manually edited index.md files. The other scripts and files are there in case the archive needs to be scraped from the Wayback Machine again.
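
A typical sequence, assuming the script is run from the repository root, could look like this:

    ./generate       # build the static HTML files from the index.md files into dst
    ./generate -s    # sync the dictionary files from src into dst
    ./generate -d    # deploy dst to the webserver over SSH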
