Since starting the cleanup for this project, I have discovered momento, Gwern's page about archiving URLs, and LinkChecker. I need a more comprehensive solution than this program, so I'm abandoning it.
plr is a command-line utility that helps prevent link rot by archiving linked pages at the Internet Archive. It currently works only on Markdown files.
    usage: plr.py [-h] [-i INPUT] [-o OUTPUT] [-l] ...

    optional arguments:
      -h, --help            show this help message and exit
      -i INPUT, --input-file INPUT
                            specify an input file (default: stdin)
      -o OUTPUT, --output-file OUTPUT
                            specify an output file (default: stdout)
      -l, --list            output a list of archive links instead of replacing
                            them in the text
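To illustrate the two output modes described in the usage text (replacing links in place versus listing them with `-l`), here is a minimal sketch. It is not plr's actual implementation: the regular expression, the function names, and the use of the `https://web.archive.org/web/` prefix (which redirects to the Wayback Machine's most recent snapshot of a page) are assumptions made for the example, and no network request is made.

```python
import re

# Inline Markdown links of the form [text](http://example.com).
# Simplified assumption: reference-style links and URLs that contain
# ")" or whitespace are not handled.
LINK_RE = re.compile(r'\[([^\]]*)\]\((https?://[^)\s]+)\)')

# Assumed prefix: the Wayback Machine resolves this to its latest snapshot.
WAYBACK_PREFIX = "https://web.archive.org/web/"


def archive_url(url):
    """Build the archive address for a single absolute URL."""
    return WAYBACK_PREFIX + url


def list_archive_links(text):
    """Hypothetical equivalent of --list: return archive links only."""
    return [archive_url(m.group(2)) for m in LINK_RE.finditer(text)]


def replace_links(text):
    """Hypothetical default mode: rewrite each link target in the text."""
    def swap(match):
        return "[%s](%s)" % (match.group(1), archive_url(match.group(2)))
    return LINK_RE.sub(swap, text)


if __name__ == "__main__":
    sample = "See [docs](http://example.com/a) and [local](/about)."
    print(replace_links(sample))
    print(list_archive_links(sample))
```

Relative links such as `/about` are left untouched in this sketch because the pattern only matches absolute `http` and `https` URLs.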
This is a fork of schollz/prevent-link-rot.
add the archive links to the Markdown content as comments so the admin can un-comment them as needed
automatically ignore links that don't work (like nytimes) or provide alternative methods for them
allow domain whitelists (so you aren't archiving links to your own website automatically)
detect relative links and resolve them against the site's base URL so they can be converted (needs a command-line option for the base URL)
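The domain-whitelist and relative-link items above could be approached roughly as follows. This is only a sketch of one possible design, not existing plr behaviour: the function names `resolve` and `should_archive` and the `skip_domains` parameter are hypothetical, and the base URL is assumed to come from a command-line option that does not exist yet.

```python
from urllib.parse import urljoin, urlparse


def resolve(link, base_url):
    """Turn a relative link into an absolute one using the base URL.

    Absolute links are returned unchanged by urljoin.
    """
    return urljoin(base_url, link)


def should_archive(url, skip_domains):
    """Return False for URLs on a listed domain or any of its subdomains."""
    host = urlparse(url).hostname or ""
    return not any(
        host == domain or host.endswith("." + domain)
        for domain in skip_domains
    )


if __name__ == "__main__":
    base = "https://example.org/blog/post.html"
    for link in ["/about", "img/a.png", "https://other.com/x"]:
        absolute = resolve(link, base)
        print(absolute, should_archive(absolute, ["example.org"]))
```

Matching on the parsed hostname rather than on a substring of the URL avoids skipping unrelated sites whose names merely contain the listed domain.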