~wrycode/plr

fee39a65272ef111d77f13c3b6d5e9371accb731 — Nick Econopouly 5 months ago 69a8106 master
Archive project
2 files changed, 22 insertions(+), 47 deletions(-)

M README.md
M plr.py
M README.md => README.md +22 -46
@@ -1,65 +1,41 @@
-# Description
-
-plr is a command-line utility that can help prevent link rot by
-automatically archiving links to the [Internet
-Archive](https://archive.org). It works on markdown or plain text.
-
-# Installation
-
-TODO: AUR package, double check the following commands:
+# Archived Project [August 2019]

-```bash
-$ virtualenv -p /usr/bin/python venv
-$ source venv/bin/activate
-$ pip install -r requirements.txt
-```
+Since starting the cleanup for this project, I have discovered
+[momento](http://timetravel.mementoweb.org/about/), [Gwern's page
+about archiving URLs](https://www.gwern.net/Archiving-URLs), and
+[LinkChecker](https://wummel.github.io/linkchecker/). I need a more
+comprehensive solution than this program, so I'm abandoning it.

-# Usage
+# Description

-```bash
-$plr -if inputfile.md -of outputfile.md
-```
+plr is a command-line utility that can help prevent link rot by
+archiving links to the [Internet Archive](https://archive.org). It
+currently works on markdown files.

-plr's file arguments are optional; it defaults to standard input and
-output.
+# Usage

-TODO: to return just a list of links, add the command-line flag `-list`
+    usage: plr.py [-h] [-i INPUT] [-o OUTPUT] [-l] ...

+    optional arguments:
+      -h, --help            show this help message and exit
+      -i INPUT, --input-file INPUT
+                            specify an input file (default: stdin)
+      -o OUTPUT, --output-file OUTPUT
+                            specify an output file (default: stdout)
+      -l, --list            output a list of archive links instead of replacing
+                            them in the text
 # About
-
-**Link rot** is real, it is common, and it is pervasive. What is
-**link rot**? Link rot is essentially *the process by which hyperlinks
-cease to function, usually because the web page or server they point
-to has moved or has become permanently unavailable*. It has existed
-since the internet began. An example of its influence can be seen as
-recently as 2000, when [a
-study](http://dx.doi.org/10.1002/bmb.2003.494031010165) found that
-within 24 months, 50% of .com domains and 20% of .gov domains were no
-longer viable. [1] [Another
-study](http://dx.doi.org/10.1017/S1472669614000255) found that 70% of
-links in Harvard Law Review and 50% of links within the United States
-Supreme Court opinions are no longer viable. [2] If we are to take the
-internet as a primary resource seriously, then we seriously need to
-think about undertaking a better way of long-term preservation of link
-contents.

-This is a fork of
-[schollz/prevent-link-rot](https://github.com/schollz/prevent-link-rot).
-Full credit for the original library and logic should go to
-[schollz](https://github.com/schollz). This fork cannibalizes the
-project with a [KISS](https://en.wikipedia.org/wiki/KISS_principle)
-attitude. The goal is to create a small but useful command-line
-utility and an accompanying library that can be integrated into other
-projects.

 ## Todo

 - return a list of links instead of the full markdown contents with the `-list` command-line flag

 - add the archive links *commented out* to the markdown content so the admin can un-comment as needed

 - automatically ignore links that don't work (like nytimes) or provide alternative methods for them

 - allow domain whitelists (so you aren't archiving links to your own website automatically)

-- Detect relative links and fill in the original address to be able to convert (need a command-line option for the baseurl)
+- detect relative links and fill in the original address to be able to
+  convert (need a command-line option for the baseurl)
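
The help text added above corresponds to invocations like the following. This is only a sketch: the file names are made up, and, per the option descriptions, plr falls back to stdin and stdout when `-i`/`-o` are omitted.

```bash
# Hypothetical examples based on the --help text in the archived README.
python plr.py -i notes.md -o notes-archived.md    # read one file, write another
python plr.py -l < notes.md > archive-links.txt   # stdin in, list of archive links out
```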

M plr.py => plr.py +0 -1
@@ -16,7 +16,6 @@ def parseargs():
     parser.add_argument('-l', '--list', action='store_true',
                         help='output a list of archive links instead of replacing them \
                         in the text')
-    parser.add_argument('urls', nargs=argparse.REMAINDER)
     return parser.parse_args()

 def main():
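
Both versions of the README describe plr as archiving links to the Internet Archive. As a rough illustration of what that involves (this is not code from this repository, and it assumes the Wayback Machine's public "Save Page Now" endpoint, `https://web.archive.org/save/<url>`, which redirects to the newly created snapshot):

```bash
# Rough sketch, not plr's code: ask the Wayback Machine to archive one URL.
# -L follows the redirect to the snapshot; -w prints the final (snapshot) URL.
curl -sL -o /dev/null -w '%{url_effective}\n' \
    "https://web.archive.org/save/https://example.com"
```

Judging from the `--list` option above, plr's default mode performs an equivalent request for each link it finds and substitutes the resulting archive URL back into the text.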