This is a very small, very simple tool to help you deduplicate images.
Dependencies: Python 3, OpenCV, numpy and the shell tools sort, uniq, cut, find and grep. The display-dupes script also needs feh.
I'm using the GNU versions of the shell utilities with bash. YMMV with others; I haven't checked whether all the options I use are POSIX.
show-dupes maintains a "database" (a file with lines in the format output by fingerprint-files). Its first argument is the database file, followed by zero or more folders, which are processed recursively. Only file paths not already in the database are added. Note that you need to be consistent about how the paths are specified: the filenames in the db are exactly what find outputs, and nothing clever is done to resolve them, so they must match exactly to avoid being re-processed. All files considered to be duplicates are printed; the output is groups of newline-separated filenames, with an empty line between groups.
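The following minimal Python sketch illustrates that output format. It assumes a duplicate means two db entries with exactly the same fingerprint; the text above doesn't say whether show-dupes matches exactly or uses a similarity threshold, so treat that as an assumption. The helper names group_duplicates and format_groups are hypothetical.

```python
from collections import defaultdict

FP_LEN = 64  # fingerprint width: 64 hex characters, then a space, then the filename


def group_duplicates(db_lines):
    """Group filenames whose fingerprints are identical (hypothetical helper).

    Assumes exact fingerprint equality defines a duplicate; groups with a
    single member are dropped.
    """
    groups = defaultdict(list)
    for line in db_lines:
        line = line.rstrip("\n")
        if len(line) <= FP_LEN + 1:
            continue  # skip blank or malformed lines
        fingerprint, filename = line[:FP_LEN], line[FP_LEN + 1:]
        groups[fingerprint].append(filename)
    return [sorted(names) for _, names in sorted(groups.items()) if len(names) > 1]


def format_groups(groups):
    """Render groups as newline-separated filenames with an empty line between groups."""
    return "\n\n".join("\n".join(names) for names in groups)
```

Feeding the lines of a db file through group_duplicates and then format_groups yields output shaped like what show-dupes prints.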
Example usage:
./show-dupes photos/db photos/
# Examine the output, delete any files you don't want, etc.
# ... later you add some more photos
./show-dupes photos/db photos/
# Any new files are processed and duplicates are output again
./show-dupes photos/db
# Any duplicates in the db are output without adding any new files
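The incremental behaviour in this example depends on exact string matching of paths. Here is a minimal sketch of that check, assuming the db line format described above. The helpers known_paths and new_paths are hypothetical, and os.walk stands in for find.

```python
import os

FP_LEN = 64  # width of the hex fingerprint at the start of each db line


def known_paths(db_lines):
    """Return the set of filenames already recorded in the db."""
    paths = set()
    for line in db_lines:
        line = line.rstrip("\n")
        if len(line) > FP_LEN + 1:
            paths.add(line[FP_LEN + 1:])
    return paths


def new_paths(db_lines, folders):
    """List files under `folders` whose path string is not already in the db.

    Paths are built from each folder argument exactly as given (approximating
    what find would print) and compared verbatim; nothing is resolved.
    """
    seen = known_paths(db_lines)
    found = []
    for folder in folders:
        for root, _dirs, files in os.walk(folder):
            for name in files:
                path = os.path.join(root, name)
                if path not in seen:
                    found.append(path)
    return sorted(found)
```

Because the comparison is verbatim, photos/a.jpg and ./photos/a.jpg are different strings, so spelling the folder differently on a later run would cause files to be processed again.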
That should be all you need to find duplicates. But read on if you want more info.
clean-db deletes entries from the database whose files no longer exist on disk.
./clean-db photos/db photos/
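A minimal sketch of the clean-up step, assuming the same db line format. The example above also passes a folder; its role isn't described, so this sketch takes only the db path. clean_db_lines and clean_db are hypothetical names, and writing to a temporary file before os.replace is a choice of the sketch so that an interrupted run doesn't leave a truncated db.

```python
import os

FP_LEN = 64  # hex fingerprint width at the start of each db line


def clean_db_lines(db_lines, exists=os.path.exists):
    """Keep only db lines whose filename still exists on disk.

    `exists` is injectable so the filter can be tested without real files.
    """
    kept = []
    for line in db_lines:
        filename = line.rstrip("\n")[FP_LEN + 1:]
        if filename and exists(filename):
            kept.append(line)
    return kept


def clean_db(db_path):
    """Rewrite the db file, dropping entries for missing files; return the count kept."""
    # surrogateescape lets non-UTF-8 filenames round-trip unchanged
    with open(db_path, encoding="utf-8", errors="surrogateescape") as f:
        kept = clean_db_lines(f)
    tmp_path = db_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8", errors="surrogateescape") as f:
        f.writelines(kept)
    os.replace(tmp_path, db_path)  # swap the new file into place
    return len(kept)
```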
display-dupes uses feh to display detected duplicates. Each group of duplicates is shown in a single multiwindow feh invocation. You can use feh's built-in functions to delete some of the files if desired. Closing all the feh windows (or choosing exit from the feh menu) moves on to the next group.
./show-dupes photos/db | ./display-dupes
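The display loop can be sketched in Python as follows, assuming feh's --multiwindow option and that each feh process exits once all its windows are closed, as described above. parse_groups, feh_command and display_groups are hypothetical helpers; run is injectable so the loop can be exercised without launching feh.

```python
import subprocess


def parse_groups(text):
    """Split show-dupes output into groups; empty lines separate groups."""
    groups, current = [], []
    for line in text.splitlines():
        if line:
            current.append(line)
        elif current:
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups


def feh_command(group):
    """Build a feh invocation that opens each file in its own window."""
    return ["feh", "--multiwindow", *group]


def display_groups(groups, run=subprocess.run):
    """Show groups one at a time; subprocess.run blocks until feh exits."""
    for group in groups:
        run(feh_command(group))
```

For example, a script ending with display_groups(parse_groups(sys.stdin.read())) could stand in for display-dupes in the pipeline above.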
fingerprint-files.py reads a list of newline-separated filenames from stdin and writes the fingerprint and the filename to stdout: the fingerprint is 64 hex characters, followed by a space and then the filename (verbatim as read). Filenames containing newlines won't work (don't do that!). If OpenCV can't read a file, an error is printed to stderr, but processing otherwise continues.
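The record format is simple enough to pin down in a few lines. In this sketch, format_record and parse_record are hypothetical. It splits on the first space only, so filenames containing spaces survive, and rejects filenames with newlines. Accepting either hex case is an assumption, since the case isn't specified.

```python
import re

# 64 hex digits; either case is accepted (an assumption of this sketch)
FINGERPRINT_RE = re.compile(r"[0-9a-fA-F]{64}")


def format_record(fingerprint, filename):
    """Build one output line: 64 hex chars, a space, then the filename verbatim."""
    if not FINGERPRINT_RE.fullmatch(fingerprint):
        raise ValueError("fingerprint must be 64 hex characters")
    if "\n" in filename:
        raise ValueError("filenames containing newlines are not supported")
    return f"{fingerprint} {filename}\n"


def parse_record(line):
    """Inverse of format_record; splits on the first space only."""
    fingerprint, sep, filename = line.rstrip("\n").partition(" ")
    if not sep or not FINGERPRINT_RE.fullmatch(fingerprint):
        raise ValueError(f"malformed record: {line!r}")
    return fingerprint, filename
```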
The fingerprint is generated using the same method as findimagedupes. Note that although the method is the same, this implementation uses OpenCV rather than ImageMagick, so its fingerprints are not comparable with those from findimagedupes.
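To show why a fingerprint can be exactly 64 hex characters: a 16×16 grid of 1-bit pixels is 256 bits, which is 32 bytes or 64 hex digits. The sketch below is a deliberately simplified stand-in, not the findimagedupes or fingerprint-files.py pipeline. It assumes a 16×16 bitmap, works on a plain 2D list of grayscale values, block-averages it down to 16×16, thresholds at the mean and packs the bits. Any blurring or contrast normalisation the real method performs is omitted, and the helper names are hypothetical.

```python
SIZE = 16  # assumed output grid: 16 x 16 bits = 256 bits = 64 hex chars


def downsample(pixels, size=SIZE):
    """Block-average a 2D grayscale grid (list of rows) down to size x size.

    Assumes both dimensions are at least `size`; when they aren't multiples
    of `size`, block boundaries are spread as evenly as integer division allows.
    """
    h, w = len(pixels), len(pixels[0])
    out = []
    for i in range(size):
        r0, r1 = i * h // size, (i + 1) * h // size
        row = []
        for j in range(size):
            c0, c1 = j * w // size, (j + 1) * w // size
            block = [pixels[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def fingerprint(pixels):
    """Threshold the downsampled grid at its mean and pack the bits as hex."""
    small = downsample(pixels)
    flat = [v for row in small for v in row]
    mean = sum(flat) / len(flat)
    value = 0
    for v in flat:
        value = (value << 1) | (1 if v > mean else 0)
    return f"{value:0{len(flat) // 4}x}"  # 256 bits -> 64 zero-padded hex digits
```

Loading real pixels would need an image library (the tool itself uses OpenCV); this sketch only illustrates the reduce, threshold and pack idea.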