A script to bulk download files from Wikimedia Commons.
If you don't have all the utilities above installed on your machine, then you really need to get a better shell or rebuild BusyBox or something; they're all pretty basic (other than wget, jq, and xmllint, I suppose).
Download the script directly: https://git.sr.ht/~nytpu/commons-downloader/blob/master/commons-downloader
Or clone the repo:
git clone https://git.sr.ht/~nytpu/commons-downloader
Then either run the script in place, or symlink it into a directory on your $PATH.
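A minimal sketch of both options (the ~/.local/bin location is an assumption; use whichever directory your shell already searches):

```shell
# Run the script in place from the cloned repo:
[ -x ./commons-downloader ] && ./commons-downloader -h

# Or symlink it into a directory on your $PATH:
mkdir -p ~/.local/bin
ln -sf "$(pwd)/commons-downloader" ~/.local/bin/commons-downloader
```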
Usage: commons-downloader [-chns] [-o outdir] [-q query]... [-r file] <category>
  -c          Download all images in a given category.
  -h          Display this help information.
  -n          No output or progress information.
  -o outdir   Download all images to the given directory (will be created).
  -q query    Additional queries to add when downloading from a search.
  -r file     Resume downloading URIs from a given file.
  -s          Download all images from a search for the given category and queries.
  -u agent    Change the user agent to use for requests.
  category    The formal category name you wish to download from.
The main options are -c and -s: -c will download all matches in a category, and
-s will download all matches for a search. They can be combined; the downloaded
files will be deduplicated, so an intersection between them is not an issue.
-r <URL list file> will resume a download given a list of URLs, and is
mutually exclusive with -c and -s.
The URLs for a given download will be automatically saved in
_URLS.txt in the
directory holding the downloaded photos.
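The saved list is plain text with one URL per line, so it's easy to inspect before resuming; this sketch assumes a download directory named snep/, as in the example further down:

```shell
# Count the URLs recorded for a download, if the list exists:
if [ -f snep/_URLS.txt ]; then
    wc -l < snep/_URLS.txt
fi
```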
At least one of -c, -s, or -r is required to be passed.
-q <add'l query> flags can be added when using -s to add additional
queries to a search. They have no effect if -s is not also passed.
For example,
    commons-downloader -s -q Q173651 -q "African Wild Dog" Lycaon pictus
is equivalent to the search
    "Lycaon pictus" OR "Q173651" OR "African Wild Dog"
-o <out directory> will download all files to the given directory, creating
it if necessary.
The current directory is the default if
-o is not passed.
The mandatory argument is a category. If -s is passed it can be an
arbitrary search query, but if -c is passed then it must be an official
Wikimedia Commons category name.
A category can be verified by visiting its page on Wikimedia Commons.
You can often find a new category by going to the bottom of a Wikipedia page
and looking for a box that says:
Wikimedia Commons has media related to: <article name> (category)
You can then click the (category) link to find the Wikimedia Commons
category name.
Download all files in the Panthera uncia category and all results for the search
    "Panthera uncia" OR "Q30197" OR "snow leopard" OR "Uncia uncia"
into the snep/ subdirectory of the current folder:
commons-downloader -cs -o snep -q Q30197 -q "snow leopard" -q "Uncia uncia" Panthera uncia
If the download in the previous command was interrupted, it could be resumed with:
commons-downloader -o snep -r snep/_URLS.txt
The upstream URL of this project is https://sr.ht/~nytpu/commons-downloader. Send suggestions, bugs, patches, and other contributions to ~email@example.com or firstname.lastname@example.org. For help sending a patch through email, see https://git-send-email.io. You can browse the list archives at https://lists.sr.ht/~nytpu/public-inbox.
If you aren't comfortable sending a patch through email, get in touch with a link to your repo and I'll pull the changes down myself!
Written in 2021–2023 by nytpu <alex [at] nytpu.com>
To the extent possible under law, the author(s) have dedicated all copyright and related and neighboring rights to this software to the public domain worldwide. This software is distributed without any warranty.