date: "2020-11-18T18:31:15-08:00" description: Efficient redundancy via repository mirroring with nothing but git. outputs:
This is Part 1 of a series called Resilient Git.
The most important part of a project is its code. Resilient projects should have their code in multiple places of equal weight so that work continues normally if a single remote goes down.
Many projects already do something similar: they have one "primary" remote and several mirrors. I'm suggesting something different. Treating a remote as a "mirror" implies that the remote is a second-class citizen. Mirrors are often out of date and aren't usually the preferred place to fetch code. Instead of setting up a primary remote and mirrors, I propose hydra hosting: setting up multiple primary remotes of equal status and pushing to/fetching from them in parallel.
Having multiple primary remotes of equal status might sound like a bad idea. If there are multiple remotes, how do people know which one to use? Where do they file bug reports, get code, or send patches? Do maintainers need to check multiple places?
No. Of course not. A good distributed system should automatically keep its nodes in sync to avoid the hassle of checking multiple places for updates.
This process should be pretty straightforward. You can run `git remote add` (see `git-remote(1)`) or edit your repo's `.git/config` directly:
[remote "origin"]
	url = firstname.lastname@example.org:~seirdy/seirdy.one
	fetch = +refs/heads/*:refs/remotes/origin/*
[remote "gl_upstream"]
	url = email@example.com:seirdy/seirdy.one.git
	fetch = +refs/heads/*:refs/remotes/gl_upstream/*
[remote "gh_upstream"]
	url = firstname.lastname@example.org:seirdy/seirdy.one.git
	fetch = +refs/heads/*:refs/remotes/gh_upstream/*
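For comparison, here is a rough sketch of the `git remote add` route, which should produce the same three `[remote ...]` sections. It runs inside a throwaway repository (an assumption made so the commands are safe to try anywhere), and the URLs are copied verbatim from the config above; substitute your own.

```shell
# Create a scratch repository so nothing touches a real project.
repo="$(mktemp -d)"
git init --quiet "$repo"
cd "$repo"

# Each command writes one [remote "<name>"] section to .git/config.
# URLs are taken as-is from the config shown above; replace with your own.
git remote add origin      'firstname.lastname@example.org:~seirdy/seirdy.one'
git remote add gl_upstream 'email@example.com:seirdy/seirdy.one.git'
git remote add gh_upstream 'firstname.lastname@example.org:seirdy/seirdy.one.git'

# List the configured remotes with their fetch and push URLs.
git remote -v
```

`git remote add` does not contact the server, so a typo in a URL only shows up on the first fetch or push.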
If that's too much work--a perfectly understandable complaint--automating the process is trivial. Here's an example from my dotfiles.
Having multiple remotes is fine, but pushing to and fetching from all of them can be slow. Two simple git aliases fix that:
[alias]
	pushall = !git remote | grep -E 'origin|upstream' | xargs -L1 -P 0 git push --all --follow-tags
	fetchall = !git remote | grep -E 'origin|upstream' | xargs -L1 -P 0 git fetch
`git pushall` and `git fetchall` push to and fetch from every remote whose name matches `origin` or `upstream`, in parallel. Only one remote needs to be online for project members to keep working.
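To see what the aliases would do before running them for real, the pipeline can be previewed with `echo` in place of the actual push. This is only a sketch: `preview_push` is a hypothetical helper name, the list of remotes is fed in by hand instead of coming from `git remote`, `backup` is a made-up remote used to show the filter, and `-P 0` is left out so the output order stays predictable.

```shell
# Hypothetical helper: reads remote names on stdin, keeps the ones the
# aliases would keep, and prints the push command for each instead of running it.
preview_push() {
  grep -E 'origin|upstream' | xargs -L1 echo git push --all --follow-tags
}

# Stand-in for the output of `git remote`; "backup" is a made-up name.
printf '%s\n' origin gl_upstream gh_upstream backup | preview_push
# git push --all --follow-tags origin
# git push --all --follow-tags gl_upstream
# git push --all --follow-tags gh_upstream
```

The `grep` is a substring match, so any remote with `origin` or `upstream` anywhere in its name is included; `backup` is skipped. In the real aliases, `-P 0` asks `xargs` to run as many of these commands at once as it can, which is an extension found in GNU and BSD `xargs` and not part of POSIX.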
I'd recommend advertising at least three remotes in your README: your personal favorite and two determined by popularity. Tell users to run `git remote set-url` to switch remote locations if one goes down.
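For a user with a single `origin`, the switch might look like the sketch below. It uses a throwaway repository so it is safe to run, and it reuses two of the URLs from the config earlier in this post as the "down" and "still up" locations; which remote is actually down is, of course, an assumption for the example.

```shell
# Scratch repository standing in for a user's clone.
repo="$(mktemp -d)"
git init --quiet "$repo"
cd "$repo"

# The clone originally points at one remote location...
git remote add origin 'firstname.lastname@example.org:~seirdy/seirdy.one'

# ...and is repointed at another when the first one goes down.
git remote set-url origin 'email@example.com:seirdy/seirdy.one.git'

# Confirm where origin points now.
git remote get-url origin
```

Existing branches and remote-tracking refs stay as they are; only the URL that `origin` resolves to changes.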
Q: Why not use a cloud service to automate mirroring?
A: Such a setup depends upon the cloud service and a primary repo for that service to watch, defeating the purpose (resiliency). Hydra hosting automates this without introducing new tools, dependencies, or closed platforms to the mix.
Q: What about issues, patches, etc.?
A: Stay tuned for Parts 2 and 3, coming soon to a weblog/gemlog near you™.
Q: Why did you call this "hydra hosting"?
A: It's a reference to the Hydra of Lerna from Greek mythology, famous for keeping its brain in a nested RAID array to protect against disk failures and beheading. It could also be a reference to a fictional organization of the same name from Marvel Comics, named after the Greek monster for similar reasons.