~adnano/adnano.co

6679e3ad28d9672ab147de372f470cc9ad84032e — Adnan Maolood 16 days ago 3b0d3aa
Debugging a Git clone issue
2 files changed, 179 insertions(+), 0 deletions(-)

A content/2022-11-12-git-clone.md
A static/media/2022-11-12-clone-button.png
A content/2022-11-12-git-clone.md => content/2022-11-12-git-clone.md +179 -0
@@ 0,0 1,179 @@
---
title: Debugging a Git clone issue
---

In January of 2022, an update to [git.sr.ht] added the ability to create new Git repositories by cloning an existing repository[^1]. The feature takes a URL to clone from, initializes the repository, and completes the clone in the background. The feature was implemented with [go-git], a Git implementation in Go.

[^1]: Note that cloning repositories within the same git.sr.ht instance was already implemented. This was a generalization of that feature to allow cloning external repositories as well.

[git.sr.ht]: https://git.sr.ht
[go-git]: https://github.com/go-git/go-git

![Screenshot of the clone web interface](/media/2022-11-12-clone-button.png)

After a while, we started receiving reports from users who were unable to clone repositories that had been created with the new feature. Running `git clone` against these repositories failed with a cryptic error. Oddly enough, the error only appeared when cloning over HTTPS, not over SSH.

	$ git clone https://git.sr.ht/~user/goguma
	Cloning into 'goguma'...
	fatal: expected 'packfile'

Our first thought was that the cloned repositories produced by go-git were somehow invalid. We [reported the issue][issue] on the go-git issue tracker, but the maintainers were not responsive. Meanwhile, we continued investigating internally.

[issue]: https://github.com/go-git/go-git/issues/528

As we investigated the problem, we noticed that clones would start working again after a few days. We attributed this to `git gc`, which runs periodically for all Git repositories on git.sr.ht. Running `git gc` manually on an affected repository confirmed that it fixes the issue.

Let's debug the issue. We'll start by cloning the repository again but with verbose output.

	$ git clone -v https://git.sr.ht/~user/goguma
	Cloning into 'goguma'...
	POST git-upload-pack (175 bytes)
	POST git-upload-pack (452 bytes)
	fatal: expected 'packfile'

Not very helpful. Let's increase the verbosity.

	$ git clone -vv https://git.sr.ht/~user/goguma
	Cloning into 'goguma'...
	POST git-upload-pack (175 bytes)
	want 49de801ae8ac0865e4fef50a311ba44b36a52250 (HEAD)
	want 49de801ae8ac0865e4fef50a311ba44b36a52250 (refs/heads/master)
	want 9768a49c170142b888c8980944303c2ba794a826 (refs/tags/v0.1.0)
	want 7303c46eb27ac22b5de34fb8d867d82d7d06121f (refs/tags/v0.2.0)
	want e7e6a1bf11431a37f45ff9cb1abd90bec9124b74 (refs/tags/v0.3.0)
	want aa9980534db4bd25e2b78d360f7170e21ca01c21 (refs/tags/v0.4.0)
	want 1638a79dcc58127a08f3d81732169b536f6f5546 (refs/tags/v0.4.1)
	POST git-upload-pack (452 bytes)
	fatal: expected 'packfile'

This is a little more helpful. We can also inspect the Git packets with `GIT_TRACE_PACKET`.

	$ env GIT_TRACE_PACKET=1 git clone -v https://git.sr.ht/~user/goguma
	Cloning into 'goguma'...
		packet:          git< version 2
		...
	POST git-upload-pack (175 bytes)
		packet:          git> 0002
		packet:        clone< 49de801ae8ac0865e4fef50a311ba44b36a52250 HEAD symref-target:refs/heads/master
		...
	POST git-upload-pack (467 bytes)
		packet:          git> 0002
		packet:        clone< 0002
	fatal: expected 'packfile'

Compare this to the output for a successful clone:

	$ env GIT_TRACE_PACKET=1 git clone -v https://git.sr.ht/~emersion/goguma
	Cloning into 'goguma'...
		packet:          git< version 2
		...
	POST git-upload-pack (175 bytes)
		packet:          git> 0002
		packet:        clone< 49de801ae8ac0865e4fef50a311ba44b36a52250 HEAD symref-target:refs/heads/master
		...
	POST git-upload-pack (gzip 1117 to 597 bytes)
		packet:        clone< packfile
		packet:     sideband< PACK ...
		packet:     sideband< 0000
		packet:          git> 0002
		packet:        clone< 0002

Notice how the failed clone is completely missing the `packfile` packet. That's why the clone fails with `fatal: expected 'packfile'`. The odd thing is that the git-upload-pack endpoint returns a status code of 200 and there are no errors in the logs. It is failing silently.
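
The comparison we just did by eye can also be scripted. Here is a minimal sketch, assuming the trace lines look like the ones shown above. The helper name `has_packfile_packet` and the file name `trace.log` are made up for illustration.

```shell
# has_packfile_packet: read a GIT_TRACE_PACKET log on stdin and succeed only
# if it contains a received "packfile" packet, as in the successful clone.
# Hypothetical helper; assumes lines of the form "packet: <who>< packfile".
has_packfile_packet() {
	grep -Eq 'packet: +[a-z-]+< packfile$'
}

# Example usage (GIT_TRACE_PACKET=1 writes the trace to stderr):
#   GIT_TRACE_PACKET=1 git clone https://git.sr.ht/~user/goguma 2>trace.log
#   has_packfile_packet <trace.log || echo "no packfile packet received"
```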

In an attempt to reproduce the issue, I wrote a simple script which would clone a repository with go-git, serve the cloned repository with nginx and git-http-backend, and then clone it again with `git clone`. Surprisingly, the clone succeeded! I was unable to reproduce the issue this way.

I thought there must be something else at play. I obtained two tarballs of an affected repository from production, one taken before `git gc` had run and one after. I extracted the tarballs and inspected both copies of the repository, expecting to find something wrong. But nothing was obviously wrong.

I decided to try to reproduce the issue again. I spun up an Alpine Linux image in qemu and installed meta.sr.ht and git.sr.ht from packages. I edited the nginx configuration so that I could connect over HTTP. I forwarded port 80 from the guest to the host. I then created a user, logged in, and cloned a repository from the web interface. This time, I was able to reproduce the issue.

	$ git clone http://git.sr.ht.local/~user/goguma
	fatal: expected 'packfile'

Now I needed to determine the cause. I compared the nginx configuration to the one I had used previously. Eventually, I narrowed it down to one line: the fcgiwrap socket path.

	fastcgi_pass unix:/run/fcgiwrap/fcgiwrap.sock;

In my previous attempt to reproduce the issue, I had created an fcgiwrap socket manually instead of relying on the socket created by OpenRC. Let's take a look at the fcgiwrap init script used by OpenRC at `/etc/init.d/fcgiwrap`:

	$ cat /etc/init.d/fcgiwrap
	#!/sbin/openrc-run
	
	name="fcgiwrap"
	description="fcgiwrap cgi daemon"
	
	command="/usr/bin/fcgiwrap"
	command_background="yes"
	user="fcgiwrap"
	group="www-data"
	: ${socket:=unix:/run/fcgiwrap/fcgiwrap.sock}
	
	...

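As you can see, OpenRC runs fcgiwrap as the user `fcgiwrap` and the group `www-data`. nginx is also in the `www-data` group, so it has access to the fcgiwrap socket. Anything fcgiwrap executes, including git-http-backend, therefore runs as `fcgiwrap` rather than as the owner of the repositories.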

Perhaps the issue has to do with permissions. The Git repositories are stored in the `/var/lib/git` directory, which is owned by the user `git`. Let's run fcgiwrap as the user `git` and see what happens.

	su git -c 'fcgiwrap -f -s unix:/tmp/fcgiwrap.sock' &
	chgrp www-data /tmp/fcgiwrap.sock
	chmod g+w /tmp/fcgiwrap.sock
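
One caveat with the commands above: fcgiwrap is started in the background, so `chgrp` can run before the socket exists. A small wait loop avoids that race. This is a sketch for a POSIX shell, and `wait_for_socket` is a hypothetical helper, not something shipped with fcgiwrap.

```shell
# wait_for_socket PATH [TIMEOUT]: poll until PATH exists and is a socket.
# Returns 0 once the socket appears, or 1 after TIMEOUT seconds (default 5).
wait_for_socket() {
	sock=$1
	remaining=${2:-5}
	while [ ! -S "$sock" ]; do
		[ "$remaining" -gt 0 ] || return 1
		sleep 1
		remaining=$((remaining - 1))
	done
	return 0
}

# Example usage:
#   wait_for_socket /tmp/fcgiwrap.sock && chgrp www-data /tmp/fcgiwrap.sock
```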

The nginx configuration needs to be edited to point at our new fcgiwrap socket. Now we can try cloning again.

	$ git clone http://git.sr.ht.local/~user/goguma
	Cloning into 'goguma'...
	remote: Enumerating objects: 4077, done.
	remote: Total 4077 (delta 0), reused 0 (delta 0), pack-reused 4077
	Receiving objects: 100% (4077/4077), 630.40 KiB | 2.49 MiB/s, done.
	Resolving deltas: 100% (2970/2970), done.

This works! But why is this an issue in the first place? `/var/lib/git` should be accessible to other users. Let's take a look at the problematic repository.

	$ cd /var/lib/git/~user/goguma
	$ stat -c '%a %n' **/*
	644 config
	644 git-daemon-export-ok
	644 HEAD
	755 objects
	755 objects/info
	755 objects/pack
	644 objects/pack/pack-1c673b53da2f0bfe8a3399cee03e82b17247a69a.idx
	600 objects/pack/pack-1c673b53da2f0bfe8a3399cee03e82b17247a69a.pack
	755 refs
	755 refs/heads
	644 refs/heads/master
	755 refs/remotes
	755 refs/remotes/origin
	...

Compare this to the output after `git gc`.

	$ git gc
	$ stat -c '%a %n' **/*
	644 config
	644 git-daemon-export-ok
	644 HEAD
	755 info
	644 info/refs
	755 objects
	755 objects/info
	444 objects/info/commit-graph
	644 objects/info/packs
	755 objects/pack
	444 objects/pack/pack-9f22cdfa7bd58ed88636b390b65937e0f7090e3f.bitmap
	444 objects/pack/pack-9f22cdfa7bd58ed88636b390b65937e0f7090e3f.idx
	444 objects/pack/pack-9f22cdfa7bd58ed88636b390b65937e0f7090e3f.pack
	644 packed-refs
	755 refs
	755 refs/heads
	755 refs/remotes
	755 refs/tags
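
Spotting the difference between two long listings by eye is error-prone, so it can help to snapshot the modes and let `diff` do the work. A minimal sketch, assuming `stat -c` is available as in the listings above. `snapshot_modes` and the output file names are hypothetical.

```shell
# snapshot_modes DIR: print "<octal mode> <path>" for everything under DIR,
# sorted by path, so that two snapshots can be compared with diff(1).
# Uses find instead of the `**/*` glob, which not every shell supports.
snapshot_modes() {
	(cd "$1" && find . -exec stat -c '%a %n' {} + | sort -k 2)
}

# Example usage:
#   snapshot_modes /var/lib/git/~user/goguma >before.txt
#   git -C /var/lib/git/~user/goguma gc
#   snapshot_modes /var/lib/git/~user/goguma >after.txt
#   diff before.txt after.txt
```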

Notice how the permissions on the `.pack` file in `objects/pack` change from 600 to 444. To test whether the 600 permissions are the source of the clone errors, we can loosen the permissions on the packfile of a freshly cloned repository and see if the errors disappear.

	$ chmod 0644 objects/pack/*.pack

This fixes the issue! We have now identified the cause. go-git creates packfiles with mode 600, readable only by their owner, so git-http-backend cannot read them when it runs as a different user. This also explains why my first attempt to reproduce the issue failed: in that setup, fcgiwrap was running as the same user that owned the Git repository files.
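
With the cause known, affected repositories can be found and repaired without waiting for `git gc`. What follows is a sketch rather than what was actually run on git.sr.ht. It assumes all repositories live under one root such as `/var/lib/git`, and both function names are hypothetical.

```shell
# list_private_packs ROOT: print packfiles under ROOT that lack the
# world-readable bit, i.e. files that another user cannot read.
list_private_packs() {
	find "$1" -type f -path '*/objects/pack/*.pack' ! -perm -004
}

# fix_private_packs ROOT: add read permission for group and others to
# those packfiles (600 becomes 644, matching the chmod shown above).
fix_private_packs() {
	find "$1" -type f -path '*/objects/pack/*.pack' ! -perm -004 \
		-exec chmod go+r {} +
}

# Example usage:
#   list_private_packs /var/lib/git
#   fix_private_packs /var/lib/git
```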

To fix the issue, [a patch for go-git][patch] is needed to create packfiles with the proper permissions. git-upload-pack should also be patched so that it errors out when this happens instead of failing silently. A clear error message from git-upload-pack would have made debugging this issue much easier.

[patch]: https://lists.sr.ht/~sircmpwn/sr.ht-dev/patches/36771

A static/media/2022-11-12-clone-button.png => static/media/2022-11-12-clone-button.png +0 -0