~arx10/openpgpcard-wireguard-go

build: sourcehut build script

Builds custom binaries for Linux from this repo.

Signed-off-by: Justin Ludwig <justin@arcemtene.com>
README: how to use with private key stored on card

Change the readme to explain how to use with an OpenPGP card and SSH
agent, and to document the custom SSH agent extensions it relies on.

Signed-off-by: Justin Ludwig <justin@arcemtene.com>
script: make-owg-quick.sh to run custom binary

Script to make a new owg-quick executable that runs the wireguard-go
binary even on systems that already have a native WireGuard kernel
module. This script does 3 things:

1. Copies your newly built wireguard-go binary (if available) to
   /usr/local/bin/openpgpcard-wireguard-go
2. Copies your existing wg-quick executable to /usr/local/bin/owg-quick,
   and updates it to always use openpgpcard-wireguard-go to start up the
   WireGuard interface
3. Copies your existing systemd wg-quick@.service (if available) to
   /usr/local/lib/systemd/system/owg-quick@.service, and updates it to
   run owg-quick instead of wg-quick

Signed-off-by: Justin Ludwig <justin@arcemtene.com>
device: interact with private key stored on card

Extracts the interaction with the device's private key into the
publicKey and sharedSecret methods of the Device type itself.

Adds an IsOnCard method to the NoisePrivateKey type to check for a fake
private key, where the first byte of the private key is 1. When this is
the case, the Device.publicKey method attempts to connect via socket to
an SSH agent to retrieve the public key from an OpenPGP card; and
Device.sharedSecret attempts to do the same for each DH operation.
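
As an illustration of the check described above, here is a minimal
sketch in Go. The type and constant below are stand-ins declared for
this example, and the assumption is that only the first byte of the key
is examined; the actual method in this repo may differ.

```go
package main

import "fmt"

// NoisePrivateKeySize is the 32-byte X25519 key length.
const NoisePrivateKeySize = 32

// NoisePrivateKey is a stand-in for the device package's type of the
// same name, declared here so the example is self-contained.
type NoisePrivateKey [NoisePrivateKeySize]byte

// IsOnCard reports whether the key is a placeholder for a key held on
// an OpenPGP card. Assumption: a first byte of 1 marks a fake key, as
// the commit message describes.
func (sk NoisePrivateKey) IsOnCard() bool {
	return sk[0] == 1
}

func main() {
	var fake NoisePrivateKey
	fake[0] = 1
	var other NoisePrivateKey
	other[0] = 8
	fmt.Println(fake.IsOnCard(), other.IsOnCard())
}
```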

Signed-off-by: Justin Ludwig <justin@arcemtene.com>
device: helper fns for interacting with card

Adds helper fns for interacting with an OpenPGP card via a socket to an
SSH agent. The agent must support two custom extension types:

1. "x25519/pub", to return the public key for the specified card reader
   ID
2. "x25519/dh", to perform a DH operation for the specified card reader
   ID and other party's public key

These helper fns mimic the behavior of the publicKey and sharedSecret
methods of the NoisePrivateKey type, but are passed a fake private key.
This fake private key allows a few minor customizations to be made to
SSH agent requests:

1. The SSH agent socket path can be customized via the second byte of
   the fake private key. If the second byte is 0, these helper fns will
   try to connect to the socket at /var/run/wireguard/agent0. If the
   second byte is 1, they will try to connect to
   /var/run/wireguard/agent1; and so on.

2. The card reader ID can be customized via the third byte of the fake
   private key. If the third byte is 0, these helper fns will tell the
   SSH agent to use the reader ID "0" to access the private key. If the
   third byte is 1, they will tell it to use reader ID "1" to access the
   private key; and so on.
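
The two byte mappings above can be sketched as follows. The function
names agentSocketPath and cardReaderID are hypothetical and exist only
for this example; the sketch also assumes that each byte is rendered as
a decimal number, which matches the "0" and "1" cases given above but is
not confirmed for larger values.

```go
package main

import (
	"fmt"
	"strconv"
)

// agentSocketPath is a hypothetical helper: it maps the second byte of
// the fake private key to the socket path pattern from the commit
// message.
func agentSocketPath(fakeKey [32]byte) string {
	return "/var/run/wireguard/agent" + strconv.Itoa(int(fakeKey[1]))
}

// cardReaderID is a hypothetical helper: it renders the third byte of
// the fake private key as the reader ID string passed to the agent.
func cardReaderID(fakeKey [32]byte) string {
	return strconv.Itoa(int(fakeKey[2]))
}

func main() {
	// First byte 1 marks the key as fake; the second byte selects the
	// socket and the third byte selects the reader.
	fakeKey := [32]byte{1, 1, 2}
	fmt.Println(agentSocketPath(fakeKey)) // /var/run/wireguard/agent1
	fmt.Println(cardReaderID(fakeKey))    // 2
}
```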

Signed-off-by: Justin Ludwig <justin@arcemtene.com>
12269c27 — Martin Basovnik 1 year, 2 months ago
device: fix possible deadlock in close method

There is a possible deadlock in `device.Close()` when you try to close
the device very soon after it starts. The problem is that two different
methods acquire the same locks in a different order:

1. device.Close()
 - device.ipcMutex.Lock()
 - device.state.Lock()

2. device.changeState(deviceState)
 - device.state.Lock()
 - device.ipcMutex.Lock()

Reproducer:

    func TestDevice_deadlock(t *testing.T) {
    	d := randDevice(t)
    	d.Close()
    }

Problem:

    $ go clean -testcache && go test -race -timeout 3s -run TestDevice_deadlock ./device | grep -A 10 sync.runtime_SemacquireMutex
    sync.runtime_SemacquireMutex(0xc000117d20?, 0x94?, 0x0?)
            /usr/local/opt/go/libexec/src/runtime/sema.go:77 +0x25
    sync.(*Mutex).lockSlow(0xc000130518)
            /usr/local/opt/go/libexec/src/sync/mutex.go:171 +0x213
    sync.(*Mutex).Lock(0xc000130518)
            /usr/local/opt/go/libexec/src/sync/mutex.go:90 +0x55
    golang.zx2c4.com/wireguard/device.(*Device).Close(0xc000130500)
            /Users/martin.basovnik/git/basovnik/wireguard-go/device/device.go:373 +0xb6
    golang.zx2c4.com/wireguard/device.TestDevice_deadlock(0x0?)
            /Users/martin.basovnik/git/basovnik/wireguard-go/device/device_test.go:480 +0x2c
    testing.tRunner(0xc00014c000, 0x131d7b0)
    --
    sync.runtime_SemacquireMutex(0xc000130564?, 0x60?, 0xc000130548?)
            /usr/local/opt/go/libexec/src/runtime/sema.go:77 +0x25
    sync.(*Mutex).lockSlow(0xc000130750)
            /usr/local/opt/go/libexec/src/sync/mutex.go:171 +0x213
    sync.(*Mutex).Lock(0xc000130750)
            /usr/local/opt/go/libexec/src/sync/mutex.go:90 +0x55
    sync.(*RWMutex).Lock(0xc000130750)
            /usr/local/opt/go/libexec/src/sync/rwmutex.go:147 +0x45
    golang.zx2c4.com/wireguard/device.(*Device).upLocked(0xc000130500)
            /Users/martin.basovnik/git/basovnik/wireguard-go/device/device.go:179 +0x72
    golang.zx2c4.com/wireguard/device.(*Device).changeState(0xc000130500, 0x1)
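
A general remedy for this kind of lock-order inversion is to make every
code path take the two locks in the same order. The sketch below shows
that idea with a simplified stand-in type; the type dev and the helper
runConcurrently are made up for this example, and it does not claim to
reproduce the actual patch.

```go
package main

import (
	"fmt"
	"sync"
)

// dev is a simplified stand-in for the real Device type.
type dev struct {
	state    sync.Mutex
	ipcMutex sync.Mutex
	closed   bool
	up       bool
}

// Close takes state first, then ipcMutex.
func (d *dev) Close() {
	d.state.Lock()
	defer d.state.Unlock()
	d.ipcMutex.Lock()
	defer d.ipcMutex.Unlock()
	d.closed = true
}

// changeState takes the locks in the same order as Close, so neither
// goroutine can hold one lock while waiting for the other.
func (d *dev) changeState(up bool) {
	d.state.Lock()
	defer d.state.Unlock()
	d.ipcMutex.Lock()
	defer d.ipcMutex.Unlock()
	if !d.closed {
		d.up = up
	}
}

// runConcurrently races Close against changeState n times and returns
// once every goroutine has finished.
func runConcurrently(n int) *dev {
	d := &dev{}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(2)
		go func() { defer wg.Done(); d.changeState(true) }()
		go func() { defer wg.Done(); d.Close() }()
	}
	wg.Wait()
	return d
}

func main() {
	d := runConcurrently(100)
	fmt.Println("closed:", d.closed)
}
```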

Signed-off-by: Martin Basovnik <martin.basovnik@gmail.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
542e565b — Jason A. Donenfeld 1 year, 1 month ago
device: do atomic 64-bit add outside of vector loop

Only bother updating the rxBytes counter once we've processed a whole
vector, since each atomic addition has a cost.
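
The pattern can be sketched as follows: sum the packet lengths in a
plain local variable and publish the total with one atomic add per
vector. The type peerStats and the method receiveVector are illustrative
names for this example, not identifiers from the device package.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// peerStats is an illustrative stand-in holding only the counter.
type peerStats struct {
	rxBytes atomic.Uint64
}

// receiveVector accumulates the byte count locally and performs a
// single atomic add after the whole vector has been processed.
func (p *peerStats) receiveVector(packets [][]byte) {
	var n uint64
	for _, pkt := range packets {
		n += uint64(len(pkt))
	}
	p.rxBytes.Add(n)
}

func main() {
	var p peerStats
	p.receiveVector([][]byte{make([]byte, 3), make([]byte, 5)})
	fmt.Println(p.rxBytes.Load()) // 8
}
```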

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
7c20311b — Jordan Whited 1 year, 3 months ago
device: reduce redundant per-packet overhead in RX path

Peer.RoutineSequentialReceiver() deals with packet vectors and does not
need to perform timer and endpoint operations for every packet in a
given vector. Changing these per-packet operations to per-vector
improves throughput by as much as 10% in some environments.

Signed-off-by: Jordan Whited <jordan@tailscale.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4ffa9c20 — Jordan Whited 1 year, 2 months ago
device: change Peer.endpoint locking to reduce contention

Access to Peer.endpoint was previously synchronized by Peer.RWMutex.
This has now moved to Peer.endpoint.Mutex. Peer.SendBuffers() is now the
sole caller of Endpoint.ClearSrc(), which is signaled via a new bool,
Peer.endpoint.clearSrcOnTx. Previous callers of Endpoint.ClearSrc() now
set this bool, primarily via peer.markEndpointSrcForClearing().
Peer.SetEndpointFromPacket() clears Peer.endpoint.clearSrcOnTx when an
updated conn.Endpoint is stored. This maintains the same event order as
before, i.e. a conn.Endpoint received after peer.endpoint.clearSrcOnTx
is set, but before the next Peer.SendBuffers() call, results in the
latest conn.Endpoint source being used for the next packet transmission.
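
The event order described above can be sketched with a simplified
model. The types fakeEndpoint and peer below are stand-ins written for
this example, and sendBuffers returns the source it would use rather
than sending anything; the real code in the device package is more
involved.

```go
package main

import (
	"fmt"
	"sync"
)

// fakeEndpoint stands in for conn.Endpoint; only the cached source
// address is modelled.
type fakeEndpoint struct{ src string }

func (e *fakeEndpoint) ClearSrc() { e.src = "" }

// peer guards its endpoint with a dedicated mutex instead of a
// peer-wide lock.
type peer struct {
	endpoint struct {
		sync.Mutex
		val          *fakeEndpoint
		clearSrcOnTx bool
	}
}

// markEndpointSrcForClearing only sets the flag; the clearing itself is
// deferred to the next transmission.
func (p *peer) markEndpointSrcForClearing() {
	p.endpoint.Lock()
	defer p.endpoint.Unlock()
	if p.endpoint.val != nil {
		p.endpoint.clearSrcOnTx = true
	}
}

// setEndpointFromPacket stores the newer endpoint and cancels any
// pending clear, so the fresh source is the one that gets used.
func (p *peer) setEndpointFromPacket(ep *fakeEndpoint) {
	p.endpoint.Lock()
	defer p.endpoint.Unlock()
	p.endpoint.clearSrcOnTx = false
	p.endpoint.val = ep
}

// sendBuffers is the only place that calls ClearSrc.
func (p *peer) sendBuffers() string {
	p.endpoint.Lock()
	defer p.endpoint.Unlock()
	if p.endpoint.val == nil {
		return ""
	}
	if p.endpoint.clearSrcOnTx {
		p.endpoint.val.ClearSrc()
		p.endpoint.clearSrcOnTx = false
	}
	return p.endpoint.val.src
}

func main() {
	p := &peer{}
	p.setEndpointFromPacket(&fakeEndpoint{src: "192.0.2.1"})
	p.markEndpointSrcForClearing()
	p.setEndpointFromPacket(&fakeEndpoint{src: "192.0.2.2"})
	fmt.Println(p.sendBuffers())
}
```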

These changes result in throughput improvements for single flow,
parallel (-P n) flow, and bidirectional (--bidir) flow iperf3 TCP/UDP
tests as measured on both Linux and Windows. Latency under load improves
especially for high throughput Linux scenarios. These improvements are
likely realized on all platforms to some degree, as the changes are not
platform-specific.

Co-authored-by: James Tucker <james@tailscale.com>
Signed-off-by: James Tucker <james@tailscale.com>
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
d0bc03c7 — Jordan Whited 1 year, 3 months ago
tun: implement UDP GSO/GRO for Linux

Implement UDP GSO and GRO for the Linux tun.Device, which is made
possible by virtio extensions in the kernel's TUN driver starting in
v6.2.

secnetperf, a QUIC benchmark utility from microsoft/msquic@8e1eb1a, is
used to demonstrate the effect of this commit between two Linux
computers with i5-12400 CPUs. There is roughly 13us of round trip
latency between them. secnetperf was invoked with the following command
line options:

    -stats:1 -exec:maxtput -test:tput -download:10000 -timed:1 -encrypt:0

The first result is from commit 2e0774f without UDP GSO/GRO on the TUN.

    [conn][0x55739a144980] STATS: EcnCapable=0 RTT=3973 us
    SendTotalPackets=55859 SendSuspectedLostPackets=61
    SendSpuriousLostPackets=59 SendCongestionCount=27
    SendEcnCongestionCount=0 RecvTotalPackets=2779122
    RecvReorderedPackets=0 RecvDroppedPackets=0
    RecvDuplicatePackets=0 RecvDecryptionFailures=0
    Result: 3654977571 bytes @ 2922821 kbps (10003.972 ms).

The second result is with UDP GSO/GRO on the TUN.

    [conn][0x56493dfd09a0] STATS: EcnCapable=0 RTT=1216 us
    SendTotalPackets=165033 SendSuspectedLostPackets=64
    SendSpuriousLostPackets=61 SendCongestionCount=53
    SendEcnCongestionCount=0 RecvTotalPackets=11845268
    RecvReorderedPackets=25267 RecvDroppedPackets=0
    RecvDuplicatePackets=0 RecvDecryptionFailures=0
    Result: 15574671184 bytes @ 12458214 kbps (10001.222 ms).

Signed-off-by: Jordan Whited <jordan@tailscale.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
1cf89f53 — Jordan Whited 1 year, 3 months ago
tun: fix Device.Read() buf length assumption on Windows

The length of a packet read from the underlying TUN device may exceed
the length of a supplied buffer when MTU exceeds device.MaxMessageSize.

Reviewed-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2e0774f2 — Jason A. Donenfeld 1 year, 3 months ago
device: ratchet up max segment size on android

GRO requires big allocations to be efficient. This isn't great, as there
might be Android memory usage issues. So we should revisit this commit.
But at least it gets things working again.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
b3df23dc — Jason A. Donenfeld 1 year, 3 months ago
conn: set unused OOB to zero length

Otherwise, in the event that we're using GSO without sticky sockets, we
pass garbage OOB buffers to sendmmsg when GSO doesn't set its header,
which results in EINVAL.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
f502ec3f — Jason A. Donenfeld 1 year, 3 months ago
conn: fix cmsg data padding calculation for gso

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
5d37bd24 — Jason A. Donenfeld 1 year, 3 months ago
conn: separate gso and sticky control

Android wants GSO but not sticky.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
24ea1335 — Jason A. Donenfeld 1 year, 3 months ago
conn: harmonize GOOS checks between "linux" and "android"

Otherwise GRO gets enabled on Android, but the conn doesn't use it,
resulting in bundled packets being discarded.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
177caa7e — Jason A. Donenfeld 1 year, 3 months ago
conn: simplify supportsUDPOffload

This allows a kernel to support UDP_GRO while not supporting
UDP_SEGMENT.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
42ec952e — James Tucker 1 year, 4 months ago
go.mod,tun/netstack: bump gvisor

Signed-off-by: James Tucker <james@tailscale.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
ec8f6f82 — James Tucker 1 year, 4 months ago
tun: fix crash when ForceMTU is called after close

Close closes the events channel, resulting in a panic from send on
closed channel.
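
One common way to guard against this class of panic is to track a
closed flag under the same lock that protects the channel. The sketch
below uses a stand-in type, tunDevice, with made-up field names; it
illustrates the idea and is not the actual patch.

```go
package main

import (
	"fmt"
	"sync"
)

// tunDevice is a simplified stand-in for a TUN device with an events
// channel.
type tunDevice struct {
	mu        sync.Mutex
	closed    bool
	forcedMTU int
	events    chan int
}

func newTunDevice() *tunDevice {
	return &tunDevice{events: make(chan int, 1)}
}

// ForceMTU returns early once the device is closed, so it never sends
// on the closed events channel.
func (d *tunDevice) ForceMTU(mtu int) {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.closed {
		return
	}
	d.forcedMTU = mtu
	select {
	case d.events <- mtu:
	default: // drop the notification if nobody is draining the channel
	}
}

// Close marks the device closed before closing the channel, and is safe
// to call more than once.
func (d *tunDevice) Close() {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.closed {
		return
	}
	d.closed = true
	close(d.events)
}

func main() {
	d := newTunDevice()
	d.ForceMTU(1420)
	d.Close()
	d.ForceMTU(1280) // ignored, no panic
	fmt.Println(d.forcedMTU)
}
```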

Reported-By: Brad Fitzpatrick <brad@tailscale.com>
Signed-off-by: James Tucker <james@tailscale.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
1ec454f2 — Jordan Whited 1 year, 4 months ago
device: move Queue{In,Out}boundElement Mutex to container type

Queue{In,Out}boundElement locking can contribute to significant
overhead via sync.Mutex.lockSlow() in some environments. These types
are passed throughout the device package as elements in a slice, so
move the per-element Mutex to a container around the slice.
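
The restructuring can be sketched as follows, with one lock and unlock
per vector instead of one per element. The names outboundElement and
outboundContainer are illustrative, and using the lock as a handoff
signal between a worker and a consumer is an assumption of this sketch
rather than a description of the actual code.

```go
package main

import (
	"fmt"
	"sync"
)

// outboundElement no longer carries its own mutex.
type outboundElement struct {
	packet []byte
}

// outboundContainer wraps the slice and holds the single mutex that
// covers every element in it.
type outboundContainer struct {
	sync.Mutex
	elems []*outboundElement
}

// newContainer copies the payloads into elements and returns the
// container locked, to be unlocked by the worker when it is done.
func newContainer(payloads [][]byte) *outboundContainer {
	c := &outboundContainer{}
	for _, p := range payloads {
		e := &outboundElement{packet: append([]byte(nil), p...)}
		c.elems = append(c.elems, e)
	}
	c.Lock()
	return c
}

// worker processes every element, then releases the container once.
func worker(c *outboundContainer) {
	for _, e := range c.elems {
		e.packet = append(e.packet, 0xff) // placeholder for real work
	}
	c.Unlock()
}

// consume blocks until the worker has released the container.
func consume(c *outboundContainer) [][]byte {
	c.Lock()
	defer c.Unlock()
	out := make([][]byte, 0, len(c.elems))
	for _, e := range c.elems {
		out = append(out, e.packet)
	}
	return out
}

func main() {
	c := newContainer([][]byte{{1}, {2, 3}})
	go worker(c)
	fmt.Println(consume(c))
}
```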

Reviewed-by: Maisem Ali <maisem@tailscale.com>
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>