build: sourcehut build script
Builds custom binaries for linux from this repo.
Signed-off-by: Justin Ludwig <justin@arcemtene.com>
README: how to use with private key stored on card
Change the readme to explain how to use with an OpenPGP card and SSH
agent, and to document the custom SSH agent extensions it relies on.
Signed-off-by: Justin Ludwig <justin@arcemtene.com>
script: make-owg-quick.sh to run custom binary
Script to make a new owg-quick executable that runs the wireguard-go
binary even on systems that already have a native WireGuard kernel
module. This script does 3 things:
1. Copies your newly built wireguard-go binary (if available) to
/usr/local/bin/openpgpcard-wireguard-go
2. Copies your existing wg-quick executable to /usr/local/bin/owg-quick,
and updates it to always use openpgpcard-wireguard-go to start up the
WireGuard interface
3. Copies your existing systemd wg-quick@.service (if available) to
/usr/local/lib/systemd/system/owg-quick@.service, and updates it to
run owg-quick instead of wg-quick
Signed-off-by: Justin Ludwig <justin@arcemtene.com>
device: interact with private key stored on card
Extracts the interaction with device's private key into publicKey and
sharedSecret methods of the Device type itself.
Adds an IsOnCard method to the NoisePrivateKey type to check for a fake
private key, where the first byte of the private key is 1. When this is
the case, the Device.publicKey method attempts to connect via socket to
an SSH agent to retrieve the public key from an OpenPGP card; and
Device.sharedSecret attempts to do the same for each DH operation.
Signed-off-by: Justin Ludwig <justin@arcemtene.com>
device: helper fns for interacting with card
Adds helper fns for interacting with an OpenPGP card via a socket to an
SSH agent. The agent must support two custom extension types:
1. "x25519/pub", to return the public key for the specified card reader
ID
2. "x25519/dh", to perform a DH operation for the specified card reader
ID and other party's public key
These helper fns mimic the behavior of the publicKey and sharedSecret
methods of the NoisePrivateKey type, but are passed a fake private key.
This fake private key allows a few minor customizations to be made to
SSH agent requests:
1. The SSH agent socket path can be customized via the second byte of
the fake private key. If the second byte is 0, these helper fns will
try to connect to the socket at /var/run/wireguard/agent0. If the
second byte is 1, they will try to connect to
/var/run/wireguard/agent1; and so on.
2. The card reader ID can be customized via the third byte of the fake
private key. If the third byte is 0, these helper fns will tell the
SSH agent to use the reader ID "0" to access the private key. If the
third byte is 1, they will tell it to use reader ID "1" to access the
private key; and so on.
Signed-off-by: Justin Ludwig <justin@arcemtene.com>
device: fix possible deadlock in close method
There is a possible deadlock in `device.Close()` when you try to close
the device very soon after its start. The problem is that two different
methods acquire the same locks in different order:
1. device.Close()
- device.ipcMutex.Lock()
- device.state.Lock()
2. device.changeState(deviceState)
- device.state.Lock()
- device.ipcMutex.Lock()
Reproducer:
func TestDevice_deadlock(t *testing.T) {
d := randDevice(t)
d.Close()
}
Problem:
$ go clean -testcache && go test -race -timeout 3s -run TestDevice_deadlock ./device | grep -A 10 sync.runtime_SemacquireMutex
sync.runtime_SemacquireMutex(0xc000117d20?, 0x94?, 0x0?)
/usr/local/opt/go/libexec/src/runtime/sema.go:77 +0x25
sync.(*Mutex).lockSlow(0xc000130518)
/usr/local/opt/go/libexec/src/sync/mutex.go:171 +0x213
sync.(*Mutex).Lock(0xc000130518)
/usr/local/opt/go/libexec/src/sync/mutex.go:90 +0x55
golang.zx2c4.com/wireguard/device.(*Device).Close(0xc000130500)
/Users/martin.basovnik/git/basovnik/wireguard-go/device/device.go:373 +0xb6
golang.zx2c4.com/wireguard/device.TestDevice_deadlock(0x0?)
/Users/martin.basovnik/git/basovnik/wireguard-go/device/device_test.go:480 +0x2c
testing.tRunner(0xc00014c000, 0x131d7b0)
--
sync.runtime_SemacquireMutex(0xc000130564?, 0x60?, 0xc000130548?)
/usr/local/opt/go/libexec/src/runtime/sema.go:77 +0x25
sync.(*Mutex).lockSlow(0xc000130750)
/usr/local/opt/go/libexec/src/sync/mutex.go:171 +0x213
sync.(*Mutex).Lock(0xc000130750)
/usr/local/opt/go/libexec/src/sync/mutex.go:90 +0x55
sync.(*RWMutex).Lock(0xc000130750)
/usr/local/opt/go/libexec/src/sync/rwmutex.go:147 +0x45
golang.zx2c4.com/wireguard/device.(*Device).upLocked(0xc000130500)
/Users/martin.basovnik/git/basovnik/wireguard-go/device/device.go:179 +0x72
golang.zx2c4.com/wireguard/device.(*Device).changeState(0xc000130500, 0x1)
Signed-off-by: Martin Basovnik <martin.basovnik@gmail.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
device: do atomic 64-bit add outside of vector loop
Only bother updating the rxBytes counter once we've processed a whole
vector, since additions are atomic.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
device: reduce redundant per-packet overhead in RX path
Peer.RoutineSequentialReceiver() deals with packet vectors and does not
need to perform timer and endpoint operations for every packet in a
given vector. Changing these per-packet operations to per-vector
improves throughput by as much as 10% in some environments.
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
device: change Peer.endpoint locking to reduce contention
Access to Peer.endpoint was previously synchronized by Peer.RWMutex.
This has now moved to Peer.endpoint.Mutex. Peer.SendBuffers() is now the
sole caller of Endpoint.ClearSrc(), which is signaled via a new bool,
Peer.endpoint.clearSrcOnTx. Previous Callers of Endpoint.ClearSrc() now
set this bool, primarily via peer.markEndpointSrcForClearing().
Peer.SetEndpointFromPacket() clears Peer.endpoint.clearSrcOnTx when an
updated conn.Endpoint is stored. This maintains the same event order as
before, i.e. a conn.Endpoint received after peer.endpoint.clearSrcOnTx
is set, but before the next Peer.SendBuffers() call results in the
latest conn.Endpoint source being used for the next packet transmission.
These changes result in throughput improvements for single flow,
parallel (-P n) flow, and bidirectional (--bidir) flow iperf3 TCP/UDP
tests as measured on both Linux and Windows. Latency under load improves
especially for high throughput Linux scenarios. These improvements are
likely realized on all platforms to some degree, as the changes are not
platform-specific.
Co-authored-by: James Tucker <james@tailscale.com>
Signed-off-by: James Tucker <james@tailscale.com>
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
tun: implement UDP GSO/GRO for Linux
Implement UDP GSO and GRO for the Linux tun.Device, which is made
possible by virtio extensions in the kernel's TUN driver starting in
v6.2.
secnetperf, a QUIC benchmark utility from microsoft/msquic@8e1eb1a, is
used to demonstrate the effect of this commit between two Linux
computers with i5-12400 CPUs. There is roughly ~13us of round trip
latency between them. secnetperf was invoked with the following command
line options:
-stats:1 -exec:maxtput -test:tput -download:10000 -timed:1 -encrypt:0
The first result is from commit 2e0774f without UDP GSO/GRO on the TUN.
[conn][0x55739a144980] STATS: EcnCapable=0 RTT=3973 us
SendTotalPackets=55859 SendSuspectedLostPackets=61
SendSpuriousLostPackets=59 SendCongestionCount=27
SendEcnCongestionCount=0 RecvTotalPackets=2779122
RecvReorderedPackets=0 RecvDroppedPackets=0
RecvDuplicatePackets=0 RecvDecryptionFailures=0
Result: 3654977571 bytes @ 2922821 kbps (10003.972 ms).
The second result is with UDP GSO/GRO on the TUN.
[conn][0x56493dfd09a0] STATS: EcnCapable=0 RTT=1216 us
SendTotalPackets=165033 SendSuspectedLostPackets=64
SendSpuriousLostPackets=61 SendCongestionCount=53
SendEcnCongestionCount=0 RecvTotalPackets=11845268
RecvReorderedPackets=25267 RecvDroppedPackets=0
RecvDuplicatePackets=0 RecvDecryptionFailures=0
Result: 15574671184 bytes @ 12458214 kbps (10001.222 ms).
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
tun: fix Device.Read() buf length assumption on Windows
The length of a packet read from the underlying TUN device may exceed
the length of a supplied buffer when MTU exceeds device.MaxMessageSize.
Reviewed-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
device: ratchet up max segment size on android
GRO requires big allocations to be efficient. This isn't great, as there
might be Android memory usage issues. So we should revisit this commit.
But at least it gets things working again.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
conn: set unused OOB to zero length
Otherwise in the event that we're using GSO without sticky sockets, we
pass garbage OOB buffers to sendmmsg, making a EINVAL, when GSO doesn't
set its header.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
conn: fix cmsg data padding calculation for gso
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
conn: separate gso and sticky control
Android wants GSO but not sticky.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
conn: harmonize GOOS checks between "linux" and "android"
Otherwise GRO gets enabled on Android, but the conn doesn't use it,
resulting in bundled packets being discarded.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
conn: simplify supportsUDPOffload
This allows a kernel to support UDP_GRO while not supporting
UDP_SEGMENT.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
tun: fix crash when ForceMTU is called after close
Close closes the events channel, resulting in a panic from send on
closed channel.
Reported-By: Brad Fitzpatrick <brad@tailscale.com>
Signed-off-by: James Tucker <james@tailscale.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
device: move Queue{In,Out}boundElement Mutex to container type
Queue{In,Out}boundElement locking can contribute to significant
overhead via sync.Mutex.lockSlow() in some environments. These types
are passed throughout the device package as elements in a slice, so
move the per-element Mutex to a container around the slice.
Reviewed-by: Maisem Ali <maisem@tailscale.com>
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>