Age | Commit message (Collapse) | Author |
|
This reverts commit cad80597c7947f0def83caf8cb56aff0149c83a8.
Because this commit has not been backported so far, due to the implications
of building Ubuntu's backport of wireguard in a timely manner.
For now, reverting this fix would allow wireguard-linux-compat CI to work
on Ubuntu 18.04.
A different fix or the same one can be applied again when the time is
right.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
CentOS Stream 8 by now (4.18.0-301.1.el8) reports RHEL_MINOR=5. The
current RHEL 8 minor release is still 3. RHEL 8.4 is in beta. Replace
equal comparison by greater equal to (hopefully) be a little bit more
future proof.
Signed-off-by: Peter Georg <peter.georg@physik.uni-regensburg.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
This corresponds to the fancier upstream commit that's still on lkml,
which passes a zeroed ip_options struct to __icmp_send.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
linux commit 22f6bbb7bcfcef0b373b0502a7ff390275c575dd ("net: use
skb_list_del_init() to remove from RX sublists") will be backported to Ubuntu
18.04 default kernel, which is based on linux 4.15.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Having two ring buffers per-peer means that every peer results in two
massive ring allocations. On an 8-core x86_64 machine, this commit
reduces the per-peer allocation from 18,688 bytes to 1,856 bytes, which
is an 90% reduction. Ninety percent! With some single-machine
deployments approaching 500,000 peers, we're talking about a reduction
from 7 gigs of memory down to 700 megs of memory.
In order to get rid of these per-peer allocations, this commit switches
to using a list-based queueing approach. Currently GSO fragments are
chained together using the skb->next pointer (the skb_list_* singly
linked list approach), so we form the per-peer queue around the unused
skb->prev pointer (which sort of makes sense because the links are
pointing backwards). Use of skb_queue_* is not possible here, because
that is based on doubly linked lists and spinlocks. Multiple cores can
write into the queue at any given time, because its writes occur in the
start_xmit path or in the udp_recv path. But reads happen in a single
workqueue item per-peer, amounting to a multi-producer, single-consumer
paradigm.
The MPSC queue is implemented locklessly and never blocks. However, it
is not linearizable (though it is serializable), with a very tight and
unlikely race on writes, which, when hit (some tiny fraction of the
0.15% of partial adds on a fully loaded 16-core x86_64 system), causes
the queue reader to terminate early. However, because every packet sent
queues up the same workqueue item after it is fully added, the worker
resumes again, and stopping early isn't actually a problem, since at
that point the packet wouldn't have yet been added to the encryption
queue. These properties allow us to avoid disabling interrupts or
spinning. The design is based on Dmitry Vyukov's algorithm [1].
Performance-wise, ordinarily list-based queues aren't preferable to
ringbuffers, because of cache misses when following pointers around.
However, we *already* have to follow the adjacent pointers when working
through fragments, so there shouldn't actually be any change there. A
potential downside is that dequeueing is a bit more complicated, but the
ptr_ring structure used prior had a spinlock when dequeueing, so all and
all the difference appears to be a wash.
Actually, from profiling, the biggest performance hit, by far, of this
commit winds up being atomic_add_unless(count, 1, max) and atomic_
dec(count), which account for the majority of CPU time, according to
perf. In that sense, the previous ring buffer was superior in that it
could check if it was full by head==tail, which the list-based approach
cannot do.
But all and all, this enables us to get massive memory savings, allowing
WireGuard to scale for real world deployments, without taking much of a
performance hit.
[1] http://www.1024cores.net/home/lock-free-algorithms/queues/intrusive-mpsc-node-based-queue
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
With the 4.4.256 and 4.9.256 kernels, the previous calculation for
integer comparison overflowed. This commit redefines the broken
constants to have more space for the sublevel.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
We don't need this in all files, and it just complicates things.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
The 5.4 series of -rt kernels moved from PREEMPT_RT_BASE/PREEMPT_RT_FULL
to PREEMPT_RT, so we have to account for it here. Otherwise users get
scheduling-while-atomic splats.
Reported-by: Erik Schuitema <erik@essd.nl>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Now that WireGuard is properly supported by 15.2 and people have had
sufficient time to upgrade, we can drop support for 15.1 in this compat
module.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Reported-by: Vladimir Benes <vbenes@redhat.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
These are required for moving wg_examine_packet_protocol out of
wireguard and into upstream.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
>=15.2 is in SUSE's kernel now.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Reported-by: Vladimir Benes <vbenes@redhat.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Before, we took a reference to the creating netns if the new netns was
different. This caused issues with circular references, with two
wireguard interfaces swapping namespaces. The solution is to rather not
take any extra references at all, but instead simply invalidate the
creating netns pointer when that netns is deleted.
In order to prevent this from happening again, this commit improves the
rough object leak tracking by allowing it to account for created and
destroyed interfaces, aside from just peers and keys. That then makes it
possible to check for the object leak when having two interfaces take a
reference to each others' namespaces.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
The 42.x series is no longer supported, and the 15.2 kernel is getting
a proper backport, so at the moment, we only care about supporting 15.1.
Eventually we'll drop that too.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Also remove the confusing 119/118 distinction from the Debian clause,
which is no longer as important.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
This should help with 8.3 beta rolls being recognized as 8.1 instead of
8.2 quirks.
Reported-by: Vladimir Benes <vbenes@redhat.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Reported-by: Pascal Ernster <pascal.ernster@rub.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Link: https://bugs.debian.org/959157
Reported-by: Luca Filipozzi <lfilipoz@debian.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Before we were trying to check for timeconst.h by looking in the kernel
source directory. This isn't quite correct on configurations in which
the object directory is separate from the kernel source directory, for
example when using O="elsewhere" as a make option when building the
kernel. The correct fix is to use $(CURDIR), which should point to
where we want.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
WireGuard currently only propagates ECN markings on tunnel decap according
to the old RFC3168 specification. However, the spec has since been updated
in RFC6040 to recommend slightly different decapsulation semantics. This
was implemented in the kernel as a set of common helpers for ECN
decapsulation, so let's just switch over WireGuard to using those, so it
can benefit from this enhancement and any future tweaks. We do not drop
packets with invalid ECN marking combinations, because WireGuard is
frequently used to work around broken ISPs, which could be doing that.
Reported-by: Olivier Tilmans <olivier.tilmans@nokia-bell-labs.com>
Cc: Dave Taht <dave.taht@gmail.com>
Cc: Rodney W. Grimes <ietf@gndrsh.dnsmgr.net>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Some distros that backported icmp[v6]_ndo_send still try to build the
compat module in some corner case circumstances, resulting in errors.
Work around this with the usual __compat games.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
76ebbe78f7390aee075a7f3768af197ded1bdfbb didn't come until 4.15.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Reported-by: King DuckZ <dev00@gmx.it>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|