Now that wg_examine_packet_protocol has been added for general
consumption as ip_tunnel_parse_protocol, it's possible to remove
wg_examine_packet_protocol and simply use the new
ip_tunnel_parse_protocol function directly.
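For illustration, a call site now reduces to the shared helper (a sketch;
the wrapper function here is hypothetical):

  #include <net/ip_tunnels.h>
  #include <linux/skbuff.h>

  /* Before: skb->protocol = wg_examine_packet_protocol(skb);
   * After, the same logic via the exported helper: */
  static void set_protocol(struct sk_buff *skb)
  {
          skb->protocol = ip_tunnel_parse_protocol(skb);
  }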
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
The napi_gro_receive function can no longer return GRO_DROP, so any
handling of GRO_DROP is dead code. This commit removes that dead code.
Further, it's not even clear that device drivers have any business
taking action after passing off received packets; that's arguably out of
their hands.
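Concretely, the pattern that goes away looks roughly like this (a
sketch; the function and the stats bookkeeping are illustrative):

  #include <linux/netdevice.h>

  static void rx_complete(struct napi_struct *napi, struct sk_buff *skb)
  {
          /* Before, with dev the relevant net_device:
           *         if (napi_gro_receive(napi, skb) == GRO_DROP)
           *                 ++dev->stats.rx_dropped;
           * After: hand the packet off; what GRO does with it afterward
           * is no longer our business. */
          napi_gro_receive(napi, skb);
  }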
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
In "queueing: preserve flow hash across packet scrubbing", we were
required to slightly increase the size of the receive replay counter to
something still fairly small, but an increase nonetheless. It turns out
that we can recoup some of the additional memory overhead by splitting
up the prior union type into two distinct types. Before, we used the
same "noise_counter" union for both sending and receiving, with sending
just using a simple atomic64_t, while receiving used the full replay
counter checker. This meant that most of the memory being allocated for
the sending counter was being wasted. Since the old "noise_counter" type
increased in size in the prior commit, now is a good time to split up
that union type into a distinct "noise_replay_counter" for receiving
and a boring atomic64_t for sending, each using neither more nor less
memory than required.
Also, since sometimes the replay counter is accessed without
necessitating additional accesses to the bitmap, we can reduce cache
misses by hoisting the always-necessary lock above the bitmap in the
struct layout. We also change a "noise_replay_counter" stack allocation
to kmalloc in a -DDEBUG selftest so that KASAN doesn't trigger a stack
frame warning.
All in all, removing a bit of abstraction in this commit makes the code
simpler and smaller, in addition to the motivating memory usage
recuperation. For example, passing around raw "noise_symmetric_key"
structs is something that really only makes sense within noise.c, in the
one place where the sending and receiving keys can safely be thought of
as the same type of object; subsequent to that, it's important that we
uniformly access these through keypair->{sending,receiving}, where their
distinct roles are always made explicit. So this patch allows us to draw
that distinction clearly as well.
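In sketch form, the resulting layout looks like this (the aggregate
struct name is illustrative; the bitmap size reflects the prior commit's
increase):

  #include <linux/atomic.h>
  #include <linux/spinlock.h>
  #include <linux/bitops.h>

  #define COUNTER_BITS_TOTAL 8192 /* enlarged in the prior commit */

  struct noise_replay_counter {
          u64 counter;
          spinlock_t lock; /* hoisted above the bitmap for fewer misses */
          unsigned long backtrack[COUNTER_BITS_TOTAL / BITS_PER_LONG];
  };

  struct keypair_counters {
          atomic64_t sending_counter; /* tx: a plain counter suffices */
          struct noise_replay_counter receiving_counter; /* rx: full checker */
  };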
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
It's important that we clear most header fields during encapsulation and
decapsulation, because the packet is substantially changed, and we don't
want any info leak or logic bug due to an accidental correlation. But,
for encapsulation, it's wrong to clear skb->hash, since it's used by
fq_codel and flow dissection in general. Without it, classification does
not proceed as usual. This change might make it easier to estimate the
number of inner flows by examining clustering of out-of-order packets,
but this shouldn't open up anything that can't already be inferred
otherwise (e.g. syn packet size inference), and fq_codel can be disabled
anyway.
Furthermore, it might be the case that the hash isn't used or queried at
all until after WireGuard transmits the encrypted UDP packet, which
means skb->hash might still be zero at that point, and thus no hash has
been taken over the inner packet data. In order to address this
situation, we force a calculation of skb->hash before encrypting packet
data.
Of course this means that fq_codel might transmit packets slightly more
out of order than usual. Toke did some testing on beefy machines with
high quantities of parallel flows and found that increasing the
replay-attack counter to 8192 takes care of the most pathological cases
pretty well.
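The two pieces, in sketch form (helper names illustrative; the real
scrub clears many more fields than shown):

  #include <linux/skbuff.h>

  static void reset_packet(struct sk_buff *skb, bool encapsulating)
  {
          u32 hash = skb->hash;
          bool l4_hash = skb->l4_hash, sw_hash = skb->sw_hash;

          skb_scrub_packet(skb, true);
          /* ... clear the remaining header fields, hash included ... */
          if (encapsulating) {
                  /* Restore the inner flow's hash so fq_codel and
                   * friends still see distinct flows on the outside. */
                  skb->hash = hash;
                  skb->l4_hash = l4_hash;
                  skb->sw_hash = sw_hash;
          }
  }

  static void ensure_flow_hash(struct sk_buff *skb)
  {
          skb_get_hash(skb); /* computes and caches if not yet set */
  }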
Reported-by: Dave Taht <dave.taht@gmail.com>
Reviewed-and-tested-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
It's very unlikely that send will become true. It's nearly always false
between 0 and 120 seconds of a session, and in most cases becomes true
only between 120 and 121 seconds before becoming false again. So,
unlikely(send) is clearly the right option here.
Previously, we had a complex boolean expression with multiple likely and
unlikely clauses nested inside it. Since such an expression is evaluated
left-to-right anyway, the whole thing collapsed into a single unlikely.
So, we can clean this up to better represent what's going on.
The generated code is the same.
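A minimal sketch of the hint, with illustrative names and a
representative condition:

  #include <linux/ktime.h>

  #define REKEY_AFTER_TIME_NS (120ULL * NSEC_PER_SEC) /* illustrative */

  struct peer;
  void send_handshake_initiation(struct peer *peer);

  static void keep_key_fresh(struct peer *peer, u64 birthdate_ns, u64 now_ns)
  {
          /* True for roughly one second out of every two minutes. */
          bool send = now_ns - birthdate_ns >= REKEY_AFTER_TIME_NS;

          if (unlikely(send))
                  send_handshake_initiation(peer);
  }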
Suggested-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
WireGuard currently only propagates ECN markings on tunnel decap according
to the old RFC3168 specification. However, the spec has since been updated
in RFC6040 to recommend slightly different decapsulation semantics. This
was implemented in the kernel as a set of common helpers for ECN
decapsulation, so let's switch WireGuard over to using those, allowing
it to benefit from this enhancement and any future tweaks. We do not drop
packets with invalid ECN marking combinations, because WireGuard is
frequently used to work around broken ISPs, which could be doing that.
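The decap path then becomes roughly the following sketch (the wrapper is
illustrative; the outer DS byte is the one saved at receive time, and
the helper's drop verdict is deliberately ignored):

  #include <net/inet_ecn.h>
  #include <net/dsfield.h>
  #include <linux/ip.h>
  #include <linux/ipv6.h>
  #include <linux/if_ether.h>

  static void propagate_ecn(struct sk_buff *skb, u8 outer_ds)
  {
          if (skb->protocol == htons(ETH_P_IP))
                  INET_ECN_decapsulate(skb, outer_ds, ip_hdr(skb)->tos);
          else if (skb->protocol == htons(ETH_P_IPV6))
                  INET_ECN_decapsulate(skb, outer_ds,
                                       ipv6_get_dsfield(ipv6_hdr(skb)));
  }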
Reported-by: Olivier Tilmans <olivier.tilmans@nokia-bell-labs.com>
Cc: Dave Taht <dave.taht@gmail.com>
Cc: Rodney W. Grimes <ietf@gndrsh.dnsmgr.net>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
The situation in which we wind up hitting the default case here
indicates a major bug in earlier parsing code. It is not something that
should ever happen, which means a "friendly" message for it doesn't
make sense. Rather, replace this with a WARN_ON, just like we do earlier
in the file for a similar situation, so that somebody sends us a bug
report and we can fix it.
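Roughly, with the surrounding dispatch abbreviated (function name and
message are illustrative):

  static void dispatch(struct sk_buff *skb)
  {
          switch (skb->protocol) {
          case htons(ETH_P_IP):
                  /* ... deliver v4 ... */
                  break;
          case htons(ETH_P_IPV6):
                  /* ... deliver v6 ... */
                  break;
          default:
                  /* Earlier parsing guarantees one of the above, so
                   * reaching here is a bug worth reporting loudly. */
                  WARN(1, "unexpected protocol despite earlier validation");
                  kfree_skb(skb);
          }
  }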
Reported-by: Fabian Freyer <fabianfreyer@radicallyopensecurity.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
We carry out checks to the effect of:
  if (skb->protocol != wg_examine_packet_protocol(skb))
          goto err;
By having wg_examine_packet_protocol return 0 on failure, this
means that the check above still passes in the case where skb->protocol
is zero, which is possible to hit with AF_PACKET:
  struct sockaddr_pkt saddr = { .spkt_device = "wg0" };
  unsigned char buffer[5] = { 0 };
  sendto(socket(AF_PACKET, SOCK_PACKET, /* skb->protocol = */ 0),
         buffer, sizeof(buffer), 0, (const struct sockaddr *)&saddr, sizeof(saddr));
Additional checks mean that this isn't actually a problem in the code
base, but I could imagine it becoming a problem later if the function is
used more liberally.
I would prefer to fix this by having wg_examine_packet_protocol return a
32-bit ~0 value on failure, which would never match any value of
skb->protocol and would simply change the generated code from a mov
to a movzx. However, sparse complains, and adding __force casts doesn't
seem like a good idea, so instead we just add a simple helper function
to check for the zero return value. Since wg_examine_packet_protocol
itself gets inlined, this winds up not adding an additional branch to
the generated code, since the 0 return value already happens in a
mergeable branch.
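The helper amounts to (name illustrative):

  static inline bool wg_check_packet_protocol(struct sk_buff *skb)
  {
          __be16 real_protocol = wg_examine_packet_protocol(skb);

          return real_protocol && skb->protocol == real_protocol;
  }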
Reported-by: Fabian Freyer <fabianfreyer@radicallyopensecurity.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
This is a small optimization that prevents more expensive comparisons
from happening when they are no longer necessary, by clearing the
last_under_load variable whenever we wind up in a state where we were
under load but we no longer are.
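In sketch form (the expiry helper and its one-second horizon are
illustrative):

  #include <linux/timekeeping.h>

  static u64 last_under_load; /* serialized by the handshake path */

  bool birthdate_has_expired(u64 birthdate_ns, u64 expiration_secs);

  static bool update_under_load(bool queue_is_busy)
  {
          bool under_load = queue_is_busy;

          if (under_load) {
                  last_under_load = ktime_get_coarse_boottime_ns();
          } else if (last_under_load) {
                  under_load = !birthdate_has_expired(last_under_load, 1);
                  if (!under_load)
                          last_under_load = 0; /* cheap test next time */
          }
          return under_load;
  }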
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Suggested-by: Matt Dunwoodie <ncon@noconroy.net>
|
|
Signed-off-by: Josh Soref <jsoref@gmail.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
I'm not totally comfortable with these changes yet, and it'll require
some more scrutiny. But it's a start.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Coarse ktime is broken until [1], which landed in 5.2, so we use
fallback code on kernels without that fix or its backport.
The fallback code has also been improved significantly. It now only uses
slower clocks on kernels < 3.17, at the expense of some accuracy we're
not overly concerned about.
[1] https://lore.kernel.org/lkml/tip-e3ff9c3678b4d80e22d2557b68726174578eaf52@git.kernel.org/
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
The hashtable allocations are quite large, and sometimes cause the device
allocation in the net framework to stall while it tries to find a contiguous region
that can fit the device struct:
[<0000000000000000>] __switch_to+0x94/0xb8
[<0000000000000000>] __alloc_pages_nodemask+0x764/0x7e8
[<0000000000000000>] kmalloc_order+0x20/0x40
[<0000000000000000>] __kmalloc+0x144/0x1a0
[<0000000000000000>] alloc_netdev_mqs+0x5c/0x368
[<0000000000000000>] rtnl_create_link+0x48/0x180
[<0000000000000000>] rtnl_newlink+0x410/0x708
[<0000000000000000>] rtnetlink_rcv_msg+0x190/0x1f8
[<0000000000000000>] netlink_rcv_skb+0x4c/0xf8
[<0000000000000000>] rtnetlink_rcv+0x30/0x40
[<0000000000000000>] netlink_unicast+0x18c/0x208
[<0000000000000000>] netlink_sendmsg+0x19c/0x348
[<0000000000000000>] sock_sendmsg+0x3c/0x58
[<0000000000000000>] ___sys_sendmsg+0x290/0x2b0
[<0000000000000000>] __sys_sendmsg+0x58/0xa0
[<0000000000000000>] SyS_sendmsg+0x10/0x20
[<0000000000000000>] el0_svc_naked+0x34/0x38
[<0000000000000000>] 0xffffffffffffffff
To fix the allocation stalls, decouple the hashtable allocations from the device
allocation and allocate the hashtables with kvmalloc's implicit __GFP_NORETRY
so that the allocations fall back to vmalloc with little resistance.
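A sketch of one such allocation after the decoupling (the struct is
modeled on the pubkey table; details illustrative):

  #include <linux/hashtable.h>
  #include <linux/mm.h>
  #include <linux/mutex.h>
  #include <linux/random.h>
  #include <linux/siphash.h>

  struct pubkey_hashtable {
          DECLARE_HASHTABLE(hashtable, 11);
          siphash_key_t key;
          struct mutex lock;
  };

  static struct pubkey_hashtable *pubkey_hashtable_alloc(void)
  {
          /* kvmalloc tries kmalloc with __GFP_NORETRY and then falls
           * back to vmalloc, so no long compaction stalls. */
          struct pubkey_hashtable *table = kvmalloc(sizeof(*table), GFP_KERNEL);

          if (!table)
                  return NULL;
          get_random_bytes(&table->key, sizeof(table->key));
          hash_init(table->hashtable);
          mutex_init(&table->lock);
          return table;
  }

  /* Freed with kvfree() when the device is destroyed. */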
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
This means we do less computation on encapsulated payloads.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Suggested-by: Jann Horn <jann@thejh.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
This required a bit of pruning of our Christmas trees.
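For reference, the "Christmas tree" is netdev's longest-line-first
ordering of local declarations; a sketch with illustrative names:

  #include <linux/netdevice.h>

  static void example_handler(struct net_device *dev, struct sk_buff *skb)
  {
          struct example_priv *priv = netdev_priv(dev); /* longest first */
          unsigned int payload_len = skb->len;
          int i;

          /* ... */
  }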
Suggested-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Suggested-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
This is done by 259 other files in the kernel tree:
  linux $ rg '#include.*\.c' -l | wc -l
  259
Suggested-by: Sultan Alsawaf <sultanxda@gmail.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
I understand why this must be done, though I'm not so happy about having
to do it. In some places, it puts us over 80 chars and we have to break
lines up in further ugly ways. And in general, I think this makes things
harder to read. Yet another thing we must do to please upstream.
Maybe this can be replaced in the future by some kind of automatic
module namespacing logic in the linker, or even combined with LTO and
aggressive symbol stripping.
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
The kernel has very specific rules correlating file type with comment
type, and also SPDX identifiers can't be merged with other comments.
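Concretely, per Documentation/process/license-rules.rst, a .c file takes
the C99 form on its first line, with the copyright notice kept in a
separate comment, while a header must use the C comment form:

  // SPDX-License-Identifier: GPL-2.0
  /*
   * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
   */

  /* ...and in a .h file, the identifier itself reads: */
  /* SPDX-License-Identifier: GPL-2.0 */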
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
This reduces stack usage to quell warnings on powerpc.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
This is the worst commit in the whole repo, making the code much less
readable, but so it goes with upstream maintainers.
We are now woefully wrapped at 80 columns.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Completely rework peer removal to ensure peers don't jump between
contexts and create races.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
We don't want a consumer to read plaintext when it's supposed to be
reading ciphertext, which means we need to synchronize across cores.
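The idea reduces to release/acquire pairing on the per-packet state,
sketched here with illustrative names:

  #include <linux/atomic.h>

  enum pkt_state { PKT_UNENCRYPTED, PKT_ENCRYPTED };

  struct pkt {
          atomic_t state;
          /* buffer, encrypted in place by the producer */
  };

  static void publish_ciphertext(struct pkt *p)
  {
          /* Everything written before this store (the encryption) is
           * visible to any core that observes the new state. */
          atomic_set_release(&p->state, PKT_ENCRYPTED);
  }

  static bool ciphertext_ready(struct pkt *p)
  {
          /* Pairs with the release above: observing the state means
           * observing ciphertext, never leftover plaintext. */
          return atomic_read_acquire(&p->state) == PKT_ENCRYPTED;
  }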
Suggested-by: Jann Horn <jann@thejh.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Use RCU reference counts only when we must, and otherwise use a more
reasonably named function.
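A sketch of the split, with illustrative names:

  #include <linux/kref.h>
  #include <linux/rcupdate.h>

  struct peer {
          struct kref refcount;
          /* ... */
  };

  /* The caller already holds a reference, so the count can't be zero
   * and a plainly named kref_get is the honest thing to use. */
  static struct peer *peer_get(struct peer *peer)
  {
          kref_get(&peer->refcount);
          return peer;
  }

  /* Only lookups under rcu_read_lock may race with teardown and need
   * the count tested. */
  static struct peer *peer_get_maybe_zero(struct peer *peer)
  {
          RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
                           "taking peer reference without rcu read lock");
          if (unlikely(!peer || !kref_get_unless_zero(&peer->refcount)))
                  return NULL;
          return peer;
  }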
Reported-by: Jann Horn <jann@thejh.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Suggested-by: Thomas Gschwantner <tharre3@gmail.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Suggested-by: Jason A. Donenfeld <Jason@zx2c4.com>
[Jason: fixed up the flushing of the rx_queue in peer_remove]
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Generally if we're inaccurate by a few nanoseconds, it doesn't matter.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Since this is a network protocol, expirations need to be accounted for,
even across system suspend. On real systems, this isn't a problem, since
we're clearing all keys before suspend. But on Android, where we don't
do that, this is something of a problem. So, we switch to using boottime
instead of jiffies.
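The check then looks something like this sketch (constant and helper
names illustrative):

  #include <linux/ktime.h>

  #define REJECT_AFTER_TIME_NS (180ULL * NSEC_PER_SEC)

  static inline bool birthdate_has_expired(u64 birthdate_ns, u64 expiration_ns)
  {
          /* Boottime keeps counting across suspend, unlike jiffies. */
          return (s64)(birthdate_ns + expiration_ns) <=
                 (s64)ktime_get_boottime_ns();
  }

A key stamped before suspend is thus still judged expired on schedule,
e.g. birthdate_has_expired(keypair_birthdate, REJECT_AFTER_TIME_NS).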
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
This had a bad performance impact. We'll probably need to revisit this
later, but for now, let's not introduce a regression.
Reported-by: Lonnie Abelbeck <lonnie@abelbeck.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Otherwise it's too easy to trigger cookie reply messages.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Since these are the only consumers, there's no need for locking.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Receiving any type of authenticated data counts as a receive and a
traversal. When it isn't a keepalive, it counts as data. That's our
rule. Whether or not it's the correct type of data or has the right IP
header shouldn't influence timer decisions.
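In sketch form, the rx path then does (helper names illustrative):

  struct peer;
  void timers_any_authenticated_packet_received(struct peer *peer);
  void timers_any_authenticated_packet_traversal(struct peer *peer);
  void timers_data_received(struct peer *peer);

  static void rx_authenticated(struct peer *peer, bool is_keepalive)
  {
          timers_any_authenticated_packet_received(peer);
          timers_any_authenticated_packet_traversal(peer);
          if (!is_keepalive)
                  timers_data_received(peer);
          /* Header and protocol validation happen afterward and no
           * longer feed back into timer state. */
  }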
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|