Age | Commit message | Author |
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
This (mostly) preserves the performance of the previous commit (as measured
on Haswell and *lake), but it drastically reduces code size.
Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
In every odd-numbered round, instead of operating over the state
x00 x01 x02 x03
x05 x06 x07 x04
x10 x11 x08 x09
x15 x12 x13 x14
we operate over the rotated state
x03 x00 x01 x02
x04 x05 x06 x07
x09 x10 x11 x08
x14 x15 x12 x13
The advantage here is that this requires no changes to the
'x04 x05 x06 x07' row, which is in the critical path. This
results in a noticeable latency improvement of roughly R
cycles, for R diagonal rounds in the primitive.
In the case of BLAKE2s, which I also moved from requiring AVX
to only requiring SSSE3, we save approximately 30 cycles per
compression function call on Haswell and Skylake. In other
words, this is an improvement of ~0.6 cpb.
This idea was pointed out to me by Shunsuke Shimizu, though
it appears to have been around for longer.
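To make the trick concrete, here is a minimal standalone sketch using SSE2
intrinsics; the in-tree implementations are hand-written assembly, so these
macro names and the C formulation are illustrative only. The classic
diagonalization shuffles rows b, c and d, while the rotated variant shuffles
a, c and d and leaves the critical-path row b (x04 x05 x06 x07) untouched.

#include <emmintrin.h> /* SSE2: _mm_shuffle_epi32 */

/* Rows a = x00..x03, b = x04..x07, c = x08..x11, d = x12..x15,
 * each held in one __m128i with lane 0 holding the leftmost word. */

/* Classic diagonalization: b, c and d are all rotated, so the
 * critical-path row b must wait for its shuffle. */
#define DIAGONALIZE_CLASSIC(a, b, c, d) do { \
	b = _mm_shuffle_epi32(b, _MM_SHUFFLE(0, 3, 2, 1)); /* x05 x06 x07 x04 */ \
	c = _mm_shuffle_epi32(c, _MM_SHUFFLE(1, 0, 3, 2)); /* x10 x11 x08 x09 */ \
	d = _mm_shuffle_epi32(d, _MM_SHUFFLE(2, 1, 0, 3)); /* x15 x12 x13 x14 */ \
} while (0)

/* Rotated variant from this commit: shuffle a, c and d instead and leave
 * b alone, removing one shuffle from the critical path per diagonal round. */
#define DIAGONALIZE_ROTATED(a, b, c, d) do { \
	a = _mm_shuffle_epi32(a, _MM_SHUFFLE(2, 1, 0, 3)); /* x03 x00 x01 x02 */ \
	c = _mm_shuffle_epi32(c, _MM_SHUFFLE(0, 3, 2, 1)); /* x09 x10 x11 x08 */ \
	d = _mm_shuffle_epi32(d, _MM_SHUFFLE(1, 0, 3, 2)); /* x14 x15 x12 x13 */ \
} while (0)

Either way, the matching inverse shuffles after the diagonal round restore
the canonical row layout.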
Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
It insta-crashes on x86.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Apparently cdd750bfb1f76fe9be8cfb53cbe77b2e811081ab changed things, so
we fall back on this hack.
Reported-by: Alex Xu <alex@alxu.ca>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Reported-by: Bruno Wolff III <bruno@wolff.to>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
This allows the kernel to generate ipv6 fragments. Apply the same
to ipv4 for consistency.
Signed-off-by: Joe Holden <jwh@zorins.us>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Reference: https://lists.zx2c4.com/pipermail/wireguard/2019-April/004081.html
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Otherwise, if this list item is later reused, we'll crash on list poison
or worse.
Also, add a version of Mimka's reproducer to netns.sh to catch these
types of bugs in the future.
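For context, a minimal sketch of the failure mode; the struct and function
names here are hypothetical, not the actual WireGuard code. A plain
list_del() poisons the entry's pointers, so any later reuse of the same
list_head dereferences LIST_POISON and oopses, whereas list_del_init()
leaves the entry in a safe, reusable state.

#include <linux/list.h>

struct queued_item {                    /* hypothetical container */
	struct list_head list;
};

static void retire_item_unsafely(struct queued_item *item)
{
	list_del(&item->list);          /* next/prev become LIST_POISON1/2 */
	/* If this item is ever linked or deleted again without being
	 * re-initialized, the poisoned pointers are dereferenced -> crash. */
}

static void retire_item_safely(struct queued_item *item)
{
	list_del_init(&item->list);     /* entry is a valid empty list again */
}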
Reported-by: Mimka <mikma.wg@lists.m7n.se>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
DaveM doth forbid.
Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
These headers were removed upstream in d2c5c103b133 ("netfilter: nat:
remove nf_nat_l3proto.h and nf_nat_core.h").
Signed-off-by: Bruno Wolff III <bruno@wolff.to>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
This makes `wg show` and `wg showconf` and the like significantly
faster, since we don't have to iterate through every node of the trie
for every single peer. It also makes netlink cursor resumption much less
problematic, since we're just iterating through a list, rather than
having to save a traversal stack.
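The idea, sketched below with made-up types rather than the real allowedips
structures: every node inserted into the trie is also threaded onto an
ordinary linked list, so enumerating all entries is a flat list walk with a
trivially resumable cursor instead of a stack-based trie traversal.

#include <linux/list.h>
#include <linux/types.h>

struct aip_node {                       /* hypothetical node type */
	struct aip_node *bit[2];        /* trie children */
	struct list_head node_list;     /* also chained for O(n) enumeration */
	u8 bits[16];
	u8 cidr;
};

struct aip_table {                      /* hypothetical table type */
	struct aip_node *root;
	struct list_head node_list;     /* head of the flat list of all nodes */
};

static int aip_dump(struct aip_table *table,
		    int (*cb)(const struct aip_node *node, void *ctx), void *ctx)
{
	struct aip_node *node;
	int ret = 0;

	/* No traversal stack to save: a netlink dump can simply remember
	 * how many entries it has already emitted and skip that many. */
	list_for_each_entry(node, &table->node_list, node_list) {
		ret = cb(node, ctx);
		if (ret)
			break;
	}
	return ret;
}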
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
This causes needless traversal of the trie.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Luis Ressel <aranea@aixah.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Luis Ressel <aranea@aixah.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
On ancient kernels, ipv6_stub is sometimes NULL when IPv6 has been disabled
with a command-line flag or has otherwise failed to initialize.
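A hedged sketch of the kind of defensive check this implies; the helper
name is made up, and only the NULL test itself is the point.

#include <linux/kconfig.h>
#include <linux/types.h>
#include <net/addrconf.h>   /* declares ipv6_stub */

/* On old kernels ipv6_stub can legitimately be NULL, e.g. when booted with
 * ipv6.disable=1, so test it before dereferencing any of its ops. */
static bool have_ipv6_stub(void)        /* hypothetical helper */
{
	return IS_ENABLED(CONFIG_IPV6) && ipv6_stub;
}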
Reported-by: Anatoli <me@anatoli.ws>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
The map allocations required to fix this are mostly slower than
unaligned paths.
Reported-by: Louis Sautier <sbraz@gentoo.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
The hashtable allocations are quite large, and cause the device allocation in
the net framework to stall sometimes while it tries to find a contiguous region
that can fit the device struct:
[<0000000000000000>] __switch_to+0x94/0xb8
[<0000000000000000>] __alloc_pages_nodemask+0x764/0x7e8
[<0000000000000000>] kmalloc_order+0x20/0x40
[<0000000000000000>] __kmalloc+0x144/0x1a0
[<0000000000000000>] alloc_netdev_mqs+0x5c/0x368
[<0000000000000000>] rtnl_create_link+0x48/0x180
[<0000000000000000>] rtnl_newlink+0x410/0x708
[<0000000000000000>] rtnetlink_rcv_msg+0x190/0x1f8
[<0000000000000000>] netlink_rcv_skb+0x4c/0xf8
[<0000000000000000>] rtnetlink_rcv+0x30/0x40
[<0000000000000000>] netlink_unicast+0x18c/0x208
[<0000000000000000>] netlink_sendmsg+0x19c/0x348
[<0000000000000000>] sock_sendmsg+0x3c/0x58
[<0000000000000000>] ___sys_sendmsg+0x290/0x2b0
[<0000000000000000>] __sys_sendmsg+0x58/0xa0
[<0000000000000000>] SyS_sendmsg+0x10/0x20
[<0000000000000000>] el0_svc_naked+0x34/0x38
[<0000000000000000>] 0xffffffffffffffff
To fix the allocation stalls, decouple the hashtable allocations from the device
allocation and allocate the hashtables with kvmalloc's implicit __GFP_NORETRY
so that the allocations fall back to vmalloc with little resistance.
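A minimal sketch of the pattern described above; the struct and function
names are illustrative, not the actual WireGuard hashtable code. The table
is allocated on its own with kvzalloc(), so when physically contiguous
pages are scarce it transparently falls back to vmalloc, and it is later
released with kvfree().

#include <linux/hashtable.h>
#include <linux/mm.h>        /* kvzalloc, kvfree */
#include <linux/random.h>
#include <linux/siphash.h>
#include <linux/spinlock.h>

struct example_hashtable {                    /* hypothetical table type */
	DECLARE_HASHTABLE(hashtable, 11);     /* 2048 buckets of hlist heads */
	siphash_key_t key;
	spinlock_t lock;
};

static struct example_hashtable *example_hashtable_alloc(void)
{
	/* kvzalloc() attempts the slab path without aggressive retries and
	 * then falls back to vmalloc, so a fragmented allocator no longer
	 * stalls the netdev registration path. */
	struct example_hashtable *table = kvzalloc(sizeof(*table), GFP_KERNEL);

	if (!table)
		return NULL;
	get_random_bytes(&table->key, sizeof(table->key));
	hash_init(table->hashtable);
	spin_lock_init(&table->lock);
	return table;
}

/* The matching teardown is kvfree(table) when the device goes away. */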
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
This mitigates unrelated side-channel attacks that think they can turn
WireGuard into a useful time oracle.
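Illustrative sketch only, since the actual change isn't shown in this
entry: one standard way to avoid acting as a precise time oracle is to
coarsen any timestamp before it can be observed externally, e.g. by
rounding the nanoseconds down to a power-of-two granularity tied to a
maximum event rate (MAX_EVENTS_PER_SEC below is a made-up bound).

#include <linux/kernel.h>    /* ALIGN_DOWN */
#include <linux/log2.h>      /* rounddown_pow_of_two */
#include <linux/time64.h>
#include <linux/timekeeping.h>

#define MAX_EVENTS_PER_SEC 50   /* hypothetical rate bound */

static void get_coarse_timestamp(struct timespec64 *ts)
{
	ktime_get_real_ts64(ts);
	/* Strip low-order nanosecond bits so the value leaks no more timing
	 * precision than the permitted event rate already implies. */
	ts->tv_nsec = ALIGN_DOWN(ts->tv_nsec,
				 rounddown_pow_of_two(NSEC_PER_SEC / MAX_EVENTS_PER_SEC));
}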
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
This is a change for Linux 5.0.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Reported-by: Raf Czlonka <rczlonka@gmail.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Reported-by: Alex Xu <alex@alxu.ca>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|