Age | Commit message | Author |
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
The previous code had been proved in Z3, but this new code from upstream
KreMLin is directly generated from the F*, which is preferable. The
assembly generated is identical.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
This is not useful for WireGuard, but for the general use case we
probably want it this way, and the speed difference is mostly lost in
the noise.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
This is the worst commit in the whole repo, making the code much less
readable, but so it goes with upstream maintainers.
We are now woefully wrapped at 80 columns.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Reported-by: Matt Layher <mdlayher@gmail.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Remove signed right shifts. Previously u64_gte_mask was only
correct for x < 2^63.
Z3 script proving correctness:
>>> from z3 import *
>>>
>>> x = BitVec("x", 64)
>>> y = BitVec("y", 64)
>>>
>>> t = LShR(x^((x^y)|((x-y)^y)), 63) - 1
>>>
>>> prove(If(UGE(x, y), BitVecVal(-1, 64), BitVecVal(0, 64)) == t)
proved
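
The same identity can be spot-checked without Z3. The following is a
minimal sketch in plain Python, not the kernel code: it emulates fixed-width
wraparound with explicit masks, and the `bits` parameter is an assumption
added here so the formula can be enumerated exhaustively at 8 bits before
being sampled at 64 bits.

```python
import random

MASK64 = (1 << 64) - 1  # emulate uint64_t wraparound with Python ints


def u64_gte_mask(x, y, bits=64):
    """Return all-ones if x >= y (unsigned), else 0, using only a logical shift."""
    mask = (1 << bits) - 1
    diff = (x - y) & mask                 # wrapping subtraction
    t = x ^ ((x ^ y) | (diff ^ y))        # top bit of t is the borrow of x - y
    return ((t >> (bits - 1)) - 1) & mask


# Exhaustive check at 8 bits, where every (x, y) pair can be enumerated.
for x in range(256):
    for y in range(256):
        assert u64_gte_mask(x, y, bits=8) == (0xFF if x >= y else 0)

# Sampled check at 64 bits, including inputs at or above 2^63.
rng = random.Random(0)
edge = [0, 1, (1 << 63) - 1, 1 << 63, (1 << 63) + 1, MASK64]
pairs = [(a, b) for a in edge for b in edge]
pairs += [(rng.getrandbits(64), rng.getrandbits(64)) for _ in range(10000)]
for x, y in pairs:
    assert u64_gte_mask(x, y) == (MASK64 if x >= y else 0)
```

Sampling is weaker evidence than the Z3 proof above; it is only a quick
regression check.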
Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Avoid signed right shift.
Z3 script showing equivalence:
>>> from z3 import *
>>>
>>> x = BitVec("x", 64)
>>> y = BitVec("y", 64)
>>>
>>> # Before
... x_ = ~(x ^ y)
>>> x_ &= x_ << 32
>>> x_ &= x_ << 16
>>> x_ &= x_ << 8
>>> x_ &= x_ << 4
>>> x_ &= x_ << 2
>>> x_ &= x_ << 1
>>> x_ >>= 63
>>>
>>> # After
... y_ = x ^ y
>>> y_ = y_ | -y_
>>> y_ = LShR(y_, 63) - 1
>>>
>>> prove(x_ == y_)
proved
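
For readers without Z3 at hand, the two variants can also be compared in
plain Python. This is a sketch under assumptions: the function names
eq_mask_before and eq_mask_after are hypothetical, 64-bit wraparound is
emulated with a mask, and the signed shift of the old code is modelled by
replicating the top bit.

```python
import random

MASK64 = (1 << 64) - 1  # emulate 64-bit wraparound with Python ints


def eq_mask_before(x, y):
    """Old variant: fold all bits of ~(x ^ y) into bit 63, then sign-extend."""
    v = ~(x ^ y) & MASK64
    for s in (32, 16, 8, 4, 2, 1):
        v &= (v << s) & MASK64
    # An arithmetic right shift by 63 replicates bit 63 across the word.
    return MASK64 if (v >> 63) else 0


def eq_mask_after(x, y):
    """New variant: d | -d has its top bit set exactly when d is nonzero."""
    d = x ^ y
    d |= (-d) & MASK64
    return ((d >> 63) - 1) & MASK64   # logical shift only


rng = random.Random(0)
edge = [0, 1, (1 << 63) - 1, 1 << 63, MASK64]
pairs = [(a, b) for a in edge for b in edge]
pairs += [(rng.getrandbits(64), rng.getrandbits(64)) for _ in range(10000)]
pairs += [(v, v) for v in (rng.getrandbits(64) for _ in range(1000))]
for x, y in pairs:
    expected = MASK64 if x == y else 0
    assert eq_mask_before(x, y) == expected
    assert eq_mask_after(x, y) == expected
```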
Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Suggested-by: Samuel Neves <sneves@dei.uc.pt>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
This causes problems with RAP and KERNEXEC for PaX, as r12 is a
reserved register.
Suggested-by: PaX Team <pageexec@freemail.hu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Rather than abusing the handshake lock, we're much better off just using
a boring atomic64 for this. It's simpler and performs better.
Also, while we're at it, we set the handshake stamp both before and
after the calculations, in case the calculations block for a really long
time waiting for the RNG to initialize. Otherwise it's possible that
when the RNG finally initializes, two handshakes are sent back to back,
which isn't sensible.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
A mailing list interlocutor argues that sharing the same macro name
might lead to errors down the road.
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Completely rework peer removal to ensure peers don't jump between
contexts and create races.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
And in general it's good to prefer dereferencing entry.peer from a
handshake object rather than a keypair object, when possible, since
keypairs could disappear before their underlying peer.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
We don't want a consumer to read plaintext when it's supposed to be
reading ciphertext, which means we need to synchronize across cores.
Suggested-by: Jann Horn <jann@thejh.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
And in general tighten up the logic of peer creation.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
After we atomic_set, the peer is allowed to be freed, which means if we
want to continue to reference it, we need to bump the reference count.
This was introduced a few commits ago by b713ab0e when implementing some
simplification suggestions.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
This reduces the number of call_rcu invocations considerably.
Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Suggested-by: Jann Horn <jann@thejh.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
If a peer is removed, it's possible for a lookup to momentarily return
NULL, resulting in needless -ENOKEY returns.
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Blocks like:
    if (node_placement(*trie, key, cidr, bits, &node, lock)) {
            node->peer = peer;
            return 0;
    }
may result in a double read when adjusting the refcount, in the highly
unlikely case of LTO and an overly smart compiler.
While we're at it, replace rcu_assign_pointer(X, NULL); with
RCU_INIT_POINTER.
Reported-by: Jann Horn <jann@thejh.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Suggested-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Suggested-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
docs/protocol.md hasn't existed for 3 years.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Reported-by: Jann Horn <jann@thejh.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Use RCU reference counts only when we must, and otherwise use a more
reasonably named function.
Reported-by: Jann Horn <jann@thejh.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Fixes a classic ABA problem that isn't actually reachable because of
rtnl_lock, but it's good to be correct anyway.
Reported-by: Jann Horn <jann@thejh.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
At this stage the value of C[4] is at most ((2^256-1) + 38*(2^256-1)) / 2^256 = 38,
so there is no need to use a wide multiplication.
Change inspired by Andy Polyakov's OpenSSL implementation.
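
The bound can be reproduced with exact integer arithmetic. This is a small
sketch, not part of the patch: the name c4_max is hypothetical, and the
follow-up multiplication by 38 is an assumption about how the carry word is
folded back in (2^256 is congruent to 38 modulo 2^255 - 19).

```python
# Largest possible carry word: two 256-bit values, one scaled by 38,
# summed and divided (floor) by 2^256.
c4_max = ((2**256 - 1) + 38 * (2**256 - 1)) // 2**256
assert c4_max == 38

# Assumed follow-up step: the carry is multiplied by 38. The product is far
# below 2^64, so a 64x64->128 multiply is not needed.
assert 38 * c4_max < 2**64
```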
Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Correctness can be quickly verified with the following z3py script:
>>> from z3 import *
>>> x = BitVec("x", 256) # any 256-bit value
>>> ref = URem(x, 2**255 - 19) # correct value
>>> t = Extract(255, 255, x); x &= 2**255 - 1; # btrq $63, %3
>>> u = If(t != 0, BitVecVal(38, 256), BitVecVal(19, 256)) # cmovncl %k5, %k4
>>> x += u # addq %4, %0; adcq $0, %1; adcq $0, %2; adcq $0, %3;
>>> t = Extract(255, 255, x); x &= 2**255 - 1; # btrq $63, %3
>>> u = If(t != 0, BitVecVal(0, 256), BitVecVal(19, 256)) # cmovncl %k5, %k4
>>> x -= u # subq %4, %0; sbbq $0, %1; sbbq $0, %2; sbbq $0, %3;
>>> prove(x == ref)
proved
Change inspired by Andy Polyakov's OpenSSL implementation.
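
The same sequence can be exercised without Z3 using Python integers. This is
a sketch mirroring the script above step by step; the name final_reduce is
hypothetical, and random sampling is only a sanity check, not a replacement
for the proof.

```python
import random

P = 2**255 - 19
LOW255 = 2**255 - 1
MASK256 = 2**256 - 1


def final_reduce(x):
    """Reduce a 256-bit integer modulo 2^255 - 19 in two conditional steps."""
    t = x >> 255                           # test the top bit ...
    x &= LOW255                            # ... and clear it (btrq)
    x = (x + (38 if t else 19)) & MASK256  # add 38 if it was set, else 19
    t = x >> 255
    x &= LOW255
    x = (x - (0 if t else 19)) & MASK256   # undo the +19 unless it wrapped
    return x


rng = random.Random(0)
samples = [0, 18, 19, P - 1, P, P + 1, 2**255, 2 * P - 1, 2 * P, MASK256]
samples += [rng.getrandbits(256) for _ in range(10000)]
assert all(final_reduce(x) == x % P for x in samples)
```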
Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
The wide multiplication by 38 in mul_a24_eltfp25519_1w is redundant:
(2^256-1) * 121666 / 2^256 is at most 121665, and therefore a 64-bit
multiplication can never overflow.
Change inspired by Andy Polyakov's OpenSSL implementation.
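
The arithmetic behind this claim can be checked directly. This is a minimal
sketch with a hypothetical variable name (hi_max), using exact Python
integers rather than the assembly.

```python
# High word of a 256-bit value multiplied by a24 = 121666.
hi_max = ((2**256 - 1) * 121666) // 2**256
assert hi_max == 121665

# Folding that word back in multiplies it by 38; the result is about 2^22,
# so a plain 64-bit multiply cannot overflow.
assert 38 * hi_max < 2**64
```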
Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
This avoids adding one reference per peer to the napi_hash hashtable, as
normally done by netif_napi_add(). Since we could potentially have up to
2^20 peers, this would make busy polling very slow globally.
This approach is preferable to having only a single napi struct because
we get one gro_list per peer, which means packets can be combined nicely
even if we have a large number of peers.
This is also done by gro_cells_init() in net/core/gro_cells.c.
Signed-off-by: Thomas Gschwantner <tharre3@gmail.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|