Age | Commit message (Collapse) | Author |
|
Change places that open-code NLA_POLICY_MIN_LEN() to
use the macro instead, giving us flexibility in how we
handle the details of the macro.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Change places that open-code NLA_POLICY_EXACT_LEN() to
use the macro instead, giving us flexibility in how we
handle the details of the macro.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Before, we took a reference to the creating netns if the new netns was
different. This caused issues with circular references, with two
wireguard interfaces swapping namespaces. The solution is to rather not
take any extra references at all, but instead simply invalidate the
creating netns pointer when that netns is deleted.
In order to prevent this from happening again, this commit improves the
rough object leak tracking by allowing it to account for created and
destroyed interfaces, aside from just peers and keys. That then makes it
possible to check for the object leak when having two interfaces take a
reference to each others' namespaces.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
We precompute the static-static ECDH during configuration time, in order
to save an expensive computation later when receiving network packets.
However, not all ECDH computations yield a contributory result. Prior,
we were just not letting those peers be added to the interface. However,
this creates a strange inconsistency, since it was still possible to add
other weird points, like a valid public key plus a low-order point, and,
like points that result in zeros, a handshake would not complete. In
order to make the behavior more uniform and less surprising, simply
allow all peers to be added. Then, we'll error out later when doing the
crypto if there's an issue. This also adds more separation between the
crypto layer and the configuration layer.
Discussed-with: Mathias Hall-Andersen <mathias@hall-andersen.dk>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Our static-static calculation returns a failure if the public key is of
low order. We check for this when peers are added, and don't allow them
to be added if they're low order, except in the case where we haven't
yet been given a private key. In that case, we would defer the removal
of the peer until we're given a private key, since at that point we're
doing new static-static calculations which incur failures we can act on.
This meant, however, that we wound up removing peers rather late in the
configuration flow.
Syzkaller points out that peer_remove calls flush_workqueue, which in
turn might then wait for sending a handshake initiation to complete.
Since handshake initiation needs the static identity lock, holding the
static identity lock while calling peer_remove can result in a rare
deadlock. We have precisely this case in this situation of late-stage
peer removal based on an invalid public key. We can't drop the lock when
removing, because then incoming handshakes might interact with a bogus
static-static calculation.
While the band-aid patch for this would involve breaking up the peer
removal into two steps like wg_peer_remove_all does, in order to solve
the locking issue, there's actually a much more elegant way of fixing
this:
If the static-static calculation succeeds with one private key, it
*must* succeed with all others, because all 32-byte strings map to valid
private keys, thanks to clamping. That means we can get rid of this
silly dance and locking headaches of removing peers late in the
configuration flow, and instead just reject them early on, regardless of
whether the device has yet been assigned a private key. For the case
where the device doesn't yet have a private key, we safely use zeros
just for the purposes of checking for low order points by way of
checking the output of the calculation.
The following PoC will trigger the deadlock:
ip link add wg0 type wireguard
ip addr add 10.0.0.1/24 dev wg0
ip link set wg0 up
ping -f 10.0.0.2 &
while true; do
wg set wg0 private-key /dev/null peer AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA= allowed-ips 10.0.0.0/24 endpoint 10.0.0.3:1234
wg set wg0 private-key <(echo AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=)
done
[ 0.949105] ======================================================
[ 0.949550] WARNING: possible circular locking dependency detected
[ 0.950143] 5.5.0-debug+ #18 Not tainted
[ 0.950431] ------------------------------------------------------
[ 0.950959] wg/89 is trying to acquire lock:
[ 0.951252] ffff8880333e2128 ((wq_completion)wg-kex-wg0){+.+.}, at: flush_workqueue+0xe3/0x12f0
[ 0.951865]
[ 0.951865] but task is already holding lock:
[ 0.952280] ffff888032819bc0 (&wg->static_identity.lock){++++}, at: wg_set_device+0x95d/0xcc0
[ 0.953011]
[ 0.953011] which lock already depends on the new lock.
[ 0.953011]
[ 0.953651]
[ 0.953651] the existing dependency chain (in reverse order) is:
[ 0.954292]
[ 0.954292] -> #2 (&wg->static_identity.lock){++++}:
[ 0.954804] lock_acquire+0x127/0x350
[ 0.955133] down_read+0x83/0x410
[ 0.955428] wg_noise_handshake_create_initiation+0x97/0x700
[ 0.955885] wg_packet_send_handshake_initiation+0x13a/0x280
[ 0.956401] wg_packet_handshake_send_worker+0x10/0x20
[ 0.956841] process_one_work+0x806/0x1500
[ 0.957167] worker_thread+0x8c/0xcb0
[ 0.957549] kthread+0x2ee/0x3b0
[ 0.957792] ret_from_fork+0x24/0x30
[ 0.958234]
[ 0.958234] -> #1 ((work_completion)(&peer->transmit_handshake_work)){+.+.}:
[ 0.958808] lock_acquire+0x127/0x350
[ 0.959075] process_one_work+0x7ab/0x1500
[ 0.959369] worker_thread+0x8c/0xcb0
[ 0.959639] kthread+0x2ee/0x3b0
[ 0.959896] ret_from_fork+0x24/0x30
[ 0.960346]
[ 0.960346] -> #0 ((wq_completion)wg-kex-wg0){+.+.}:
[ 0.960945] check_prev_add+0x167/0x1e20
[ 0.961351] __lock_acquire+0x2012/0x3170
[ 0.961725] lock_acquire+0x127/0x350
[ 0.961990] flush_workqueue+0x106/0x12f0
[ 0.962280] peer_remove_after_dead+0x160/0x220
[ 0.962600] wg_set_device+0xa24/0xcc0
[ 0.962994] genl_rcv_msg+0x52f/0xe90
[ 0.963298] netlink_rcv_skb+0x111/0x320
[ 0.963618] genl_rcv+0x1f/0x30
[ 0.963853] netlink_unicast+0x3f6/0x610
[ 0.964245] netlink_sendmsg+0x700/0xb80
[ 0.964586] __sys_sendto+0x1dd/0x2c0
[ 0.964854] __x64_sys_sendto+0xd8/0x1b0
[ 0.965141] do_syscall_64+0x90/0xd9a
[ 0.965408] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 0.965769]
[ 0.965769] other info that might help us debug this:
[ 0.965769]
[ 0.966337] Chain exists of:
[ 0.966337] (wq_completion)wg-kex-wg0 --> (work_completion)(&peer->transmit_handshake_work) --> &wg->static_identity.lock
[ 0.966337]
[ 0.967417] Possible unsafe locking scenario:
[ 0.967417]
[ 0.967836] CPU0 CPU1
[ 0.968155] ---- ----
[ 0.968497] lock(&wg->static_identity.lock);
[ 0.968779] lock((work_completion)(&peer->transmit_handshake_work));
[ 0.969345] lock(&wg->static_identity.lock);
[ 0.969809] lock((wq_completion)wg-kex-wg0);
[ 0.970146]
[ 0.970146] *** DEADLOCK ***
[ 0.970146]
[ 0.970531] 5 locks held by wg/89:
[ 0.970908] #0: ffffffff827433c8 (cb_lock){++++}, at: genl_rcv+0x10/0x30
[ 0.971400] #1: ffffffff82743480 (genl_mutex){+.+.}, at: genl_rcv_msg+0x642/0xe90
[ 0.971924] #2: ffffffff827160c0 (rtnl_mutex){+.+.}, at: wg_set_device+0x9f/0xcc0
[ 0.972488] #3: ffff888032819de0 (&wg->device_update_lock){+.+.}, at: wg_set_device+0xb0/0xcc0
[ 0.973095] #4: ffff888032819bc0 (&wg->static_identity.lock){++++}, at: wg_set_device+0x95d/0xcc0
[ 0.973653]
[ 0.973653] stack backtrace:
[ 0.973932] CPU: 1 PID: 89 Comm: wg Not tainted 5.5.0-debug+ #18
[ 0.974476] Call Trace:
[ 0.974638] dump_stack+0x97/0xe0
[ 0.974869] check_noncircular+0x312/0x3e0
[ 0.975132] ? print_circular_bug+0x1f0/0x1f0
[ 0.975410] ? __kernel_text_address+0x9/0x30
[ 0.975727] ? unwind_get_return_address+0x51/0x90
[ 0.976024] check_prev_add+0x167/0x1e20
[ 0.976367] ? graph_lock+0x70/0x160
[ 0.976682] __lock_acquire+0x2012/0x3170
[ 0.976998] ? register_lock_class+0x1140/0x1140
[ 0.977323] lock_acquire+0x127/0x350
[ 0.977627] ? flush_workqueue+0xe3/0x12f0
[ 0.977890] flush_workqueue+0x106/0x12f0
[ 0.978147] ? flush_workqueue+0xe3/0x12f0
[ 0.978410] ? find_held_lock+0x2c/0x110
[ 0.978662] ? lock_downgrade+0x6e0/0x6e0
[ 0.978919] ? queue_rcu_work+0x60/0x60
[ 0.979166] ? netif_napi_del+0x151/0x3b0
[ 0.979501] ? peer_remove_after_dead+0x160/0x220
[ 0.979871] peer_remove_after_dead+0x160/0x220
[ 0.980232] wg_set_device+0xa24/0xcc0
[ 0.980516] ? deref_stack_reg+0x8e/0xc0
[ 0.980801] ? set_peer+0xe10/0xe10
[ 0.981040] ? __ww_mutex_check_waiters+0x150/0x150
[ 0.981430] ? __nla_validate_parse+0x163/0x270
[ 0.981719] ? genl_family_rcv_msg_attrs_parse+0x13f/0x310
[ 0.982078] genl_rcv_msg+0x52f/0xe90
[ 0.982348] ? genl_family_rcv_msg_attrs_parse+0x310/0x310
[ 0.982690] ? register_lock_class+0x1140/0x1140
[ 0.983049] netlink_rcv_skb+0x111/0x320
[ 0.983298] ? genl_family_rcv_msg_attrs_parse+0x310/0x310
[ 0.983645] ? netlink_ack+0x880/0x880
[ 0.983888] genl_rcv+0x1f/0x30
[ 0.984168] netlink_unicast+0x3f6/0x610
[ 0.984443] ? netlink_detachskb+0x60/0x60
[ 0.984729] ? find_held_lock+0x2c/0x110
[ 0.984976] netlink_sendmsg+0x700/0xb80
[ 0.985220] ? netlink_broadcast_filtered+0xa60/0xa60
[ 0.985533] __sys_sendto+0x1dd/0x2c0
[ 0.985763] ? __x64_sys_getpeername+0xb0/0xb0
[ 0.986039] ? sockfd_lookup_light+0x17/0x160
[ 0.986397] ? __sys_recvmsg+0x8c/0xf0
[ 0.986711] ? __sys_recvmsg_sock+0xd0/0xd0
[ 0.987018] __x64_sys_sendto+0xd8/0x1b0
[ 0.987283] ? lockdep_hardirqs_on+0x39b/0x5a0
[ 0.987666] do_syscall_64+0x90/0xd9a
[ 0.987903] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 0.988223] RIP: 0033:0x7fe77c12003e
[ 0.988508] Code: c3 8b 07 85 c0 75 24 49 89 fb 48 89 f0 48 89 d7 48 89 ce 4c 89 c2 4d 89 ca 4c 8b 44 24 08 4c 8b 4c 24 10 4c 4
[ 0.989666] RSP: 002b:00007fffada2ed58 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 0.990137] RAX: ffffffffffffffda RBX: 00007fe77c159d48 RCX: 00007fe77c12003e
[ 0.990583] RDX: 0000000000000040 RSI: 000055fd1d38e020 RDI: 0000000000000004
[ 0.991091] RBP: 000055fd1d38e020 R08: 000055fd1cb63358 R09: 000000000000000c
[ 0.991568] R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000002c
[ 0.992014] R13: 0000000000000004 R14: 000055fd1d38e020 R15: 0000000000000001
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
This enables race-free updates for wg-dynamic and similar tools.
Suggested-by: Thomas Gschwantner <tharre3@gmail.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Reported-by: Derrick Pallas <derrick@pallas.us>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Reported-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
This eliminates the headache of managing cb->args[??].
Suggested-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Reported-by: Bruno Wolff III <bruno@wolff.to>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
This makes `wg show` and `wg showconf` and the like significantly
faster, since we don't have to iterate through every node of the trie
for every single peer. It also makes netlink cursor resumption much less
problematic, since we're just iterating through a list, rather than
having to save a traversal stack.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
This causes needless traversal of the trie.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
The hashtable allocations are quite large, and cause the device allocation in
the net framework to stall sometimes while it tries to find a contiguous region
that can fit the device struct:
[<0000000000000000>] __switch_to+0x94/0xb8
[<0000000000000000>] __alloc_pages_nodemask+0x764/0x7e8
[<0000000000000000>] kmalloc_order+0x20/0x40
[<0000000000000000>] __kmalloc+0x144/0x1a0
[<0000000000000000>] alloc_netdev_mqs+0x5c/0x368
[<0000000000000000>] rtnl_create_link+0x48/0x180
[<0000000000000000>] rtnl_newlink+0x410/0x708
[<0000000000000000>] rtnetlink_rcv_msg+0x190/0x1f8
[<0000000000000000>] netlink_rcv_skb+0x4c/0xf8
[<0000000000000000>] rtnetlink_rcv+0x30/0x40
[<0000000000000000>] netlink_unicast+0x18c/0x208
[<0000000000000000>] netlink_sendmsg+0x19c/0x348
[<0000000000000000>] sock_sendmsg+0x3c/0x58
[<0000000000000000>] ___sys_sendmsg+0x290/0x2b0
[<0000000000000000>] __sys_sendmsg+0x58/0xa0
[<0000000000000000>] SyS_sendmsg+0x10/0x20
[<0000000000000000>] el0_svc_naked+0x34/0x38
[<0000000000000000>] 0xffffffffffffffff
To fix the allocation stalls, decouple the hashtable allocations from the device
allocation and allocate the hashtables with kvmalloc's implicit __GFP_NORETRY
so that the allocations fall back to vmalloc with little resistance.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
In WireGuard, the underlying UDP socket lives in the namespace where the
interface was created and doesn't move if the interface is moved. This
allows one to create the interface in some privileged place that has
Internet access, and then move it into a container namespace that only
has the WireGuard interface for egress. Consider the following
situation:
1. Interface created in namespace A. Socket therefore lives in namespace A.
2. Interface moved to namespace B. Socket remains in namespace A.
3. Namespace B now has access to the interface and changes the listen
port and/or fwmark of socket. Change is reflected in namespace A.
This behavior is arguably _fine_ and perhaps even expected or
acceptable. But there's also an argument to be made that B should have
A's cred to do so. So, this patch adds a simple ns_capable check.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
This required a bit of pruning of our christmas trees.
Suggested-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
It's not used for anything, and LKML doesn't like the type being used as
an index value.
Suggested-by: Eugene Syromiatnikov <esyr@redhat.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Suggested-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
I understand why this must be done, though I'm not so happy about having
to do it. In some places, it puts us over 80 chars and we have to break
lines up in further ugly ways. And in general, I think this makes things
harder to read. Yet another thing we must do to please upstream.
Maybe this can be replaced in the future by some kind of automatic
module namespacing logic in the linker, or even combined with LTO and
aggressive symbol stripping.
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
The kernel has very specific rules correlating file type with comment
type, and also SPDX identifiers can't be merged with other comments.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Suggested-by: Sultan Alsawaf <sultanxda@gmail.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Suggested-by: Sultan Alsawaf <sultanxda@gmail.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
While we don't want people to ever use old protocols, people will
complain if the API "changes", so explicitly make the unset protocol
mean the latest, and add a dummy mechanism of specifying the protocol on
a per-peer basis, which we hope nobody actually ever uses.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
This is the worst commit in the whole repo, making the code much less
readable, but so it goes with upstream maintainers.
We are now woefully wrapped at 80 columns.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Reported-by: Matt Layher <mdlayher@gmail.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Use RCU reference counts only when we must, and otherwise use a more
reasonably named function.
Reported-by: Jann Horn <jann@thejh.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Since this is a network protocol, expirations need to be accounted for,
even across system suspend. On real systems, this isn't a problem, since
we're clearing all keys before suspend. But on Android, where we don't
do that, this is something of a problem. So, we switch to using boottime
instead of jiffies.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
We don't want the local private key to not correspond with a precomputed
ss or precomputed cookie hash at any intermediate point.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
It's good to have SPDX identifiers in all files as the Linux kernel
developers are working to add these identifiers to all files.
Update all files with the correct SPDX license identifier based on the license
text of the project or based on the license in the file itself. The SPDX
identifier is a legally binding shorthand, which can be used instead of the
full boiler plate text.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Modified-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
At somepoint we may need to wg_ namespace these.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
This gets us nanoseconds instead of microseconds, which is better, and
we can do this pretty much without freaking out existing userspace,
which doesn't actually make use of the nano/micro seconds field:
zx2c4@thinkpad ~ $ cat a.c
void main()
{
puts(sizeof(struct timeval) == sizeof(struct timespec) ? "success" : "failure");
}
zx2c4@thinkpad ~ $ gcc a.c -m64 && ./a.out
success
zx2c4@thinkpad ~ $ gcc a.c -m32 && ./a.out
success
This doesn't solve y2038 problem, but timespec64 isn't yet a thing in
userspace.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
When an interface is down, the socket port can change freely. A socket
will be allocated when the interface comes up, and if a socket can't be
allocated, the interface doesn't come up.
However, a socket port can change while the interface is up. In this
case, if a new socket with a new port cannot be allocated, it's
important to keep the interface in a consistent state. The choices are
either to bring down the interface or to preserve the old socket. This
patch implements the latter.
Reported-by: Marc-Antoine Perennou <keruspe@exherbo.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
These aren't actually valid 25519 points pre-normalization, and doing
this is required to make unsetting private keys based on all zeros.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Makes it more clear that this _not_ a routing table replacement.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
One types:
for (i = 0 ...
So one should also type:
for_each_obj (obj ...
But the upstream kernel style guidelines are insane, and so we must
instead do:
for_each_obj(obj ...
Ugly, but one must choose his battles wisely.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Since the peer list is protected by the device_update_lock, and since
items are removed from the peer list before putting their final
reference, we don't actually need to take a reference when iterating.
This allows us to simplify the macro considerably.
Suggested-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|