path: root/src/peer.h
2021-06-04  peer: allocate in kmem_cache  (Jason A. Donenfeld)
With deployments having upwards of 600k peers now, this somewhat heavy structure could benefit from more fine-grained allocations. Specifically, instead of using a 2048-byte slab for a 1544-byte object, we can now use 1544-byte objects directly, thus saving almost 25% per peer, or with 600k peers, a savings of roughly 288 MiB. This also makes WireGuard's memory usage more transparent in tools like slabtop and /proc/slabinfo.

Suggested-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
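A minimal sketch of the kmem_cache pattern this describes (cache and function names illustrative; the real setup lives alongside struct wg_peer):

    #include <linux/slab.h>

    static struct kmem_cache *peer_cache;

    static int __init peer_cache_init(void)
    {
            /* An exact-size slab cache avoids rounding a 1544-byte
             * object up to the generic 2048-byte kmalloc slab. */
            peer_cache = KMEM_CACHE(wg_peer, 0);
            return peer_cache ? 0 : -ENOMEM;
    }

    static struct wg_peer *peer_alloc(void)
    {
            return kmem_cache_zalloc(peer_cache, GFP_KERNEL);
    }

    static void peer_free(struct wg_peer *peer)
    {
            kmem_cache_free(peer_cache, peer);
    }

With 600k peers, the arithmetic works out to (2048 - 1544) bytes x 600,000, or roughly 288 MiB.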
2021-02-18  queueing: get rid of per-peer ring buffers  (Jason A. Donenfeld)
Having two ring buffers per-peer means that every peer results in two massive ring allocations. On an 8-core x86_64 machine, this commit reduces the per-peer allocation from 18,688 bytes to 1,856 bytes, which is a 90% reduction. Ninety percent! With some single-machine deployments approaching 500,000 peers, we're talking about a reduction from 7 gigs of memory down to 700 megs of memory.

In order to get rid of these per-peer allocations, this commit switches to using a list-based queueing approach. Currently GSO fragments are chained together using the skb->next pointer (the skb_list_* singly linked list approach), so we form the per-peer queue around the unused skb->prev pointer (which sort of makes sense because the links are pointing backwards). Use of skb_queue_* is not possible here, because that is based on doubly linked lists and spinlocks. Multiple cores can write into the queue at any given time, because its writes occur in the start_xmit path or in the udp_recv path. But reads happen in a single workqueue item per-peer, amounting to a multi-producer, single-consumer paradigm.

The MPSC queue is implemented locklessly and never blocks. However, it is not linearizable (though it is serializable), with a very tight and unlikely race on writes, which, when hit (some tiny fraction of the 0.15% of partial adds on a fully loaded 16-core x86_64 system), causes the queue reader to terminate early. However, because every packet sent queues up the same workqueue item after it is fully added, the worker resumes again, and stopping early isn't actually a problem, since at that point the packet wouldn't have yet been added to the encryption queue. These properties allow us to avoid disabling interrupts or spinning. The design is based on Dmitry Vyukov's algorithm [1].

Performance-wise, ordinarily list-based queues aren't preferable to ringbuffers, because of cache misses when following pointers around. However, we *already* have to follow the adjacent pointers when working through fragments, so there shouldn't actually be any change there. A potential downside is that dequeueing is a bit more complicated, but the ptr_ring structure used prior had a spinlock when dequeueing, so all in all the difference appears to be a wash.

Actually, from profiling, the biggest performance hit, by far, of this commit winds up being atomic_add_unless(count, 1, max) and atomic_dec(count), which account for the majority of CPU time, according to perf. In that sense, the previous ring buffer was superior in that it could check if it was full by head==tail, which the list-based approach cannot do. But all in all, this enables us to get massive memory savings, allowing WireGuard to scale for real world deployments, without taking much of a performance hit.

[1] http://www.1024cores.net/home/lock-free-algorithms/queues/intrusive-mpsc-node-based-queue

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
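For reference, a self-contained sketch of Vyukov's intrusive MPSC queue [1] in C11 atomics (the kernel code instead threads sk_buffs through skb->prev and uses kernel atomic primitives; names here are illustrative):

    #include <stdatomic.h>
    #include <stddef.h>

    struct node {
            _Atomic(struct node *) next;
    };

    struct mpsc_queue {
            _Atomic(struct node *) head; /* producers swing this atomically */
            struct node *tail;           /* touched only by the consumer */
            struct node stub;            /* avoids empty-queue special cases */
    };

    static void mpsc_init(struct mpsc_queue *q)
    {
            atomic_store(&q->stub.next, NULL);
            atomic_store(&q->head, &q->stub);
            q->tail = &q->stub;
    }

    /* Many producers; one atomic exchange; never blocks. */
    static void mpsc_enqueue(struct mpsc_queue *q, struct node *n)
    {
            atomic_store(&n->next, NULL);
            struct node *prev = atomic_exchange(&q->head, n);
            /* Between the exchange and this store, the list is briefly
             * disconnected: this is the tight race that can make the
             * single consumer stop early. */
            atomic_store(&prev->next, n);
    }

    /* Single consumer; returns NULL if empty or a producer is mid-add. */
    static struct node *mpsc_dequeue(struct mpsc_queue *q)
    {
            struct node *tail = q->tail;
            struct node *next = atomic_load(&tail->next);

            if (tail == &q->stub) {
                    if (!next)
                            return NULL;
                    q->tail = next;
                    tail = next;
                    next = atomic_load(&tail->next);
            }
            if (next) {
                    q->tail = next;
                    return tail;
            }
            if (tail != atomic_load(&q->head))
                    return NULL; /* producer mid-add: reader stops early */
            mpsc_enqueue(q, &q->stub);
            next = atomic_load(&tail->next);
            if (next) {
                    q->tail = next;
                    return tail;
            }
            return NULL;
    }

As the message notes, a producer that loses this race re-queues the per-peer work item once its add completes, so the early-terminating consumer is simply woken again.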
2021-02-08  peer: put frequently used members above cache lines  (Jason A. Donenfeld)
The is_dead boolean is checked for every single packet, while the internal_id member is used basically only for pr_debug messages. So it makes sense to hoist is_dead up into some space formerly unused by a struct hole, while demoting internal_id to below the lowest struct cache line.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
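The layout idea, as an illustrative toy (the real struct wg_peer has many more members):

    #include <linux/types.h>

    struct toy_peer {
            /* first 64-byte cache line: touched for every packet */
            struct wg_device *device;
            bool is_dead;           /* hoisted into a former padding hole */
            /* ... other hot, per-packet state ... */

            /* later cache lines: rarely touched */
            u64 internal_id;        /* only for pr_debug messages */
    };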
2019-02-26  allowedips: maintain per-peer list of allowedips  (Jason A. Donenfeld)
This makes `wg show` and `wg showconf` and the like significantly faster, since we don't have to iterate through every node of the trie for every single peer. It also makes netlink cursor resumption much less problematic, since we're just iterating through a list, rather than having to save a traversal stack.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
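A sketch of the arrangement (member names illustrative): each trie node is also threaded onto a list hanging off its owning peer, so enumerating a peer's allowedips walks a short list instead of the whole trie:

    struct allowedips_node {
            struct wg_peer __rcu *peer;
            struct allowedips_node __rcu *bit[2]; /* trie children */
            u8 cidr, bits[16];
            struct list_head peer_list; /* links into peer->allowedips_list */
    };

    /* `wg show`: no trie traversal, no saved stack for netlink resumption */
    list_for_each_entry(node, &peer->allowedips_list, peer_list)
            emit_prefix(node->bits, node->cidr); /* hypothetical emitter */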
2019-01-23  netlink: use __kernel_timespec for handshake time  (Jason A. Donenfeld)
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2019-01-07  global: update copyright  (Jason A. Donenfeld)
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-10-27  timers: it is always reasonable to remove a timer  (Jason A. Donenfeld)
If struct timer_list has not been set up, it is zeroed, in which case timer_pending is false, so calling del_timer is safe. Calling del_timer is also safe on a timer that has already been del_timer'd. And calling del_timer is safe after a peer is dead, since the whole point of it being dead is that no more timers are created and all contexts eventually stop. Finally, del_timer uses a lock, which means it's safe to call concurrently. Therefore, we do not need any guards around calls to del_timer.

While we're at it, we can get rid of the old lingering timers_enabled boolean, which wasn't doing anything anymore anyway.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
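Peer teardown can therefore be a straight-line sequence of unconditional calls; a sketch using this driver's timer names:

    /* Safe if a timer was never armed (zeroed => !timer_pending),
     * already expired, already deleted, or being deleted concurrently
     * (del_timer locks the timer base internally). */
    del_timer(&peer->timer_retransmit_handshake);
    del_timer(&peer->timer_send_keepalive);
    del_timer(&peer->timer_new_handshake);
    del_timer(&peer->timer_zero_key_material);
    del_timer(&peer->timer_persistent_keepalive);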
2018-10-08  global: rename struct wireguard_ to struct wg_  (Jason A. Donenfeld)
This required a bit of pruning of our Christmas trees.

Suggested-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-10-02  global: prefix all functions with wg_  (Jason A. Donenfeld)
I understand why this must be done, though I'm not so happy about having to do it. In some places, it puts us over 80 chars and we have to break lines up in further ugly ways. And in general, I think this makes things harder to read. Yet another thing we must do to please upstream. Maybe this can be replaced in the future by some kind of automatic module namespacing logic in the linker, or even combined with LTO and aggressive symbol stripping.

Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-09-20  global: put SPDX identifier on its own line  (Jason A. Donenfeld)
The kernel has very specific rules correlating file type with comment type, and also SPDX identifiers can't be merged with other comments.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
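Those rules, concretely: kernel .c files take a C++-style comment on the first line, while headers like peer.h take a C-style comment, and nothing else may share that line:

    // SPDX-License-Identifier: GPL-2.0        (in .c files)
    /* SPDX-License-Identifier: GPL-2.0 */     (in .h files)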
2018-08-28  global: run through clang-format  (Jason A. Donenfeld)
This is the worst commit in the whole repo, making the code much less readable, but so it goes with upstream maintainers. We are now woefully wrapped at 80 columns.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-08-04  send: switch handshake stamp to an atomic  (Jason A. Donenfeld)
Rather than abusing the handshake lock, we're much better off just using a boring atomic64 for this. It's simpler and performs better. Also, while we're at it, we set the handshake stamp both before and after the calculations, in case the calculations block for a really long time waiting for the RNG to initialize. Otherwise it's possible that when the RNG finally initializes, two handshakes are sent back to back, which isn't sensible.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
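A sketch of the double-stamp pattern, assuming an atomic64_t last_sent_handshake member holding a boottime nanosecond stamp (helper name hypothetical):

    static void send_handshake(struct wg_peer *peer)
    {
            /* Stamp before the computation, which may block a long
             * time waiting for the RNG to initialize... */
            atomic64_set(&peer->last_sent_handshake, ktime_get_boot_fast_ns());
            create_and_transmit_initiation(peer); /* hypothetical */
            /* ...and again after, so two handshakes can't go out
             * back to back once entropy finally arrives. */
            atomic64_set(&peer->last_sent_handshake, ktime_get_boot_fast_ns());
    }

Readers then use atomic64_read() instead of taking the handshake lock.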
2018-08-03  peer: ensure destruction doesn't race  (Jason A. Donenfeld)
Completely rework peer removal to ensure peers don't jump between contexts and create races.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-08-01  queueing: keep reference to peer after setting atomic state bit  (Jason A. Donenfeld)
After we atomic_set, the peer is allowed to be freed, which means if we want to continue to reference it, we need to bump the reference count. This was introduced a few commits ago by b713ab0e when implementing some simplification suggestions.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
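The ordering, sketched (member and helper names approximate):

    /* Take our reference *before* publishing the state... */
    struct wg_peer *peer = wg_peer_get(PACKET_PEER(skb));

    atomic_set(&PACKET_CB(skb)->state, state);
    /* ...because from this point, another context may consume the
     * packet and drop what was the final reference to the peer. */
    queue_work(peer->device->packet_crypt_wq, &peer->transmit_packet_work);
    wg_peer_put(peer);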
2018-07-31  peer: simplify rcu reference counts  (Jason A. Donenfeld)
Use RCU reference counts only when we must, and otherwise use a more reasonably named function.

Reported-by: Jann Horn <jann@thejh.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
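A sketch of the split (names approximate): one plainly named function for callers that already hold a reference, and an RCU variant only for lookups that can race the final put:

    static inline struct wg_peer *peer_get(struct wg_peer *peer)
    {
            kref_get(&peer->refcount); /* caller already owns a ref */
            return peer;
    }

    static inline struct wg_peer *peer_get_maybe_zero(struct wg_peer *peer)
    {
            RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
                             "Taking peer reference without RCU read lock");
            if (unlikely(!peer || !kref_get_unless_zero(&peer->refcount)))
                    return NULL;
            return peer;
    }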
2018-07-08  receive: use NAPI on the receive path  (Jonathan Neuschäfer)
Suggested-by: Jason A. Donenfeld <Jason@zx2c4.com>
[Jason: fixed up the flushing of the rx_queue in peer_remove]
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-06-23  global: use fast boottime instead of normal boottime  (Jason A. Donenfeld)
Generally, if we're inaccurate by a few nanoseconds, it doesn't matter.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-06-23  global: use ktime boottime instead of jiffies  (Jason A. Donenfeld)
Since this is a network protocol, expirations need to be accounted for, even across system suspend. On real systems, this isn't a problem, since we're clearing all keys before suspend. But on Android, where we don't do that, this is something of a problem. So, we switch to using boottime instead of jiffies.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
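The resulting expiration checks look something like this sketch (helper name illustrative); the key property is that boottime, unlike jiffies, keeps advancing across suspend:

    static inline bool birthdate_has_expired(u64 birthday_ns, u64 expiry_secs)
    {
            return (s64)(birthday_ns + expiry_secs * NSEC_PER_SEC)
                   <= (s64)ktime_get_boot_fast_ns();
    }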
2018-01-03  global: year bump  (Jason A. Donenfeld)
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2017-12-09  global: add SPDX tags to all files  (Greg Kroah-Hartman)
It's good to have SPDX identifiers in all files as the Linux kernel developers are working to add these identifiers to all files. Update all files with the correct SPDX license identifier based on the license text of the project or based on the license in the file itself. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boilerplate text.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Modified-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2017-11-22  global: switch from timeval to timespec  (Jason A. Donenfeld)
This gets us nanoseconds instead of microseconds, which is better, and we can do this pretty much without freaking out existing userspace, which doesn't actually make use of the nano/micro seconds field:

zx2c4@thinkpad ~ $ cat a.c
#include <stdio.h>
#include <sys/time.h>
#include <time.h>

int main(void)
{
        puts(sizeof(struct timeval) == sizeof(struct timespec) ? "success" : "failure");
        return 0;
}
zx2c4@thinkpad ~ $ gcc a.c -m64 && ./a.out
success
zx2c4@thinkpad ~ $ gcc a.c -m32 && ./a.out
success

This doesn't solve the y2038 problem, but timespec64 isn't yet a thing in userspace.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2017-10-31  peer: store total number of peers instead of iterating  (Jason A. Donenfeld)
This is faster, since it means adding a new peer is O(1) instead of O(n). It's also safe to do because we're holding the device_update_lock on both the ++ and the --.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2017-10-31  peer: get rid of peer_for_each magic  (Jason A. Donenfeld)
Since the peer list is protected by the device_update_lock, and since items are removed from the peer list before putting their final reference, we don't actually need to take a reference when iterating. This allows us to simplify the macro considerably.

Suggested-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
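A sketch of the simplified iteration (macro shape illustrative):

    /* Holding device_update_lock is enough: peers leave wg->peer_list
     * before their final reference is put, so no per-item get/put is
     * needed while we hold the lock. */
    #define peer_for_each(peer, wg) \
            list_for_each_entry(peer, &(wg)->peer_list, peer_list)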
2017-10-31  global: accept decent check_patch.pl suggestions  (Jason A. Donenfeld)
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2017-10-03  global: use _WG prefix for include guards  (Jason A. Donenfeld)
Suggested-by: Sultan Alsawaf <sultanxda@gmail.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
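For this file, that means:

    #ifndef _WG_PEER_H
    #define _WG_PEER_H

    /* ... declarations ... */

    #endif /* _WG_PEER_H */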
2017-09-19  peer: rearrange structs  (Jason A. Donenfeld)
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2017-09-18  queue: entirely rework parallel system  (Jason A. Donenfeld)
This removes our dependency on padata and moves to a different mode of multiprocessing that is more efficient. This began as Samuel Holland's GSoC project and was gradually reworked/redesigned/rebased into this present commit, which is a combination of his initial contribution and my subsequent rewriting and redesigning.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2017-08-23  socket: improve reply-to-src algorithm  (Jason A. Donenfeld)
We store the destination IP of incoming packets as the source IP of outgoing packets. When we send outgoing packets, we then ask the routing table for which interface to use and which source address, given our inputs of the destination address and a suggested source address. This all is good and fine, since it means we'll successfully reply using the correct source address, correlating with the destination address for incoming packets.

However, what happens when default routes change? Or when interface IP addresses change?

Prior to this commit, after getting the response from the routing table of the source address, destination address, and interface, we would then make sure that the source address actually belonged to the outbound interface. If it didn't, we'd reset our source address to zero and re-ask the routing table, in which case the routing table would then give us the default IP address for sending that packet. This worked mostly fine for most purposes, but there was a problem: what if WireGuard legitimately accepted an inbound packet on a default interface using an IP of another interface? In this case, falling back to asking for the default source IP was not a good strategy, since it'd nearly always mean we'd fail to reply using the right source.

So, this commit changes the algorithm slightly. Rather than falling back to using the default IP if the preferred source IP doesn't belong to the outbound interface, we have two checks: we make sure that the source IP address belongs to _some_ interface on the system, no matter which one (so long as it's within the network namespace), and we check whether or not the interface of an incoming packet matches the returned interface for the outbound traffic. If both these conditions are true, then we proceed with using this source IP address. If not, we fall back to the default IP address.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
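In rough pseudocode, with hypothetical helpers standing in for the actual routing-table calls:

    /* fl.saddr: source we'd like to keep using (the dst of the
     * inbound packet); endpoint->src_if: interface it arrived on */
    rt = route_lookup(net, fl.daddr, fl.saddr);
    if (!addr_belongs_to_any_iface_in_netns(net, fl.saddr) ||
        rt->dev->ifindex != endpoint->src_if) {
            /* Stale source: only now fall back to letting the
             * routing table choose the default source address. */
            fl.saddr = 0;
            rt = route_lookup(net, fl.daddr, 0);
    }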
2017-08-04  timers: rename confusingly named functions and variables  (Jason A. Donenfeld)
Suggested-by: Mathias Hall-Andersen <mathias@hall-andersen.dk>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2017-05-30  peer: use iterator macro instead of callback  (Jason A. Donenfeld)
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2017-05-17  noise: redesign preshared key mode  (Jason A. Donenfeld)
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2017-02-13  compat: backport siphash & dst_cache from mainline  (Jason A. Donenfeld)
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2017-02-07  timers: use simpler uninit sync technique  (Jason A. Donenfeld)
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2017-01-10  Update copyright  (Jason A. Donenfeld)
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2016-12-13  peer: don't use sockaddr_storage to reduce memory usage  (Jason A. Donenfeld)
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2016-12-11  global: move to consistent use of uN instead of uintN_t for kernel code  (Jason A. Donenfeld)
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2016-11-21  headers: cleanup notices  (Jason A. Donenfeld)
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2016-11-15  socket: keep track of src address in sending packets  (Jason A. Donenfeld)
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2016-11-06  data: we care about per-peer, not per-device, inflight encryptions  (Jason A. Donenfeld)
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2016-11-05  c89: the static keyword is okay in c99, but not in c89  (Jason A. Donenfeld)
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2016-11-04  compat: stub out dst_cache for old kernels  (Jason A. Donenfeld)
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2016-11-04  socket: use dst_cache instead of handrolled cache  (Jason A. Donenfeld)
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2016-11-03  timers: take reference like a lookup table  (Jason A. Donenfeld)
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2016-10-19  timers: only have initiator rekey  (Jason A. Donenfeld)
If it's time to rekey, and the responder sends a message, the initiator will begin the rekeying when sending his response message. In the worst case, this response message will actually just be the keepalive. This generally works well, with the one edge case of the message arriving less than 10 seconds before key expiration, in which case the keepalive is not sufficient. In this case, we simply rehandshake immediately.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
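A sketch of the resulting logic (names and constants approximate; in the protocol, REKEY_AFTER_TIME is 120 s, REJECT_AFTER_TIME 180 s, and KEEPALIVE_TIMEOUT 10 s):

    /* Send path: only the initiator of the current session rekeys
     * proactively based on key age. */
    if (keypair->i_am_the_initiator &&
        has_expired(keypair->birthdate, REKEY_AFTER_TIME))
            send_handshake_initiation(peer);

    /* Receive path edge case: the reply (possibly just a keepalive)
     * would land too close to expiration to trigger rekeying in time,
     * so rehandshake immediately. */
    if (keypair->i_am_the_initiator &&
        has_expired(keypair->birthdate,
                    REJECT_AFTER_TIME - KEEPALIVE_TIMEOUT))
            send_handshake_initiation(peer);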
2016-10-19  timers: always delay handshakes for responder  (Jason A. Donenfeld)
With the prior behavior, when sending a packet, we checked to see if it was about time to start a new handshake, and if we were past a certain time, we started it. For the responder, we made that time a bit further in the future than for the initiator, to prevent the thundering herd problem of them both starting at the same time.

However, this was flawed. If both parties stopped communicating after 2.2 minutes, and then one party decided to initiate a TCP connection before the 3 minute mark, the currently open session would be used. However, because it was after the 2.2 minute mark, both peers would try to initiate a handshake upon sending their first packet. The errant flow was as follows:

1. Peer A sends SYN.
2. Peer A sees that his key is getting old and initiates new handshake.
3. Peer B receives SYN and sends ACK.
4. Peer B sees that his key is getting old and initiates new handshake.

Since these events happened after the 2.2 minute mark, there's no delay between handshake initiations, and problems begin. The new behavior is changed to:

1. Peer A sends SYN.
2. Peer A sees that his key is getting old and initiates new handshake.
3. Peer B receives SYN and sends ACK.
4. Peer B sees that his key is getting old and schedules a delayed handshake for 12.5 seconds in the future.
5. Peer B receives handshake initiation and cancels scheduled handshake.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
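A sketch of the responder's side of the new behavior (timer member and helper names illustrative):

    /* Responder notices an old key on send: don't race the initiator;
     * schedule the handshake for 12.5 seconds out instead. */
    if (key_getting_old(keypair) && !keypair->i_am_the_initiator)
            mod_timer(&peer->timer_delayed_handshake,
                      jiffies + HZ * 25 / 2 /* 12.5 s */);

    /* On receiving the initiator's handshake initiation: */
    del_timer(&peer->timer_delayed_handshake);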
2016-10-05  send: requeue jobs for later if padata is full  (Jason A. Donenfeld)
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2016-09-29  Rework headers and includes  (Jason A. Donenfeld)
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2016-08-02  c: specify static array size in function params  (Jason A. Donenfeld)
The C standard states:

    A declaration of a parameter as ``array of type'' shall be adjusted to ``qualified pointer to type'', where the type qualifiers (if any) are those specified within the [ and ] of the array type derivation. If the keyword static also appears within the [ and ] of the array type derivation, then for each call to the function, the value of the corresponding actual argument shall provide access to the first element of an array with at least as many elements as specified by the size expression.

By changing void func(int array[4]) to void func(int array[static 4]), we automatically get the compiler checking argument sizes for us, which is quite nice.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
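A complete toy example of the check this enables (Clang, and newer GCC, warn on the undersized argument):

    #include <stdio.h>

    static void sum4(const int array[static 4])
    {
            printf("%d\n", array[0] + array[1] + array[2] + array[3]);
    }

    int main(void)
    {
            int ok[4] = { 1, 2, 3, 4 };
            int small[2] = { 1, 2 };

            sum4(ok);    /* fine: at least 4 elements */
            sum4(small); /* warning: fewer than 4 elements */
            return 0;
    }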
2016-07-10  persistent keepalive: use unsigned long to avoid multiplication in hotpath  (Jason A. Donenfeld)
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2016-07-08  persistent keepalive: add kernel mechanism  (Jason A. Donenfeld)
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>