Age | Commit message (Collapse) | Author |
|
When removing single nodes, it's possible that that node's parent is an
empty intermediate node, in which case, it too should be removed.
Otherwise the trie fills up and never is fully emptied, leading to
gradual memory leaks over time for tries that are modified often. There
was originally code to do this, but was removed during refactoring in
2016 and never reworked. Now that we have proper parent pointers from
the previous commits, we can implement this properly.
In order to reduce branching and expensive comparisons, we want to keep
the double pointer for parent assignment (which lets us easily chain up
to the root), but we still need to actually get the parent's base
address. So encode the bit number into the last two bits of the pointer,
and pack and unpack it as needed. This is a little bit clumsy but is the
fastest and less memory wasteful of the compromises. Note that we align
the root struct here to a minimum of 4, because it's embedded into a
larger struct, and we're relying on having the bottom two bits for our
flag, which would only be 16-bit aligned on m68k.
The existing macro-based helpers were a bit unwieldy for adding the bit
packing to, so this commit replaces them with safer and clearer ordinary
functions.
We add a test to the randomized/fuzzer part of the selftests, to free
the randomized tries by-peer, refuzz it, and repeat, until it's supposed
to be empty, and then then see if that actually resulted in the whole
thing being emptied. That combined with kmemcheck should hopefully make
sure this commit is doing what it should. Along the way this resulted in
various other cleanups of the tests and fixes for recent graphviz.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
The previous commit moved from O(n) to O(1) for removal, but in the
process introduced an additional pointer member to a struct that
increased the size from 60 to 68 bytes, putting nodes in the 128-byte
slab. With deployed systems having as many as 2 million nodes, this
represents a significant doubling in memory usage (128 MiB -> 256 MiB).
Fix this by using our own kmem_cache, that's sized exactly right. This
also makes wireguard's memory usage more transparent in tools like
slabtop and /proc/slabinfo.
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Previously, deleting peers would require traversing the entire trie in
order to rebalance nodes and safely free them. This meant that removing
1000 peers from a trie with a half million nodes would take an extremely
long time, during which we're holding the rtnl lock. Large-scale users
were reporting 200ms latencies added to the networking stack as a whole
every time their userspace software would queue up significant removals.
That's a serious situation.
This commit fixes that by maintaining a double pointer to the parent's
bit pointer for each node, and then using the already existing node list
belonging to each peer to go directly to the node, fix up its pointers,
and free it with RCU. This means removal is O(1) instead of O(n), and we
don't use gobs of stack.
The removal algorithm has the same downside as the code that it fixes:
it won't collapse needlessly long runs of fillers. We can enhance that
in the future if it ever becomes a problem. This commit documents that
limitation with a TODO comment in code, a small but meaningful
improvement over the prior situation.
Currently the biggest flaw, which the next commit addresses, is that
because this increases the node size on 64-bit machines from 60 bytes to
68 bytes. 60 rounds up to 64, but 68 rounds up to 128. So we wind up
using twice as much memory per node, because of power-of-two
allocations, which is a big bummer. We'll need to figure something out
there.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
This makes `wg show` and `wg showconf` and the like significantly
faster, since we don't have to iterate through every node of the trie
for every single peer. It also makes netlink cursor resumption much less
problematic, since we're just iterating through a list, rather than
having to save a traversal stack.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
This required a bit of pruning of our christmas trees.
Suggested-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
I understand why this must be done, though I'm not so happy about having
to do it. In some places, it puts us over 80 chars and we have to break
lines up in further ugly ways. And in general, I think this makes things
harder to read. Yet another thing we must do to please upstream.
Maybe this can be replaced in the future by some kind of automatic
module namespacing logic in the linker, or even combined with LTO and
aggressive symbol stripping.
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
The kernel has very specific rules correlating file type with comment
type, and also SPDX identifiers can't be merged with other comments.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
This is the worst commit in the whole repo, making the code much less
readable, but so it goes with upstream maintainers.
We are now woefully wrapped at 80 columns.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
It's good to have SPDX identifiers in all files as the Linux kernel
developers are working to add these identifiers to all files.
Update all files with the correct SPDX license identifier based on the license
text of the project or based on the license in the file itself. The SPDX
identifier is a legally binding shorthand, which can be used instead of the
full boiler plate text.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Modified-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Makes it more clear that this _not_ a routing table replacement.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|