wireguard-linux-compat - WireGuard Linux compat

Age	Commit message (Collapse)	Author
2021-08-08	crypto: curve25519-x86_64: solve register constraints with reserved registersHEAD master	Mathias Krause
	The register constraints for the inline assembly in fsqr() and fsqr2() are pretty tight on what the compiler may assign to the remaining three register variables. The clobber list only allows the following to be used: RDI, RSI, RBP and R12. With RAP reserving R12 and a kernel having CONFIG_FRAME_POINTER=y, claiming RBP, there are only two registers left so the compiler rightfully complains about impossible constraints. Provide alternatives that'll allow a memory reference for 'out' to solve the allocation constraint dilemma for this configuration. Signed-off-by: Mathias Krause <minipli@grsecurity.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-02-07	compat: remove unused version.h headers	Jason A. Donenfeld
	We don't need this in all files, and it just complicates things. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2020-04-14	crypto: do not export symbols	Jason A. Donenfeld
	Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2020-02-19	curve25519-x86_64: avoid use of r12	Jason A. Donenfeld
	This causes problems with RAP and KERNEXEC for PaX, as r12 is a reserved register. It also leads to a more compact instruction encoding, saving about 100 cycles. Suggested-by: PaX Team <pageexec@freemail.hu> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2020-02-06	chacha20poly1305: defensively protect against large inputs	Jason A. Donenfeld
	Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2020-01-21	curve25519: x86_64: replace with formally verified implementation	Jason A. Donenfeld
	This comes from INRIA's HACL*/Vale. It implements the same algorithm and implementation strategy as the code it replaces, only this code has been formally verified, sans the base point multiplication, which uses code similar to prior, only it uses the formally verified field arithmetic alongside reproducable ladder generation steps. This doesn't have a pure-bmi2 version, which means haswell no longer benefits, but the increased (doubled) code complexity is not worth it for a single generation of chips that's already old. Performance-wise, this is around 1% slower on older microarchitectures, and slightly faster on newer microarchitectures, mainly 10nm ones or backports of 10nm to 14nm. This implementation is "everest" below: Xeon E5-2680 v4 (Broadwell) armfazh: 133340 cycles per call everest: 133436 cycles per call Xeon Gold 5120 (Sky Lake Server) armfazh: 112636 cycles per call everest: 113906 cycles per call Core i5-6300U (Sky Lake Client) armfazh: 116810 cycles per call everest: 117916 cycles per call Core i7-7600U (Kaby Lake) armfazh: 119523 cycles per call everest: 119040 cycles per call Core i7-8750H (Coffee Lake) armfazh: 113914 cycles per call everest: 113650 cycles per call Core i9-9880H (Coffee Lake Refresh) armfazh: 112616 cycles per call everest: 114082 cycles per call Core i3-8121U (Cannon Lake) armfazh: 113202 cycles per call everest: 111382 cycles per call Core i7-8265U (Whiskey Lake) armfazh: 127307 cycles per call everest: 127697 cycles per call Core i7-8550U (Kaby Lake Refresh) armfazh: 127522 cycles per call everest: 127083 cycles per call Xeon Platinum 8275CL (Cascade Lake) armfazh: 114380 cycles per call everest: 114656 cycles per call Achieving these kind of results with formally verified code is quite remarkable, especialy considering that performance is favorable for newer chips. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2019-12-12	global: fix up spelling	Josh Soref
	Signed-off-by: Josh Soref <jsoref@gmail.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2019-12-06	chacha20poly1305: double check the sgmiter logic with test	Jason A. Donenfeld
	Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2019-12-05	crypto: use new assembler macros for 5.5	Jason A. Donenfeld
	Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2019-12-05	chacha20poly1305: port to sgmitter for 5.5	Jason A. Donenfeld
	I'm not totally comfortable with these changes yet, and it'll require some more scrutiny. But it's a start. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2019-06-03	blake2s: spacing	Jason A. Donenfeld
	Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2019-06-02	curve25519: not all linkers support bmi2 and adx	Jason A. Donenfeld
	Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2019-05-31	blake2s: add ssse3 to nobs	Jason A. Donenfeld
	Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2019-05-31	blake2s: do not use xgetbv for ssse3 detection	Jason A. Donenfeld
	Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2019-05-29	zinc: update copyright	Jason A. Donenfeld
	Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2019-05-29	blake2s: shorten ssse3 loop	Samuel Neves
	This (mostly) preserves the performance (as measured on Haswell and *lake) of last commit, but it drastically reduces code size. Signed-off-by: Samuel Neves <sneves@dei.uc.pt> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2019-05-29	blake2s,chacha: latency tweak	Samuel Neves
	In every odd-numbered round, instead of operating over the state x00 x01 x02 x03 x05 x06 x07 x04 x10 x11 x08 x09 x15 x12 x13 x14 we operate over the rotated state x03 x00 x01 x02 x04 x05 x06 x07 x09 x10 x11 x08 x14 x15 x12 x13 The advantage here is that this requires no changes to the 'x04 x05 x06 x07' row, which is in the critical path. This results in a noticeable latency improvement of roughly R cycles, for R diagonal rounds in the primitive. In the case of BLAKE2s, which I also moved from requiring AVX to only requiring SSSE3, we save approximately 30 cycles per compression function call on Haswell and Skylake. In other words, this is an improvement of ~0.6 cpb. This idea was pointed out to me by Shunsuke Shimizu, though it appears to have been around for longer. Signed-off-by: Samuel Neves <sneves@dei.uc.pt> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2019-05-29	zinc: arm64: use cpu_get_elf_hwcap accessor for 5.2	Jason A. Donenfeld
	Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2019-03-27	blake2s: remove outlen parameter from final	Jason A. Donenfeld
	Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2019-03-27	blake2s: simplify	Samuel Neves
	Signed-off-by: Samuel Neves <sneves@dei.uc.pt> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2019-02-03	noise: store clamped key instead of raw key	Jason A. Donenfeld
	Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2019-02-03	chacha20poly1305: permit unaligned strides on certain platforms	Jason A. Donenfeld
	The map allocations required to fix this are mostly slower than unaligned paths. Reported-by: Louis Sautier <sbraz@gentoo.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2019-01-23	global: normalize -> clamp	Jason A. Donenfeld
	Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2019-01-07	global: update copyright	Jason A. Donenfeld
	Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-12-07	chacha20: do not define unused asm function	Jason A. Donenfeld
	This causes RAP to be unhappy, and we're not using it anyway. Reported-by: Ivan J. <parazyd@dyne.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-12-07	chacha20,poly1305: simplify perlasm fanciness	Jason A. Donenfeld
	Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-11-19	chacha20,poly1305: do not use xlate	Jason A. Donenfeld
	Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-11-17	poly1305: make frame pointers for auxiliary calls	Samuel Neves
	Signed-off-by: Samuel Neves <sneves@dei.uc.pt> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-11-15	chacha20,poly1305: don't do compiler testing in generator and remove xor helper	Jason A. Donenfeld
	Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-11-15	poly1305: cleanup leftover debugging changes	Jason A. Donenfeld
	Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-11-15	poly1305: only export neon symbols when in use	Jason A. Donenfeld
	Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-11-15	chacha20,poly1305: fix up for win64	Samuel Neves
	These don't help us, but it is important to keep this working for when it's re-added to cryptogams. Signed-off-by: Samuel Neves <sneves@dei.uc.pt> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-11-15	perlasm: avoid rep ret	Jason A. Donenfeld
	The original hardcodes returns as .byte 0xf3,0xc3, aka "rep ret". We replace this by "ret". "rep ret" was meant to help with AMD K8 chips, cf. http://repzret.org/p/repzret. It makes no sense to continue to use this kludge for code that won't even run on ancient AMD chips. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-11-15	poly1305: specialize to wireguard	Jason A. Donenfeld
	Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-11-15	chacha20: specialize to wireguard	Jason A. Donenfeld
	Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-11-15	perlasm: cleanup whitespace	Jason A. Donenfeld
	Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-11-15	poly1305: adjust to kernel	Samuel Neves
	Signed-off-by: Samuel Neves <sneves@dei.uc.pt> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-11-14	chacha20: cleaner function declarations	Samuel Neves
	Signed-off-by: Samuel Neves <sneves@dei.uc.pt> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-11-14	chacha20: normalize names	Samuel Neves
	Signed-off-by: Samuel Neves <sneves@dei.uc.pt> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-11-14	chacha20: fixup win64 stack offsets	Samuel Neves
	We don't need to do this for kernel purposes, but it's polite to leave things unbroken. Signed-off-by: Samuel Neves <sneves@dei.uc.pt> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-11-14	chacha20: simplify stack unwinding on ChaCha20_ctr32	Samuel Neves
	objtool did not quite understand the stack arithmetic employed here. Signed-off-by: Samuel Neves <sneves@dei.uc.pt> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-11-14	chacha20: use DRAP idiom	Samuel Neves
	This effectively means swapping the usage of %r9 and %r10 globally. Signed-off-by: Samuel Neves <sneves@dei.uc.pt> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-11-14	chacha20: add hchacha_ssse3	Samuel Neves
	Signed-off-by: Samuel Neves <sneves@dei.uc.pt> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-11-14	chacha20: begin adapting to kernel setting	Samuel Neves
	Signed-off-by: Samuel Neves <sneves@dei.uc.pt> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-11-14	chacha20,poly1305: switch to perlasm originals on x86_64	Samuel Neves
	Signed-off-by: Samuel Neves <sneves@dei.uc.pt> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-11-14	chacha20,poly1305: use CONFIG_KERNEL_MODE_NEON in .pl on arm	Jason A. Donenfeld
	While Andy is right to desire a separation between compiler defines and project defines, there are simply too many odd kernel configurations and we require testing for CONFIG_KERNEL_MODE_NEON. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-11-14	chacha20,poly1305: switch to perlasm originals on mips and arm	Jason A. Donenfeld
	We also separate out Eric Biggers' Cortex A7 implementation into its own file. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-11-13	global: various formatting tweeks	Jason A. Donenfeld
	Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-10-27	curve25519-x86_64: this was relicensed to BSD-3-Clause upstream	Jason A. Donenfeld
	Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-10-27	poly1305-donna64: mark large constants as ULL	Jason A. Donenfeld
	Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>