summaryrefslogtreecommitdiffhomepage
path: root/src/crypto/zinc
AgeCommit message (Collapse)Author
2021-08-08crypto: curve25519-x86_64: solve register constraints with reserved registersHEADmasterMathias Krause
The register constraints for the inline assembly in fsqr() and fsqr2() are pretty tight on what the compiler may assign to the remaining three register variables. The clobber list only allows the following to be used: RDI, RSI, RBP and R12. With RAP reserving R12 and a kernel having CONFIG_FRAME_POINTER=y, claiming RBP, there are only two registers left so the compiler rightfully complains about impossible constraints. Provide alternatives that'll allow a memory reference for 'out' to solve the allocation constraint dilemma for this configuration. Signed-off-by: Mathias Krause <minipli@grsecurity.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-02-07compat: remove unused version.h headersJason A. Donenfeld
We don't need this in all files, and it just complicates things. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2020-04-14crypto: do not export symbolsJason A. Donenfeld
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2020-02-19curve25519-x86_64: avoid use of r12Jason A. Donenfeld
This causes problems with RAP and KERNEXEC for PaX, as r12 is a reserved register. It also leads to a more compact instruction encoding, saving about 100 cycles. Suggested-by: PaX Team <pageexec@freemail.hu> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2020-02-06chacha20poly1305: defensively protect against large inputsJason A. Donenfeld
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2020-01-21curve25519: x86_64: replace with formally verified implementationJason A. Donenfeld
This comes from INRIA's HACL*/Vale. It implements the same algorithm and implementation strategy as the code it replaces, only this code has been formally verified, sans the base point multiplication, which uses code similar to prior, only it uses the formally verified field arithmetic alongside reproducable ladder generation steps. This doesn't have a pure-bmi2 version, which means haswell no longer benefits, but the increased (doubled) code complexity is not worth it for a single generation of chips that's already old. Performance-wise, this is around 1% slower on older microarchitectures, and slightly faster on newer microarchitectures, mainly 10nm ones or backports of 10nm to 14nm. This implementation is "everest" below: Xeon E5-2680 v4 (Broadwell) armfazh: 133340 cycles per call everest: 133436 cycles per call Xeon Gold 5120 (Sky Lake Server) armfazh: 112636 cycles per call everest: 113906 cycles per call Core i5-6300U (Sky Lake Client) armfazh: 116810 cycles per call everest: 117916 cycles per call Core i7-7600U (Kaby Lake) armfazh: 119523 cycles per call everest: 119040 cycles per call Core i7-8750H (Coffee Lake) armfazh: 113914 cycles per call everest: 113650 cycles per call Core i9-9880H (Coffee Lake Refresh) armfazh: 112616 cycles per call everest: 114082 cycles per call Core i3-8121U (Cannon Lake) armfazh: 113202 cycles per call everest: 111382 cycles per call Core i7-8265U (Whiskey Lake) armfazh: 127307 cycles per call everest: 127697 cycles per call Core i7-8550U (Kaby Lake Refresh) armfazh: 127522 cycles per call everest: 127083 cycles per call Xeon Platinum 8275CL (Cascade Lake) armfazh: 114380 cycles per call everest: 114656 cycles per call Achieving these kind of results with formally verified code is quite remarkable, especialy considering that performance is favorable for newer chips. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2019-12-12global: fix up spellingJosh Soref
Signed-off-by: Josh Soref <jsoref@gmail.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2019-12-06chacha20poly1305: double check the sgmiter logic with testJason A. Donenfeld
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2019-12-05crypto: use new assembler macros for 5.5Jason A. Donenfeld
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2019-12-05chacha20poly1305: port to sgmitter for 5.5Jason A. Donenfeld
I'm not totally comfortable with these changes yet, and it'll require some more scrutiny. But it's a start. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2019-06-03blake2s: spacingJason A. Donenfeld
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2019-06-02curve25519: not all linkers support bmi2 and adxJason A. Donenfeld
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2019-05-31blake2s: add ssse3 to nobsJason A. Donenfeld
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2019-05-31blake2s: do not use xgetbv for ssse3 detectionJason A. Donenfeld
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2019-05-29zinc: update copyrightJason A. Donenfeld
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2019-05-29blake2s: shorten ssse3 loopSamuel Neves
This (mostly) preserves the performance (as measured on Haswell and *lake) of last commit, but it drastically reduces code size. Signed-off-by: Samuel Neves <sneves@dei.uc.pt> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2019-05-29blake2s,chacha: latency tweakSamuel Neves
In every odd-numbered round, instead of operating over the state x00 x01 x02 x03 x05 x06 x07 x04 x10 x11 x08 x09 x15 x12 x13 x14 we operate over the rotated state x03 x00 x01 x02 x04 x05 x06 x07 x09 x10 x11 x08 x14 x15 x12 x13 The advantage here is that this requires no changes to the 'x04 x05 x06 x07' row, which is in the critical path. This results in a noticeable latency improvement of roughly R cycles, for R diagonal rounds in the primitive. In the case of BLAKE2s, which I also moved from requiring AVX to only requiring SSSE3, we save approximately 30 cycles per compression function call on Haswell and Skylake. In other words, this is an improvement of ~0.6 cpb. This idea was pointed out to me by Shunsuke Shimizu, though it appears to have been around for longer. Signed-off-by: Samuel Neves <sneves@dei.uc.pt> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2019-05-29zinc: arm64: use cpu_get_elf_hwcap accessor for 5.2Jason A. Donenfeld
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2019-03-27blake2s: remove outlen parameter from finalJason A. Donenfeld
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2019-03-27blake2s: simplifySamuel Neves
Signed-off-by: Samuel Neves <sneves@dei.uc.pt> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2019-02-03noise: store clamped key instead of raw keyJason A. Donenfeld
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2019-02-03chacha20poly1305: permit unaligned strides on certain platformsJason A. Donenfeld
The map allocations required to fix this are mostly slower than unaligned paths. Reported-by: Louis Sautier <sbraz@gentoo.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2019-01-23global: normalize -> clampJason A. Donenfeld
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2019-01-07global: update copyrightJason A. Donenfeld
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-12-07chacha20: do not define unused asm functionJason A. Donenfeld
This causes RAP to be unhappy, and we're not using it anyway. Reported-by: Ivan J. <parazyd@dyne.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-12-07chacha20,poly1305: simplify perlasm fancinessJason A. Donenfeld
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-11-19chacha20,poly1305: do not use xlateJason A. Donenfeld
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-11-17poly1305: make frame pointers for auxiliary callsSamuel Neves
Signed-off-by: Samuel Neves <sneves@dei.uc.pt> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-11-15chacha20,poly1305: don't do compiler testing in generator and remove xor helperJason A. Donenfeld
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-11-15poly1305: cleanup leftover debugging changesJason A. Donenfeld
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-11-15poly1305: only export neon symbols when in useJason A. Donenfeld
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-11-15chacha20,poly1305: fix up for win64Samuel Neves
These don't help us, but it is important to keep this working for when it's re-added to cryptogams. Signed-off-by: Samuel Neves <sneves@dei.uc.pt> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-11-15perlasm: avoid rep retJason A. Donenfeld
The original hardcodes returns as .byte 0xf3,0xc3, aka "rep ret". We replace this by "ret". "rep ret" was meant to help with AMD K8 chips, cf. http://repzret.org/p/repzret. It makes no sense to continue to use this kludge for code that won't even run on ancient AMD chips. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-11-15poly1305: specialize to wireguardJason A. Donenfeld
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-11-15chacha20: specialize to wireguardJason A. Donenfeld
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-11-15perlasm: cleanup whitespaceJason A. Donenfeld
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-11-15poly1305: adjust to kernelSamuel Neves
Signed-off-by: Samuel Neves <sneves@dei.uc.pt> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-11-14chacha20: cleaner function declarationsSamuel Neves
Signed-off-by: Samuel Neves <sneves@dei.uc.pt> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-11-14chacha20: normalize namesSamuel Neves
Signed-off-by: Samuel Neves <sneves@dei.uc.pt> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-11-14chacha20: fixup win64 stack offsetsSamuel Neves
We don't need to do this for kernel purposes, but it's polite to leave things unbroken. Signed-off-by: Samuel Neves <sneves@dei.uc.pt> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-11-14chacha20: simplify stack unwinding on ChaCha20_ctr32Samuel Neves
objtool did not quite understand the stack arithmetic employed here. Signed-off-by: Samuel Neves <sneves@dei.uc.pt> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-11-14chacha20: use DRAP idiomSamuel Neves
This effectively means swapping the usage of %r9 and %r10 globally. Signed-off-by: Samuel Neves <sneves@dei.uc.pt> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-11-14chacha20: add hchacha_ssse3Samuel Neves
Signed-off-by: Samuel Neves <sneves@dei.uc.pt> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-11-14chacha20: begin adapting to kernel settingSamuel Neves
Signed-off-by: Samuel Neves <sneves@dei.uc.pt> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-11-14chacha20,poly1305: switch to perlasm originals on x86_64Samuel Neves
Signed-off-by: Samuel Neves <sneves@dei.uc.pt> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-11-14chacha20,poly1305: use CONFIG_KERNEL_MODE_NEON in .pl on armJason A. Donenfeld
While Andy is right to desire a separation between compiler defines and project defines, there are simply too many odd kernel configurations and we require testing for CONFIG_KERNEL_MODE_NEON. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-11-14chacha20,poly1305: switch to perlasm originals on mips and armJason A. Donenfeld
We also separate out Eric Biggers' Cortex A7 implementation into its own file. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-11-13global: various formatting tweeksJason A. Donenfeld
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-10-27curve25519-x86_64: this was relicensed to BSD-3-Clause upstreamJason A. Donenfeld
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2018-10-27poly1305-donna64: mark large constants as ULLJason A. Donenfeld
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>