Age | Commit message (Collapse) | Author |
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
In every odd-numbered round, instead of operating over the state
x00 x01 x02 x03
x05 x06 x07 x04
x10 x11 x08 x09
x15 x12 x13 x14
we operate over the rotated state
x03 x00 x01 x02
x04 x05 x06 x07
x09 x10 x11 x08
x14 x15 x12 x13
The advantage here is that this requires no changes to the
'x04 x05 x06 x07' row, which is in the critical path. This
results in a noticeable latency improvement of roughly R
cycles, for R diagonal rounds in the primitive.
In the case of BLAKE2s, which I also moved from requiring AVX
to only requiring SSSE3, we save approximately 30 cycles per
compression function call on Haswell and Skylake. In other
words, this is an improvement of ~0.6 cpb.
This idea was pointed out to me by Shunsuke Shimizu, though
it appears to have been around for longer.
Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
This causes RAP to be unhappy, and we're not using it anyway.
Reported-by: Ivan J. <parazyd@dyne.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
These don't help us, but it is important to keep this working for when
it's re-added to cryptogams.
Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
We don't need to do this for kernel purposes, but it's polite to leave things unbroken.
Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
objtool did not quite understand the stack arithmetic employed here.
Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
This effectively means swapping the usage of %r9 and %r10 globally.
Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
While Andy is right to desire a separation between compiler defines and
project defines, there are simply too many odd kernel configurations and
we require testing for CONFIG_KERNEL_MODE_NEON.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
We also separate out Eric Biggers' Cortex A7 implementation into its own
file.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
This is done by 259 other files in the kernel tree:
linux $ rg '#include.*\.c' -l | wc -l
259
Suggested-by: Sultan Alsawaf <sultanxda@gmail.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
This avoids casts and makes counter increments obvious.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
If it's a built-in and initcall fails, it won't be fatal. So we
should at least be loud.
Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Also we make selftest errors of type err, so that they're obvious in
dmesg.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Suggested-by: Andy Polyakov <appro@openssl.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
handling
Signed-off-by: René van Dorst <opensource@vdorst.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
It turns out this is faster than NEON on some chips, so it's nice to
have.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: René van Dorst <opensource@vdorst.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Since we now set up the block in the generic code, we can rely on that
to use fewer variables and reduce stack pressure within the MIPS code.
This in turn means we have more registers and more uniformity, so we're
able to rewrite quite a bit.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
This requires some minimal rearranging to make work, but for the most
part as does the right thing, provided we pass it an optimization flag.
Suggested-by: Paul Burton <paul.burton@mips.com>
Signed-off-by: René van Dorst <opensource@vdorst.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Reported-by: Paul Burton <paul.burton@mips.com>
Signed-off-by: René van Dorst <opensource@vdorst.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
The kernel has very specific rules correlating file type with comment
type, and also SPDX identifiers can't be merged with other comments.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Suggested-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|