go/src/crypto/cipher
Vlad Krasnov 4f1f503373 crypto/aes: implement AES-GCM AEAD for arm64
Use the dedicated AES* and PMULL* instructions to accelerate AES-GCM

name              old time/op    new time/op      delta
AESGCMSeal1K-46     12.1µs ± 0%       0.9µs ± 0%    -92.66%  (p=0.000 n=9+10)
AESGCMOpen1K-46     12.1µs ± 0%       0.9µs ± 0%    -92.43%  (p=0.000 n=10+10)
AESGCMSign8K-46     58.6µs ± 0%       2.1µs ± 0%    -96.41%  (p=0.000 n=9+8)
AESGCMSeal8K-46     92.8µs ± 0%       5.7µs ± 0%    -93.86%  (p=0.000 n=9+9)
AESGCMOpen8K-46     92.9µs ± 0%       5.7µs ± 0%    -93.84%  (p=0.000 n=8+9)

name              old speed      new speed        delta
AESGCMSeal1K-46   84.7MB/s ± 0%  1153.4MB/s ± 0%  +1262.21%  (p=0.000 n=9+10)
AESGCMOpen1K-46   84.4MB/s ± 0%  1115.2MB/s ± 0%  +1220.53%  (p=0.000 n=10+10)
AESGCMSign8K-46    140MB/s ± 0%    3894MB/s ± 0%  +2687.50%  (p=0.000 n=9+10)
AESGCMSeal8K-46   88.2MB/s ± 0%  1437.5MB/s ± 0%  +1529.30%  (p=0.000 n=9+9)
AESGCMOpen8K-46   88.2MB/s ± 0%  1430.5MB/s ± 0%  +1522.01%  (p=0.000 n=8+9)
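
For readers who want to produce comparable numbers, a throughput benchmark of this kind can be sketched as below. This is a minimal sketch and not the contents of benchmark_test.go: the helper name benchmarkSeal, the all-zero key and nonce, and the 13-byte additional data are assumptions chosen for illustration.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"fmt"
	"testing"
)

// benchmarkSeal (hypothetical helper) measures AES-GCM Seal throughput
// for a payload of the given size in bytes.
func benchmarkSeal(b *testing.B, size int) {
	key := make([]byte, 16)   // all-zero key: acceptable for timing only
	nonce := make([]byte, 12) // fixed nonce: acceptable for timing only
	ad := make([]byte, 13)    // arbitrary additional-data length
	buf := make([]byte, size)

	block, err := aes.NewCipher(key)
	if err != nil {
		b.Fatal(err)
	}
	aead, err := cipher.NewGCM(block)
	if err != nil {
		b.Fatal(err)
	}

	var out []byte
	b.SetBytes(int64(size)) // lets the framework derive a MB/s figure
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		out = aead.Seal(out[:0], nonce, buf, ad)
	}
}

func main() {
	// testing.Benchmark runs the function outside of "go test".
	r := testing.Benchmark(func(b *testing.B) { benchmarkSeal(b, 1024) })
	fmt.Println("Seal 1K:", r)
}
```

Old/new comparisons in the tabular format above are typically produced by running the same benchmark before and after a change and comparing the two outputs with a tool such as benchstat.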

This change mirrors the current amd64 implementation, and provides optimal performance
on a range of arm64 processors including Centriq 2400 and Apple A12. By and large it is
implicitly tested by the robustness of the already existing amd64 implementation.
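
Because the accelerated code sits behind the existing crypto/aes and crypto/cipher APIs, callers should not need to change anything to benefit from it. The following is a minimal sketch of an AES-GCM round trip using only the standard library; the helper name sealOpen and the sample inputs are assumptions for illustration.

```go
package main

import (
	"bytes"
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
)

// sealOpen (hypothetical helper) encrypts msg with AES-GCM under key,
// then decrypts it again and returns the recovered plaintext.
func sealOpen(key, msg, ad []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	// cipher.NewGCM uses a platform-specific GCM implementation
	// when the underlying block cipher provides one.
	aead, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	ct := aead.Seal(nil, nonce, msg, ad)
	return aead.Open(nil, nonce, ct, ad)
}

func main() {
	key := make([]byte, 16) // AES-128
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	msg := []byte("hello, arm64")
	pt, err := sealOpen(key, msg, []byte("header"))
	if err != nil {
		panic(err)
	}
	fmt.Println("round trip ok:", bytes.Equal(pt, msg))
}
```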

The implementation interleaves GHASH with CTR mode to achieve the highest possible
throughput. It also aggregates GHASH by a factor of 8 to decrease the cost of the
reduction step.
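
The algebra behind the factor-of-8 aggregation can be illustrated in plain Go. The sketch below is not the assembly from this change: it uses a slow bit-by-bit GF(2^128) multiply (following the multiplication algorithm in NIST SP 800-38D) and only checks that hashing eight blocks at once with precomputed powers H^8..H^1 gives the same result as the block-by-block recurrence. All names (fe, mul, ghashSerial, ghashAggregated8) are assumptions made for illustration. In this sketch every mul still reduces; in an optimized implementation the eight products would be accumulated unreduced and reduced once per group.

```go
package main

import "fmt"

// fe is an element of GF(2^128) in GCM bit order:
// the most significant bit of hi is the coefficient of x^0.
type fe struct{ hi, lo uint64 }

func xor(a, b fe) fe { return fe{a.hi ^ b.hi, a.lo ^ b.lo} }

// mul multiplies two field elements bit by bit (slow, for clarity).
func mul(x, y fe) fe {
	var z fe
	v := y
	for i := 0; i < 128; i++ {
		var bit uint64
		if i < 64 {
			bit = (x.hi >> (63 - i)) & 1
		} else {
			bit = (x.lo >> (127 - i)) & 1
		}
		if bit == 1 {
			z = xor(z, v)
		}
		lsb := v.lo & 1
		v.lo = v.lo>>1 | v.hi<<63
		v.hi >>= 1
		if lsb == 1 {
			v.hi ^= 0xe100000000000000 // reduction polynomial R
		}
	}
	return z
}

// ghashSerial applies Y = (Y xor C) * H once per block.
func ghashSerial(h fe, blocks []fe) fe {
	var y fe
	for _, c := range blocks {
		y = mul(xor(y, c), h)
	}
	return y
}

// ghashAggregated8 consumes 8 blocks per step:
// Y' = (Y xor C0)*H^8 xor C1*H^7 xor ... xor C7*H.
func ghashAggregated8(h fe, blocks []fe) fe {
	var pow [8]fe // pow[i] = H^(i+1)
	pow[0] = h
	for i := 1; i < 8; i++ {
		pow[i] = mul(pow[i-1], h)
	}
	var y fe
	for len(blocks) >= 8 {
		acc := mul(xor(y, blocks[0]), pow[7])
		for j := 1; j < 8; j++ {
			acc = xor(acc, mul(blocks[j], pow[7-j]))
		}
		y = acc
		blocks = blocks[8:]
	}
	for _, c := range blocks { // leftover blocks, one at a time
		y = mul(xor(y, c), h)
	}
	return y
}

func main() {
	h := fe{0x66e94bd4ef8a2c3b, 0x884cfa59ca342b2e} // arbitrary sample value
	blocks := make([]fe, 19)
	for i := range blocks {
		blocks[i] = fe{uint64(i)*0x9e3779b97f4a7c15 + 1, uint64(i+7) * 0xc2b2ae3d27d4eb4f}
	}
	fmt.Println("results agree:", ghashSerial(h, blocks) == ghashAggregated8(h, blocks))
}
```

The two functions agree because field multiplication distributes over XOR, so the nested recurrence can be expanded into a sum of independent products.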

Even though there is a significant amount of assembly, the code reuses the Go
code for the amd64 implementation, so there is little additional Go code.

Since AES-GCM is critical for the performance of all web servers, this change is
required to level the playing field for arm64 CPUs, where amd64 currently enjoys an
unfair advantage.

Ideally both amd64 and arm64 codepaths could be replaced by hypothetical AES and
CLMUL intrinsics, with a few additional vector instructions.

Fixes #18498
Fixes #19840

Change-Id: Icc57b868cd1f67ac695c1ac163a8e215f74c7910
Reviewed-on: https://go-review.googlesource.com/107298
Run-TryBot: Vlad Krasnov <vlad@cloudflare.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-07-20 03:30:04 +00:00
benchmark_test.go
cbc.go             crypto: panic on illegal input and output overlap            2018-06-19 21:06:50 +00:00
cbc_aes_test.go
cfb.go             crypto: panic on illegal input and output overlap            2018-06-19 21:06:50 +00:00
cfb_test.go        all: update comment URLs from HTTP to HTTPS, where possible  2018-06-01 21:52:00 +00:00
cipher.go          all: update comment URLs from HTTP to HTTPS, where possible  2018-06-01 21:52:00 +00:00
cipher_test.go
common_test.go
ctr.go             crypto: panic on illegal input and output overlap            2018-06-19 21:06:50 +00:00
ctr_aes_test.go
ctr_test.go
example_test.go    crypto/cipher: use raw bytes for keys in docs                2017-11-16 00:40:00 +00:00
gcm.go             crypto: panic on illegal input and output overlap            2018-06-19 21:06:50 +00:00
gcm_test.go        crypto/aes: implement AES-GCM AEAD for arm64                 2018-07-20 03:30:04 +00:00
io.go              all: join some chained ifs to unindent code                  2017-08-29 20:57:41 +00:00
ofb.go             crypto: panic on illegal input and output overlap            2018-06-19 21:06:50 +00:00
ofb_test.go
xor.go
xor_test.go