go/src/internal/bytealg
Mauri de Souza Meneguzzo 66b776b025 runtime: short path for equal pointers in arm64 memequal
If memequal is invoked with the same pointer for both arguments, it ends
up comparing the entire memory contents instead of just comparing the
pointers.

This effectively turns an operation that could be O(1) into an O(n) one.
All the other architectures already have this optimization in place, and
arm64 itself already has it in memequal_varlen.
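
As a rough illustration of the short path described above, here is a minimal pure-Go sketch. It is not the arm64 assembly from this change: the function name equalShortPath is hypothetical, and the byte-by-byte loop only stands in for the real vectorized comparison.

```go
package main

import "fmt"

// equalShortPath is a hypothetical stand-in for memequal, written in Go
// only to show the idea: if both arguments start at the same address and
// have the same length, the contents must be equal, so no scan is needed.
func equalShortPath(a, b []byte) bool {
	if len(a) != len(b) {
		return false
	}
	// Short path: empty slices, or identical base pointers, are equal in O(1).
	if len(a) == 0 || &a[0] == &b[0] {
		return true
	}
	// Slow path: O(n) comparison of the contents.
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	buf := make([]byte, 4<<20)
	other := make([]byte, len(buf))

	fmt.Println(equalShortPath(buf, buf))   // same pointer: returns without scanning
	fmt.Println(equalShortPath(buf, other)) // different pointers: scans all bytes
}
```

Both calls print true; the difference is only in how much memory is read before the answer is known, which is what the Equal/same benchmarks below measure.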

This optimization is narrow in scope; one case that will probably
benefit is programs that rely heavily on string interning.
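
To make the interning case concrete, here is a small sketch, assuming a simple map-based interner. The names interner, intern and sameData are hypothetical and not part of this change. Interned strings share one backing array, so comparing them passes the same pointer on both sides, which is the situation the short path targets.

```go
package main

import (
	"fmt"
	"unsafe"
)

// interner is a hypothetical map-based string interner: it returns one
// canonical copy for every distinct string value it has seen.
type interner map[string]string

func (in interner) intern(s string) string {
	if canonical, ok := in[s]; ok {
		return canonical
	}
	in[s] = s
	return s
}

// sameData reports whether two non-empty strings share the same backing
// bytes. unsafe.StringData requires Go 1.20 or later.
func sameData(a, b string) bool {
	return len(a) == len(b) && unsafe.StringData(a) == unsafe.StringData(b)
}

func main() {
	in := interner{}
	a := in.intern(string([]byte("example-key")))
	b := in.intern(string([]byte("example-key")))

	// a and b are equal and point at the same bytes, so an equality check
	// on them can be answered from the pointers alone.
	fmt.Println(a == b, sameData(a, b)) // prints "true true"
}
```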

goos: darwin
goarch: arm64
pkg: bytes
                 │      old.txt       │               new.txt                │
                 │       sec/op       │    sec/op     vs base                │
Equal/same/1-8           2.678n ± ∞ ¹   2.400n ± ∞ ¹   -10.38% (p=0.008 n=5)
Equal/same/6-8           3.267n ± ∞ ¹   2.431n ± ∞ ¹   -25.59% (p=0.008 n=5)
Equal/same/9-8           2.981n ± ∞ ¹   2.385n ± ∞ ¹   -19.99% (p=0.008 n=5)
Equal/same/15-8          2.974n ± ∞ ¹   2.390n ± ∞ ¹   -19.64% (p=0.008 n=5)
Equal/same/16-8          2.983n ± ∞ ¹   2.380n ± ∞ ¹   -20.21% (p=0.008 n=5)
Equal/same/20-8          3.567n ± ∞ ¹   2.384n ± ∞ ¹   -33.17% (p=0.008 n=5)
Equal/same/32-8          3.568n ± ∞ ¹   2.385n ± ∞ ¹   -33.16% (p=0.008 n=5)
Equal/same/4K-8         78.040n ± ∞ ¹   2.378n ± ∞ ¹   -96.95% (p=0.008 n=5)
Equal/same/4M-8      78713.000n ± ∞ ¹   2.385n ± ∞ ¹  -100.00% (p=0.008 n=5)
Equal/same/64M-8   1348095.000n ± ∞ ¹   2.381n ± ∞ ¹  -100.00% (p=0.008 n=5)
geomean                  43.52n         2.390n         -94.51%
¹ need >= 6 samples for confidence interval at level 0.95

                 │    old.txt    │                     new.txt                      │
                 │      B/s      │         B/s          vs base                     │
Equal/same/1-8     356.1Mi ± ∞ ¹         397.3Mi ± ∞ ¹        +11.57% (p=0.008 n=5)
Equal/same/6-8     1.711Gi ± ∞ ¹         2.298Gi ± ∞ ¹        +34.35% (p=0.008 n=5)
Equal/same/9-8     2.812Gi ± ∞ ¹         3.515Gi ± ∞ ¹        +24.99% (p=0.008 n=5)
Equal/same/15-8    4.698Gi ± ∞ ¹         5.844Gi ± ∞ ¹        +24.41% (p=0.008 n=5)
Equal/same/16-8    4.995Gi ± ∞ ¹         6.260Gi ± ∞ ¹        +25.34% (p=0.008 n=5)
Equal/same/20-8    5.222Gi ± ∞ ¹         7.814Gi ± ∞ ¹        +49.63% (p=0.008 n=5)
Equal/same/32-8    8.353Gi ± ∞ ¹        12.496Gi ± ∞ ¹        +49.59% (p=0.008 n=5)
Equal/same/4K-8    48.88Gi ± ∞ ¹       1603.96Gi ± ∞ ¹      +3181.17% (p=0.008 n=5)
Equal/same/4M-8    49.63Gi ± ∞ ¹    1637911.85Gi ± ∞ ¹   +3300381.91% (p=0.008 n=5)
Equal/same/64M-8   46.36Gi ± ∞ ¹   26253069.97Gi ± ∞ ¹  +56626517.99% (p=0.008 n=5)
geomean            6.737Gi               122.7Gi            +1721.01%
¹ need >= 6 samples for confidence interval at level 0.95

Fixes #64381

Change-Id: I7d423930a688edd88c4ba60d45e097296d9be852
GitHub-Last-Rev: ae8189fafb
GitHub-Pull-Request: golang/go#64419
Reviewed-on: https://go-review.googlesource.com/c/go/+/545416
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
2024-01-24 16:07:25 +00:00
bytealg.go bytes,internal/bytealg: add func bytealg.LastIndexRabinKarp 2023-11-01 19:02:57 +00:00
compare_386.s
compare_amd64.s runtime: remove dead code and unnecessary checks for amd64 2022-08-18 17:17:01 +00:00
compare_arm.s
compare_arm64.s all: delete ARM64 non-register ABI fallback path 2022-03-18 18:26:13 +00:00
compare_generic.go internal/bytealg: support basic byte operation on loong64 2022-05-17 19:55:37 +00:00
compare_loong64.s internal/bytealg: add regABI support in bytealg functions on loong64 2023-11-21 19:21:41 +00:00
compare_mips64x.s
compare_mipsx.s
compare_native.go internal/bytealg: support basic byte operation on loong64 2022-05-17 19:55:37 +00:00
compare_ppc64x.s internal/bytealg: improve compare on Power10/PPC64 2023-08-28 17:33:20 +00:00
compare_riscv64.s all: clean up addition of constants in riscv64 assembly 2023-11-09 13:57:06 +00:00
compare_s390x.s
compare_wasm.s
count_amd64.s internal/bytealg: process two AVX2 lanes per Count loop 2023-10-06 20:54:43 +00:00
count_arm.s
count_arm64.s internal/bytealg: optimize Count/CountString in arm64 2023-10-31 17:00:27 +00:00
count_generic.go
count_native.go
count_ppc64x.s internal/bytealg: optimize Count/CountString for PPC64/Power10 2023-08-14 20:30:44 +00:00
count_riscv64.s internal/bytealg: optimize Count with PCALIGN in riscv64 2023-11-22 01:59:01 +00:00
count_s390x.s
equal_386.s
equal_amd64.s internal/bytealg: use PCALIGN in memequal 2023-11-17 16:34:40 +00:00
equal_arm.s
equal_arm64.s runtime: short path for equal pointers in arm64 memequal 2024-01-24 16:07:25 +00:00
equal_generic.go
equal_loong64.s internal/bytealg: add regABI support in bytealg functions on loong64 2023-11-21 19:21:41 +00:00
equal_mips64x.s
equal_mipsx.s
equal_native.go
equal_ppc64x.s cmd/internal/obj/ppc64: modify PCALIGN to ensure alignment 2023-04-21 16:47:45 +00:00
equal_riscv64.s all: clean up addition of constants in riscv64 assembly 2023-11-09 13:57:06 +00:00
equal_s390x.s
equal_wasm.s
index_amd64.go
index_amd64.s internal/bytealg: optimize Index/IndexString in amd64 2023-08-07 00:20:48 +00:00
index_arm64.go
index_arm64.s
index_generic.go
index_native.go all: move //go: function directives directly above functions 2023-03-02 22:56:35 +00:00
index_ppc64x.go internal/bytealg: remove aix and linux build tags from ppc64 index code 2023-03-09 05:34:46 +00:00
index_ppc64x.s cmd/internal/obj/ppc64: modify PCALIGN to ensure alignment 2023-04-21 16:47:45 +00:00
index_s390x.go
index_s390x.s
indexbyte_386.s
indexbyte_amd64.s internal/bytealg: optimize indexbyte in amd64 2023-11-01 19:06:01 +00:00
indexbyte_arm.s
indexbyte_arm64.s
indexbyte_generic.go internal/bytealg: use generic IndexByte on plan9/amd64 2023-07-20 17:30:15 +00:00
indexbyte_loong64.s internal/bytealg: add regABI support in bytealg functions on loong64 2023-11-21 19:21:41 +00:00
indexbyte_mips64x.s
indexbyte_mipsx.s
indexbyte_native.go internal/bytealg: use generic IndexByte on plan9/amd64 2023-07-20 17:30:15 +00:00
indexbyte_ppc64x.s internal/bytealg: rewrite indexbytebody on PPC64 2023-04-21 16:10:29 +00:00
indexbyte_riscv64.s all: clean up addition of constants in riscv64 assembly 2023-11-09 13:57:06 +00:00
indexbyte_s390x.s
indexbyte_wasm.s
lastindexbyte_generic.go internal/bytealg: add generic LastIndexByte{,String} 2023-08-25 15:08:28 +00:00