mirror of https://github.com/golang/go.git
70 Commits
| Author | SHA1 | Message | Date |
|---|---|---|---|
|
|
f19e400180 |
all: remove more leftover // +build lines
CL 344955 and CL 359476 removed almost all // +build lines, but left some in assembly files and generating scripts. Also, some files were added with // +build lines after CL 359476 was merged. Remove these or rename files where more appropriate. For #41184 Change-Id: I7eb85a498ed9788b42a636e775f261d755504ffa Reviewed-on: https://go-review.googlesource.com/c/go/+/361480 Trust: Tobias Klauser <tobias.klauser@gmail.com> Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Bryan C. Mills <bcmills@google.com> |
|
|
|
f229e7031a |
all: go fix -fix=buildtag std cmd (except for bootstrap deps, vendor)
When these packages are released as part of Go 1.18, Go 1.16 will no longer be supported, so we can remove the +build tags in these files. Ran go fix -fix=buildtag std cmd and then reverted the bootstrapDirs as defined in src/cmd/dist/buildtool.go, which need to continue to build with Go 1.4 for now. Also reverted src/vendor and src/cmd/vendor, which will need to be updated in their own repos first. Manual changes in runtime/pprof/mprof_test.go to adjust line numbers. For #41184. Change-Id: Ic0f93f7091295b6abc76ed5cd6e6746e1280861e Reviewed-on: https://go-review.googlesource.com/c/go/+/344955 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Bryan C. Mills <bcmills@google.com> |
|
|
|
6ec9a1da2d |
internal/bytealg: fix Separator length check for Index/ppc64le
Changed the condition in the ASM implementation of indexbody that determines whether the separator length crosses 16 bytes from BGE to BGT, to avoid incorrectly crossing a page. Also fixed IndexString to invoke indexbodyp9 when on the POWER9 platform. Change-Id: I0602a797cc75287990eea1972e9e473744f6f5a9 Reviewed-on: https://go-review.googlesource.com/c/go/+/356849 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Trust: Keith Randall <khr@golang.org> |
|
|
|
6c3cd5d2eb |
internal/bytealg: port bytes.Index and bytes.Count to reg ABI on ppc64x
This change adds support for the reg ABI to the Index and Count functions for ppc64/ppc64le.
Most Index and Count benchmarks show improvement in performance on POWER9 with this change. Similar numbers observed on POWER8 and POWER10.
name old time/op new time/op delta
Index/32 71.0ns ± 0% 67.9ns ± 0% -4.42% (p=0.001 n=7+6)
IndexEasy/10 17.5ns ± 0% 17.2ns ± 0% -1.30% (p=0.001 n=7+7)
name old time/op new time/op delta
Count/10 26.6ns ± 0% 25.0ns ± 1% -6.02% (p=0.001 n=7+7)
Count/32 78.6ns ± 0% 74.7ns ± 0% -4.97% (p=0.001 n=7+7)
Count/4K 5.03µs ± 0% 5.03µs ± 0% -0.07% (p=0.000 n=6+7)
CountEasy/10 26.9ns ± 0% 25.2ns ± 1% -6.31% (p=0.001 n=7+7)
CountSingle/32 11.8ns ± 0% 9.9ns ± 0% -15.70% (p=0.002 n=6+6)
Change-Id: Ibd146c04f8107291c55f9e6100b8264dfccc41ae
Reviewed-on: https://go-review.googlesource.com/c/go/+/355509
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org> |
|
|
|
b8a601756a |
internal/bytealg: port bytealg functions to reg ABI on ppc64x
This adds support for the reg ABI to the bytes functions for ppc64/ppc64le. These are initially under control of the GOEXPERIMENT macro until all changes are in. Change-Id: Id82f31056af8caa8541e27c6735f6b815a5dbf5a Reviewed-on: https://go-review.googlesource.com/c/go/+/351190 Trust: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Cherry Mui <cherryyz@google.com> |
|
|
|
8157960d7f |
all: replace runtime SSE2 detection with GO386 setting
When GO386=sse2 we can assume SSE2 to be present without a runtime check. If GO386=softfloat is set we can avoid the usage of SSE2 even if detected. This might cause a memcpy, memclr and bytealg slowdown of Go binaries compiled with softfloat on machines that support SSE2. Such setups are rare and should use GO386=sse2 instead if performance matters. On targets that support SSE2 we avoid the runtime overhead of dynamic CPU feature dispatch. Removing the runtime SSE2 checks also allows internal/cpu to be simplified further, by removing the handling of the required feature option, as a follow-up to this CL. Change-Id: I90a853a8853a405cb665497c6d1a86556947ba17 Reviewed-on: https://go-review.googlesource.com/c/go/+/344350 Trust: Martin Möhrmann <martin@golang.org> Run-TryBot: Martin Möhrmann <martin@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
|
|
|
d7d4f28a06 |
[dev.typeparams] runtime, internal/bytealg: remove regabi fallback code on AMD64
As we commit to always enabling register ABI on AMD64, remove the fallback code. Change-Id: I30556858ba4bac367495fa94f6a8682ecd771196 Reviewed-on: https://go-review.googlesource.com/c/go/+/341152 Trust: Cherry Mui <cherryyz@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Austin Clements <austin@google.com> |
|
|
|
4d6f9d60cf |
[dev.typeparams] all: merge master (785a8f6) into dev.typeparams
- test/run.go CL 328050 added fixedbugs/issue46749.go to -G=3 excluded files list Merge List: + 2021-06-16 |
|
|
|
abc56fd1a0 |
internal/bytealg: remove duplicate go:build line
Change-Id: I6b71bf468b9544820829f02e320673f5edd785fa
GitHub-Last-Rev:
|
|
|
|
5a40fab19f |
[dev.typeparams] runtime, internal/bytealg: port performance-critical functions to register ABI on ARM64
This CL ports a few performance-critical assembly functions to use register arguments directly. This is similar to CL 308931 and CL 310184. Change-Id: I6e30dfff17f76b8578ce8cfd51de21b66610fdb0 Reviewed-on: https://go-review.googlesource.com/c/go/+/324400 Trust: Cherry Mui <cherryyz@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com> Reviewed-by: Than McIntosh <thanm@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> TryBot-Result: Go Bot <gobot@golang.org> |
|
|
|
5a008a92e8 |
[dev.typeparams] internal/bytealg: call memeqbody directly in memequal_varlen on ARM64
Currently, memequal_varlen opens up a frame and calls memequal, which then tail-calls memeqbody. This CL changes memequal_varlen to tail-call memeqbody directly. This makes it simpler to switch to the register ABI in the next CL. Change-Id: Ia1367c0abb7f4755fe736c404411793fb9e5c04f Reviewed-on: https://go-review.googlesource.com/c/go/+/324399 Trust: Cherry Mui <cherryyz@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com> |
|
|
|
2c76a6f7f8 |
all: add //go:build lines to assembly files
Don't add them to files in vendor and cmd/vendor though. These will be pulled in by updating the respective dependencies. For #41184 Change-Id: Icc57458c9b3033c347124323f33084c85b224c70 Reviewed-on: https://go-review.googlesource.com/c/go/+/319389 Trust: Tobias Klauser <tobias.klauser@gmail.com> Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Russ Cox <rsc@golang.org> |
|
|
|
54af9fd9e6 |
internal/bytealg: add power9 version of bytes index
This adds a power9 version of the bytes.Index function for little endian.
Here is the improvement on power9 for some of the Index benchmarks:
Index/10 -0.14%
Index/32 -3.19%
Index/4K -12.66%
Index/4M -13.34%
Index/64M -13.17%
Count/10 -0.59%
Count/32 -2.88%
Count/4K -12.63%
Count/4M -13.35%
Count/64M -13.17%
IndexHard1 -23.03%
IndexHard2 -13.01%
IndexHard3 -22.12%
IndexHard4 +0.16%
CountHard1 -23.02%
CountHard2 -13.01%
CountHard3 -22.12%
IndexPeriodic/IndexPeriodic2 -22.85%
IndexPeriodic/IndexPeriodic4 -23.15%
Change-Id: Id72353e2771eba2efbb1544d5f0be65f8a9f0433
Reviewed-on: https://go-review.googlesource.com/c/go/+/311380
Run-TryBot: Carlos Eduardo Seo <carlos.seo@linaro.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <carlos.seo@linaro.org>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com> |
|
|
|
8009a81f7a |
bytes: add asm implementation for index on ppc64x
This adds an asm implementation of index on ppc64le and ppc64. It results in a significant improvement in some of the benchmarks that use bytes.Index. The implementation is based on a port of the s390x asm implementation. Comments on the design are found with the code.
The following improvements occurred on power8:
Index/10 70.7ns ± 0% 18.8ns ± 0% -73.4
Index/32 165ns ± 0% 95ns ± 0% -42.6
Index/4K 9.23µs ± 0% 4.91µs ± 0% -46
Index/4M 9.52ms ± 0% 5.10ms ± 0% -46.4
Index/64M 155ms ± 0% 85ms ± 0% -45.1
Count/10 83.0ns ± 0% 32.1ns ± 0% -61.3
Count/32 178ns ± 0% 109ns ± 0% -38.8
Count/4K 9.24µs ± 0% 4.93µs ± 0% -46
Count/4M 9.52ms ± 0% 5.10ms ± 0% -46.4
Count/64M 155ms ± 0% 85ms ± 0% -45.1
IndexHard1 2.36ms ± 0% 0.13ms ± 0% -94.4
IndexHard2 2.36ms ± 0% 1.28ms ± 0% -45.8
IndexHard3 2.36ms ± 0% 1.19ms ± 0% -49.4
IndexHard4 2.36ms ± 0% 2.35ms ± 0% -0.1
CountHard1 2.36ms ± 0% 0.13ms ± 0% -94.4
CountHard2 2.36ms ± 0% 1.28ms ± 0% -45.8
CountHard3 2.36ms ± 0% 1.19ms ± 0% -49.4
IndexPeriodic/IndexPeriodic2 146µs ± 0% 8µs ± 0% -94
IndexPeriodic/IndexPeriodic4 146µs ± 0% 8µs ± 0% -94
Change-Id: I7dd2bb7e278726e27f51825ca8b2f8317d460e60
Reviewed-on: https://go-review.googlesource.com/c/go/+/309730
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Paul Murphy <murp@ibm.com>
Reviewed-by: Carlos Eduardo Seo <carlos.seo@linaro.org>
Trust: Carlos Eduardo Seo <carlos.seo@linaro.org>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com> |
|
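The last column of the power8 figures in the commit above carries no unit. For the first row it matches the relative change in time per operation, in percent, which can be checked with a small sketch. The helper name `deltaPercent` is made up for this illustration; rows whose displayed times are already rounded may not reproduce the listed value exactly.

```go
package main

import "fmt"

// deltaPercent is a hypothetical helper: it returns the relative change
// from oldT to newT as a percentage (negative means faster).
func deltaPercent(oldT, newT float64) float64 {
	return (newT - oldT) / oldT * 100
}

func main() {
	// Index/10 on power8, taken from the commit message: 70.7ns -> 18.8ns.
	fmt.Printf("Index/10: %.1f\n", deltaPercent(70.7, 18.8))
}
```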
|
|
8f4c5068e0 |
internal/bytealg: port more performance-critical functions to ABIInternal
CL 308931 ported several runtime assembly functions to ABIInternal so
that compiler-generated ABIInternal calls don't go through ABI
wrappers, but it missed the runtime assembly functions that are
actually defined in internal/bytealg.
This eliminates the cost of wrappers for the BleveQuery and
GopherLuaKNucleotide benchmarks, but there's still more to do for
Tile38.
0-base 1-wrappers
sec/op sec/op vs base
BleveQuery 6.507 ± 0% 6.477 ± 0% -0.46% (p=0.004 n=20)
GopherLuaKNucleotide 30.39 ± 1% 30.34 ± 0% ~ (p=0.301 n=20)
Tile38IntersectsCircle100kmRequest 1.038m ± 1% 1.080m ± 2% +4.03% (p=0.000 n=20)
For #40724.
Change-Id: I0b722443f684fcb997b1d70802c5ed4b8d8f9829
Reviewed-on: https://go-review.googlesource.com/c/go/+/310184
Trust: Austin Clements <austin@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
|
|
d4b2638234 |
all: go fmt std cmd (but revert vendor)
Make all our package sources use Go 1.17 gofmt format (adding //go:build lines). Part of //go:build change (#41184). See https://golang.org/design/draft-gobuild Change-Id: Ia0534360e4957e58cd9a18429c39d0e32a6addb4 Reviewed-on: https://go-review.googlesource.com/c/go/+/294430 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Ian Lance Taylor <iant@golang.org> |
|
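The `//go:build` lines this commit adds sit alongside the older `// +build` form, and both spell the same condition. A minimal sketch using the standard `go/build/constraint` package shows the two forms parsing to the same expression; the helper name `normalize` and the sample constraint are chosen only for illustration.

```go
package main

import (
	"fmt"
	"go/build/constraint"
)

// normalize parses a build constraint line in either syntax and
// returns the expression in //go:build notation.
func normalize(line string) (string, error) {
	expr, err := constraint.Parse(line)
	if err != nil {
		return "", err
	}
	return expr.String(), nil
}

func main() {
	lines := []string{
		"// +build ppc64 ppc64le",     // old form: space means OR
		"//go:build ppc64 || ppc64le", // new form: explicit operators
	}
	for _, line := range lines {
		s, err := normalize(line)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Printf("%-30s => %s\n", line, s)
	}
}
```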
|
|
4f597abe77 |
internal/bytealg: improve mips64x equal on large size
name old time/op new time/op delta
Equal/0 9.94ns ± 4% 9.12ns ± 5% -8.26% (p=0.000 n=10+10)
Equal/1 24.5ns ± 0% 27.2ns ± 1% +11.22% (p=0.000 n=9+10)
Equal/6 28.1ns ± 0% 32.1ns ± 1% +14.20% (p=0.000 n=8+10)
Equal/9 37.1ns ± 0% 37.8ns ± 1% +1.95% (p=0.000 n=8+9)
Equal/15 47.3ns ± 0% 44.3ns ± 0% -6.34% (p=0.000 n=9+10)
Equal/16 42.9ns ± 0% 24.6ns ± 0% -42.66% (p=0.000 n=10+7)
Equal/20 44.3ns ± 0% 57.4ns ± 0% +29.57% (p=0.000 n=9+10)
Equal/32 63.2ns ± 0% 35.8ns ± 0% -43.35% (p=0.000 n=10+10)
Equal/4K 6.49µs ± 0% 0.50µs ± 0% -92.27% (p=0.000 n=10+8)
Equal/4M 6.70ms ± 0% 0.48ms ± 0% -92.78% (p=0.000 n=8+10)
Equal/64M 110ms ± 0% 8ms ± 0% -92.65% (p=0.000 n=9+9)
CompareBytesEqual 36.6ns ± 0% 35.9ns ± 0% -1.83% (p=0.000 n=10+9)
name old speed new speed delta
Equal/1 40.8MB/s ± 0% 36.7MB/s ± 0% -10.16% (p=0.000 n=10+10)
Equal/6 213MB/s ± 0% 187MB/s ± 1% -12.32% (p=0.000 n=10+10)
Equal/9 243MB/s ± 0% 238MB/s ± 1% -1.94% (p=0.000 n=9+10)
Equal/15 317MB/s ± 0% 339MB/s ± 0% +6.86% (p=0.000 n=9+9)
Equal/16 373MB/s ± 0% 651MB/s ± 0% +74.70% (p=0.000 n=8+10)
Equal/20 452MB/s ± 0% 348MB/s ± 0% -22.90% (p=0.000 n=8+10)
Equal/32 506MB/s ± 0% 893MB/s ± 0% +76.53% (p=0.000 n=10+9)
Equal/4K 631MB/s ± 0% 8166MB/s ± 0% +1194.73% (p=0.000 n=10+10)
Equal/4M 626MB/s ± 0% 8673MB/s ± 0% +1284.94% (p=0.000 n=8+10)
Equal/64M 608MB/s ± 0% 8277MB/s ± 0% +1260.83% (p=0.000 n=9+9)
Change-Id: I1cd14ade16390a5097a8d4e9721d5e822fa6218f
Reviewed-on: https://go-review.googlesource.com/c/go/+/199597
Run-TryBot: Meng Zhuo <mzh@golangcn.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Trust: Meng Zhuo <mzh@golangcn.org> |
|
|
|
19f6422e00 |
internal/bytealg: add assembly implementation of Count/CountString for riscv64
Simple single-byte loop count for now, to be further improved in future CLs.
Benchmark on linux/riscv64 (HiFive Unleashed):
name old time/op new time/op delta
CountSingle/10-4 190ns ± 1% 145ns ± 1% -23.66% (p=0.000 n=10+9)
CountSingle/32-4 422ns ± 1% 268ns ± 0% -36.43% (p=0.000 n=10+7)
CountSingle/4K-4 43.3µs ± 0% 23.8µs ± 0% -45.09% (p=0.000 n=8+10)
CountSingle/4M-4 54.2ms ± 1% 33.3ms ± 1% -38.48% (p=0.000 n=10+10)
CountSingle/64M-4 1.52s ± 1% 1.20s ± 1% -21.20% (p=0.000 n=9+9)
name old speed new speed delta
CountSingle/10-4 52.7MB/s ± 1% 69.1MB/s ± 1% +31.03% (p=0.000 n=10+9)
CountSingle/32-4 75.9MB/s ± 1% 119.5MB/s ± 0% +57.34% (p=0.000 n=10+8)
CountSingle/4K-4 94.6MB/s ± 0% 172.2MB/s ± 0% +82.10% (p=0.000 n=8+10)
CountSingle/4M-4 77.4MB/s ± 1% 125.8MB/s ± 1% +62.54% (p=0.000 n=10+10)
CountSingle/64M-4 44.2MB/s ± 1% 56.1MB/s ± 1% +26.91% (p=0.000 n=9+9)
Change-Id: I2a6bd50d22d5f598517bb3c5a50066c54280cac5
Reviewed-on: https://go-review.googlesource.com/c/go/+/263541
Trust: Tobias Klauser <tobias.klauser@gmail.com>
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Joel Sing <joel@sing.id.au> |
|
|
|
e69f6e8393 |
internal/bytealg: fix typo in IndexRabinKarp{,Bytes} godoc
Change-Id: I09ba19e19b195e345a0fe29d542e0d86529b0d31 Reviewed-on: https://go-review.googlesource.com/c/go/+/261359 Trust: Tobias Klauser <tobias.klauser@gmail.com> Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org> |
|
|
|
11cdbab9d4 |
bytes, internal/bytealg: fix incorrect IndexString usage
The IndexString implementation in the bytealg package requires that the string passed into it be in the range '2 <= len(s) <= MaxLen' where MaxLen may be any value (including 0). CL 156998 added calls to bytealg.IndexString where MaxLen was not first checked. This led to an illegal instruction on s390x with the vector facility disabled. This CL guards the calls to bytealg.IndexString with a MaxLen check. If the check fails then the code now falls back to the pre CL 156998 implementation (a loop over the runes in the string). Since the MaxLen check is now in place the generic implementation is no longer called so I have returned it to its original unimplemented state. In future we may want to drop MaxLen to prevent this kind of confusion. Fixes #41552. Change-Id: Ibeb3f08720444a05c08d719ed97f6cef2423bbe9 Reviewed-on: https://go-review.googlesource.com/c/go/+/256717 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Go Bot <gobot@golang.org> Trust: Michael Munday <mike.munday@ibm.com> Reviewed-by: Keith Randall <khr@golang.org> |
|
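The guard described in the commit above can be sketched as follows. Since `internal/bytealg` cannot be imported from user code, `fastIndex` and the `maxLen` parameter are hypothetical stand-ins for `bytealg.IndexString` and `bytealg.MaxLen`, and the fallback is a plain byte-wise scan rather than the fallback used in the actual CL.

```go
package main

import (
	"fmt"
	"strings"
)

// fastIndex stands in for an optimized routine that is only valid when
// 2 <= len(sep) <= maxLen. It panics if that precondition is violated,
// mimicking the failure mode the commit describes.
func fastIndex(s, sep string, maxLen int) int {
	if len(sep) < 2 || len(sep) > maxLen {
		panic("fastIndex: separator length out of range")
	}
	return strings.Index(s, sep)
}

// indexGuarded checks the length precondition before taking the fast
// path and otherwise falls back to a simple scan.
func indexGuarded(s, sep string, maxLen int) int {
	if 2 <= len(sep) && len(sep) <= maxLen {
		return fastIndex(s, sep, maxLen)
	}
	for i := 0; i+len(sep) <= len(s); i++ {
		if s[i:i+len(sep)] == sep {
			return i
		}
	}
	return -1
}

func main() {
	// maxLen may be 0 on some platforms, so the fast path is never taken.
	fmt.Println(indexGuarded("chicken", "ken", 0))
	fmt.Println(indexGuarded("chicken", "ken", 64))
}
```

Passing `maxLen` as a parameter keeps the sketch self-contained; in the real package it is a per-platform value.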
|
|
99d6e3eec2 |
internal/bytealg: use CBZ instructions
Use CBZ to replace the compare-with-zero and conditional-jump instruction pair in the arm64 assembly file. Change-Id: Ie16fb52e27b4d327343e119ebc0f0ca756437bc4 Reviewed-on: https://go-review.googlesource.com/c/go/+/237477 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> |
|
|
|
c6a11f0dd2 |
crypto,internal/bytealg: fix assembly that clobbers BP
BP should be callee-save. It will be saved automatically if there is a nonzero frame size. Otherwise, we need to avoid this register. Change-Id: If3f551efa42d830c8793d9f0183cb8daad7a2ab5 Reviewed-on: https://go-review.googlesource.com/c/go/+/248260 Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Martin Möhrmann <moehrmann@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> |
|
|
|
3af92acb9f |
strings, bytes: improve IndexAny and LastIndexAny performance
For a pattern containing multi-byte runes, the time complexity of the previous algorithm is O(nm), so search performance is poor when both input arguments are long. This CL improves search performance for these cases by using IndexRune, which is mainly implemented with IndexByte and Index. Because IndexByte and Index are specially optimized with some powerful instructions for short patterns (a UTF-8 rune is 1 to 4 bytes), they help reduce the runtime complexity of IndexAny and LastIndexAny.
Another optimization method is using a hash table; however, the actual test results show that using IndexRune is better, and its space complexity is lower.
There are two fast paths in IndexAny and LastIndexAny for cases where the length of an input argument is 1, and their locations are not exactly the same, which was determined based on the actual test results.
Benchmarks on arm64 and amd64:
name old time/op new time/op delta
pkg:strings goos:linux goarch:arm64
IndexAnyASCII/1:1-8 23.7ns ± 3% 28.5ns ± 0% +20.15% (p=0.008 n=5+5)
IndexAnyASCII/1:2-8 18.0ns ± 0% 33.1ns ± 0% +83.67% (p=0.008 n=5+5)
IndexAnyASCII/1:4-8 20.0ns ± 0% 36.0ns ± 0% +80.00% (p=0.029 n=4+4)
IndexAnyASCII/1:8-8 36.1ns ± 0% 36.0ns ± 0% ~ (p=0.095 n=5+4)
IndexAnyASCII/1:16-8 48.1ns ± 0% 36.0ns ± 0% -25.19% (p=0.029 n=4+4)
IndexAnyASCII/1:32-8 72.1ns ± 0% 36.0ns ± 0% -50.01% (p=0.008 n=5+5)
IndexAnyASCII/1:64-8 120ns ± 0% 39ns ± 0% -67.83% (p=0.008 n=5+5)
IndexAnyASCII/16:1-8 73.0ns ± 0% 28.5ns ± 0% -60.95% (p=0.008 n=5+5)
IndexAnyASCII/16:2-8 76.8ns ± 0% 77.0ns ± 0% ~ (p=1.000 n=5+5)
IndexAnyASCII/16:4-8 83.2ns ± 1% 83.0ns ± 0% ~ (p=0.770 n=5+5)
IndexAnyASCII/16:8-8 111ns ± 1% 107ns ± 0% -3.25% (p=0.008 n=5+5)
IndexAnyASCII/16:16-8 139ns ± 1% 137ns ± 0% -1.58% (p=0.008 n=5+5)
IndexAnyASCII/16:32-8 199ns ± 1% 197ns ± 0% -1.20% (p=0.008 n=5+5)
IndexAnyASCII/16:64-8 307ns ± 0% 313ns ± 0% +1.82% (p=0.016 n=5+4)
IndexAnyASCII/256:1-8 674ns ± 0% 65ns ± 0% -90.31% (p=0.008 n=5+5)
IndexAnyASCII/256:2-8 678ns ± 0% 683ns ± 0% +0.68% (p=0.008 n=5+5)
IndexAnyASCII/256:4-8 685ns ± 0% 683ns ± 0% -0.29% (p=0.000 n=5+4)
IndexAnyASCII/256:8-8 711ns ± 0% 708ns ± 0% -0.48% (p=0.008 n=5+5)
IndexAnyASCII/256:16-8 740ns ± 0% 740ns ± 0% ~ (p=0.444 n=5+5)
IndexAnyASCII/256:32-8 799ns ± 0% 798ns ± 0% -0.18% (p=0.008 n=5+5)
IndexAnyASCII/256:64-8 910ns ± 0% 914ns ± 0% +0.44% (p=0.016 n=4+5)
IndexAnyUTF8/1:1-8 27.1ns ± 0% 19.0ns ± 0% -29.79% (p=0.008 n=5+5)
IndexAnyUTF8/1:2-8 44.1ns ± 0% 33.0ns ± 0% -25.17% (p=0.008 n=5+5)
IndexAnyUTF8/1:4-8 46.1ns ± 0% 33.1ns ± 0% -28.29% (p=0.016 n=4+5)
IndexAnyUTF8/1:8-8 85.1ns ± 0% 33.0ns ± 0% -61.18% (p=0.008 n=5+5)
IndexAnyUTF8/1:16-8 110ns ± 1% 36ns ± 0% -67.27% (p=0.008 n=5+5)
IndexAnyUTF8/1:32-8 188ns ± 0% 36ns ± 0% -80.85% (p=0.008 n=5+5)
IndexAnyUTF8/1:64-8 332ns ± 0% 39ns ± 0% ~ (p=0.079 n=4+5)
IndexAnyUTF8/16:1-8 293ns ± 0% 54ns ± 0% -81.56% (p=0.008 n=5+5)
IndexAnyUTF8/16:2-8 563ns ± 0% 349ns ± 0% -37.98% (p=0.008 n=5+5)
IndexAnyUTF8/16:4-8 546ns ± 1% 349ns ± 0% -36.10% (p=0.000 n=5+4)
IndexAnyUTF8/16:8-8 1.22µs ± 0% 0.35µs ± 0% -71.39% (p=0.008 n=5+5)
IndexAnyUTF8/16:16-8 1.63µs ± 1% 0.42µs ± 0% -73.98% (p=0.008 n=5+5)
IndexAnyUTF8/16:32-8 2.87µs ± 0% 0.42µs ± 0% -85.22% (p=0.008 n=5+5)
IndexAnyUTF8/16:64-8 5.18µs ± 0% 0.47µs ± 0% -90.98% (p=0.008 n=5+5)
IndexAnyUTF8/256:1-8 4.26µs ± 0% 0.47µs ± 0% -88.85% (p=0.000 n=4+5)
IndexAnyUTF8/256:2-8 8.62µs ± 0% 5.15µs ± 0% -40.21% (p=0.008 n=5+5)
IndexAnyUTF8/256:4-8 8.25µs ± 0% 5.15µs ± 0% -37.50% (p=0.016 n=5+4)
IndexAnyUTF8/256:8-8 19.2µs ± 1% 5.2µs ± 0% -73.08% (p=0.016 n=5+4)
IndexAnyUTF8/256:16-8 25.6µs ± 1% 6.3µs ± 0% -75.32% (p=0.008 n=5+5)
IndexAnyUTF8/256:32-8 45.6µs ± 0% 6.3µs ± 0% -86.15% (p=0.008 n=5+5)
IndexAnyUTF8/256:64-8 82.4µs ± 0% 7.0µs ± 0% -91.53% (p=0.016 n=5+4)
LastIndexAnyASCII/1:1-8 23.0ns ± 0% 33.5ns ± 0% +45.65% (p=0.008 n=5+5)
LastIndexAnyASCII/1:2-8 24.5ns ± 0% 33.5ns ± 0% +36.73% (p=0.016 n=4+5)
LastIndexAnyASCII/1:4-8 27.5ns ± 0% 35.5ns ± 0% +29.09% (p=0.008 n=5+5)
LastIndexAnyASCII/1:8-8 44.5ns ± 0% 35.5ns ± 0% -20.13% (p=0.008 n=5+5)
LastIndexAnyASCII/1:16-8 56.5ns ± 0% 35.5ns ± 0% -37.15% (p=0.008 n=5+5)
LastIndexAnyASCII/1:32-8 80.3ns ± 0% 35.5ns ± 0% -55.79% (p=0.000 n=5+4)
LastIndexAnyASCII/1:64-8 129ns ± 0% 40ns ± 0% -68.85% (p=0.008 n=5+5)
LastIndexAnyASCII/16:1-8 72.8ns ± 0% 72.7ns ± 0% -0.19% (p=0.016 n=4+5)
LastIndexAnyASCII/16:2-8 75.4ns ± 0% 75.1ns ± 0% ~ (p=0.127 n=5+5)
LastIndexAnyASCII/16:4-8 81.9ns ± 1% 80.2ns ± 0% -2.00% (p=0.008 n=5+5)
LastIndexAnyASCII/16:8-8 110ns ± 1% 108ns ± 0% -1.46% (p=0.008 n=5+5)
LastIndexAnyASCII/16:16-8 138ns ± 1% 134ns ± 0% -3.18% (p=0.008 n=5+5)
LastIndexAnyASCII/16:32-8 198ns ± 0% 197ns ± 0% -0.51% (p=0.008 n=5+5)
LastIndexAnyASCII/16:64-8 309ns ± 0% 313ns ± 0% +1.30% (p=0.008 n=5+5)
LastIndexAnyASCII/256:1-8 652ns ± 0% 653ns ± 0% +0.21% (p=0.008 n=5+5)
LastIndexAnyASCII/256:2-8 656ns ± 0% 656ns ± 0% ~ (all equal)
LastIndexAnyASCII/256:4-8 663ns ± 0% 663ns ± 0% ~ (p=0.444 n=5+5)
LastIndexAnyASCII/256:8-8 691ns ± 0% 690ns ± 0% ~ (p=0.079 n=4+5)
LastIndexAnyASCII/256:16-8 719ns ± 0% 715ns ± 0% -0.53% (p=0.000 n=5+4)
LastIndexAnyASCII/256:32-8 779ns ± 0% 780ns ± 0% +0.13% (p=0.029 n=4+4)
LastIndexAnyASCII/256:64-8 890ns ± 0% 894ns ± 0% +0.45% (p=0.008 n=5+5)
LastIndexAnyUTF8/1:1-8 31.6ns ± 0% 33.5ns ± 0% +6.01% (p=0.008 n=5+5)
LastIndexAnyUTF8/1:2-8 48.6ns ± 0% 33.5ns ± 0% -30.99% (p=0.008 n=5+5)
LastIndexAnyUTF8/1:4-8 48.6ns ± 0% 33.5ns ± 0% -31.13% (p=0.000 n=5+4)
LastIndexAnyUTF8/1:8-8 89.6ns ± 0% 33.5ns ± 0% -62.56% (p=0.008 n=5+5)
LastIndexAnyUTF8/1:16-8 113ns ± 1% 36ns ± 0% -68.47% (p=0.000 n=5+4)
LastIndexAnyUTF8/1:32-8 190ns ± 0% 36ns ± 0% -81.26% (p=0.029 n=4+4)
LastIndexAnyUTF8/1:64-8 327ns ± 0% 40ns ± 0% -87.77% (p=0.008 n=5+5)
LastIndexAnyUTF8/16:1-8 364ns ± 0% 158ns ± 0% ~ (p=0.079 n=4+5)
LastIndexAnyUTF8/16:2-8 636ns ± 0% 472ns ± 0% -25.79% (p=0.000 n=5+4)
LastIndexAnyUTF8/16:4-8 630ns ± 0% 472ns ± 0% -25.03% (p=0.008 n=5+5)
LastIndexAnyUTF8/16:8-8 1.28µs ± 0% 0.47µs ± 0% -63.09% (p=0.016 n=5+4)
LastIndexAnyUTF8/16:16-8 1.66µs ± 0% 0.53µs ± 0% -68.39% (p=0.016 n=5+4)
LastIndexAnyUTF8/16:32-8 2.88µs ± 0% 0.53µs ± 0% -81.72% (p=0.008 n=5+5)
LastIndexAnyUTF8/16:64-8 5.08µs ± 0% 0.57µs ± 0% -88.79% (p=0.008 n=5+5)
LastIndexAnyUTF8/256:1-8 5.41µs ± 0% 2.03µs ± 0% -62.46% (p=0.016 n=4+5)
LastIndexAnyUTF8/256:2-8 9.77µs ± 0% 7.14µs ± 0% -26.97% (p=0.008 n=5+5)
LastIndexAnyUTF8/256:4-8 9.63µs ± 0% 7.14µs ± 0% -25.86% (p=0.008 n=5+5)
LastIndexAnyUTF8/256:8-8 20.0µs ± 0% 7.1µs ± 0% -64.30% (p=0.008 n=5+5)
LastIndexAnyUTF8/256:16-8 26.1µs ± 1% 8.0µs ± 0% -69.40% (p=0.008 n=5+5)
LastIndexAnyUTF8/256:32-8 45.6µs ± 1% 8.0µs ± 0% -82.51% (p=0.008 n=5+5)
LastIndexAnyUTF8/256:64-8 80.8µs ± 0% 8.6µs ± 0% -89.33% (p=0.016 n=5+4)
pkg:bytes goos:linux goarch:arm64
IndexAnyASCII/1:1-8 26.2ns ± 1% 26.5ns ± 0% +1.30% (p=0.016 n=5+4)
IndexAnyASCII/1:2-8 18.5ns ± 0% 26.5ns ± 0% +43.24% (p=0.008 n=5+5)
IndexAnyASCII/1:4-8 21.0ns ± 0% 26.5ns ± 0% +26.38% (p=0.008 n=5+5)
IndexAnyASCII/1:8-8 37.5ns ± 0% 26.5ns ± 0% -29.33% (p=0.000 n=5+4)
IndexAnyASCII/1:16-8 49.6ns ± 0% 26.5ns ± 0% -46.49% (p=0.008 n=5+5)
IndexAnyASCII/1:32-8 73.6ns ± 0% 30.1ns ± 0% -59.16% (p=0.008 n=5+5)
IndexAnyASCII/1:64-8 122ns ± 0% 33ns ± 0% -73.23% (p=0.008 n=5+5)
IndexAnyASCII/16:1-8 73.7ns ± 0% 33.4ns ± 0% -54.71% (p=0.008 n=5+5)
IndexAnyASCII/16:2-8 79.1ns ± 0% 78.9ns ± 0% -0.30% (p=0.016 n=4+5)
IndexAnyASCII/16:4-8 84.8ns ± 0% 86.1ns ± 0% +1.58% (p=0.016 n=5+4)
IndexAnyASCII/16:8-8 111ns ± 0% 111ns ± 0% ~ (all equal)
IndexAnyASCII/16:16-8 139ns ± 0% 144ns ± 0% +3.60% (p=0.016 n=4+5)
IndexAnyASCII/16:32-8 196ns ± 0% 207ns ± 0% +5.61% (p=0.016 n=5+4)
IndexAnyASCII/16:64-8 311ns ± 0% 320ns ± 0% +2.89% (p=0.016 n=4+5)
IndexAnyASCII/256:1-8 674ns ± 0% 65ns ± 1% -90.35% (p=0.008 n=5+5)
IndexAnyASCII/256:2-8 680ns ± 0% 680ns ± 0% ~ (p=0.444 n=5+5)
IndexAnyASCII/256:4-8 686ns ± 0% 687ns ± 0% ~ (p=0.167 n=5+5)
IndexAnyASCII/256:8-8 713ns ± 0% 712ns ± 0% -0.14% (p=0.008 n=5+5)
IndexAnyASCII/256:16-8 740ns ± 0% 744ns ± 0% +0.54% (p=0.016 n=5+4)
IndexAnyASCII/256:32-8 797ns ± 0% 808ns ± 0% +1.43% (p=0.008 n=5+5)
IndexAnyASCII/256:64-8 912ns ± 0% 921ns ± 0% +0.99% (p=0.016 n=4+5)
IndexAnyUTF8/1:1-8 27.5ns ± 0% 26.5ns ± 0% -3.64% (p=0.008 n=5+5)
IndexAnyUTF8/1:2-8 44.5ns ± 0% 26.5ns ± 0% -40.50% (p=0.008 n=5+5)
IndexAnyUTF8/1:4-8 45.6ns ± 0% 26.5ns ± 0% -41.89% (p=0.000 n=5+4)
IndexAnyUTF8/1:8-8 85.8ns ± 1% 26.5ns ± 0% -69.11% (p=0.008 n=5+5)
IndexAnyUTF8/1:16-8 110ns ± 1% 26ns ± 0% -76.00% (p=0.016 n=5+4)
IndexAnyUTF8/1:32-8 188ns ± 0% 30ns ± 0% -84.04% (p=0.008 n=5+5)
IndexAnyUTF8/1:64-8 333ns ± 0% 33ns ± 0% -90.20% (p=0.008 n=5+5)
IndexAnyUTF8/16:1-8 294ns ± 0% 235ns ± 0% -20.07% (p=0.008 n=5+5)
IndexAnyUTF8/16:2-8 563ns ± 0% 309ns ± 0% -45.12% (p=0.008 n=5+5)
IndexAnyUTF8/16:4-8 558ns ± 1% 309ns ± 0% -44.60% (p=0.000 n=5+4)
IndexAnyUTF8/16:8-8 1.23µs ± 0% 0.31µs ± 0% -74.79% (p=0.008 n=5+5)
IndexAnyUTF8/16:16-8 1.62µs ± 2% 0.31µs ± 0% -80.93% (p=0.008 n=5+5)
IndexAnyUTF8/16:32-8 2.86µs ± 0% 0.38µs ± 0% -86.87% (p=0.008 n=5+5)
IndexAnyUTF8/16:64-8 5.18µs ± 0% 0.42µs ± 0% -91.86% (p=0.008 n=5+5)
IndexAnyUTF8/256:1-8 4.27µs ± 1% 3.30µs ± 1% -22.75% (p=0.008 n=5+5)
IndexAnyUTF8/256:2-8 8.61µs ± 0% 4.45µs ± 0% -48.31% (p=0.016 n=4+5)
IndexAnyUTF8/256:4-8 8.44µs ± 0% 4.45µs ± 0% -47.23% (p=0.008 n=5+5)
IndexAnyUTF8/256:8-8 19.2µs ± 0% 4.5µs ± 0% -76.78% (p=0.008 n=5+5)
IndexAnyUTF8/256:16-8 25.6µs ± 0% 4.5µs ± 0% -82.63% (p=0.008 n=5+5)
IndexAnyUTF8/256:32-8 45.4µs ± 0% 5.5µs ± 0% -87.85% (p=0.016 n=4+5)
IndexAnyUTF8/256:64-8 82.5µs ± 0% 6.2µs ± 0% -92.49% (p=0.008 n=5+5)
LastIndexAnyASCII/1:1-8 23.0ns ± 0% 26.5ns ± 0% +15.02% (p=0.008 n=5+5)
LastIndexAnyASCII/1:2-8 24.5ns ± 0% 26.5ns ± 0% +8.16% (p=0.008 n=5+5)
LastIndexAnyASCII/1:4-8 27.8ns ± 0% 26.5ns ± 0% -4.68% (p=0.029 n=4+4)
LastIndexAnyASCII/1:8-8 45.1ns ± 1% 26.5ns ± 0% -41.29% (p=0.000 n=5+4)
LastIndexAnyASCII/1:16-8 57.1ns ± 0% 26.5ns ± 0% -53.61% (p=0.008 n=5+5)
LastIndexAnyASCII/1:32-8 81.5ns ± 0% 30.0ns ± 0% ~ (p=0.079 n=4+5)
LastIndexAnyASCII/1:64-8 129ns ± 0% 32ns ± 0% -74.81% (p=0.008 n=5+5)
LastIndexAnyASCII/16:1-8 72.6ns ± 0% 72.1ns ± 0% -0.63% (p=0.000 n=4+5)
LastIndexAnyASCII/16:2-8 77.2ns ± 0% 77.2ns ± 0% ~ (p=0.167 n=5+5)
LastIndexAnyASCII/16:4-8 83.1ns ± 0% 83.2ns ± 0% ~ (p=0.444 n=5+5)
LastIndexAnyASCII/16:8-8 109ns ± 1% 108ns ± 0% ~ (p=0.167 n=5+5)
LastIndexAnyASCII/16:16-8 136ns ± 0% 136ns ± 0% ~ (all equal)
LastIndexAnyASCII/16:32-8 195ns ± 0% 197ns ± 0% +0.82% (p=0.008 n=5+5)
LastIndexAnyASCII/16:64-8 309ns ± 0% 309ns ± 0% ~ (all equal)
LastIndexAnyASCII/256:1-8 653ns ± 0% 657ns ± 0% +0.61% (p=0.008 n=5+5)
LastIndexAnyASCII/256:2-8 659ns ± 0% 658ns ± 0% ~ (p=0.167 n=5+5)
LastIndexAnyASCII/256:4-8 664ns ± 0% 663ns ± 0% ~ (p=0.095 n=5+4)
LastIndexAnyASCII/256:8-8 698ns ± 0% 689ns ± 0% -1.29% (p=0.008 n=5+5)
LastIndexAnyASCII/256:16-8 726ns ± 0% 717ns ± 0% -1.24% (p=0.008 n=5+5)
LastIndexAnyASCII/256:32-8 777ns ± 0% 779ns ± 0% ~ (p=0.079 n=5+4)
LastIndexAnyASCII/256:64-8 889ns ± 0% 890ns ± 0% ~ (p=0.444 n=5+5)
LastIndexAnyUTF8/1:1-8 32.1ns ± 0% 26.5ns ± 0% -17.45% (p=0.000 n=5+4)
LastIndexAnyUTF8/1:2-8 48.6ns ± 0% 26.5ns ± 0% -45.52% (p=0.000 n=5+4)
LastIndexAnyUTF8/1:4-8 49.6ns ± 0% 26.5ns ± 0% -46.62% (p=0.008 n=5+5)
LastIndexAnyUTF8/1:8-8 91.9ns ± 0% 26.5ns ± 0% -71.18% (p=0.008 n=5+5)
LastIndexAnyUTF8/1:16-8 114ns ± 1% 26ns ± 0% -76.84% (p=0.000 n=5+4)
LastIndexAnyUTF8/1:32-8 203ns ± 6% 30ns ± 0% -85.25% (p=0.008 n=5+5)
LastIndexAnyUTF8/1:64-8 330ns ± 0% 33ns ± 0% -90.14% (p=0.000 n=4+5)
LastIndexAnyUTF8/16:1-8 365ns ± 0% 164ns ± 0% -55.04% (p=0.008 n=5+5)
LastIndexAnyUTF8/16:2-8 638ns ± 0% 296ns ± 0% -53.58% (p=0.008 n=5+5)
LastIndexAnyUTF8/16:4-8 634ns ± 0% 296ns ± 0% -53.31% (p=0.008 n=5+5)
LastIndexAnyUTF8/16:8-8 1.30µs ± 0% 0.30µs ± 0% -77.18% (p=0.000 n=4+5)
LastIndexAnyUTF8/16:16-8 1.66µs ± 0% 0.30µs ± 0% -82.19% (p=0.008 n=5+5)
LastIndexAnyUTF8/16:32-8 2.90µs ± 0% 0.38µs ± 0% -87.00% (p=0.029 n=4+4)
LastIndexAnyUTF8/16:64-8 5.10µs ± 0% 0.42µs ± 0% -91.78% (p=0.008 n=5+5)
LastIndexAnyUTF8/256:1-8 5.42µs ± 0% 2.12µs ± 0% -60.92% (p=0.008 n=5+5)
LastIndexAnyUTF8/256:2-8 9.79µs ± 0% 4.26µs ± 0% -56.47% (p=0.008 n=5+5)
LastIndexAnyUTF8/256:4-8 9.66µs ± 0% 4.26µs ± 0% -55.87% (p=0.008 n=5+5)
LastIndexAnyUTF8/256:8-8 20.4µs ± 0% 4.3µs ± 0% -79.10% (p=0.008 n=5+5)
LastIndexAnyUTF8/256:16-8 26.0µs ± 1% 4.3µs ± 0% -83.62% (p=0.008 n=5+5)
LastIndexAnyUTF8/256:32-8 46.0µs ± 0% 5.5µs ± 0% -88.09% (p=0.008 n=5+5)
LastIndexAnyUTF8/256:64-8 81.1µs ± 0% 6.2µs ± 0% -92.38% (p=0.008 n=5+5)
name old time/op new time/op delta
pkg:strings goos:linux goarch:amd64
IndexAnyASCII/1:1-48 10.0ns ± 0% 13.3ns ± 0% +33.00% (p=0.008 n=5+5)
IndexAnyASCII/1:2-48 11.0ns ± 0% 15.5ns ± 0% +40.55% (p=0.016 n=4+5)
IndexAnyASCII/1:4-48 12.9ns ± 0% 15.4ns ± 0% +19.69% (p=0.008 n=5+5)
IndexAnyASCII/1:8-48 18.6ns ± 0% 15.5ns ± 0% -16.45% (p=0.000 n=4+5)
IndexAnyASCII/1:16-48 30.1ns ± 0% 16.9ns ± 0% ~ (p=0.079 n=4+5)
IndexAnyASCII/1:32-48 53.1ns ± 0% 18.6ns ± 0% -64.95% (p=0.000 n=5+4)
IndexAnyASCII/1:64-48 98.9ns ± 0% 17.4ns ± 0% -82.41% (p=0.000 n=5+4)
IndexAnyASCII/16:1-48 35.0ns ± 0% 14.2ns ± 0% -59.47% (p=0.000 n=5+4)
IndexAnyASCII/16:2-48 35.5ns ± 0% 35.6ns ± 0% ~ (p=0.238 n=5+4)
IndexAnyASCII/16:4-48 40.8ns ± 0% 40.7ns ± 1% ~ (p=0.643 n=5+5)
IndexAnyASCII/16:8-48 50.8ns ± 0% 50.9ns ± 1% ~ (p=1.000 n=4+5)
IndexAnyASCII/16:16-48 64.0ns ± 1% 64.5ns ± 1% ~ (p=0.071 n=5+5)
IndexAnyASCII/16:32-48 98.3ns ± 0% 100.8ns ± 1% +2.52% (p=0.008 n=5+5)
IndexAnyASCII/16:64-48 156ns ± 0% 157ns ± 0% ~ (p=0.238 n=4+5)
IndexAnyASCII/256:1-48 299ns ± 0% 24ns ± 3% -92.12% (p=0.008 n=5+5)
IndexAnyASCII/256:2-48 303ns ± 0% 304ns ± 0% ~ (p=0.762 n=5+5)
IndexAnyASCII/256:4-48 311ns ± 0% 311ns ± 0% ~ (p=0.476 n=5+5)
IndexAnyASCII/256:8-48 321ns ± 0% 321ns ± 0% ~ (p=0.429 n=4+5)
IndexAnyASCII/256:16-48 334ns ± 0% 335ns ± 0% ~ (p=0.079 n=5+4)
IndexAnyASCII/256:32-48 367ns ± 0% 365ns ± 0% ~ (p=0.079 n=4+5)
IndexAnyASCII/256:64-48 431ns ± 1% 421ns ± 0% -2.27% (p=0.008 n=5+5)
IndexAnyUTF8/1:1-48 17.2ns ± 0% 10.8ns ± 0% -37.21% (p=0.029 n=4+4)
IndexAnyUTF8/1:2-48 26.7ns ± 0% 15.6ns ± 0% ~ (p=0.079 n=4+5)
IndexAnyUTF8/1:4-48 28.2ns ± 0% 15.6ns ± 0% -44.68% (p=0.000 n=5+4)
IndexAnyUTF8/1:8-48 48.8ns ± 0% 15.6ns ± 0% -68.03% (p=0.029 n=4+4)
IndexAnyUTF8/1:16-48 58.3ns ± 0% 16.2ns ± 0% ~ (p=0.079 n=4+5)
IndexAnyUTF8/1:32-48 103ns ± 0% 18ns ± 0% -82.27% (p=0.008 n=5+5)
IndexAnyUTF8/1:64-48 182ns ± 0% 17ns ± 0% -90.53% (p=0.008 n=5+5)
IndexAnyUTF8/16:1-48 197ns ± 0% 25ns ± 0% -87.34% (p=0.000 n=5+4)
IndexAnyUTF8/16:2-48 348ns ± 0% 163ns ± 0% -53.11% (p=0.000 n=5+4)
IndexAnyUTF8/16:4-48 374ns ± 0% 163ns ± 0% -56.37% (p=0.000 n=5+4)
IndexAnyUTF8/16:8-48 716ns ± 0% 163ns ± 0% -77.22% (p=0.000 n=5+4)
IndexAnyUTF8/16:16-48 859ns ± 0% 175ns ± 0% -79.63% (p=0.000 n=5+4)
IndexAnyUTF8/16:32-48 1.58µs ± 0% 0.20µs ± 0% -87.01% (p=0.029 n=4+4)
IndexAnyUTF8/16:64-48 2.84µs ± 0% 0.19µs ± 1% -93.34% (p=0.008 n=5+5)
IndexAnyUTF8/256:1-48 2.61µs ± 0% 0.27µs ± 0% -89.81% (p=0.008 n=5+5)
IndexAnyUTF8/256:2-48 4.95µs ± 0% 2.23µs ± 0% -54.91% (p=0.016 n=5+4)
IndexAnyUTF8/256:4-48 5.55µs ± 0% 2.23µs ± 0% -59.72% (p=0.008 n=5+5)
IndexAnyUTF8/256:8-48 10.8µs ± 0% 2.2µs ± 0% -79.39% (p=0.008 n=5+5)
IndexAnyUTF8/256:16-48 13.1µs ± 0% 2.5µs ± 0% -81.21% (p=0.016 n=4+5)
IndexAnyUTF8/256:32-48 24.7µs ± 0% 2.8µs ± 0% -88.49% (p=0.008 n=5+5)
IndexAnyUTF8/256:64-48 45.0µs ± 0% 2.6µs ± 1% -94.23% (p=0.008 n=5+5)
LastIndexAnyASCII/1:1-48 13.9ns ± 0% 15.2ns ± 0% +9.35% (p=0.008 n=5+5)
LastIndexAnyASCII/1:2-48 14.4ns ± 0% 15.2ns ± 0% +5.56% (p=0.008 n=5+5)
LastIndexAnyASCII/1:4-48 16.7ns ± 0% 15.2ns ± 0% -8.98% (p=0.008 n=5+5)
LastIndexAnyASCII/1:8-48 24.0ns ± 0% 15.2ns ± 0% -36.67% (p=0.008 n=5+5)
LastIndexAnyASCII/1:16-48 35.6ns ± 0% 15.0ns ± 0% -57.82% (p=0.008 n=5+5)
LastIndexAnyASCII/1:32-48 68.9ns ± 0% 16.7ns ± 0% -75.75% (p=0.008 n=5+5)
LastIndexAnyASCII/1:64-48 104ns ± 0% 17ns ± 1% -83.81% (p=0.008 n=5+5)
LastIndexAnyASCII/16:1-48 35.0ns ± 0% 35.0ns ± 0% ~ (all equal)
LastIndexAnyASCII/16:2-48 35.6ns ± 0% 35.6ns ± 0% ~ (all equal)
LastIndexAnyASCII/16:4-48 41.0ns ± 0% 40.8ns ± 0% -0.49% (p=0.032 n=5+5)
LastIndexAnyASCII/16:8-48 50.9ns ± 0% 50.7ns ± 1% ~ (p=0.397 n=5+5)
LastIndexAnyASCII/16:16-48 64.3ns ± 1% 64.4ns ± 1% ~ (p=1.000 n=4+5)
LastIndexAnyASCII/16:32-48 100ns ± 0% 100ns ± 0% +0.38% (p=0.016 n=4+5)
LastIndexAnyASCII/16:64-48 157ns ± 1% 163ns ± 0% +3.82% (p=0.008 n=5+5)
LastIndexAnyASCII/256:1-48 302ns ± 0% 300ns ± 0% -0.53% (p=0.008 n=5+5)
LastIndexAnyASCII/256:2-48 305ns ± 0% 303ns ± 0% -0.66% (p=0.000 n=5+4)
LastIndexAnyASCII/256:4-48 313ns ± 0% 307ns ± 0% -2.04% (p=0.000 n=4+5)
LastIndexAnyASCII/256:8-48 323ns ± 0% 315ns ± 0% -2.48% (p=0.029 n=4+4)
LastIndexAnyASCII/256:16-48 333ns ± 0% 332ns ± 0% -0.30% (p=0.048 n=5+5)
LastIndexAnyASCII/256:32-48 366ns ± 0% 367ns ± 0% ~ (p=0.238 n=4+5)
LastIndexAnyASCII/256:64-48 430ns ± 0% 430ns ± 0% ~ (p=1.000 n=5+5)
LastIndexAnyUTF8/1:1-48 21.1ns ± 0% 13.9ns ± 0% -34.00% (p=0.008 n=5+5)
LastIndexAnyUTF8/1:2-48 29.5ns ± 0% 13.9ns ± 0% -52.95% (p=0.008 n=5+5)
LastIndexAnyUTF8/1:4-48 31.6ns ± 0% 13.9ns ± 0% -55.96% (p=0.008 n=5+5)
LastIndexAnyUTF8/1:8-48 51.1ns ± 0% 13.9ns ± 0% -72.81% (p=0.008 n=5+5)
LastIndexAnyUTF8/1:16-48 58.9ns ± 0% 14.6ns ± 0% -75.23% (p=0.016 n=5+4)
LastIndexAnyUTF8/1:32-48 103ns ± 0% 16ns ± 1% -84.12% (p=0.008 n=5+5)
LastIndexAnyUTF8/1:64-48 177ns ± 0% 17ns ± 1% -90.62% (p=0.008 n=5+5)
LastIndexAnyUTF8/16:1-48 275ns ± 1% 105ns ± 0% -61.85% (p=0.000 n=5+4)
LastIndexAnyUTF8/16:2-48 406ns ± 0% 216ns ± 0% -46.70% (p=0.008 n=5+5)
LastIndexAnyUTF8/16:4-48 458ns ± 0% 216ns ± 0% -52.75% (p=0.000 n=4+5)
LastIndexAnyUTF8/16:8-48 753ns ± 0% 216ns ± 0% -71.31% (p=0.029 n=4+4)
LastIndexAnyUTF8/16:16-48 902ns ± 0% 221ns ± 0% -75.50% (p=0.016 n=5+4)
LastIndexAnyUTF8/16:32-48 1.57µs ± 0% 0.24µs ± 0% -84.46% (p=0.008 n=5+5)
LastIndexAnyUTF8/16:64-48 2.77µs ± 0% 0.24µs ± 0% -91.22% (p=0.000 n=5+4)
LastIndexAnyUTF8/256:1-48 4.06µs ± 0% 1.53µs ± 0% -62.26% (p=0.008 n=5+5) LastIndexAnyUTF8/256:2-48 5.92µs ± 0% 3.04µs ± 0% -48.55% (p=0.016 n=4+5) LastIndexAnyUTF8/256:4-48 6.82µs ± 0% 3.04µs ± 0% -55.34% (p=0.008 n=5+5) LastIndexAnyUTF8/256:8-48 11.5µs ± 0% 3.0µs ± 0% -73.48% (p=0.008 n=5+5) LastIndexAnyUTF8/256:16-48 14.1µs ± 0% 3.1µs ± 0% -77.85% (p=0.008 n=5+5) LastIndexAnyUTF8/256:32-48 24.5µs ± 0% 3.5µs ± 0% -85.85% (p=0.016 n=5+4) LastIndexAnyUTF8/256:64-48 44.0µs ± 0% 3.5µs ± 0% -92.12% (p=0.008 n=5+5) pkg:bytes goos:linux goarch:amd64 IndexAnyASCII/1:1-48 9.56ns ± 0% 11.00ns ± 0% +15.06% (p=0.016 n=5+4) IndexAnyASCII/1:2-48 11.0ns ± 0% 10.8ns ± 2% -1.64% (p=0.048 n=5+5) IndexAnyASCII/1:4-48 13.9ns ± 0% 11.0ns ± 1% -21.15% (p=0.008 n=5+5) IndexAnyASCII/1:8-48 19.6ns ± 0% 10.8ns ± 3% -44.90% (p=0.008 n=5+5) IndexAnyASCII/1:16-48 31.1ns ± 0% 11.5ns ± 0% -63.02% (p=0.008 n=5+5) IndexAnyASCII/1:32-48 54.0ns ± 0% 11.8ns ± 0% -78.15% (p=0.000 n=5+4) IndexAnyASCII/1:64-48 100ns ± 0% 13ns ± 0% -86.89% (p=0.008 n=5+5) IndexAnyASCII/16:1-48 35.5ns ± 0% 14.8ns ± 0% -58.26% (p=0.008 n=5+5) IndexAnyASCII/16:2-48 36.2ns ± 1% 36.0ns ± 1% ~ (p=0.087 n=5+5) IndexAnyASCII/16:4-48 40.3ns ± 1% 39.7ns ± 4% ~ (p=0.175 n=4+5) IndexAnyASCII/16:8-48 48.7ns ± 5% 45.8ns ± 0% -6.02% (p=0.016 n=5+4) IndexAnyASCII/16:16-48 64.1ns ±11% 62.1ns ± 1% ~ (p=0.143 n=5+5) IndexAnyASCII/16:32-48 97.9ns ± 1% 98.3ns ± 1% ~ (p=0.294 n=5+5) IndexAnyASCII/16:64-48 163ns ± 0% 157ns ± 0% -3.68% (p=0.008 n=5+5) IndexAnyASCII/256:1-48 389ns ± 0% 25ns ± 0% -93.65% (p=0.000 n=5+4) IndexAnyASCII/256:2-48 391ns ± 0% 307ns ± 0% -21.48% (p=0.000 n=5+4) IndexAnyASCII/256:4-48 394ns ± 0% 323ns ± 0% -17.92% (p=0.008 n=5+5) IndexAnyASCII/256:8-48 402ns ± 0% 323ns ± 0% -19.51% (p=0.008 n=5+5) IndexAnyASCII/256:16-48 414ns ± 0% 334ns ± 0% -19.32% (p=0.016 n=4+5) IndexAnyASCII/256:32-48 446ns ± 0% 367ns ± 0% -17.75% (p=0.016 n=5+4) IndexAnyASCII/256:64-48 511ns ± 0% 424ns ± 0% -17.02% (p=0.008 n=5+5) 
IndexAnyUTF8/1:1-48 17.4ns ± 0% 11.0ns ± 0% -36.64% (p=0.008 n=5+5) IndexAnyUTF8/1:2-48 27.3ns ± 1% 11.0ns ± 0% -59.74% (p=0.008 n=5+5) IndexAnyUTF8/1:4-48 28.7ns ± 0% 11.0ns ± 0% -61.73% (p=0.008 n=5+5) IndexAnyUTF8/1:8-48 49.2ns ± 0% 11.0ns ± 0% -77.66% (p=0.008 n=5+5) IndexAnyUTF8/1:16-48 56.0ns ± 0% 11.5ns ± 0% -79.46% (p=0.000 n=5+4) IndexAnyUTF8/1:32-48 102ns ± 0% 12ns ± 0% -88.24% (p=0.008 n=5+5) IndexAnyUTF8/1:64-48 177ns ± 0% 13ns ± 0% -92.51% (p=0.008 n=5+5) IndexAnyUTF8/16:1-48 212ns ± 0% 112ns ± 0% -47.17% (p=0.008 n=5+5) IndexAnyUTF8/16:2-48 356ns ± 0% 159ns ± 1% -55.28% (p=0.000 n=4+5) IndexAnyUTF8/16:4-48 372ns ± 0% 158ns ± 0% -57.47% (p=0.008 n=5+5) IndexAnyUTF8/16:8-48 712ns ± 0% 159ns ± 1% -77.70% (p=0.008 n=5+5) IndexAnyUTF8/16:16-48 829ns ± 0% 129ns ± 0% -84.44% (p=0.008 n=5+5) IndexAnyUTF8/16:32-48 1.55µs ± 0% 0.16µs ± 0% -89.87% (p=0.008 n=5+5) IndexAnyUTF8/16:64-48 2.77µs ± 0% 0.14µs ± 0% -94.94% (p=0.008 n=5+5) IndexAnyUTF8/256:1-48 2.85µs ± 0% 1.63µs ± 1% -42.74% (p=0.008 n=5+5) IndexAnyUTF8/256:2-48 5.14µs ± 1% 2.03µs ± 0% -60.51% (p=0.008 n=5+5) IndexAnyUTF8/256:4-48 5.56µs ± 0% 2.03µs ± 0% -63.52% (p=0.008 n=5+5) IndexAnyUTF8/256:8-48 10.8µs ± 0% 2.0µs ± 0% -81.22% (p=0.008 n=5+5) IndexAnyUTF8/256:16-48 12.9µs ± 0% 1.9µs ± 0% -85.55% (p=0.008 n=5+5) IndexAnyUTF8/256:32-48 24.2µs ± 0% 2.1µs ± 0% -91.29% (p=0.016 n=5+4) IndexAnyUTF8/256:64-48 43.7µs ± 0% 2.0µs ± 0% -95.32% (p=0.016 n=5+4) LastIndexAnyASCII/1:1-48 13.7ns ± 1% 12.8ns ± 0% -6.57% (p=0.016 n=5+4) LastIndexAnyASCII/1:2-48 14.7ns ± 0% 12.7ns ± 1% -13.33% (p=0.000 n=4+5) LastIndexAnyASCII/1:4-48 16.9ns ± 0% 12.7ns ± 1% -24.73% (p=0.000 n=4+5) LastIndexAnyASCII/1:8-48 20.5ns ± 0% 12.7ns ± 0% -37.85% (p=0.000 n=4+5) LastIndexAnyASCII/1:16-48 28.0ns ± 0% 11.7ns ± 0% ~ (p=0.079 n=4+5) LastIndexAnyASCII/1:32-48 69.8ns ± 0% 12.4ns ± 0% -82.19% (p=0.008 n=5+5) LastIndexAnyASCII/1:64-48 73.8ns ± 0% 13.3ns ± 0% -82.03% (p=0.000 n=4+5) LastIndexAnyASCII/16:1-48 35.5ns ± 0% 35.5ns ± 0% ~ 
(all equal) LastIndexAnyASCII/16:2-48 36.0ns ± 0% 36.1ns ± 0% +0.28% (p=0.016 n=4+5) LastIndexAnyASCII/16:4-48 40.3ns ± 2% 40.0ns ± 6% ~ (p=0.651 n=5+5) LastIndexAnyASCII/16:8-48 50.3ns ± 0% 50.2ns ± 9% ~ (p=0.175 n=4+5) LastIndexAnyASCII/16:16-48 62.4ns ± 4% 64.4ns ± 0% +3.28% (p=0.016 n=5+4) LastIndexAnyASCII/16:32-48 98.9ns ± 0% 98.4ns ± 0% -0.53% (p=0.016 n=5+4) LastIndexAnyASCII/16:64-48 160ns ± 1% 161ns ± 1% ~ (p=0.325 n=5+5) LastIndexAnyASCII/256:1-48 300ns ± 0% 301ns ± 0% +0.33% (p=0.008 n=5+5) LastIndexAnyASCII/256:2-48 304ns ± 0% 304ns ± 0% ~ (p=1.000 n=5+5) LastIndexAnyASCII/256:4-48 311ns ± 0% 311ns ± 0% ~ (p=0.556 n=4+5) LastIndexAnyASCII/256:8-48 320ns ± 0% 321ns ± 0% ~ (p=0.143 n=5+5) LastIndexAnyASCII/256:16-48 333ns ± 0% 335ns ± 0% +0.60% (p=0.029 n=4+4) LastIndexAnyASCII/256:32-48 367ns ± 0% 366ns ± 0% ~ (p=0.095 n=4+5) LastIndexAnyASCII/256:64-48 431ns ± 0% 424ns ± 0% -1.62% (p=0.008 n=5+5) LastIndexAnyUTF8/1:1-48 19.7ns ± 1% 11.9ns ± 0% -39.47% (p=0.008 n=5+5) LastIndexAnyUTF8/1:2-48 27.6ns ± 1% 11.9ns ± 0% -56.82% (p=0.008 n=5+5) LastIndexAnyUTF8/1:4-48 29.9ns ± 0% 11.9ns ± 0% ~ (p=0.079 n=4+5) LastIndexAnyUTF8/1:8-48 48.7ns ± 0% 11.9ns ± 0% -75.54% (p=0.008 n=5+5) LastIndexAnyUTF8/1:16-48 57.8ns ± 0% 11.4ns ± 0% -80.26% (p=0.008 n=5+5) LastIndexAnyUTF8/1:32-48 94.7ns ± 0% 12.2ns ± 0% -87.07% (p=0.008 n=5+5) LastIndexAnyUTF8/1:64-48 163ns ± 0% 13ns ± 1% -91.93% (p=0.008 n=5+5) LastIndexAnyUTF8/16:1-48 258ns ± 0% 88ns ± 0% -65.76% (p=0.008 n=5+5) LastIndexAnyUTF8/16:2-48 400ns ± 0% 162ns ± 0% -59.38% (p=0.008 n=5+5) LastIndexAnyUTF8/16:4-48 415ns ± 0% 162ns ± 0% -60.87% (p=0.008 n=5+5) LastIndexAnyUTF8/16:8-48 737ns ± 0% 162ns ± 0% -78.02% (p=0.000 n=5+4) LastIndexAnyUTF8/16:16-48 882ns ± 0% 128ns ± 0% -85.49% (p=0.008 n=5+5) LastIndexAnyUTF8/16:32-48 1.47µs ± 0% 0.16µs ± 0% -89.29% (p=0.000 n=4+5) LastIndexAnyUTF8/16:64-48 2.56µs ± 0% 0.14µs ± 0% -94.41% (p=0.016 n=5+4) LastIndexAnyUTF8/256:1-48 3.60µs ± 0% 1.23µs ± 0% -65.67% (p=0.008 n=5+5) 
LastIndexAnyUTF8/256:2-48 5.78µs ± 0% 2.18µs ± 0% -62.32% (p=0.008 n=5+5) LastIndexAnyUTF8/256:4-48 6.26µs ± 0% 2.18µs ± 0% -65.15% (p=0.008 n=5+5) LastIndexAnyUTF8/256:8-48 11.2µs ± 0% 2.2µs ± 0% -80.53% (p=0.008 n=5+5) LastIndexAnyUTF8/256:16-48 13.5µs ± 0% 1.9µs ± 0% -86.02% (p=0.016 n=4+5) LastIndexAnyUTF8/256:32-48 23.0µs ± 0% 2.1µs ± 0% -90.72% (p=0.008 n=5+5) LastIndexAnyUTF8/256:64-48 40.5µs ± 0% 2.1µs ± 0% -94.73% (p=0.008 n=5+5) Change-Id: Ie05e306f8b184b989701868cb161ce8b3f18203b Reviewed-on: https://go-review.googlesource.com/c/go/+/156998 Run-TryBot: eric fang <eric.fang@arm.com> Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org> |
|
|
|
18a6fd44bb |
bytes, strings: moves indexRabinKarp function to internal/bytealg
To facilitate optimization of IndexAny and LastIndexAny, this patch moves the three Rabin-Karp related functions indexRabinKarp, hashStr and hashStrRev from the strings package to internal/bytealg. The bytes package has three functions with the same names and behavior but different parameter types. To highlight this, this patch also moves them to internal/bytealg and gives them slightly different names.
Related benchmark changes on amd64 and arm64:
name old time/op new time/op delta
pkg:strings goos:linux goarch:amd64
Index-16 14.0ns ± 1% 14.1ns ± 2% ~ (p=0.738 n=5+5)
LastIndex-16 15.5ns ± 1% 15.7ns ± 4% ~ (p=0.897 n=5+5)
pkg:bytes goos:linux goarch:amd64
Index/10-16 26.5ns ± 1% 26.5ns ± 0% ~ (p=0.873 n=5+5)
Index/32-16 26.2ns ± 0% 25.7ns ± 0% -1.68% (p=0.008 n=5+5)
Index/4K-16 5.12µs ± 4% 5.14µs ± 2% ~ (p=0.841 n=5+5)
Index/4M-16 5.44ms ± 3% 5.34ms ± 2% ~ (p=0.056 n=5+5)
Index/64M-16 85.8ms ± 3% 84.6ms ± 0% -1.37% (p=0.016 n=5+5)
name old speed new speed delta
pkg:bytes goos:linux goarch:amd64
Index/10-16 377MB/s ± 1% 377MB/s ± 0% ~ (p=1.000 n=5+5)
Index/32-16 1.22GB/s ± 1% 1.24GB/s ± 0% +1.66% (p=0.008 n=5+5)
Index/4K-16 800MB/s ± 4% 797MB/s ± 2% ~ (p=0.841 n=5+5)
Index/4M-16 771MB/s ± 3% 786MB/s ± 2% ~ (p=0.056 n=5+5)
Index/64M-16 783MB/s ± 3% 793MB/s ± 0% +1.36% (p=0.016 n=5+5)
name old time/op new time/op delta
pkg:strings goos:linux goarch:arm64
Index-8 22.6ns ± 0% 22.5ns ± 0% ~ (p=0.167 n=5+5)
LastIndex-8 17.5ns ± 0% 17.5ns ± 0% ~ (all equal)
pkg:bytes goos:linux goarch:arm64
Index/10-8 25.0ns ± 0% 25.0ns ± 0% ~ (all equal)
Index/32-8 160ns ± 0% 160ns ± 0% ~ (all equal)
Index/4K-8 6.26µs ± 0% 6.26µs ± 0% ~ (p=0.167 n=5+5)
Index/4M-8 6.30ms ± 0% 6.31ms ± 0% ~ (p=1.000 n=5+5)
Index/64M-8 101ms ± 0% 101ms ± 0% ~ (p=0.690 n=5+5)
name old speed new speed delta
pkg:bytes goos:linux goarch:arm64
Index/10-8 399MB/s ± 0% 400MB/s ± 0% +0.08% (p=0.008 n=5+5)
Index/32-8 200MB/s ± 0% 200MB/s ± 0% ~ (p=0.127 n=4+5)
Index/4K-8 654MB/s ± 0% 654MB/s ± 0% +0.01% (p=0.016 n=5+5)
Index/4M-8 665MB/s ± 0% 665MB/s ± 0% ~ (p=0.833 n=5+5)
Index/64M-8 665MB/s ± 0% 665MB/s ± 0% ~ (p=0.913 n=5+5)
Change-Id: Icce3bc162bb8613ac36dc963a46c51f8e82ab842
Reviewed-on: https://go-review.googlesource.com/c/go/+/208638
Run-TryBot: eric fang <eric.fang@arm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
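The Rabin-Karp helpers named in the commit above can be illustrated with a minimal Go sketch. The names `hashStr` and `indexRabinKarp` are taken from the commit message, but the bodies below are an assumption written for illustration, not a copy of the code that was moved. The multiplier `primeRK` (the 32-bit FNV prime) is likewise chosen here as a reasonable rolling-hash constant.

```go
package main

import "fmt"

// primeRK is the rolling-hash multiplier assumed by this sketch.
const primeRK = 16777619

// hashStr returns the hash of sep and primeRK raised to the power len(sep).
// The second value is the factor needed to drop the oldest byte of a window.
func hashStr(sep string) (hash, pow uint32) {
	for i := 0; i < len(sep); i++ {
		hash = hash*primeRK + uint32(sep[i])
	}
	pow = 1
	for i := 0; i < len(sep); i++ {
		pow *= primeRK
	}
	return hash, pow
}

// indexRabinKarp returns the index of the first occurrence of substr in s,
// or -1 if substr is not present.
func indexRabinKarp(s, substr string) int {
	n := len(substr)
	if n > len(s) {
		return -1
	}
	want, pow := hashStr(substr)
	var h uint32
	for i := 0; i < n; i++ {
		h = h*primeRK + uint32(s[i])
	}
	if h == want && s[:n] == substr {
		return 0
	}
	for i := n; i < len(s); i++ {
		// Slide the window by one byte: append s[i], drop s[i-n].
		h = h*primeRK + uint32(s[i]) - pow*uint32(s[i-n])
		// A hash match is only a candidate; confirm with a real comparison.
		if h == want && s[i-n+1:i+1] == substr {
			return i - n + 1
		}
	}
	return -1
}

func main() {
	fmt.Println(indexRabinKarp("internal/bytealg", "byte")) // 9
	fmt.Println(indexRabinKarp("internal/bytealg", "wasm")) // -1
}
```

A bytes variant would differ only in taking `[]byte` parameters, which matches the commit's remark that the two packages carried near-identical copies.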
|
|
e3f2e9ac4e |
internal/bytealg: fix riscv64 offset names
Vet caught that these were incorrect. Updates #37022 Change-Id: I7b5cd8032ea95eb8e0729f6a4f386aec613c71d8 Reviewed-on: https://go-review.googlesource.com/c/go/+/217777 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> |
|
|
|
440f7d6404 |
all: fix a bunch of misspellings
Change-Id: I5b909df0fd048cd66c5a27fca1b06466d3bcaac7
GitHub-Last-Rev:
|
|
|
|
0c703b37df |
internal/cpu,internal/bytealg: add support for riscv64
Based on riscv-go port. Updates #27532 Change-Id: Ia3aed521d4109e7b73f762c5a3cdacc7cdac430d Reviewed-on: https://go-review.googlesource.com/c/go/+/204635 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org> |
|
|
|
b6245cef3c |
internal/bytealg: add SIMD byte count implementation for s390x
Add a 'single lane' SIMD implementation of the single byte count function for use on machines that support the vector facility. This allows up to 16 bytes to be counted per loop iteration.
We can probably improve performance further by adding more 'lanes' (i.e. counting more bytes in parallel); however, this will increase the complexity of the function, so I'm not sure it is worth doing yet.
name old speed new speed delta
pkg:strings goos:linux goarch:s390x
CountByte/10 789MB/s ± 0% 1131MB/s ± 0% +43.44% (p=0.000 n=9+9)
CountByte/32 936MB/s ± 0% 3236MB/s ± 0% +245.87% (p=0.000 n=8+9)
CountByte/4096 1.06GB/s ± 0% 21.26GB/s ± 0% +1907.07% (p=0.000 n=10+10)
CountByte/4194304 1.06GB/s ± 0% 20.54GB/s ± 0% +1838.50% (p=0.000 n=10+10)
CountByte/67108864 1.06GB/s ± 0% 18.31GB/s ± 0% +1629.51% (p=0.000 n=10+10)
pkg:bytes goos:linux goarch:s390x
CountSingle/10 800MB/s ± 0% 986MB/s ± 0% +23.21% (p=0.000 n=9+10)
CountSingle/32 925MB/s ± 0% 2744MB/s ± 0% +196.55% (p=0.000 n=9+10)
CountSingle/4K 1.26GB/s ± 0% 19.44GB/s ± 0% +1445.59% (p=0.000 n=10+10)
CountSingle/4M 1.26GB/s ± 0% 20.28GB/s ± 0% +1510.26% (p=0.000 n=8+10)
CountSingle/64M 1.23GB/s ± 0% 17.78GB/s ± 0% +1350.67% (p=0.000 n=9+10)
Change-Id: I230d57905db92a8fdfc50b1d5be338941ae3a7a1
Reviewed-on: https://go-review.googlesource.com/c/go/+/199979
Run-TryBot: Michael Munday <mike.munday@ibm.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
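The s390x commit above counts up to 16 bytes per loop iteration using vector instructions. As a rough portable analogue, and explicitly not the assembly from that change, the following Go sketch counts 8 bytes per iteration with 64-bit word arithmetic. The function name `countByte` and the constant `lo7` are hypothetical and exist only for this illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/bits"
)

// lo7 has the low seven bits of every byte set.
const lo7 = 0x7f7f7f7f7f7f7f7f

// countByte counts occurrences of c in b, eight bytes per iteration.
func countByte(b []byte, c byte) int {
	pattern := uint64(c) * 0x0101010101010101 // c copied into every byte
	n := 0
	for len(b) >= 8 {
		// After the XOR, every byte equal to c becomes 0x00.
		x := binary.LittleEndian.Uint64(b) ^ pattern
		// Set the high bit of each byte of x that is zero, clear all other bits.
		// Adding lo7 to the masked value cannot carry across byte boundaries.
		zero := ^(((x & lo7) + lo7) | x | lo7)
		n += bits.OnesCount64(zero)
		b = b[8:]
	}
	// Handle the remaining tail one byte at a time.
	for _, v := range b {
		if v == c {
			n++
		}
	}
	return n
}

func main() {
	fmt.Println(countByte([]byte("mississippi, mississippi"), 's')) // 8
}
```

Adding more 'lanes', as the commit message suggests, would correspond here to processing several words per iteration and summing their counts, at the cost of a more complex loop.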
|
|
d0f10a6e68 |
runtime,internal/bytealg: optimize wasmZero, wasmMove, Compare
Coalesce set/get pairs into a tee. Change-Id: I88ccdcb148465615437bebf24145e941a037e0a5 Reviewed-on: https://go-review.googlesource.com/c/go/+/200357 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Richard Musiol <neelance@gmail.com> Reviewed-by: Cherry Zhang <cherryyz@google.com> |
|
|
|
03ef105dae |
all: remove nacl (part 3, more amd64p32)
Part 1: CL 199499 (GOOS nacl)
Part 2: CL 200077 (amd64p32 files, toolchain)
Part 3: stuff that arguably should've been part of Part 2, but I forgot
one of my grep patterns when splitting the original CL up into
two parts.
This one might also have interesting stuff to resurrect for any future
x32 ABI support.
Updates #30439
Change-Id: I2b4143374a253a003666f3c69e776b7e456bdb9c
Reviewed-on: https://go-review.googlesource.com/c/go/+/200318
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
|
|
07b4abd62e |
all: remove the nacl port (part 2, amd64p32 + toolchain)
This is part two of the nacl removal. Part 1 was CL 199499. This CL removes amd64p32 support, which might be useful in the future if we implement the x32 ABI. It also removes the nacl bits in the toolchain, and some remaining nacl bits. Updates #30439 Change-Id: I2475d5bb066d1b474e00e40d95b520e7c2e286e1 Reviewed-on: https://go-review.googlesource.com/c/go/+/200077 Reviewed-by: Ian Lance Taylor <iant@golang.org> |
|
|
|
a1b1ba7daf |
internal/bytealg: (re)adding mips64x compare implementation
The original CL adding the mips64x compare function was reverted because its implementation was incorrect on little endian.
Original CL: https://go-review.googlesource.com/c/go/+/196837
name old time/op new time/op delta
BytesCompare/1 28.9ns ± 4% 22.1ns ± 0% -23.60% (p=0.000 n=9+8)
BytesCompare/2 34.6ns ± 0% 23.1ns ± 0% -33.25% (p=0.000 n=8+10)
BytesCompare/4 54.6ns ± 0% 40.8ns ± 0% -25.27% (p=0.000 n=8+8)
BytesCompare/8 73.9ns ± 0% 49.1ns ± 0% -33.56% (p=0.000 n=8+8)
BytesCompare/16 113ns ± 0% 24ns ± 0% -79.20% (p=0.000 n=9+9)
BytesCompare/32 190ns ± 0% 26ns ± 0% -86.53% (p=0.000 n=10+10)
BytesCompare/64 345ns ± 0% 44ns ± 0% -87.19% (p=0.000 n=10+8)
BytesCompare/128 654ns ± 0% 52ns ± 0% -91.97% (p=0.000 n=9+8)
BytesCompare/256 1.27µs ± 0% 0.07µs ± 0% -94.14% (p=0.001 n=8+9)
BytesCompare/512 2.51µs ± 0% 0.12µs ± 0% -95.26% (p=0.000 n=9+10)
BytesCompare/1024 4.99µs ± 0% 0.21µs ± 0% -95.85% (p=0.000 n=8+10)
BytesCompare/2048 9.94µs ± 0% 0.38µs ± 0% -96.14% (p=0.000 n=8+8)
CompareBytesEqual 105ns ± 0% 64ns ± 0% -39.43% (p=0.000 n=10+9)
CompareBytesToNil 34.8ns ± 1% 38.6ns ± 3% +11.01% (p=0.000 n=10+10)
CompareBytesEmpty 33.6ns ± 3% 36.6ns ± 0% +8.77% (p=0.000 n=10+8)
CompareBytesIdentical 29.7ns ± 0% 40.5ns ± 1% +36.45% (p=0.000 n=10+8)
CompareBytesSameLength 69.1ns ± 0% 51.8ns ± 0% -25.04% (p=0.000 n=10+9)
CompareBytesDifferentLength 69.8ns ± 0% 52.5ns ± 0% -24.79% (p=0.000 n=10+8)
CompareBytesBigUnaligned 5.15ms ± 0% 2.19ms ± 0% -57.59% (p=0.000 n=9+9)
CompareBytesBig 5.28ms ± 0% 0.28ms ± 0% -94.64% (p=0.000 n=8+8)
CompareBytesBigIdentical 29.7ns ± 0% 36.9ns ± 2% +24.11% (p=0.000 n=8+10)
name old speed new speed delta
CompareBytesBigUnaligned 204MB/s ± 0% 480MB/s ± 0% +135.77% (p=0.000 n=9+9)
CompareBytesBig 198MB/s ± 0% 3704MB/s ± 0% +1765.97% (p=0.000 n=8+8)
CompareBytesBigIdentical 35.3TB/s ± 0% 28.4TB/s ± 2% -19.44% (p=0.000 n=8+10)
Fixes #34549
Change-Id: I2ef29f13cdd4229745ac2d018bb53c76f2ff1209
Reviewed-on: https://go-review.googlesource.com/c/go/+/197557
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
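The commit above does not spell out the little-endian bug, so the following is only an illustration of why byte order matters for a word-at-a-time Compare, not a reconstruction of the actual defect. In this Go sketch the hypothetical function `compareWords` loads 8 bytes at a time as big-endian integers, which keeps integer order equal to lexicographic byte order; `main` shows that a little-endian load of the same bytes orders them the other way round.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// compareWords compares a and b lexicographically, eight bytes at a time.
// It returns -1, 0 or +1.
func compareWords(a, b []byte) int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	i := 0
	for ; i+8 <= n; i += 8 {
		// A big-endian load puts the first byte in the most significant
		// position, so comparing the integers compares the bytes in order.
		x := binary.BigEndian.Uint64(a[i:])
		y := binary.BigEndian.Uint64(b[i:])
		if x != y {
			if x < y {
				return -1
			}
			return 1
		}
	}
	// Compare the remaining tail one byte at a time.
	for ; i < n; i++ {
		if a[i] != b[i] {
			if a[i] < b[i] {
				return -1
			}
			return 1
		}
	}
	switch {
	case len(a) < len(b):
		return -1
	case len(a) > len(b):
		return 1
	}
	return 0
}

func main() {
	a := []byte{1, 0, 0, 0, 0, 0, 0, 2}
	b := []byte{2, 0, 0, 0, 0, 0, 0, 1}
	fmt.Println(compareWords(a, b)) // -1: a sorts before b
	// The same bytes loaded as little-endian words compare the wrong way.
	fmt.Println(binary.LittleEndian.Uint64(a) < binary.LittleEndian.Uint64(b)) // false
}
```

On a little-endian machine a native word load behaves like the `LittleEndian` call, so a word-wise compare has to byte-swap the words or fall back to a byte loop once two words differ.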
|
|
ae024d9daf |
Revert "internal/bytealg: add assembly implementation of Compare/CompareString on mips64x"
This reverts CL 196837 (commit
|
|
|
|
78baea836d |
internal/bytealg: add assembly implementation of Compare/CompareString on mips64x
name old time/op new time/op delta
BytesCompare/1 28.9ns ± 4% 22.8ns ± 0% -21.23% (p=0.000 n=9+10)
BytesCompare/2 34.6ns ± 0% 23.5ns ± 0% -32.01% (p=0.000 n=8+10)
BytesCompare/4 54.6ns ± 0% 41.4ns ± 0% -24.18% (p=0.001 n=8+9)
BytesCompare/8 73.9ns ± 0% 49.7ns ± 0% -32.75% (p=0.002 n=8+10)
BytesCompare/16 113ns ± 0% 23ns ± 0% -79.56% (p=0.000 n=9+10)
BytesCompare/32 190ns ± 0% 26ns ± 0% -86.53% (p=0.000 n=10+10)
BytesCompare/64 345ns ± 0% 44ns ± 0% -87.19% (p=0.000 n=10+10)
BytesCompare/128 654ns ± 0% 52ns ± 0% -91.97% (p=0.000 n=9+8)
BytesCompare/256 1.27µs ± 0% 0.08µs ± 1% -94.10% (p=0.000 n=8+10)
BytesCompare/512 2.51µs ± 0% 0.12µs ± 0% -95.26% (p=0.000 n=9+9)
BytesCompare/1024 4.99µs ± 0% 0.21µs ± 1% -95.84% (p=0.000 n=8+10)
BytesCompare/2048 9.94µs ± 0% 0.38µs ± 0% -96.13% (p=0.000 n=8+10)
CompareBytesEqual 105ns ± 0% 64ns ± 0% -39.05% (p=0.000 n=10+9)
CompareBytesToNil 34.8ns ± 1% 39.5ns ± 3% +13.48% (p=0.000 n=10+10)
CompareBytesEmpty 33.6ns ± 3% 36.6ns ± 0% +8.77% (p=0.000 n=10+10)
CompareBytesIdentical 29.7ns ± 0% 36.6ns ± 0% +23.23% (p=0.000 n=10+10)
CompareBytesSameLength 69.1ns ± 0% 51.1ns ± 0% -26.05% (p=0.000 n=10+10)
CompareBytesDifferentLength 69.8ns ± 0% 51.1ns ± 0% -26.79% (p=0.000 n=10+10)
CompareBytesBigUnaligned 5.15ms ± 0% 2.18ms ± 0% -57.62% (p=0.000 n=9+9)
CompareBytesBig 5.28ms ± 0% 0.28ms ± 0% -94.64% (p=0.000 n=8+10)
CompareBytesBigIdentical 29.7ns ± 0% 36.8ns ± 0% +23.91% (p=0.000 n=8+8)
name old speed new speed delta
CompareBytesBigUnaligned 204MB/s ± 0% 480MB/s ± 0% +135.94% (p=0.000 n=9+9)
CompareBytesBig 198MB/s ± 0% 3703MB/s ± 0% +1765.85% (p=0.000 n=8+10)
CompareBytesBigIdentical 35.3TB/s ± 0% 28.5TB/s ± 0% -19.31% (p=0.000 n=8+8)
Change-Id: I112d9de2324986fd65ed237a86b11856a1c0f4a7
Reviewed-on: https://go-review.googlesource.com/c/go/+/196837
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
|
|
0efbd10157 |
all: fix typos
Use the following (suboptimal) script to obtain a list of possible
typos:
#!/usr/bin/env sh
set -x
git ls-files |\
grep -e '\.\(c\|cc\|go\)$' |\
xargs -n 1\
awk\
'/\/\// { gsub(/.*\/\//, ""); print; } /\/\*/, /\*\// { gsub(/.*\/\*/, ""); gsub(/\*\/.*/, ""); }' |\
hunspell -d en_US -l |\
grep '^[[:upper:]]\{0,1\}[[:lower:]]\{1,\}$' |\
grep -v -e '^.\{1,4\}$' -e '^.\{16,\}$' |\
sort -f |\
uniq -c |\
awk '$1 == 1 { print $2; }'
Then, go through the results manually and fix the most obvious typos in
the non-vendored code.
Change-Id: I3cb5830a176850e1a0584b8a40b47bde7b260eae
Reviewed-on: https://go-review.googlesource.com/c/go/+/193848
Reviewed-by: Robert Griesemer <gri@golang.org>
|
|
|
|
ca0c449a6b |
bytes, internal/bytealg: simplify Equal
The compiler has advanced enough that it is cheaper to convert to strings than to go through the assembly trampolines to call runtime.memequal. Simplify Equal accordingly, and cull dead code from bytealg. While we're here, simplify Equal's documentation. Fixes #31587 Change-Id: Ie721d33f9a6cbd86b1d873398b20e7882c2c63e9 Reviewed-on: https://go-review.googlesource.com/c/go/+/173323 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Dave Cheney <dave@cheney.net> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> |
|
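The simplification described in the commit above can be sketched in a few lines of Go. The function name `equal` is a local stand-in chosen for this example; the point is that comparing two byte slices through string conversions is a plain expression the compiler can handle without copying the data.

```go
package main

import "fmt"

// equal reports whether a and b have the same length and the same bytes.
// The conversions in the comparison below do not allocate.
// A nil slice is treated as equal to an empty slice.
func equal(a, b []byte) bool {
	return string(a) == string(b)
}

func main() {
	fmt.Println(equal([]byte("bytealg"), []byte("bytealg"))) // true
	fmt.Println(equal([]byte("bytealg"), []byte("bytes")))   // false
	fmt.Println(equal(nil, []byte{}))                        // true
}
```

Because the function body is a single comparison, it can be inlined at the call site, which avoids the call through an assembly trampoline that the commit message mentions.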
|
|
56517216c0 |
internal/bytealg: fix function reference in comments
There's no IndexShortStr func, refer to Index instead. Change-Id: I6923e7ad3e910e4b5fb0c07d6339ddfec4111f4f Reviewed-on: https://go-review.googlesource.com/c/go/+/170124 Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> |
|
|
|
4a7cd9d9df |
internal/bytealg: simplify memchr for wasm
Get rid of an extra register R5 which just recalculated the value of R4. Reuse R4 instead. We also remove the casting of c to an unsigned char because the initial load of R0 is done with I32Load8U anyways.
Also indent the code to make it more readable.
name old time/op new time/op delta
IndexRune 597ns ± 3% 580ns ± 3% -2.93% (p=0.002 n=10+10)
IndexRuneLongString 634ns ± 4% 654ns ± 3% +3.07% (p=0.004 n=10+10)
IndexRuneFastPath 57.6ns ± 3% 56.9ns ± 4% ~ (p=0.210 n=10+10)
Index 104ns ± 3% 104ns ± 4% ~ (p=0.639 n=10+10)
LastIndex 87.1ns ± 5% 85.7ns ± 3% ~ (p=0.171 n=10+10)
IndexByte 34.4ns ± 4% 32.9ns ± 5% -4.28% (p=0.002 n=10+10)
IndexHard1 21.6ms ± 1% 21.8ms ± 3% ~ (p=0.460 n=8+10)
IndexHard2 21.6ms ± 2% 21.9ms ± 5% ~ (p=0.133 n=9+10)
IndexHard3 21.8ms ± 3% 21.7ms ± 1% ~ (p=0.579 n=10+10)
IndexHard4 21.6ms ± 1% 21.9ms ± 4% ~ (p=0.360 n=8+10)
LastIndexHard1 25.1ms ± 2% 25.4ms ± 5% ~ (p=0.853 n=10+10)
LastIndexHard2 25.3ms ± 6% 25.2ms ± 5% ~ (p=0.796 n=10+10)
LastIndexHard3 25.3ms ± 4% 25.2ms ± 3% ~ (p=0.739 n=10+10)
IndexTorture 130µs ± 3% 133µs ± 5% ~ (p=0.218 n=10+10)
IndexAnyASCII/1:1 98.4ns ± 5% 96.6ns ± 5% ~ (p=0.054 n=10+10)
IndexAnyASCII/1:2 109ns ± 4% 110ns ± 3% ~ (p=0.232 n=10+10)
IndexAnyASCII/1:4 135ns ± 4% 134ns ± 3% ~ (p=0.671 n=10+10)
IndexAnyASCII/1:8 184ns ± 4% 184ns ± 3% ~ (p=0.749 n=10+10)
IndexAnyASCII/1:16 289ns ± 3% 281ns ± 3% -2.73% (p=0.001 n=9+10)
IndexAnyASCII/16:1 322ns ± 3% 307ns ± 3% -4.71% (p=0.000 n=10+10)
IndexAnyASCII/16:2 329ns ± 3% 320ns ± 3% -2.89% (p=0.008 n=10+10)
IndexAnyASCII/16:4 353ns ± 3% 339ns ± 3% -3.91% (p=0.001 n=10+10)
IndexAnyASCII/16:8 390ns ± 3% 374ns ± 3% -4.06% (p=0.000 n=10+10)
IndexAnyASCII/16:16 471ns ± 4% 452ns ± 2% -4.22% (p=0.000 n=10+10)
IndexAnyASCII/256:1 2.94µs ± 4% 2.91µs ± 2% ~ (p=0.424 n=10+10)
IndexAnyASCII/256:2 2.92µs ± 3% 2.90µs ± 2% ~ (p=0.388 n=9+10)
IndexAnyASCII/256:4 2.93µs ± 1% 2.90µs ± 1% -0.98% (p=0.036 n=8+9)
IndexAnyASCII/256:8 3.03µs ± 5% 2.97µs ± 3% ~ (p=0.085 n=10+10)
IndexAnyASCII/256:16 3.07µs ± 4% 3.01µs ± 1% -2.03% (p=0.003 n=10+9)
IndexAnyASCII/4096:1 45.8µs ± 3% 45.9µs ± 2% ~ (p=0.905 n=10+9)
IndexAnyASCII/4096:2 46.7µs ± 3% 46.2µs ± 3% ~ (p=0.190 n=10+10)
IndexAnyASCII/4096:4 45.7µs ± 2% 46.4µs ± 3% +1.37% (p=0.022 n=9+10)
IndexAnyASCII/4096:8 46.4µs ± 3% 46.0µs ± 2% ~ (p=0.436 n=10+10)
IndexAnyASCII/4096:16 46.6µs ± 3% 46.7µs ± 2% ~ (p=0.971 n=10+10)
IndexPeriodic/IndexPeriodic2 1.40ms ± 3% 1.40ms ± 2% ~ (p=0.853 n=10+10)
IndexPeriodic/IndexPeriodic4 1.40ms ± 3% 1.40ms ± 3% ~ (p=0.579 n=10+10)
IndexPeriodic/IndexPeriodic8 1.42ms ± 3% 1.39ms ± 2% -1.60% (p=0.029 n=10+10)
IndexPeriodic/IndexPeriodic16 616µs ± 5% 583µs ± 5% -5.32% (p=0.001 n=10+10)
IndexPeriodic/IndexPeriodic32 313µs ± 5% 301µs ± 2% -3.67% (p=0.002 n=10+10)
IndexPeriodic/IndexPeriodic64 169µs ± 5% 164µs ± 5% -3.17% (p=0.023 n=10+10)
NodeJS version - 10.2.1
Change-Id: I9a8268314b5652c4aeffc4c5c72d2fd1a384aa9e
Reviewed-on: https://go-review.googlesource.com/c/go/+/169777
Run-TryBot: Agniva De Sarker <agniva.quicksilver@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
|
|
3cb1e9d98a |
internal/bytealg: add assembly implementation of Count/CountString on arm
Simple single-byte loop count for now, to be further improved in future CLs.
Benchmark on linux/arm:
name old time/op new time/op delta
CountSingle/10-4 122ns ± 0% 87ns ± 1% -28.41% (p=0.000 n=7+10)
CountSingle/32-4 242ns ± 0% 174ns ± 1% -28.25% (p=0.000 n=10+10)
CountSingle/4K-4 24.2µs ± 1% 15.6µs ± 1% -35.42% (p=0.000 n=10+10)
CountSingle/4M-4 29.6ms ± 1% 21.3ms ± 1% -28.09% (p=0.000 n=10+9)
CountSingle/64M-4 562ms ± 0% 414ms ± 1% -26.23% (p=0.000 n=8+10)
name old speed new speed delta
CountSingle/10-4 81.7MB/s ± 1% 114.5MB/s ± 1% +40.07% (p=0.000 n=10+10)
CountSingle/32-4 132MB/s ± 0% 184MB/s ± 1% +39.39% (p=0.000 n=10+9)
CountSingle/4K-4 170MB/s ± 1% 263MB/s ± 1% +54.86% (p=0.000 n=10+10)
CountSingle/4M-4 142MB/s ± 1% 197MB/s ± 1% +39.07% (p=0.000 n=10+9)
CountSingle/64M-4 119MB/s ± 0% 162MB/s ± 1% +35.55% (p=0.000 n=8+10)
Updates #29001
Change-Id: I42a268215a62044286ec32b548d8e4b86b9570ee
Reviewed-on: https://go-review.googlesource.com/c/go/+/168319
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
|
|
3496ff1d19 |
internal/bytealg: share code for IndexByte functions on arm
Move the shared code of IndexByte and IndexByteString into indexbytebody. This will allow optimizations (e.g. for #29001) to be implemented in a single function. Change-Id: I1d550da8eb65f95e492a460a12058cc35b1162b6 Reviewed-on: https://go-review.googlesource.com/c/go/+/167939 Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> |
|
|
|
a734601bdf |
internal/bytealg: use word-wise comparison for Equal on arm
Follow CL 165338 and use word-wise comparison for aligned buffers in
Equal on arm, otherwise fall back to the current byte-wise comparison.
name old time/op new time/op delta
Equal/0-4 25.7ns ± 1% 23.5ns ± 1% -8.78% (p=0.000 n=10+10)
Equal/1-4 65.8ns ± 0% 60.1ns ± 1% -8.69% (p=0.000 n=10+9)
Equal/6-4 82.9ns ± 1% 86.7ns ± 0% +4.59% (p=0.000 n=10+10)
Equal/9-4 90.0ns ± 0% 101.0ns ± 0% +12.18% (p=0.000 n=9+10)
Equal/15-4 108ns ± 0% 119ns ± 0% +10.19% (p=0.000 n=8+8)
Equal/16-4 111ns ± 0% 82ns ± 0% -26.37% (p=0.000 n=8+10)
Equal/20-4 124ns ± 1% 87ns ± 1% -29.94% (p=0.000 n=9+10)
Equal/32-4 160ns ± 1% 97ns ± 1% -39.40% (p=0.000 n=10+10)
Equal/4K-4 14.0µs ± 0% 3.6µs ± 1% -74.57% (p=0.000 n=9+10)
Equal/4M-4 12.8ms ± 1% 3.2ms ± 0% -74.93% (p=0.000 n=9+9)
Equal/64M-4 204ms ± 1% 51ms ± 0% -74.78% (p=0.000 n=10+10)
EqualPort/1-4 47.0ns ± 1% 46.8ns ± 0% -0.40% (p=0.015 n=10+6)
EqualPort/6-4 82.6ns ± 1% 81.9ns ± 1% -0.87% (p=0.002 n=10+10)
EqualPort/32-4 232ns ± 0% 232ns ± 0% ~ (p=0.496 n=8+10)
EqualPort/4K-4 29.0µs ± 1% 29.0µs ± 1% ~ (p=0.604 n=9+10)
EqualPort/4M-4 24.0ms ± 1% 23.8ms ± 0% -0.65% (p=0.001 n=9+9)
EqualPort/64M-4 383ms ± 1% 382ms ± 0% ~ (p=0.218 n=10+10)
CompareBytesEqual-4 61.2ns ± 1% 61.0ns ± 1% ~ (p=0.539 n=10+10)
name old speed new speed delta
Equal/1-4 15.2MB/s ± 0% 16.6MB/s ± 1% +9.52% (p=0.000 n=10+9)
Equal/6-4 72.4MB/s ± 1% 69.2MB/s ± 0% -4.40% (p=0.000 n=10+10)
Equal/9-4 100MB/s ± 0% 89MB/s ± 0% -11.40% (p=0.000 n=9+10)
Equal/15-4 138MB/s ± 1% 125MB/s ± 1% -9.41% (p=0.000 n=10+10)
Equal/16-4 144MB/s ± 1% 196MB/s ± 0% +36.41% (p=0.000 n=10+10)
Equal/20-4 162MB/s ± 1% 231MB/s ± 1% +42.98% (p=0.000 n=9+10)
Equal/32-4 200MB/s ± 1% 331MB/s ± 1% +65.64% (p=0.000 n=10+10)
Equal/4K-4 292MB/s ± 0% 1149MB/s ± 1% +293.19% (p=0.000 n=9+10)
Equal/4M-4 328MB/s ± 1% 1307MB/s ± 0% +298.87% (p=0.000 n=9+9)
Equal/64M-4 329MB/s ± 1% 1306MB/s ± 0% +296.56% (p=0.000 n=10+10)
EqualPort/1-4 21.3MB/s ± 1% 21.4MB/s ± 0% +0.42% (p=0.002 n=10+9)
EqualPort/6-4 72.6MB/s ± 1% 73.2MB/s ± 1% +0.87% (p=0.003 n=10+10)
EqualPort/32-4 138MB/s ± 0% 138MB/s ± 0% ~ (p=0.953 n=9+10)
EqualPort/4K-4 141MB/s ± 1% 141MB/s ± 1% ~ (p=0.382 n=10+10)
EqualPort/4M-4 175MB/s ± 1% 176MB/s ± 0% +0.65% (p=0.001 n=9+9)
EqualPort/64M-4 175MB/s ± 1% 176MB/s ± 0% ~ (p=0.225 n=10+10)
The 5-12% decrease in performance on Equal/{6,9,15} is due to the
benchmarks splitting the bytes buffer in half. The b argument to Equal
then ends up being unaligned and thus the fast word-wise compare doesn't
kick in.
Updates #29001
Change-Id: I73be501c18e67d211ed19da7771b4f254254e609
Reviewed-on: https://go-review.googlesource.com/c/go/+/167557
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
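The alignment effect described in the commit above can be made concrete with a small Go sketch. It assumes a 4-byte word (as on 32-bit arm) and a buffer whose start is word-aligned; the function `splitOffset` and the constant `wordSize` are hypothetical names used only here. The sketch splits a buffer of 2*n bytes in half, as the commit says the benchmarks do, and reports whether the second half starts on a word boundary.

```go
package main

import (
	"fmt"
	"unsafe"
)

// wordSize is the word width assumed by this sketch (32-bit arm).
const wordSize = 4

// splitOffset splits a buffer of 2*n bytes in half and returns the distance
// in bytes from the start of the buffer to the start of the second half.
// n must be at least 1.
func splitOffset(n int) uintptr {
	buf := make([]byte, 2*n)
	b := buf[n:]
	return uintptr(unsafe.Pointer(&b[0])) - uintptr(unsafe.Pointer(&buf[0]))
}

func main() {
	for _, n := range []int{6, 9, 15, 16, 20, 32} {
		off := splitOffset(n)
		// If the buffer itself starts on a word boundary, the second half
		// is word-aligned exactly when this offset is a multiple of wordSize.
		fmt.Printf("n=%d offset=%d aligned=%v\n", n, off, off%wordSize == 0)
	}
}
```

Under these assumptions the sizes 6, 9 and 15 give an unaligned second half while 16, 20 and 32 give an aligned one, which lines up with the benchmark rows that regressed and improved.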
|
|
14a58d65e3 |
internal/bytealg: share code for equal functions on arm
Move the shared code into bytealg.memeqbody. This will allow optimizations (e.g. for #29001) to be implemented in a single function. Change-Id: Iaa34ddeb7068d92c35a8b4e581b7fd92da56535c Reviewed-on: https://go-review.googlesource.com/c/go/+/166677 Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> |
|
|
|
029a5af6a1 |
internal/bytealg: use word-wise comparison for Compare on arm
Use word-wise comparison for aligned buffers, otherwise fall back to the current byte-wise comparison.
name old time/op new time/op delta
BytesCompare/1-4 41.3ns ± 0% 36.4ns ± 1% -11.73% (p=0.008 n=5+5)
BytesCompare/2-4 39.5ns ± 0% 39.5ns ± 1% ~ (p=0.960 n=5+5)
BytesCompare/4-4 45.3ns ± 0% 41.0ns ± 1% -9.40% (p=0.008 n=5+5)
BytesCompare/8-4 64.8ns ± 1% 44.7ns ± 0% -31.12% (p=0.008 n=5+5)
BytesCompare/16-4 86.3ns ± 0% 55.1ns ± 0% -36.21% (p=0.008 n=5+5)
BytesCompare/32-4 135ns ± 0% 70ns ± 1% -47.73% (p=0.008 n=5+5)
BytesCompare/64-4 231ns ± 1% 99ns ± 0% -57.27% (p=0.016 n=5+4)
BytesCompare/128-4 424ns ± 0% 147ns ± 0% -65.31% (p=0.000 n=4+5)
BytesCompare/256-4 810ns ± 0% 243ns ± 0% -69.96% (p=0.008 n=5+5)
BytesCompare/512-4 1.59µs ± 0% 0.44µs ± 0% -72.43% (p=0.008 n=5+5)
BytesCompare/1024-4 3.14µs ± 1% 0.83µs ± 1% -73.56% (p=0.008 n=5+5)
BytesCompare/2048-4 6.23µs ± 0% 1.61µs ± 1% -74.21% (p=0.008 n=5+5)
CompareBytesEqual-4 79.4ns ± 0% 52.2ns ± 0% -34.23% (p=0.008 n=5+5)
CompareBytesToNil-4 31.0ns ± 0% 30.3ns ± 0% -2.32% (p=0.008 n=5+5)
CompareBytesEmpty-4 25.7ns ± 0% 25.7ns ± 0% ~ (p=0.556 n=4+5)
CompareBytesIdentical-4 25.7ns ± 0% 25.7ns ± 0% ~ (p=1.000 n=5+5)
CompareBytesSameLength-4 49.1ns ± 0% 48.5ns ± 0% -1.26% (p=0.008 n=5+5)
CompareBytesDifferentLength-4 49.8ns ± 1% 49.3ns ± 0% -1.08% (p=0.008 n=5+5)
CompareBytesBigUnaligned-4 5.71ms ± 1% 5.68ms ± 1% ~ (p=0.222 n=5+5)
CompareBytesBig-4 4.95ms ± 0% 2.28ms ± 1% -53.81% (p=0.008 n=5+5)
CompareBytesBigIdentical-4 27.2ns ± 1% 27.3ns ± 1% ~ (p=0.310 n=5+5)
name old speed new speed delta
CompareBytesBigUnaligned-4 184MB/s ± 1% 185MB/s ± 1% ~ (p=0.222 n=5+5)
CompareBytesBig-4 212MB/s ± 0% 459MB/s ± 1% +116.51% (p=0.008 n=5+5)
CompareBytesBigIdentical-4 38.5TB/s ± 0% 38.4TB/s ± 1% ~ (p=0.421 n=5+5)
Also, this reduces time for TestCompareBytes by about 20 sec on a linux-arm builder via gomote.
Updates #29001
Change-Id: I25f148739b9ccb7cb1fc97b3d8763549b0a66c16
Reviewed-on: https://go-review.googlesource.com/c/go/+/165338
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
|
|
c0d82bb0ec |
all: rename WebAssembly instructions according to spec changes
The names of some instructions have been updated in the WebAssembly
specification to be more consistent, see
|
|
|
|
ef7ce57ac2 |
internal/bytealg, runtime: provide linknames for pushed symbols
The internal/bytealg package defines several symbols in the runtime, bytes, and strings packages in assembly, and the runtime package defines symbols in reflect and sync/atomic. Currently, there's no corresponding Go prototype for these symbols in the defining package. We're going to start depending on Go prototypes in the same package as their assembly definitions in order to provide ABI wrappers. Plus, these are good documentation and colocate type information with definitions, which could be useful for vet if it learned a little about linkname. This CL adds linknamed Go prototypes for all pushed symbols in internal/bytealg and runtime. For #27539. Change-Id: I9b0c12d935a75bb6af46b6761180d451c00f11b8 Reviewed-on: https://go-review.googlesource.com/c/146820 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com> |
|
|
|
ad4a58e315 |
strings,bytes: use inlineable function trampolines instead of linkname
Cleans things up quite a bit. There's still a few more, like runtime.cmpstring, which might also be worth fixing. Change-Id: Ide18dd621efc129cc686db223f47fa0b044b5580 Reviewed-on: https://go-review.googlesource.com/c/148578 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com> |
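
A function trampoline of the kind this commit refers to is just a tiny exported wrapper that the compiler can inline, so the call goes straight to the shared implementation without a linkname directive. The sketch below is illustrative only: `indexByteString` is a hypothetical stand-in for the shared implementation, since `internal/bytealg` cannot be imported outside the standard library.

```go
package main

// indexByteString stands in for the shared, optimized implementation
// (hypothetical here; a plain loop is used for illustration).
func indexByteString(s string, c byte) int {
	for i := 0; i < len(s); i++ {
		if s[i] == c {
			return i
		}
	}
	return -1
}

// IndexByte is the trampoline: a one-line forwarder small enough for the
// compiler to inline at each call site.
func IndexByte(s string, c byte) int {
	return indexByteString(s, c)
}
```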
|
|
|
6994731ec2 |
internal/bytealg: improve asm for memequal on ppc64x
This includes two changes to the memequal function.

Previously the asm implementation on ppc64x for Equal called the internal function memequal using a BL, whereas the other asm implementations for bytes functions on ppc64x used BR. The BR is preferred because the BL causes the calling function to stack a frame. This changes Equal so it uses BR and is consistent with the others.

This also uses vsx instructions where possible to improve performance of the compares for sizes over 32. Here are results from the sizes affected:

Equal/32   8.40ns ± 0%  7.66ns ± 0%   -8.81%  (p=0.029 n=4+4)
Equal/4K    193ns ± 0%   144ns ± 0%  -25.39%  (p=0.029 n=4+4)
Equal/4M    346µs ± 0%   277µs ± 0%  -20.08%  (p=0.029 n=4+4)
Equal/64M  7.66ms ± 1%  7.27ms ± 0%   -5.10%  (p=0.029 n=4+4)

Change-Id: Ib6ee2cdc3e5d146e2705e3338858b8e965d25420 Reviewed-on: https://go-review.googlesource.com/c/143060 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com> Reviewed-by: David Chase <drchase@google.com> |
|
|
|
b8ac64a581 |
all: this big patch remove whitespace from assembly files
Don't worry, this patch just remove trailing whitespace from assembly files, and does not touch any logical changes. Change-Id: Ia724ac0b1abf8bc1e41454bdc79289ef317c165d Reviewed-on: https://go-review.googlesource.com/c/113595 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> |
|
|
|
23f7554194 |
internal/bytealg: improve performance of IndexByte for ppc64x
Use addi+lvx instruction fusion and remove register dependencies in the main loop to improve performance.

benchmark                      old ns/op    new ns/op    delta
BenchmarkIndexByte/10-192      9.86         9.75         -1.12%
BenchmarkIndexByte/32-192      15.6         11.2         -28.21%
BenchmarkIndexByte/4K-192      155          97.6         -37.03%
BenchmarkIndexByte/4M-192      171790       129650       -24.53%
BenchmarkIndexByte/64M-192     6530982      5018424      -23.16%

benchmark                      old MB/s     new MB/s     speedup
BenchmarkIndexByte/10-192      1013.72      1025.76      1.01x
BenchmarkIndexByte/32-192      2049.47      2868.01      1.40x
BenchmarkIndexByte/4K-192      26422.69     41975.67     1.59x
BenchmarkIndexByte/4M-192      24415.17     32350.74     1.33x
BenchmarkIndexByte/64M-192     10275.46     13372.50     1.30x

Change-Id: Iedf17f01f374d58e85dcd6a972209bfcb7eb6063 Reviewed-on: https://go-review.googlesource.com/137415 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> |
|
|
|
de28555c0b |
internal/bytealg: optimize Equal on arm64
Currently the 16-byte loop chunk16_loop is implemented with NEON instructions LD1, VMOV and VCMEQ. Using scalar instructions LDP and CMP to achieve this loop can reduce the number of clock cycles. For cases where the length of strings are between 4 to 15 bytes, loading the last 8 or 4 bytes at a time to reduce the number of comparisons.

Benchmarks:

name                 old time/op  new time/op  delta
Equal/0-8            5.51ns ± 0%  5.84ns ±14%     ~     (p=0.246 n=7+8)
Equal/1-8            10.5ns ± 0%  10.5ns ± 0%     ~     (all equal)
Equal/6-8            14.0ns ± 0%  12.5ns ± 0%  -10.71%  (p=0.000 n=8+8)
Equal/9-8            13.5ns ± 0%  12.5ns ± 0%   -7.41%  (p=0.000 n=8+8)
Equal/15-8           15.5ns ± 0%  12.5ns ± 0%  -19.35%  (p=0.000 n=8+8)
Equal/16-8           14.0ns ± 0%  13.0ns ± 0%   -7.14%  (p=0.000 n=8+8)
Equal/20-8           16.5ns ± 0%  16.0ns ± 0%   -3.03%  (p=0.000 n=8+8)
Equal/32-8           16.5ns ± 0%  15.3ns ± 0%   -7.27%  (p=0.000 n=8+8)
Equal/4K-8            552ns ± 0%   553ns ± 0%     ~     (p=0.315 n=8+8)
Equal/4M-8           1.13ms ±23%  1.20ms ±27%     ~     (p=0.442 n=8+8)
Equal/64M-8          32.9ms ± 0%  32.6ms ± 0%   -1.15%  (p=0.000 n=8+8)
CompareBytesEqual-8  12.0ns ± 0%  12.0ns ± 0%     ~     (all equal)

Change-Id: If317ecdcc98e31883d37fd7d42b113b548c5bd2a Reviewed-on: https://go-review.googlesource.com/112496 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> |
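
The short-length idea in this commit, loading the last 8 or 4 bytes so that the final load overlaps the first, can be sketched in Go. This is one possible reading of the commit message, not a transcription of the arm64 assembly: `equalShort` is a hypothetical name, and the exact length ranges handled by each case are an assumption made for illustration.

```go
package main

import "encoding/binary"

// equalShort reports whether a and b hold the same bytes. For short inputs
// it uses two possibly overlapping wide loads (first N bytes and last N
// bytes) instead of a byte-by-byte tail loop. Only equality of the loaded
// values is tested, so the byte order chosen here does not matter.
func equalShort(a, b []byte) bool {
	if len(a) != len(b) {
		return false
	}
	n := len(a)
	le := binary.LittleEndian
	switch {
	case n >= 8 && n <= 16:
		// The first and last 8 bytes together cover every byte.
		return le.Uint64(a) == le.Uint64(b) &&
			le.Uint64(a[n-8:]) == le.Uint64(b[n-8:])
	case n >= 4 && n < 8:
		// The first and last 4 bytes together cover every byte.
		return le.Uint32(a) == le.Uint32(b) &&
			le.Uint32(a[n-4:]) == le.Uint32(b[n-4:])
	}
	// Lengths outside the sketch's range fall back to a plain loop.
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}
```

For a 13-byte input, for example, the two 8-byte loads cover bytes 0 to 7 and 5 to 12; the 3 overlapping bytes are compared twice, which is cheaper than a separate loop over the 5-byte tail.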