Add a 'single lane' SIMD implemementation of the single byte count
function for use on machines that support the vector facility. This
allows up to 16 bytes to be counted per loop iteration.
We can probably improve performance further by adding more 'lanes'
(i.e. counting more bytes in parallel) however this will increase
the complexity of the function so I'm not sure it is worth doing
yet.
name old speed new speed delta
pkg:strings goos:linux goarch:s390x
CountByte/10 789MB/s ± 0% 1131MB/s ± 0% +43.44% (p=0.000 n=9+9)
CountByte/32 936MB/s ± 0% 3236MB/s ± 0% +245.87% (p=0.000 n=8+9)
CountByte/4096 1.06GB/s ± 0% 21.26GB/s ± 0% +1907.07% (p=0.000 n=10+10)
CountByte/4194304 1.06GB/s ± 0% 20.54GB/s ± 0% +1838.50% (p=0.000 n=10+10)
CountByte/67108864 1.06GB/s ± 0% 18.31GB/s ± 0% +1629.51% (p=0.000 n=10+10)
pkg:bytes goos:linux goarch:s390x
CountSingle/10 800MB/s ± 0% 986MB/s ± 0% +23.21% (p=0.000 n=9+10)
CountSingle/32 925MB/s ± 0% 2744MB/s ± 0% +196.55% (p=0.000 n=9+10)
CountSingle/4K 1.26GB/s ± 0% 19.44GB/s ± 0% +1445.59% (p=0.000 n=10+10)
CountSingle/4M 1.26GB/s ± 0% 20.28GB/s ± 0% +1510.26% (p=0.000 n=8+10)
CountSingle/64M 1.23GB/s ± 0% 17.78GB/s ± 0% +1350.67% (p=0.000 n=9+10)
Change-Id: I230d57905db92a8fdfc50b1d5be338941ae3a7a1
Reviewed-on: https://go-review.googlesource.com/c/go/+/199979
Run-TryBot: Michael Munday <mike.munday@ibm.com>
Reviewed-by: Keith Randall <khr@golang.org>
Part 1: CL 199499 (GOOS nacl)
Part 2: CL 200077 (amd64p32 files, toolchain)
Part 3: stuff that arguably should've been part of Part 2, but I forgot
one of my grep patterns when splitting the original CL up into
two parts.
This one might also have interesting stuff to resurrect for any future
x32 ABI support.
Updates #30439
Change-Id: I2b4143374a253a003666f3c69e776b7e456bdb9c
Reviewed-on: https://go-review.googlesource.com/c/go/+/200318
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
This is part two if the nacl removal. Part 1 was CL 199499.
This CL removes amd64p32 support, which might be useful in the future
if we implement the x32 ABI. It also removes the nacl bits in the
toolchain, and some remaining nacl bits.
Updates #30439
Change-Id: I2475d5bb066d1b474e00e40d95b520e7c2e286e1
Reviewed-on: https://go-review.googlesource.com/c/go/+/200077
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Use the following (suboptimal) script to obtain a list of possible
typos:
#!/usr/bin/env sh
set -x
git ls-files |\
grep -e '\.\(c\|cc\|go\)$' |\
xargs -n 1\
awk\
'/\/\// { gsub(/.*\/\//, ""); print; } /\/\*/, /\*\// { gsub(/.*\/\*/, ""); gsub(/\*\/.*/, ""); }' |\
hunspell -d en_US -l |\
grep '^[[:upper:]]\{0,1\}[[:lower:]]\{1,\}$' |\
grep -v -e '^.\{1,4\}$' -e '^.\{16,\}$' |\
sort -f |\
uniq -c |\
awk '$1 == 1 { print $2; }'
Then, go through the results manually and fix the most obvious typos in
the non-vendored code.
Change-Id: I3cb5830a176850e1a0584b8a40b47bde7b260eae
Reviewed-on: https://go-review.googlesource.com/c/go/+/193848
Reviewed-by: Robert Griesemer <gri@golang.org>
The compiler has advanced enough that it is cheaper
to convert to strings than to go through the assembly
trampolines to call runtime.memequal.
Simplify Equal accordingly, and cull dead code from bytealg.
While we're here, simplify Equal's documentation.
Fixes#31587
Change-Id: Ie721d33f9a6cbd86b1d873398b20e7882c2c63e9
Reviewed-on: https://go-review.googlesource.com/c/go/+/173323
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dave Cheney <dave@cheney.net>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Move the shared code of IndexByte and IndexByteString into
indexbytebody. This will allow to implement optimizations (e.g.
for #29001) in a single function.
Change-Id: I1d550da8eb65f95e492a460a12058cc35b1162b6
Reviewed-on: https://go-review.googlesource.com/c/go/+/167939
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Move the shared code into byteal.memeqbody. This will allow to implement
optimizations (e.g. for #29001) in a single function.
Change-Id: Iaa34ddeb7068d92c35a8b4e581b7fd92da56535c
Reviewed-on: https://go-review.googlesource.com/c/go/+/166677
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
The names of some instructions have been updated in the WebAssembly
specification to be more consistent, see
994591e51c.
This change to the spec is possible because it is still in a draft
state.
Go's support for WebAssembly is still experimental and thus excempt from
the compatibility promise. Being consistent with the spec should
warrant this breaking change to the assembly instruction names.
Change-Id: Iafb8b18ee7f55dd0e23c6c7824aa1fad43117ef1
Reviewed-on: https://go-review.googlesource.com/c/153797
Run-TryBot: Richard Musiol <neelance@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
The internal/bytealg package defines several symbols in the runtime,
bytes, and strings packages in assembly, and the runtime package
defines symbols in reflect and sync/atomic. Currently, there's no
corresponding Go prototype for these symbols in the defining package.
We're going to start depending on Go prototypes in the same package as
their assembly definitions in order to provide ABI wrappers. Plus,
these are good documentation and colocate type information with
definitions, which could be useful for vet if it learned a little
about linkname.
This CL adds linknamed Go prototypes for all pushed symbols in
internal/bytealg and runtime.
For #27539.
Change-Id: I9b0c12d935a75bb6af46b6761180d451c00f11b8
Reviewed-on: https://go-review.googlesource.com/c/146820
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Cleans things up quite a bit.
There's still a few more, like runtime.cmpstring, which might also
be worth fixing.
Change-Id: Ide18dd621efc129cc686db223f47fa0b044b5580
Reviewed-on: https://go-review.googlesource.com/c/148578
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
This includes two changes to the memequal function.
Previously the asm implementation on ppc64x for Equal called the internal
function memequal using a BL, whereas the other asm implementations for
bytes functions on ppc64x used BR. The BR is preferred because the BL
causes the calling function to stack a frame. This changes Equal so it
uses BR and is consistent with the others.
This also uses vsx instructions where possible to improve performance
of the compares for sizes over 32.
Here are results from the sizes affected:
Equal/32 8.40ns ± 0% 7.66ns ± 0% -8.81% (p=0.029 n=4+4)
Equal/4K 193ns ± 0% 144ns ± 0% -25.39% (p=0.029 n=4+4)
Equal/4M 346µs ± 0% 277µs ± 0% -20.08% (p=0.029 n=4+4)
Equal/64M 7.66ms ± 1% 7.27ms ± 0% -5.10% (p=0.029 n=4+4)
Change-Id: Ib6ee2cdc3e5d146e2705e3338858b8e965d25420
Reviewed-on: https://go-review.googlesource.com/c/143060
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
Reviewed-by: David Chase <drchase@google.com>
Don't worry, this patch just remove trailing whitespace from
assembly files, and does not touch any logical changes.
Change-Id: Ia724ac0b1abf8bc1e41454bdc79289ef317c165d
Reviewed-on: https://go-review.googlesource.com/c/113595
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This adds an asm implementation for the Count function in ppc64x.
The Go code that manipulates a byte at a time is especially
inefficient on ppc64x, so an asm implementation is a significant
improvement.
bytes:
name old time/op new time/op delta
CountSingle/10-8 23.1ns ± 0% 18.6ns ± 0% -19.48% (p=1.000 n=1+1)
CountSingle/32-8 60.4ns ± 0% 19.0ns ± 0% -68.54% (p=1.000 n=1+1)
CountSingle/4K-8 7.29µs ± 0% 0.45µs ± 0% -93.80% (p=1.000 n=1+1)
CountSingle/4M-8 7.49ms ± 0% 0.45ms ± 0% -93.97% (p=1.000 n=1+1)
CountSingle/64M-8 127ms ± 0% 9ms ± 0% -92.53% (p=1.000 n=1+1)
html:
name old time/op new time/op delta
Escape-8 57.5µs ± 0% 36.1µs ± 0% -37.13% (p=1.000 n=1+1)
EscapeNone-8 20.0µs ± 0% 2.0µs ± 0% -90.14% (p=1.000 n=1+1)
Change-Id: Iadbf422c0e9a37b47d2d95fb8c778420f3aabb58
Reviewed-on: https://go-review.googlesource.com/131695
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Munday <mike.munday@ibm.com>
This makes the runtime.support_sse2 variable unused
so it is removed in this CL too.
Change-Id: Ia8b9ffee7ac97128179f74ef244b10315e44c234
Reviewed-on: https://go-review.googlesource.com/131455
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Add an "offset_" prefix to all cpu feature variable offset constants to
signify that they are not boolean cpu feature variables.
Remove _ from offset constant names.
Change-Id: I6e22a79ebcbe6e2ae54c4ac8764f9260bb3223ff
Reviewed-on: https://go-review.googlesource.com/131215
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Functions exported on behalf of other packages need to have their
argument stack maps specified explicitly. They don't get an implicit
map because they are not in the local package, and if they get defer'd
they need argument maps.
Fixes#24419
Change-Id: I35b7d8b4a03d4770ba88699e1007cb3fcb5397a9
Reviewed-on: https://go-review.googlesource.com/122676
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
The existing implementation of bytes.Compare on s390x doesn't work properly for slices longer
than 256 elements. This change fixes that. Added tests for long strings and slices of bytes.
Fixes#26114
Change-Id: If6d8b68ee6dbcf99a24f867a1d3438b1f208954f
Reviewed-on: https://go-review.googlesource.com/121495
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This commit enables vet/all for the js/wasm architecture. It got
skipped initially because the codebase did not fully compile yet
for js/wasm, which made vet/all fail.
startTimer and stopTimer are not needed in the syscall package.
Removed their assembly code since their Go stubs were already gone.
Change-Id: Icaeb6d903876e51ceb1edff7631f715a98c28696
Reviewed-on: https://go-review.googlesource.com/118657
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
This commit adds the wasm architecture to the internal/bytealg package.
Some parts of the assembly code have been extracted from WebAssembly
bytecode generated with Emscripten (which uses musl libc).
Updates #18892
Change-Id: Iba7f7158356b816c9ad03ca9223903a41a024da6
Reviewed-on: https://go-review.googlesource.com/103915
Reviewed-by: Keith Randall <khr@golang.org>
This reverts commit bfa8b6f8ff.
Reason for revert: This depends on another CL which is not yet submitted.
Change-Id: I50e7594f1473c911a2079fe910849a6694ac6c07
Reviewed-on: https://go-review.googlesource.com/101496
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Missed removing the argument loading from the indexbody function.
Change-Id: Ia1391231fc99771d00410a09fe80a09f08ceed02
Reviewed-on: https://go-review.googlesource.com/98575
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Elias Naur <elias.naur@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Move the IndexByte function from the runtime to a new bytealg package.
The new package will eventually hold all the optimized assembly for
groveling through byte slices and strings. It seems a better home for
this code than randomly keeping it in runtime.
Once this is in, the next step is to move the other functions
(Compare, Equal, ...).
Update #19792
This change seems complicated enough that we might just declare
"not worth it" and abandon. Opinions welcome.
The core assembly is all unchanged, except minor modifications where
the code reads cpu feature bits.
The wrapper functions have been cleaned up as they are now actually
checked by vet.
Change-Id: I9fa75bee5d85db3a65b3fd3b7997e60367523796
Reviewed-on: https://go-review.googlesource.com/98016
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>