Commit Graph

46 Commits

Author SHA1 Message Date
Josh Bleecher Snyder e3f2e9ac4e internal/bytealg: fix riscv64 offset names
Vet caught that these were incorrect.

Updates #37022

Change-Id: I7b5cd8032ea95eb8e0729f6a4f386aec613c71d8
Reviewed-on: https://go-review.googlesource.com/c/go/+/217777
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-02-04 23:15:01 +00:00
Ville Skyttä 440f7d6404 all: fix a bunch of misspellings
Change-Id: I5b909df0fd048cd66c5a27fca1b06466d3bcaac7
GitHub-Last-Rev: 778c5d2131
GitHub-Pull-Request: golang/go#35624
Reviewed-on: https://go-review.googlesource.com/c/go/+/207421
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2019-11-15 21:04:43 +00:00
Joel Sing 0c703b37df internal/cpu,internal/bytealg: add support for riscv64
Based on riscv-go port.

Updates #27532

Change-Id: Ia3aed521d4109e7b73f762c5a3cdacc7cdac430d
Reviewed-on: https://go-review.googlesource.com/c/go/+/204635
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2019-11-11 22:13:42 +00:00
Michael Munday b6245cef3c internal/bytealg: add SIMD byte count implementation for s390x
Add a 'single lane' SIMD implemementation of the single byte count
function for use on machines that support the vector facility. This
allows up to 16 bytes to be counted per loop iteration.

We can probably improve performance further by adding more 'lanes'
(i.e. counting more bytes in parallel) however this will increase
the complexity of the function so I'm not sure it is worth doing
yet.

name                old speed      new speed       delta
pkg:strings goos:linux goarch:s390x
CountByte/10         789MB/s ± 0%   1131MB/s ± 0%    +43.44%  (p=0.000 n=9+9)
CountByte/32         936MB/s ± 0%   3236MB/s ± 0%   +245.87%  (p=0.000 n=8+9)
CountByte/4096      1.06GB/s ± 0%  21.26GB/s ± 0%  +1907.07%  (p=0.000 n=10+10)
CountByte/4194304   1.06GB/s ± 0%  20.54GB/s ± 0%  +1838.50%  (p=0.000 n=10+10)
CountByte/67108864  1.06GB/s ± 0%  18.31GB/s ± 0%  +1629.51%  (p=0.000 n=10+10)
pkg:bytes goos:linux goarch:s390x
CountSingle/10       800MB/s ± 0%    986MB/s ± 0%    +23.21%  (p=0.000 n=9+10)
CountSingle/32       925MB/s ± 0%   2744MB/s ± 0%   +196.55%  (p=0.000 n=9+10)
CountSingle/4K      1.26GB/s ± 0%  19.44GB/s ± 0%  +1445.59%  (p=0.000 n=10+10)
CountSingle/4M      1.26GB/s ± 0%  20.28GB/s ± 0%  +1510.26%  (p=0.000 n=8+10)
CountSingle/64M     1.23GB/s ± 0%  17.78GB/s ± 0%  +1350.67%  (p=0.000 n=9+10)

Change-Id: I230d57905db92a8fdfc50b1d5be338941ae3a7a1
Reviewed-on: https://go-review.googlesource.com/c/go/+/199979
Run-TryBot: Michael Munday <mike.munday@ibm.com>
Reviewed-by: Keith Randall <khr@golang.org>
2019-11-04 22:06:16 +00:00
Agniva De Sarker d0f10a6e68 runtime,internal/bytealg: optimize wasmZero, wasmMove, Compare
Coalesce set/get pairs into a tee.

Change-Id: I88ccdcb148465615437bebf24145e941a037e0a5
Reviewed-on: https://go-review.googlesource.com/c/go/+/200357
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Richard Musiol <neelance@gmail.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-10-11 04:00:35 +00:00
Brad Fitzpatrick 03ef105dae all: remove nacl (part 3, more amd64p32)
Part 1: CL 199499 (GOOS nacl)
Part 2: CL 200077 (amd64p32 files, toolchain)
Part 3: stuff that arguably should've been part of Part 2, but I forgot
        one of my grep patterns when splitting the original CL up into
        two parts.

This one might also have interesting stuff to resurrect for any future
x32 ABI support.

Updates #30439

Change-Id: I2b4143374a253a003666f3c69e776b7e456bdb9c
Reviewed-on: https://go-review.googlesource.com/c/go/+/200318
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2019-10-10 22:38:38 +00:00
Brad Fitzpatrick 07b4abd62e all: remove the nacl port (part 2, amd64p32 + toolchain)
This is part two if the nacl removal. Part 1 was CL 199499.

This CL removes amd64p32 support, which might be useful in the future
if we implement the x32 ABI. It also removes the nacl bits in the
toolchain, and some remaining nacl bits.

Updates #30439

Change-Id: I2475d5bb066d1b474e00e40d95b520e7c2e286e1
Reviewed-on: https://go-review.googlesource.com/c/go/+/200077
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2019-10-09 22:34:34 +00:00
Meng Zhuo a1b1ba7daf internal/bytealg: (re)adding mips64x compare implementation
The original CL of mips64x compare function has been reverted due to wrong
implement for little endian.
Original CL: https://go-review.googlesource.com/c/go/+/196837

name                         old time/op    new time/op    delta
BytesCompare/1                 28.9ns ± 4%    22.1ns ± 0%    -23.60%  (p=0.000 n=9+8)
BytesCompare/2                 34.6ns ± 0%    23.1ns ± 0%    -33.25%  (p=0.000 n=8+10)
BytesCompare/4                 54.6ns ± 0%    40.8ns ± 0%    -25.27%  (p=0.000 n=8+8)
BytesCompare/8                 73.9ns ± 0%    49.1ns ± 0%    -33.56%  (p=0.000 n=8+8)
BytesCompare/16                 113ns ± 0%      24ns ± 0%    -79.20%  (p=0.000 n=9+9)
BytesCompare/32                 190ns ± 0%      26ns ± 0%    -86.53%  (p=0.000 n=10+10)
BytesCompare/64                 345ns ± 0%      44ns ± 0%    -87.19%  (p=0.000 n=10+8)
BytesCompare/128                654ns ± 0%      52ns ± 0%    -91.97%  (p=0.000 n=9+8)
BytesCompare/256               1.27µs ± 0%    0.07µs ± 0%    -94.14%  (p=0.001 n=8+9)
BytesCompare/512               2.51µs ± 0%    0.12µs ± 0%    -95.26%  (p=0.000 n=9+10)
BytesCompare/1024              4.99µs ± 0%    0.21µs ± 0%    -95.85%  (p=0.000 n=8+10)
BytesCompare/2048              9.94µs ± 0%    0.38µs ± 0%    -96.14%  (p=0.000 n=8+8)
CompareBytesEqual               105ns ± 0%      64ns ± 0%    -39.43%  (p=0.000 n=10+9)
CompareBytesToNil              34.8ns ± 1%    38.6ns ± 3%    +11.01%  (p=0.000 n=10+10)
CompareBytesEmpty              33.6ns ± 3%    36.6ns ± 0%     +8.77%  (p=0.000 n=10+8)
CompareBytesIdentical          29.7ns ± 0%    40.5ns ± 1%    +36.45%  (p=0.000 n=10+8)
CompareBytesSameLength         69.1ns ± 0%    51.8ns ± 0%    -25.04%  (p=0.000 n=10+9)
CompareBytesDifferentLength    69.8ns ± 0%    52.5ns ± 0%    -24.79%  (p=0.000 n=10+8)
CompareBytesBigUnaligned       5.15ms ± 0%    2.19ms ± 0%    -57.59%  (p=0.000 n=9+9)
CompareBytesBig                5.28ms ± 0%    0.28ms ± 0%    -94.64%  (p=0.000 n=8+8)
CompareBytesBigIdentical       29.7ns ± 0%    36.9ns ± 2%    +24.11%  (p=0.000 n=8+10)

name                         old speed      new speed      delta
CompareBytesBigUnaligned      204MB/s ± 0%   480MB/s ± 0%   +135.77%  (p=0.000 n=9+9)
CompareBytesBig               198MB/s ± 0%  3704MB/s ± 0%  +1765.97%  (p=0.000 n=8+8)
CompareBytesBigIdentical     35.3TB/s ± 0%  28.4TB/s ± 2%    -19.44%  (p=0.000 n=8+10)

Fixes #34549

Change-Id: I2ef29f13cdd4229745ac2d018bb53c76f2ff1209
Reviewed-on: https://go-review.googlesource.com/c/go/+/197557
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-10-03 20:32:21 +00:00
Brad Fitzpatrick ae024d9daf Revert "internal/bytealg: add assembly implementation of Compare/CompareString on mips64x"
This reverts CL 196837 (commit 78baea836d).

Reason for revert: broke the mips64le build.

Change-Id: I531da60d098cb227659c9977a3d229474325b0a5
Reviewed-on: https://go-review.googlesource.com/c/go/+/197538
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-09-26 15:54:41 +00:00
Meng Zhuo 78baea836d internal/bytealg: add assembly implementation of Compare/CompareString on mips64x
name                         old time/op    new time/op    delta
BytesCompare/1                 28.9ns ± 4%    22.8ns ± 0%    -21.23%  (p=0.000 n=9+10)
BytesCompare/2                 34.6ns ± 0%    23.5ns ± 0%    -32.01%  (p=0.000 n=8+10)
BytesCompare/4                 54.6ns ± 0%    41.4ns ± 0%    -24.18%  (p=0.001 n=8+9)
BytesCompare/8                 73.9ns ± 0%    49.7ns ± 0%    -32.75%  (p=0.002 n=8+10)
BytesCompare/16                 113ns ± 0%      23ns ± 0%    -79.56%  (p=0.000 n=9+10)
BytesCompare/32                 190ns ± 0%      26ns ± 0%    -86.53%  (p=0.000 n=10+10)
BytesCompare/64                 345ns ± 0%      44ns ± 0%    -87.19%  (p=0.000 n=10+10)
BytesCompare/128                654ns ± 0%      52ns ± 0%    -91.97%  (p=0.000 n=9+8)
BytesCompare/256               1.27µs ± 0%    0.08µs ± 1%    -94.10%  (p=0.000 n=8+10)
BytesCompare/512               2.51µs ± 0%    0.12µs ± 0%    -95.26%  (p=0.000 n=9+9)
BytesCompare/1024              4.99µs ± 0%    0.21µs ± 1%    -95.84%  (p=0.000 n=8+10)
BytesCompare/2048              9.94µs ± 0%    0.38µs ± 0%    -96.13%  (p=0.000 n=8+10)
CompareBytesEqual               105ns ± 0%      64ns ± 0%    -39.05%  (p=0.000 n=10+9)
CompareBytesToNil              34.8ns ± 1%    39.5ns ± 3%    +13.48%  (p=0.000 n=10+10)
CompareBytesEmpty              33.6ns ± 3%    36.6ns ± 0%     +8.77%  (p=0.000 n=10+10)
CompareBytesIdentical          29.7ns ± 0%    36.6ns ± 0%    +23.23%  (p=0.000 n=10+10)
CompareBytesSameLength         69.1ns ± 0%    51.1ns ± 0%    -26.05%  (p=0.000 n=10+10)
CompareBytesDifferentLength    69.8ns ± 0%    51.1ns ± 0%    -26.79%  (p=0.000 n=10+10)
CompareBytesBigUnaligned       5.15ms ± 0%    2.18ms ± 0%    -57.62%  (p=0.000 n=9+9)
CompareBytesBig                5.28ms ± 0%    0.28ms ± 0%    -94.64%  (p=0.000 n=8+10)
CompareBytesBigIdentical       29.7ns ± 0%    36.8ns ± 0%    +23.91%  (p=0.000 n=8+8)

name                         old speed      new speed      delta
CompareBytesBigUnaligned      204MB/s ± 0%   480MB/s ± 0%   +135.94%  (p=0.000 n=9+9)
CompareBytesBig               198MB/s ± 0%  3703MB/s ± 0%  +1765.85%  (p=0.000 n=8+10)
CompareBytesBigIdentical     35.3TB/s ± 0%  28.5TB/s ± 0%    -19.31%  (p=0.000 n=8+8)

Change-Id: I112d9de2324986fd65ed237a86b11856a1c0f4a7
Reviewed-on: https://go-review.googlesource.com/c/go/+/196837
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-09-25 22:03:38 +00:00
Ainar Garipov 0efbd10157 all: fix typos
Use the following (suboptimal) script to obtain a list of possible
typos:

  #!/usr/bin/env sh

  set -x

  git ls-files |\
    grep -e '\.\(c\|cc\|go\)$' |\
    xargs -n 1\
    awk\
    '/\/\// { gsub(/.*\/\//, ""); print; } /\/\*/, /\*\// { gsub(/.*\/\*/, ""); gsub(/\*\/.*/, ""); }' |\
    hunspell -d en_US -l |\
    grep '^[[:upper:]]\{0,1\}[[:lower:]]\{1,\}$' |\
    grep -v -e '^.\{1,4\}$' -e '^.\{16,\}$' |\
    sort -f |\
    uniq -c |\
    awk '$1 == 1 { print $2; }'

Then, go through the results manually and fix the most obvious typos in
the non-vendored code.

Change-Id: I3cb5830a176850e1a0584b8a40b47bde7b260eae
Reviewed-on: https://go-review.googlesource.com/c/go/+/193848
Reviewed-by: Robert Griesemer <gri@golang.org>
2019-09-08 17:28:20 +00:00
Josh Bleecher Snyder ca0c449a6b bytes, internal/bytealg: simplify Equal
The compiler has advanced enough that it is cheaper
to convert to strings than to go through the assembly
trampolines to call runtime.memequal.

Simplify Equal accordingly, and cull dead code from bytealg.

While we're here, simplify Equal's documentation.

Fixes #31587

Change-Id: Ie721d33f9a6cbd86b1d873398b20e7882c2c63e9
Reviewed-on: https://go-review.googlesource.com/c/go/+/173323
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dave Cheney <dave@cheney.net>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-04-24 00:56:36 +00:00
Tobias Klauser 56517216c0 internal/bytealg: fix function reference in comments
There's no IndexShortStr func, refer to Index instead.

Change-Id: I6923e7ad3e910e4b5fb0c07d6339ddfec4111f4f
Reviewed-on: https://go-review.googlesource.com/c/go/+/170124
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-04-02 05:45:33 +00:00
Agniva De Sarker 4a7cd9d9df internal/bytealg: simplify memchr for wasm
Get rid of an extra register R5 which just recalculated the value of R4.
Reuse R4 instead.

We also remove the casting of c to an unsigned char because the initial
load of R0 is done with I32Load8U anyways.

Also indent the code to make it more readable.

name                           old time/op  new time/op  delta
IndexRune                       597ns ± 3%   580ns ± 3%  -2.93%  (p=0.002 n=10+10)
IndexRuneLongString             634ns ± 4%   654ns ± 3%  +3.07%  (p=0.004 n=10+10)
IndexRuneFastPath              57.6ns ± 3%  56.9ns ± 4%    ~     (p=0.210 n=10+10)
Index                           104ns ± 3%   104ns ± 4%    ~     (p=0.639 n=10+10)
LastIndex                      87.1ns ± 5%  85.7ns ± 3%    ~     (p=0.171 n=10+10)
IndexByte                      34.4ns ± 4%  32.9ns ± 5%  -4.28%  (p=0.002 n=10+10)
IndexHard1                     21.6ms ± 1%  21.8ms ± 3%    ~     (p=0.460 n=8+10)
IndexHard2                     21.6ms ± 2%  21.9ms ± 5%    ~     (p=0.133 n=9+10)
IndexHard3                     21.8ms ± 3%  21.7ms ± 1%    ~     (p=0.579 n=10+10)
IndexHard4                     21.6ms ± 1%  21.9ms ± 4%    ~     (p=0.360 n=8+10)
LastIndexHard1                 25.1ms ± 2%  25.4ms ± 5%    ~     (p=0.853 n=10+10)
LastIndexHard2                 25.3ms ± 6%  25.2ms ± 5%    ~     (p=0.796 n=10+10)
LastIndexHard3                 25.3ms ± 4%  25.2ms ± 3%    ~     (p=0.739 n=10+10)
IndexTorture                    130µs ± 3%   133µs ± 5%    ~     (p=0.218 n=10+10)
IndexAnyASCII/1:1              98.4ns ± 5%  96.6ns ± 5%    ~     (p=0.054 n=10+10)
IndexAnyASCII/1:2               109ns ± 4%   110ns ± 3%    ~     (p=0.232 n=10+10)
IndexAnyASCII/1:4               135ns ± 4%   134ns ± 3%    ~     (p=0.671 n=10+10)
IndexAnyASCII/1:8               184ns ± 4%   184ns ± 3%    ~     (p=0.749 n=10+10)
IndexAnyASCII/1:16              289ns ± 3%   281ns ± 3%  -2.73%  (p=0.001 n=9+10)
IndexAnyASCII/16:1              322ns ± 3%   307ns ± 3%  -4.71%  (p=0.000 n=10+10)
IndexAnyASCII/16:2              329ns ± 3%   320ns ± 3%  -2.89%  (p=0.008 n=10+10)
IndexAnyASCII/16:4              353ns ± 3%   339ns ± 3%  -3.91%  (p=0.001 n=10+10)
IndexAnyASCII/16:8              390ns ± 3%   374ns ± 3%  -4.06%  (p=0.000 n=10+10)
IndexAnyASCII/16:16             471ns ± 4%   452ns ± 2%  -4.22%  (p=0.000 n=10+10)
IndexAnyASCII/256:1            2.94µs ± 4%  2.91µs ± 2%    ~     (p=0.424 n=10+10)
IndexAnyASCII/256:2            2.92µs ± 3%  2.90µs ± 2%    ~     (p=0.388 n=9+10)
IndexAnyASCII/256:4            2.93µs ± 1%  2.90µs ± 1%  -0.98%  (p=0.036 n=8+9)
IndexAnyASCII/256:8            3.03µs ± 5%  2.97µs ± 3%    ~     (p=0.085 n=10+10)
IndexAnyASCII/256:16           3.07µs ± 4%  3.01µs ± 1%  -2.03%  (p=0.003 n=10+9)
IndexAnyASCII/4096:1           45.8µs ± 3%  45.9µs ± 2%    ~     (p=0.905 n=10+9)
IndexAnyASCII/4096:2           46.7µs ± 3%  46.2µs ± 3%    ~     (p=0.190 n=10+10)
IndexAnyASCII/4096:4           45.7µs ± 2%  46.4µs ± 3%  +1.37%  (p=0.022 n=9+10)
IndexAnyASCII/4096:8           46.4µs ± 3%  46.0µs ± 2%    ~     (p=0.436 n=10+10)
IndexAnyASCII/4096:16          46.6µs ± 3%  46.7µs ± 2%    ~     (p=0.971 n=10+10)
IndexPeriodic/IndexPeriodic2   1.40ms ± 3%  1.40ms ± 2%    ~     (p=0.853 n=10+10)
IndexPeriodic/IndexPeriodic4   1.40ms ± 3%  1.40ms ± 3%    ~     (p=0.579 n=10+10)
IndexPeriodic/IndexPeriodic8   1.42ms ± 3%  1.39ms ± 2%  -1.60%  (p=0.029 n=10+10)
IndexPeriodic/IndexPeriodic16   616µs ± 5%   583µs ± 5%  -5.32%  (p=0.001 n=10+10)
IndexPeriodic/IndexPeriodic32   313µs ± 5%   301µs ± 2%  -3.67%  (p=0.002 n=10+10)
IndexPeriodic/IndexPeriodic64   169µs ± 5%   164µs ± 5%  -3.17%  (p=0.023 n=10+10)

NodeJS version - 10.2.1

Change-Id: I9a8268314b5652c4aeffc4c5c72d2fd1a384aa9e
Reviewed-on: https://go-review.googlesource.com/c/go/+/169777
Run-TryBot: Agniva De Sarker <agniva.quicksilver@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-03-29 03:59:19 +00:00
Tobias Klauser 3cb1e9d98a internal/bytealg: add assembly implementation of Count/CountString on arm
Simple single-byte loop count for now, to be further improved in future
CLs.

Benchmark on linux/arm:

name               old time/op    new time/op     delta
CountSingle/10-4      122ns ± 0%       87ns ± 1%  -28.41%  (p=0.000 n=7+10)
CountSingle/32-4      242ns ± 0%      174ns ± 1%  -28.25%  (p=0.000 n=10+10)
CountSingle/4K-4     24.2µs ± 1%     15.6µs ± 1%  -35.42%  (p=0.000 n=10+10)
CountSingle/4M-4     29.6ms ± 1%     21.3ms ± 1%  -28.09%  (p=0.000 n=10+9)
CountSingle/64M-4     562ms ± 0%      414ms ± 1%  -26.23%  (p=0.000 n=8+10)

name               old speed      new speed       delta
CountSingle/10-4   81.7MB/s ± 1%  114.5MB/s ± 1%  +40.07%  (p=0.000 n=10+10)
CountSingle/32-4    132MB/s ± 0%    184MB/s ± 1%  +39.39%  (p=0.000 n=10+9)
CountSingle/4K-4    170MB/s ± 1%    263MB/s ± 1%  +54.86%  (p=0.000 n=10+10)
CountSingle/4M-4    142MB/s ± 1%    197MB/s ± 1%  +39.07%  (p=0.000 n=10+9)
CountSingle/64M-4   119MB/s ± 0%    162MB/s ± 1%  +35.55%  (p=0.000 n=8+10)

Updates #29001

Change-Id: I42a268215a62044286ec32b548d8e4b86b9570ee
Reviewed-on: https://go-review.googlesource.com/c/go/+/168319
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-03-19 16:33:10 +00:00
Tobias Klauser 3496ff1d19 internal/bytealg: share code for IndexByte functions on arm
Move the shared code of IndexByte and IndexByteString into
indexbytebody. This will allow to implement optimizations (e.g.
for #29001) in a single function.

Change-Id: I1d550da8eb65f95e492a460a12058cc35b1162b6
Reviewed-on: https://go-review.googlesource.com/c/go/+/167939
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-03-18 07:37:13 +00:00
Tobias Klauser a734601bdf internal/bytealg: use word-wise comparison for Equal on arm
Follow CL 165338 and use word-wise comparison for aligned buffers in
Equal on arm, otherwise fall back to the current byte-wise comparison.

name                 old time/op    new time/op    delta
Equal/0-4              25.7ns ± 1%    23.5ns ± 1%    -8.78%  (p=0.000 n=10+10)
Equal/1-4              65.8ns ± 0%    60.1ns ± 1%    -8.69%  (p=0.000 n=10+9)
Equal/6-4              82.9ns ± 1%    86.7ns ± 0%    +4.59%  (p=0.000 n=10+10)
Equal/9-4              90.0ns ± 0%   101.0ns ± 0%   +12.18%  (p=0.000 n=9+10)
Equal/15-4              108ns ± 0%     119ns ± 0%   +10.19%  (p=0.000 n=8+8)
Equal/16-4              111ns ± 0%      82ns ± 0%   -26.37%  (p=0.000 n=8+10)
Equal/20-4              124ns ± 1%      87ns ± 1%   -29.94%  (p=0.000 n=9+10)
Equal/32-4              160ns ± 1%      97ns ± 1%   -39.40%  (p=0.000 n=10+10)
Equal/4K-4             14.0µs ± 0%     3.6µs ± 1%   -74.57%  (p=0.000 n=9+10)
Equal/4M-4             12.8ms ± 1%     3.2ms ± 0%   -74.93%  (p=0.000 n=9+9)
Equal/64M-4             204ms ± 1%      51ms ± 0%   -74.78%  (p=0.000 n=10+10)
EqualPort/1-4          47.0ns ± 1%    46.8ns ± 0%    -0.40%  (p=0.015 n=10+6)
EqualPort/6-4          82.6ns ± 1%    81.9ns ± 1%    -0.87%  (p=0.002 n=10+10)
EqualPort/32-4          232ns ± 0%     232ns ± 0%      ~     (p=0.496 n=8+10)
EqualPort/4K-4         29.0µs ± 1%    29.0µs ± 1%      ~     (p=0.604 n=9+10)
EqualPort/4M-4         24.0ms ± 1%    23.8ms ± 0%    -0.65%  (p=0.001 n=9+9)
EqualPort/64M-4         383ms ± 1%     382ms ± 0%      ~     (p=0.218 n=10+10)
CompareBytesEqual-4    61.2ns ± 1%    61.0ns ± 1%      ~     (p=0.539 n=10+10)

name                 old speed      new speed      delta
Equal/1-4            15.2MB/s ± 0%  16.6MB/s ± 1%    +9.52%  (p=0.000 n=10+9)
Equal/6-4            72.4MB/s ± 1%  69.2MB/s ± 0%    -4.40%  (p=0.000 n=10+10)
Equal/9-4             100MB/s ± 0%    89MB/s ± 0%   -11.40%  (p=0.000 n=9+10)
Equal/15-4            138MB/s ± 1%   125MB/s ± 1%    -9.41%  (p=0.000 n=10+10)
Equal/16-4            144MB/s ± 1%   196MB/s ± 0%   +36.41%  (p=0.000 n=10+10)
Equal/20-4            162MB/s ± 1%   231MB/s ± 1%   +42.98%  (p=0.000 n=9+10)
Equal/32-4            200MB/s ± 1%   331MB/s ± 1%   +65.64%  (p=0.000 n=10+10)
Equal/4K-4            292MB/s ± 0%  1149MB/s ± 1%  +293.19%  (p=0.000 n=9+10)
Equal/4M-4            328MB/s ± 1%  1307MB/s ± 0%  +298.87%  (p=0.000 n=9+9)
Equal/64M-4           329MB/s ± 1%  1306MB/s ± 0%  +296.56%  (p=0.000 n=10+10)
EqualPort/1-4        21.3MB/s ± 1%  21.4MB/s ± 0%    +0.42%  (p=0.002 n=10+9)
EqualPort/6-4        72.6MB/s ± 1%  73.2MB/s ± 1%    +0.87%  (p=0.003 n=10+10)
EqualPort/32-4        138MB/s ± 0%   138MB/s ± 0%      ~     (p=0.953 n=9+10)
EqualPort/4K-4        141MB/s ± 1%   141MB/s ± 1%      ~     (p=0.382 n=10+10)
EqualPort/4M-4        175MB/s ± 1%   176MB/s ± 0%    +0.65%  (p=0.001 n=9+9)
EqualPort/64M-4       175MB/s ± 1%   176MB/s ± 0%      ~     (p=0.225 n=10+10)

The 5-12% decrease in performance on Equal/{6,9,15} are due to the
benchmarks splitting the bytes buffer in half. The b argument to Equal
then ends up being unaligned and thus the fast word-wise compare doesn't
kick in.

Updates #29001

Change-Id: I73be501c18e67d211ed19da7771b4f254254e609
Reviewed-on: https://go-review.googlesource.com/c/go/+/167557
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-03-17 18:45:00 +00:00
Tobias Klauser 14a58d65e3 internal/bytealg: share code for equal functions on arm
Move the shared code into byteal.memeqbody. This will allow to implement
optimizations (e.g. for #29001) in a single function.

Change-Id: Iaa34ddeb7068d92c35a8b4e581b7fd92da56535c
Reviewed-on: https://go-review.googlesource.com/c/go/+/166677
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-03-12 07:27:56 +00:00
Tobias Klauser 029a5af6a1 internal/bytealg: use word-wise comparison for Compare on arm
Use word-wise comparison for aligned buffers, otherwise fall back to the
current byte-wise comparison.

name                           old time/op    new time/op    delta
BytesCompare/1-4                 41.3ns ± 0%    36.4ns ± 1%   -11.73%  (p=0.008 n=5+5)
BytesCompare/2-4                 39.5ns ± 0%    39.5ns ± 1%      ~     (p=0.960 n=5+5)
BytesCompare/4-4                 45.3ns ± 0%    41.0ns ± 1%    -9.40%  (p=0.008 n=5+5)
BytesCompare/8-4                 64.8ns ± 1%    44.7ns ± 0%   -31.12%  (p=0.008 n=5+5)
BytesCompare/16-4                86.3ns ± 0%    55.1ns ± 0%   -36.21%  (p=0.008 n=5+5)
BytesCompare/32-4                 135ns ± 0%      70ns ± 1%   -47.73%  (p=0.008 n=5+5)
BytesCompare/64-4                 231ns ± 1%      99ns ± 0%   -57.27%  (p=0.016 n=5+4)
BytesCompare/128-4                424ns ± 0%     147ns ± 0%   -65.31%  (p=0.000 n=4+5)
BytesCompare/256-4                810ns ± 0%     243ns ± 0%   -69.96%  (p=0.008 n=5+5)
BytesCompare/512-4               1.59µs ± 0%    0.44µs ± 0%   -72.43%  (p=0.008 n=5+5)
BytesCompare/1024-4              3.14µs ± 1%    0.83µs ± 1%   -73.56%  (p=0.008 n=5+5)
BytesCompare/2048-4              6.23µs ± 0%    1.61µs ± 1%   -74.21%  (p=0.008 n=5+5)
CompareBytesEqual-4              79.4ns ± 0%    52.2ns ± 0%   -34.23%  (p=0.008 n=5+5)
CompareBytesToNil-4              31.0ns ± 0%    30.3ns ± 0%    -2.32%  (p=0.008 n=5+5)
CompareBytesEmpty-4              25.7ns ± 0%    25.7ns ± 0%      ~     (p=0.556 n=4+5)
CompareBytesIdentical-4          25.7ns ± 0%    25.7ns ± 0%      ~     (p=1.000 n=5+5)
CompareBytesSameLength-4         49.1ns ± 0%    48.5ns ± 0%    -1.26%  (p=0.008 n=5+5)
CompareBytesDifferentLength-4    49.8ns ± 1%    49.3ns ± 0%    -1.08%  (p=0.008 n=5+5)
CompareBytesBigUnaligned-4       5.71ms ± 1%    5.68ms ± 1%      ~     (p=0.222 n=5+5)
CompareBytesBig-4                4.95ms ± 0%    2.28ms ± 1%   -53.81%  (p=0.008 n=5+5)
CompareBytesBigIdentical-4       27.2ns ± 1%    27.3ns ± 1%      ~     (p=0.310 n=5+5)

name                           old speed      new speed      delta
CompareBytesBigUnaligned-4      184MB/s ± 1%   185MB/s ± 1%      ~     (p=0.222 n=5+5)
CompareBytesBig-4               212MB/s ± 0%   459MB/s ± 1%  +116.51%  (p=0.008 n=5+5)
CompareBytesBigIdentical-4     38.5TB/s ± 0%  38.4TB/s ± 1%      ~     (p=0.421 n=5+5)

Also, this reduces time for TestCompareBytes by about 20 sec on a
linux-arm builder via gomote.

Updates #29001

Change-Id: I25f148739b9ccb7cb1fc97b3d8763549b0a66c16
Reviewed-on: https://go-review.googlesource.com/c/go/+/165338
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-03-06 16:09:38 +00:00
Richard Musiol c0d82bb0ec all: rename WebAssembly instructions according to spec changes
The names of some instructions have been updated in the WebAssembly
specification to be more consistent, see
994591e51c.
This change to the spec is possible because it is still in a draft
state.

Go's support for WebAssembly is still experimental and thus excempt from
the compatibility promise. Being consistent with the spec should
warrant this breaking change to the assembly instruction names.

Change-Id: Iafb8b18ee7f55dd0e23c6c7824aa1fad43117ef1
Reviewed-on: https://go-review.googlesource.com/c/153797
Run-TryBot: Richard Musiol <neelance@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-03-03 21:10:01 +00:00
Austin Clements ef7ce57ac2 internal/bytealg, runtime: provide linknames for pushed symbols
The internal/bytealg package defines several symbols in the runtime,
bytes, and strings packages in assembly, and the runtime package
defines symbols in reflect and sync/atomic. Currently, there's no
corresponding Go prototype for these symbols in the defining package.

We're going to start depending on Go prototypes in the same package as
their assembly definitions in order to provide ABI wrappers. Plus,
these are good documentation and colocate type information with
definitions, which could be useful for vet if it learned a little
about linkname.

This CL adds linknamed Go prototypes for all pushed symbols in
internal/bytealg and runtime.

For #27539.

Change-Id: I9b0c12d935a75bb6af46b6761180d451c00f11b8
Reviewed-on: https://go-review.googlesource.com/c/146820
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2018-11-12 20:27:16 +00:00
Keith Randall ad4a58e315 strings,bytes: use inlineable function trampolines instead of linkname
Cleans things up quite a bit.

There's still a few more, like runtime.cmpstring, which might also
be worth fixing.

Change-Id: Ide18dd621efc129cc686db223f47fa0b044b5580
Reviewed-on: https://go-review.googlesource.com/c/148578
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2018-11-08 20:52:47 +00:00
Lynn Boger 6994731ec2 internal/bytealg: improve asm for memequal on ppc64x
This includes two changes to the memequal function.

Previously the asm implementation on ppc64x for Equal called the internal
function memequal using a BL, whereas the other asm implementations for
bytes functions on ppc64x used BR. The BR is preferred because the BL
causes the calling function to stack a frame. This changes Equal so it
uses BR and is consistent with the others.

This also uses vsx instructions where possible to improve performance
of the compares for sizes over 32.

Here are results from the sizes affected:

Equal/32             8.40ns ± 0%     7.66ns ± 0%    -8.81%  (p=0.029 n=4+4)
Equal/4K              193ns ± 0%      144ns ± 0%   -25.39%  (p=0.029 n=4+4)
Equal/4M              346µs ± 0%      277µs ± 0%   -20.08%  (p=0.029 n=4+4)
Equal/64M            7.66ms ± 1%     7.27ms ± 0%    -5.10%  (p=0.029 n=4+4)

Change-Id: Ib6ee2cdc3e5d146e2705e3338858b8e965d25420
Reviewed-on: https://go-review.googlesource.com/c/143060
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
Reviewed-by: David Chase <drchase@google.com>
2018-10-23 19:22:44 +00:00
Zhou Peng b8ac64a581 all: this big patch remove whitespace from assembly files
Don't worry, this patch just remove trailing whitespace from
assembly files, and does not touch any logical changes.

Change-Id: Ia724ac0b1abf8bc1e41454bdc79289ef317c165d
Reviewed-on: https://go-review.googlesource.com/c/113595
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-10-03 15:28:51 +00:00
Carlos Eduardo Seo 23f7554194 internal/bytealg: improve performance of IndexByte for ppc64x
Use addi+lvx instruction fusion and remove register dependencies in
the main loop to improve performance.

benchmark                      old ns/op     new ns/op     delta
BenchmarkIndexByte/10-192      9.86          9.75          -1.12%
BenchmarkIndexByte/32-192      15.6          11.2          -28.21%
BenchmarkIndexByte/4K-192      155           97.6          -37.03%
BenchmarkIndexByte/4M-192      171790        129650        -24.53%
BenchmarkIndexByte/64M-192     6530982       5018424       -23.16%

benchmark                      old MB/s     new MB/s     speedup
BenchmarkIndexByte/10-192      1013.72      1025.76      1.01x
BenchmarkIndexByte/32-192      2049.47      2868.01      1.40x
BenchmarkIndexByte/4K-192      26422.69     41975.67     1.59x
BenchmarkIndexByte/4M-192      24415.17     32350.74     1.33x
BenchmarkIndexByte/64M-192     10275.46     13372.50     1.30x

Change-Id: Iedf17f01f374d58e85dcd6a972209bfcb7eb6063
Reviewed-on: https://go-review.googlesource.com/137415
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2018-09-25 20:25:25 +00:00
erifan01 de28555c0b internal/bytealg: optimize Equal on arm64
Currently the 16-byte loop chunk16_loop is implemented with NEON instructions LD1, VMOV and VCMEQ.
Using scalar instructions LDP and CMP to achieve this loop can reduce the number of clock cycles.
For cases where the length of strings are between 4 to 15 bytes, loading the last 8 or 4 bytes at
a time to reduce the number of comparisons.

Benchmarks:
name                 old time/op    new time/op    delta
Equal/0-8              5.51ns ± 0%    5.84ns ±14%     ~     (p=0.246 n=7+8)
Equal/1-8              10.5ns ± 0%    10.5ns ± 0%     ~     (all equal)
Equal/6-8              14.0ns ± 0%    12.5ns ± 0%  -10.71%  (p=0.000 n=8+8)
Equal/9-8              13.5ns ± 0%    12.5ns ± 0%   -7.41%  (p=0.000 n=8+8)
Equal/15-8             15.5ns ± 0%    12.5ns ± 0%  -19.35%  (p=0.000 n=8+8)
Equal/16-8             14.0ns ± 0%    13.0ns ± 0%   -7.14%  (p=0.000 n=8+8)
Equal/20-8             16.5ns ± 0%    16.0ns ± 0%   -3.03%  (p=0.000 n=8+8)
Equal/32-8             16.5ns ± 0%    15.3ns ± 0%   -7.27%  (p=0.000 n=8+8)
Equal/4K-8              552ns ± 0%     553ns ± 0%     ~     (p=0.315 n=8+8)
Equal/4M-8             1.13ms ±23%    1.20ms ±27%     ~     (p=0.442 n=8+8)
Equal/64M-8            32.9ms ± 0%    32.6ms ± 0%   -1.15%  (p=0.000 n=8+8)
CompareBytesEqual-8    12.0ns ± 0%    12.0ns ± 0%     ~     (all equal)

Change-Id: If317ecdcc98e31883d37fd7d42b113b548c5bd2a
Reviewed-on: https://go-review.googlesource.com/112496
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
2018-09-12 19:56:55 +00:00
Lynn Boger a0fad982b1 internal/bytealg: implement bytes.Count in asm for ppc64x
This adds an asm implementation for the Count function in ppc64x.
The Go code that manipulates a byte at a time is especially
inefficient on ppc64x, so an asm implementation is a significant
improvement.

bytes:
name               old time/op   new time/op    delta
CountSingle/10-8    23.1ns ± 0%    18.6ns ± 0%    -19.48%  (p=1.000 n=1+1)
CountSingle/32-8    60.4ns ± 0%    19.0ns ± 0%    -68.54%  (p=1.000 n=1+1)
CountSingle/4K-8    7.29µs ± 0%    0.45µs ± 0%    -93.80%  (p=1.000 n=1+1)
CountSingle/4M-8    7.49ms ± 0%    0.45ms ± 0%    -93.97%  (p=1.000 n=1+1)
CountSingle/64M-8    127ms ± 0%       9ms ± 0%    -92.53%  (p=1.000 n=1+1)

html:
name              old time/op  new time/op  delta
Escape-8          57.5µs ± 0%  36.1µs ± 0%  -37.13%  (p=1.000 n=1+1)
EscapeNone-8      20.0µs ± 0%   2.0µs ± 0%  -90.14%  (p=1.000 n=1+1)

Change-Id: Iadbf422c0e9a37b47d2d95fb8c778420f3aabb58
Reviewed-on: https://go-review.googlesource.com/131695
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Munday <mike.munday@ibm.com>
2018-09-12 15:47:43 +00:00
Martin Möhrmann eae5fc88c1 internal/bytealg: replace use of runtime.support_sse2 with cpu.X86.HasSSE2
This makes the runtime.support_sse2 variable unused
so it is removed in this CL too.

Change-Id: Ia8b9ffee7ac97128179f74ef244b10315e44c234
Reviewed-on: https://go-review.googlesource.com/131455
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-08-26 15:27:20 +00:00
Martin Möhrmann 05c02444eb all: align cpu feature variable offset naming
Add an "offset_" prefix to all cpu feature variable offset constants to
signify that they are not boolean cpu feature variables.

Remove _ from offset constant names.

Change-Id: I6e22a79ebcbe6e2ae54c4ac8764f9260bb3223ff
Reviewed-on: https://go-review.googlesource.com/131215
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-08-24 18:40:16 +00:00
Kazuhiro Sera ad644d2e86 all: fix typos detected by github.com/client9/misspell
Change-Id: Iadb3c5de8ae9ea45855013997ed70f7929a88661
GitHub-Last-Rev: ae85bcf82b
GitHub-Pull-Request: golang/go#26920
Reviewed-on: https://go-review.googlesource.com/128955
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-08-23 15:54:07 +00:00
Keith Randall be9c994609 internal/bytealg: specify argmaps for exported functions
Functions exported on behalf of other packages need to have their
argument stack maps specified explicitly.  They don't get an implicit
map because they are not in the local package, and if they get defer'd
they need argument maps.

Fixes #24419

Change-Id: I35b7d8b4a03d4770ba88699e1007cb3fcb5397a9
Reviewed-on: https://go-review.googlesource.com/122676
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-07-10 17:13:53 +00:00
bill_ofarrell a5f8128e39 bytes, strings: fix comparison of long byte slices on s390x
The existing implementation of bytes.Compare on s390x doesn't work properly for slices longer
than 256 elements. This change fixes that. Added tests for long strings and slices of bytes.

Fixes #26114

Change-Id: If6d8b68ee6dbcf99a24f867a1d3438b1f208954f
Reviewed-on: https://go-review.googlesource.com/121495
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-06-29 20:48:07 +00:00
Richard Musiol c12399fffb all: enable vet/all for js/wasm and fix vet issues
This commit enables vet/all for the js/wasm architecture. It got
skipped initially because the codebase did not fully compile yet
for js/wasm, which made vet/all fail.

startTimer and stopTimer are not needed in the syscall package.
Removed their assembly code since their Go stubs were already gone.

Change-Id: Icaeb6d903876e51ceb1edff7631f715a98c28696
Reviewed-on: https://go-review.googlesource.com/118657
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-06-15 00:05:22 +00:00
Richard Musiol b382fe28a7 internal/bytealg: add wasm architecture
This commit adds the wasm architecture to the internal/bytealg package.

Some parts of the assembly code have been extracted from WebAssembly
bytecode generated with Emscripten (which uses musl libc).

Updates #18892

Change-Id: Iba7f7158356b816c9ad03ca9223903a41a024da6
Reviewed-on: https://go-review.googlesource.com/103915
Reviewed-by: Keith Randall <khr@golang.org>
2018-05-08 13:28:18 +00:00
erifan01 f8ef6ed24a internal/bytealg: optimize Index (substring lengths from 9 to 32) on arm64
The current code is not optimized for cases where the length of the
substring to be searched is between 9 bytes and 32 bytes. This CL
optimizes the situations.

Benchmark:
name                             old time/op  new time/op  delta
pkg:strings goos:linux goarch:arm64
IndexHard1-8                     1.06ms ± 0%  1.06ms ± 0%   -0.44%  (p=0.000 n=7+8)
IndexHard2-8                     1.25ms ± 1%  1.26ms ± 2%     ~     (p=0.328 n=8+8)
IndexHard3-8                     2.85ms ± 1%  1.18ms ± 1%  -58.59%  (p=0.000 n=8+8)
IndexHard4-8                     2.90ms ± 1%  2.87ms ± 1%   -0.96%  (p=0.021 n=8+8)

pkg:bytes goos:linux goarch:arm64
IndexByte/4M-8                      726124.200000ns +- 6%     560021.400000ns +-20%  -22.88%  (p=0.008 n=5+5)
IndexRune/4M-8                      928768.600000ns +- 0%     793144.600000ns +- 6%  -14.60%  (p=0.008 n=5+5)

Change-Id: Ieebeb784ae69b2a0642ea96e9486a1d120923568
Reviewed-on: https://go-review.googlesource.com/109895
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-05-01 15:45:54 +00:00
erifan01 d4e936cfd6 internal/bytealg: optimize IndexString on arm64
This CL adjusts the order of the branch instructions of the
code to make it easier for the LIKELY branch to happen.

Benchmarks:
name                            old time/op    new time/op    delta
pkg:strings goos:linux goarch:arm64
IndexHard2-8                      2.17ms ± 1%    1.23ms ± 0%  -43.34%  (p=0.008 n=5+5)
CountHard2-8                      2.13ms ± 1%    1.21ms ± 2%  -43.31%  (p=0.008 n=5+5)

pkg:bytes goos:linux goarch:arm64
IndexRune/4M-8                     661µs ±22%     513µs ± 0%  -22.32%  (p=0.008 n=5+5)
IndexEasy/4M-8                     672µs ±23%     513µs ± 0%  -23.71%  (p=0.016 n=5+4)

Change-Id: Ib96f095edf77747edc8a971e79f5c1428e5808ce
Reviewed-on: https://go-review.googlesource.com/109015
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-04-24 20:21:08 +00:00
fanzha02 d31ee64186 internal/bytealg: add optimized Compare for arm64
Use LDP instructions to load 16 bytes per loop when the source length is long. Specially
process the 8 bytes length, 4 bytes length and 2 bytes length to get a better performance.

Benchmark result:
name                           old time/op   new time/op    delta
BytesCompare/1-8                21.0ns ± 0%    10.5ns ± 0%      ~     (p=0.079 n=4+5)
BytesCompare/2-8                11.5ns ± 0%    10.5ns ± 0%    -8.70%  (p=0.008 n=5+5)
BytesCompare/4-8                13.5ns ± 0%    10.0ns ± 0%   -25.93%  (p=0.008 n=5+5)
BytesCompare/8-8                28.8ns ± 0%     9.5ns ± 0%      ~     (p=0.079 n=4+5)
BytesCompare/16-8               40.5ns ± 0%    10.5ns ± 0%   -74.07%  (p=0.008 n=5+5)
BytesCompare/32-8               64.6ns ± 0%    12.5ns ± 0%   -80.65%  (p=0.008 n=5+5)
BytesCompare/64-8                112ns ± 0%      16ns ± 0%   -85.27%  (p=0.008 n=5+5)
BytesCompare/128-8               208ns ± 0%      24ns ± 0%   -88.22%  (p=0.008 n=5+5)
BytesCompare/256-8               400ns ± 0%      50ns ± 0%   -87.62%  (p=0.008 n=5+5)
BytesCompare/512-8               785ns ± 0%      82ns ± 0%   -89.61%  (p=0.008 n=5+5)
BytesCompare/1024-8             1.55µs ± 0%    0.14µs ± 0%      ~     (p=0.079 n=4+5)
BytesCompare/2048-8             3.09µs ± 0%    0.27µs ± 0%      ~     (p=0.079 n=4+5)
CompareBytesEqual-8             39.0ns ± 0%    12.0ns ± 0%   -69.23%  (p=0.008 n=5+5)
CompareBytesToNil-8             8.57ns ± 5%    8.23ns ± 2%    -3.99%  (p=0.016 n=5+5)
CompareBytesEmpty-8             7.37ns ± 0%    7.36ns ± 4%      ~     (p=0.690 n=5+5)
CompareBytesIdentical-8         7.39ns ± 0%    7.46ns ± 2%      ~     (p=0.667 n=5+5)
CompareBytesSameLength-8        17.0ns ± 0%    10.5ns ± 0%   -38.24%  (p=0.008 n=5+5)
CompareBytesDifferentLength-8   17.0ns ± 0%    10.5ns ± 0%   -38.24%  (p=0.008 n=5+5)
CompareBytesBigUnaligned-8      1.58ms ± 0%    0.19ms ± 0%   -88.31%  (p=0.016 n=4+5)
CompareBytesBig-8               1.59ms ± 0%    0.19ms ± 0%   -88.27%  (p=0.016 n=5+4)
CompareBytesBigIdentical-8      7.01ns ± 0%    6.60ns ± 3%    -5.91%  (p=0.008 n=5+5)

name                           old speed     new speed      delta
CompareBytesBigUnaligned-8     662MB/s ± 0%  5660MB/s ± 0%  +755.15%  (p=0.016 n=4+5)
CompareBytesBig-8              661MB/s ± 0%  5636MB/s ± 0%  +752.57%  (p=0.016 n=5+4)
CompareBytesBigIdentical-8     150TB/s ± 0%   159TB/s ± 3%    +6.27%  (p=0.008 n=5+5)

This is resubmit of CL90175.

Change-Id: Ie841daedb3123a68dd2554f27ebef0b3f8a855c2
Reviewed-on: https://go-review.googlesource.com/101635
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-04-11 14:54:59 +00:00
Ilya Tocar f8b28e28f8 internal/bytealg: remove dependency on runtime·support_avx2
Use internal/cpu instead.

Change-Id: I8670440389cbd88951fee61e352c4a10ac7eee6e
Reviewed-on: https://go-review.googlesource.com/102737
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-28 15:39:16 +00:00
Cherry Zhang e22d24131c Revert "bytes: add optimized Compare for arm64"
This reverts commit bfa8b6f8ff.

Reason for revert: This depends on another CL which is not yet submitted.

Change-Id: I50e7594f1473c911a2079fe910849a6694ac6c07
Reviewed-on: https://go-review.googlesource.com/101496
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-03-20 00:10:24 +00:00
fanzha02 bfa8b6f8ff bytes: add optimized Compare for arm64
Use LDP instructions to load 16 bytes per loop when the source length is long. Specially
process the 8 bytes length, 4 bytes length and 2 bytes length to get a better performance.

Benchmark result:
name                           old time/op   new time/op    delta
BytesCompare/1-8                21.0ns ± 0%    10.5ns ± 0%      ~     (p=0.079 n=4+5)
BytesCompare/2-8                11.5ns ± 0%    10.5ns ± 0%    -8.70%  (p=0.008 n=5+5)
BytesCompare/4-8                13.5ns ± 0%    10.0ns ± 0%   -25.93%  (p=0.008 n=5+5)
BytesCompare/8-8                28.8ns ± 0%     9.5ns ± 0%      ~     (p=0.079 n=4+5)
BytesCompare/16-8               40.5ns ± 0%    10.5ns ± 0%   -74.07%  (p=0.008 n=5+5)
BytesCompare/32-8               64.6ns ± 0%    12.5ns ± 0%   -80.65%  (p=0.008 n=5+5)
BytesCompare/64-8                112ns ± 0%      16ns ± 0%   -85.27%  (p=0.008 n=5+5)
BytesCompare/128-8               208ns ± 0%      24ns ± 0%   -88.22%  (p=0.008 n=5+5)
BytesCompare/256-8               400ns ± 0%      50ns ± 0%   -87.62%  (p=0.008 n=5+5)
BytesCompare/512-8               785ns ± 0%      82ns ± 0%   -89.61%  (p=0.008 n=5+5)
BytesCompare/1024-8             1.55µs ± 0%    0.14µs ± 0%      ~     (p=0.079 n=4+5)
BytesCompare/2048-8             3.09µs ± 0%    0.27µs ± 0%      ~     (p=0.079 n=4+5)
CompareBytesEqual-8             39.0ns ± 0%    12.0ns ± 0%   -69.23%  (p=0.008 n=5+5)
CompareBytesToNil-8             8.57ns ± 5%    8.23ns ± 2%    -3.99%  (p=0.016 n=5+5)
CompareBytesEmpty-8             7.37ns ± 0%    7.36ns ± 4%      ~     (p=0.690 n=5+5)
CompareBytesIdentical-8         7.39ns ± 0%    7.46ns ± 2%      ~     (p=0.667 n=5+5)
CompareBytesSameLength-8        17.0ns ± 0%    10.5ns ± 0%   -38.24%  (p=0.008 n=5+5)
CompareBytesDifferentLength-8   17.0ns ± 0%    10.5ns ± 0%   -38.24%  (p=0.008 n=5+5)
CompareBytesBigUnaligned-8      1.58ms ± 0%    0.19ms ± 0%   -88.31%  (p=0.016 n=4+5)
CompareBytesBig-8               1.59ms ± 0%    0.19ms ± 0%   -88.27%  (p=0.016 n=5+4)
CompareBytesBigIdentical-8      7.01ns ± 0%    6.60ns ± 3%    -5.91%  (p=0.008 n=5+5)

name                           old speed     new speed      delta
CompareBytesBigUnaligned-8     662MB/s ± 0%  5660MB/s ± 0%  +755.15%  (p=0.016 n=4+5)
CompareBytesBig-8              661MB/s ± 0%  5636MB/s ± 0%  +752.57%  (p=0.016 n=5+4)
CompareBytesBigIdentical-8     150TB/s ± 0%   159TB/s ± 3%    +6.27%  (p=0.008 n=5+5)

Change-Id: I70332de06f873df3bc12c4a5af1028307b670046
Reviewed-on: https://go-review.googlesource.com/90175
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-03-20 00:06:34 +00:00
Keith Randall 63bcabed49 internal/bytealg: fix arm64 Index function
Missed removing the argument loading from the indexbody function.

Change-Id: Ia1391231fc99771d00410a09fe80a09f08ceed02
Reviewed-on: https://go-review.googlesource.com/98575
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Elias Naur <elias.naur@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-03-05 15:38:29 +00:00
Keith Randall ee58eccc56 internal/bytealg: move short string Index implementations into bytealg
Also move the arm64 CountByte implementation while we're here.

Fixes #19792

Change-Id: I1e0fdf1e03e3135af84150a2703b58dad1b0d57e
Reviewed-on: https://go-review.googlesource.com/98518
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-04 19:49:44 +00:00
Keith Randall f6332bb84a internal/bytealg: move compare functions to bytealg
Move bytes.Compare and runtime·cmpstring to bytealg.

Update #19792

Change-Id: I139e6d7c59686bef7a3017e3dec99eba5fd10447
Reviewed-on: https://go-review.googlesource.com/98515
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-04 17:49:39 +00:00
Keith Randall 45964e4f9c internal/bytealg: move Count to bytealg
Move bytes.Count and strings.Count to bytealg.

Update #19792

Change-Id: I3e4e14b504a0b71758885bb131e5656e342cf8cb
Reviewed-on: https://go-review.googlesource.com/98495
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-04 17:49:25 +00:00
Keith Randall 1dfa380e3d internal/bytealg: move equal functions to bytealg
Move bytes.Equal, runtime.memequal, and runtime.memequal_varlen
to the bytealg package.

Update #19792

Change-Id: Ic4175e952936016ea0bda6c7c3dbb33afdc8e4ac
Reviewed-on: https://go-review.googlesource.com/98355
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-03 04:18:27 +00:00
Keith Randall 403ab0f221 internal/bytealg: move IndexByte asssembly to the new bytealg package
Move the IndexByte function from the runtime to a new bytealg package.
The new package will eventually hold all the optimized assembly for
groveling through byte slices and strings. It seems a better home for
this code than randomly keeping it in runtime.

Once this is in, the next step is to move the other functions
(Compare, Equal, ...).

Update #19792

This change seems complicated enough that we might just declare
"not worth it" and abandon.  Opinions welcome.

The core assembly is all unchanged, except minor modifications where
the code reads cpu feature bits.

The wrapper functions have been cleaned up as they are now actually
checked by vet.

Change-Id: I9fa75bee5d85db3a65b3fd3b7997e60367523796
Reviewed-on: https://go-review.googlesource.com/98016
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-02 22:46:15 +00:00