This implementation is based on the C function from
"Faster Population Counts Using AVX2 Instructions" paper, Figure 3,
available at https://arxiv.org/pdf/1612.07612.pdf
More details and benchmark results are available in the #46188.
Closes#46188
Change-Id: Ibce07f8f36f7c64f7022ce656f8efbec5dff3f82
Reviewed-on: https://go-review.googlesource.com/c/go/+/313829
Reviewed-by: Robert Griesemer <gri@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Robert Griesemer <gri@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
The Div functions in math/bits (Div, Div32, and Div64) compute both
quotients and remainders, but they panic if the quotients do not not
fit a 32/64 uint.
Since, on the other hand, the remainder will always fit the size of
the divisor, it is useful to have Div variants that only compute the
remainder, and don't panic on a quotient overflow.
This change adds to the math/bits package three new functions:
Rem(hi, lo, y uint) uint
Rem32(hi, lo, y uint32) uint32
Rem64(hi, lo, y uint64) uint64
which can be used to compute (hi,lo)%y even when the quotient
overflows the uint size.
Fixes#28970
Change-Id: I119948429f737670c5e5ceb8756121e6a738dbdc
Reviewed-on: https://go-review.googlesource.com/c/go/+/197838
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
When building as part of the bootstrap process, avoid
use of "go:linkname" applied to variables, since this
feature is ill-defined/unsupported for gccgo.
Updates #30771.
Change-Id: Id44d01b5c98d292702e5075674117518cb59e2d0
Reviewed-on: https://go-review.googlesource.com/c/go/+/170737
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Use ReverseBytes32 and ReverseBytes64 to speed up these functions.
The byte reversal functions are intrinsics on most platforms and
generally compile to a single instruction.
name old time/op new time/op delta
Reverse32 2.41ns ± 1% 1.94ns ± 3% -19.60% (p=0.000 n=20+19)
Reverse64 3.85ns ± 1% 2.56ns ± 1% -33.32% (p=0.000 n=17+19)
Change-Id: I160bf59a0c7bd5db94114803ec5a59fae448f096
Reviewed-on: https://go-review.googlesource.com/c/159358
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
TrailingZeros16 is the only one of the TrailingZeros functions with a
named return value in the signature. This creates a sligthly
unpleasant effect in the godoc listing:
func TrailingZeros(x uint) int
func TrailingZeros16(x uint16) (n int)
func TrailingZeros32(x uint32) int
func TrailingZeros64(x uint64) int
func TrailingZeros8(x uint8) int
Since the named return value is not even used, remove it.
Change-Id: I15c5aedb6157003911b6e0685c357ce56e466c0e
Reviewed-on: https://go-review.googlesource.com/c/153340
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Div panics when y<=hi because either the quotient overflows
the size of the output or division by zero occurs when y==0.
This provides a uniform behavior for all implementations.
Fixes#28316
Change-Id: If23aeb10e0709ee1a60b7d614afc9103d674a980
Reviewed-on: https://go-review.googlesource.com/c/149517
Reviewed-by: Robert Griesemer <gri@golang.org>
Explicitly check for divide-by-zero/overflow and panic with the appropriate
runtime error. The additional checks have basically no effect on performance
since the branch is easily predicted.
name old time/op new time/op delta
Div-4 53.9ns ± 1% 53.0ns ± 1% -1.59% (p=0.016 n=4+5)
Div32-4 17.9ns ± 0% 18.4ns ± 0% +2.56% (p=0.008 n=5+5)
Div64-4 53.5ns ± 0% 53.3ns ± 0% ~ (p=0.095 n=5+5)
Updates #28316
Change-Id: I36297ee9946cbbc57fefb44d1730283b049ecf57
Reviewed-on: https://go-review.googlesource.com/c/144377
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Port math/big pure go versions of add-with-carry, subtract-with-borrow,
full-width multiply, and full-width divide.
Updates #24813
Change-Id: Ifae5d2f6ee4237137c9dcba931f69c91b80a4b1c
Reviewed-on: https://go-review.googlesource.com/123157
Reviewed-by: Robert Griesemer <gri@golang.org>
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Removes an extra function call for TrailingZeroes and thus may
increase chances for inlining.
Change-Id: Iefd8d4402dc89b64baf4e5c865eb3dadade623af
Reviewed-on: https://go-review.googlesource.com/37613
Run-TryBot: Robert Griesemer <gri@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
- moved from: x&m>>k | x&^m<<k to: x&m>>k | x<<k&m
This permits use of the same constant m twice (*) which may be
better for machines that can't use large immediate constants
directly with an AND instruction and have to load them explicitly.
*) CPUs don't usually have a &^ instruction, so x&^m becomes x&(^m)
- simplified returns
This improves the generated code because the compiler recognizes
x>>k | x<<k as ROT when k is the bitsize of x.
The 8-bit versions of these instructions can be significantly faster
still if they are replaced with table lookups, as long as the table
is in cache. If the table is not in cache, table-lookup is probably
slower, hence the choice of an explicit register-only implementation
for now.
BenchmarkReverse-8 8.50 6.86 -19.29%
BenchmarkReverse8-8 2.17 1.74 -19.82%
BenchmarkReverse16-8 2.89 2.34 -19.03%
BenchmarkReverse32-8 3.55 2.95 -16.90%
BenchmarkReverse64-8 6.81 5.57 -18.21%
BenchmarkReverseBytes-8 3.49 2.48 -28.94%
BenchmarkReverseBytes16-8 0.93 0.62 -33.33%
BenchmarkReverseBytes32-8 1.55 1.13 -27.10%
BenchmarkReverseBytes64-8 2.47 2.47 +0.00%
Reverse-8 8.50ns ± 0% 6.86ns ± 0% ~ (p=1.000 n=1+1)
Reverse8-8 2.17ns ± 0% 1.74ns ± 0% ~ (p=1.000 n=1+1)
Reverse16-8 2.89ns ± 0% 2.34ns ± 0% ~ (p=1.000 n=1+1)
Reverse32-8 3.55ns ± 0% 2.95ns ± 0% ~ (p=1.000 n=1+1)
Reverse64-8 6.81ns ± 0% 5.57ns ± 0% ~ (p=1.000 n=1+1)
ReverseBytes-8 3.49ns ± 0% 2.48ns ± 0% ~ (p=1.000 n=1+1)
ReverseBytes16-8 0.93ns ± 0% 0.62ns ± 0% ~ (p=1.000 n=1+1)
ReverseBytes32-8 1.55ns ± 0% 1.13ns ± 0% ~ (p=1.000 n=1+1)
ReverseBytes64-8 2.47ns ± 0% 2.47ns ± 0% ~ (all samples are equal)
Change-Id: I0064de8c7e0e568ca7885d6f7064344bef91a06d
Reviewed-on: https://go-review.googlesource.com/37215
Run-TryBot: Robert Griesemer <gri@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>