mapiterinit allows external linkname. These users must allocate their
own iter struct for initialization by mapiterinit. Since the type is
unexported, they also must define the struct themselves. As a result,
they of course define the struct matching the old hiter definition (in
map_noswiss.go).
The old definition is smaller on 32-bit platforms. On those platforms,
mapiternext will clobber memory outside of the caller's allocation.
On all platforms, the pointer layout between the old hiter and new
maps.Iter does not match. Thus the GC may miss pointers and free
reachable objects early, or it may see non-pointers that look like heap
pointers and throw due to invalid references to free objects.
To avoid these issues, we must keep mapiterinit and mapiternext with the
old hiter definition. The most straightforward way to do this is to use
mapiterinit and mapiternext as a compatibility layer between the old and
new iter types.
The first step to that is to move normal map use off of these functions,
which is what this CL does.
Introduce new mapIterStart and mapIterNext functions that replace the
former functions everywhere in the toolchain. These have the same
behavior as the old functions.
This CL temporarily makes the old functions throw to ensure we don't
have hidden dependencies on them. We cannot remove them entirely because
GOEXPERIMENT=noswissmap still uses the old names, and internal/goobj
requires all builtins to exist regardless of GOEXPERIMENT. The next CL
will introduce the compatibility layer.
I want to avoid using linkname between runtime and reflect, as that
would also allow external linknames. So mapIterStart and mapIterNext are
duplicated in reflect, which can be done trivially, as it imports
internal/runtime/maps.
For #71408.
Change-Id: I6a6a636c6d4bd1392618c67ca648d3f061afe669
Reviewed-on: https://go-review.googlesource.com/c/go/+/643898
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Re-enable tests for stack-allocated maps and fast map accessors.
Those are implemented now.
Update #54766
Change-Id: I8c019702bd9fb077b2fe3f7c78e8e9e10d2263a6
Reviewed-on: https://go-review.googlesource.com/c/go/+/642376
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
We can't coalesce a non-WB store with a subsequent Move, as the
result of the store might be the source of the move.
There's a simple codegen test. Not sure how we might do a real test,
as all the repro's I've come up with are very expensive and unreliable.
Fixes#71228
Change-Id: If18bf181a266b9b90964e2591cd2e61a7168371c
Reviewed-on: https://go-review.googlesource.com/c/go/+/642197
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Remove the OpLocalAddrs that are unnecessary in the CSE pass, so the
following passes like DSE and memcombine can do its work better.
Fixes#70300
Change-Id: I600025d49eeadb3ca4f092d614428399750f69bc
Reviewed-on: https://go-review.googlesource.com/c/go/+/628075
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
This came up in some swissmap code.
Change-Id: I3c6705a5cafec8cb4953dfa9535cf0b45255cc83
Reviewed-on: https://go-review.googlesource.com/c/go/+/629516
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
Fold constant int16 addends for usages of math/bits.Add64(x,const,0)
on PPC64. This usage shows up in a few crypto implementations;
notably the go wrapper for CL 626176.
Change-Id: I6963163330487d04e0479b4fdac235f97bb96889
Reviewed-on: https://go-review.googlesource.com/c/go/+/625899
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Emmanuel Odeke <emmanuel@orijtech.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
For the SubFromLen64 codegen test case to work as intended, we need
to fold c-(-(x-d)) into x+(c-d).
Still, some instances of LeadingZeros are not optimized into single
CLZ instructions right now (actually, the LeadingZeros micro-benchmarks
are currently still compiled with redundant adds/subs of 64, due to
interference of loop optimizations before lowering), but perf numbers
indicate it's not that bad after all.
Micro-benchmark results on Loongson 3A5000 and 3A6000:
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A5000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
LeadingZeros 3.660n ± 0% 1.348n ± 0% -63.17% (p=0.000 n=20)
LeadingZeros8 1.777n ± 0% 1.767n ± 0% -0.56% (p=0.000 n=20)
LeadingZeros16 2.816n ± 0% 1.770n ± 0% -37.14% (p=0.000 n=20)
LeadingZeros32 5.293n ± 1% 1.683n ± 0% -68.21% (p=0.000 n=20)
LeadingZeros64 3.622n ± 0% 1.349n ± 0% -62.76% (p=0.000 n=20)
geomean 3.229n 1.571n -51.35%
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A6000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
LeadingZeros 2.410n ± 0% 1.103n ± 1% -54.23% (p=0.000 n=20)
LeadingZeros8 1.236n ± 0% 1.501n ± 0% +21.44% (p=0.000 n=20)
LeadingZeros16 2.106n ± 0% 1.501n ± 0% -28.73% (p=0.000 n=20)
LeadingZeros32 2.860n ± 0% 1.324n ± 0% -53.72% (p=0.000 n=20)
LeadingZeros64 2.6135n ± 0% 0.9509n ± 0% -63.62% (p=0.000 n=20)
geomean 2.159n 1.256n -41.81%
Updates #59120
This patch is a copy of CL 483356.
Co-authored-by: WANG Xuerui <git@xen0n.name>
Change-Id: Iee81a17f7da06d77a427e73dfcc016f2b15ae556
Reviewed-on: https://go-review.googlesource.com/c/go/+/624575
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
For #69635
Change-Id: Id5696dc9724c3b3afcd7b60a6994f98c5309eb0e
Reviewed-on: https://go-review.googlesource.com/c/go/+/621755
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Change-Id: I03ba70076d6dd3c0b9624d14699b7dd91a3c0e9b
Reviewed-on: https://go-review.googlesource.com/c/go/+/618476
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com>
CL 621357 introduced new generic lowering rules which caused
several shift related codegen test failures.
Add new rules to fix the test regressions, and cleanup tests
which are changed but not regressed. Some CLRLSLDI tests are
removed as they are no test CLRLSLDI rules.
Fixes#70003
Change-Id: I1ecc5a7e63ab709a4a0cebf11fa078d5cf164034
Reviewed-on: https://go-review.googlesource.com/c/go/+/622236
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
The spill/restore code around morestack is almost never exectued, so
we should make it as small as possible. Using 2-register loads/stores
makes sense here. Also, the offsets from SP are pretty small so the
offset almost always fits in the (smaller than a normal load/store)
offset field of the instruction.
Makes cmd/go 0.6% smaller.
Change-Id: I8845283c1b269a259498153924428f6173bda293
Reviewed-on: https://go-review.googlesource.com/c/go/+/621556
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
If the expression type is a single compile-time known type, use that
type instead of the dynamic one, so the later passes of the compiler
could skip un-necessary runtime calls.
Thanks Youlin Feng for writing the original test case.
Change-Id: I3f65ab90f041474a9731338a82136c1d394c1773
Reviewed-on: https://go-review.googlesource.com/c/go/+/616975
Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Change-Id: I7d996b8d46fbeef933943f806052a30f1f8d50c3
Reviewed-on: https://go-review.googlesource.com/c/go/+/588836
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: Tim King <taking@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
This CL optimizes the compilation of string-to-bytes conversion in the
case of string additions.
Fixes#62407
Change-Id: Ic47df758478e5d061880620025c4ec7dbbff8a64
Reviewed-on: https://go-review.googlesource.com/c/go/+/527935
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Tim King <taking@google.com>
When GORISCV64 enables rva22u64, combined shift and addition using the
SH1ADD, SH2ADD and SH3ADD instructions that are available via the Zba
extension. This results in more than 2000 instructions being removed
from the Go binary on riscv64.
Change-Id: Ia62ae7dda3d8083cff315113421bee73f518eea8
Reviewed-on: https://go-review.googlesource.com/c/go/+/606636
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
No point in keeping values in registers when their next use is after
a call, as we'd have to spill/restore them anyway.
cmd/go is 0.1% smaller.
Fixes#59297
Change-Id: I10ee761d0d23229f57de278f734c44d6a8dccd6c
Reviewed-on: https://go-review.googlesource.com/c/go/+/509255
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
This can be done efficiently with few instructions.
This also adds MULHDUCC for further codegen improvement.
Change-Id: I06320ba4383a679341b911a237a360ef07b19168
Reviewed-on: https://go-review.googlesource.com/c/go/+/605975
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Archana Ravindar <aravinda@redhat.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
After newobject, we don't need to write zeroes to initialize the
object. It has already been zeroed by the allocator.
This is already handled in most cases, but because we run builtin
decomposition after the opt pass, we don't handle cases where the zero
of a compound builtin is being written. Improve the zero detector to
handle those cases.
Fixes#68845
Change-Id: If3dde2e304a05e5a6a6723565191d5444b334bcc
Reviewed-on: https://go-review.googlesource.com/c/go/+/605255
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Auto-Submit: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Get rid of TODO in prove pass.
We currently avoid marking shifts of constants as bounded, where
bounded means we don't have to worry about <0 or >=bitwidth shifts.
We do this because it causes different rule applications during lowering
which cause some codegen tests to fail.
Add some new rules which ensure that we get the right final instruction
sequence regardless of the ordering. Then we can remove this special case.
Change-Id: I4e962d4f09992b42ab47e123de5ded3b8b8fb205
Reviewed-on: https://go-review.googlesource.com/c/go/+/602935
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
We don't need noLimit checks in a bunch of places.
Also simplify folding of provable constant results.
At this point in the CL stack, compilebench reports no performance
changes. The only thing of note is that binaries got a bit smaller.
name old text-bytes new text-bytes delta
HelloSize 960kB ± 0% 952kB ± 0% -0.83% (p=0.000 n=10+10)
CmdGoSize 12.3MB ± 0% 12.1MB ± 0% -1.53% (p=0.000 n=10+10)
Change-Id: Id4be75eec0f8c93f2f3b93a8521ce2278ee2ee2c
Reviewed-on: https://go-review.googlesource.com/c/go/+/599197
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Handles a lot more cases where constant ranges can eliminate
various (mostly bounds failure) paths.
Fixes#66826Fixes#66692Fixes#48213
Update #57959
TODO: remove constant logic from poset code, no longer needed.
Change-Id: Id196436fcd8a0c84c7d59c04f93bd92e26a0fd7e
Reviewed-on: https://go-review.googlesource.com/c/go/+/599096
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Temporary measure to reduce the required MVP code.
For #54766.
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest-swissmap
Change-Id: I44dc8acd0dc8280c6beb40451998e84bc85c238a
Reviewed-on: https://go-review.googlesource.com/c/go/+/580915
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
x86 is better at storing constant ints than constant floats.
(It uses a constant directly in the instruction stream, instead of
loading it from a constant global memory.)
Noticed as part of #67957
Change-Id: I9b7b586ad8e0fe9ce245324f020e9526f82b209d
Reviewed-on: https://go-review.googlesource.com/c/go/+/592596
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
RLIWNM does not clear the upper 32 bits of the target register if
the mask wraps around (e.g 0xF000000F). Don't elide MOVWZreg for
such masks. All other usage clears the upper 32 bits.
Fixes#67844.
Change-Id: I11b89f1da9ae077624369bfe2bf25e9b7c9b79bc
Reviewed-on: https://go-review.googlesource.com/c/go/+/590896
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Change-Id: I51e9832317e5dee1e3fe0772e7592b3dae95a625
Reviewed-on: https://go-review.googlesource.com/c/go/+/586797
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
The rotate value was not correctly converted from a 64 bit to 32
bit rotate. This caused a miscompile of
golang.org/x/text/unicode/runenames.Names.
Fixes#67526
Change-Id: Ief56fbab27ccc71cd4c01117909bfee7f60a2ea1
Reviewed-on: https://go-review.googlesource.com/c/go/+/586915
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Avoid creating duplicate usages of ANDCCconst. This is preparation for
a patch to reintroduce ANDconst to simplify the lower pass while
treating ANDCCconst like other *CC* ssa opcodes.
Also, move many of the similar rules wich retarget ANDCCconst users
to the flag result to a common rule for all compares against zero.
Change-Id: Ida86efe17ff413cb82c349d8ef69d2899361f4c0
Reviewed-on: https://go-review.googlesource.com/c/go/+/585400
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Investigating binaries, these patterns seem to show up frequently.
Change-Id: I987251e4070e35c25e98da321e444ccaa1526912
Reviewed-on: https://go-review.googlesource.com/c/go/+/583302
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
The results of cmpstring are reuseable if the second call has the
same arguments and memory.
Note that this gets rid of cmpstring, but we still generate a
redundant </<= test and branch afterwards, because the compiler
doesn't know that cmpstring only ever returns -1,0,1.
Update #61725
Change-Id: I93a0d1ccca50d90b1e1a888240ffb75a3b10b59b
Reviewed-on: https://go-review.googlesource.com/c/go/+/578835
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Any normal float32 constant can be generated by this instruction;
use xxspltidp when possible. This prefixed instruction is much
faster than the two instruction load sequence from the
float32/float64 constant pool.
Change-Id: Id751d9ffdae71463adbde66427b986f0b2ef74c2
Reviewed-on: https://go-review.googlesource.com/c/go/+/575555
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Paul Murphy <murp@ibm.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
For #66585
Change-Id: Iddc407e3ef4c3b6ecf5173963b66b3e65e43c92d
Reviewed-on: https://go-review.googlesource.com/c/go/+/575336
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
This enables efficient use of the builtin min/max function
for float64 and float32 types on GOPPC64 >= power9.
Extend the assembler to support xsminjdp/xsmaxjdp and use
them to implement float min/max.
Simplify the VSX xx3 opcode rules to allow FPR arguments,
if all arguments are an FPR.
Change-Id: I15882a4ce5dc46eba71d683cf1d184dc4236a328
Reviewed-on: https://go-review.googlesource.com/c/go/+/574535
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Paul Murphy <murp@ibm.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Remove dynamic checks for atomic instructions for ARM64 targets that support LSE extension.
For #66131
Change-Id: I0ec1b183a3f4ea4c8a537430646e6bc4b4f64271
Reviewed-on: https://go-review.googlesource.com/c/go/+/569536
Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Fannie Zhang <Fannie.Zhang@arm.com>
Reviewed-by: Shu-Chun Weng <scw@google.com>
Constant bools are like constant 1-byte values, they memcombine just fine.
(There are still trickier cases that this pass doesn't catch
yet, see TODO at memcombine.go:503.)
Fixes#66413
Change-Id: Ia67cf72ed1c416e27ac22da443bd88a3f09a6cc8
Reviewed-on: https://go-review.googlesource.com/c/go/+/573416
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Joseph Tsai <joetsai@digital-static.net>
Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
This usage shows up in quite a few places, and helps reduce
register pressure in several complex cryto functions by
removing a MOVD $0,... instruction.
Change-Id: I9444ea8f9d19bfd68fb71ea8dc34e109681b3802
Reviewed-on: https://go-review.googlesource.com/c/go/+/571055
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
For GOPPC64 < 10 targets, most large 32 bit constants (those
exceeding int16 capacity) can be added using two instructions
instead of 3.
This cannot be done for values greater than 0x7FFF7FFF, so this
must be done during asm preprocessing as the optab matching
rules cannot differentiate this special case.
Likewise, constants 0x8000 <= x < 0x10000 are not converted. The
assembler currently generates 2 instructions sequences for these
constants.
Change-Id: I1ccc839c6c28fc32f15d286b2e52e2d22a2a06d4
Reviewed-on: https://go-review.googlesource.com/c/go/+/568116
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Provide and use rotation pseudo-instructions for riscv64. The RISC-V bitmanip
extension adds support for hardware rotation instructions in the form of ROL,
ROLW, ROR, RORI, RORIW and RORW. These are easily implemented in the assembler
as pseudo-instructions for CPUs that do not support the bitmanip extension.
This approach provides a number of advantages, including reducing the rewrite
rules needed in the compiler, simplifying codegen tests and most importantly,
allowing these instructions to be used in assembly (for example, riscv64
optimised versions of SHA-256 and SHA-512). When bitmanip support is added,
these instruction sequences can simply be replaced with a single instruction
if permitted by the GORISCV64 profile.
Change-Id: Ia23402e1a82f211ac760690deb063386056ae1fa
Reviewed-on: https://go-review.googlesource.com/c/go/+/565015
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: M Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Run-TryBot: Joel Sing <joel@sing.id.au>
Currently we use pointer equality on types when deciding whether we can
reuse a stack slot. That's too strict, as we don't guarantee pointer
equality for the same type. In particular, it can vary based on whether
PtrTo has been called in the frontend or not.
Instead, use the type's LinkString, which is guaranteed to both be
unique for a type, and to not vary given two different type structures
describing the same type.
Update #65783
Change-Id: I64f55138475f04bfa30cfb819b786b7cc06aebe4
Reviewed-on: https://go-review.googlesource.com/c/go/+/565436
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Enable canRotate for riscv64, enable rotation intrinsics and provide
better rewrite implementations for rotations. By avoiding Lsh*x64
and Rsh*Ux64 we can produce better code, especially for 32 and 64
bit rotations. By enabling canRotate we also benefit from the generic
rotation rewrite rules.
Benchmark on a StarFive VisionFive 2:
│ rotate.1 │ rotate.2 │
│ sec/op │ sec/op vs base │
RotateLeft-4 14.700n ± 0% 8.016n ± 0% -45.47% (p=0.000 n=10)
RotateLeft8-4 14.70n ± 0% 10.69n ± 0% -27.28% (p=0.000 n=10)
RotateLeft16-4 14.70n ± 0% 12.02n ± 0% -18.23% (p=0.000 n=10)
RotateLeft32-4 13.360n ± 0% 8.016n ± 0% -40.00% (p=0.000 n=10)
RotateLeft64-4 13.360n ± 0% 8.016n ± 0% -40.00% (p=0.000 n=10)
geomean 14.15n 9.208n -34.92%
Change-Id: I1a2036fdc57cf88ebb6617eb8d92e1d187e183b2
Reviewed-on: https://go-review.googlesource.com/c/go/+/560315
Reviewed-by: M Zhuo <mengzhuo1203@gmail.com>
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
As CL 514596 and CL 514775 adds hardware implement of float
max/min, we should add codegen test for these two CL.
Change-Id: I347331032fe9f67a2e6fdb5d3cfe20203296b81c
Reviewed-on: https://go-review.googlesource.com/c/go/+/561295
Reviewed-by: Joel Sing <joel@sing.id.au>
TryBot-Result: Gopher Robot <gobot@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: M Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
This commit is aimed at improving the readability and consistency
of the code base. Extraneous newline characters were present after
some return statements, creating unnecessary separation in the code.
Fixes#64610
Change-Id: Ic1b05bf11761c4dff22691c2f1c3755f66d341f7
Reviewed-on: https://go-review.googlesource.com/c/go/+/548316
Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
The code generation on riscv64 will currently result in incorrect
assembly when a 32 bit integer is right shifted by an amount that
exceeds the size of the type. In particular, this occurs when an
int32 or uint32 is cast to a 64 bit type and right shifted by a
value larger than 31.
Fix this by moving the SRAW/SRLW conversion into the right shift
rules and removing the SignExt32to64/ZeroExt32to64. Add additional
rules that rewrite to SRAIW/SRLIW when the shift is less than the
size of the type, or replace/eliminate the shift when it exceeds
the size of the type.
Add SSA tests that would have caught this issue. Also add additional
codegen tests to ensure that the resulting assembly is what we
expect in these overflow cases.
Fixes#64285
Change-Id: Ie97b05668597cfcb91413afefaab18ee1aa145ec
Reviewed-on: https://go-review.googlesource.com/c/go/+/545035
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: M Zhuo <mzh@golangcn.org>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gopher Robot <gobot@golang.org>
The shift amounts were wrong in this case, leading to miscompilation
of load combining.
Also the store combining was not triggering when it should.
Fixes#64468
Change-Id: Iaeb08972c5fc1d6f628800334789c6af7216e87b
Reviewed-on: https://go-review.googlesource.com/c/go/+/546355
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
order.go ensures expressions that are passed to the runtime by address
are in fact addressable. However, in the case of local variables, if the
variable hasn't already been marked as addrtaken, then taking its
address here will effectively prevent the variable from being converted
to SSA form.
Instead, it's better to just copy the variable into a new temporary,
which we can pass by address instead. This ensures the original variable
can still be converted to SSA form.
Fixes#63332.
Change-Id: I182376d98d419df8bf07c400d84c344c9b82c0fb
Reviewed-on: https://go-review.googlesource.com/c/go/+/541715
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Matthew Dempsky <mdempsky@google.com>
Generate the CC version of many opcodes whose result is compared against
signed 0. The approach taken here works even if the opcode result is used in
multiple places too.
Add support for ADD, ADDconst, ANDN, SUB, NEG, CNTLZD, NOR conversions
to their CC opcode variant. These are the most commonly used variants.
Also, do not set clobberFlags of CNTLZD and CNTLZW, they do not clobber
flags.
This results in about 1% smaller text sections in kubernetes binaries,
and no regressions in the crypto benchmarks.
Change-Id: I9e0381944869c3774106bf348dead5ecb96dffda
Reviewed-on: https://go-review.googlesource.com/c/go/+/538636
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Jayanth Krishnamurthy <jayanth.krishnamurthy@ibm.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
This optab matching rule was used to match signed 16 bit values shifted
left by 16 bits. Unsigned 16 bit values greater than 0x7FFF<<16 were
classified as C_U32CON which led to larger than necessary codegen.
Instead, rewrite logical/arithmetic operations in the preprocessor pass
to use the 16 bit shifted immediate operation (e.g ADDIS vs ADD). This
simplifies the optab matching rules, while also minimizing codegen size
for large unsigned values.
Note, ADDIS sign-extends the constant argument, all others do not.
For matching opcodes, this means:
MOVD $is<<16,Rx becomes ADDIS $is,Rx or ORIS $is,Rx
MOVW $is<<16,Rx becomes ADDIS $is,Rx
ADD $is<<16,[Rx,]Ry becomes ADDIS $is[Rx,]Ry
OR $is<<16,[Rx,]Ry becomes ORIS $is[Rx,]Ry
XOR $is<<16,[Rx,]Ry becomes XORIS $is[Rx,]Ry
Change-Id: I1a988d9f52517a04bb8dc2e41d7caf3d5fff867c
Reviewed-on: https://go-review.googlesource.com/c/go/+/536735
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
The compiler is currently sign extending 32 bit signed integers to
64 bits before right shifting them using a 64 bit shift instruction.
There's no need to do this as RISC-V has instructions for right
shifting 32 bit signed values (sraw and sraiw) which sign extend
the result of the shift to 64 bits. Change the compiler so that
it uses sraw and sraiw for shifts of signed 32 bit integers reducing
in most cases the number of instructions needed to perform the shift.
Here are some examples of code sequences that are changed by this
patch:
int32(a) >> 2
before:
sll x5,x10,0x20
sra x10,x5,0x22
after:
sraw x10,x10,0x2
int32(v) >> int(s)
before:
sext.w x5,x10
sltiu x6,x11,64
add x6,x6,-1
or x6,x11,x6
sra x10,x5,x6
after:
sltiu x5,x11,32
add x5,x5,-1
or x5,x11,x5
sraw x10,x10,x5
int32(v) >> (int(s) & 31)
before:
sext.w x5,x10
and x6,x11,63
sra x10,x5,x6
after:
and x5,x11,31
sraw x10,x10,x5
int32(100) >> int(a)
before:
bltz x10,<target address calls runtime.panicshift>
sltiu x5,x10,64
add x5,x5,-1
or x5,x10,x5
li x6,100
sra x10,x6,x5
after:
bltz x10,<target address calls runtime.panicshift>
sltiu x5,x10,32
add x5,x5,-1
or x5,x10,x5
li x6,100
sraw x10,x6,x5
int32(v) >> (int(s) & 63)
before:
sext.w x5,x10
and x6,x11,63
sra x10,x5,x6
after:
and x5,x11,63
sltiu x6,x5,32
add x6,x6,-1
or x5,x5,x6
sraw x10,x10,x5
In most cases we eliminate one instruction. In the case where
we shift a int32 constant by a variable the number of instructions
generated is identical. A sra is simply replaced by a sraw. In the
unusual case where we shift right by a variable anded with a constant
> 31 but < 64, we generate two additional instructions. As this is
an unusual case we do not try to optimize for it.
Some improvements can be seen in some of the existing benchmarks,
notably in the utf8 package which performs right shifts of runes
which are signed 32 bit integers.
| utf8-old | utf8-new |
| sec/op | sec/op vs base |
EncodeASCIIRune-4 17.68n ± 0% 17.67n ± 0% ~ (p=0.312 n=10)
EncodeJapaneseRune-4 35.34n ± 0% 34.53n ± 1% -2.31% (p=0.000 n=10)
AppendASCIIRune-4 3.213n ± 0% 3.213n ± 0% ~ (p=0.318 n=10)
AppendJapaneseRune-4 36.14n ± 0% 35.35n ± 0% -2.19% (p=0.000 n=10)
DecodeASCIIRune-4 28.11n ± 0% 27.36n ± 0% -2.69% (p=0.000 n=10)
DecodeJapaneseRune-4 38.55n ± 0% 38.58n ± 0% ~ (p=0.612 n=10)
Change-Id: I60a91cbede9ce65597571c7b7dd9943eeb8d3cc2
Reviewed-on: https://go-review.googlesource.com/c/go/+/535115
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: M Zhuo <mzh@golangcn.org>
Reviewed-by: David Chase <drchase@google.com>
Most of the test cases in the test directory use the new go:build syntax
already. Convert the rest. In general, try to place the build constraint
line below the test directive comment in more places.
For #41184.
For #60268.
Change-Id: I11c41a0642a8a26dc2eda1406da908645bbc005b
Cq-Include-Trybots: luci.golang.try:gotip-linux-386-longtest,gotip-linux-amd64-longtest,gotip-windows-amd64-longtest
Reviewed-on: https://go-review.googlesource.com/c/go/+/536236
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Commit 061d77cb70 was published in parallel with another commit
36ecff0893 which changed how certain constants were generated.
Update the test to account for the changes.
Change-Id: I314b735a34857efa02392b7a0dd9fd634e4ee428
Reviewed-on: https://go-review.googlesource.com/c/go/+/536256
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Paul Murphy <murp@ibm.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
This is only supported power10/linux/PPC64. This generates smaller,
faster code by merging a pli + add into paddi.
Change-Id: I1f4d522fce53aea4c072713cc119a9e0d7065acc
Reviewed-on: https://go-review.googlesource.com/c/go/+/531717
Run-TryBot: Paul Murphy <murp@ibm.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
In the PPC64 ISA, the instruction to do an 'and' operation
using an immediate constant is only available in the form that
also sets CR0 (i.e. clobbers the condition register.) This means
CR0 is being clobbered unnecessarily in many cases. That
affects some decisions made during some compiler passes
that check for it.
In those cases when the constant used by the ANDCC is a right
justified consecutive set of bits, a shift instruction can
be used which has the same effect if CR0 does not need to be
set. The rule to do that has been added to the late rules file
after other rules using ANDCCconst have been processed in the
main rules file.
Some codegen tests had to be updated since ANDCC is no
longer generated for some cases. A new test case was added to
verify the ANDCC is present if the results for both the AND
and CR0 are used.
Change-Id: I304f607c039a458e2d67d25351dd00aea72ba542
Reviewed-on: https://go-review.googlesource.com/c/go/+/531435
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Paul Murphy <murp@ibm.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Jayanth Krishnamurthy <jayanth.krishnamurthy@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
var p *[2]uint32 = ...
p[0] = 0
p[1] = 0
When we combine these two 32-bit stores into a single 64-bit store,
use the line number of the first store, not the second one.
This differs from the default behavior because usually with the combining
that the compiler does, we use the line number of the last instruction
in the combo (e.g. load+add, we use the line number of the add).
This is the same behavior that gcc does in C (picking the line
number of the first of a set of combined stores).
Change-Id: Ie70bf6151755322d33ecd50e4d9caf62f7881784
Reviewed-on: https://go-review.googlesource.com/c/go/+/521678
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
It is one less dependent load away, and right next to another
field in the itab we also load as part of the type switch or
type assert.
Change-Id: If7aaa7814c47bd79a6c7ed4232ece0bc1d63550e
Reviewed-on: https://go-review.googlesource.com/c/go/+/533117
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
This is the last of the getitab users to receive a cache.
We should now no longer see getitab (and callees) in profiles.
Hopefully.
Change-Id: I2ed72b9943095bbe8067c805da7f08e00706c98c
Reviewed-on: https://go-review.googlesource.com/c/go/+/531055
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Convert expand calls into a smaller number of focused
recursive rewrites, and rely on an enhanced version of
"decompose" to clean up afterwards.
Debugging information seems to emerge intact.
Change-Id: Ic46da4207e3a4da5c8e2c47b637b0e35abbe56bb
Reviewed-on: https://go-review.googlesource.com/c/go/+/507295
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
That way we don't need to call into the runtime for every
type assertion (to an interface type).
name old time/op new time/op delta
TypeAssert-24 3.78ns ± 3% 1.00ns ± 1% -73.53% (p=0.000 n=10+8)
Change-Id: I0ba308aaf0f24a5495b4e13c814d35af0c58bfde
Reviewed-on: https://go-review.googlesource.com/c/go/+/529316
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
That way we don't need to call into the runtime when the type being
switched on has been seen many times before.
The cache is just a hash table of a sample of all the concrete types
that have been switched on at that source location. We record the
matching case number and the resulting itab for each concrete input
type.
The caches seldom get large. The only two in a run of all.bash that
get more than 100 entries, even with the sampling rate set to 1, are
test/fixedbugs/issue29264.go, with 101
test/fixedbugs/issue29312.go, with 254
Both happen at the type switch in fmt.(*pp).handleMethods, perhaps
unsurprisingly.
name old time/op new time/op delta
SwitchInterfaceTypePredictable-24 25.8ns ± 2% 2.5ns ± 3% -90.43% (p=0.000 n=10+10)
SwitchInterfaceTypeUnpredictable-24 37.5ns ± 2% 11.2ns ± 1% -70.02% (p=0.000 n=10+10)
Change-Id: I4961ac9547b7f15b03be6f55cdcb972d176955eb
Reviewed-on: https://go-review.googlesource.com/c/go/+/526658
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Keith Randall <khr@google.com>
For type switches where the targets are interface types,
call into the runtime once instead of doing a sequence
of assert* calls.
name old time/op new time/op delta
SwitchInterfaceTypePredictable-24 26.6ns ± 1% 25.8ns ± 2% -2.86% (p=0.000 n=10+10)
SwitchInterfaceTypeUnpredictable-24 39.3ns ± 1% 37.5ns ± 2% -4.57% (p=0.000 n=10+10)
Not super helpful by itself, but this code organization allows
followon CLs that add caching to the lookups.
Change-Id: I7967f85a99171faa6c2550690e311bea8b54b01c
Reviewed-on: https://go-review.googlesource.com/c/go/+/526657
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
Add a new form of RLDC which maps directly to the ISA definition
of rldc: RLDC Rs, $sh, $mb, Ra. This is used to generate mask
constants described below.
Using MOVD $-1, Rx; RLDC Rx, $sh, $mb, Rx, any mask constant
can be generated. A mask is a contiguous series of 1 bits, which
may wrap.
Change-Id: Ifcaae1114080ad58b5fdaa3e5fc9019e2051f282
Reviewed-on: https://go-review.googlesource.com/c/go/+/531120
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Check for shifted 16b constants, and transform them to avoid the load
penalty. This should be much faster than loading, and reduce binary
size by reducing the constant pool size.
Change-Id: I6834e08be7ca88e3b77449d226d08d199db84299
Reviewed-on: https://go-review.googlesource.com/c/go/+/531119
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
This sequence can show up in the lowering pass on PPC64. If it
makes it to the latelower pass, it will cause an error because
it looks like it can be turned into RLDICL, but -1 isn't an
accepted mask.
Also, print more debug info if panic is called from
encodePPC64RotateMask.
Fixes#62698
Change-Id: I0f3322e2205357abe7fc28f96e05e3f7ad65567c
Reviewed-on: https://go-review.googlesource.com/c/go/+/529195
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
sparse conditional constant propagation can discover optimization
opportunities that cannot be found by just combining constant folding
and constant propagation and dead code elimination separately.
This is a re-submit of PR#59575, which fix a broken dominance relationship caught by ssacheck
Updates https://github.com/golang/go/issues/59399
Change-Id: I57482dee38f8e80a610aed4f64295e60c38b7a47
GitHub-Last-Rev: 830016f24e
GitHub-Pull-Request: golang/go#60469
Reviewed-on: https://go-review.googlesource.com/c/go/+/498795
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Generate RLDIC[LR] instead of MOVD mask, Rx; AND Rx, Ry, Rz.
This helps reduce code size, and reduces the latency caused
by the constant load.
Similarly, for smaller-than-register values, truncate constants
which exceed the range of the value's type to avoid needing to
load a constant.
Change-Id: I6019684795eb8962d4fd6d9585d08b17c15e7d64
Reviewed-on: https://go-review.googlesource.com/c/go/+/515576
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
For large interface -> concrete type switches, we can use a jump
table on some bits of the type hash instead of a binary search on
the type hash.
name old time/op new time/op delta
SwitchTypePredictable-24 1.99ns ± 2% 1.78ns ± 5% -10.87% (p=0.000 n=10+10)
SwitchTypeUnpredictable-24 11.0ns ± 1% 9.1ns ± 2% -17.55% (p=0.000 n=7+9)
Change-Id: Ida4768e5d62c3ce1c2701288b72664aaa9e64259
Reviewed-on: https://go-review.googlesource.com/c/go/+/521497
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
This lets us combine more write barriers, getting rid of some of the
test+branch and gcWriteBarrier* calls.
With the new write barriers, it's easy to add a few non-pointer writes
to the set of values written.
We allow up to 2 non-pointer writes between pointer writes. This is enough
for, for example, adjacent slice fields.
Fixes#62126
Change-Id: I872d0fa9cc4eb855e270ffc0223b39fde1723c4b
Reviewed-on: https://go-review.googlesource.com/c/go/+/521498
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
This CL adds FMADDS,FMSUBS,FNMADDS,FNMSUBS SSA support for riscv
Change-Id: I1e7dd322b46b9e0f4923dbba256303d69ed12066
Reviewed-on: https://go-review.googlesource.com/c/go/+/506616
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: M Zhuo <mzh@golangcn.org>
If we're not using the upper bits, don't bother issuing a
sign/zero extension operation.
For arm64, after CL 520916 which fixed a correctness bug with
extensions but as a side effect leaves many unnecessary ones
still in place.
Change-Id: I5f4fe4efbf2e9f80969ab5b9a6122fb812dc2ec0
Reviewed-on: https://go-review.googlesource.com/c/go/+/521496
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Stop using BTSconst and friends when ORLconst can be used instead.
OR can be issued by more function units than BTS can, so it could
lead to better IPC. OR might take a few more bytes to encode, but
not a lot more.
Still use BTSconst for cases where the constant otherwise wouldn't
fit and would require a separate movabs instruction to materialize
the constant. This happens when setting bits 31-63 of 64-bit targets.
Add BTS-to-memory operations so we don't need to load/bts/store.
Fixes#61694
Change-Id: I00379608df8fb0167cb01466e97d11dec7c1596c
Reviewed-on: https://go-review.googlesource.com/c/go/+/515755
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Fixes#61629
This reduce the pressure on regalloc because then the loop only keep alive
one value (the iterator) instead of the iterator and the upper bound since
the comparison now acts against an immediate, often zero which can be skipped.
This optimize things like:
for i := 0; i < n; i++ {
Or a range over a slice where the index is not used:
for _, v := range someSlice {
Or the new range over int from #61405:
for range n {
It is hit in 975 unique places while doing ./make.bash.
Change-Id: I5facff8b267a0b60ea3c1b9a58c4d74cdb38f03f
Reviewed-on: https://go-review.googlesource.com/c/go/+/512935
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
s==s is always true for strings. This comes up in NaN testing in
generic code, where we want x==x to compile completely away except for
float types.
Fixes#60777
Change-Id: I3ce054b5121354de2f9751b010fb409f148cb637
Reviewed-on: https://go-review.googlesource.com/c/go/+/503795
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
If we load 2 values and then store those 2 loaded values, we can likely
perform that operation with a single wider load and store.
Fixes#60709
Change-Id: Ifc5f92c2f1b174c6ed82a69070f16cec6853c770
Reviewed-on: https://go-review.googlesource.com/c/go/+/502295
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
(ANDCCconst [y] (MOV.*reg x)) should only be merged when zero
extending. Otherwise, sign bits are lost on negative values.
(ANDCCconst [0xFF] (MOVBreg x)) should be simplified to a zero
extension of x. Likewise for the MOVHreg variant.
Fixes#61297
Change-Id: I04e4fd7dc6a826e870681f37506620d48393698b
Reviewed-on: https://go-review.googlesource.com/c/go/+/508775
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Paul Murphy <murp@ibm.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
According to RISCV manual 11.6:
FMADD x,y,z computes x*y+z and
FNMADD x,y,z => -x*y-z
FMSUB x,y,z => x*y-z
FNMSUB x,y,z => -x*y+z respectively
However our implement of SSA convert FMADD -x,y,z to FNMADD x,y,z which
is wrong and should be convert to FNMSUB according to manual.
Change-Id: Ib297bc83824e121fd7dda171ed56ea9694a4e575
Reviewed-on: https://go-review.googlesource.com/c/go/+/506575
Run-TryBot: M Zhuo <mzh@golangcn.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Joedian Reid <joedian@golang.org>
Reviewed-by: Michael Munday <mike.munday@lowrisc.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Don't use the line number of the argument itself, as that may be from
arbitrarily earlier in the function.
Fixes#60673
Change-Id: Ifc0a2aaae221a256be3a4b0b2e04849bae4b79d7
Reviewed-on: https://go-review.googlesource.com/c/go/+/502656
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Some testdir tests fail if GOEXPERIMENT=cgocheck2 is set. Fix this by
skipping these tests.
Change-Id: I58d4ef0cceb86bcf93220b4a44de9b9dc4879b16
Reviewed-on: https://go-review.googlesource.com/c/go/+/499675
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>