Commit Graph

5253 Commits

Author SHA1 Message Date
limeidan 09d76e59d2 cmd/compile: set unalignedOK to make memcombine work properly on loong64
goos: linux
goarch: loong64
pkg: unicode/utf8
cpu: Loongson-3A6000-HV @ 2500.00MHz
                            │     old     │                 new                 │
                            │   sec/op    │   sec/op     vs base                │
ValidTenASCIIChars            7.604n ± 0%   6.805n ± 0%  -10.51% (p=0.000 n=10)
Valid100KASCIIChars           37.41µ ± 0%   16.58µ ± 0%  -55.67% (p=0.000 n=10)
ValidTenJapaneseChars         60.84n ± 0%   58.62n ± 0%   -3.64% (p=0.000 n=10)
ValidLongMostlyASCII          113.5µ ± 0%   113.5µ ± 0%        ~ (p=0.303 n=10)
ValidLongJapanese             204.6µ ± 0%   206.8µ ± 0%   +1.07% (p=0.000 n=10)
ValidStringTenASCIIChars      7.604n ± 0%   6.803n ± 0%  -10.53% (p=0.000 n=10)
ValidString100KASCIIChars     38.05µ ± 0%   17.14µ ± 0%  -54.97% (p=0.000 n=10)
ValidStringTenJapaneseChars   60.58n ± 0%   59.48n ± 0%   -1.82% (p=0.000 n=10)
ValidStringLongMostlyASCII    113.5µ ± 0%   113.4µ ± 0%   -0.10% (p=0.000 n=10)
ValidStringLongJapanese       205.9µ ± 0%   207.3µ ± 0%   +0.67% (p=0.000 n=10)
geomean                       3.324µ        2.756µ       -17.08%

Change-Id: Id43b6e2e41907bd4b92f421dacde31f048db47d6
Reviewed-on: https://go-review.googlesource.com/c/go/+/662495
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Keith Randall <khr@google.com>
2025-04-09 09:18:20 -07:00
Keith Randall af278bfb1f cmd/compile: add additional flag constant folding rules
Fixes #73200

Change-Id: I77518d37acd838acf79ed113194bac5e2c30897f
Reviewed-on: https://go-review.googlesource.com/c/go/+/663535
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2025-04-07 22:48:32 -07:00
Keith Randall 16dbd2be39 cmd/compile: be more conservative about arm64 insns that can take zero register
It's really only needed for stores and store-like instructions
(atomic exchange, compare-and-swap, ...).

Fixes #73180

Change-Id: I8ecd833a301355adf0fa4bff43250091640c6226
Reviewed-on: https://go-review.googlesource.com/c/go/+/663155
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
2025-04-06 19:11:43 -07:00
Keith Randall 2d050e91a3 cmd/compile: allow pointer-containing elements in stack allocations
For variable-sized allocations.

Turns out that we already implement the correct escape semantics
for this case. Even when the result of the "make" does not escape,
everything assigned into it does.

Change-Id: Ia123c538d39f2f1e1581c24e4135a65af3821c5e
Reviewed-on: https://go-review.googlesource.com/c/go/+/657937
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Robert Griesemer <gri@google.com>
2025-04-04 15:04:34 -07:00
Keith Randall 7a427143b6 cmd/compile: stack allocate variable-sized makeslice
Instead of always allocating variable-sized "make" calls on the heap,
allocate a small, constant-sized array on the stack and use that array
as the backing store if it is big enough.

Requires the result of the "make" doesn't escape.

  if cap <= K {
      var arr [K]E
      slice = arr[:len:cap]
  } else {
      slice = makeslice(E, len, cap)
  }

Pretty conservatively for now, K = 32/sizeof(E). The slice header is
already 24 bytes, so wasting 32 bytes of stack if the requested size
is too big isn't that bad. Larger would waste more stack space but
maybe avoid more allocations.

This CL also requires the element type be pointer-free.  Maybe we
could relax that at some point, but it is hard. If the element type
has pointers we can get heap->stack pointers (in the case where the
requested size is too big and the slice is heap allocated).

Note that this only handles the case of makeslice called directly from
compiler-generated code. It does not handle slices built in the
runtime on behalf of the program (e.g. in growslice). Some of those
are currently handled by passing in a tmpBuf (e.g. concatstrings),
but we could probably do more.

Change-Id: I8378efad527cd00d25948a80b82a68d88fbd93a1
Reviewed-on: https://go-review.googlesource.com/c/go/+/653856
Reviewed-by: Robert Griesemer <gri@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2025-04-04 10:36:58 -07:00
Alexander Musman 16a6b71f18 cmd/compile: improve store-to-load forwarding with compatible types
Improve the compiler's store-to-load forwarding optimization by relaxing the
type comparison condition. Instead of requiring exact type equality (CMPeq),
we now use copyCompatibleType which allows forwarding between compatible
types where safe.

Fix several size comparison bugs in the nested store patterns. Previously,
we were comparing the size of the outer store with the load type,
rather than comparing with the size of the actual store being forwarded
from.

Skip OpConvert in dead store elimination to help get rid of dead stores such
as zeroing slices. OpConvert, like OpInlMark, doesn't really use the memory.

This optimization is particularly beneficial for code that creates slices with
computed pointers, such as the runtime's heapBitsSlice function, where
intermediate calculations were previously causing the compiler to miss
store-to-load forwarding opportunities.

Local sweet run result on an x86_64 laptop:

                       │  Orig.res   │              Hopt.res              │
                       │   sec/op    │   sec/op     vs base               │
BiogoIgor-8               5.303 ± 1%    5.322 ± 1%       ~ (p=0.190 n=10)
BiogoKrishna-8            7.894 ± 1%    7.828 ± 2%       ~ (p=0.190 n=10)
BleveIndexBatch100-8      2.257 ± 1%    2.248 ± 2%       ~ (p=0.529 n=10)
EtcdPut-8                30.12m ± 1%   30.03m ± 1%       ~ (p=0.796 n=10)
EtcdSTM-8                127.1m ± 1%   126.2m ± 0%  -0.74% (p=0.023 n=10)
GoBuildKubelet-8          52.21 ± 0%    52.05 ± 1%       ~ (p=0.063 n=10)
GoBuildKubeletLink-8      4.342 ± 1%    4.305 ± 0%  -0.85% (p=0.000 n=10)
GoBuildIstioctl-8         43.33 ± 0%    43.24 ± 0%  -0.22% (p=0.015 n=10)
GoBuildIstioctlLink-8     4.604 ± 1%    4.598 ± 0%       ~ (p=0.063 n=10)
GoBuildFrontend-8         15.33 ± 0%    15.29 ± 0%       ~ (p=0.143 n=10)
GoBuildFrontendLink-8    740.0m ± 1%   737.7m ± 1%       ~ (p=0.912 n=10)
GopherLuaKNucleotide-8    9.590 ± 1%    9.656 ± 1%       ~ (p=0.165 n=10)
MarkdownRenderXHTML-8    96.97m ± 1%   97.26m ± 2%       ~ (p=0.105 n=10)
Tile38QueryLoad-8        335.9µ ± 1%   335.6µ ± 1%       ~ (p=0.481 n=10)
geomean                   1.336         1.333       -0.22%

Change-Id: I031552623e6d5a3b1b5be8325e6314706e45534f
Reviewed-on: https://go-review.googlesource.com/c/go/+/662075
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Carlos Amedee <carlos@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2025-04-04 08:25:47 -07:00
Joel Sing e6c2e12c63 cmd/compile/internal/ssa: optimise more branches with zero on riscv64
Optimise more branches with zero on riscv64. In particular, BLTU with
zero occurs with IsInBounds checks for index zero. This currently results
in two instructions and requires an additional register:

   li      t2, 0
   bltu    t2, t1, 0x174b4

This is equivalent to checking if the bounds is not equal to zero. With
this change:

   bnez    t1, 0x174c0

This removes more than 500 instructions from the Go binary on riscv64.

Change-Id: I6cd861d853e3ef270bd46dacecdfaa205b1c4644
Reviewed-on: https://go-review.googlesource.com/c/go/+/606715
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2025-03-28 01:27:22 -07:00
Mark Freeman 6722c008c1 cmd/compile: rename some test packages in codegen
All other files here use the codegen package.

Change-Id: I714162941b9fa9051dacc29643e905fe60b9304b
Reviewed-on: https://go-review.googlesource.com/c/go/+/661135
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
2025-03-27 13:54:37 -07:00
Joel Sing 6bf95d40bb test/codegen: add combined conversion and shift tests
This adds tests for type conversion and shifts, detailing various
poor bad code generation that currently exists for riscv64. This
will be addressed in future CLs.

Change-Id: Ie1d366dfe878832df691600f8500ef383da92848
Reviewed-on: https://go-review.googlesource.com/c/go/+/615678
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
2025-03-25 06:53:49 -07:00
Austin Clements 5918101d67 testing: detect a stopped timer in B.Loop
Currently, if the user stops the timer in a B.Loop benchmark loop, the
benchmark will run until it hits the timeout and fails.

Fix this by detecting that the timer is stopped and failing the
benchmark right away. We avoid making the fast path more expensive for
this check by "poisoning" the B.Loop iteration counter when the timer
is stopped so that it falls back to the slow path, which can check the
timer.

This causes b to escape from B.Loop, which is totally harmless because
it was already definitely heap-allocated. But it causes the
test/inline_testingbloop.go errorcheck test to fail. I don't think the
escape messages actually mattered to that test, they just had to be
matched. To fix this, we drop the debug level to -m=1, since -m=2
prints a lot of extra information for escaping parameters that we
don't want to deal with, and change one error check to allow b to
escape.

Fixes #72971.

Change-Id: I7d4abbb1ec1e096685514536f91ba0d581cca6b7
Reviewed-on: https://go-review.googlesource.com/c/go/+/659657
Auto-Submit: Austin Clements <austin@google.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-03-24 11:41:09 -07:00
Joel Sing b70244ff7a cmd/compile: intrinsify math/bits.Len on riscv64
For riscv64/rva22u64 and above, we can intrinsify math/bits.Len using the
CLZ/CLZW machine instructions.

On a StarFive VisionFive 2 with GORISCV64=rva22u64:

                 │   clz.b.1   │               clz.b.2               │
                 │   sec/op    │   sec/op     vs base                │
LeadingZeros-4     28.89n ± 0%   12.08n ± 0%  -58.19% (p=0.000 n=10)
LeadingZeros8-4    18.79n ± 0%   14.76n ± 0%  -21.45% (p=0.000 n=10)
LeadingZeros16-4   25.27n ± 0%   14.76n ± 0%  -41.59% (p=0.000 n=10)
LeadingZeros32-4   25.12n ± 0%   12.08n ± 0%  -51.92% (p=0.000 n=10)
LeadingZeros64-4   25.89n ± 0%   12.08n ± 0%  -53.35% (p=0.000 n=10)
geomean            24.55n        13.09n       -46.70%

Change-Id: I0dda684713dbdf5336af393f5ccbdae861c4f694
Reviewed-on: https://go-review.googlesource.com/c/go/+/652321
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2025-03-21 18:21:44 -07:00
Keith Randall deb6790fcf cmd/compile: remove implicit deref from len(p) where p is ptr-to-array
func f() *[4]int { return nil }
_ = len(f())

should not panic. We evaluate f, but there isn't a dereference
according to the spec (just "arg is evaluated").

Update #72844

Change-Id: Ia32cefc1b7aa091cd1c13016e015842b4d12d5b4
Reviewed-on: https://go-review.googlesource.com/c/go/+/658096
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Robert Griesemer <gri@google.com>
Reviewed-by: Keith Randall <khr@google.com>
2025-03-19 09:55:46 -07:00
Joel Sing 6fb7bdc96d cmd/compile: intrinsify math/bits.TrailingZeros on riscv64
For riscv64/rva22u64 and above, we can intrinsify math/bits.TrailingZeros
using the CTZ/CTZW machine instructions.

On a StarFive VisionFive 2 with GORISCV64=rva22u64:

                  │   ctz.b.1    │               ctz.b.2               │
                  │    sec/op    │   sec/op     vs base                │
TrailingZeros-4     25.500n ± 0%   8.052n ± 0%  -68.42% (p=0.000 n=10)
TrailingZeros8-4     14.76n ± 0%   10.74n ± 0%  -27.24% (p=0.000 n=10)
TrailingZeros16-4    26.84n ± 0%   10.74n ± 0%  -59.99% (p=0.000 n=10)
TrailingZeros32-4   25.500n ± 0%   8.052n ± 0%  -68.42% (p=0.000 n=10)
TrailingZeros64-4   25.500n ± 0%   8.052n ± 0%  -68.42% (p=0.000 n=10)
geomean              23.09n        9.035n       -60.88%

Change-Id: I71edf2b988acb7a68e797afda4ee66d7a57d587e
Reviewed-on: https://go-review.googlesource.com/c/go/+/652320
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
2025-03-15 19:07:53 -07:00
Joel Sing 21417518a9 cmd/compile: combine negation and word sign extension on riscv64
Use NEGW to produce a negated and sign extended word, rather than doing
the same via two instructions:

   neg     t0, t0
   sext.w  a0, t0

Becomes:

   negw    t0, t0

Change-Id: I824ab25001bd3304bdbd435e7b244fcc036ef212
Reviewed-on: https://go-review.googlesource.com/c/go/+/652319
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2025-03-15 06:05:16 -07:00
Joel Sing 10d070668c cmd/compile/internal/ssa: remove double negation with addition on riscv64
On riscv64, subtraction from a constant is typically implemented as an
ADDI with the negative constant, followed by a negation. However this can
lead to multiple NEG/ADDI/NEG sequences that can be optimised out.

For example, runtime.(*_panic).nextDefer currently contains:

   lbu     t0, 0(t0)
   addi    t0, t0, -8
   neg     t0, t0
   addi    t0, t0, -7
   neg     t0, t0

Which is now optimised to:

   lbu     t0, 0(t0)
   addi    t0, t0, -1

Change-Id: Idf5815e6db2e3705cc4a4811ca9130a064ae3d80
Reviewed-on: https://go-review.googlesource.com/c/go/+/652318
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: David Chase <drchase@google.com>
2025-03-15 06:04:28 -07:00
Joel Sing a8f2e63f2f test/codegen: add a test for negation and conversion to int32
Codify the current code generation used on riscv64 in this case.

Change-Id: If4152e3652fc19d0aa28b79dba08abee2486d5ae
Reviewed-on: https://go-review.googlesource.com/c/go/+/652317
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-03-15 06:02:57 -07:00
Joel Sing e1f9013a58 test/codegen: add riscv64 codegen for arithmetic tests
Codify the current riscv64 code generation for various subtract from
constant and addition/subtraction tests.

Change-Id: I54ad923280a0578a338bc4431fa5bdc0644c4729
Reviewed-on: https://go-review.googlesource.com/c/go/+/652316
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: David Chase <drchase@google.com>
2025-03-15 06:02:27 -07:00
Joel Sing c01fa0cc21 test/codegen: add riscv64/rva23u64 specifiers to existing tests
Tests that exist for riscv64/rva22u64 should also be applied to
riscv64/rva23u64.

Change-Id: Ia529fdf0ac55b8bcb3dcd24fa80efef2351f3842
Reviewed-on: https://go-review.googlesource.com/c/go/+/652315
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
2025-03-15 05:58:43 -07:00
Joel Sing c1c7e5902f test/codegen: tighten the TrailingZeros64 test on 386
Make the TrailingZeros64 code generation check more specific for 386.
Just checking for BSFL will match both the generic 64 bit decomposition
and the custom 386 lowering.

Change-Id: I62076f1889af0ef1f29704cba01ab419cae0c6e3
Reviewed-on: https://go-review.googlesource.com/c/go/+/656996
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2025-03-14 15:04:38 -07:00
Keith Randall a1ddbdd3ef cmd/compile: don't move nilCheck operations during tighten
Nil checks need to stay in their original blocks. They cannot
be moved to a following conditionally-executed block.

Fixes #72860

Change-Id: Ic2d66cdf030357d91f8a716a004152ba4c016f77
Reviewed-on: https://go-review.googlesource.com/c/go/+/657715
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2025-03-13 21:24:20 -07:00
Joel Sing af92bb594d test/codegen: remove plan9/amd64 specific array zeroing/copying tests
The compiler previously avoided the use of MOVUPS on plan9/amd64. This
was changed in CL 655875, however the codegen tests were not updated
and now fail (seemingly the full codegen tests do not run anywhere,
not even on the longtest builders).

Change-Id: I388b60e7b0911048d4949c5029347f9801c018a9
Reviewed-on: https://go-review.googlesource.com/c/go/+/656997
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Auto-Submit: Keith Randall <khr@google.com>
2025-03-13 05:19:13 -07:00
Xiaolin Zhao b143c98169 cmd/compile: simplify bounded shift on loong64
Use the shiftIsBounded function to generate more efficient shift instructions.
This change also optimize shift ops when the shift value is v&63 and v&31.

goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A6000-HV @ 2500.00MHz
                |  CL 627855   |               this CL                |
                |    sec/op    |    sec/op     vs base                |
LeadingZeros      1.1005n ± 0%   0.8425n ± 1%  -23.44% (p=0.000 n=10)
LeadingZeros8      1.502n ± 0%    1.501n ± 0%   -0.07% (p=0.001 n=10)
LeadingZeros16     1.502n ± 0%    1.501n ± 0%   -0.07% (p=0.000 n=10)
LeadingZeros32    0.9511n ± 0%   0.8050n ± 0%  -15.36% (p=0.000 n=10)
LeadingZeros64    1.1195n ± 0%   0.8423n ± 0%  -24.76% (p=0.000 n=10)
TrailingZeros     0.8086n ± 0%   0.8005n ± 0%   -1.00% (p=0.000 n=10)
TrailingZeros8     1.031n ± 1%    1.035n ± 1%        ~ (p=0.136 n=10)
TrailingZeros16   0.8114n ± 0%   0.8254n ± 1%   +1.73% (p=0.000 n=10)
TrailingZeros32   0.8090n ± 0%   0.8005n ± 0%   -1.05% (p=0.000 n=10)
TrailingZeros64   0.8089n ± 1%   0.8005n ± 0%   -1.04% (p=0.000 n=10)
OnesCount         0.8677n ± 0%   1.2010n ± 0%  +38.41% (p=0.000 n=10)
OnesCount8        0.8009n ± 0%   0.8004n ± 0%   -0.06% (p=0.000 n=10)
OnesCount16       0.9344n ± 0%   1.2010n ± 0%  +28.53% (p=0.000 n=10)
OnesCount32       0.8677n ± 0%   1.2010n ± 0%  +38.41% (p=0.000 n=10)
OnesCount64       1.2010n ± 0%   0.8671n ± 0%  -27.80% (p=0.000 n=10)
RotateLeft        0.8009n ± 0%   0.6671n ± 0%  -16.71% (p=0.000 n=10)
RotateLeft8        1.202n ± 0%    1.327n ± 0%  +10.40% (p=0.000 n=10)
RotateLeft16      0.8036n ± 0%   0.8218n ± 0%   +2.26% (p=0.000 n=10)
RotateLeft32      0.6674n ± 0%   0.8004n ± 0%  +19.94% (p=0.000 n=10)
RotateLeft64      0.6674n ± 0%   0.8004n ± 0%  +19.94% (p=0.000 n=10)
Reverse           0.4067n ± 1%   0.4122n ± 1%   +1.38% (p=0.001 n=10)
Reverse8          0.8009n ± 0%   0.8004n ± 0%   -0.06% (p=0.000 n=10)
Reverse16         0.8009n ± 0%   0.8005n ± 0%   -0.05% (p=0.000 n=10)
Reverse32         0.8009n ± 0%   0.8004n ± 0%   -0.06% (p=0.001 n=10)
Reverse64         0.8009n ± 0%   0.8004n ± 0%   -0.06% (p=0.008 n=10)
ReverseBytes      0.4057n ± 1%   0.4133n ± 1%   +1.90% (p=0.000 n=10)
ReverseBytes16    0.8009n ± 0%   0.8004n ± 0%   -0.07% (p=0.000 n=10)
ReverseBytes32    0.8009n ± 0%   0.8005n ± 0%   -0.05% (p=0.000 n=10)
ReverseBytes64    0.8009n ± 0%   0.8004n ± 0%   -0.06% (p=0.000 n=10)
Add                1.201n ± 0%    1.201n ± 0%        ~ (p=1.000 n=10)
Add32              1.201n ± 0%    1.201n ± 0%        ~ (p=0.474 n=10)
Add64              1.201n ± 0%    1.201n ± 0%        ~ (p=1.000 n=10)
Add64multiple      1.832n ± 0%    1.828n ± 0%   -0.22% (p=0.001 n=10)
Sub                1.201n ± 0%    1.201n ± 0%        ~ (p=1.000 n=10)
Sub32              1.602n ± 0%    1.601n ± 0%   -0.06% (p=0.000 n=10)
Sub64              1.201n ± 0%    1.201n ± 0%        ~ (p=0.474 n=10)
Sub64multiple      2.402n ± 0%    2.400n ± 0%   -0.10% (p=0.000 n=10)
Mul               0.8009n ± 0%   0.8004n ± 0%   -0.06% (p=0.000 n=10)
Mul32             0.8009n ± 0%   0.8004n ± 0%   -0.06% (p=0.000 n=10)
Mul64             0.8008n ± 0%   0.8004n ± 0%   -0.05% (p=0.000 n=10)
Div                9.083n ± 0%    7.638n ± 0%  -15.91% (p=0.000 n=10)
Div32              4.011n ± 0%    4.009n ± 0%   -0.05% (p=0.000 n=10)
Div64              9.711n ± 0%    8.204n ± 0%  -15.51% (p=0.000 n=10)
geomean            1.083n         1.078n        -0.40%

goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A5000 @ 2500.00MHz
                |  CL 627855   |               this CL                |
                |    sec/op    |    sec/op     vs base                |
LeadingZeros       1.341n ± 4%    1.331n ± 2%   -0.71% (p=0.008 n=10)
LeadingZeros8      1.781n ± 0%    1.766n ± 1%   -0.84% (p=0.011 n=10)
LeadingZeros16     1.782n ± 0%    1.767n ± 0%   -0.79% (p=0.001 n=10)
LeadingZeros32     1.341n ± 1%    1.333n ± 0%   -0.52% (p=0.001 n=10)
LeadingZeros64     1.338n ± 0%    1.333n ± 0%   -0.37% (p=0.008 n=10)
TrailingZeros     0.9025n ± 0%   0.8077n ± 0%  -10.50% (p=0.000 n=10)
TrailingZeros8     1.056n ± 0%    1.089n ± 1%   +3.17% (p=0.001 n=10)
TrailingZeros16    1.101n ± 0%    1.102n ± 0%   +0.09% (p=0.011 n=10)
TrailingZeros32   0.9024n ± 1%   0.8083n ± 0%  -10.43% (p=0.000 n=10)
TrailingZeros64   0.9028n ± 1%   0.8087n ± 0%  -10.43% (p=0.000 n=10)
OnesCount          1.482n ± 1%    1.302n ± 0%  -12.15% (p=0.000 n=10)
OnesCount8         1.206n ± 0%    1.207n ± 2%   +0.12% (p=0.000 n=10)
OnesCount16        1.534n ± 0%    1.402n ± 0%   -8.58% (p=0.000 n=10)
OnesCount32        1.531n ± 1%    1.302n ± 0%  -14.99% (p=0.000 n=10)
OnesCount64        1.302n ± 0%    1.538n ± 1%  +18.16% (p=0.000 n=10)
RotateLeft        0.8083n ± 0%   0.8087n ± 1%        ~ (p=0.579 n=10)
RotateLeft8        1.310n ± 0%    1.323n ± 0%   +0.95% (p=0.001 n=10)
RotateLeft16       1.149n ± 0%    1.165n ± 1%   +1.35% (p=0.001 n=10)
RotateLeft32      0.8093n ± 0%   0.8105n ± 0%        ~ (p=0.393 n=10)
RotateLeft64      0.8088n ± 0%   0.8090n ± 0%        ~ (p=0.739 n=10)
Reverse           0.5109n ± 0%   0.5172n ± 1%   +1.25% (p=0.000 n=10)
Reverse8          0.8010n ± 0%   0.8011n ± 0%   +0.01% (p=0.000 n=10)
Reverse16         0.8010n ± 0%   0.8011n ± 0%   +0.01% (p=0.002 n=10)
Reverse32         0.8010n ± 0%   0.8011n ± 0%   +0.01% (p=0.000 n=10)
Reverse64         0.8010n ± 0%   0.8011n ± 0%   +0.01% (p=0.005 n=10)
ReverseBytes      0.5122n ± 2%   0.5182n ± 1%        ~ (p=0.060 n=10)
ReverseBytes16    0.8010n ± 0%   0.8011n ± 0%   +0.01% (p=0.005 n=10)
ReverseBytes32    0.8010n ± 0%   0.8011n ± 0%   +0.01% (p=0.005 n=10)
ReverseBytes64    0.8010n ± 0%   0.8011n ± 0%   +0.01% (p=0.001 n=10)
Add                1.201n ± 4%    1.202n ± 0%   +0.08% (p=0.028 n=10)
Add32              1.201n ± 0%    1.202n ± 2%   +0.08% (p=0.014 n=10)
Add64              1.201n ± 1%    1.202n ± 0%   +0.08% (p=0.025 n=10)
Add64multiple      1.902n ± 0%    1.913n ± 0%   +0.55% (p=0.004 n=10)
Sub                1.201n ± 0%    1.202n ± 3%   +0.08% (p=0.001 n=10)
Sub32              1.654n ± 0%    1.656n ± 1%        ~ (p=0.117 n=10)
Sub64              1.201n ± 0%    1.202n ± 0%   +0.08% (p=0.001 n=10)
Sub64multiple      2.180n ± 4%    2.159n ± 1%   -0.96% (p=0.006 n=10)
Mul               0.9345n ± 0%   0.9346n ± 0%   +0.01% (p=0.000 n=10)
Mul32              1.030n ± 0%    1.050n ± 1%   +1.94% (p=0.000 n=10)
Mul64             0.9345n ± 0%   0.9346n ± 1%   +0.01% (p=0.000 n=10)
Div                11.57n ± 1%    11.12n ± 0%   -3.85% (p=0.000 n=10)
Div32              4.337n ± 1%    4.341n ± 1%        ~ (p=0.286 n=10)
Div64              12.76n ± 0%    12.02n ± 3%   -5.80% (p=0.000 n=10)
geomean            1.252n         1.235n        -1.32%

Change-Id: Iec4cfd2b83bb0f946068c1d657369ff081d95b04
Reviewed-on: https://go-review.googlesource.com/c/go/+/628575
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-by: David Chase <drchase@google.com>
2025-03-12 18:18:03 -07:00
Jorropo 99411d7847 cmd/compile: compute bits.OnesCount's limits from argument's limits
Change-Id: Ia90d48ea0fab363c8592221fad88958b522edefe
Reviewed-on: https://go-review.googlesource.com/c/go/+/656159
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-03-11 20:17:36 -07:00
Jorropo c00647b49b cmd/compile: set bits.OnesCount's limits to [0, 64]
Change-Id: I2f60de836f58ef91baae856f44d8f73c190326f2
Reviewed-on: https://go-review.googlesource.com/c/go/+/656158
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
2025-03-11 19:52:29 -07:00
Jorropo 554a3c51dc cmd/compile: use min & max builtins to assert constant bounds in prove's tests
I've originally used |= and &= to setup assumptions exploitable by the
operation under test but theses have multiple issues making it poor
for this usecase:
- &= does not pass the minimum value as-is, rather always set it to 0
- |= rounds up the max value to a number of the same length with all ones set
- I've never implemented them to work with negative signed numbers

Change-Id: Ie43c576fb10393e69d6f989b048823daa02b1df8
Reviewed-on: https://go-review.googlesource.com/c/go/+/656160
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
2025-03-11 19:51:59 -07:00
Jorropo d2842229fc cmd/compile: compute min's & max's limits from argument's limits inside flowLimit
Updates #68857

Change-Id: Ied07e656bba42f3b1b5f9b9f5442806aa2e7959b
Reviewed-on: https://go-review.googlesource.com/c/go/+/656157
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
2025-03-11 19:51:31 -07:00
Michael Pratt 3fb8b4f3db all: move //go:debug decoratemappings=0 test to cmd/go
test/decoratemappingszero.go is intended to test that
//go:debug decoratemappings=0 disables annonations.

Unfortunately, //go:debug processing is handled by cmd/go, but
cmd/internal/testdir (which runs tests from test/) generally invokes the
compiler directly, thus it does not set default GODEBUGs.

Move this test to the cmd/go script tests, alongside the similar test
for language version.

Fixes #72772.

Cq-Include-Trybots: luci.golang.try:gotip-linux-ppc64le_power10
Change-Id: I6a6a636c9d380ef984f760be5689fdc7f5cb2aeb
Reviewed-on: https://go-review.googlesource.com/c/go/+/656795
Reviewed-by: Ian Lance Taylor <iant@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
2025-03-11 15:07:33 -07:00
Alexander Musman 6c70f2b960 cmd/compile: Enable inlining of tail calls
Enable inlining tail calls and do not limit emitting tail calls only to the
non-inlineable methods when generating wrappers. This change produces
additional code size reduction.

 Code size difference measured with this change (tried for x86_64):
    etcd binary:
    .text section size: 10613393 -> 10593841 (0.18%)
    total binary size:  33450787 -> 33424307 (0.07%)

    compile binary:
    .text section size: 10171025 -> 10126545 (0.43%)
    total binary size:  28241012 -> 28192628 (0.17%)

    cockroach binary:
    .text section size:  83947260 -> 83694140  (0.3%)
    total binary size:  263799808 -> 263534160 (0.1%)

Change-Id: I694f83cb838e64bd4c51f05b7b9f2bf0193bb551
Reviewed-on: https://go-review.googlesource.com/c/go/+/650455
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
2025-03-11 14:18:43 -07:00
Robert Griesemer ae4c13afc5 go/types, types2: report better error messages for slice expressions
Explicitly compute the common underlying type and while doing
so report better slice-expression relevant error messages.
Streamline message format for index and slice errors.

This removes the last uses of the coreString and match functions.
Delete them.

Change-Id: I4b50dda1ef7e2ab5e296021458f7f0b6f6e229cd
Reviewed-on: https://go-review.googlesource.com/c/go/+/655935
Reviewed-by: Robert Griesemer <gri@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Robert Findley <rfindley@google.com>
Auto-Submit: Robert Griesemer <gri@google.com>
2025-03-11 05:44:15 -07:00
Robert Griesemer 2d097e363a go/types, types2: better error messages for copy built-in
Rather than relying on coreString, use the new commonUnder function
to determine the argument slice element types.

Factor out this functionality, which is shared for append and copy,
into a new helper function sliceElem (similar to chanElem).
Use sliceElem for both the append and copy implementation.
As a result, the error messages for invalid copy calls are
now more detailed.

While at it, handle the special cases for append and copy first
because they don't need the slice element computation.

Finally, share the same type recording code for the special and
general cases.

As an aside, in commonUnder, be clearer in the code that the
result is either a nil type and an error, or a non-nil type
and a nil error. This matches in style what we do in sliceElem.

Change-Id: I318bafc0d2d31df04f33b1b464ad50d581918671
Reviewed-on: https://go-review.googlesource.com/c/go/+/655675
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Robert Griesemer <gri@google.com>
Reviewed-by: Robert Findley <rfindley@google.com>
Reviewed-by: Robert Griesemer <gri@google.com>
2025-03-10 21:30:51 -07:00
Xiaolin Zhao 2a772a2fe7 cmd/compile: optimize shifts of int32 and uint32 on loong64
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A6000-HV @ 2500.00MHz
                |  bench.old   |              bench.new               |
                |    sec/op    |    sec/op     vs base                |
LeadingZeros       1.100n ± 1%    1.101n ± 0%        ~ (p=0.566 n=10)
LeadingZeros8      1.501n ± 0%    1.502n ± 0%   +0.07% (p=0.000 n=10)
LeadingZeros16     1.501n ± 0%    1.502n ± 0%   +0.07% (p=0.000 n=10)
LeadingZeros32    1.2010n ± 0%   0.9511n ± 0%  -20.81% (p=0.000 n=10)
LeadingZeros64     1.104n ± 1%    1.119n ± 0%   +1.40% (p=0.000 n=10)
TrailingZeros     0.8137n ± 0%   0.8086n ± 0%   -0.63% (p=0.001 n=10)
TrailingZeros8     1.031n ± 1%    1.031n ± 1%        ~ (p=0.956 n=10)
TrailingZeros16   0.8204n ± 1%   0.8114n ± 0%   -1.11% (p=0.000 n=10)
TrailingZeros32   0.8145n ± 0%   0.8090n ± 0%   -0.68% (p=0.000 n=10)
TrailingZeros64   0.8159n ± 0%   0.8089n ± 1%   -0.86% (p=0.000 n=10)
OnesCount         0.8672n ± 0%   0.8677n ± 0%   +0.06% (p=0.000 n=10)
OnesCount8        0.8005n ± 0%   0.8009n ± 0%   +0.06% (p=0.000 n=10)
OnesCount16       0.9339n ± 0%   0.9344n ± 0%   +0.05% (p=0.000 n=10)
OnesCount32       0.8672n ± 0%   0.8677n ± 0%   +0.06% (p=0.000 n=10)
OnesCount64        1.201n ± 0%    1.201n ± 0%        ~ (p=0.474 n=10)
RotateLeft        0.8005n ± 0%   0.8009n ± 0%   +0.05% (p=0.000 n=10)
RotateLeft8        1.202n ± 0%    1.202n ± 0%        ~ (p=0.210 n=10)
RotateLeft16      0.8050n ± 0%   0.8036n ± 0%   -0.17% (p=0.002 n=10)
RotateLeft32      0.6674n ± 0%   0.6674n ± 0%        ~ (p=1.000 n=10)
RotateLeft64      0.6673n ± 0%   0.6674n ± 0%        ~ (p=0.072 n=10)
Reverse           0.4123n ± 0%   0.4067n ± 1%   -1.37% (p=0.000 n=10)
Reverse8          0.8005n ± 0%   0.8009n ± 0%   +0.05% (p=0.000 n=10)
Reverse16         0.8004n ± 0%   0.8009n ± 0%   +0.06% (p=0.000 n=10)
Reverse32         0.8004n ± 0%   0.8009n ± 0%   +0.06% (p=0.000 n=10)
Reverse64         0.8004n ± 0%   0.8009n ± 0%   +0.06% (p=0.001 n=10)
ReverseBytes      0.4100n ± 1%   0.4057n ± 1%   -1.06% (p=0.002 n=10)
ReverseBytes16    0.8004n ± 0%   0.8009n ± 0%   +0.07% (p=0.000 n=10)
ReverseBytes32    0.8005n ± 0%   0.8009n ± 0%   +0.05% (p=0.000 n=10)
ReverseBytes64    0.8005n ± 0%   0.8009n ± 0%   +0.05% (p=0.000 n=10)
Add                1.201n ± 0%    1.201n ± 0%        ~ (p=1.000 n=10)
Add32              1.201n ± 0%    1.201n ± 0%        ~ (p=0.474 n=10)
Add64              1.201n ± 0%    1.201n ± 0%        ~ (p=1.000 n=10)
Add64multiple      1.831n ± 0%    1.832n ± 0%        ~ (p=1.000 n=10)
Sub                1.201n ± 0%    1.201n ± 0%        ~ (p=1.000 n=10)
Sub32              1.601n ± 0%    1.602n ± 0%   +0.06% (p=0.000 n=10)
Sub64              1.201n ± 0%    1.201n ± 0%        ~ (p=0.474 n=10)
Sub64multiple      2.400n ± 0%    2.402n ± 0%   +0.10% (p=0.000 n=10)
Mul               0.8005n ± 0%   0.8009n ± 0%   +0.05% (p=0.000 n=10)
Mul32             0.8005n ± 0%   0.8009n ± 0%   +0.05% (p=0.000 n=10)
Mul64             0.8004n ± 0%   0.8008n ± 0%   +0.05% (p=0.000 n=10)
Div                9.107n ± 0%    9.083n ± 0%        ~ (p=0.255 n=10)
Div32              4.009n ± 0%    4.011n ± 0%   +0.05% (p=0.000 n=10)
Div64              9.705n ± 0%    9.711n ± 0%   +0.06% (p=0.000 n=10)
geomean            1.089n         1.083n        -0.62%

goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A5000 @ 2500.00MHz
                |  bench.old   |              bench.new               |
                |    sec/op    |    sec/op     vs base                |
LeadingZeros       1.352n ± 0%    1.341n ± 4%   -0.81% (p=0.024 n=10)
LeadingZeros8      1.766n ± 0%    1.781n ± 0%   +0.88% (p=0.000 n=10)
LeadingZeros16     1.766n ± 0%    1.782n ± 0%   +0.88% (p=0.000 n=10)
LeadingZeros32     1.536n ± 0%    1.341n ± 1%  -12.73% (p=0.000 n=10)
LeadingZeros64     1.351n ± 1%    1.338n ± 0%   -0.96% (p=0.000 n=10)
TrailingZeros     0.9037n ± 0%   0.9025n ± 0%   -0.12% (p=0.020 n=10)
TrailingZeros8     1.087n ± 3%    1.056n ± 0%        ~ (p=0.060 n=10)
TrailingZeros16    1.101n ± 0%    1.101n ± 0%        ~ (p=0.211 n=10)
TrailingZeros32   0.9040n ± 0%   0.9024n ± 1%   -0.18% (p=0.017 n=10)
TrailingZeros64   0.9043n ± 0%   0.9028n ± 1%        ~ (p=0.118 n=10)
OnesCount          1.503n ± 2%    1.482n ± 1%   -1.43% (p=0.001 n=10)
OnesCount8         1.207n ± 0%    1.206n ± 0%   -0.12% (p=0.000 n=10)
OnesCount16        1.501n ± 0%    1.534n ± 0%   +2.13% (p=0.000 n=10)
OnesCount32        1.483n ± 1%    1.531n ± 1%   +3.27% (p=0.000 n=10)
OnesCount64        1.301n ± 0%    1.302n ± 0%   +0.08% (p=0.000 n=10)
RotateLeft        0.8136n ± 4%   0.8083n ± 0%   -0.66% (p=0.002 n=10)
RotateLeft8        1.311n ± 0%    1.310n ± 0%        ~ (p=0.786 n=10)
RotateLeft16       1.165n ± 0%    1.149n ± 0%   -1.33% (p=0.001 n=10)
RotateLeft32      0.8138n ± 1%   0.8093n ± 0%   -0.57% (p=0.017 n=10)
RotateLeft64      0.8149n ± 1%   0.8088n ± 0%   -0.74% (p=0.000 n=10)
Reverse           0.5195n ± 1%   0.5109n ± 0%   -1.67% (p=0.000 n=10)
Reverse8          0.8007n ± 0%   0.8010n ± 0%   +0.04% (p=0.000 n=10)
Reverse16         0.8007n ± 0%   0.8010n ± 0%   +0.04% (p=0.000 n=10)
Reverse32         0.8007n ± 0%   0.8010n ± 0%   +0.04% (p=0.012 n=10)
Reverse64         0.8007n ± 0%   0.8010n ± 0%   +0.04% (p=0.010 n=10)
ReverseBytes      0.5120n ± 1%   0.5122n ± 2%        ~ (p=0.306 n=10)
ReverseBytes16    0.8007n ± 0%   0.8010n ± 0%   +0.04% (p=0.000 n=10)
ReverseBytes32    0.8007n ± 0%   0.8010n ± 0%   +0.04% (p=0.000 n=10)
ReverseBytes64    0.8007n ± 0%   0.8010n ± 0%   +0.04% (p=0.000 n=10)
Add                1.201n ± 0%    1.201n ± 4%        ~ (p=0.334 n=10)
Add32              1.201n ± 0%    1.201n ± 0%        ~ (p=0.563 n=10)
Add64              1.201n ± 0%    1.201n ± 1%        ~ (p=0.652 n=10)
Add64multiple      1.909n ± 0%    1.902n ± 0%        ~ (p=0.126 n=10)
Sub                1.201n ± 0%    1.201n ± 0%        ~ (p=1.000 n=10)
Sub32              1.655n ± 0%    1.654n ± 0%        ~ (p=0.589 n=10)
Sub64              1.201n ± 0%    1.201n ± 0%        ~ (p=1.000 n=10)
Sub64multiple      2.150n ± 0%    2.180n ± 4%   +1.37% (p=0.000 n=10)
Mul               0.9341n ± 0%   0.9345n ± 0%   +0.04% (p=0.011 n=10)
Mul32              1.053n ± 0%    1.030n ± 0%   -2.23% (p=0.000 n=10)
Mul64             0.9341n ± 0%   0.9345n ± 0%   +0.04% (p=0.018 n=10)
Div                11.59n ± 0%    11.57n ± 1%        ~ (p=0.091 n=10)
Div32              4.337n ± 0%    4.337n ± 1%        ~ (p=0.783 n=10)
Div64              12.81n ± 0%    12.76n ± 0%   -0.39% (p=0.001 n=10)
geomean            1.257n         1.252n        -0.46%

Change-Id: I9e93ea49736760c19dc6b6463d2aa95878121b7b
Reviewed-on: https://go-review.googlesource.com/c/go/+/627855
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
2025-03-10 17:55:10 -07:00
Michael Pratt c40a3731f4 internal/godebugs: add decoratemappings as an opaque godebug setting
This adds a new godebug to control whether the runtime applies the
anonymous memory mapping annotations added in https://go.dev/cl/646095.
It is enabled by default.

This has several effects:

* The feature is only enabled by default when the main go.mod has go >=
  1.25.
* This feature can be disabled with GODEBUG=decoratemappings=0, or the
  equivalents in go.mod or package main. See https://go.dev/doc/godebug.
* As an opaque setting, this option will not appear in runtime/metrics.
* This setting is non-atomic, so it cannot be changed after startup.

I am not 100% sure about my decision for the last two points.

I've made this an opaque setting because it affects every memory mapping
the runtime performs. Thus every mapping would report "non-default
behavior", which doesn't seem useful.

This setting could trivially be atomic and allow changes at run time,
but those changes would only affect future mappings. That seems
confusing and not helpful. On the other hand, going back to annotate or
unannotate every previous mapping when the setting changes is
unwarranted complexity.

For #71546.

Change-Id: I6a6a636c5ad551d76691cba2a6f668d5cff0e352
Reviewed-on: https://go-review.googlesource.com/c/go/+/655895
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
2025-03-10 08:29:59 -07:00
Robert Griesemer 584e631023 go/types, types2: better error messages for invalid calls
Rather than reporting "non-function" for an invalid type parameter,
report which type in the type parameter's type set is not a function.

Change-Id: I8beec25cc337bae8e03d23e62d97aa82db46bab4
Reviewed-on: https://go-review.googlesource.com/c/go/+/654475
Reviewed-by: Robert Griesemer <gri@google.com>
Auto-Submit: Robert Griesemer <gri@google.com>
Reviewed-by: Robert Findley <rfindley@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-03-06 13:35:49 -08:00
David Chase 1cf6b50263 cmd/compile: remove no-longer-necessary recursive inlining checks
this does result in a little bit more inlining,
cmd/compile text is 0.5% larger,
bent-benchmark text geomeans grow by only 0.02%.
some of our tests make assumptions about inlining.

Change-Id: I999d1798aca5dc64a1928bd434258a61e702951a
Reviewed-on: https://go-review.googlesource.com/c/go/+/655157
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2025-03-06 10:07:17 -08:00
David Chase cad4dca518 cmd/compile: use inline-Pos-based recursion test
Look at the inlining stack of positions for a call site,
if the line/col/file of the call site appears in that
stack, do not inline.  This subsumes all the other
recently-added recursive inlining checks, but they are
left in to make this easier+safer to backport.

Fixes #72090

Change-Id: I0f487bb0d4c514015907c649312672b7be464abd
Reviewed-on: https://go-review.googlesource.com/c/go/+/655155
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2025-03-05 15:02:38 -08:00
Cuong Manh Le 4f45b2b7e0 cmd/compile: fix out of memory when inlining closure
CL 629195 strongly favor closure inlining, allowing closures to be
inlined more aggressively.

However, if the closure body contains a call to a function, which itself
is one of the call arguments, it causes the infinite inlining.

Fixing this by prevent this kind of functions from being inlinable.

Fixes #72063

Change-Id: I5fb5723a819b1e2c5aadb57c1023ec84ca9fa53c
Reviewed-on: https://go-review.googlesource.com/c/go/+/654195
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-03-04 03:10:17 -08:00
Joel Sing 927fdb7843 cmd/compile: simplify intrinsification of TrailingZeros16 and TrailingZeros8
Decompose Ctz16 and Ctz8 within the SSA rules for LOONG64, MIPS, PPC64
and S390X, rather than having a custom intrinsic. Note that for PPC64 this
actually allows the existing Ctz16 and Ctz8 rules to be used.

Change-Id: I27a5e978f852b9d75396d2a80f5d7dfcb5ef7dd4
Reviewed-on: https://go-review.googlesource.com/c/go/+/651816
Reviewed-by: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Joel Sing <joel@sing.id.au>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
2025-02-27 03:45:44 -08:00
Keith Randall 8b8bff7bb2 cmd/compile: don't pull constant offsets out of pointer arithmetic
This could lead to manufacturing a pointer that points outside
its original allocation.

Bug was introduced in CL 629858.

Fixes #71932

Change-Id: Ia86ab0b65ce5f80a8e0f4f4c81babd07c5904f8d
Reviewed-on: https://go-review.googlesource.com/c/go/+/652078
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2025-02-26 09:39:12 -08:00
khr@golang.org cc16fb52e6 cmd/compile: ensure we don't reuse temporary register
Before this CL, we could use the same register for both a temporary
register and for moving a value in the output register out of the way.

Fixes #71857

Change-Id: Iefbfd9d4139136174570d8aadf8a0fb391791ea9
Reviewed-on: https://go-review.googlesource.com/c/go/+/651221
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
2025-02-25 10:19:25 -08:00
Jorropo 00635de759 cmd/compile: don't report newLimit discovered when unsat happens multiple times
Fixes #71852

Change-Id: I696fcb8fc8c0c2e5e5ae6ab50596f6bdb9b7d498
Reviewed-on: https://go-review.googlesource.com/c/go/+/650975
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2025-02-20 09:44:28 -08:00
Mateusz Poliwczak 43e6525986 cmd/compile: load properly constant values from itabs
While looking at the SSA of following code, i noticed
that these rules do not work properly, and the types
are loaded indirectly through an itab, instead of statically.

type M interface{ M() }
type A interface{ A() }

type Impl struct{}
func (*Impl) M() {}
func (*Impl) A() {}

func main() {
        var a M = &Impl{}
        a.(A).A()
}

Change-Id: Ia275993f81a2e7302102d4ff87ac28586023d13c
GitHub-Last-Rev: 4bfc901917
GitHub-Pull-Request: golang/go#71784
Reviewed-on: https://go-review.googlesource.com/c/go/+/649500
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
2025-02-19 13:39:00 -08:00
Mateusz Poliwczak ba3f568988 cmd/compile: determine static values of len and cap in make() calls
This change improves escape analysis by attempting to
deduce static values for the len and cap parameters,
allowing allocations to be made on the stack.

Change-Id: I1161019aed9f60cf2c2fe4d405da94ad415231ac
GitHub-Last-Rev: d78c1b4ca5
GitHub-Pull-Request: golang/go#71693
Reviewed-on: https://go-review.googlesource.com/c/go/+/649035
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
2025-02-19 13:38:58 -08:00
David Chase 9ddeac30b5 cmd/compile, runtime: use deferreturn as target PC for recover from deferrangefunc
The existing code for recover from deferrangefunc was broken in
several ways.

1. the code following a deferrangefunc call did not check the return
value for an out-of-band value indicating "return now" (i.e., recover
was called)

2. the returned value was delivered using a bespoke ABI that happened
to match on register-ABI platforms, but not on older stack-based
ABI.

3. the returned value was the wrong width (1 word versus 2) and
type/value(integer 1, not a pointer to anything) for deferrangefunc's
any-typed return value (in practice, the OOB value check could catch
this, but still, it's sketchy).

This -- using the deferreturn lookup method already in place for
open-coded defers -- turned out to be a much-less-ugly way of
obtaining the desired transfer of control for recover().

TODO: we also could do this for regular defer, and delete some code.

Fixes #71675

Change-Id: If7d7ea789ad4320821aab3b443759a7d71647ff0
Reviewed-on: https://go-review.googlesource.com/c/go/+/650476
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2025-02-19 08:27:06 -08:00
Cuong Manh Le 34073a736a cmd/compile: avoid infinite recursion when inlining closures
CL 630696 changes budget for once-called closures, making them more
inlinable. However, when recursive inlining involve both the closure and
its parent, the inliner goes into an infinite loop:

	parent (a closure)  -> closure -> parent -> ...

The problem here dues to the closure name mangling, causing the inlined
checking condition failed, since the closure name affects how the
linker symbol generated.

To fix this, just prevent the closure from inlining its parent into
itself, avoid the infinite inlining loop.

Fixes #71680

Change-Id: Ib27626d70f95e5f1c24a3eb1c8e6c3443b7d90c8
Reviewed-on: https://go-review.googlesource.com/c/go/+/649656
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-02-18 17:52:51 -08:00
Jakub Ciolek d524e1eccd cmd/compile: on AMD64, turn x < 128 into x <= 127
x < 128 -> x <= 127
x >= 128 -> x > 127

This allows for shorter encoding as 127 fits into
a single-byte immediate.

archive/tar benchmark (Alder Lake 12600K)

name              old time/op    new time/op    delta
/Writer/USTAR-16    1.46µs ± 0%    1.32µs ± 0%  -9.43%  (p=0.008 n=5+5)
/Writer/GNU-16      1.85µs ± 1%    1.79µs ± 1%  -3.47%  (p=0.008 n=5+5)
/Writer/PAX-16      3.21µs ± 0%    3.11µs ± 2%  -2.96%  (p=0.008 n=5+5)
/Reader/USTAR-16    1.38µs ± 1%    1.37µs ± 0%    ~     (p=0.127 n=5+4)
/Reader/GNU-16       798ns ± 1%     800ns ± 2%    ~     (p=0.548 n=5+5)
/Reader/PAX-16      3.07µs ± 1%    3.00µs ± 0%  -2.35%  (p=0.008 n=5+5)
[Geo mean]          1.76µs         1.70µs       -3.15%

compilecmp:

hash/maphash
hash/maphash.(*Hash).Write 517 -> 510  (-1.35%)

runtime
runtime.traceReadCPU 1626 -> 1615  (-0.68%)

runtime [cmd/compile]
runtime.traceReadCPU 1626 -> 1615  (-0.68%)

math/rand/v2
type:.eq.[128]float32 65 -> 59  (-9.23%)

bytes
bytes.trimLeftUnicode 378 -> 373  (-1.32%)
bytes.IndexAny 1189 -> 1157  (-2.69%)
bytes.LastIndexAny 1256 -> 1239  (-1.35%)
bytes.lastIndexFunc 263 -> 261  (-0.76%)

strings
strings.FieldsFuncSeq.func1 411 -> 399  (-2.92%)
strings.EqualFold 625 -> 624  (-0.16%)
strings.trimLeftUnicode 248 -> 231  (-6.85%)

math/rand
type:.eq.[128]float32 65 -> 59  (-9.23%)

bytes [cmd/compile]
bytes.LastIndexAny 1256 -> 1239  (-1.35%)
bytes.lastIndexFunc 263 -> 261  (-0.76%)
bytes.trimLeftUnicode 378 -> 373  (-1.32%)
bytes.IndexAny 1189 -> 1157  (-2.69%)

regexp/syntax
regexp/syntax.(*parser).parseEscape 1113 -> 1102  (-0.99%)

math/rand/v2 [cmd/compile]
type:.eq.[128]float32 65 -> 59  (-9.23%)

strings [cmd/compile]
strings.EqualFold 625 -> 624  (-0.16%)
strings.FieldsFuncSeq.func1 411 -> 399  (-2.92%)
strings.trimLeftUnicode 248 -> 231  (-6.85%)

math/rand [cmd/compile]
type:.eq.[128]float32 65 -> 59  (-9.23%)

regexp
regexp.(*inputString).context 198 -> 197  (-0.51%)
regexp.(*inputBytes).context 221 -> 212  (-4.07%)

image/jpeg
image/jpeg.(*decoder).processDQT 500 -> 491  (-1.80%)

regexp/syntax [cmd/compile]
regexp/syntax.(*parser).parseEscape 1113 -> 1102  (-0.99%)

regexp [cmd/compile]
regexp.(*inputString).context 198 -> 197  (-0.51%)
regexp.(*inputBytes).context 221 -> 212  (-4.07%)

encoding/csv
encoding/csv.(*Writer).fieldNeedsQuotes 269 -> 266  (-1.12%)

cmd/vendor/golang.org/x/sys/unix
type:.eq.[131]struct 855 -> 823  (-3.74%)

vendor/golang.org/x/text/unicode/norm
vendor/golang.org/x/text/unicode/norm.nextDecomposed 4831 -> 4826  (-0.10%)
vendor/golang.org/x/text/unicode/norm.(*Iter).returnSlice 281 -> 275  (-2.14%)

vendor/golang.org/x/text/secure/bidirule
vendor/golang.org/x/text/secure/bidirule.init.0 85 -> 83  (-2.35%)

go/scanner
go/scanner.isDigit 100 -> 98  (-2.00%)
go/scanner.(*Scanner).next 431 -> 422  (-2.09%)
go/scanner.isLetter 142 -> 124  (-12.68%)

encoding/asn1
encoding/asn1.parseTagAndLength 1189 -> 1182  (-0.59%)
encoding/asn1.makeField 3481 -> 3463  (-0.52%)

text/scanner
text/scanner.(*Scanner).next 1242 -> 1236  (-0.48%)

archive/tar
archive/tar.isASCII 133 -> 127  (-4.51%)
archive/tar.(*Writer).writeRawFile 1206 -> 1198  (-0.66%)
archive/tar.(*Reader).readHeader.func1 9 -> 7  (-22.22%)
archive/tar.toASCII 393 -> 383  (-2.54%)
archive/tar.splitUSTARPath 405 -> 396  (-2.22%)
archive/tar.(*Writer).writePAXHeader.func1 627 -> 620  (-1.12%)

text/template
text/template.jsIsSpecial 59 -> 57  (-3.39%)

go/doc
go/doc.assumedPackageName 714 -> 701  (-1.82%)

vendor/golang.org/x/net/http/httpguts
vendor/golang.org/x/net/http/httpguts.headerValueContainsToken 965 -> 952  (-1.35%)
vendor/golang.org/x/net/http/httpguts.tokenEqual 280 -> 269  (-3.93%)
vendor/golang.org/x/net/http/httpguts.IsTokenRune 28 -> 26  (-7.14%)

net/mail
net/mail.isVchar 26 -> 24  (-7.69%)
net/mail.isAtext 106 -> 104  (-1.89%)
net/mail.(*Address).String 1084 -> 1052  (-2.95%)
net/mail.isQtext 39 -> 37  (-5.13%)
net/mail.isMultibyte 9 -> 7  (-22.22%)
net/mail.isDtext 45 -> 43  (-4.44%)
net/mail.(*addrParser).consumeQuotedString 1050 -> 1029  (-2.00%)
net/mail.quoteString 741 -> 714  (-3.64%)

cmd/internal/obj/s390x
cmd/internal/obj/s390x.preprocess 6405 -> 6393  (-0.19%)

cmd/internal/obj/x86
cmd/internal/obj/x86.toDisp8 303 -> 301  (-0.66%)

fmt [cmd/compile]
fmt.Fprintf 4726 -> 4662  (-1.35%)

go/scanner [cmd/compile]
go/scanner.(*Scanner).next 431 -> 422  (-2.09%)
go/scanner.isLetter 142 -> 124  (-12.68%)
go/scanner.isDigit 100 -> 98  (-2.00%)

cmd/compile/internal/syntax
cmd/compile/internal/syntax.(*source).nextch 879 -> 847  (-3.64%)

cmd/vendor/golang.org/x/mod/module
cmd/vendor/golang.org/x/mod/module.checkElem 1253 -> 1235  (-1.44%)
cmd/vendor/golang.org/x/mod/module.escapeString 519 -> 517  (-0.39%)

go/doc [cmd/compile]
go/doc.assumedPackageName 714 -> 701  (-1.82%)

cmd/compile/internal/syntax [cmd/compile]
cmd/compile/internal/syntax.(*scanner).escape 1965 -> 1933  (-1.63%)
cmd/compile/internal/syntax.(*scanner).next 8975 -> 8847  (-1.43%)

cmd/internal/obj/s390x [cmd/compile]
cmd/internal/obj/s390x.preprocess 6405 -> 6393  (-0.19%)

cmd/internal/obj/x86 [cmd/compile]
cmd/internal/obj/x86.toDisp8 303 -> 301  (-0.66%)

cmd/internal/gcprog
cmd/internal/gcprog.(*Writer).Repeat 688 -> 677  (-1.60%)
cmd/internal/gcprog.(*Writer).varint 442 -> 439  (-0.68%)

cmd/compile/internal/ir
cmd/compile/internal/ir.splitPkg 331 -> 325  (-1.81%)

cmd/compile/internal/ir [cmd/compile]
cmd/compile/internal/ir.splitPkg 331 -> 325  (-1.81%)

net/http
net/http.containsDotDot.FieldsFuncSeq.func1 411 -> 399  (-2.92%)
net/http.isNotToken 33 -> 30  (-9.09%)
net/http.containsDotDot 606 -> 588  (-2.97%)
net/http.isCookieNameValid 197 -> 191  (-3.05%)
net/http.parsePattern 4330 -> 4317  (-0.30%)
net/http.ParseCookie 1099 -> 1096  (-0.27%)
net/http.validMethod 197 -> 187  (-5.08%)

cmd/vendor/golang.org/x/text/unicode/norm
cmd/vendor/golang.org/x/text/unicode/norm.(*Iter).returnSlice 281 -> 275  (-2.14%)
cmd/vendor/golang.org/x/text/unicode/norm.nextDecomposed 4831 -> 4826  (-0.10%)

net/http/cookiejar
net/http/cookiejar.encode 1936 -> 1918  (-0.93%)

expvar
expvar.appendJSONQuote 972 -> 965  (-0.72%)

cmd/cgo/internal/test
cmd/cgo/internal/test.stack128 116 -> 114  (-1.72%)

cmd/vendor/rsc.io/markdown
cmd/vendor/rsc.io/markdown.newATXHeading 1637 -> 1628  (-0.55%)
cmd/vendor/rsc.io/markdown.isUnicodePunct 197 -> 179  (-9.14%)

Change-Id: I578bdf42ef229d687d526e378d697ced51e1880c
Reviewed-on: https://go-review.googlesource.com/c/go/+/639935
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
2025-02-16 07:23:13 -08:00
Keith Randall beac2f7d3b cmd/compile: fix sign extension of paired 32-bit loads on arm64
Fixes #71759

Change-Id: Iab05294ac933cc9972949158d3fe2bdc3073df5e
Reviewed-on: https://go-review.googlesource.com/c/go/+/649895
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2025-02-15 07:53:28 -08:00
Keith Randall 187fd2698d cmd/compile: make write barrier code amenable to paired loads/stores
It currently isn't because it does load/store/load/store/...
Rework to do overwrite processing in pairs so it is instead
load/load/store/store/...

Change-Id: If7be629bc4048da5f2386dafb8f05759b79e9e2b
Reviewed-on: https://go-review.googlesource.com/c/go/+/631495
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-02-13 14:08:14 -08:00
Keith Randall a0029e95e5 cmd/compile: regalloc: handle desired registers of 2-output insns
Particularly with 2-word load instructions, this becomes important.
Classic example is:

    func f(p *string) string {
        return *p
    }

We want the two loads to put the return values directly into
the two ABI return registers.

At this point in the stack, cmd/go is 1.1% smaller.

Change-Id: I51fd1710238e81d15aab2bfb816d73c8e7c207b1
Reviewed-on: https://go-review.googlesource.com/c/go/+/631137
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-02-13 14:08:07 -08:00
khr@golang.org 20d7c57422 cmd/compile: pair loads and stores on arm64
Look for possible paired load/store operations on arm64.
I don't expect this would be a lot faster, but it will save
binary space, and indirectly through the icache at least a bit
of time.

Change-Id: I4dd73b0e6329c4659b7453998f9b75320fcf380b
Reviewed-on: https://go-review.googlesource.com/c/go/+/629256
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
2025-02-13 14:07:47 -08:00
Keith Randall 89c2f282dc cmd/compile: move []byte->string map key optimization to ssa
If we call slicebytetostring immediately (with no intervening writes)
before calling map access or delete functions with the resulting
string as the key, then we can just use the ptr/len of the
slicebytetostring argument as the key. This avoids an allocation.

Fixes #44898
Update #71132

There's old code in cmd/compile/internal/walk/order.go that handles
some of these cases.

1. m[string(b)]
2. s := string(b); m[s]
3. m[[2]string{string(b1),string(b2)}]

The old code handled cases 1&3. The new code handles cases 1&2.
We'll leave the old code around to keep 3 working, although it seems
not terribly common.

Case 2 happens particularly after inlining, so it is pretty common.

Change-Id: I8913226ca79d2c65f4e2bd69a38ac8c976a57e43
Reviewed-on: https://go-review.googlesource.com/c/go/+/640656
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-02-13 13:03:07 -08:00