Commit Graph

9507 Commits

Author SHA1 Message Date
Keith Randall 12110c3f7e cmd/compile: improve multiplication strength reduction
Use an automatic algorithm to generate strength reduction code.
You give it all the linear combination (a*x+b*y) instructions in your
architecture, it figures out the rest.

Just amd64 and arm64 for now.

Fixes #67575

Change-Id: I35c69382bebb1d2abf4bb4e7c43fd8548c6c59a1
Reviewed-on: https://go-review.googlesource.com/c/go/+/626998
Reviewed-by: Jakub Ciolek <jakub@ciolek.dev>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-01 09:33:31 -07:00
Joel Sing 4d10d4ad84 cmd/compile,internal/cpu,runtime: intrinsify math/bits.OnesCount on riscv64
For riscv64/rva22u64 and above, we can intrinsify math/bits.OnesCount
using the CPOP/CPOPW machine instructions. Since the native Go
implementation of OnesCount is relatively expensive, it is also
worth emitting a check for Zbb support when compiled for rva20u64.

On a Banana Pi F3, with GORISCV64=rva22u64:

              │     oc.1     │                oc.2                 │
              │    sec/op    │   sec/op     vs base                │
OnesCount-8     16.930n ± 0%   4.389n ± 0%  -74.08% (p=0.000 n=10)
OnesCount8-8     5.642n ± 0%   5.016n ± 0%  -11.10% (p=0.000 n=10)
OnesCount16-8    9.404n ± 0%   5.015n ± 0%  -46.67% (p=0.000 n=10)
OnesCount32-8   13.165n ± 0%   4.388n ± 0%  -66.67% (p=0.000 n=10)
OnesCount64-8   16.300n ± 0%   4.388n ± 0%  -73.08% (p=0.000 n=10)
geomean          11.40n        4.629n       -59.40%

On a Banana Pi F3, compiled with GORISCV64=rva20u64 and with Zbb
detection enabled:

              │     oc.3     │                oc.4                 │
              │    sec/op    │   sec/op     vs base                │
OnesCount-8     16.930n ± 0%   5.643n ± 0%  -66.67% (p=0.000 n=10)
OnesCount8-8     5.642n ± 0%   5.642n ± 0%        ~ (p=0.447 n=10)
OnesCount16-8   10.030n ± 0%   6.896n ± 0%  -31.25% (p=0.000 n=10)
OnesCount32-8   13.170n ± 0%   5.642n ± 0%  -57.16% (p=0.000 n=10)
OnesCount64-8   16.300n ± 0%   5.642n ± 0%  -65.39% (p=0.000 n=10)
geomean          11.55n        5.873n       -49.16%

On a Banana Pi F3, compiled with GORISCV64=rva20u64 but with Zbb
detection disabled:

              │    oc.3     │                oc.5                 │
              │   sec/op    │   sec/op     vs base                │
OnesCount-8     16.93n ± 0%   29.47n ± 0%  +74.07% (p=0.000 n=10)
OnesCount8-8    5.642n ± 0%   5.643n ± 0%        ~ (p=0.191 n=10)
OnesCount16-8   10.03n ± 0%   15.05n ± 0%  +50.05% (p=0.000 n=10)
OnesCount32-8   13.17n ± 0%   18.18n ± 0%  +38.04% (p=0.000 n=10)
OnesCount64-8   16.30n ± 0%   21.94n ± 0%  +34.60% (p=0.000 n=10)
geomean         11.55n        15.84n       +37.16%

For hardware without Zbb, this adds ~5ns overhead, while for hardware
with Zbb we achieve a performance gain up of up to 11ns. It is worth
noting that OnesCount8 is cheap enough that it is preferable to stick
with the generic version in this case.

Change-Id: Id657e40e0dd1b1ab8cc0fe0f8a68df4c9f2d7da5
Reviewed-on: https://go-review.googlesource.com/c/go/+/660856
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-01 05:57:41 -07:00
Joel Sing 90e8b8cdae cmd/compile: intrinsify math/bits.Bswap on riscv64
For riscv64/rva22u64 and above, we can intrinsify math/bits.Bswap
using the REV8 machine instruction.

On a StarFive VisionFive 2 with GORISCV64=rva22u64:

                 │     rb.1     │                rb.2                 │
                 │    sec/op    │   sec/op     vs base                │
ReverseBytes-4     18.790n ± 0%   4.026n ± 0%  -78.57% (p=0.000 n=10)
ReverseBytes16-4    6.710n ± 0%   5.368n ± 0%  -20.00% (p=0.000 n=10)
ReverseBytes32-4   13.420n ± 0%   5.368n ± 0%  -60.00% (p=0.000 n=10)
ReverseBytes64-4   17.450n ± 0%   4.026n ± 0%  -76.93% (p=0.000 n=10)
geomean             13.11n        4.649n       -64.54%

Change-Id: I26eee34270b1721f7304bb1cddb0fda129b20ece
Reviewed-on: https://go-review.googlesource.com/c/go/+/660855
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
2025-05-01 05:57:13 -07:00
Jakub Ciolek 1e756dc5f7 cmd/compile: relax tighten register-pressure heuristic slightly
Sometimes a value has multiple args, but they are the same
dependency. Relax the regalloc heuristic for those.

No measurable compile-time regression according to compilebench,
maybe even a small improvement.

name    old time/op  new time/op  delta
StdCmd   14.4s ± 1%   14.4s ± 1%  -0.39%  (p=0.101 n=11+11)

compilecmp:

linux/amd64:

strconv
strconv.formatBits 1199 -> 1189  (-0.83%)
strconv.formatDecimal 637 -> 631  (-0.94%)

strconv [cmd/compile]
strconv.formatBits 1199 -> 1189  (-0.83%)
strconv.formatDecimal 637 -> 631  (-0.94%)

image
image.NewGray16 286 -> 275  (-3.85%)
image.NewAlpha16 286 -> 275  (-3.85%)

regexp/syntax
regexp/syntax.ranges.Less 150 -> 147  (-2.00%)
regexp/syntax.(*compiler).rune 774 -> 773  (-0.13%)
regexp/syntax.(*ranges).Swap 197 -> 180  (-8.63%)
regexp/syntax.ranges.Swap 146 -> 134  (-8.22%)
regexp/syntax.(*compiler).cap 440 -> 425  (-3.41%)
regexp/syntax.(*compiler).nop 310 -> 297  (-4.19%)
regexp/syntax.(*compiler).compile 5815 -> 5733  (-1.41%)
regexp/syntax.(*ranges).Less 211 -> 197  (-6.64%)

regexp/syntax [cmd/compile]
regexp/syntax.(*compiler).compile 5815 -> 5733  (-1.41%)
regexp/syntax.(*compiler).rune 774 -> 773  (-0.13%)
regexp/syntax.(*compiler).cap 440 -> 425  (-3.41%)
regexp/syntax.(*ranges).Less 211 -> 197  (-6.64%)
regexp/syntax.ranges.Swap 146 -> 134  (-8.22%)
regexp/syntax.(*ranges).Swap 197 -> 180  (-8.63%)
regexp/syntax.(*compiler).nop 310 -> 297  (-4.19%)
regexp/syntax.ranges.Less 150 -> 147  (-2.00%)

crypto/elliptic
crypto/elliptic.(*nistCurve[go.shape.*uint8]).pointFromAffine 1272 -> 1240  (-2.52%)

image/gif
image/gif.(*decoder).readColorTable 652 -> 646  (-0.92%)
image/gif.(*encoder).colorTablesMatch 350 -> 349  (-0.29%)

crypto/internal/cryptotest
crypto/internal/cryptotest.testCipher.func3 1289 -> 1286  (-0.23%)

internal/trace/internal/tracev1
internal/trace/internal/tracev1.(*parser).collectBatchesAndCPUSamples 1352 -> 1338  (-1.04%)

internal/fuzz
internal/fuzz.byteSliceDuplicateBytes 741 -> 718  (-3.10%)

cmd/compile/internal/types
cmd/compile/internal/types.CalcSize 3663 -> 3633  (-0.82%)

cmd/compile/internal/rttype
cmd/compile/internal/rttype.Init 2149 -> 2124  (-1.16%)

cmd/link/internal/loadmacho
cmd/link/internal/loadmacho.macholoadsym 1213 -> 1212  (-0.08%)

cmd/compile/internal/rangefunc
cmd/compile/internal/rangefunc.(*rewriter).checks 5207 -> 5175  (-0.61%)

net/http
net/http.(*http2SettingsFrame).Setting 155 -> 147  (-5.16%)

cmd/compile/internal/rttype [cmd/compile]
cmd/compile/internal/rttype.Init 2149 -> 2124  (-1.16%)

cmd/compile/internal/rangefunc [cmd/compile]
cmd/compile/internal/rangefunc.(*rewriter).checks 5207 -> 5175  (-0.61%)

cmd/link/internal/ld
cmd/link/internal/ld.pefips 3119 -> 3109  (-0.32%)

cmd/vendor/rsc.io/markdown
cmd/vendor/rsc.io/markdown.parseDash 593 -> 587  (-1.01%)

cmd/compile/internal/ssa
cmd/compile/internal/ssa.(*poset).setOrder 3442 -> 3416  (-0.76%)
cmd/compile/internal/ssa.rewriteValuegeneric_OpMul16 2054 -> 2022  (-1.56%)
cmd/compile/internal/ssa.rewriteValuegeneric_OpMul8 2054 -> 2022  (-1.56%)
inserted cmd/compile/internal/ssa.tighten.deferwrap5

cmd/compile/internal/ssa [cmd/compile]
cmd/compile/internal/ssa.rewriteValuegeneric_OpMul8 2054 -> 2022  (-1.56%)
cmd/compile/internal/ssa.rewriteValuegeneric_OpMul16 2054 -> 2022  (-1.56%)
cmd/compile/internal/ssa.tighten.deferwrap4 76 -> 67  (-11.84%)
cmd/compile/internal/ssa.tighten 6746 -> 5082  (-24.67%)
inserted cmd/compile/internal/ssa.tighten.deferwrap5

file                                             before   after    Δ       %
strconv.s                                        49029    49020    -9      -0.018%
strconv [cmd/compile].s                          49029    49020    -9      -0.018%
image.s                                          34963    34941    -22     -0.063%
regexp/syntax.s                                  83017    82860    -157    -0.189%
regexp/syntax [cmd/compile].s                    83017    82860    -157    -0.189%
crypto/elliptic.s                                26848    26816    -32     -0.119%
image/gif.s                                      22840    22833    -7      -0.031%
crypto/internal/cryptotest.s                     63834    63832    -2      -0.003%
internal/trace/internal/tracev1.s                52995    52981    -14     -0.026%
internal/trace.s                                 181396   181412   +16     +0.009%
internal/fuzz.s                                  85526    85503    -23     -0.027%
cmd/internal/obj/s390x.s                         121651   121683   +32     +0.026%
cmd/internal/obj/ppc64.s                         139867   139871   +4      +0.003%
cmd/compile/internal/types.s                     71425    71395    -30     -0.042%
cmd/internal/obj/ppc64 [cmd/compile].s           139952   139956   +4      +0.003%
cmd/internal/obj/s390x [cmd/compile].s           121753   121785   +32     +0.026%
cmd/compile/internal/rttype.s                    10418    10393    -25     -0.240%
cmd/link/internal/loadmacho.s                    23270    23272    +2      +0.009%
cmd/compile/internal/rangefunc.s                 35050    35018    -32     -0.091%
cmd/vendor/github.com/google/pprof/profile.s     148264   148273   +9      +0.006%
net/http.s                                       612895   612910   +15     +0.002%
cmd/compile/internal/rttype [cmd/compile].s      10397    10372    -25     -0.240%
cmd/compile/internal/rangefunc [cmd/compile].s   35681    35649    -32     -0.090%
net/http/cookiejar.s                             28758    28761    +3      +0.010%
cmd/compile/internal/reflectdata.s               86639    86644    +5      +0.006%
cmd/compile/internal/reflectdata [cmd/compile].s 89725    89730    +5      +0.006%
cmd/link/internal/ld.s                           649596   649633   +37     +0.006%
cmd/vendor/rsc.io/markdown.s                     116731   116757   +26     +0.022%
cmd/compile/internal/ssa.s                       3574185  3574642  +457    +0.013%
cmd/compile/internal/ssa [cmd/compile].s         3725364  3723715  -1649   -0.044%
cmd/compile/internal/ssagen.s                    415135   415155   +20     +0.005%
total                                            36475376 36473818 -1558   -0.004%

linux/arm64:

go/printer
go/printer.(*printer).expr1 7152 -> 7168  (+0.22%)

fmt [cmd/compile]
fmt.(*ss).advance 1712 -> 1696  (-0.93%)

crypto/x509
crypto/x509.marshalCertificatePolicies.func1.2.(*Builder).AddASN1ObjectIdentifier.1 changed

internal/fuzz
internal/fuzz.minimizeBytes changed

cmd/internal/obj/arm64
cmd/internal/obj/arm64.bitconEncode changed

math/big [cmd/compile]
math/big.(*Float).Int64 512 -> 528  (+3.12%)
math/big.NewInt changed
math/big.fmtE 720 -> 736  (+2.22%)
math/big.basicSqr changed

cmd/asm/internal/asm
cmd/asm/internal/asm.(*Parser).asmText 1424 -> 1440  (+1.12%)

go/constant [cmd/compile]
go/constant.UnaryOp changed
go/constant.BinaryOp changed

crypto/tls
crypto/tls.prf10 576 -> 560  (-2.78%)

cmd/internal/obj/arm64 [cmd/compile]
cmd/internal/obj/arm64.bitconEncode changed

cmd/vendor/golang.org/x/term
cmd/vendor/golang.org/x/term.(*Terminal).addKeyToLine changed

cmd/compile/internal/ir
cmd/compile/internal/ir.ConstOverflow changed

cmd/vendor/github.com/google/pprof/internal/graph
cmd/vendor/github.com/google/pprof/internal/graph.(*builder).addEdge changed

cmd/compile/internal/ir [cmd/compile]
cmd/compile/internal/ir.ConstOverflow changed

cmd/compile/internal/rttype
cmd/compile/internal/rttype.Init changed

cmd/compile/internal/rttype [cmd/compile]
cmd/compile/internal/rttype.Init changed

cmd/compile/internal/abi [cmd/compile]
cmd/compile/internal/abi.(*ABIParamAssignment).RegisterTypesAndOffsets 1344 -> 1328  (-1.19%)

cmd/vendor/golang.org/x/tools/go/types/typeutil
cmd/vendor/golang.org/x/tools/go/types/typeutil.hasher.hash changed

cmd/vendor/github.com/ianlancetaylor/demangle
cmd/vendor/github.com/ianlancetaylor/demangle.(*rustState).expandPunycode changed

net/http/cookiejar
net/http/cookiejar.adapt changed
net/http/cookiejar.encode changed

cmd/compile/internal/reflectdata
cmd/compile/internal/reflectdata.OldMapType changed

cmd/compile/internal/reflectdata [cmd/compile]
cmd/compile/internal/reflectdata.OldMapType changed

cmd/vendor/github.com/google/pprof/internal/report
cmd/vendor/github.com/google/pprof/internal/report.(*Report).newTrimmedGraph 2336 -> 2368  (+1.37%)

cmd/link/internal/ld
cmd/link/internal/ld.(*relocSymState).relocsym changed

cmd/vendor/rsc.io/markdown
cmd/vendor/rsc.io/markdown.parseDash changed
cmd/vendor/rsc.io/markdown.parseLinkRefDef changed

cmd/trace
main.(*stackMap).profile 912 -> 880  (-3.51%)

cmd/vendor/golang.org/x/tools/go/analysis/passes/tests
cmd/vendor/golang.org/x/tools/go/analysis/passes/tests.checkExampleOutput 832 -> 816  (-1.92%)

cmd/compile/internal/ssa
cmd/compile/internal/ssa.shouldElimIfElse changed
cmd/compile/internal/ssa.storeOrder changed
cmd/compile/internal/ssa.elimIfElse changed
cmd/compile/internal/ssa.tighten 3408 -> 3456  (+1.41%)

cmd/compile/internal/ssa [cmd/compile]
cmd/compile/internal/ssa.storeOrder changed
cmd/compile/internal/ssa.elimIfElse changed
cmd/compile/internal/ssa.shouldElimIfElse changed
cmd/compile/internal/ssa.tighten 4960 -> 4976  (+0.32%)
cmd/compile/internal/ssa.branchelim changed

file                                                     before   after    Δ       %
runtime.s                                                624064   624032   -32     -0.005%
runtime [cmd/compile].s                                  679456   679424   -32     -0.005%
strconv.s                                                48528    48560    +32     +0.066%
strconv [cmd/compile].s                                  48528    48560    +32     +0.066%
index/suffixarray.s                                      41808    41856    +48     +0.115%
fmt.s                                                    72272    72256    -16     -0.022%
math/big.s                                               152992   153024   +32     +0.021%
go/printer.s                                             77680    77696    +16     +0.021%
fmt [cmd/compile].s                                      81760    81744    -16     -0.020%
math/big [cmd/compile].s                                 153040   153072   +32     +0.021%
cmd/asm/internal/asm.s                                   57360    57376    +16     +0.028%
crypto/tls.s                                             354304   354288   -16     -0.005%
cmd/compile/internal/abi [cmd/compile].s                 22752    22736    -16     -0.070%
cmd/vendor/github.com/google/pprof/internal/report.s     67008    67040    +32     +0.048%
cmd/trace.s                                              215040   215008   -32     -0.015%
cmd/vendor/golang.org/x/tools/go/analysis/passes/tests.s 12544    12528    -16     -0.128%
cmd/compile/internal/ssa.s                               3209248  3209296  +48     +0.001%
cmd/compile/internal/ssa [cmd/compile].s                 3319152  3319168  +16     +0.000%
total                                                    33366288 33366416 +128    +0.000%

Change-Id: I8111792c9dd4f927b49a6d5dd90a3fdc3ec26277
Reviewed-on: https://go-review.googlesource.com/c/go/+/666836
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
2025-04-29 07:38:24 -07:00
Keith Randall 67e0681aef cmd/compile: put constant value on node inside parentheses
That's where the unified IR writer expects it.

Fixes #73476

Change-Id: Ic22bd8dee5be5991e6d126ae3f6eccb2acdc0b19
Reviewed-on: https://go-review.googlesource.com/c/go/+/667415
Reviewed-by: Junyang Shao <shaojunyang@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
2025-04-24 12:17:27 -07:00
Keith Randall 3452d80da3 cmd/compile: add cast in range loop final value computation
When replacing a loop where the iteration variable has a named type,
we need to compute the last iteration value as i = T(len(a)-1), not
just i = len(a)-1.

Fixes #73491

Change-Id: Ic1cc3bdf8571a40c10060f929a9db8a888de2b70
Reviewed-on: https://go-review.googlesource.com/c/go/+/667815
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@google.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-by: Keith Randall <khr@google.com>
2025-04-24 11:02:26 -07:00
Keith Randall 7d0cb2a2ad cmd/compile: constant fold 128-bit multiplies
The full 64x64->128 multiply comes up when using bits.Mul64.
The 64x64->64+overflow multiply comes up in unsafe.Slice when using
a constant length.

Change-Id: I298515162ca07d804b2d699d03bc957ca30a4ebc
Reviewed-on: https://go-review.googlesource.com/c/go/+/667175
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-04-22 10:24:18 -07:00
Keith Randall 336626bac4 cmd/compile: ensure we evaluate side effects of len() arg
For any len() which requires the evaluation of its arg (according to the spec).

Update #72844

Change-Id: Id2b0bcc78073a6d5051abd000131dafdf65e7f26
Reviewed-on: https://go-review.googlesource.com/c/go/+/658097
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
2025-04-21 15:50:54 -07:00
Keith Randall 8af32240c6 cmd/compile: don't evaluate side effects of range over array
If the thing we're ranging over is an array or ptr to array, and
it doesn't have a function call or channel receive in it, then we
shouldn't evaluate it.

Typecheck the ranged-over value as a constant in that case.
That makes the unified exporter replace the range expression
with a constant int.

Change-Id: I0d4ea081de70d20cf6d1fa8d25ef6cb021975554
Reviewed-on: https://go-review.googlesource.com/c/go/+/659317
Reviewed-by: Junyang Shao <shaojunyang@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Robert Griesemer <gri@google.com>
2025-04-21 15:50:43 -07:00
thepudds 04a9b16f3d cmd/compile/internal/escape: avoid reading ir.Node during inner loop of walkOne
Broadly speaking, escape analysis has two main phases. First, it
traverses the AST while building a data-flow graph of locations and
edges. Second, during "solve", it repeatedly walks the data-flow graph
while carefully propagating information about each location, including
whether a location's address reaches the heap.

Once escape analysis is in the solve phase and repeatedly walking the
data-flow graph, almost all the information it needs is within the
location graph, with a notable exception being the ir.Class of an
ir.Name, which currently must be checked by following a pointer from
the location to its ir.Node.

For typical graphs, that does not matter much, but if the graph becomes
large enough, cache misses in the inner solve loop start to matter more,
and the class is checked many times in the inner loop.

We therefore store the class information on the location in the graph
to reduce how much memory we need to load in the inner loop.

The package github.com/microsoft/typescript-go/internal/checker
has many locations, and compilation currently spends most of its time
in escape analysis.

This CL gives roughly a 30% speedup for wall clock compilation time
for the checker package:

  go1.24.0:      91.79s
  this CL:       64.98s

Linux perf shows a healthy reduction for example in l2_request.miss and
dTLB-load-misses on an amd64 test VM.

We could tweak things a bit more, though initial review feedback
has suggested it would be good to get this in as it stands.

Subsequent CLs in this stack give larger improvements.

Updates #72815

Change-Id: I3117430dff684c99e6da1e0d7763869873379238
Reviewed-on: https://go-review.googlesource.com/c/go/+/657295
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Jake Bailey <jacob.b.bailey@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
2025-04-21 15:44:05 -07:00
Russ Cox a11643df8f math/big: replace addVW/subVW assembly with fast pure Go
The vast majority of the time, carry propagation is limited and
addVW/subVW only need to consider a single word for carry propagation.
As Josh Bleecher-Snyder pointed out in 2019 (CL 164968), once carrying
is done, the remaining words can be handled faster with copy (memmove).
In the benchmarks below, this is the data=random case.

Even more important, if the source and destination are the same,
the copy can be optimized away entirely, making a small in-place
addition to a big.Int O(1) instead of O(N). To date, only a few
systems (amd64, arm64, and pure Go, meaning wasm) make use of this
asymptotic improvement. This is the data=shortcut case.

This CL deletes the addVW/subVW assembly and replaces it with
an optimized pure Go version. Using Go makes it easy to call
the real copy builtin, which will use optimized memmove code,
instead of recreating a worse memmove in assembly (as arm64 does)
or omitting the copy optimization entirely (as most others do).

The worst case for the Go version versus assembly is the case
of incrementing 2^N-1 by 1, which has to propagate a carry
the entire length of the array. This is the data=carry case.
On balance, we believe this case is rare enough to be worth
taking a hit in that case, in exchange for significant wins
in the other cases and the deletion of significant amounts of
assembly of varying quality. (Remember that half the assembly has
the copy optimization and shortcut, while half does not.)

In the benchmarks, the systems are:

	c2s16     GOARCH=amd64     c2s16 perf gomote (Intel, Google Cloud)
	c3h88     GOARCH=amd64     c3h88 perf gomote (newer Intel, Google Cloud)
	s7        GOARCH=amd64     rsc basement server (AMD Ryzen 9 7950X)
	c4as16    GOARCH=arm64     c4as16 perf gomote (Google Cloud)
	mac       GOARCH=arm64     Apple M3 Pro in MacBook Pro
	386       GOARCH=386       gotip-linux-386 gomote
	arm       GOARCH=arm       gotip-linux-arm gomote
	loong64   GOARCH=loong64   gotip-linux-loong64 gomote
	ppc64le   GOARCH=ppc64le   gotip-linux-ppc64le gomote
	riscv64   GOARCH=riscv64   gotip-linux-riscv64 gomote

benchmark \ system                    c2s16     c3h88       s7    c4as16       mac       386      arm  loong64   ppc64le  riscv64

AddVW/words=1/data=random            -1.15%    -1.74%   -5.89%    -9.80%   -11.54%   +23.71%  -12.74%  -14.25%   +14.67%  +10.27%
AddVW/words=2/data=random            -2.59%         ~   -4.38%   -19.31%   -15.41%   +24.80%        ~  -19.99%   +13.73%  +19.71%
AddVW/words=3/data=random            -3.75%   -19.10%   -3.79%   -23.15%   -17.04%   +20.04%  -10.07%  -23.20%         ~  +15.39%
AddVW/words=4/data=random            -2.84%    +7.05%   -8.77%   -22.64%   -15.77%   +16.01%   -7.36%  -28.22%         ~  +23.00%
AddVW/words=5/data=random           -10.97%    +2.16%  -12.09%   -20.89%   -17.14%    +9.42%   -4.69%  -32.60%         ~  +10.07%
AddVW/words=6/data=random            -9.87%         ~   -7.54%   -19.08%    -6.46%         ~   -3.44%  -34.61%         ~  +12.19%
AddVW/words=7/data=random           -14.36%         ~  -10.09%   -19.10%   -10.47%    -6.20%   -5.06%  -38.14%   -11.54%   +6.79%
AddVW/words=8/data=random           -17.50%         ~  -11.06%   -25.14%   -12.88%    -8.35%   -5.11%  -41.39%   -14.04%  +11.87%
AddVW/words=9/data=random           -19.76%    -4.05%  -15.47%   -24.08%   -16.50%   -12.34%  -21.56%  -44.25%   -14.82%        ~
AddVW/words=10/data=random          -13.89%         ~   -9.69%   -23.06%    -8.04%   -12.58%  -19.25%  -32.80%   -11.68%        ~
AddVW/words=16/data=random          -29.36%   -15.35%  -21.86%   -25.04%   -19.89%   -32.26%  -16.29%  -42.66%   -25.92%   -3.01%
AddVW/words=32/data=random          -39.02%   -28.76%  -39.87%   -11.22%    -2.85%   -55.40%  -31.17%  -55.37%   -37.92%  -16.28%
AddVW/words=64/data=random          -25.94%   -19.09%  -20.60%    -6.90%    +8.91%   -51.00%  -43.72%  -62.27%   -44.11%  -28.74%
AddVW/words=100/data=random         -22.79%   -18.13%  -18.25%         ~   +33.89%   -67.40%  -51.77%  -63.54%   -53.75%  -30.97%
AddVW/words=1000/data=random         -8.98%    -3.84%        ~    -3.15%         ~   -93.35%  -63.92%  -65.66%   -68.67%  -42.30%
AddVW/words=10000/data=random        -1.38%    -0.38%        ~         ~         ~   -89.16%  -65.18%  -44.65%   -70.35%  -20.08%
AddVW/words=100000/data=random            ~         ~        ~         ~         ~   -87.03%  -64.51%  -36.08%   -61.40%  -16.53%

SubVW/words=1/data=random            -3.67%         ~   -8.38%   -10.26%    -3.07%   +45.78%   -6.06%  -11.17%         ~        ~
SubVW/words=2/data=random            -3.48%   -10.07%   -5.76%   -20.14%    -8.45%   +44.28%        ~  -19.09%         ~  +16.98%
SubVW/words=3/data=random            -7.11%   -26.64%   -4.48%   -22.07%    -9.21%   +35.61%        ~  -23.93%   -18.20%        ~
SubVW/words=4/data=random            -4.23%    +7.19%   -8.95%   -22.62%   -13.89%   +33.20%   -8.96%  -29.96%         ~  +22.23%
SubVW/words=5/data=random           -11.49%    +1.92%  -10.86%   -22.27%   -17.53%   +24.48%   -2.88%  -35.19%   -19.55%        ~
SubVW/words=6/data=random            -7.67%         ~   -7.72%   -18.44%    -6.24%   +12.03%   -2.00%  -39.68%   -10.73%        ~
SubVW/words=7/data=random           -13.69%   -18.32%  -11.82%   -18.92%   -11.57%    +6.63%        ~  -43.54%   -30.81%        ~
SubVW/words=8/data=random           -16.02%         ~  -11.07%   -24.50%   -11.92%    +4.32%   -3.01%  -46.95%   -24.14%        ~
SubVW/words=9/data=random           -18.76%    -3.34%  -14.84%   -23.79%   -17.50%         ~  -21.80%  -49.98%   -29.62%        ~
SubVW/words=10/data=random          -13.23%         ~   -9.25%   -21.26%   -11.63%         ~  -18.58%  -39.19%   -20.09%        ~
SubVW/words=16/data=random          -28.25%   -13.24%  -22.66%   -27.18%   -19.13%   -23.38%  -20.24%  -51.01%   -28.06%   -3.05%
SubVW/words=32/data=random          -38.41%   -28.88%  -40.12%   -11.20%    -2.80%   -49.17%  -34.67%  -63.29%   -39.25%  -15.20%
SubVW/words=64/data=random          -25.51%   -19.24%  -22.20%    -6.57%    +9.98%   -48.52%  -48.14%  -69.50%   -49.44%  -27.92%
SubVW/words=100/data=random         -21.69%   -18.51%        ~    +1.92%   +34.42%   -65.88%  -54.67%  -71.24%   -58.88%  -30.71%
SubVW/words=1000/data=random         -9.81%    -4.05%   -2.14%    -3.06%         ~   -93.37%  -67.33%  -74.12%   -68.36%  -42.17%
SubVW/words=10000/data=random             ~    -0.52%        ~         ~         ~   -88.87%  -68.54%  -44.94%   -70.63%  -19.95%
SubVW/words=100000/data=random            ~         ~        ~         ~         ~   -86.69%  -68.09%  -48.36%   -62.42%  -19.32%

AddVW/words=1/data=shortcut         -29.38%   -25.38%  -27.37%   -23.15%   -25.41%    +3.01%  -33.60%  -36.12%   -15.76%        ~
AddVW/words=2/data=shortcut         -32.79%   -34.72%  -31.47%   -24.47%   -28.21%    -3.75%  -34.66%  -43.89%   -23.65%  -21.56%
AddVW/words=3/data=shortcut         -38.50%   -46.83%  -35.67%   -26.38%   -30.29%   -10.41%  -44.89%  -47.68%   -30.93%  -26.85%
AddVW/words=4/data=shortcut         -40.40%   -28.85%  -34.19%   -29.83%   -32.95%   -16.09%  -42.86%  -51.02%   -34.19%  -26.69%
AddVW/words=5/data=shortcut         -43.87%   -35.42%  -36.46%   -32.59%   -37.72%   -20.82%  -45.14%  -54.01%   -35.49%  -30.48%
AddVW/words=6/data=shortcut         -46.98%   -39.34%  -42.22%   -35.43%   -38.18%   -27.46%  -46.72%  -56.61%   -40.21%  -34.07%
AddVW/words=7/data=shortcut         -49.63%   -47.97%  -46.61%   -35.28%   -41.93%   -31.14%  -49.29%  -58.89%   -41.10%  -37.01%
AddVW/words=8/data=shortcut         -50.48%   -42.33%  -45.40%   -40.24%   -41.74%   -32.92%  -50.62%  -60.98%   -44.85%  -38.10%
AddVW/words=9/data=shortcut         -54.27%   -43.52%  -49.06%   -42.16%   -45.22%   -37.57%  -51.84%  -62.91%   -46.04%  -40.82%
AddVW/words=10/data=shortcut        -56.01%   -45.40%  -51.42%   -43.29%   -46.14%   -38.65%  -53.65%  -64.62%   -47.05%  -43.21%
AddVW/words=16/data=shortcut        -62.73%   -55.66%  -59.31%   -56.38%   -54.31%   -53.16%  -61.03%  -72.29%   -58.24%  -52.57%
AddVW/words=32/data=shortcut        -74.00%   -69.42%  -71.75%   -33.65%   -37.35%   -71.73%  -72.59%  -82.44%   -70.87%  -67.69%
AddVW/words=64/data=shortcut        -56.69%   -52.72%  -52.09%   -35.48%   -36.87%   -84.24%  -83.10%  -90.37%   -82.56%  -80.81%
AddVW/words=100/data=shortcut       -56.68%   -53.18%  -51.49%   -33.49%   -37.72%   -89.95%  -88.21%  -93.37%   -88.47%  -86.52%
AddVW/words=1000/data=shortcut      -56.68%   -52.45%  -51.66%   -35.31%   -36.65%   -98.88%  -98.62%  -99.24%   -98.78%  -98.41%
AddVW/words=10000/data=shortcut     -56.70%   -52.40%  -51.92%   -33.49%   -36.98%   -99.89%  -99.86%  -99.92%   -99.87%  -99.91%
AddVW/words=100000/data=shortcut    -56.67%   -52.46%  -52.38%   -35.31%   -37.20%   -99.99%  -99.99%  -99.99%   -99.99%  -99.99%

SubVW/words=1/data=shortcut         -29.80%   -20.71%  -26.94%   -23.24%   -25.33%   +26.97%  -32.02%  -37.85%   -40.20%  -12.67%
SubVW/words=2/data=shortcut         -35.47%   -36.38%  -31.93%   -25.43%   -30.18%   +18.96%  -33.48%  -46.48%   -39.38%  -18.65%
SubVW/words=3/data=shortcut         -39.22%   -49.96%  -36.90%   -25.82%   -30.96%   +12.53%  -40.67%  -51.07%   -43.71%  -23.78%
SubVW/words=4/data=shortcut         -40.46%   -24.90%  -34.66%   -29.87%   -33.97%    +4.60%  -42.32%  -54.92%   -42.83%  -22.45%
SubVW/words=5/data=shortcut         -43.84%   -34.17%  -38.00%   -32.55%   -37.27%    -2.46%  -43.09%  -58.18%   -45.70%  -26.45%
SubVW/words=6/data=shortcut         -47.69%   -37.49%  -42.73%   -35.90%   -37.73%    -8.52%  -46.55%  -61.01%   -44.00%  -30.14%
SubVW/words=7/data=shortcut         -49.45%   -50.66%  -46.88%   -34.77%   -41.64%   -14.46%  -48.92%  -63.46%   -50.47%  -33.39%
SubVW/words=8/data=shortcut         -50.45%   -39.31%  -47.14%   -40.47%   -41.70%   -15.77%  -50.21%  -65.64%   -47.71%  -34.01%
SubVW/words=9/data=shortcut         -54.28%   -43.07%  -49.42%   -41.34%   -44.99%   -19.39%  -51.55%  -67.61%   -56.92%  -36.82%
SubVW/words=10/data=shortcut        -56.85%   -47.88%  -50.92%   -42.76%   -45.67%   -23.60%  -53.04%  -69.34%   -60.18%  -39.43%
SubVW/words=16/data=shortcut        -62.36%   -54.83%  -58.80%   -55.83%   -53.74%   -41.04%  -60.16%  -76.75%   -60.56%  -48.63%
SubVW/words=32/data=shortcut        -73.68%   -68.64%  -71.57%   -33.52%   -37.34%   -64.73%  -72.67%  -85.89%   -71.87%  -64.56%
SubVW/words=64/data=shortcut        -56.68%   -51.66%  -52.56%   -34.75%   -37.54%   -80.30%  -83.58%  -92.39%   -83.41%  -78.70%
SubVW/words=100/data=shortcut       -56.68%   -50.97%  -51.57%   -33.68%   -36.78%   -87.42%  -88.53%  -94.84%   -88.87%  -84.96%
SubVW/words=1000/data=shortcut      -56.68%   -50.89%  -52.10%   -34.94%   -37.77%   -98.59%  -98.71%  -99.43%   -98.80%  -98.20%
SubVW/words=10000/data=shortcut     -56.68%   -51.00%  -52.44%   -33.65%   -37.27%   -99.86%  -99.87%  -99.94%   -99.88%  -99.90%
SubVW/words=100000/data=shortcut    -56.68%   -50.80%  -52.20%   -34.79%   -37.46%   -99.99%  -99.99%  -99.99%   -99.99%  -99.99%

AddVW/words=1/data=carry             -0.51%    -5.29%  -24.03%   -26.48%         ~         ~  -33.14%  -30.23%         ~  -20.74%
AddVW/words=2/data=carry             -6.36%         ~  -21.05%   -39.40%         ~   +10.72%  -29.12%  -31.34%         ~  -17.29%
AddVW/words=3/data=carry                  ~         ~  -17.46%   -19.53%   +17.58%         ~  -26.23%  -23.61%    +7.80%  -14.34%
AddVW/words=4/data=carry            +19.02%   +16.80%        ~         ~   +28.25%         ~  -27.90%  -20.31%   +19.16%        ~
AddVW/words=5/data=carry             +3.97%   +53.02%        ~         ~   +11.31%         ~  -19.05%  -17.47%   +16.81%        ~
AddVW/words=6/data=carry             +2.98%   +19.83%        ~         ~   +14.84%         ~  -18.48%  -14.92%   +18.25%        ~
AddVW/words=7/data=carry                  ~         ~        ~         ~   +27.17%         ~  -15.50%  -12.74%   +13.00%        ~
AddVW/words=8/data=carry             +0.58%   +22.32%        ~    +6.10%   +29.63%         ~  -13.04%        ~   +28.46%   +2.95%
AddVW/words=9/data=carry                  ~   +31.53%        ~         ~   +14.42%         ~  -11.32%        ~   +18.37%   +3.28%
AddVW/words=10/data=carry            +3.94%   +22.36%        ~    +6.29%   +19.22%         ~  -11.27%        ~   +20.10%   +3.91%
AddVW/words=16/data=carry            +2.82%   +14.23%        ~   +10.06%   +25.91%   -16.12%        ~        ~   +52.28%  +10.40%
AddVW/words=32/data=carry                 ~   +25.35%  +13.66%         ~   +34.89%   -34.39%   +6.51%  -18.71%   +41.06%  +19.42%
AddVW/words=64/data=carry           -42.03%         ~  -39.70%    +6.65%   +32.29%   -39.94%  +14.34%        ~   +19.68%  +20.86%
AddVW/words=100/data=carry          -33.95%   -34.28%  -39.65%         ~   +27.72%   -26.80%  +17.40%        ~   +26.39%  +23.32%
AddVW/words=1000/data=carry         -42.49%   -47.87%  -47.44%    +1.25%    +4.25%   -41.76%  +23.40%        ~   +25.48%  +27.99%
AddVW/words=10000/data=carry        -41.85%   -48.49%  -49.43%         ~         ~   -42.09%  +24.61%  -10.32%   +40.55%  +18.35%
AddVW/words=100000/data=carry       -28.18%   -48.13%  -48.24%    +1.35%         ~   -42.90%  +24.73%   -9.79%   +22.55%  +17.16%

SubVW/words=1/data=carry            -10.32%   -17.16%  -24.14%   -26.24%         ~   +18.43%  -34.10%  -29.54%    -9.57%        ~
SubVW/words=2/data=carry            -19.45%   -23.31%  -20.74%   -39.73%         ~   +15.74%  -28.13%  -30.21%         ~  -18.74%
SubVW/words=3/data=carry                  ~   -16.18%  -15.34%   -19.54%   +17.62%   +12.39%  -27.64%  -27.09%         ~  -14.97%
SubVW/words=4/data=carry            +11.67%   +24.42%        ~         ~   +25.11%   +14.07%  -28.08%  -26.18%         ~        ~
SubVW/words=5/data=carry             +8.08%   +25.64%        ~         ~   +10.35%    +8.12%  -21.75%  -25.50%         ~   -4.86%
SubVW/words=6/data=carry                  ~   +13.82%        ~         ~   +12.92%    +6.79%  -20.25%  -24.70%         ~   -2.74%
SubVW/words=7/data=carry                  ~         ~   +8.29%    +4.51%   +26.59%    +4.62%  -18.01%  -24.09%         ~   -1.26%
SubVW/words=8/data=carry                  ~   +23.16%  +16.19%    +6.16%   +25.46%    +6.74%  -15.57%  -22.74%         ~   +1.44%
SubVW/words=9/data=carry                  ~   +30.71%  +20.81%         ~   +12.36%         ~  -12.99%        ~         ~   +3.13%
SubVW/words=10/data=carry            +5.03%   +19.53%  +14.84%   +14.16%   +16.12%         ~  -11.64%  -16.00%   +15.45%   +3.29%
SubVW/words=16/data=carry           +14.42%   +15.58%  +33.07%   +11.43%   +24.65%         ~        ~  -21.90%   +25.59%   +9.40%
SubVW/words=32/data=carry                 ~   +27.57%  +46.58%         ~   +35.35%    -8.49%        ~  -24.04%   +11.86%  +18.40%
SubVW/words=64/data=carry           -24.34%   -27.83%  -20.90%   +13.34%   +37.17%   -14.90%        ~   -8.81%   +12.88%  +18.92%
SubVW/words=100/data=carry          -25.19%   -34.70%  -27.45%   +12.86%   +28.42%   -14.48%        ~        ~   +25.71%  +21.93%
SubVW/words=1000/data=carry         -24.93%   -47.86%  -47.26%    +2.66%         ~   -23.88%        ~        ~   +25.99%  +27.81%
SubVW/words=10000/data=carry        -24.17%   -36.48%  -49.41%    +1.06%         ~   -25.06%        ~  -26.50%   +27.94%  +18.36%
SubVW/words=100000/data=carry       -22.51%   -35.86%  -49.46%    +3.96%         ~   -25.18%        ~  -22.15%   +26.86%  +15.44%

Change-Id: I8f252073040e674780ac6ec9912082fb205329dd
Reviewed-on: https://go-review.googlesource.com/c/go/+/664898
Reviewed-by: Alan Donovan <adonovan@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-04-18 15:07:59 -07:00
apocelipes 8a8efafa88 cmd/compile: use the builtin clear
To simplify the code a bit.

Change-Id: Ia72f576de59ff161ec389a4992bb635f89783540
GitHub-Last-Rev: eaec8216be
GitHub-Pull-Request: golang/go#73411
Reviewed-on: https://go-review.googlesource.com/c/go/+/666117
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2025-04-18 04:21:12 -07:00
Marcel Meyer 5715d73559 all: use strings.ReplaceAll where applicable
```
find . \
-not -path './.git/*' \
-not -path './test/*' \
-not -path './src/cmd/vendor/*' \
-not -wholename './src/strings/example_test.go' \
-type f \
-exec \
sed -i -E 's/strings\.Replace\((.+), -1\)/strings\.ReplaceAll\(\1\)/g' {} \;
```

Change-Id: I59e2e91b3654c41a32f17dd91ec56f250198f0d6
GitHub-Last-Rev: 0868b1eccc
GitHub-Pull-Request: golang/go#73370
Reviewed-on: https://go-review.googlesource.com/c/go/+/665395
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Robert Griesemer <gri@google.com>
2025-04-16 12:26:29 -07:00
thepudds 2c9689ab0e cmd/compile/internal/escape: add hash for bisecting stack allocation of variable-sized makeslice
CL 653856 enabled stack allocation of variable-sized makeslice results.

This CL adds debug hashing of that change, plus a debug flag
to control the byte threshold used.

The debug hashing machinery means we also now have a way to disable just
the CL 653856 optimization by doing -gcflags='all=-d=variablemakehash=n'
or similar, though the stderr output will then typically have many
lines of debug hash output.

Using this CL plus the bisect command, I was able to retroactively
find one of the lines of code responsible for #73199:

  $ bisect -compile=variablemake go test -skip TestListWireGuardDrivers
  [...]
  bisect: FOUND failing change set
  --- change set #1 (enabling changes causes failure)
  ./security_windows.go:1321:38 (variablemake)
  ./security_windows.go:1321:38 (variablemake)
  ---

Previously, I had tracked down those lines by diffing '-gcflags=-m=1'
output and brief code inspection, but seeing the bisect was very nice.

This CL also adds a compiler debug flag to control the threshold for
stack allocation of variably sized make results. This can help
us identify more code that is relying on certain stack allocations.
This might be a temporary flag that we delete prior to Go 1.25
(given we would not want people to rely on it), or maybe it
might make sense to keep it for some period of time beyond the release
of Go 1.25 to help the ecosystem shake out other bugs.

Using these two flags together (and picking a threshold of 64 rather
than the default of 32), it looks for example like this
x/sys/windows code might be relying on stack allocation of
a byte slice:

  $ bisect -compile=variablemake go test -gcflags=-d=variablemakethreshold=64 -skip TestListWireGuardDrivers
  [...]
  bisect: FOUND failing change set
  --- change set #1 (enabling changes causes failure)
  ./syscall_windows_test.go:1178:16 (variablemake)

Updates #73199
Fixes #73253

Change-Id: I160179a0e3c148c3ea86be5c9b6cea8a52c3e5b7
Reviewed-on: https://go-review.googlesource.com/c/go/+/663795
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-04-15 20:19:42 -07:00
Lin Lin fcd73b0ac3 cmd/compile/internal/importer: correct a matching error
Change-Id: I2499d6ef1df0cc6bf0be8903ce64c03e1f296d19
GitHub-Last-Rev: 1f759d89be
GitHub-Pull-Request: golang/go#73064
Reviewed-on: https://go-review.googlesource.com/c/go/+/660978
Auto-Submit: Robert Griesemer <gri@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Robert Griesemer <gri@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-04-15 13:05:15 -07:00
Than McIntosh 47ab9cbd82 cmd: fix DWARF gen bug with packages that use assembly
When the compiler builds a Go package with DWARF 5 generation enabled,
it emits relocations into various generated DWARF symbols (ex:
SDWARFFCN) that use the R_DWTXTADDR_* flavor of relocations. The
specific size of this relocation is selected based on the total number
of functions in the package -- if the package is tiny (just a couple
funcs) we can use R_DWTXTADDR_U1 relocs (which target just a byte); if
the package is larger we might need to use the 2-byte or 3-byte flavor
of this reloc.

Prior to this patch, the strategy used to pick the right relocation
size was flawed in that it didn't take into account packages with
assembly code. For example, if you have a package P with 200 funcs
written in Go source and 200 funcs written in assembly, you can't use
the R_DWTXTADDR_U1 reloc flavor for indirect text references since the
real function count for the package (asm + go) exceeds 255.

The new strategy (with this patch) is to have the compiler look at the
"symabis" file to determine the count of assembly functions. For the
assembler, rather than create additional plumbing to pass in the Go
source func count we just use an dummy (artificially high) function
count so as to select a relocation that will be large enough.

Fixes #72810.
Updates #26379.

Change-Id: I98d04f3c6aacca1dafe1f1610c99c77db290d1d8
Reviewed-on: https://go-review.googlesource.com/c/go/+/663235
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
2025-04-13 08:09:12 -07:00
Keith Randall 21acfdc4ef cmd/compile: turn off variable-sized make() stack allocation with -N
Give people a way to turn this optimization off.

(Currently the constant-sized make() stack allocation is not disabled
with -N. Kinda inconsistent, but oh well, probably worse to change it now.)

Update #73253

Change-Id: Idb9ffde444f34e70673147fd6a962368904a7a55
Reviewed-on: https://go-review.googlesource.com/c/go/+/664655
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: t hepudds <thepudds1460@gmail.com>
2025-04-11 17:18:12 -07:00
Marcel Meyer 56fad21c22 cmd/compile/internal/ssa: small cleanups
Change-Id: I0420fb3956577c56fa24a31929331d526d480556
GitHub-Last-Rev: d74b0d4d75
GitHub-Pull-Request: golang/go#73339
Reviewed-on: https://go-review.googlesource.com/c/go/+/664975
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
2025-04-11 15:58:07 -07:00
Marcel Meyer c77ada1b78 cmd/compile/internal/ssa: simplify with built-in min, max functions
Change-Id: I08fa2940cd3565c578b1b323656a4fa12e0c65bb
GitHub-Last-Rev: 1f673b190e
GitHub-Pull-Request: golang/go#73322
Reviewed-on: https://go-review.googlesource.com/c/go/+/664675
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2025-04-11 08:50:53 -07:00
Marcel Meyer bbf4d57c22 cmd/compile/internal/ssa: use built-in min, max function
Change-Id: I6dd6e3f8a581931fcea3c3e0ac30ce450253e1d8
GitHub-Last-Rev: c476f8b9a3
GitHub-Pull-Request: golang/go#73318
Reviewed-on: https://go-review.googlesource.com/c/go/+/664615
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2025-04-11 08:31:00 -07:00
Marcel Meyer 037112464b cmd/compile/internal/ssa: use built-in min function
Change-Id: Id4276adea58afdf98c6f9b547cca0546fc659ae1
GitHub-Last-Rev: 4c836241c8
GitHub-Pull-Request: golang/go#73323
Reviewed-on: https://go-review.googlesource.com/c/go/+/664695
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Dmitri Shuralyov <dmitshur@google.com>
2025-04-10 16:38:29 -07:00
Marcel Meyer 9d915d6870 cmd/compile/internal/ssa: remove unused round function
Change-Id: I15ee74ab0be0cd996a74e6233b39e0953da3f327
GitHub-Last-Rev: dc41b1027a
GitHub-Pull-Request: golang/go#73324
Reviewed-on: https://go-review.googlesource.com/c/go/+/664696
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2025-04-10 16:37:57 -07:00
limeidan 09d76e59d2 cmd/compile: set unalignedOK to make memcombine work properly on loong64
goos: linux
goarch: loong64
pkg: unicode/utf8
cpu: Loongson-3A6000-HV @ 2500.00MHz
                            │     old     │                 new                 │
                            │   sec/op    │   sec/op     vs base                │
ValidTenASCIIChars            7.604n ± 0%   6.805n ± 0%  -10.51% (p=0.000 n=10)
Valid100KASCIIChars           37.41µ ± 0%   16.58µ ± 0%  -55.67% (p=0.000 n=10)
ValidTenJapaneseChars         60.84n ± 0%   58.62n ± 0%   -3.64% (p=0.000 n=10)
ValidLongMostlyASCII          113.5µ ± 0%   113.5µ ± 0%        ~ (p=0.303 n=10)
ValidLongJapanese             204.6µ ± 0%   206.8µ ± 0%   +1.07% (p=0.000 n=10)
ValidStringTenASCIIChars      7.604n ± 0%   6.803n ± 0%  -10.53% (p=0.000 n=10)
ValidString100KASCIIChars     38.05µ ± 0%   17.14µ ± 0%  -54.97% (p=0.000 n=10)
ValidStringTenJapaneseChars   60.58n ± 0%   59.48n ± 0%   -1.82% (p=0.000 n=10)
ValidStringLongMostlyASCII    113.5µ ± 0%   113.4µ ± 0%   -0.10% (p=0.000 n=10)
ValidStringLongJapanese       205.9µ ± 0%   207.3µ ± 0%   +0.67% (p=0.000 n=10)
geomean                       3.324µ        2.756µ       -17.08%

Change-Id: Id43b6e2e41907bd4b92f421dacde31f048db47d6
Reviewed-on: https://go-review.googlesource.com/c/go/+/662495
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Keith Randall <khr@google.com>
2025-04-09 09:18:20 -07:00
David Chase ecc06f0db7 cmd/compile: fix the test for ABI specification so it works right w/ generics
Change-Id: I09ef615bfe69a30fa8f7eef5f0a8ff94a244c920
Reviewed-on: https://go-review.googlesource.com/c/go/+/663776
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-04-09 00:32:10 -07:00
Keith Randall af278bfb1f cmd/compile: add additional flag constant folding rules
Fixes #73200

Change-Id: I77518d37acd838acf79ed113194bac5e2c30897f
Reviewed-on: https://go-review.googlesource.com/c/go/+/663535
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2025-04-07 22:48:32 -07:00
Keith Randall 16dbd2be39 cmd/compile: be more conservative about arm64 insns that can take zero register
It's really only needed for stores and store-like instructions
(atomic exchange, compare-and-swap, ...).

Fixes #73180

Change-Id: I8ecd833a301355adf0fa4bff43250091640c6226
Reviewed-on: https://go-review.googlesource.com/c/go/+/663155
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
2025-04-06 19:11:43 -07:00
Keith Randall 1647896aa2 cmd/compile: on 32-bit, bump up align for values that may contain 64-bit fields
On 32-bit systems, these need to be aligned to 8 bytes, even though the
typechecker doesn't tell us that.

The 64-bit allocations might be the target of atomic operations
that require 64-bit alignment.

Fixes 386 longtest builder.

Fixes #73173

Change-Id: I68f6a4f40c7051d29c57ecd560c8d920876a56a6
Reviewed-on: https://go-review.googlesource.com/c/go/+/663015
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
2025-04-04 23:25:15 -07:00
khr@golang.org e373771490 cmd/compile: use zero register instead of specialized *zero instructions
This lets us get rid of lots of specialized opcodes for storing zero.
Instead, use regular store opcodes that just happen to use the zero
register as one of their inputs.

Change-Id: I2902a6f9b0831cb598df45189ca6bb57221bef72
Reviewed-on: https://go-review.googlesource.com/c/go/+/633075
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
2025-04-04 15:26:24 -07:00
Keith Randall 2d050e91a3 cmd/compile: allow pointer-containing elements in stack allocations
For variable-sized allocations.

Turns out that we already implement the correct escape semantics
for this case. Even when the result of the "make" does not escape,
everything assigned into it does.

Change-Id: Ia123c538d39f2f1e1581c24e4135a65af3821c5e
Reviewed-on: https://go-review.googlesource.com/c/go/+/657937
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Robert Griesemer <gri@google.com>
2025-04-04 15:04:34 -07:00
Keith Randall 5fc596ebe7 cmd/compile: aggregate scalar allocations for heap escapes
If multiple small scalars escape to the heap, allocate them together
with a single allocation. They are going to be aggregated together
in the tiny allocator anyway, might as well do just one runtime call.

Change-Id: I4317e29235af63de378a26436a18d7fb0c39e41f
Reviewed-on: https://go-review.googlesource.com/c/go/+/648536
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-04-04 10:53:05 -07:00
Keith Randall 7a427143b6 cmd/compile: stack allocate variable-sized makeslice
Instead of always allocating variable-sized "make" calls on the heap,
allocate a small, constant-sized array on the stack and use that array
as the backing store if it is big enough.

Requires the result of the "make" doesn't escape.

  if cap <= K {
      var arr [K]E
      slice = arr[:len:cap]
  } else {
      slice = makeslice(E, len, cap)
  }

Pretty conservatively for now, K = 32/sizeof(E). The slice header is
already 24 bytes, so wasting 32 bytes of stack if the requested size
is too big isn't that bad. Larger would waste more stack space but
maybe avoid more allocations.

This CL also requires the element type be pointer-free.  Maybe we
could relax that at some point, but it is hard. If the element type
has pointers we can get heap->stack pointers (in the case where the
requested size is too big and the slice is heap allocated).

Note that this only handles the case of makeslice called directly from
compiler-generated code. It does not handle slices built in the
runtime on behalf of the program (e.g. in growslice). Some of those
are currently handled by passing in a tmpBuf (e.g. concatstrings),
but we could probably do more.

Change-Id: I8378efad527cd00d25948a80b82a68d88fbd93a1
Reviewed-on: https://go-review.googlesource.com/c/go/+/653856
Reviewed-by: Robert Griesemer <gri@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2025-04-04 10:36:58 -07:00
Alexander Musman 16a6b71f18 cmd/compile: improve store-to-load forwarding with compatible types
Improve the compiler's store-to-load forwarding optimization by relaxing the
type comparison condition. Instead of requiring exact type equality (CMPeq),
we now use copyCompatibleType which allows forwarding between compatible
types where safe.

Fix several size comparison bugs in the nested store patterns. Previously,
we were comparing the size of the outer store with the load type,
rather than comparing with the size of the actual store being forwarded
from.

Skip OpConvert in dead store elimination to help get rid of dead stores such
as zeroing slices. OpConvert, like OpInlMark, doesn't really use the memory.

This optimization is particularly beneficial for code that creates slices with
computed pointers, such as the runtime's heapBitsSlice function, where
intermediate calculations were previously causing the compiler to miss
store-to-load forwarding opportunities.

Local sweet run result on an x86_64 laptop:

                       │  Orig.res   │              Hopt.res              │
                       │   sec/op    │   sec/op     vs base               │
BiogoIgor-8               5.303 ± 1%    5.322 ± 1%       ~ (p=0.190 n=10)
BiogoKrishna-8            7.894 ± 1%    7.828 ± 2%       ~ (p=0.190 n=10)
BleveIndexBatch100-8      2.257 ± 1%    2.248 ± 2%       ~ (p=0.529 n=10)
EtcdPut-8                30.12m ± 1%   30.03m ± 1%       ~ (p=0.796 n=10)
EtcdSTM-8                127.1m ± 1%   126.2m ± 0%  -0.74% (p=0.023 n=10)
GoBuildKubelet-8          52.21 ± 0%    52.05 ± 1%       ~ (p=0.063 n=10)
GoBuildKubeletLink-8      4.342 ± 1%    4.305 ± 0%  -0.85% (p=0.000 n=10)
GoBuildIstioctl-8         43.33 ± 0%    43.24 ± 0%  -0.22% (p=0.015 n=10)
GoBuildIstioctlLink-8     4.604 ± 1%    4.598 ± 0%       ~ (p=0.063 n=10)
GoBuildFrontend-8         15.33 ± 0%    15.29 ± 0%       ~ (p=0.143 n=10)
GoBuildFrontendLink-8    740.0m ± 1%   737.7m ± 1%       ~ (p=0.912 n=10)
GopherLuaKNucleotide-8    9.590 ± 1%    9.656 ± 1%       ~ (p=0.165 n=10)
MarkdownRenderXHTML-8    96.97m ± 1%   97.26m ± 2%       ~ (p=0.105 n=10)
Tile38QueryLoad-8        335.9µ ± 1%   335.6µ ± 1%       ~ (p=0.481 n=10)
geomean                   1.336         1.333       -0.22%

Change-Id: I031552623e6d5a3b1b5be8325e6314706e45534f
Reviewed-on: https://go-review.googlesource.com/c/go/+/662075
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Carlos Amedee <carlos@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2025-04-04 08:25:47 -07:00
Mark Freeman 5eaeb7b455 go/types, types2: better error messages for invalid qualified identifiers
This change borrows code from CL 631356 by Emmanuel Odeke (thanks!).

Fixes #70549.

Change-Id: Id6f794ea2a95b4297999456f22c6e02890fce13b
Reviewed-on: https://go-review.googlesource.com/c/go/+/662775
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Robert Griesemer <gri@google.com>
Reviewed-by: Robert Griesemer <gri@google.com>
Reviewed-by: Mark Freeman <mark@golang.org>
2025-04-03 15:36:36 -07:00
Julia Lapenko 13b1261175 cmd/compile/internal/devirtualize: do not select a zero-weight edge as the hottest one
When both a direct call and an interface call appear on the same line,
PGO devirtualization may make a suboptimal decision. In some cases,
the directly called function becomes a candidate for devirtualization
if no other relevant outgoing edges with non-zero weight exist for the
caller's IRNode in the WeightedCG. The edge to this candidate is
considered the hottest. Despite having zero weight, this edge still
causes the interface call to be devirtualized.

This CL prevents devirtualization when the weight of the hottest edge
is 0.

Fixes #72092

Change-Id: I06c0c5e080398d86f832e09244aceaa4aeb98721
Reviewed-on: https://go-review.googlesource.com/c/go/+/655475
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-04-02 16:39:13 -07:00
Sergey Slukin 116b82354c cmd/compile: changed variable name due to shadowing of package name min
Change-Id: I52e5de04d137238d6f6779edcc662f5c7433c61e
Reviewed-on: https://go-review.googlesource.com/c/go/+/660195
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Robert Griesemer <gri@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-04-02 11:25:15 -07:00
Alexander Musman 3033ef0016 cmd/compile: Remove unused 'NoInline' field from CallExpr stucture
Remove the 'NoInline' field from CallExpr stucture, as it's no longer
used after enabling of tail call inlining.

Change-Id: Ief3ada9938589e7a2f181582ef2758ebc4d03aad
Reviewed-on: https://go-review.googlesource.com/c/go/+/655816
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
2025-04-02 05:55:35 -07:00
Mateusz Poliwczak eec3745bd7 cmd/compile/internal/ssa: replace uses of interface{} with Sym/Aux
Change-Id: I0a3ce2e823697eee5bb5e7d5ea0ef025132c0689
Reviewed-on: https://go-review.googlesource.com/c/go/+/661655
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
2025-03-31 08:20:16 -07:00
Joel Sing e6c2e12c63 cmd/compile/internal/ssa: optimise more branches with zero on riscv64
Optimise more branches with zero on riscv64. In particular, BLTU with
zero occurs with IsInBounds checks for index zero. This currently results
in two instructions and requires an additional register:

   li      t2, 0
   bltu    t2, t1, 0x174b4

This is equivalent to checking if the bounds is not equal to zero. With
this change:

   bnez    t1, 0x174c0

This removes more than 500 instructions from the Go binary on riscv64.

Change-Id: I6cd861d853e3ef270bd46dacecdfaa205b1c4644
Reviewed-on: https://go-review.googlesource.com/c/go/+/606715
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2025-03-28 01:27:22 -07:00
Junyang Shao b17a99d6fc cmd/compile: update GOSSAFUNC doc for printing CFG
Updates #30074

Change-Id: I160124afb65849c624a225d384c35313723f9f30
Reviewed-on: https://go-review.googlesource.com/c/go/+/661415
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Junyang Shao <shaojunyang@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
2025-03-27 15:39:38 -07:00
Junyang Shao 4d6722a8fd cmd/compile: match more patterns for shortcircuit
This CL tries to generalize the pattern matching of certain
shortcircuit-able CFGs a bit more:
For a shortcircuit-able CFG:
p   q
 \ /
  b
 / \
t   u
Where the constant branch is t, and b has multiple phi values
other than the control phi.

For the non-constant branch target u, we try to match the
"diamond" shape CFG:
p   q
 \ /
  b
 / \
t   u
 \ /
  m

or

p   q
 \ /
  b
  |\
  | u
  |/
  m

Instead of matching u as a single block, we know try to
generalize it as a subgraph that satisfy condition:
it's a DAG that has a single entry point u, and has a
path to m.

compilebench stats:
                         │   old.txt   │              new.txt               │
                         │   sec/op    │   sec/op     vs base               │
Template                   109.4m ± 3%   109.8m ± 3%       ~ (p=0.796 n=10)
Unicode                    94.23m ± 1%   93.85m ± 1%       ~ (p=0.631 n=10)
GoTypes                    538.2m ± 1%   538.8m ± 1%       ~ (p=0.912 n=10)
Compiler                   90.21m ± 1%   90.02m ± 1%       ~ (p=0.436 n=10)
SSA                         3.318 ± 1%    3.323 ± 1%       ~ (p=1.000 n=10)
Flate                      69.38m ± 1%   69.57m ± 2%       ~ (p=0.529 n=10)
GoParser                   128.5m ± 1%   127.4m ± 1%       ~ (p=0.075 n=10)
Reflect                    267.4m ± 1%   267.2m ± 2%       ~ (p=0.739 n=10)
Tar                        127.7m ± 2%   126.4m ± 1%       ~ (p=0.353 n=10)
XML                        149.5m ± 1%   149.6m ± 2%       ~ (p=0.684 n=10)
LinkCompiler               390.0m ± 1%   388.4m ± 2%       ~ (p=0.353 n=10)
ExternalLinkCompiler        1.296 ± 0%    1.296 ± 1%       ~ (p=0.971 n=10)
LinkWithoutDebugCompiler   226.3m ± 1%   225.5m ± 1%       ~ (p=0.393 n=10)
StdCmd                      13.26 ± 0%    13.25 ± 1%       ~ (p=0.529 n=10)
geomean                    319.3m        318.8m       -0.17%

                         │   old.txt   │               new.txt               │
                         │ user-sec/op │ user-sec/op   vs base               │
Template                   293.1m ± 3%   291.4m ± 11%       ~ (p=0.436 n=10)
Unicode                    91.09m ± 5%   87.61m ±  7%       ~ (p=0.165 n=10)
GoTypes                     1.932 ± 3%    1.926 ±  3%       ~ (p=0.739 n=10)
Compiler                   125.8m ± 3%   121.5m ± 10%       ~ (p=0.481 n=10)
SSA                         18.93 ± 3%    18.89 ±  1%       ~ (p=0.684 n=10)
Flate                      158.5m ± 5%   160.0m ±  7%       ~ (p=0.971 n=10)
GoParser                   316.0m ± 9%   327.4m ±  7%       ~ (p=0.052 n=10)
Reflect                    845.6m ± 6%   861.6m ±  3%       ~ (p=0.579 n=10)
Tar                        358.1m ± 5%   348.5m ±  4%       ~ (p=0.089 n=10)
XML                        382.4m ± 4%   392.2m ±  3%       ~ (p=0.143 n=10)
LinkCompiler               609.1m ± 4%   627.9m ±  3%       ~ (p=0.123 n=10)
ExternalLinkCompiler        1.336 ± 2%    1.343 ±  4%       ~ (p=0.565 n=10)
LinkWithoutDebugCompiler   248.7m ± 3%   248.0m ±  1%       ~ (p=0.853 n=10)
geomean                    506.4m        506.8m        +0.08%

          │   old.txt    │               new.txt               │
          │  text-bytes  │  text-bytes   vs base               │
HelloSize   965.8Ki ± 0%   965.0Ki ± 0%  -0.08% (p=0.000 n=10)
CmdGoSize   12.30Mi ± 0%   12.29Mi ± 0%  -0.08% (p=0.000 n=10)
geomean     3.406Mi        3.403Mi       -0.08%

          │   old.txt    │                new.txt                │
          │  data-bytes  │  data-bytes   vs base                 │
HelloSize   15.08Ki ± 0%   15.08Ki ± 0%       ~ (p=1.000 n=10) ¹
CmdGoSize   408.5Ki ± 0%   408.5Ki ± 0%       ~ (p=1.000 n=10) ¹
geomean     78.49Ki        78.49Ki       +0.00%
¹ all samples are equal

          │   old.txt    │                new.txt                │
          │  bss-bytes   │  bss-bytes    vs base                 │
HelloSize   142.0Ki ± 0%   142.0Ki ± 0%       ~ (p=1.000 n=10) ¹
CmdGoSize   206.4Ki ± 0%   206.4Ki ± 0%       ~ (p=1.000 n=10) ¹
geomean     171.2Ki        171.2Ki       +0.00%
¹ all samples are equal

          │   old.txt    │               new.txt               │
          │  exe-bytes   │  exe-bytes    vs base               │
HelloSize   1.466Mi ± 0%   1.462Mi ± 0%  -0.27% (p=0.000 n=10)
CmdGoSize   18.19Mi ± 0%   18.17Mi ± 0%  -0.10% (p=0.000 n=10)
geomean     5.164Mi        5.154Mi       -0.18%

Fixes #72132

Change-Id: I3d1cb10b6a158c5750adc23c79709d63dbd771f3
Reviewed-on: https://go-review.googlesource.com/c/go/+/656255
Auto-Submit: Junyang Shao <shaojunyang@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
2025-03-27 12:30:03 -07:00
Alan Donovan b3aff930cf go/types: LookupSelection: returns LookupFieldOrMethod as a Selection
Also, rewrite some uses of LookupFieldOrMethod in terms of it.

+ doc, relnote

Fixes #70737

Change-Id: I58a6dd78ee78560d8b6ea2d821381960a72660ab
Reviewed-on: https://go-review.googlesource.com/c/go/+/647196
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Robert Griesemer <gri@google.com>
2025-03-27 12:29:28 -07:00
Mateusz Poliwczak e9242ee812 cmd/compile: remove references to *gc.Node in docs
ssa.Sym is only implemented by *ir.Name or *obj.LSym.

Change-Id: Ia171db618abd8b438fcc2cf402f40f3fe3ec6833
Reviewed-on: https://go-review.googlesource.com/c/go/+/660995
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2025-03-27 11:25:27 -07:00
Austin Clements 5918101d67 testing: detect a stopped timer in B.Loop
Currently, if the user stops the timer in a B.Loop benchmark loop, the
benchmark will run until it hits the timeout and fails.

Fix this by detecting that the timer is stopped and failing the
benchmark right away. We avoid making the fast path more expensive for
this check by "poisoning" the B.Loop iteration counter when the timer
is stopped so that it falls back to the slow path, which can check the
timer.

This causes b to escape from B.Loop, which is totally harmless because
it was already definitely heap-allocated. But it causes the
test/inline_testingbloop.go errorcheck test to fail. I don't think the
escape messages actually mattered to that test, they just had to be
matched. To fix this, we drop the debug level to -m=1, since -m=2
prints a lot of extra information for escaping parameters that we
don't want to deal with, and change one error check to allow b to
escape.

Fixes #72971.

Change-Id: I7d4abbb1ec1e096685514536f91ba0d581cca6b7
Reviewed-on: https://go-review.googlesource.com/c/go/+/659657
Auto-Submit: Austin Clements <austin@google.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-03-24 11:41:09 -07:00
Robert Griesemer 3ada42ffed go/types, types2: align trace output a bit better for easier debugging
Compute alignment padding rather than using a tab in trace output.
This aligns the ":" after the file position in typical cases (files
shorter than 1000 lines, lines shorter than 100 columns), resulting
in nicer trace output for easier debugging.

Before this CL (example trace):

x.go:8:2:  -- checking type A (white, objPath = )
x.go:8:11:  .  -- type B
x.go:9:2:  .  .  -- checking type B (white, objPath = A)
x.go:9:14:  .  .  .  -- type C[D]
x.go:9:13:  .  .  .  .  -- instantiating type C with [D]
x.go:9:13:  .  .  .  .  .  -- type C
x.go:10:2:  .  .  .  .  .  .  -- checking type C (white, objPath = A->B)
x.go:10:6:  .  .  .  .  .  .  .  -- type any
x.go:10:6:  .  .  .  .  .  .  .  => any (under = any) // *Alias
x.go:10:11:  .  .  .  .  .  .  .  -- type struct{}
x.go:10:11:  .  .  .  .  .  .  .  => struct{} // *Struct
x.go:10:2:  .  .  .  .  .  .  => type C[_ any] struct{} (black)

With this CL:

x.go:8:2   :  -- checking type A (white, objPath = )
x.go:8:11  :  .  -- type B
x.go:9:2   :  .  .  -- checking type B (white, objPath = A)
x.go:9:14  :  .  .  .  -- type C[D]
x.go:9:13  :  .  .  .  .  -- instantiating type C with [D]
x.go:9:13  :  .  .  .  .  .  -- type C
x.go:10:2  :  .  .  .  .  .  .  -- checking type C (white, objPath = A->B)
x.go:10:6  :  .  .  .  .  .  .  .  -- type any
x.go:10:6  :  .  .  .  .  .  .  .  => any (under = any) // *Alias
x.go:10:11 :  .  .  .  .  .  .  .  -- type struct{}
x.go:10:11 :  .  .  .  .  .  .  .  => struct{} // *Struct
x.go:10:2  :  .  .  .  .  .  .  => type C[_ any] struct{} (black)

Change-Id: Ibcf346737f57ec5351d1e1e65178e2c3c155d766
Reviewed-on: https://go-review.googlesource.com/c/go/+/659755
Auto-Submit: Robert Griesemer <gri@google.com>
Reviewed-by: Robert Findley <rfindley@google.com>
Reviewed-by: Robert Griesemer <gri@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-03-23 22:15:04 -07:00
Joel Sing b70244ff7a cmd/compile: intrinsify math/bits.Len on riscv64
For riscv64/rva22u64 and above, we can intrinsify math/bits.Len using the
CLZ/CLZW machine instructions.

On a StarFive VisionFive 2 with GORISCV64=rva22u64:

                 │   clz.b.1   │               clz.b.2               │
                 │   sec/op    │   sec/op     vs base                │
LeadingZeros-4     28.89n ± 0%   12.08n ± 0%  -58.19% (p=0.000 n=10)
LeadingZeros8-4    18.79n ± 0%   14.76n ± 0%  -21.45% (p=0.000 n=10)
LeadingZeros16-4   25.27n ± 0%   14.76n ± 0%  -41.59% (p=0.000 n=10)
LeadingZeros32-4   25.12n ± 0%   12.08n ± 0%  -51.92% (p=0.000 n=10)
LeadingZeros64-4   25.89n ± 0%   12.08n ± 0%  -53.35% (p=0.000 n=10)
geomean            24.55n        13.09n       -46.70%

Change-Id: I0dda684713dbdf5336af393f5ccbdae861c4f694
Reviewed-on: https://go-review.googlesource.com/c/go/+/652321
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2025-03-21 18:21:44 -07:00
Derek Parker afe11db4a7 cmd/compile/internal/abi: fix ComputePadding
Fixes the ComputePadding calculation to take into account the padding
added for the current offset. This fixes an issue where padding can be
added incorrectly for certain structs.

Related: https://github.com/go-delve/delve/issues/3923

Same as https://go-review.googlesource.com/c/go/+/656736 just without
the brittle test.

Fixes #72053

Change-Id: I67f157a42f5fc5d3a54d0e9be03488aa44752bcb
GitHub-Last-Rev: fabed69a31
GitHub-Pull-Request: golang/go#72997
Reviewed-on: https://go-review.googlesource.com/c/go/+/659698
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-03-21 15:02:04 -07:00
Cherry Mui 69ea62fe95 Revert "cmd/compile/internal/abi: fix ComputePadding"
This reverts CL 656736.

Reason for revert: breaks many builders (all flavors of
linux-amd64 builders).

Change-Id: Ie7190d4078fada227391804c5cf10b9ce9cc9115
Reviewed-on: https://go-review.googlesource.com/c/go/+/659955
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
2025-03-21 09:29:02 -07:00
Derek Parker a20d583bb9 cmd/compile/internal/abi: fix ComputePadding
Fixes the ComputePadding calculation to take into account
the padding added for the current offset. This fixes an issue
where padding can be added incorrectly for certain structs.

Related: https://github.com/go-delve/delve/issues/3923

Fixes #72053

Change-Id: I277629799168c6b44bc9ed03df4345e0318064ce
GitHub-Last-Rev: 9478b29a13
GitHub-Pull-Request: golang/go#72805
Reviewed-on: https://go-review.googlesource.com/c/go/+/656736
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
2025-03-20 15:45:30 -07:00
Mateusz Poliwczak 9dc572eab5 cmd/compile/internal/ssa: remove linkedBlocks and its uses
The use of predFn/succFn is not needed since CL 22401.

Change-Id: Icc39190bb7b0e85541c75da2d564093d551751d3
Reviewed-on: https://go-review.googlesource.com/c/go/+/659555
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
2025-03-20 09:10:17 -07:00
Joel Sing 03cb8d408e cmd/compile/internal/ssagen: use an alias for math/bits.OnesCount
Currently, only amd64 has an intrinsic for math/bits.OnesCount, which
generates the same code as math/bits.OnesCount64. Replace this with
an alias that maps math/bits.OnesCount to math/bits.OnesCount64 on
64 bit platforms.

Change-Id: Ifa12a2173a201aacd52c3c22b9a948be6e314405
Reviewed-on: https://go-review.googlesource.com/c/go/+/659215
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-03-20 08:15:01 -07:00