5281 Commits

Author SHA1 Message Date
thepudds ed24bb4e60 cmd/compile/internal/escape: propagate constants to interface conversions to avoid allocs
Currently, the integer value in the following interface conversion gets
heap allocated:

   v := 1000
   fmt.Println(v)

In contrast, this conversion does not currently cause the integer value
to be heap allocated:

   fmt.Println(1000)

The second example is able to avoid heap allocation because of an
optimization in walk (by Josh in #18704 and related issues) that
recognizes a literal is being used. In the first example, that
optimization is currently thwarted by the literal getting assigned
to a local variable prior to use in the interface conversion.

This CL propagates constants to interface conversions like
in the first example to avoid heap allocations, instead using
a read-only global. The net effect is roughly turning the first example
into the second.

One place this comes up in practice currently is with logging or
debug prints. For example, if we have something like:

   func conditionalDebugf(format string, args ...interface{}) {
   	if debugEnabled {
   		fmt.Fprintf(io.Discard, format, args...)
   	}
   }

Prior to this CL, this integer is heap allocated, even when the
debugEnabled flag is false, and even when the compiler
inlines conditionalDebugf:

   v := 1000
   conditionalDebugf("hello %d", v)

With this CL, the integer here is no longer heap allocated, even when
the debugEnabled flag is enabled, because the compiler can now see that
it can use a read-only global.
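
One rough way to observe the difference between the two forms is to count allocations with testing.AllocsPerRun. This is only a sketch: countingWriter, printVariable and printLiteral are hypothetical names, and the counts it prints depend on the Go toolchain in use, so no particular output is assumed.

```go
package main

import (
	"fmt"
	"testing"
)

// countingWriter discards its input and only records how many bytes it saw.
type countingWriter struct{ n int }

func (w *countingWriter) Write(p []byte) (int, error) {
	w.n += len(p)
	return len(p), nil
}

// printVariable passes a local variable holding a constant to fmt.Fprint,
// which converts it to an interface value.
func printVariable(w *countingWriter) {
	v := 1000
	fmt.Fprint(w, v)
}

// printLiteral passes the literal directly.
func printLiteral(w *countingWriter) {
	fmt.Fprint(w, 1000)
}

func main() {
	w := &countingWriter{}
	viaVar := testing.AllocsPerRun(100, func() { printVariable(w) })
	viaLit := testing.AllocsPerRun(100, func() { printLiteral(w) })
	// The counts depend on the compiler version, so none is asserted here.
	fmt.Println("allocs/op via variable:", viaVar)
	fmt.Println("allocs/op via literal: ", viaLit)
}
```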

See the writeup in #71359 for more details.

CL 649076 (earlier in our stack) added most of the tests
along with debug diagnostics in convert.go to make it easier
to test this change.

Updates #71359
Updates #62653
Updates #53465
Updates #8618

Change-Id: I19a51e74b36576ebb0b9cf599267cbd2bd847ce4
Reviewed-on: https://go-review.googlesource.com/c/go/+/649079
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
2025-05-21 12:02:43 -07:00
Xiaolin Zhao 4ce1c8e9e1 cmd/compile: add rules about ORN and ANDN
Reduce the number of go toolchain instructions on loong64 as follows.

    file      before    after     Δ       %
    addr2line 279880    279776  -104   -0.0372%
    asm       556638    556410  -228   -0.0410%
    buildid   272272    272072  -200   -0.0735%
    cgo       481522    481318  -204   -0.0424%
    compile   2457788   2457580 -208   -0.0085%
    covdata   323384    323280  -104   -0.0322%
    cover     518450    518234  -216   -0.0417%
    dist      340790    340686  -104   -0.0305%
    distpack  282456    282252  -204   -0.0722%
    doc       789932    789688  -244   -0.0309%
    fix       324332    324228  -104   -0.0321%
    link      704622    704390  -232   -0.0329%
    nm        277132    277028  -104   -0.0375%
    objdump   507862    507758  -104   -0.0205%
    pack      221774    221674  -100   -0.0451%
    pprof     1469816   1469552 -264   -0.0180%
    test2json 254836    254732  -104   -0.0408%
    trace     1100002   1099738 -264   -0.0240%
    vet       781078    780874  -204   -0.0261%
    go        1529116   1528848 -268   -0.0175%
    gofmt     318556    318448  -108   -0.0339%
    total     13792238 13788566 -3672  -0.0266%

Change-Id: I23fb3ebd41309252c7075e57ea7094e79f8c4fef
Reviewed-on: https://go-review.googlesource.com/c/go/+/674335
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
2025-05-21 08:28:37 -07:00
Xiaolin Zhao d37a1bdd48 cmd/compile: fix the implementation of NORconst on loong64
In the loong64 instruction set, there is no NORI instruction,
so the immediate value in NORconst needs to be stored in a register
and then used with the three-register NOR instruction.

Change-Id: I5ef697450619317218cb3ef47fc07e238bdc2139
Reviewed-on: https://go-review.googlesource.com/c/go/+/673836
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-20 20:24:09 -07:00
Junyang Shao 113b25774e cmd/compile: memcombine different size stores
This CL implements the TODO in combineStores to allow combining
stores of different sizes, as long as the total size aligns to
2, 4, 8.
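
As an illustration of the kind of code this targets, the sketch below writes one 16-bit piece and two 8-bit pieces of the same 32-bit value into adjacent fields, 4 bytes in total. The type and function names are hypothetical, and whether the stores are actually merged depends on the architecture and compiler version.

```go
package main

import "fmt"

// packed has three adjacent fields of different sizes, 4 bytes in total.
type packed struct {
	a uint16
	b uint8
	c uint8
}

// setPacked stores consecutive pieces of v with one 2-byte store and two
// 1-byte stores. On a little-endian target this writes the same bytes as a
// single 4-byte store of v would.
func setPacked(p *packed, v uint32) {
	p.a = uint16(v)
	p.b = uint8(v >> 16)
	p.c = uint8(v >> 24)
}

func main() {
	var p packed
	setPacked(&p, 0x04030201)
	fmt.Printf("%#x %#x %#x\n", p.a, p.b, p.c)
}
```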

Fixes #72832.

Change-Id: I6d1d471335da90d851ad8f3b5a0cf10bdcfa17c4
Reviewed-on: https://go-review.googlesource.com/c/go/+/661855
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Junyang Shao <shaojunyang@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-20 13:00:16 -07:00
Julian Zhu dfebef1c04 cmd/compile: fold negation into addition/subtraction on arm64
Fold negation into addition/subtraction and avoid double negation.
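
The arithmetic identities behind this kind of folding can be sketched in Go as below. The function names are hypothetical and the exact set of rewrite rules added by the CL is not listed here, so this only shows source patterns where a separate negate instruction is unnecessary.

```go
package main

import "fmt"

// addNeg computes x + (-y), which is the same value as x - y.
func addNeg(x, y int64) int64 { return x + (-y) }

// subNeg computes x - (-y), which is the same value as x + y.
func subNeg(x, y int64) int64 { return x - (-y) }

// negSub computes -(x - y), which is the same value as y - x.
func negSub(x, y int64) int64 { return -(x - y) }

// negNeg computes -(-x), which is the same value as x.
func negNeg(x int64) int64 { return -(-x) }

func main() {
	fmt.Println(addNeg(10, 3), subNeg(10, 3), negSub(10, 3), negNeg(10))
}
```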

platform: linux/arm64

file      before    after     Δ       %
addr2line 3628108   3628116   +8      +0.000%
asm       6208353   6207857   -496    -0.008%
buildid   3460682   3460418   -264    -0.008%
cgo       5572988   5572492   -496    -0.009%
compile   26042159  26041039  -1120   -0.004%
cover     6304328   6303472   -856    -0.014%
dist      4139330   4139098   -232    -0.006%
doc       9429305   9428065   -1240   -0.013%
fix       3997189   3996733   -456    -0.011%
link      8212128   8210280   -1848   -0.023%
nm        3620056   3619696   -360    -0.010%
objdump   5920289   5919233   -1056   -0.018%
pack      2892250   2891778   -472    -0.016%
pprof     17094569  17092745  -1824   -0.011%
test2json 3335825   3335529   -296    -0.009%
trace     15842080  15841456  -624    -0.004%
vet       9472194   9471106   -1088   -0.011%
go        19081541  19081509  -32     -0.000%
total     154253374 154240622 -12752  -0.008%

platform: darwin/arm64

file    before    after     Δ       %
compile 27152002  27135490  -16512  -0.061%
link    8372914   8356402   -16512  -0.197%
go      19154802  19154778  -24     -0.000%
total   157734180 157701132 -33048  -0.021%

Change-Id: I15a349bfbaf7333ec3e4a62ae4d06f3f371dfb1d
Reviewed-on: https://go-review.googlesource.com/c/go/+/673715
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-20 11:08:28 -07:00
thepudds 326e5e1b7a cmd/compile/internal/escape: additional constant and zero value tests and logging
This adds additional logging for the work that walk does to reduce
how often an interface conversion results in an allocation.

Also, as part of #71359, we will be updating how escape analysis and
walk handle basic literals, composite literals, and zero values,
so add some tests that use this new logging.

By the end of our CL stack, we address all of these tests.

Updates #71359

Change-Id: I43fde8343d9aacaec1e05360417908014a86c8bd
Reviewed-on: https://go-review.googlesource.com/c/go/+/649076
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-20 09:19:42 -07:00
Keith Randall ce88e341b9 cmd/compile: allocate backing store for append on the stack
When appending, if the backing store doesn't escape and a
constant-sized backing store is big enough, use a constant-sized
stack-allocated backing store instead of allocating it from the heap.
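
A small sketch of the pattern: a slice that starts nil, grows by append, and never escapes the function. The name sumSquares is hypothetical, and the allocation count it reports depends on the toolchain version, so no output is assumed.

```go
package main

import (
	"fmt"
	"testing"
)

var sink int

// sumSquares appends to a nil slice that never leaves the function,
// then sums the elements.
func sumSquares(n int) int {
	var s []int // no capacity hint
	for i := 0; i < n; i++ {
		s = append(s, i*i)
	}
	total := 0
	for _, v := range s {
		total += v
	}
	return total
}

func main() {
	allocs := testing.AllocsPerRun(100, func() { sink = sumSquares(4) })
	fmt.Println("sum:", sumSquares(4), "allocs/op:", allocs)
}
```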

cmd/go is <0.1% bigger.

As an example of how this helps, if you edit strings/strings.go:FieldsFunc
to replace
    spans := make([]span, 0, 32)
with
    var spans []span

then this CL removes the first 2 allocations that are part of the growth sequence:

                            │    base      │                 exp                  │
                            │  allocs/op   │  allocs/op   vs base                 │
FieldsFunc/ASCII/16-24         3.000 ± ∞ ¹   2.000 ± ∞ ¹  -33.33% (p=0.008 n=5)
FieldsFunc/ASCII/256-24        7.000 ± ∞ ¹   5.000 ± ∞ ¹  -28.57% (p=0.008 n=5)
FieldsFunc/ASCII/4096-24      11.000 ± ∞ ¹   9.000 ± ∞ ¹  -18.18% (p=0.008 n=5)
FieldsFunc/ASCII/65536-24      18.00 ± ∞ ¹   16.00 ± ∞ ¹  -11.11% (p=0.008 n=5)
FieldsFunc/ASCII/1048576-24    30.00 ± ∞ ¹   28.00 ± ∞ ¹   -6.67% (p=0.008 n=5)
FieldsFunc/Mixed/16-24         2.000 ± ∞ ¹   2.000 ± ∞ ¹        ~ (p=1.000 n=5)
FieldsFunc/Mixed/256-24        7.000 ± ∞ ¹   5.000 ± ∞ ¹  -28.57% (p=0.008 n=5)
FieldsFunc/Mixed/4096-24      11.000 ± ∞ ¹   9.000 ± ∞ ¹  -18.18% (p=0.008 n=5)
FieldsFunc/Mixed/65536-24      18.00 ± ∞ ¹   16.00 ± ∞ ¹  -11.11% (p=0.008 n=5)
FieldsFunc/Mixed/1048576-24    30.00 ± ∞ ¹   28.00 ± ∞ ¹   -6.67% (p=0.008 n=5)

(Of course, people have spotted and fixed a bunch of allocation sites
like this, but now we're ~automatically doing it everywhere going forward.)

No significant increases in frame sizes in cmd/go.

Change-Id: I301c4d9676667eacdae0058960321041d173751a
Reviewed-on: https://go-review.googlesource.com/c/go/+/664299
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
2025-05-19 16:14:53 -07:00
Keith Randall 3baf53aec6 cmd/compile: derive bounds on signed %N for N a power of 2
-N+1 <= x % N <= N-1

This is useful for cases like:

func setBit(b []byte, i int) {
    b[i/8] |= 1<<(i%8)
}

The shift does not need protection against larger-than-7 cases.
(It does still need protection against <0 cases.)
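
To make the bound concrete: Go's % truncates toward zero, so for a signed x the value of x % 8 lies in [-7, 7]. The sketch below adds an explicit negative check to setBit and includes a hypothetical helper, remRange, that observes the range of remainders.

```go
package main

import "fmt"

// setBit sets bit i of b. The explicit check covers negative i; the upper
// bound of i%8 is already known to be 7.
func setBit(b []byte, i int) {
	if i < 0 {
		return
	}
	b[i/8] |= 1 << (i % 8)
}

// remRange reports the smallest and largest value of x % 8 for x in [lo, hi].
func remRange(lo, hi int) (rmin, rmax int) {
	rmin, rmax = 7, -7
	for x := lo; x <= hi; x++ {
		r := x % 8
		if r < rmin {
			rmin = r
		}
		if r > rmax {
			rmax = r
		}
	}
	return rmin, rmax
}

func main() {
	b := make([]byte, 2)
	setBit(b, 10)
	fmt.Println(b)
	fmt.Println(remRange(-20, 20))
}
```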

Change-Id: Idf83101386af538548bfeb6e2928cea855610ce2
Reviewed-on: https://go-review.googlesource.com/c/go/+/672995
Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-05-19 15:21:54 -07:00
Julian Zhu d52679006c cmd/compile: fold negation into addition/subtraction on mipsx
Fold negation into addition/subtraction and avoid double negation.

file      before    after     Δ       %
addr2line 3742022   3741986   -36     -0.001%
asm       6668616   6668628   +12     +0.000%
buildid   3583786   3583630   -156    -0.004%
cgo       6020370   6019634   -736    -0.012%
compile   29416016  29417336  +1320   +0.004%
cover     6801903   6801675   -228    -0.003%
dist      4485916   4485816   -100    -0.002%
doc       10652787  10652251  -536    -0.005%
fix       4115988   4115560   -428    -0.010%
link      9002328   9001616   -712    -0.008%
nm        3733148   3732780   -368    -0.010%
objdump   6163292   6163068   -224    -0.004%
pack      2944768   2944604   -164    -0.006%
pprof     18909973  18908773  -1200   -0.006%
test2json 3394662   3394778   +116    +0.003%
trace     17350911  17349751  -1160   -0.007%
vet       10077727  10077527  -200    -0.002%
go        19118769  19118609  -160    -0.001%
total     166182982 166178022 -4960   -0.003%

Change-Id: Id55698800fd70f3cb2ff48393584456b87208921
Reviewed-on: https://go-review.googlesource.com/c/go/+/673556
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2025-05-19 11:27:35 -07:00
Mark Freeman bc5aa2f7d3 go/types, types2: improve error message for init without body
Change-Id: I8a684965e88e0e33a6ff33a16e08d136e3267f7e
Reviewed-on: https://go-review.googlesource.com/c/go/+/663636
TryBot-Bypass: Mark Freeman <mark@golang.org>
Auto-Submit: Mark Freeman <mark@golang.org>
Reviewed-by: Robert Griesemer <gri@google.com>
2025-05-19 11:00:10 -07:00
Julian Zhu 8097cf14d2 cmd/compile: fold negation into addition/subtraction on mips64x
Fold negation into addition/subtraction and avoid double negation.

file      before    after     Δ       %
addr2line 4007310   4007470   +160    +0.004%
asm       7007636   7007436   -200    -0.003%
buildid   3839268   3838972   -296    -0.008%
cgo       6353466   6352738   -728    -0.011%
compile   30426920  30426896  -24     -0.000%
cover     7005408   7004744   -664    -0.009%
dist      4651192   4650872   -320    -0.007%
doc       10606050  10606034  -16     -0.000%
fix       4446414   4446390   -24     -0.001%
link      9237736   9237024   -712    -0.008%
nm        3999107   3999323   +216    +0.005%
objdump   6762424   6762144   -280    -0.004%
pack      3270757   3270493   -264    -0.008%
pprof     19428299  19361939  -66360  -0.342%
test2json 3717345   3717217   -128    -0.003%
trace     17382273  17381657  -616    -0.004%
vet       10689481  10688985  -496    -0.005%
go        19118769  19118609  -160    -0.001%
total     171949855 171878943 -70912  -0.041%

Change-Id: I35c1f264d216c214ea3f56252a9ddab8ea850fa6
Reviewed-on: https://go-review.googlesource.com/c/go/+/673555
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-05-16 11:06:06 -07:00
Keith Randall d681270714 cmd/compile: allow load-op merging in additional situations
x += *p

We want to do this with a single load+add operation on amd64.
The tricky part is that we don't want to combine if there are
other uses of x after this instruction.

Implement a simple detector that seems to capture a common situation -
x += *p is in a loop, and the other use of x is after loop exit.
In that case, it does not hurt to do the load+add combo.
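
A sketch of the loop shape described: x is accumulated from a load on every iteration and is otherwise only used after the loop. The function name is hypothetical, and whether the load and add are merged depends on the compiler version and target.

```go
package main

import "fmt"

// sumAll adds the pointed-to values into x inside the loop.
// The only other use of x is the return after the loop exits.
func sumAll(ps []*int64) int64 {
	var x int64
	for _, p := range ps {
		x += *p
	}
	return x
}

func main() {
	a, b, c := int64(1), int64(2), int64(3)
	fmt.Println(sumAll([]*int64{&a, &b, &c}))
}
```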

Change-Id: I466174cce212e78bde83f908cc1f2752b560c49c
Reviewed-on: https://go-review.googlesource.com/c/go/+/672957
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-15 15:21:36 -07:00
Keith Randall 19f05770b0 cmd/compile: schedule induction variable increments late
for ..; ..; i++ {
 ...
}

We want to schedule the i++ late in the block, so that all other
uses of i in the block are scheduled first. That way, i++ can
happen in place in a register instead of requiring a temporary register.

Change-Id: Id777407c7e67a5ddbd8e58251099b0488138c0df
Reviewed-on: https://go-review.googlesource.com/c/go/+/672998
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
2025-05-15 14:06:41 -07:00
Yongyue Sun fc641e7fae cmd/compile: create LSym for closures with type conversion
Follow-up to #54959 with another failing case.

The linker needs FuncInfo metadata for all inlined functions. CL 436240 explicitly creates LSym for direct closure calls to ensure we keep the FuncInfo metadata.

However, CL 436240 won't work if the direct closure call is wrapped by a no-effect type conversion, even if that closure could be inlined.

This commit should fix such cases.
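
For illustration only, a direct closure call wrapped in a conversion that changes the type but not the value might look like the sketch below. The type op and the function addOne are hypothetical; the actual failing program from #73716 is not reproduced here.

```go
package main

import "fmt"

// op is a named function type; converting a func literal to it does not
// change the underlying value.
type op func(int) int

// addOne calls a closure directly, with the closure wrapped in a conversion.
func addOne(v int) int {
	return op(func(x int) int { return x + 1 })(v)
}

func main() {
	fmt.Println(addOne(41))
}
```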

Fixes #73716

Change-Id: Icda6024da54c8d933f87300e691334c080344695
GitHub-Last-Rev: e9aed02eb6
GitHub-Pull-Request: golang/go#73718
Reviewed-on: https://go-review.googlesource.com/c/go/+/672855
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-05-15 13:16:39 -07:00
Xiaolin Zhao c31a5c571f cmd/compile: fold negation into addition/subtraction on loong64
This change also avoids double negation and adds loong64 codegen for arithmetic tests.
Reduce the number of go toolchain instructions on loong64 as follows.

    file      before    after     Δ       %
    addr2line 279972    279896  -76    -0.0271%
    asm       556390    556310  -80    -0.0144%
    buildid   272376    272300  -76    -0.0279%
    cgo       481534    481550  +16    +0.0033%
    compile   2457992   2457396 -596   -0.0242%
    covdata   323488    323404  -84    -0.0260%
    cover     518630    518490  -140   -0.0270%
    dist      340894    340814  -80    -0.0235%
    distpack  282568    282484  -84    -0.0297%
    doc       790224    789984  -240   -0.0304%
    fix       324408    324348  -60    -0.0185%
    link      704910    704666  -244   -0.0346%
    nm        277220    277144  -76    -0.0274%
    objdump   508026    507878  -148   -0.0291%
    pack      221810    221786  -24    -0.0108%
    pprof     1470284   1469880 -404   -0.0275%
    test2json 254896    254852  -44    -0.0173%
    trace     1100390   1100074 -316   -0.0287%
    vet       781398    781142  -256   -0.0328%
    go        1529668   1529128 -540   -0.0353%
    gofmt     318668    318568  -100   -0.0314%
    total     13795746 13792094 -3652  -0.0265%

Change-Id: I88d1f12cfc4be0e92687c48e06a57213aa484aca
Reviewed-on: https://go-review.googlesource.com/c/go/+/672555
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-05-14 17:46:58 -07:00
qmuntal 176a2154aa cmd/link: use >4GB base address for 64-bit PE binaries
Windows prefers 64-bit binaries to be loaded at an address above 4GB.

Having a preferred base address below this boundary triggers a
compatibility mode in Address Space Layout Randomization (ASLR) on
recent versions of Windows that reduces the number of locations to which
ASLR may relocate the binary.

The Go internal linker was using a smaller base address due to an issue
with how dynamic cgo symbols were relocated, which has been fixed in
this CL.

Fixes #73561.

Cq-Include-Trybots: luci.golang.try:gotip-windows-amd64-longtest
Change-Id: Ia8cb35d57d921d9be706a8975fa085af7996f124
Reviewed-on: https://go-review.googlesource.com/c/go/+/671515
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2025-05-12 11:05:28 -07:00
Michael Anthony Knyszek e46c8e0558 runtime: schedule cleanups across multiple goroutines
This change splits the finalizer and cleanup queues and implements a new
lock-free blocking queue for cleanups. The basic design is as follows:

The cleanup queue is organized in fixed-sized blocks. Individual cleanup
functions are queued, but only whole blocks are dequeued.

Enqueuing cleanups places them in P-local cleanup blocks. These are
flushed to the full list as they get full. Cleanups can only be enqueued
by an active sweeper.

Dequeuing cleanups always dequeues entire blocks from the full list.
Cleanup blocks can be dequeued and executed at any time.

The very last active sweeper in the sweep phase is responsible for
flushing all local cleanup blocks to the full list. It can do this
without any synchronization because the next GC can't start yet, so we
can be very certain that nobody else will be accessing the local blocks.

Cleanup blocks are stored off-heap because they need to be allocated by
the sweeper, which is called from heap allocation paths. As a result,
the GC treats cleanup blocks as roots, just like finalizer blocks.

Flushes to the full list signal to the scheduler that cleanup goroutines
should be awoken. Every time the scheduler goes to wake up a cleanup
goroutine and there were more signals than goroutines to wake, it then
forwards this signal to runtime.AddCleanup, so that it creates another
goroutine the next time it is called, up to gomaxprocs goroutines.

The signals here are a little convoluted, but exist because the sweeper
and the scheduler cannot safely create new goroutines.
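
The block-based flow can be sketched in ordinary Go as below. This is a simplified model, not the runtime code: all names and the block size are hypothetical, a mutex replaces the lock-free queue, and the goroutine wake-up signalling is left out.

```go
package main

import (
	"fmt"
	"sync"
)

const blockSize = 4 // hypothetical; the real block size is not given above

// cleanupBlock holds a fixed-capacity batch of cleanup functions.
type cleanupBlock struct {
	fns []func()
}

// localQueue stands in for a P-local cleanup block.
type localQueue struct {
	cur *cleanupBlock
}

// cleanupQueue is the shared list of full blocks. A mutex is used here for
// brevity; the change described above uses a lock-free queue instead.
type cleanupQueue struct {
	mu   sync.Mutex
	full []*cleanupBlock
}

// enqueue adds fn to the local block and flushes the block once it is full.
func (q *cleanupQueue) enqueue(l *localQueue, fn func()) {
	if l.cur == nil {
		l.cur = &cleanupBlock{fns: make([]func(), 0, blockSize)}
	}
	l.cur.fns = append(l.cur.fns, fn)
	if len(l.cur.fns) == blockSize {
		q.flush(l)
	}
}

// flush moves the local block, if any, to the full list.
func (q *cleanupQueue) flush(l *localQueue) {
	if l.cur == nil {
		return
	}
	q.mu.Lock()
	q.full = append(q.full, l.cur)
	q.mu.Unlock()
	l.cur = nil
}

// dequeue removes and returns one whole block, or nil if none is queued.
func (q *cleanupQueue) dequeue() *cleanupBlock {
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.full) == 0 {
		return nil
	}
	b := q.full[0]
	q.full = q.full[1:]
	return b
}

func main() {
	var q cleanupQueue
	var l localQueue
	ran := 0
	for i := 0; i < 5; i++ {
		q.enqueue(&l, func() { ran++ })
	}
	q.flush(&l) // stands in for the final flush by the last active sweeper
	for b := q.dequeue(); b != nil; b = q.dequeue() {
		for _, fn := range b.fns {
			fn()
		}
	}
	fmt.Println("cleanups run:", ran)
}
```

Individual functions go in one at a time, but consumers only ever take whole blocks, which is the property the design above relies on.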

For #71772.
For #71825.

Change-Id: Ie839fde2b67e1b79ac1426be0ea29a8d923a62cc
Reviewed-on: https://go-review.googlesource.com/c/go/+/650697
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
2025-05-08 11:10:33 -07:00
Jakub Ciolek c9d0fad5cb cmd/compile: add 2 phiopt cases
Add 2 more cases:

if a { x = value } else { x = a } => x = a && value
if a { x = a } else { x = value } => x = a || value
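
Written out as Go functions (names hypothetical), the two source patterns and the boolean expressions they are equivalent to look like this:

```go
package main

import "fmt"

// andForm has the shape "if a { x = value } else { x = a }",
// which yields the same result as a && value.
func andForm(a, value bool) bool {
	var x bool
	if a {
		x = value
	} else {
		x = a
	}
	return x
}

// orForm has the shape "if a { x = a } else { x = value }",
// which yields the same result as a || value.
func orForm(a, value bool) bool {
	var x bool
	if a {
		x = a
	} else {
		x = value
	}
	return x
}

func main() {
	for _, a := range []bool{false, true} {
		for _, v := range []bool{false, true} {
			fmt.Println(a, v, andForm(a, v) == (a && v), orForm(a, v) == (a || v))
		}
	}
}
```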

AND case goes from:

00006 (8)	TESTB	AX, AX
00007 (8)	JNE	9
00008 (13)	MOVL	AX, BX
00009 (13)	MOVL	BX, AX
00010 (13)	RET

to:

00006 (13)	ANDL	BX, AX
00007 (13)	RET

OR goes from:

00006 (19)	TESTB	AX, AX
00007 (19)	JNE	9
00008 (24)	MOVL	BX, AX
00009 (24)	RET

to:

00006 (24)	ORL	BX, AX
00007 (24)	RET

compilecmp linux/amd64:

runtime
runtime.lock2 847 -> 869  (+2.60%)
runtime.addspecial 542 -> 517  (-4.61%)
runtime.tracebackPCs changed
runtime.scanstack changed
runtime.mallocinit changed
runtime.traceback2 2238 -> 2206  (-1.43%)

runtime [cmd/compile]
runtime.lock2 860 -> 882  (+2.56%)
runtime.scanstack changed
runtime.addspecial 542 -> 517  (-4.61%)
runtime.traceback2 2238 -> 2206  (-1.43%)
runtime.lockWithRank 870 -> 890  (+2.30%)
runtime.tracebackPCs changed
runtime.mallocinit changed

strconv
strconv.ryuFtoaFixed32 changed
strconv.ryuFtoaFixed64 639 -> 638  (-0.16%)
strconv.readFloat changed
strconv.ryuFtoaShortest changed

strings
strings.(*Replacer).build changed

strconv [cmd/compile]
strconv.readFloat changed
strconv.ryuFtoaFixed64 639 -> 638  (-0.16%)
strconv.ryuFtoaFixed32 changed
strconv.ryuFtoaShortest changed

strings [cmd/compile]
strings.(*Replacer).build changed

regexp
regexp.makeOnePass.func1 changed

regexp [cmd/compile]
regexp.makeOnePass.func1 changed

encoding/json
encoding/json.indirect changed

database/sql
database/sql.driverArgsConnLocked changed

vendor/golang.org/x/text/unicode/norm
vendor/golang.org/x/text/unicode/norm.Form.transform changed

go/doc/comment
go/doc/comment.parseSpans changed

internal/diff
internal/diff.tgs changed

log/slog
log/slog.(*handleState).appendNonBuiltIns 1898 -> 1877  (-1.11%)

testing/fstest
testing/fstest.(*fsTester).checkGlob changed

runtime/pprof
runtime/pprof.(*profileBuilder).build changed

cmd/internal/dwarf
cmd/internal/dwarf.isEmptyInlinedCall 254 -> 244  (-3.94%)

go/printer
go/printer.keepTypeColumn 302 -> 270  (-10.60%)
go/printer.(*printer).binaryExpr changed

cmd/compile/internal/syntax
cmd/compile/internal/syntax.(*scanner).rune changed
cmd/compile/internal/syntax.(*scanner).number 2137 -> 2153  (+0.75%)

Change-Id: I7f95f54b03a35d0b616c40f38b415a7feb71be73
Reviewed-on: https://go-review.googlesource.com/c/go/+/666835
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Jakub Ciolek <jakub@ciolek.dev>
TryBot-Bypass: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-08 10:18:37 -07:00
Keith Randall 12110c3f7e cmd/compile: improve multiplication strength reduction
Use an automatic algorithm to generate strength reduction code.
You give it all the linear combination (a*x+b*y) instructions in your
architecture, it figures out the rest.

Just amd64 and arm64 for now.
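
As a hand-worked illustration of the idea, a multiply by 45 can be built from two linear-combination steps of the form x + k*y. The helper lin and the choice of 45 are hypothetical; the sequences the compiler actually selects are not listed here.

```go
package main

import "fmt"

// lin models a linear-combination instruction computing x + k*y for a small
// scale factor k (for example an LEA-style operation).
func lin(x, y, k int64) int64 { return x + k*y }

// mul45 computes 45*x without a multiply instruction in the model:
// first t = x + 8*x = 9x, then t + 4*t = 5t = 45x.
func mul45(x int64) int64 {
	t := lin(x, x, 8)
	return lin(t, t, 4)
}

func main() {
	fmt.Println(mul45(7), 45*7)
}
```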

Fixes #67575

Change-Id: I35c69382bebb1d2abf4bb4e7c43fd8548c6c59a1
Reviewed-on: https://go-review.googlesource.com/c/go/+/626998
Reviewed-by: Jakub Ciolek <jakub@ciolek.dev>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-01 09:33:31 -07:00
Joel Sing 4d10d4ad84 cmd/compile,internal/cpu,runtime: intrinsify math/bits.OnesCount on riscv64
For riscv64/rva22u64 and above, we can intrinsify math/bits.OnesCount
using the CPOP/CPOPW machine instructions. Since the native Go
implementation of OnesCount is relatively expensive, it is also
worth emitting a check for Zbb support when compiled for rva20u64.
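
For reference, the sketch below compares a call to math/bits.OnesCount64 with a simple loop-based population count. Both wrapper names are hypothetical, and the loop is not the standard library's generic implementation; whether the bits call becomes a single instruction depends on the target and GORISCV64 setting.

```go
package main

import (
	"fmt"
	"math/bits"
)

// onesCount wraps the standard library call that the compiler may intrinsify.
func onesCount(x uint64) int { return bits.OnesCount64(x) }

// popcountLoop counts set bits by clearing the lowest set bit each iteration.
func popcountLoop(x uint64) int {
	n := 0
	for x != 0 {
		x &= x - 1
		n++
	}
	return n
}

func main() {
	fmt.Println(onesCount(0xF0F0), popcountLoop(0xF0F0))
}
```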

On a Banana Pi F3, with GORISCV64=rva22u64:

              │     oc.1     │                oc.2                 │
              │    sec/op    │   sec/op     vs base                │
OnesCount-8     16.930n ± 0%   4.389n ± 0%  -74.08% (p=0.000 n=10)
OnesCount8-8     5.642n ± 0%   5.016n ± 0%  -11.10% (p=0.000 n=10)
OnesCount16-8    9.404n ± 0%   5.015n ± 0%  -46.67% (p=0.000 n=10)
OnesCount32-8   13.165n ± 0%   4.388n ± 0%  -66.67% (p=0.000 n=10)
OnesCount64-8   16.300n ± 0%   4.388n ± 0%  -73.08% (p=0.000 n=10)
geomean          11.40n        4.629n       -59.40%

On a Banana Pi F3, compiled with GORISCV64=rva20u64 and with Zbb
detection enabled:

              │     oc.3     │                oc.4                 │
              │    sec/op    │   sec/op     vs base                │
OnesCount-8     16.930n ± 0%   5.643n ± 0%  -66.67% (p=0.000 n=10)
OnesCount8-8     5.642n ± 0%   5.642n ± 0%        ~ (p=0.447 n=10)
OnesCount16-8   10.030n ± 0%   6.896n ± 0%  -31.25% (p=0.000 n=10)
OnesCount32-8   13.170n ± 0%   5.642n ± 0%  -57.16% (p=0.000 n=10)
OnesCount64-8   16.300n ± 0%   5.642n ± 0%  -65.39% (p=0.000 n=10)
geomean          11.55n        5.873n       -49.16%

On a Banana Pi F3, compiled with GORISCV64=rva20u64 but with Zbb
detection disabled:

              │    oc.3     │                oc.5                 │
              │   sec/op    │   sec/op     vs base                │
OnesCount-8     16.93n ± 0%   29.47n ± 0%  +74.07% (p=0.000 n=10)
OnesCount8-8    5.642n ± 0%   5.643n ± 0%        ~ (p=0.191 n=10)
OnesCount16-8   10.03n ± 0%   15.05n ± 0%  +50.05% (p=0.000 n=10)
OnesCount32-8   13.17n ± 0%   18.18n ± 0%  +38.04% (p=0.000 n=10)
OnesCount64-8   16.30n ± 0%   21.94n ± 0%  +34.60% (p=0.000 n=10)
geomean         11.55n        15.84n       +37.16%

For hardware without Zbb, this adds ~5ns overhead, while for hardware
with Zbb we achieve a performance gain of up to 11ns. It is worth
noting that OnesCount8 is cheap enough that it is preferable to stick
with the generic version in this case.

Change-Id: Id657e40e0dd1b1ab8cc0fe0f8a68df4c9f2d7da5
Reviewed-on: https://go-review.googlesource.com/c/go/+/660856
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-01 05:57:41 -07:00
Joel Sing 90e8b8cdae cmd/compile: intrinsify math/bits.Bswap on riscv64
For riscv64/rva22u64 and above, we can intrinsify math/bits.Bswap
using the REV8 machine instruction.
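
From user code, the byte swap is reached through the math/bits.ReverseBytes functions, as in the benchmarks below. A minimal usage sketch, with a hypothetical wrapper name:

```go
package main

import (
	"fmt"
	"math/bits"
)

// bswap64 reverses the byte order of x using the standard library call.
func bswap64(x uint64) uint64 { return bits.ReverseBytes64(x) }

func main() {
	fmt.Printf("%#016x\n", bswap64(0x0102030405060708))
}
```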

On a StarFive VisionFive 2 with GORISCV64=rva22u64:

                 │     rb.1     │                rb.2                 │
                 │    sec/op    │   sec/op     vs base                │
ReverseBytes-4     18.790n ± 0%   4.026n ± 0%  -78.57% (p=0.000 n=10)
ReverseBytes16-4    6.710n ± 0%   5.368n ± 0%  -20.00% (p=0.000 n=10)
ReverseBytes32-4   13.420n ± 0%   5.368n ± 0%  -60.00% (p=0.000 n=10)
ReverseBytes64-4   17.450n ± 0%   4.026n ± 0%  -76.93% (p=0.000 n=10)
geomean             13.11n        4.649n       -64.54%

Change-Id: I26eee34270b1721f7304bb1cddb0fda129b20ece
Reviewed-on: https://go-review.googlesource.com/c/go/+/660855
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
2025-05-01 05:57:13 -07:00
Keith Randall 67e0681aef cmd/compile: put constant value on node inside parentheses
That's where the unified IR writer expects it.

Fixes #73476

Change-Id: Ic22bd8dee5be5991e6d126ae3f6eccb2acdc0b19
Reviewed-on: https://go-review.googlesource.com/c/go/+/667415
Reviewed-by: Junyang Shao <shaojunyang@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
2025-04-24 12:17:27 -07:00
Keith Randall 3452d80da3 cmd/compile: add cast in range loop final value computation
When replacing a loop where the iteration variable has a named type,
we need to compute the last iteration value as i = T(len(a)-1), not
just i = len(a)-1.

Fixes #73491

Change-Id: Ic1cc3bdf8571a40c10060f929a9db8a888de2b70
Reviewed-on: https://go-review.googlesource.com/c/go/+/667815
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@google.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-by: Keith Randall <khr@google.com>
2025-04-24 11:02:26 -07:00
Keith Randall c1fc209c41 runtime: use precise bounds of Go data/bss for race detector
We only want to call into the race detector for Go global variables.
By rounding up the region bounds, we can include some C globals.
Even worse, we can include only *part* of a C global, leading to
race{read,write}range calls which straddle the end of shadow memory.
That causes the race detector to barf.

Fix some off-by-one errors in the assembly comparisons. We want to
skip calling the race detector when addr == racedataend.

Fixes #73483

Change-Id: I436b0f588d6165b61f30cb7653016ba9b7cbf585
Reviewed-on: https://go-review.googlesource.com/c/go/+/667655
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2025-04-23 23:22:12 -07:00
Keith Randall 7d0cb2a2ad cmd/compile: constant fold 128-bit multiplies
The full 64x64->128 multiply comes up when using bits.Mul64.
The 64x64->64+overflow multiply comes up in unsafe.Slice when using
a constant length.
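
A small sketch of the first case: bits.Mul64 called with two constants, where the 128-bit product is known at compile time. The wrapper name is hypothetical, and whether the call is folded depends on the compiler version.

```go
package main

import (
	"fmt"
	"math/bits"
)

// mulConst multiplies 2^63 by 4. The full product is 2^65, so the high
// 64 bits hold 2 and the low 64 bits hold 0.
func mulConst() (hi, lo uint64) {
	return bits.Mul64(1<<63, 4)
}

func main() {
	hi, lo := mulConst()
	fmt.Println(hi, lo)
}
```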

Change-Id: I298515162ca07d804b2d699d03bc957ca30a4ebc
Reviewed-on: https://go-review.googlesource.com/c/go/+/667175
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-04-22 10:24:18 -07:00
Prabhav Dogra 95611c0eb4 sync: use atomic.Bool for Once.done
Switch Once.done from atomic.Uint32 to atomic.Bool in the sync package.
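
A simplified sketch of a once-style type built on atomic.Bool is shown below. It is an illustration under the assumption of a fast-path load plus a mutex-protected slow path, not a copy of the standard library source.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// once is a simplified stand-in for sync.Once.
type once struct {
	done atomic.Bool
	mu   sync.Mutex
}

// Do runs f only if no earlier call to Do has completed.
func (o *once) Do(f func()) {
	if !o.done.Load() { // fast path: already done
		o.doSlow(f)
	}
}

func (o *once) doSlow(f func()) {
	o.mu.Lock()
	defer o.mu.Unlock()
	if !o.done.Load() {
		// Mark done only after f returns.
		defer o.done.Store(true)
		f()
	}
}

func main() {
	var o once
	n := 0
	for i := 0; i < 3; i++ {
		o.Do(func() { n++ })
	}
	fmt.Println("f ran", n, "time(s)")
}
```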

Change-Id: Ib8da66fea86ef06e1427ac5118016b96fbcda6b1
GitHub-Last-Rev: d36e0f431f
GitHub-Pull-Request: golang/go#73447
Reviewed-on: https://go-review.googlesource.com/c/go/+/666895
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
2025-04-22 08:28:13 -07:00
Keith Randall 336626bac4 cmd/compile: ensure we evaluate side effects of len() arg
For any len() which requires the evaluation of its arg (according to the spec).
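
The spec distinction can be sketched as follows: len of an array-typed expression containing a function call must evaluate the call, while len of a plain pointer-to-array variable is a constant and does not evaluate the operand. The names are hypothetical, and this is not the test case from #72844.

```go
package main

import "fmt"

var calls int

// makeArray records that it was called and returns an array.
func makeArray() [4]int {
	calls++
	return [4]int{}
}

// lenOfCall takes len of a function call result, so the call must happen.
func lenOfCall() int {
	return len(makeArray())
}

// lenOfNilPointer takes len of a nil pointer to an array. The length is a
// constant and the pointer is never dereferenced.
func lenOfNilPointer() int {
	var p *[4]int
	return len(p)
}

func main() {
	fmt.Println(lenOfCall(), calls, lenOfNilPointer())
}
```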

Update #72844

Change-Id: Id2b0bcc78073a6d5051abd000131dafdf65e7f26
Reviewed-on: https://go-review.googlesource.com/c/go/+/658097
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
2025-04-21 15:50:54 -07:00
Keith Randall 8af32240c6 cmd/compile: don't evaluate side effects of range over array
If the thing we're ranging over is an array or ptr to array, and
it doesn't have a function call or channel receive in it, then we
shouldn't evaluate it.

Typecheck the ranged-over value as a constant in that case.
That makes the unified exporter replace the range expression
with a constant int.
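
A small example of the language rule involved (not code from this CL): with
at most one iteration variable and a pointer-to-array operand, the range
expression is not evaluated, so a nil pointer does not panic. The function
name countIters is hypothetical.

```go
package main

import "fmt"

// countIters ranges over a pointer to an array using only the index.
// len(p) is a constant here, so p itself is never dereferenced and a nil
// pointer is fine.
func countIters(p *[4]int) int {
	n := 0
	for i := range p {
		_ = i
		n++
	}
	return n
}

func main() {
	fmt.Println(countIters(nil)) // prints "4"
}
```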

Change-Id: I0d4ea081de70d20cf6d1fa8d25ef6cb021975554
Reviewed-on: https://go-review.googlesource.com/c/go/+/659317
Reviewed-by: Junyang Shao <shaojunyang@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Robert Griesemer <gri@google.com>
2025-04-21 15:50:43 -07:00
limeidan 09d76e59d2 cmd/compile: set unalignedOK to make memcombine work properly on loong64
goos: linux
goarch: loong64
pkg: unicode/utf8
cpu: Loongson-3A6000-HV @ 2500.00MHz
                            │     old     │                 new                 │
                            │   sec/op    │   sec/op     vs base                │
ValidTenASCIIChars            7.604n ± 0%   6.805n ± 0%  -10.51% (p=0.000 n=10)
Valid100KASCIIChars           37.41µ ± 0%   16.58µ ± 0%  -55.67% (p=0.000 n=10)
ValidTenJapaneseChars         60.84n ± 0%   58.62n ± 0%   -3.64% (p=0.000 n=10)
ValidLongMostlyASCII          113.5µ ± 0%   113.5µ ± 0%        ~ (p=0.303 n=10)
ValidLongJapanese             204.6µ ± 0%   206.8µ ± 0%   +1.07% (p=0.000 n=10)
ValidStringTenASCIIChars      7.604n ± 0%   6.803n ± 0%  -10.53% (p=0.000 n=10)
ValidString100KASCIIChars     38.05µ ± 0%   17.14µ ± 0%  -54.97% (p=0.000 n=10)
ValidStringTenJapaneseChars   60.58n ± 0%   59.48n ± 0%   -1.82% (p=0.000 n=10)
ValidStringLongMostlyASCII    113.5µ ± 0%   113.4µ ± 0%   -0.10% (p=0.000 n=10)
ValidStringLongJapanese       205.9µ ± 0%   207.3µ ± 0%   +0.67% (p=0.000 n=10)
geomean                       3.324µ        2.756µ       -17.08%

Change-Id: Id43b6e2e41907bd4b92f421dacde31f048db47d6
Reviewed-on: https://go-review.googlesource.com/c/go/+/662495
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Keith Randall <khr@google.com>
2025-04-09 09:18:20 -07:00
Keith Randall af278bfb1f cmd/compile: add additional flag constant folding rules
Fixes #73200

Change-Id: I77518d37acd838acf79ed113194bac5e2c30897f
Reviewed-on: https://go-review.googlesource.com/c/go/+/663535
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2025-04-07 22:48:32 -07:00
Keith Randall 16dbd2be39 cmd/compile: be more conservative about arm64 insns that can take zero register
It's really only needed for stores and store-like instructions
(atomic exchange, compare-and-swap, ...).

Fixes #73180

Change-Id: I8ecd833a301355adf0fa4bff43250091640c6226
Reviewed-on: https://go-review.googlesource.com/c/go/+/663155
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
2025-04-06 19:11:43 -07:00
Keith Randall 2d050e91a3 cmd/compile: allow pointer-containing elements in stack allocations
For variable-sized allocations.

Turns out that we already implement the correct escape semantics
for this case. Even when the result of the "make" does not escape,
everything assigned into it does.

Change-Id: Ia123c538d39f2f1e1581c24e4135a65af3821c5e
Reviewed-on: https://go-review.googlesource.com/c/go/+/657937
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Robert Griesemer <gri@google.com>
2025-04-04 15:04:34 -07:00
Keith Randall 7a427143b6 cmd/compile: stack allocate variable-sized makeslice
Instead of always allocating variable-sized "make" calls on the heap,
allocate a small, constant-sized array on the stack and use that array
as the backing store if it is big enough.

Requires that the result of the "make" does not escape.

  if cap <= K {
      var arr [K]E
      slice = arr[:len:cap]
  } else {
      slice = makeslice(E, len, cap)
  }

Pretty conservatively for now, K = 32/sizeof(E). The slice header is
already 24 bytes, so wasting 32 bytes of stack if the requested size
is too big isn't that bad. Larger would waste more stack space but
maybe avoid more allocations.
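
For intuition only, the same shape can be written by hand in user code. This
is a sketch of the pattern, not the compiler's implementation; smallCap and
sum are hypothetical names, and smallCap = 4 assumes E is int64 (32/8).

```go
package main

import "fmt"

// smallCap plays the role of K for an 8-byte element type (32/8 = 4).
const smallCap = 4

// sum fills a scratch slice with 0..n-1 and adds it up. The scratch slice
// never leaves the function, so a small fixed-size array can back it.
func sum(n int) int64 {
	var arr [smallCap]int64
	var s []int64
	if n <= smallCap {
		s = arr[:n:n] // small request: use the fixed-size array
	} else {
		s = make([]int64, n) // large request: fall back to make
	}
	for i := range s {
		s[i] = int64(i)
	}
	var total int64
	for _, v := range s {
		total += v
	}
	return total
}

func main() {
	fmt.Println(sum(3), sum(10)) // prints "3 45"
}
```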

This CL also requires the element type be pointer-free.  Maybe we
could relax that at some point, but it is hard. If the element type
has pointers we can get heap->stack pointers (in the case where the
requested size is too big and the slice is heap allocated).

Note that this only handles the case of makeslice called directly from
compiler-generated code. It does not handle slices built in the
runtime on behalf of the program (e.g. in growslice). Some of those
are currently handled by passing in a tmpBuf (e.g. concatstrings),
but we could probably do more.

Change-Id: I8378efad527cd00d25948a80b82a68d88fbd93a1
Reviewed-on: https://go-review.googlesource.com/c/go/+/653856
Reviewed-by: Robert Griesemer <gri@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2025-04-04 10:36:58 -07:00
Alexander Musman 16a6b71f18 cmd/compile: improve store-to-load forwarding with compatible types
Improve the compiler's store-to-load forwarding optimization by relaxing the
type comparison condition. Instead of requiring exact type equality (CMPeq),
we now use copyCompatibleType which allows forwarding between compatible
types where safe.

Fix several size comparison bugs in the nested store patterns. Previously,
we were comparing the size of the outer store with the load type,
rather than comparing with the size of the actual store being forwarded
from.

Skip OpConvert in dead store elimination to help get rid of dead stores such
as zeroing slices. OpConvert, like OpInlMark, doesn't really use the memory.

This optimization is particularly beneficial for code that creates slices with
computed pointers, such as the runtime's heapBitsSlice function, where
intermediate calculations were previously causing the compiler to miss
store-to-load forwarding opportunities.

Local sweet run result on an x86_64 laptop:

                       │  Orig.res   │              Hopt.res              │
                       │   sec/op    │   sec/op     vs base               │
BiogoIgor-8               5.303 ± 1%    5.322 ± 1%       ~ (p=0.190 n=10)
BiogoKrishna-8            7.894 ± 1%    7.828 ± 2%       ~ (p=0.190 n=10)
BleveIndexBatch100-8      2.257 ± 1%    2.248 ± 2%       ~ (p=0.529 n=10)
EtcdPut-8                30.12m ± 1%   30.03m ± 1%       ~ (p=0.796 n=10)
EtcdSTM-8                127.1m ± 1%   126.2m ± 0%  -0.74% (p=0.023 n=10)
GoBuildKubelet-8          52.21 ± 0%    52.05 ± 1%       ~ (p=0.063 n=10)
GoBuildKubeletLink-8      4.342 ± 1%    4.305 ± 0%  -0.85% (p=0.000 n=10)
GoBuildIstioctl-8         43.33 ± 0%    43.24 ± 0%  -0.22% (p=0.015 n=10)
GoBuildIstioctlLink-8     4.604 ± 1%    4.598 ± 0%       ~ (p=0.063 n=10)
GoBuildFrontend-8         15.33 ± 0%    15.29 ± 0%       ~ (p=0.143 n=10)
GoBuildFrontendLink-8    740.0m ± 1%   737.7m ± 1%       ~ (p=0.912 n=10)
GopherLuaKNucleotide-8    9.590 ± 1%    9.656 ± 1%       ~ (p=0.165 n=10)
MarkdownRenderXHTML-8    96.97m ± 1%   97.26m ± 2%       ~ (p=0.105 n=10)
Tile38QueryLoad-8        335.9µ ± 1%   335.6µ ± 1%       ~ (p=0.481 n=10)
geomean                   1.336         1.333       -0.22%

Change-Id: I031552623e6d5a3b1b5be8325e6314706e45534f
Reviewed-on: https://go-review.googlesource.com/c/go/+/662075
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Carlos Amedee <carlos@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2025-04-04 08:25:47 -07:00
Joel Sing e6c2e12c63 cmd/compile/internal/ssa: optimise more branches with zero on riscv64
Optimise more branches with zero on riscv64. In particular, BLTU with
zero occurs with IsInBounds checks for index zero. This currently results
in two instructions and requires an additional register:

   li      t2, 0
   bltu    t2, t1, 0x174b4

This is equivalent to checking whether the bound is not equal to zero. With
this change:

   bnez    t1, 0x174c0

This removes more than 500 instructions from the Go binary on riscv64.
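
The equivalence relied on here can be checked in plain Go. The sketch below
is illustrative only; inBoundsZero and nonZero are hypothetical names that
mirror the two forms of the comparison.

```go
package main

import "fmt"

// inBoundsZero mirrors the bounds check for index zero: 0 < n, unsigned.
func inBoundsZero(n uint64) bool {
	return 0 < n
}

// nonZero is the equivalent test that a single branch-if-not-zero can express.
func nonZero(n uint64) bool {
	return n != 0
}

func main() {
	for _, n := range []uint64{0, 1, 1 << 63} {
		fmt.Println(n, inBoundsZero(n), nonZero(n))
	}
}
```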

Change-Id: I6cd861d853e3ef270bd46dacecdfaa205b1c4644
Reviewed-on: https://go-review.googlesource.com/c/go/+/606715
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2025-03-28 01:27:22 -07:00
Mark Freeman 6722c008c1 cmd/compile: rename some test packages in codegen
All other files here use the codegen package.

Change-Id: I714162941b9fa9051dacc29643e905fe60b9304b
Reviewed-on: https://go-review.googlesource.com/c/go/+/661135
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
2025-03-27 13:54:37 -07:00
Joel Sing 6bf95d40bb test/codegen: add combined conversion and shift tests
This adds tests for type conversion and shifts, detailing various cases
of poor code generation that currently exist for riscv64. These will be
addressed in future CLs.

Change-Id: Ie1d366dfe878832df691600f8500ef383da92848
Reviewed-on: https://go-review.googlesource.com/c/go/+/615678
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
2025-03-25 06:53:49 -07:00
Austin Clements 5918101d67 testing: detect a stopped timer in B.Loop
Currently, if the user stops the timer in a B.Loop benchmark loop, the
benchmark will run until it hits the timeout and fails.

Fix this by detecting that the timer is stopped and failing the
benchmark right away. We avoid making the fast path more expensive for
this check by "poisoning" the B.Loop iteration counter when the timer
is stopped so that it falls back to the slow path, which can check the
timer.

This causes b to escape from B.Loop, which is totally harmless because
it was already definitely heap-allocated. But it causes the
test/inline_testingbloop.go errorcheck test to fail. I don't think the
escape messages actually mattered to that test, they just had to be
matched. To fix this, we drop the debug level to -m=1, since -m=2
prints a lot of extra information for escaping parameters that we
don't want to deal with, and change one error check to allow b to
escape.

Fixes #72971.

Change-Id: I7d4abbb1ec1e096685514536f91ba0d581cca6b7
Reviewed-on: https://go-review.googlesource.com/c/go/+/659657
Auto-Submit: Austin Clements <austin@google.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-03-24 11:41:09 -07:00
Joel Sing b70244ff7a cmd/compile: intrinsify math/bits.Len on riscv64
For riscv64/rva22u64 and above, we can intrinsify math/bits.Len using the
CLZ/CLZW machine instructions.

On a StarFive VisionFive 2 with GORISCV64=rva22u64:

                 │   clz.b.1   │               clz.b.2               │
                 │   sec/op    │   sec/op     vs base                │
LeadingZeros-4     28.89n ± 0%   12.08n ± 0%  -58.19% (p=0.000 n=10)
LeadingZeros8-4    18.79n ± 0%   14.76n ± 0%  -21.45% (p=0.000 n=10)
LeadingZeros16-4   25.27n ± 0%   14.76n ± 0%  -41.59% (p=0.000 n=10)
LeadingZeros32-4   25.12n ± 0%   12.08n ± 0%  -51.92% (p=0.000 n=10)
LeadingZeros64-4   25.89n ± 0%   12.08n ± 0%  -53.35% (p=0.000 n=10)
geomean            24.55n        13.09n       -46.70%

Change-Id: I0dda684713dbdf5336af393f5ccbdae861c4f694
Reviewed-on: https://go-review.googlesource.com/c/go/+/652321
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2025-03-21 18:21:44 -07:00
Keith Randall deb6790fcf cmd/compile: remove implicit deref from len(p) where p is ptr-to-array
   func f() *[4]int { return nil }
   _ = len(f())

should not panic. We evaluate f, but there isn't a dereference
according to the spec (just "arg is evaluated").

Update #72844

Change-Id: Ia32cefc1b7aa091cd1c13016e015842b4d12d5b4
Reviewed-on: https://go-review.googlesource.com/c/go/+/658096
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Robert Griesemer <gri@google.com>
Reviewed-by: Keith Randall <khr@google.com>
2025-03-19 09:55:46 -07:00
Joel Sing 6fb7bdc96d cmd/compile: intrinsify math/bits.TrailingZeros on riscv64
For riscv64/rva22u64 and above, we can intrinsify math/bits.TrailingZeros
using the CTZ/CTZW machine instructions.

On a StarFive VisionFive 2 with GORISCV64=rva22u64:

                  │   ctz.b.1    │               ctz.b.2               │
                  │    sec/op    │   sec/op     vs base                │
TrailingZeros-4     25.500n ± 0%   8.052n ± 0%  -68.42% (p=0.000 n=10)
TrailingZeros8-4     14.76n ± 0%   10.74n ± 0%  -27.24% (p=0.000 n=10)
TrailingZeros16-4    26.84n ± 0%   10.74n ± 0%  -59.99% (p=0.000 n=10)
TrailingZeros32-4   25.500n ± 0%   8.052n ± 0%  -68.42% (p=0.000 n=10)
TrailingZeros64-4   25.500n ± 0%   8.052n ± 0%  -68.42% (p=0.000 n=10)
geomean              23.09n        9.035n       -60.88%

Change-Id: I71edf2b988acb7a68e797afda4ee66d7a57d587e
Reviewed-on: https://go-review.googlesource.com/c/go/+/652320
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
2025-03-15 19:07:53 -07:00
Joel Sing 21417518a9 cmd/compile: combine negation and word sign extension on riscv64
Use NEGW to produce a negated and sign extended word, rather than doing
the same via two instructions:

   neg     t0, t0
   sext.w  a0, t0

Becomes:

   negw    t0, t0
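
In Go source terms, the pattern is a negation followed by a conversion to
int32. The sketch below is an illustrative example, and negWord is a
hypothetical name.

```go
package main

import "fmt"

// negWord negates x, truncates the result to 32 bits and sign extends it
// back to 64 bits.
func negWord(x int64) int64 {
	return int64(int32(-x))
}

func main() {
	fmt.Println(negWord(1), negWord(1<<32+5)) // prints "-1 -5"
}
```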

Change-Id: I824ab25001bd3304bdbd435e7b244fcc036ef212
Reviewed-on: https://go-review.googlesource.com/c/go/+/652319
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2025-03-15 06:05:16 -07:00
Joel Sing 10d070668c cmd/compile/internal/ssa: remove double negation with addition on riscv64
On riscv64, subtraction from a constant is typically implemented as an
ADDI with the negative constant, followed by a negation. However, this can
lead to multiple NEG/ADDI/NEG sequences that can be optimised out.

For example, runtime.(*_panic).nextDefer currently contains:

   lbu     t0, 0(t0)
   addi    t0, t0, -8
   neg     t0, t0
   addi    t0, t0, -7
   neg     t0, t0

Which is now optimised to:

   lbu     t0, 0(t0)
   addi    t0, t0, -1
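
The arithmetic behind this rewrite can be checked in Go. The expression
below is an assumption chosen to match the instruction sequence shown above,
not the actual source of nextDefer; nested and simplified are hypothetical
names.

```go
package main

import "fmt"

// nested subtracts from a constant twice: 7 - (8 - x).
func nested(b uint8) int64 {
	x := int64(b)
	return 7 - (8 - x)
}

// simplified is the algebraically equivalent form: x - 1.
func simplified(b uint8) int64 {
	return int64(b) - 1
}

func main() {
	fmt.Println(nested(10), simplified(10)) // prints "9 9"
}
```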

Change-Id: Idf5815e6db2e3705cc4a4811ca9130a064ae3d80
Reviewed-on: https://go-review.googlesource.com/c/go/+/652318
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: David Chase <drchase@google.com>
2025-03-15 06:04:28 -07:00
Joel Sing a8f2e63f2f test/codegen: add a test for negation and conversion to int32
Codify the current code generation used on riscv64 in this case.

Change-Id: If4152e3652fc19d0aa28b79dba08abee2486d5ae
Reviewed-on: https://go-review.googlesource.com/c/go/+/652317
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-03-15 06:02:57 -07:00
Joel Sing e1f9013a58 test/codegen: add riscv64 codegen for arithmetic tests
Codify the current riscv64 code generation for various subtract from
constant and addition/subtraction tests.

Change-Id: I54ad923280a0578a338bc4431fa5bdc0644c4729
Reviewed-on: https://go-review.googlesource.com/c/go/+/652316
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: David Chase <drchase@google.com>
2025-03-15 06:02:27 -07:00
Joel Sing c01fa0cc21 test/codegen: add riscv64/rva23u64 specifiers to existing tests
Tests that exist for riscv64/rva22u64 should also be applied to
riscv64/rva23u64.

Change-Id: Ia529fdf0ac55b8bcb3dcd24fa80efef2351f3842
Reviewed-on: https://go-review.googlesource.com/c/go/+/652315
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
2025-03-15 05:58:43 -07:00
Joel Sing c1c7e5902f test/codegen: tighten the TrailingZeros64 test on 386
Make the TrailingZeros64 code generation check more specific for 386.
Just checking for BSFL will match both the generic 64 bit decomposition
and the custom 386 lowering.

Change-Id: I62076f1889af0ef1f29704cba01ab419cae0c6e3
Reviewed-on: https://go-review.googlesource.com/c/go/+/656996
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2025-03-14 15:04:38 -07:00
Keith Randall a1ddbdd3ef cmd/compile: don't move nilCheck operations during tighten
Nil checks need to stay in their original blocks. They cannot
be moved to a following conditionally-executed block.

Fixes #72860

Change-Id: Ic2d66cdf030357d91f8a716a004152ba4c016f77
Reviewed-on: https://go-review.googlesource.com/c/go/+/657715
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2025-03-13 21:24:20 -07:00
Joel Sing af92bb594d test/codegen: remove plan9/amd64 specific array zeroing/copying tests
The compiler previously avoided the use of MOVUPS on plan9/amd64. This
was changed in CL 655875, however the codegen tests were not updated
and now fail (seemingly the full codegen tests do not run anywhere,
not even on the longtest builders).

Change-Id: I388b60e7b0911048d4949c5029347f9801c018a9
Reviewed-on: https://go-review.googlesource.com/c/go/+/656997
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Auto-Submit: Keith Randall <khr@google.com>
2025-03-13 05:19:13 -07:00
Xiaolin Zhao b143c98169 cmd/compile: simplify bounded shift on loong64
Use the shiftIsBounded function to generate more efficient shift instructions.
This change also optimizes shift ops when the shift value is v&63 or v&31.
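
As a source-level illustration (not code from this CL), masking the shift
count keeps it below the operand width, so no handling of oversized counts is
needed. The names shlMasked and shlPlain are hypothetical.

```go
package main

import "fmt"

// shlMasked shifts by v&63, so the count is always in the range 0..63.
func shlMasked(x uint64, v uint) uint64 {
	return x << (v & 63)
}

// shlPlain shifts by v directly; in Go, counts of 64 or more yield 0.
func shlPlain(x uint64, v uint) uint64 {
	return x << v
}

func main() {
	// 65&63 == 1, so the masked shift gives 2 while the plain shift gives 0.
	fmt.Println(shlMasked(1, 65), shlPlain(1, 65)) // prints "2 0"
}
```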

goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A6000-HV @ 2500.00MHz
                |  CL 627855   |               this CL                |
                |    sec/op    |    sec/op     vs base                |
LeadingZeros      1.1005n ± 0%   0.8425n ± 1%  -23.44% (p=0.000 n=10)
LeadingZeros8      1.502n ± 0%    1.501n ± 0%   -0.07% (p=0.001 n=10)
LeadingZeros16     1.502n ± 0%    1.501n ± 0%   -0.07% (p=0.000 n=10)
LeadingZeros32    0.9511n ± 0%   0.8050n ± 0%  -15.36% (p=0.000 n=10)
LeadingZeros64    1.1195n ± 0%   0.8423n ± 0%  -24.76% (p=0.000 n=10)
TrailingZeros     0.8086n ± 0%   0.8005n ± 0%   -1.00% (p=0.000 n=10)
TrailingZeros8     1.031n ± 1%    1.035n ± 1%        ~ (p=0.136 n=10)
TrailingZeros16   0.8114n ± 0%   0.8254n ± 1%   +1.73% (p=0.000 n=10)
TrailingZeros32   0.8090n ± 0%   0.8005n ± 0%   -1.05% (p=0.000 n=10)
TrailingZeros64   0.8089n ± 1%   0.8005n ± 0%   -1.04% (p=0.000 n=10)
OnesCount         0.8677n ± 0%   1.2010n ± 0%  +38.41% (p=0.000 n=10)
OnesCount8        0.8009n ± 0%   0.8004n ± 0%   -0.06% (p=0.000 n=10)
OnesCount16       0.9344n ± 0%   1.2010n ± 0%  +28.53% (p=0.000 n=10)
OnesCount32       0.8677n ± 0%   1.2010n ± 0%  +38.41% (p=0.000 n=10)
OnesCount64       1.2010n ± 0%   0.8671n ± 0%  -27.80% (p=0.000 n=10)
RotateLeft        0.8009n ± 0%   0.6671n ± 0%  -16.71% (p=0.000 n=10)
RotateLeft8        1.202n ± 0%    1.327n ± 0%  +10.40% (p=0.000 n=10)
RotateLeft16      0.8036n ± 0%   0.8218n ± 0%   +2.26% (p=0.000 n=10)
RotateLeft32      0.6674n ± 0%   0.8004n ± 0%  +19.94% (p=0.000 n=10)
RotateLeft64      0.6674n ± 0%   0.8004n ± 0%  +19.94% (p=0.000 n=10)
Reverse           0.4067n ± 1%   0.4122n ± 1%   +1.38% (p=0.001 n=10)
Reverse8          0.8009n ± 0%   0.8004n ± 0%   -0.06% (p=0.000 n=10)
Reverse16         0.8009n ± 0%   0.8005n ± 0%   -0.05% (p=0.000 n=10)
Reverse32         0.8009n ± 0%   0.8004n ± 0%   -0.06% (p=0.001 n=10)
Reverse64         0.8009n ± 0%   0.8004n ± 0%   -0.06% (p=0.008 n=10)
ReverseBytes      0.4057n ± 1%   0.4133n ± 1%   +1.90% (p=0.000 n=10)
ReverseBytes16    0.8009n ± 0%   0.8004n ± 0%   -0.07% (p=0.000 n=10)
ReverseBytes32    0.8009n ± 0%   0.8005n ± 0%   -0.05% (p=0.000 n=10)
ReverseBytes64    0.8009n ± 0%   0.8004n ± 0%   -0.06% (p=0.000 n=10)
Add                1.201n ± 0%    1.201n ± 0%        ~ (p=1.000 n=10)
Add32              1.201n ± 0%    1.201n ± 0%        ~ (p=0.474 n=10)
Add64              1.201n ± 0%    1.201n ± 0%        ~ (p=1.000 n=10)
Add64multiple      1.832n ± 0%    1.828n ± 0%   -0.22% (p=0.001 n=10)
Sub                1.201n ± 0%    1.201n ± 0%        ~ (p=1.000 n=10)
Sub32              1.602n ± 0%    1.601n ± 0%   -0.06% (p=0.000 n=10)
Sub64              1.201n ± 0%    1.201n ± 0%        ~ (p=0.474 n=10)
Sub64multiple      2.402n ± 0%    2.400n ± 0%   -0.10% (p=0.000 n=10)
Mul               0.8009n ± 0%   0.8004n ± 0%   -0.06% (p=0.000 n=10)
Mul32             0.8009n ± 0%   0.8004n ± 0%   -0.06% (p=0.000 n=10)
Mul64             0.8008n ± 0%   0.8004n ± 0%   -0.05% (p=0.000 n=10)
Div                9.083n ± 0%    7.638n ± 0%  -15.91% (p=0.000 n=10)
Div32              4.011n ± 0%    4.009n ± 0%   -0.05% (p=0.000 n=10)
Div64              9.711n ± 0%    8.204n ± 0%  -15.51% (p=0.000 n=10)
geomean            1.083n         1.078n        -0.40%

goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A5000 @ 2500.00MHz
                |  CL 627855   |               this CL                |
                |    sec/op    |    sec/op     vs base                |
LeadingZeros       1.341n ± 4%    1.331n ± 2%   -0.71% (p=0.008 n=10)
LeadingZeros8      1.781n ± 0%    1.766n ± 1%   -0.84% (p=0.011 n=10)
LeadingZeros16     1.782n ± 0%    1.767n ± 0%   -0.79% (p=0.001 n=10)
LeadingZeros32     1.341n ± 1%    1.333n ± 0%   -0.52% (p=0.001 n=10)
LeadingZeros64     1.338n ± 0%    1.333n ± 0%   -0.37% (p=0.008 n=10)
TrailingZeros     0.9025n ± 0%   0.8077n ± 0%  -10.50% (p=0.000 n=10)
TrailingZeros8     1.056n ± 0%    1.089n ± 1%   +3.17% (p=0.001 n=10)
TrailingZeros16    1.101n ± 0%    1.102n ± 0%   +0.09% (p=0.011 n=10)
TrailingZeros32   0.9024n ± 1%   0.8083n ± 0%  -10.43% (p=0.000 n=10)
TrailingZeros64   0.9028n ± 1%   0.8087n ± 0%  -10.43% (p=0.000 n=10)
OnesCount          1.482n ± 1%    1.302n ± 0%  -12.15% (p=0.000 n=10)
OnesCount8         1.206n ± 0%    1.207n ± 2%   +0.12% (p=0.000 n=10)
OnesCount16        1.534n ± 0%    1.402n ± 0%   -8.58% (p=0.000 n=10)
OnesCount32        1.531n ± 1%    1.302n ± 0%  -14.99% (p=0.000 n=10)
OnesCount64        1.302n ± 0%    1.538n ± 1%  +18.16% (p=0.000 n=10)
RotateLeft        0.8083n ± 0%   0.8087n ± 1%        ~ (p=0.579 n=10)
RotateLeft8        1.310n ± 0%    1.323n ± 0%   +0.95% (p=0.001 n=10)
RotateLeft16       1.149n ± 0%    1.165n ± 1%   +1.35% (p=0.001 n=10)
RotateLeft32      0.8093n ± 0%   0.8105n ± 0%        ~ (p=0.393 n=10)
RotateLeft64      0.8088n ± 0%   0.8090n ± 0%        ~ (p=0.739 n=10)
Reverse           0.5109n ± 0%   0.5172n ± 1%   +1.25% (p=0.000 n=10)
Reverse8          0.8010n ± 0%   0.8011n ± 0%   +0.01% (p=0.000 n=10)
Reverse16         0.8010n ± 0%   0.8011n ± 0%   +0.01% (p=0.002 n=10)
Reverse32         0.8010n ± 0%   0.8011n ± 0%   +0.01% (p=0.000 n=10)
Reverse64         0.8010n ± 0%   0.8011n ± 0%   +0.01% (p=0.005 n=10)
ReverseBytes      0.5122n ± 2%   0.5182n ± 1%        ~ (p=0.060 n=10)
ReverseBytes16    0.8010n ± 0%   0.8011n ± 0%   +0.01% (p=0.005 n=10)
ReverseBytes32    0.8010n ± 0%   0.8011n ± 0%   +0.01% (p=0.005 n=10)
ReverseBytes64    0.8010n ± 0%   0.8011n ± 0%   +0.01% (p=0.001 n=10)
Add                1.201n ± 4%    1.202n ± 0%   +0.08% (p=0.028 n=10)
Add32              1.201n ± 0%    1.202n ± 2%   +0.08% (p=0.014 n=10)
Add64              1.201n ± 1%    1.202n ± 0%   +0.08% (p=0.025 n=10)
Add64multiple      1.902n ± 0%    1.913n ± 0%   +0.55% (p=0.004 n=10)
Sub                1.201n ± 0%    1.202n ± 3%   +0.08% (p=0.001 n=10)
Sub32              1.654n ± 0%    1.656n ± 1%        ~ (p=0.117 n=10)
Sub64              1.201n ± 0%    1.202n ± 0%   +0.08% (p=0.001 n=10)
Sub64multiple      2.180n ± 4%    2.159n ± 1%   -0.96% (p=0.006 n=10)
Mul               0.9345n ± 0%   0.9346n ± 0%   +0.01% (p=0.000 n=10)
Mul32              1.030n ± 0%    1.050n ± 1%   +1.94% (p=0.000 n=10)
Mul64             0.9345n ± 0%   0.9346n ± 1%   +0.01% (p=0.000 n=10)
Div                11.57n ± 1%    11.12n ± 0%   -3.85% (p=0.000 n=10)
Div32              4.337n ± 1%    4.341n ± 1%        ~ (p=0.286 n=10)
Div64              12.76n ± 0%    12.02n ± 3%   -5.80% (p=0.000 n=10)
geomean            1.252n         1.235n        -1.32%

Change-Id: Iec4cfd2b83bb0f946068c1d657369ff081d95b04
Reviewed-on: https://go-review.googlesource.com/c/go/+/628575
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-by: David Chase <drchase@google.com>
2025-03-12 18:18:03 -07:00