Commit Graph

246 Commits

Author SHA1 Message Date
Matthew Dempsky 925d2fb36c cmd/compile: restore zero-copy string->[]byte optimization
This CL implements the remainder of the zero-copy string->[]byte
conversion optimization initially attempted in go.dev/cl/520395, but
fixes the tracking of mutations due to ODEREF/ODOTPTR assignments, and
adds more comprehensive tests that I should have included originally.

However, this CL also keeps it behind the -d=zerocopy flag. The next
CL will enable it by default (for easier rollback).

Updates #2205.

Change-Id: Ic330260099ead27fc00e2680a59c6ff23cb63c2b
Reviewed-on: https://go-review.googlesource.com/c/go/+/520599
Auto-Submit: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2023-08-18 11:58:37 +00:00
Matthew Dempsky 639f6f7e78 cmd/compile/internal/ssagen: fix race added in CL 510539
The ssagen pass runs concurrently, so it's not safe to mutate global
variables like this.

Instead, turn it into a constant and add an assertion that the
constant has the correct value.

Fixes #62095.

Change-Id: Ia7f07e33582564892d194153ac3d8759429fc9ac
Reviewed-on: https://go-review.googlesource.com/c/go/+/520598
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Matthew Dempsky <mdempsky@google.com>
2023-08-17 21:51:00 +00:00
Matthew Dempsky b75ac7a8d5 Revert "cmd/compile: enable zero-copy string->[]byte conversions"
This reverts CL 520395.

Reason for revert: thanm@ pointed out failure cases.

Change-Id: I3fd60b73118be3652be2c08b77ab39e793b42110
Reviewed-on: https://go-review.googlesource.com/c/go/+/520596
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Auto-Submit: Matthew Dempsky <mdempsky@google.com>
2023-08-17 20:02:02 +00:00
Matthew Dempsky c8adb3004f cmd/compile: enable zero-copy string->[]byte conversions
This CL enables the latent support for string->[]byte conversions
added go.dev/cl/520259.

One catch is that we need to make sure []byte("") evaluates to a
non-nil slice, even if "" is (nil, 0). This CL addresses that by
adding a "ptr != nil" check for OSTR2BYTESTMP, unless the NonNil flag
is set.

The existing uses of OSTR2BYTESTMP (which aren't concerned about
[]byte("") evaluating to nil) are updated to set this flag.

Fixes #2205.

Change-Id: I35a9cb16c164cd86156b7560915aba5108d8b523
Reviewed-on: https://go-review.googlesource.com/c/go/+/520395
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2023-08-17 19:37:36 +00:00
Matthew Dempsky e453971005 cmd/compile/internal/ir: add typ parameter to NewNameAt
Start making progress towards constructing IR with proper types.

Change-Id: Iad32c1cf60f30ceb8e07c31c8871b115570ac3bd
Reviewed-on: https://go-review.googlesource.com/c/go/+/520263
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2023-08-17 19:36:24 +00:00
Matthew Dempsky 36fc721419 cmd/compile/internal/ir: remove ODCLCONST and ODCLTYPE
These aren't constructed by the unified frontend.

Change-Id: Ied87baa9656920bd11055464bc605933ff448e21
Reviewed-on: https://go-review.googlesource.com/c/go/+/520264
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Auto-Submit: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-08-17 16:36:37 +00:00
Russ Cox 390763aed8 cmd/compile, runtime: make room for rangefunc defers
This is subtle and the compiler and runtime be in sync.
It is easier to develop the rest of the changes (especially when using
toolstash save/restore) if this change is separated out and done first.

Preparation for proposal #61405. The actual logic in the
compiler will be guarded by a GOEXPERIMENT, but it is
easier not to have GOEXPERIMENT-specific data structures
in the runtime, so just make the field always.

Change-Id: I7ec7049b99ae98bf0db365d42966baeec56e3774
Reviewed-on: https://go-review.googlesource.com/c/go/+/510539
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2023-08-16 16:20:06 +00:00
Russ Cox 53895bf993 runtime/internal/math: add Add64
This makes the intrinsic available on 64-bit platforms,
since the runtime cannot import math/bits.

Change-Id: I5296cc6a97d1cb4756ab369d96dc9605df9f8247
Reviewed-on: https://go-review.googlesource.com/c/go/+/516861
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Russ Cox <rsc@golang.org>
TryBot-Bypass: Russ Cox <rsc@golang.org>
2023-08-16 16:03:04 +00:00
Keith Randall 727ebce6ce cmd/compile: separate out unsafe mark for end-of-block instructions
Even if a block is empty, we need to keep track of whether the
end-of-block instructions are preemptible.

This CL allows us to not mark the load+compare in instruction
sequences like

CMPL $0, runtime·writeBarrier(SB)
JEQ  ...

Before, we had to mark the CMPL as uninterruptible because there
was no way to mark just the JEQ. Now there is, so there is no need
to mark the CMPL itself.

Change-Id: I4c27c0dc211c03b14637d420899cd2c2cccf3493
Reviewed-on: https://go-review.googlesource.com/c/go/+/518539
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
2023-08-11 20:25:13 +00:00
Matthew Dempsky bb5974e0cb runtime, cmd/compile: optimize open-coded defers
This CL optimizes open-coded defers in two ways:

1. It modifies local variable sorting to place all open-coded defer
closure slots in order, so that rather than requiring the metadata to
contain each offset individually, we just need a single offset to the
first slot.

2. Because the slots are in ascending order and can be directly
indexed, we can get rid of the count of how many defers are in the
frame. Instead, we just find the top set bit in the active defers
bitmask, and load the corresponding closure.

Change-Id: I6f912295a492211023a9efe12c94a14f449d86ad
Reviewed-on: https://go-review.googlesource.com/c/go/+/516199
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-08-07 18:05:54 +00:00
Matthew Dempsky 0c2abb3233 runtime, cmd/compile: prune unused _defer fields
These fields are no longer needed since go.dev/cl/513837.

Change-Id: I980fc9db998c293e930094bbb87e8c8f1654e39c
Reviewed-on: https://go-review.googlesource.com/c/go/+/516198
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2023-08-07 16:05:16 +00:00
Keith Randall 319504ce43 cmd/compile: implement float min/max in hardware for amd64 and arm64
Update #59488

Change-Id: I89f5ea494cbcc887f6fae8560e57bcbd8749be86
Reviewed-on: https://go-review.googlesource.com/c/go/+/514596
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-08-01 20:03:31 +00:00
Junxian Zhu b1474672c6 cmd/compile: intrinsify Sub64 on mips64
This CL intrinsify Sub64 on mips64.

pkg: math/bits
                  _   sec/op    _   sec/op     vs base               _
Sub-4               2.849n _ 0%   1.948n _ 0%  -31.64% (p=0.000 n=8)
Sub32-4             3.447n _ 0%   3.446n _ 0%        ~ (p=0.982 n=8)
Sub64-4             2.815n _ 0%   1.948n _ 0%  -30.78% (p=0.000 n=8)
Sub64multiple-4     6.124n _ 0%   3.340n _ 0%  -45.46% (p=0.000 n=8)

Change-Id: Ibba91a4350e4a549ae0b60d8cafc4bca05034b84
Reviewed-on: https://go-review.googlesource.com/c/go/+/498497
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Joel Sing <joel@sing.id.au>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
2023-07-31 03:59:48 +00:00
Junxian Zhu 5f8a2fdf09 cmd/compile: intrinsify Add64 on mips64
This CL intrinsify Add64 on mips64.

pkg: math/bits
                  _   sec/op    _   sec/op     vs base               _
Add64-4             2.783n _ 0%   1.950n _ 0%  -29.93% (p=0.000 n=8)
Add64multiple-4     5.713n _ 0%   3.063n _ 0%  -46.38% (p=0.000 n=8)

pkg: crypto/elliptic
                                     _    sec/op    _   sec/op     vs base               _
ScalarBaseMult/P256-4                   353.7_ _ 0%   282.7_ _ 0%  -20.09% (p=0.000 n=8)
ScalarBaseMult/P224-4                   330.5_ _ 0%   250.0_ _ 0%  -24.37% (p=0.000 n=8)
ScalarBaseMult/P384-4                  1228.8_ _ 0%   791.5_ _ 0%  -35.59% (p=0.000 n=8)
ScalarBaseMult/P521-4                  15.412m _ 0%   2.438m _ 0%  -84.18% (p=0.000 n=8)
ScalarMult/P256-4                      1189.4_ _ 0%   904.2_ _ 0%  -23.98% (p=0.000 n=8)
ScalarMult/P224-4                      1138.8_ _ 0%   813.8_ _ 0%  -28.54% (p=0.000 n=8)
ScalarMult/P384-4                       4.419m _ 0%   2.692m _ 0%  -39.08% (p=0.000 n=8)
ScalarMult/P521-4                      59.768m _ 0%   8.773m _ 0%  -85.32% (p=0.000 n=8)
MarshalUnmarshal/P256/Uncompressed-4    8.697_ _ 1%   7.923_ _ 1%   -8.91% (p=0.000 n=8)
MarshalUnmarshal/P256/Compressed-4     104.75_ _ 0%   66.29_ _ 0%  -36.72% (p=0.000 n=8)
MarshalUnmarshal/P224/Uncompressed-4    8.728_ _ 1%   7.823_ _ 1%  -10.37% (p=0.000 n=8)
MarshalUnmarshal/P224/Compressed-4     1035.7_ _ 0%   676.5_ _ 2%  -34.69% (p=0.000 n=8)
MarshalUnmarshal/P384/Uncompressed-4    15.32_ _ 1%   11.81_ _ 1%  -22.90% (p=0.000 n=8)
MarshalUnmarshal/P384/Compressed-4      399.8_ _ 0%   217.4_ _ 0%  -45.62% (p=0.000 n=8)
MarshalUnmarshal/P521/Uncompressed-4    96.79_ _ 0%   20.32_ _ 0%  -79.01% (p=0.000 n=8)
MarshalUnmarshal/P521/Compressed-4     6640.4_ _ 0%   790.8_ _ 0%  -88.09% (p=0.000 n=8)

Change-Id: I8a0960b9665720c1d3e57dce36386e74db37fefa
Reviewed-on: https://go-review.googlesource.com/c/go/+/498496
Reviewed-by: Heschi Kreinick <heschi@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: Keith Randall <khr@golang.org>
2023-07-31 03:58:42 +00:00
Matthew Dempsky 79d4defa75 cmd/compile/internal/ssagen: fix min/max codegen, again
The large-function phi placement algorithm evidently doesn't like the
same pseudo-variable being used to represent expressions of varying
types.

Instead, use the same tactic as used for "valVar" (ssa.go:6585--6587),
which is to just generate a fresh marker node each time.

Maybe we could just use the OMIN/OMAX nodes themselves as the key
(like we do for OANDAND/OOROR), but that just seems needlessly risky
for negligible memory savings. Using fresh marker values each time
seems obviously safe by comparison.

Fixes #61041.

Change-Id: Ie2600c9c37b599c2e26ae01f5f8a433025d7fd08
Reviewed-on: https://go-review.googlesource.com/c/go/+/506679
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2023-06-28 16:07:47 +00:00
Keith Randall a031f4ef83 cmd/compile: fix min/max builtin code generation
Our large-function phi placement algorithm is incompatible with phi
opcodes already existing in the SSA representation. Instead, use simple
variable assignments and have the phi placement algorithm place the phis
we need for min/max.

Turns out the small-function phi placement algorithm doesn't have this
sensitivity, so this bug only occurs in large functions (>500 basic blocks).

Maybe we should document/check that no phis are present when we start
phi placement (regardless of size).  Leaving for a potential separate CL.

We should probably also fix the placement algorithm to handle existing
phis correctly.  But this CL is probably a lot smaller/safer than
messing with phi placement.

Fixes #60982

Change-Id: I59ba7f506c72b22bc1485099a335d96315ebef67
Reviewed-on: https://go-review.googlesource.com/c/go/+/505756
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2023-06-24 05:24:25 +00:00
Matthew Dempsky b0f15b4ac0 cmd/compile: implement min/max builtins
Updates #59488.

Change-Id: I254da7cca071eeb5af2f8aecdcd9461703fe8677
Reviewed-on: https://go-review.googlesource.com/c/go/+/496257
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Auto-Submit: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Keith Randall <khr@google.com>
2023-05-23 18:15:22 +00:00
Junxian Zhu 75add1ce0e cmd/compile: intrinsify runtime/internal/atomic.{And,Or} on MIPS64x
This CL intrinsify atomic{And,Or} on mips64x, which already implemented on mipsx.

goos: linux
goarch: mips64le
pkg: runtime/internal/atomic
                _  oldatomic  _             newatomic              _
                _   sec/op    _   sec/op     vs base               _
AtomicLoad64-4    27.96n _ 0%   28.02n _ 0%   +0.20% (p=0.026 n=8)
AtomicStore64-4   29.14n _ 0%   29.21n _ 0%   +0.22% (p=0.004 n=8)
AtomicLoad-4      27.96n _ 0%   28.02n _ 0%        ~ (p=0.220 n=8)
AtomicStore-4     29.15n _ 0%   29.21n _ 0%   +0.19% (p=0.002 n=8)
And8-4            53.09n _ 0%   41.71n _ 0%  -21.44% (p=0.000 n=8)
And-4             49.87n _ 0%   39.93n _ 0%  -19.93% (p=0.000 n=8)
And8Parallel-4    70.45n _ 0%   68.58n _ 0%   -2.65% (p=0.000 n=8)
AndParallel-4     70.40n _ 0%   67.95n _ 0%   -3.47% (p=0.000 n=8)
Or8-4             52.09n _ 0%   41.11n _ 0%  -21.08% (p=0.000 n=8)
Or-4              49.80n _ 0%   39.87n _ 0%  -19.93% (p=0.000 n=8)
Or8Parallel-4     70.43n _ 0%   68.25n _ 0%   -3.08% (p=0.000 n=8)
OrParallel-4      70.42n _ 0%   67.94n _ 0%   -3.51% (p=0.000 n=8)
Xadd-4            67.83n _ 0%   67.92n _ 0%   +0.13% (p=0.003 n=8)
Xadd64-4          67.85n _ 0%   67.92n _ 0%   +0.09% (p=0.021 n=8)
Cas-4             81.34n _ 0%   81.37n _ 0%        ~ (p=0.859 n=8)
Cas64-4           81.43n _ 0%   81.53n _ 0%   +0.13% (p=0.001 n=8)
Xchg-4            67.15n _ 0%   67.18n _ 0%        ~ (p=0.367 n=8)
Xchg64-4          67.16n _ 0%   67.21n _ 0%   +0.08% (p=0.008 n=8)
geomean           54.04n        51.01n        -5.61%

Change-Id: I9a4353f4b14134f1e9cf0dcf99db3feb951328ed
Reviewed-on: https://go-review.googlesource.com/c/go/+/494875
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Joel Sing <joel@sing.id.au>
Reviewed-by: Junxian Zhu <zhujunxian@oss.cipunited.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-05-18 10:23:17 +00:00
cui fliter 57e3189821 all: fix a lot of comments
Fix comments, including duplicate is, wrong phrases and articles, misspellings, etc.

Change-Id: I8bfea53b9b275e649757cc4bee6a8a026ed9c7a4
Reviewed-on: https://go-review.googlesource.com/c/go/+/493035
Reviewed-by: Benny Siegert <bsiegert@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: shuang cui <imcusg@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
2023-05-10 12:59:20 +00:00
Russ Cox ffc4cc05f5 cmd/compile: standardize on outer-to-inner for pos lists
The call sites that cared all reversed inner-to-outer to outer-to-inner already.
The ones that didn't care left it alone. No one explicitly wanted inner-to-outer.
Also change to a callback-based interface, so that call sites aren't required
to accumulate the results in a slice (the main reason for that before was to
reverse the slice!).

There were three places where these lists were printed:

1. -d=ssa/genssa/dump, explicitly reversing to outer-to-inner
2. node dumps like -W, leaving the default inner-to-outer
3. file positions for HashDebugs, explicitly reversing to outer-to-inner

It makes no sense that (1) and (2) would differ. The reason they do is that
the code for (2) was too lazy to bother to fix it to be the right way.

Consider this program:

	package p

	func f() {
		g()
	}

	func g() {
		println()
	}

Both before and after this change, the ssa dump for f looks like:

	# x.go:3
	       	00000 (3) 	TEXT	<unlinkable>.f(SB), ABIInternal
	       	00001 (3) 	FUNCDATA	$0, gclocals·g2BeySu+wFnoycgXfElmcg==(SB)
	       	00002 (3) 	FUNCDATA	$1, gclocals·g2BeySu+wFnoycgXfElmcg==(SB)
	 v4    	00003 (-4) 	XCHGL	AX, AX
	# x.go:4
	# x.go:8
	 v5    	00004 (+8) 	PCDATA	$1, $0
	 v5    	00005 (+8) 	CALL	runtime.printlock(SB)
	 v7    	00006 (-8) 	CALL	runtime.printnl(SB)
	 v9    	00007 (-8) 	CALL	runtime.printunlock(SB)
	# x.go:5
	 b2    	00008 (5) 	RET
	       	00009 (?) 	END

Note # x.go:4 (f) then # x.go:8 (g, called from f) between v4 and v5.

The -W node dumps used the opposite order:

	before walk f
	.   AS2 Def tc(1) # x.go:4:3
	.   INLMARK # +x.go:4:3
	.   PRINTN tc(1) # x.go:8:9,x.go:4:3
	.   LABEL p..i0 # x.go:4:3

Now they match the ssa dump order, and they use spaces as separators,
to avoid potential problems with commas in some editors.

	before walk f
	.   AS2 Def tc(1) # x.go:4:3
	.   INLMARK # +x.go:4:3
	.   PRINTN tc(1) # x.go:4:3 x.go:8:9
	.   LABEL p..i0 # x.go:4:3

I'm unaware of any argument for the old order other than it was easier
to compute without allocation. The new code uses recursion to reverse
the order without allocation.

Now that the callers get the results outer-to-inner, most don't need
any slices at all.

This change is particularly important for HashDebug, which had been
using a locked temporary slice to walk the inline stack without allocation.
Now the temporary slice is gone.

Change-Id: I5cb6d76b2f950db67b248acc928e47a0460569f9
Reviewed-on: https://go-review.googlesource.com/c/go/+/493735
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Russ Cox <rsc@golang.org>
2023-05-09 16:07:01 +00:00
Junxian Zhu 5cad8d41ca math: optimize math.Abs on mipsx
This commit optimized math.Abs function implementation on mipsx.
Tested on loongson 3A2000.

goos: linux
goarch: mipsle
pkg: math
                      │   oldmath    │              newmath               │
                      │    sec/op    │   sec/op     vs base               │
Acos-4                   282.6n ± 0%   282.3n ± 0%        ~ (p=0.140 n=7)
Acosh-4                  506.1n ± 0%   451.8n ± 0%  -10.73% (p=0.001 n=7)
Asin-4                   272.3n ± 0%   272.2n ± 0%        ~ (p=0.808 n=7)
Asinh-4                  529.7n ± 0%   475.3n ± 0%  -10.27% (p=0.001 n=7)
Atan-4                   208.2n ± 0%   207.9n ± 0%        ~ (p=0.134 n=7)
Atanh-4                  503.4n ± 1%   449.7n ± 0%  -10.67% (p=0.001 n=7)
Atan2-4                  310.5n ± 0%   310.5n ± 0%        ~ (p=0.928 n=7)
Cbrt-4                   359.3n ± 0%   358.8n ± 0%        ~ (p=0.121 n=7)
Ceil-4                   203.9n ± 0%   204.0n ± 0%        ~ (p=0.600 n=7)
Compare-4                23.11n ± 0%   23.11n ± 0%        ~ (p=0.702 n=7)
Compare32-4              19.09n ± 0%   19.12n ± 0%        ~ (p=0.070 n=7)
Copysign-4               33.20n ± 0%   34.02n ± 0%   +2.47% (p=0.001 n=7)
Cos-4                    422.5n ± 0%   385.4n ± 1%   -8.78% (p=0.001 n=7)
Cosh-4                   628.0n ± 0%   545.5n ± 0%  -13.14% (p=0.001 n=7)
Erf-4                    193.7n ± 2%   192.7n ± 1%        ~ (p=0.430 n=7)
Erfc-4                   192.8n ± 1%   193.0n ± 0%        ~ (p=0.245 n=7)
Erfinv-4                 220.7n ± 1%   221.5n ± 2%        ~ (p=0.272 n=7)
Erfcinv-4                221.3n ± 1%   220.4n ± 2%        ~ (p=0.738 n=7)
Exp-4                    471.4n ± 0%   435.1n ± 0%   -7.70% (p=0.001 n=7)
ExpGo-4                  470.6n ± 0%   434.0n ± 0%   -7.78% (p=0.001 n=7)
Expm1-4                  243.1n ± 0%   243.4n ± 0%        ~ (p=0.417 n=7)
Exp2-4                   463.1n ± 0%   427.0n ± 0%   -7.80% (p=0.001 n=7)
Exp2Go-4                 462.4n ± 0%   426.2n ± 5%   -7.83% (p=0.001 n=7)
Abs-4                   37.000n ± 0%   8.039n ± 9%  -78.27% (p=0.001 n=7)
Dim-4                    18.09n ± 0%   18.11n ± 0%        ~ (p=0.094 n=7)
Floor-4                  151.9n ± 0%   151.8n ± 0%        ~ (p=0.190 n=7)
Max-4                    116.7n ± 1%   116.7n ± 1%        ~ (p=0.842 n=7)
Min-4                    116.6n ± 1%   116.6n ± 0%        ~ (p=0.464 n=7)
Mod-4                   1244.0n ± 0%   980.9n ± 0%  -21.15% (p=0.001 n=7)
Frexp-4                  199.0n ± 0%   146.7n ± 0%  -26.28% (p=0.001 n=7)
Gamma-4                  516.4n ± 0%   479.3n ± 1%   -7.18% (p=0.001 n=7)
Hypot-4                  169.8n ± 0%   117.8n ± 2%  -30.62% (p=0.001 n=7)
HypotGo-4                170.8n ± 0%   117.5n ± 0%  -31.21% (p=0.001 n=7)
Ilogb-4                  160.8n ± 0%   109.5n ± 0%  -31.90% (p=0.001 n=7)
J0-4                     1.359µ ± 0%   1.305µ ± 0%   -3.97% (p=0.001 n=7)
J1-4                     1.386µ ± 0%   1.334µ ± 0%   -3.75% (p=0.001 n=7)
Jn-4                     2.864µ ± 0%   2.758µ ± 0%   -3.70% (p=0.001 n=7)
Ldexp-4                  202.9n ± 0%   151.7n ± 0%  -25.23% (p=0.001 n=7)
Lgamma-4                 234.0n ± 0%   234.3n ± 0%        ~ (p=0.199 n=7)
Log-4                    444.1n ± 0%   407.9n ± 0%   -8.15% (p=0.001 n=7)
Logb-4                   157.8n ± 0%   121.6n ± 0%  -22.94% (p=0.001 n=7)
Log1p-4                  354.8n ± 0%   315.4n ± 0%  -11.10% (p=0.001 n=7)
Log10-4                  453.9n ± 0%   417.9n ± 0%   -7.93% (p=0.001 n=7)
Log2-4                   245.3n ± 0%   209.1n ± 0%  -14.76% (p=0.001 n=7)
Modf-4                   126.6n ± 0%   126.6n ± 0%        ~ (p=0.126 n=7)
Nextafter32-4            112.5n ± 0%   112.5n ± 0%        ~ (p=0.853 n=7)
Nextafter64-4            141.7n ± 0%   141.6n ± 0%        ~ (p=0.331 n=7)
PowInt-4                 878.8n ± 1%   758.3n ± 1%  -13.71% (p=0.001 n=7)
PowFrac-4                1.809µ ± 0%   1.615µ ± 0%  -10.72% (p=0.001 n=7)
Pow10Pos-4               18.10n ± 0%   18.12n ± 0%        ~ (p=0.464 n=7)
Pow10Neg-4               17.09n ± 0%   17.09n ± 0%        ~ (p=0.263 n=7)
Round-4                  68.36n ± 0%   68.33n ± 0%        ~ (p=0.325 n=7)
RoundToEven-4            78.40n ± 0%   78.40n ± 0%        ~ (p=0.934 n=7)
Remainder-4              894.0n ± 1%   753.4n ± 1%  -15.73% (p=0.001 n=7)
Signbit-4                18.09n ± 0%   18.09n ± 0%        ~ (p=0.761 n=7)
Sin-4                    389.8n ± 1%   389.8n ± 0%        ~ (p=0.995 n=7)
Sincos-4                 416.0n ± 0%   415.9n ± 0%        ~ (p=0.361 n=7)
Sinh-4                   634.6n ± 4%   585.6n ± 1%   -7.72% (p=0.001 n=7)
SqrtIndirect-4           8.035n ± 0%   8.036n ± 0%        ~ (p=0.523 n=7)
SqrtLatency-4            8.039n ± 0%   8.037n ± 0%        ~ (p=0.218 n=7)
SqrtIndirectLatency-4    8.040n ± 0%   8.040n ± 0%        ~ (p=0.652 n=7)
SqrtGoLatency-4          895.7n ± 0%   896.6n ± 0%   +0.10% (p=0.004 n=7)
SqrtPrime-4              5.406µ ± 0%   5.407µ ± 0%        ~ (p=0.592 n=7)
Tan-4                    406.1n ± 0%   405.8n ± 1%        ~ (p=0.435 n=7)
Tanh-4                   627.6n ± 0%   545.5n ± 0%  -13.08% (p=0.001 n=7)
Trunc-4                  146.7n ± 1%   146.7n ± 0%        ~ (p=0.755 n=7)
Y0-4                     1.359µ ± 0%   1.310µ ± 0%   -3.61% (p=0.001 n=7)
Y1-4                     1.351µ ± 0%   1.301µ ± 0%   -3.70% (p=0.001 n=7)
Yn-4                     2.829µ ± 0%   2.729µ ± 0%   -3.53% (p=0.001 n=7)
Float64bits-4            14.08n ± 0%   14.07n ± 0%        ~ (p=0.069 n=7)
Float64frombits-4        19.09n ± 0%   19.10n ± 0%        ~ (p=0.755 n=7)
Float32bits-4            13.06n ± 0%   13.07n ± 1%        ~ (p=0.586 n=7)
Float32frombits-4        13.06n ± 0%   13.06n ± 0%        ~ (p=0.853 n=7)
FMA-4                    606.9n ± 0%   606.8n ± 0%        ~ (p=0.393 n=7)
geomean                  201.1n        185.4n        -7.81%

Change-Id: I6d41a97ad3789ed5731588588859ac0b8b13b664
Reviewed-on: https://go-review.googlesource.com/c/go/+/484675
Reviewed-by: Rong Zhang <rongrong@oss.cipunited.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Than McIntosh <thanm@google.com>
2023-05-08 15:53:28 +00:00
Junxian Zhu 574431cfcd math: optimize math.Abs on mips64x
This commit optimized math.Abs function implementation on mips64x.
Tested on loongson 3A2000.

goos: linux
goarch: mips64le
pkg: math
                      │    oldmath    │               newmath               │
                      │    sec/op     │    sec/op     vs base               │
Acos-4                   258.0n ± ∞ ¹   257.1n ± ∞ ¹   -0.35% (p=0.008 n=5)
Acosh-4                  417.0n ± ∞ ¹   377.9n ± ∞ ¹   -9.38% (p=0.008 n=5)
Asin-4                   248.0n ± ∞ ¹   259.9n ± ∞ ¹   +4.80% (p=0.008 n=5)
Asinh-4                  439.6n ± ∞ ¹   408.3n ± ∞ ¹   -7.12% (p=0.008 n=5)
Atan-4                   189.6n ± ∞ ¹   188.8n ± ∞ ¹        ~ (p=0.056 n=5)
Atanh-4                  390.0n ± ∞ ¹   356.4n ± ∞ ¹   -8.62% (p=0.008 n=5)
Atan2-4                  279.0n ± ∞ ¹   263.9n ± ∞ ¹   -5.41% (p=0.008 n=5)
Cbrt-4                   314.2n ± ∞ ¹   322.3n ± ∞ ¹   +2.58% (p=0.008 n=5)
Ceil-4                   139.7n ± ∞ ¹   136.6n ± ∞ ¹   -2.22% (p=0.008 n=5)
Compare-4                21.11n ± ∞ ¹   21.09n ± ∞ ¹        ~ (p=0.405 n=5)
Compare32-4              20.10n ± ∞ ¹   20.12n ± ∞ ¹        ~ (p=0.206 n=5)
Copysign-4               32.17n ± ∞ ¹   35.71n ± ∞ ¹  +11.00% (p=0.008 n=5)
Cos-4                    222.8n ± ∞ ¹   169.8n ± ∞ ¹  -23.79% (p=0.008 n=5)
Cosh-4                   550.2n ± ∞ ¹   477.4n ± ∞ ¹  -13.23% (p=0.008 n=5)
Erf-4                    171.6n ± ∞ ¹   174.5n ± ∞ ¹        ~ (p=0.635 n=5)
Erfc-4                   182.6n ± ∞ ¹   170.2n ± ∞ ¹   -6.79% (p=0.008 n=5)
Erfinv-4                 177.6n ± ∞ ¹   196.6n ± ∞ ¹  +10.70% (p=0.008 n=5)
Erfcinv-4                177.8n ± ∞ ¹   197.8n ± ∞ ¹  +11.25% (p=0.008 n=5)
Exp-4                    422.8n ± ∞ ¹   382.1n ± ∞ ¹   -9.63% (p=0.008 n=5)
ExpGo-4                  416.1n ± ∞ ¹   383.2n ± ∞ ¹   -7.91% (p=0.008 n=5)
Expm1-4                  232.9n ± ∞ ¹   252.2n ± ∞ ¹   +8.29% (p=0.008 n=5)
Exp2-4                   404.8n ± ∞ ¹   389.1n ± ∞ ¹   -3.88% (p=0.008 n=5)
Exp2Go-4                 407.0n ± ∞ ¹   372.3n ± ∞ ¹   -8.53% (p=0.008 n=5)
Abs-4                   30.120n ± ∞ ¹   3.014n ± ∞ ¹  -89.99% (p=0.008 n=5)
Dim-4                    5.021n ± ∞ ¹   5.023n ± ∞ ¹        ~ (p=0.071 n=5)
Floor-4                  127.8n ± ∞ ¹   127.1n ± ∞ ¹   -0.55% (p=0.008 n=5)
Max-4                    77.69n ± ∞ ¹   76.33n ± ∞ ¹   -1.75% (p=0.008 n=5)
Min-4                    83.27n ± ∞ ¹   77.87n ± ∞ ¹   -6.48% (p=0.008 n=5)
Mod-4                    906.2n ± ∞ ¹   692.9n ± ∞ ¹  -23.54% (p=0.008 n=5)
Frexp-4                  150.6n ± ∞ ¹   108.6n ± ∞ ¹  -27.89% (p=0.008 n=5)
Gamma-4                  418.4n ± ∞ ¹   386.1n ± ∞ ¹   -7.72% (p=0.008 n=5)
Hypot-4                 148.20n ± ∞ ¹   93.78n ± ∞ ¹  -36.72% (p=0.008 n=5)
HypotGo-4               148.20n ± ∞ ¹   94.47n ± ∞ ¹  -36.26% (p=0.008 n=5)
Ilogb-4                 135.50n ± ∞ ¹   92.38n ± ∞ ¹  -31.82% (p=0.008 n=5)
J0-4                     937.7n ± ∞ ¹   861.7n ± ∞ ¹   -8.10% (p=0.008 n=5)
J1-4                     915.4n ± ∞ ¹   875.9n ± ∞ ¹   -4.32% (p=0.008 n=5)
Jn-4                     1.974µ ± ∞ ¹   1.863µ ± ∞ ¹   -5.62% (p=0.008 n=5)
Ldexp-4                  158.5n ± ∞ ¹   129.3n ± ∞ ¹  -18.42% (p=0.008 n=5)
Lgamma-4                 209.0n ± ∞ ¹   211.8n ± ∞ ¹        ~ (p=0.095 n=5)
Log-4                    326.4n ± ∞ ¹   295.2n ± ∞ ¹   -9.56% (p=0.008 n=5)
Logb-4                   147.7n ± ∞ ¹   105.0n ± ∞ ¹  -28.91% (p=0.008 n=5)
Log1p-4                  303.4n ± ∞ ¹   266.3n ± ∞ ¹  -12.23% (p=0.008 n=5)
Log10-4                  329.2n ± ∞ ¹   298.3n ± ∞ ¹   -9.39% (p=0.008 n=5)
Log2-4                   187.4n ± ∞ ¹   153.0n ± ∞ ¹  -18.36% (p=0.008 n=5)
Modf-4                   110.5n ± ∞ ¹   103.5n ± ∞ ¹   -6.33% (p=0.008 n=5)
Nextafter32-4            128.4n ± ∞ ¹   121.5n ± ∞ ¹   -5.37% (p=0.016 n=5)
Nextafter64-4            109.5n ± ∞ ¹   110.5n ± ∞ ¹   +0.91% (p=0.008 n=5)
PowInt-4                 603.3n ± ∞ ¹   516.4n ± ∞ ¹  -14.40% (p=0.008 n=5)
PowFrac-4                1.365µ ± ∞ ¹   1.183µ ± ∞ ¹  -13.33% (p=0.008 n=5)
Pow10Pos-4               15.07n ± ∞ ¹   15.07n ± ∞ ¹        ~ (p=0.738 n=5)
Pow10Neg-4               21.11n ± ∞ ¹   21.10n ± ∞ ¹        ~ (p=0.190 n=5)
Round-4                  44.23n ± ∞ ¹   44.22n ± ∞ ¹        ~ (p=0.635 n=5)
RoundToEven-4            50.25n ± ∞ ¹   46.27n ± ∞ ¹   -7.92% (p=0.008 n=5)
Remainder-4              675.6n ± ∞ ¹   530.4n ± ∞ ¹  -21.49% (p=0.008 n=5)
Signbit-4                17.07n ± ∞ ¹   17.95n ± ∞ ¹   +5.16% (p=0.008 n=5)
Sin-4                    171.6n ± ∞ ¹   189.1n ± ∞ ¹  +10.20% (p=0.008 n=5)
Sincos-4                 201.5n ± ∞ ¹   200.5n ± ∞ ¹        ~ (p=0.421 n=5)
Sinh-4                   529.6n ± ∞ ¹   484.6n ± ∞ ¹   -8.50% (p=0.008 n=5)
SqrtIndirect-4           5.021n ± ∞ ¹   5.023n ± ∞ ¹   +0.04% (p=0.048 n=5)
SqrtLatency-4            8.032n ± ∞ ¹   8.039n ± ∞ ¹   +0.09% (p=0.024 n=5)
SqrtIndirectLatency-4    8.036n ± ∞ ¹   8.038n ± ∞ ¹        ~ (p=0.056 n=5)
SqrtGoLatency-4          338.8n ± ∞ ¹   338.7n ± ∞ ¹        ~ (p=0.841 n=5)
SqrtPrime-4              5.379µ ± ∞ ¹   5.382µ ± ∞ ¹   +0.06% (p=0.048 n=5)
Tan-4                    182.7n ± ∞ ¹   191.8n ± ∞ ¹   +4.98% (p=0.008 n=5)
Tanh-4                   558.7n ± ∞ ¹   497.6n ± ∞ ¹  -10.94% (p=0.008 n=5)
Trunc-4                  122.5n ± ∞ ¹   122.6n ± ∞ ¹        ~ (p=0.405 n=5)
Y0-4                     892.8n ± ∞ ¹   851.7n ± ∞ ¹   -4.60% (p=0.008 n=5)
Y1-4                     887.2n ± ∞ ¹   863.2n ± ∞ ¹   -2.71% (p=0.008 n=5)
Yn-4                     1.889µ ± ∞ ¹   1.832µ ± ∞ ¹   -3.02% (p=0.008 n=5)
Float64bits-4            13.05n ± ∞ ¹   13.06n ± ∞ ¹   +0.08% (p=0.040 n=5)
Float64frombits-4        13.05n ± ∞ ¹   13.06n ± ∞ ¹        ~ (p=0.143 n=5)
Float32bits-4            13.05n ± ∞ ¹   13.06n ± ∞ ¹   +0.08% (p=0.008 n=5)
Float32frombits-4        13.05n ± ∞ ¹   13.08n ± ∞ ¹   +0.23% (p=0.016 n=5)
FMA-4                    445.7n ± ∞ ¹   448.1n ± ∞ ¹   +0.54% (p=0.008 n=5)
geomean                  157.2n         142.8n         -9.17%

Change-Id: I9bf104848b588c9ecf79401a81d483d7fcdb0a79
Reviewed-on: https://go-review.googlesource.com/c/go/+/481575
Reviewed-by: M Zhuo <mzh@golangcn.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Than McIntosh <thanm@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Run-TryBot: Than McIntosh <thanm@google.com>
Reviewed-by: Rong Zhang <rongrong@oss.cipunited.com>
2023-05-05 14:54:39 +00:00
Austin Clements 7843ca83e7 internal/abi, runtime, cmd: merge PCDATA_* and FUNCDATA_* consts into internal/abi
We also rename the constants related to unsafe-points: currently, they
follow the same naming scheme as the PCDATA table indexes, but are not
PCDATA table indexes.

For #59670.

Change-Id: I06529fecfae535be5fe7d9ac56c886b9106c74fd
Reviewed-on: https://go-review.googlesource.com/c/go/+/485497
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Austin Clements <austin@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-04-21 19:28:49 +00:00
Michael Pratt 598cf5e6ac cmd/compile: expose ir.Func to ssa
ssagen.ssafn already holds the ir.Func, and ssa.Frontend.SetWBPos and
ssa.Frontend.Lsym are simple wrappers around parts of the ir.Func.

Expose the ir.Func through ssa.Frontend, allowing us to remove these
wrapper methods and allowing future access to additional features of the
ir.Func if needed.

While we're here, drop ssa.Frontend.Line, which is unused.

For #58298.

Change-Id: I30c4cbd2743e9ad991d8c6b388484a7d1e95f3ae
Reviewed-on: https://go-review.googlesource.com/c/go/+/484215
Auto-Submit: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
2023-04-20 21:51:46 +00:00
Keith Randall 0d9eb8bea2 cmd/compile: casts from slices to array pointers are known to be non-nil
The cast is proceeded by a bounds check. If the bounds check passes
then we know the pointer in the slice is non-nil.

... except casts to pointers of 0-sized arrays. They are strange, as
the bounds check can pass for a nil input.

Change-Id: Ic01cf4a82d59fbe3071d4b271c94efca9cafaec1
Reviewed-on: https://go-review.googlesource.com/c/go/+/479335
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Auto-Submit: Keith Randall <khr@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2023-03-29 21:55:11 +00:00
Keith Randall 61bc17f04e cmd/compile: don't assume pointer of a slice is non-nil
unsafe.SliceData can return pointers which are nil. That function gets
lowered to the SSA OpSlicePtr, which the compiler assumes is non-nil.
This used to be the case as OpSlicePtr was only used in situations
where the bounds check already passed. But with unsafe.SliceData that
is no longer the case.

There are situations where we know it is nil. Use Bounded() to
indicate that.

I looked through all the uses of OSPTR and added SetBounded where it
made sense. Most OSPTR results are passed directly to runtime calls
(e.g. memmove), so even if we know they are non-nil that info isn't
helpful.

Fixes #59293

Change-Id: I437a15330db48e0082acfb1f89caf8c56723fc51
Reviewed-on: https://go-review.googlesource.com/c/go/+/479896
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
2023-03-28 19:55:43 +00:00
Wayne Zuo cedfcba3e8 cmd/compile: instrinsify TrailingZeros{8,32,64} for 386
This CL add support for instrinsifying the TrialingZeros{8,32,64}
functions for 386 architecture. We need handle the case when the input
is 0, which could lead to undefined output from the BSFL instruction.

Next CL will remove the assembly code in runtime/internal/sys package.

Change-Id: Ic168edf68e81bf69a536102100fdd3f56f0f4a1b
Reviewed-on: https://go-review.googlesource.com/c/go/+/475735
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-03-14 08:10:32 +00:00
Andy Pan 24017148ca cmd/compile: clarify a few redundant deletions of internal/ssagen.state.vars
Fixes #58729

The reason why these deletions exist is that the old state.variable method
will assign the new value to the given key of map when the key doesn't exist,
but after this commit: 5a6e511c61 (diff-e754f9fc8eaf878714250cfc03844eb3b58185ac806a8c1c4f9fbabd86cda921L3972)
the state.variable doesn't do that anymore, thus these deletions became redundant.

Change-Id: Ie6e2471ca445f907a2bb1607c293f9301f0d73e9
Reviewed-on: https://go-review.googlesource.com/c/go/+/471355
Run-TryBot: Andy Pan <panjf2000@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2023-03-09 19:25:20 +00:00
David Chase 5a7793b7b8 cmd/compile: add flag to FOR/RANGE to preserve loop semantics across inlines
This modifies the loopvar change to be tied to the
package if it is specified that way, and preserves
the change across inlining.

Down the road, this will be triggered (and flow correctly)
if the changed semantics are tied to Go version specified
in go.mod (or rather, for the compiler, by the specified
version for compilation).

Includes tests.

Change-Id: If54e8b6dd23273b86be5ba47838c90d38af9bd1a
Reviewed-on: https://go-review.googlesource.com/c/go/+/463595
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2023-03-06 18:34:53 +00:00
Keith Randall cbb9cd03f8 cmd/compile: ensure FuncForPC works on closures that start with NOPs
A 0-sized no-op shouldn't prevent us from detecting that the first
instruction is from an inlined callee.

Update #58300

Change-Id: Ic5f6ed108c54a32c05e9b2264b516f2cc17e4619
Reviewed-on: https://go-review.googlesource.com/c/go/+/467977
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-03-03 16:35:00 +00:00
Cuong Manh Le ab86d29bb5 cmd/compile: update documentation for ONAME node with nil Func
After CL 436435 chain, the only case left where we create an ONAME node
with nil Func is interface method from imported package.

Change-Id: I9d9144916d01712283f2b116973f88965715fea3
Reviewed-on: https://go-review.googlesource.com/c/go/+/468816
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2023-02-28 18:50:39 +00:00
Keith Randall 21d82e6ac8 cmd/compile: batch write barrier calls
Have the write barrier call return a pointer to a buffer into which
the generated code records pointers that need write barrier treatment.

Change-Id: I7871764298e0aa1513de417010c8d46b296b199e
Reviewed-on: https://go-review.googlesource.com/c/go/+/447781
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Bypass: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-02-24 00:21:13 +00:00
Keith Randall d49719b1f7 cmd/compile: move raw writes out of write barrier code
Previously, the write barrier calls themselves did the actual
writes to memory. Instead, move those writes out to a common location
that both the wb-enabled and wb-disabled code paths share.

This enables us to optimize the write barrier path without having
to worry about performing the actual writes.

Change-Id: Ia71ab651908ec124cc33141afb52e4ca19733ac6
Reviewed-on: https://go-review.googlesource.com/c/go/+/447780
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Bypass: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-02-17 22:21:22 +00:00
Keith Randall 55044288ad runtime: reimplement GODEBUG=cgocheck=2 as a GOEXPERIMENT
Move this knob from a binary-startup thing to a build-time thing.
This will enable followon optmizations to the write barrier.

Change-Id: Ic3323348621c76a7dc390c09ff55016b19c43018
Reviewed-on: https://go-review.googlesource.com/c/go/+/447778
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-02-16 00:16:24 +00:00
Keith Randall 44d22e75dd cmd/compile: detect write barrier completion differently
Instead of keeping track of in which blocks write barriers complete,
introduce a new op that marks the exact memory state where the
write barrier completes.

For future use. This allows us to move some of the write barrier code
to between the start of the merging block and the WBend marker.

Change-Id: If3809b260292667d91bf0ee18d7b4d0eb1e929f0
Reviewed-on: https://go-review.googlesource.com/c/go/+/447777
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
2023-02-16 00:16:13 +00:00
Wayne Zuo d7ac5d1480 cmd/compile: intrinsify math/bits/ReverseBytes{32|64} for 386
The BSWAPL instruction is supported in i486 and newer.
https://github.com/golang/go/wiki/MinimumRequirements#386 says we
support "All Pentium MMX or later". The Pentium is also referred to as
i586, so that we are safe with these instructions.

Change-Id: I6dea1f9d864a45bb07c8f8f35a81cfe16cca216c
Reviewed-on: https://go-review.googlesource.com/c/go/+/465515
Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Keith Randall <khr@google.com>
2023-02-08 03:43:23 +00:00
Keith Randall 103f37497f cmd/compile: ensure first instruction in a function is not inlined
People are using this to get the name of the function from a function type:

runtime.FuncForPC(reflect.ValueOf(fn).Pointer()).Name()

Unfortunately, this technique falls down when the first instruction
of the function is from an inlined callee. Then the expression above
gets you the name of the inlined function instead of the function itself.

To fix this, ensure that the first instruction is never from an inlinee.
Normally functions have prologs so those are already fine. In just the
cases where a function is a leaf with no local variables, and an instruction
from an inlinee appears first in the prog list, add a nop at the start
of the function to hold a non-inlined position.

Consider the nop a "mini-prolog" for leaf functions.

Fixes #58300

Change-Id: Ie37092f4ac3167fe8e5ef4a2207b14abc1786897
Reviewed-on: https://go-review.googlesource.com/c/go/+/465076
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Reviewed-by: David Chase <drchase@google.com>
2023-02-06 20:39:54 +00:00
Archana R cd1fc87156 cmd/compile: intrinsify math/bits/ReverseBytes{16|32|64} for ppc64/power10
This change intrinsifies ReverseBytes{16|32|64} by generating the
corresponding new instructions in Power10: brh, brd and brw and
adds a verification test for the same.
On Power 9 and 8, the .go code performs optimally as it is.

Performance improvement seen on Power10:
ReverseBytes32  1.38ns ± 0%  1.18ns ± 0%  -14.2
ReverseBytes64  1.52ns ± 0%  1.11ns ± 0%  -26.87
ReverseBytes16  1.41ns ± 1%  1.18ns ± 0%  -16.47

Change-Id: I88f127f3ab9ba24a772becc21ad90acfba324b37
Reviewed-on: https://go-review.googlesource.com/c/go/+/446675
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2023-02-03 19:01:06 +00:00
Matthew Dempsky f0b1563535 cmd/compile/internal/types: remove unneeded functionality
This CL removes a handful of features that were only needed for the
pre-unified frontends.

In particular, Type.Pkg was a hack for iexport so that
go/types.Var.Pkg could be precisely populated for struct fields and
signature parameters by gcimporter, but it's no longer necessary with
the unified export data format because we now write export data
directly from types2-supplied type descriptors.

Several other features (e.g., OrigType, implicit interfaces, type
parameters on signatures) are no longer relevant to the unified
frontend, because it only uses types1 to represent instantiated
generic types.

Updates #57410.

Change-Id: I84fd1da5e0b65d2ab91d244a7bb593821ee916e7
Reviewed-on: https://go-review.googlesource.com/c/go/+/458622
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2023-01-26 21:56:49 +00:00
Keith Randall 12befc3ce3 cmd/compile: improve scheduling pass
Convert the scheduling pass from scheduling backwards to scheduling forwards.

Forward scheduling makes it easier to prioritize scheduling values as
soon as they are ready, which is important for things like nil checks,
select ops, etc.

Forward scheduling is also quite a bit clearer. It was originally
backwards because computing uses is tricky, but I found a way to do it
simply and with n lg n complexity. The new scheme also makes it easy
to add new scheduling edges if needed.

Fixes #42673
Update #56568

Change-Id: Ibbb38c52d191f50ce7a94f8c1cbd3cd9b614ea8b
Reviewed-on: https://go-review.googlesource.com/c/go/+/270940
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2023-01-20 04:54:01 +00:00
Keith Randall 45dc81d856 cmd/compile: add memory argument to GetCallerSP
We need to make sure that when we get the stack pointer, we get it
at the right time.

V = GetCallerSP
Call()
W = GetCallerSP

If Call causes a stack growth, then we will be in a situation
where V != W. So it matters when GetCallerSP operations get scheduled.
Add a memory argument to GetCallerSP so it can't be reordered with
things like calls.

Change-Id: I6cc801134c38e358c5a1ec0c09d38379a16a4184
Reviewed-on: https://go-review.googlesource.com/c/go/+/453515
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Martin Möhrmann <martin@golang.org>
Reviewed-by: Robert Griesemer <gri@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-01-19 22:43:22 +00:00
cui fliter 3a7a528c2d all: fix some comments for method
Change-Id: I4cff6b2a1fed6acdf754539c3c53a61eaa3b3f84
Reviewed-on: https://go-review.googlesource.com/c/go/+/450176
Auto-Submit: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2022-12-03 17:08:51 +00:00
Russ Cox a8510f92e6 cmd/compile: fix error message wording
CL 450136 fixed a different copy of this error but missed this one.

With the compiler fix from CL 451555 rolled back to produce the error,
this is the text before this CL:

	b.go:9:15: internal compiler error: 'init': Value live at entry. It shouldn't be. func init, node a.i, value nil

And this CL changes it to:

	b.go:9:15: internal compiler error: 'init': value a.i (nil) incorrectly live at entry

matching the same change in the earlier CL.

Change-Id: I33e6b91477e1a213a6918c3ebdea81273be7d235
Reviewed-on: https://go-review.googlesource.com/c/go/+/452816
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-11-23 19:34:12 +00:00
cui fliter b2faff18ce all: add missing periods in comments
Change-Id: I69065f8adf101fdb28682c55997f503013a50e29
Reviewed-on: https://go-review.googlesource.com/c/go/+/449757
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Joedian Reid <joedian@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Joedian Reid <joedian@golang.org>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
2022-11-18 17:59:44 +00:00
Garet Halliday 50557edf10 runtime: add wasm bulk memory operations
The existing implementation uses loops to implement bulk memory
operations such as memcpy and memclr. Now that bulk memory operations
have been standardized and are implemented in all major browsers and
engines (see https://webassembly.org/roadmap/), we should use them
to improve performance.

Updates #28360

Change-Id: I28df0e0350287d5e7e1d1c09a4064ea1054e7575
Reviewed-on: https://go-review.googlesource.com/c/go/+/444935
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Richard Musiol <neelance@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: Richard Musiol <neelance@gmail.com>
Reviewed-by: Richard Musiol <neelance@gmail.com>
2022-10-27 10:37:01 +00:00
Youlin Feng 7ae652b7c0 runtime: replace all uses of CtzXX with TrailingZerosXX
Replace all uses of Ctz64/32/8 with TrailingZeros64/32/8, because they
are the same and maybe duplicated. Also renamed CtzXX functions in 386
assembly code.

Change-Id: I19290204858083750f4be589bb0923393950ae6d
Reviewed-on: https://go-review.googlesource.com/c/go/+/438935
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
2022-10-18 18:06:27 +00:00
Wayne Zuo 90a3527427 cmd/compile: intrinsify Sub64 on loong64
This is a follow up of CL 420095  on loong64.

file                                    before    after     Δ       %
compile/internal/ssa.a                  35649482  35653274  +3792   +0.011%
compile/internal/ssagen.a               4099858   4098728   -1130   -0.028%
ecdh.a                                  227896    226896    -1000   -0.439%
internal/nistec/fiat.a                  1212254   1128184   -84070  -6.935%
tls.a                                   3256800   3256802   +2      +0.000%
big.a                                   1708518   1702496   -6022   -0.352%
bits.a                                  106762    105734    -1028   -0.963%
math.a                                  578762    577288    -1474   -0.255%
netip.a                                 555922    555610    -312    -0.056%
net.a                                   3286528   3286530   +2      +0.000%
golang.org/x/crypto/internal/poly1305.a 109546    107686    -1860   -1.698%
total                                   260392768 260299668 -93100  -0.036%

Change-Id: Ieffca705aae5666501f284502d986ca179dde494
Reviewed-on: https://go-review.googlesource.com/c/go/+/428557
Reviewed-by: Carlos Amedee <carlos@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org>
2022-10-07 18:16:26 +00:00
Wayne Zuo 97760ed651 cmd/compile: intrinsify Add64 on loong64
This is a follow up of CL 420094  on loong64.

Reduce go toolchain size slightly on linux/loong64.

compilecmp HEAD~1 -> HEAD
HEAD~1 (8a32354219): internal/trace: use strings.Builder
HEAD (1767784ac3): cmd/compile: intrinsify Add64 on loong64
platform: linux/loong64

file      before    after     Δ       %
addr2line 3882616   3882536   -80     -0.002%
api       5528866   5528450   -416    -0.008%
asm       5133780   5133796   +16     +0.000%
cgo       4668787   4668491   -296    -0.006%
compile   25163409  25164729  +1320   +0.005%
cover     4658055   4658007   -48     -0.001%
dist      3437783   3437727   -56     -0.002%
doc       3883069   3883205   +136    +0.004%
fix       3383254   3383070   -184    -0.005%
link      6747559   6747023   -536    -0.008%
nm        3793923   3793939   +16     +0.000%
objdump   4256628   4256812   +184    +0.004%
pack      2356328   2356144   -184    -0.008%
pprof     14233370  14131910  -101460 -0.713%
test2json 2638668   2638476   -192    -0.007%
trace     13392065  13360781  -31284  -0.234%
vet       7456388   7455588   -800    -0.011%
total     132498256 132364392 -133864 -0.101%

file                                    before    after     Δ       %
compile/internal/ssa.a                  35644590  35649482  +4892   +0.014%
compile/internal/ssagen.a               4101250   4099858   -1392   -0.034%
internal/edwards25519/field.a           226064    201718    -24346  -10.770%
internal/nistec/fiat.a                  1689922   1212254   -477668 -28.266%
tls.a                                   3256798   3256800   +2      +0.000%
big.a                                   1718552   1708518   -10034  -0.584%
bits.a                                  107786    106762    -1024   -0.950%
cmplx.a                                 169434    168214    -1220   -0.720%
math.a                                  581302    578762    -2540   -0.437%
netip.a                                 556096    555922    -174    -0.031%
net.a                                   3286526   3286528   +2      +0.000%
runtime.a                               8644786   8644510   -276    -0.003%
strconv.a                               519098    518374    -724    -0.139%
golang.org/x/crypto/internal/poly1305.a 115398    109546    -5852   -5.071%
total                                   260913122 260392768 -520354 -0.199%

Change-Id: I75b2bb7761fa5a0d0d032d4ebe3582d092ea77be
Reviewed-on: https://go-review.googlesource.com/c/go/+/428556
Reviewed-by: Carlos Amedee <carlos@golang.org>
Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org>
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-10-07 18:16:10 +00:00
Keith Randall 6485e8f503 cmd/compile: use stricter rule for possible partial overlap
Partial overlaps can only happen for strict sub-pieces of larger arrays.
That's a much stronger condition than the current optimization rules.

Update #54467

Change-Id: I11e539b71099e50175f37ee78fddf69283f83ee5
Reviewed-on: https://go-review.googlesource.com/c/go/+/433056
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2022-09-27 20:09:33 +00:00
Emmanuel T Odeke 4baa486983 all: remove unnecessary allocations from w.WriteString(fmt.Sprint*(...)) by fmt.Fprint*(w, ...)
Noticed in a manual audit from a customer codebase that the pattern

    w.WriteString(fmt.Sprint*(args...))

was less efficient and in most cases we can just invoke:

    fmt.Fprint*(w, args...)

and from the simple benchmarks we can see quick wins in all dimensions:

$ benchstat before.txt after.txt
name            old time/op    new time/op    delta
DetailString-8    5.48µs ±23%    4.40µs ±11%  -19.79%  (p=0.000 n=20+17)

name            old alloc/op   new alloc/op   delta
DetailString-8    2.63kB ± 0%    2.11kB ± 0%  -19.76%  (p=0.000 n=20+20)

name            old allocs/op  new allocs/op  delta
DetailString-8      63.0 ± 0%      50.0 ± 0%  -20.63%  (p=0.000 n=20+20)

Change-Id: I47a2827cd34d6b92644900b1bd5f4c0a3287bdb1
Reviewed-on: https://go-review.googlesource.com/c/go/+/429861
Reviewed-by: Robert Findley <rfindley@google.com>
Reviewed-by: hopehook <hopehook@golangcn.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: Emmanuel Odeke <emmanuel@orijtech.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
2022-09-14 16:11:21 +00:00