Commit Graph

161 Commits

Author SHA1 Message Date
Ian Lance Taylor 41298239cf all: remove coverageredesign experiment
The coverageredesign experiment was turned on by default by
CL 436236 in September, 2022. We've documented it and people
are using it. This CL removes the ability to turn off the experiment.
This removes some old code that is no longer being executed.

For #51430

Change-Id: I88d4998c8b5ea98eef8145d7ca6ebd96f64fbc2b
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest,gotip-darwin-amd64-longtest,gotip-linux-arm64-longtest,gotip-windows-amd64-longtest
Reviewed-on: https://go-review.googlesource.com/c/go/+/644997
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Than McIntosh <thanm@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Commit-Queue: Ian Lance Taylor <iant@google.com>
2025-02-03 12:10:28 -08:00
Michael Anthony Knyszek 579eb79f62 all: skip and fix various tests with -asan and -msan
First, skip all the allocation count tests.

In some cases this aligns with existing skips for -race, but in others
we've got new issues. These are debug modes, so some performance loss is
expected, and this is clearly no worse than today where the tests fail.

Next, skip internal linking and static linking tests for msan and asan.

With asan we get an explicit failure that neither are supported by the C
and/or Go compilers. With msan, we only get the Go compiler telling us
internal linking is unavailable. With static linking, we segfault
instead. Filed #70080 to track that.

Next, skip some malloc tests with asan that don't quite work because of
the redzone.

This is because of some sizeclass assumptions that get broken with the
redzone and the fact that the tiny allocator is effectively disabled
(again, due to the redzone).

Next, skip some runtime/pprof tests with asan, because of extra
allocations.

Next, skip some malloc tests with asan that also fail because of extra
allocations.

Next, fix up memstats accounting for arenas when asan is enabled. There
is a bug where more is added to the stats than subtracted. This also
simplifies the accounting a little.

Next, skip race tests with msan or asan enabled; they're mutually
incompatible.

Fixes #70054.
Fixes #64256.
Fixes #64257.
For #70079.
For #70080.

Change-Id: I99c02a0b9d621e44f1f918b307aa4a4944c3ec60
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-asan-clang15,gotip-linux-amd64-msan-clang15
Reviewed-on: https://go-review.googlesource.com/c/go/+/622855
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Bypass: Michael Knyszek <mknyszek@google.com>
2024-10-28 21:04:51 +00:00
Michael Anthony Knyszek a1c4fb4361 runtime: specialize heapSetType
Last CL we separated mallocgc into several specialized paths. Let's
split up heapSetType too. This will make the specialized heapSetType
functions inlineable and cut out some branches as well as a function
call.

Microbenchmark results at this point in the stack:

                   │ before.out  │            after-5.out             │
                   │   sec/op    │   sec/op     vs base               │
Malloc8-4            13.52n ± 3%   12.15n ± 2%  -10.13% (p=0.002 n=6)
Malloc16-4           21.49n ± 2%   18.32n ± 4%  -14.75% (p=0.002 n=6)
MallocTypeInfo8-4    27.12n ± 1%   18.64n ± 2%  -31.30% (p=0.002 n=6)
MallocTypeInfo16-4   28.71n ± 3%   21.63n ± 5%  -24.65% (p=0.002 n=6)
geomean              21.81n        17.31n       -20.64%

Change-Id: I5de9ac5089b9eb49bf563af2a74e6dc564420e05
Reviewed-on: https://go-review.googlesource.com/c/go/+/614795
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
2024-10-21 15:56:28 +00:00
Michael Pratt c39bc22c14 all: wire up swisstable maps
Use the new SwissTable-based map in internal/runtime/maps as the basis
for the runtime map when GOEXPERIMENT=swissmap.

Integration is complete enough to pass all.bash. Notable missing
features:

* Race integration / concurrent write detection
* Stack-allocated maps
* Specialized "fast" map variants
* Indirect key / elem

For #54766.

Cq-Include-Trybots: luci.golang.try:gotip-linux-ppc64_power10,gotip-linux-amd64-longtest-swissmap
Change-Id: Ie97b656b6d8e05c0403311ae08fef9f51756a639
Reviewed-on: https://go-review.googlesource.com/c/go/+/594596
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-10-14 19:58:47 +00:00
Paschalis Tsilias fe69121bc5 cmd/compile: optimize []byte(string1 + string2)
This CL optimizes the compilation of string-to-bytes conversion in the
case of string additions.

Fixes #62407

Change-Id: Ic47df758478e5d061880620025c4ec7dbbff8a64
Reviewed-on: https://go-review.googlesource.com/c/go/+/527935
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Tim King <taking@google.com>
2024-09-10 21:20:57 +00:00
Alan Donovan cd9a300afc all: fix printf(var) mistakes detected by latest printf checker
These will cause build failures once we vendor x/tools.

In once case I renamed a function err to errf to indicate
that it is printf-like.

Updates golang/go#68796

Change-Id: I04d57b34ee5362f530554b7e8b817f70a9088d12
Reviewed-on: https://go-review.googlesource.com/c/go/+/610739
Commit-Queue: Alan Donovan <adonovan@google.com>
Reviewed-by: Robert Findley <rfindley@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Tim King <taking@google.com>
Auto-Submit: Alan Donovan <adonovan@google.com>
2024-09-04 18:16:59 +00:00
Michael Pratt 0a4215c234 cmd/compile: keep internal/runtime packages sorted
This is a minor cleanup from CL 600436.

For #65355.

Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest,gotip-darwin-amd64-longtest
Change-Id: I8e27f0c6ba6bd35f4aa2b9d53c394fb5f1eb433d
Reviewed-on: https://go-review.googlesource.com/c/go/+/595116
Reviewed-by: Austin Clements <austin@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-07-31 22:11:37 +00:00
David Chase fc5073bc15 runtime,internal: move runtime/internal/sys to internal/runtime/sys
Cleanup and friction reduction

For #65355.

Change-Id: Ia14c9dc584a529a35b97801dd3e95b9acc99a511
Reviewed-on: https://go-review.googlesource.com/c/go/+/600436
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
2024-07-23 19:05:35 +00:00
David Chase f9eb3e3cd5 runtime,internal: move runtime/internal/math to internal/runtime/math
Cleanup and friction reduction.

Updates #65355.

Change-Id: I6c4fcd409d044c00d16561fe9ed2257877d73f5b
Reviewed-on: https://go-review.googlesource.com/c/go/+/600435
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
2024-07-23 19:05:16 +00:00
Than McIntosh e01b1eb289 cmd/compile/internal: stack slot merging region formation enhancements
This patch revises the algorithm/strategy used for overlapping the
stack slots of disjointly accessed local variables. The main change
here is to allow merging the stack slot of B into the slot for A if
B's size is less then A (prior to this they had to be identical), and
to also allow merging a non-pointer variables into pointer-variable
slots.

The new algorithm sorts the candidate list first by pointerness
(pointer variables first), then by alignment, then by size, and
finally by name. We no longer check that two variables have the same
GC shape before merging: since it should never be the case that we
have two vars X and Y both live across a given callsite where X and Y
share a stack slot, their gc shape doesn't matter.

Doing things this new way increases the total number of bytes saved
(across all functions) from 91256 to 124336 for the sweet benchmarks.

Updates #62737.
Updates #65532.
Updates #65495.

Change-Id: I1daaac1b1240aa47a6975e98ccd24e03304ab602
Reviewed-on: https://go-review.googlesource.com/c/go/+/577615
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-04-18 15:43:53 +00:00
Than McIntosh 82e929e4f8 cmd/compile/internal/liveness: enhance mergelocals for addr-taken candidates
It is possible to have situations where a given ir.Name is
non-address-taken at the source level, but whose address is
materialized in order to accommodate the needs of arch-dependent
memory ops. The issue here is that the SymAddr op will show up as
touching a variable of interest, but the subsequent memory op will
not. This is generally not an issue for computing whether something is
live across a call, but it is problematic for collecting the more
fine-grained live interval info that drives stack slot merging.

As an example, consider this Go code:

    package p
    type T struct {
	    x [10]int
	    f float64
    }
    func ABC(i, j int) int {
	    var t T
	    t.x[i&3] = j
	    return t.x[j&3]
    }

On amd64 the code sequences we'll see for accesses to "t" might look like

    v10 = VarDef <mem> {t} v1
    v5 = MOVOstoreconst <mem> {t} [val=0,off=0] v2 v10
    v23 = LEAQ <*T> {t} [8] v2 : DI
    v12 = DUFFZERO <mem> [80] v23 v5
    v14 = ANDQconst <int> [3] v7 : AX
    v19 = MOVQstoreidx8 <mem> {t} v2 v14 v8 v12
    v22 = ANDQconst <int> [3] v8 : BX
    v24 = MOVQloadidx8 <int> {t} v2 v22 v19 : AX
    v25 = MakeResult <int,mem> v24 v19 : <>

Note that the the loads and stores (ex: v19, v24) all refer directly
to "t", which means that regular live analysis will work fine for
identifying variable lifetimes. The DUFFZERO is (in effect) an
indirect write, but since there are accesses immediately after it we
wind up with the same live intervals.

Now the same code with GOARCH=ppc64:

    v10 = VarDef <mem> {t} v1
    v20 = MOVDaddr <*T> {t} v2 : R20
    v12 = LoweredZero <mem> [88] v20 v10
     v3 = CLRLSLDI <int> [212543] v7 : R5
    v15 = MOVDaddr <*T> {t} v2 : R6
    v19 = MOVDstoreidx <mem> v15 v3 v8 v12
    v29 = CLRLSLDI <int> [212543] v8 : R4
    v24 = MOVDloadidx <int> v15 v29 v19 : R3
    v25 = MakeResult <int,mem> v24 v19 : <>

Here instead of memory ops that refer directly to the symbol, we take
the address of "t" (ex: v15) and then pass the address to memory ops
(where the ops themselves no longer refer to the symbol).

This patch enhances the stack slot merging liveness analysis to handle
cases like the PPC64 one above. We add a new phase in candidate
selection that collects more precise use information for merge
candidates, and screens out candidates that are too difficult to
analyze. The phase make a forward pass over each basic block looking
for instructions of the form vK := SymAddr(N) where N is a raw
candidate. It then creates an entry in a map with key vK and value
holding name and the vK use count. As the walk continues, we check for
uses of of vK: when we see one, record it in a side table as an
upwards exposed use of N. At each vK use we also decrement the use
count in the map entry, and if we hit zero, remove the map entry. If
we hit the end of the basic block and we still have map entries, this
implies that the address in question "escapes" the block -- at that
point to be conservative we just evict the name in question from the
candidate set.

Although this CL fixes the issues that forced a revert of the original
merging CL, this CL doesn't enable stack slot merging by default; a
subsequent CL will do that.

Updates #62737.
Updates #65532.
Updates #65495.

Change-Id: Id41d359a677767a8e7ac1e962ae23f7becb4031f
Reviewed-on: https://go-review.googlesource.com/c/go/+/576735
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-04-09 17:42:19 +00:00
Than McIntosh 875332b8a2 cmd/compile/internal: merge stack slots for selected local auto vars
[This is a partial roll-forward of CL 553055, the main change here
is that the stack slot overlap operation is flagged off by default
(can be enabled by hand with -gcflags=-d=mergelocals=1) ]

Preliminary compiler support for merging/overlapping stack slots of
local variables whose access patterns are disjoint.

This patch includes changes in AllocFrame to do the actual
merging/overlapping based on information returned from a new
liveness.MergeLocals helper. The MergeLocals helper identifies
candidates by looking for sets of AUTO variables that either A) have
the same size and GC shape (if types contain pointers), or B) have the
same size (but potentially different types as long as those types have
no pointers). Variables must be greater than (3*types.PtrSize) in size
to be considered for merging.

After forming candidates, MergeLocals collects variables into "can be
overlapped" equivalence classes or partitions; this process is driven
by an additional liveness analysis pass. Ideally it would be nice to
move the existing stackmap liveness pass up before AllocFrame
and "widen" it to include merge candidates so that we can do just a
single liveness as opposed to two passes, however this may be difficult
given that the merge-locals liveness has to take into account
writes corresponding to dead stores.

This patch also required a change to the way ssa.OpVarDef pseudo-ops
are generated; prior to this point they would only be created for
variables whose type included pointers; if stack slot merging is
enabled then the ssagen code creates OpVarDef ops for all auto vars
that are merge candidates.

Note that some temporaries created late in the compilation process
(e.g. during ssa backend) are difficult to reason about, especially in
cases where we take the address of a temp and pass it to the runtime.
For the time being we mark most of the vars created post-ssagen as
"not a merge candidate".

Stack slot merging for locals/autos is enabled by default if "-N" is
not in effect, and can be disabled via "-gcflags=-d=mergelocals=0".

Fixmes/todos/restrictions:
- try lowering size restrictions
- re-evaluate the various skips that happen in SSA-created autotmps

Updates #62737.
Updates #65532.
Updates #65495.

Change-Id: Ifda26bc48cde5667de245c8a9671b3f0a30bb45d
Reviewed-on: https://go-review.googlesource.com/c/go/+/575415
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-04-09 16:41:23 +00:00
Michael Anthony Knyszek 9f3f4c64db runtime: remove the allocheaders GOEXPERIMENT
This change removes the allocheaders, deleting all the old code and
merging mbitmap_allocheaders.go back into mbitmap.go.

This change also deletes the SetType benchmarks which were already
broken in the new GOEXPERIMENT (it's harder to set up than before). We
weren't really watching these benchmarks at all, and they don't provide
additional test coverage.

Change-Id: I135497201c3259087c5cd3722ed3fbe24791d25d
Reviewed-on: https://go-review.googlesource.com/c/go/+/567200
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
2024-04-09 04:07:57 +00:00
Cuong Manh Le 0bf6071066 Revert "cmd/compile/internal: merge stack slots for selected local auto vars"
This reverts CL 553055.

Reason for revert: causes crypto/ecdsa failures on linux ppc64/s390x builders

Change-Id: I9266b030693a5b6b1e667a009de89d613755b048
Reviewed-on: https://go-review.googlesource.com/c/go/+/575236
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Auto-Submit: Than McIntosh <thanm@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-03-30 00:11:00 +00:00
Than McIntosh 89f7805c2e cmd/compile/internal: merge stack slots for selected local auto vars
Preliminary compiler support for merging/overlapping stack
slots of local variables whose access patterns are disjoint.

This patch includes changes in AllocFrame to do the actual
merging/overlapping based on information returned from a new
liveness.MergeLocals helper. The MergeLocals helper identifies
candidates by looking for sets of AUTO variables that either A) have
the same size and GC shape (if types contain pointers), or B) have the
same size (but potentially different types as long as those types have
no pointers). Variables must be greater than (3*types.PtrSize) in size
to be considered for merging.

After forming candidates, MergeLocals collects variables into "can be
overlapped" equivalence classes or partitions; this process is driven
by an additional liveness analysis pass. Ideally it would be nice to
move the existing stackmap liveness pass up before AllocFrame
and "widen" it to include merge candidates so that we can do just a
single liveness as opposed to two passes, however this may be difficult
given that the merge-locals liveness has to take into account
writes corresponding to dead stores.

This patch also required a change to the way ssa.OpVarDef pseudo-ops
are generated; prior to this point they would only be created for
variables whose type included pointers; if stack slot merging is
enabled then the ssagen code creates OpVarDef ops for all auto vars
that are merge candidates.

Note that some temporaries created late in the compilation process
(e.g. during ssa backend) are difficult to reason about, especially in
cases where we take the address of a temp and pass it to the runtime.
For the time being we mark most of the vars created post-ssagen as
"not a merge candidate".

Stack slot merging for locals/autos is enabled by default if "-N" is
not in effect, and can be disabled via "-gcflags=-d=mergelocals=0".

Fixmes/todos/restrictions:
- try lowering size restrictions
- re-evaluate the various skips that happen in SSA-created autotmps

Fixes #62737.
Updates #65532.
Updates #65495.

Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest
Change-Id: Ibc22e8a76c87e47bc9fafe4959804d9ea923623d
Reviewed-on: https://go-review.googlesource.com/c/go/+/553055
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-03-29 23:09:29 +00:00
Michael Pratt 63deaf00ea cmd/compile,cmd/preprofile: move logic to shared common package
The processing performed in cmd/preprofile is a simple version of the
same initial processing performed by cmd/compile/internal/pgo. Refactor
this processing into the new IR-independent cmd/internal/pgo package.

Now cmd/preprofile and cmd/compile run the same code for initial
processing of a pprof profile, guaranteeing that they always stay in
sync.

Since it is now trivial, this CL makes one change to the serialization
format: the entries are ordered by weight. This allows us to avoid
sorting ByWeight on deserialization.

Impact on PGO parsing when compiling cmd/compile with PGO:

* Without preprocessing: PGO parsing ~13.7% of CPU time
* With preprocessing (unsorted): ~2.9% of CPU time (sorting ~1.7%)
* With preprocessing (sorted): ~1.3% of CPU time

The remaining 1.3% of CPU time approximately breaks down as:

* ~0.5% parsing the preprocessed profile
* ~0.7% building weighted IR call graph
  * ~0.5% walking function IR to find direct calls
  * ~0.2% performing lookups for indirect calls targets

For #58102.

Change-Id: Iaba425ea30b063ca195fb2f7b29342961c8a64c2
Reviewed-on: https://go-review.googlesource.com/c/go/+/569337
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-03-27 20:20:01 +00:00
cui fliter 1e12eab870 all: fix a large number of comments
Partial typo corrections, following https://go.dev/wiki/Spelling

Change-Id: I2357906ff2ea04305c6357418e4e9556e20375d1
Reviewed-on: https://go-review.googlesource.com/c/go/+/573776
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
Run-TryBot: shuang cui <imcusg@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
2024-03-26 19:58:28 +00:00
Andy Pan 4c2b1e0feb runtime: migrate internal/atomic to internal/runtime
For #65355

Change-Id: I65dd090fb99de9b231af2112c5ccb0eb635db2be
Reviewed-on: https://go-review.googlesource.com/c/go/+/560155
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Ibrahim Bazoka <ibrahimbazoka729@gmail.com>
Auto-Submit: Emmanuel Odeke <emmanuel@orijtech.com>
2024-03-25 19:53:03 +00:00
Jin Lin 185f31bf30 cmd/compile: update the incorrect assignment of call site offset.
The call site calculation in the previous version is incorrect. For
the PGO preprocess file, the compiler should directly use the call
site offset value. Additionly, this change refactors the preprocess
tool to clean up unused fields including startline, the flat and the
cum.

Change-Id: I7bffed3215d4c016d9a9e4034bfd373bf50ab43f
Reviewed-on: https://go-review.googlesource.com/c/go/+/562795
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
2024-02-15 21:30:35 +00:00
Michael Pratt 532c6f1c8d cmd/compile: fail noder.LookupFunc gracefully if function generic
PGO uses noder.LookupFunc to look for devirtualization targets in
export data.  LookupFunc does not support type-parameterized
functions, and will currently fail the build when attempting to lookup
a type-parameterized function because objIdx is passed the wrong
number of type arguments.

This doesn't usually come up, as a PGO profile will report a generic
function with a symbol name like Func[.go.shape.foo]. In export data,
this is just Func, so when we do LookupFunc("Func[.go.shape.foo]")
lookup simply fails because the name doesn't exist.

However, if Func is not generic when the profile is collected, but the
source has since changed to make Func generic, then LookupFunc("Func")
will find the object successfully, only to fail the build because we
failed to provide type arguments.

Handle this with a objIdxMayFail, which allows graceful failure if the
object requires type arguments.

Bumping the language version to 1.21 in pgo_devirtualize_test.go is
required for type inference of the uses of mult.MultFn in
cmd/compile/internal/test/testdata/pgo/devirtualize/devirt_test.go.

Fixes #65615.

Change-Id: I84d9344840b851182f5321b8f7a29a591221b29f
Reviewed-on: https://go-review.googlesource.com/c/go/+/562737
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-02-09 20:03:18 +00:00
Jin Lin e069587c8d cmd/preprofile: Implement a tool to preprocess the PGO profile.
It fixes the issue https://github.com/golang/go/issues/65220.
It also includes  https://go.dev/cl/557458 from Michael.

Change-Id: Ic6109e1b6a9045459ff4a54dea11cbfe732b01e6
Reviewed-on: https://go-review.googlesource.com/c/go/+/557918
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-01-31 22:40:25 +00:00
Jes Cok 7e91cfb877 all: fix typos
Found by codespell.

Change-Id: I2ebefbbbc8eacf9bc83051407519fa8fd5e4495f
GitHub-Last-Rev: 9cc550f3db
GitHub-Pull-Request: golang/go#65346
Reviewed-on: https://go-review.googlesource.com/c/go/+/559095
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2024-01-30 20:09:45 +00:00
Michael Pratt a807295d43 Revert "cmd/preprofile: Add preprocess tool to pre-parse the profile file."
This reverts CL 529738.

Reason for revert: Breaking longtest builders

For #58102.
Fixes #65220.

Change-Id: Id295e3249da9d82f6a9e4fc571760302a1362def
Reviewed-on: https://go-review.googlesource.com/c/go/+/557460
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-01-22 23:07:48 +00:00
Jin Lin b7ae16e04b cmd/preprofile: Add preprocess tool to pre-parse the profile file.
The pgo compilation time is very long if the profile file is large.
We added a preprocess tool to pre-parse profile file in order to
expedite the compile time.

Change-Id: I6f50bbd01f242448e2463607a9b63483c6ca9a12
Reviewed-on: https://go-review.googlesource.com/c/go/+/529738
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-01-22 21:47:07 +00:00
Danil Timerbulatov 527829a7cb all: remove newline characters after return statements
This commit is aimed at improving the readability and consistency
of the code base. Extraneous newline characters were present after
some return statements, creating unnecessary separation in the code.

Fixes #64610

Change-Id: Ic1b05bf11761c4dff22691c2f1c3755f66d341f7
Reviewed-on: https://go-review.googlesource.com/c/go/+/548316
Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2023-12-14 17:22:18 +00:00
Russ Cox c29444ef39 math/rand, math/rand/v2: use ChaCha8 for global rand
Move ChaCha8 code into internal/chacha8rand and use it to implement
runtime.rand, which is used for the unseeded global source for
both math/rand and math/rand/v2. This also affects the calculation of
the start point for iteration over very very large maps (when the
32-bit fastrand is not big enough).

The benefit is that misuse of the global random number generators
in math/rand and math/rand/v2 in contexts where non-predictable
randomness is important for security reasons is no longer a
security problem, removing a common mistake among programmers
who are unaware of the different kinds of randomness.

The cost is an extra 304 bytes per thread stored in the m struct
plus 2-3ns more per random uint64 due to the more sophisticated
algorithm. Using PCG looks like it would cost about the same,
although I haven't benchmarked that.

Before this, the math/rand and math/rand/v2 global generator
was wyrand (https://github.com/wangyi-fudan/wyhash).
For math/rand, using wyrand instead of the Mitchell/Reeds/Thompson
ALFG was justifiable, since the latter was not any better.
But for math/rand/v2, the global generator really should be
at least as good as one of the well-studied, specific algorithms
provided directly by the package, and it's not.

(Wyrand is still reasonable for scheduling and cache decisions.)

Good randomness does have a cost: about twice wyrand.

Also rationalize the various runtime rand references.

goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ bbb48afeb7.amd64 │           5cf807d1ea.amd64           │
                        │      sec/op      │    sec/op     vs base                │
ChaCha8-32                     1.862n ± 2%    1.861n ± 2%        ~ (p=0.825 n=20)
PCG_DXSM-32                    1.471n ± 1%    1.460n ± 2%        ~ (p=0.153 n=20)
SourceUint64-32                1.636n ± 2%    1.582n ± 1%   -3.30% (p=0.000 n=20)
GlobalInt64-32                 2.087n ± 1%    3.663n ± 1%  +75.54% (p=0.000 n=20)
GlobalInt64Parallel-32        0.1042n ± 1%   0.2026n ± 1%  +94.48% (p=0.000 n=20)
GlobalUint64-32                2.263n ± 2%    3.724n ± 1%  +64.57% (p=0.000 n=20)
GlobalUint64Parallel-32       0.1019n ± 1%   0.1973n ± 1%  +93.67% (p=0.000 n=20)
Int64-32                       1.771n ± 1%    1.774n ± 1%        ~ (p=0.449 n=20)
Uint64-32                      1.863n ± 2%    1.866n ± 1%        ~ (p=0.364 n=20)
GlobalIntN1000-32              3.134n ± 3%    4.730n ± 2%  +50.95% (p=0.000 n=20)
IntN1000-32                    2.489n ± 1%    2.489n ± 1%        ~ (p=0.683 n=20)
Int64N1000-32                  2.521n ± 1%    2.516n ± 1%        ~ (p=0.394 n=20)
Int64N1e8-32                   2.479n ± 1%    2.478n ± 2%        ~ (p=0.743 n=20)
Int64N1e9-32                   2.530n ± 2%    2.514n ± 2%        ~ (p=0.193 n=20)
Int64N2e9-32                   2.501n ± 1%    2.494n ± 1%        ~ (p=0.616 n=20)
Int64N1e18-32                  3.227n ± 1%    3.205n ± 1%        ~ (p=0.101 n=20)
Int64N2e18-32                  3.647n ± 1%    3.599n ± 1%        ~ (p=0.019 n=20)
Int64N4e18-32                  5.135n ± 1%    5.069n ± 2%        ~ (p=0.034 n=20)
Int32N1000-32                  2.657n ± 1%    2.637n ± 1%        ~ (p=0.180 n=20)
Int32N1e8-32                   2.636n ± 1%    2.636n ± 1%        ~ (p=0.763 n=20)
Int32N1e9-32                   2.660n ± 2%    2.638n ± 1%        ~ (p=0.358 n=20)
Int32N2e9-32                   2.662n ± 2%    2.618n ± 2%        ~ (p=0.064 n=20)
Float32-32                     2.272n ± 2%    2.239n ± 2%        ~ (p=0.194 n=20)
Float64-32                     2.272n ± 1%    2.286n ± 2%        ~ (p=0.763 n=20)
ExpFloat64-32                  3.762n ± 1%    3.744n ± 1%        ~ (p=0.171 n=20)
NormFloat64-32                 3.706n ± 1%    3.655n ± 2%        ~ (p=0.066 n=20)
Perm3-32                       32.93n ± 3%    34.62n ± 1%   +5.13% (p=0.000 n=20)
Perm30-32                      202.9n ± 1%    204.0n ± 1%        ~ (p=0.482 n=20)
Perm30ViaShuffle-32            115.0n ± 1%    114.9n ± 1%        ~ (p=0.358 n=20)
ShuffleOverhead-32             112.8n ± 1%    112.7n ± 1%        ~ (p=0.692 n=20)
Concurrent-32                  2.107n ± 0%    3.725n ± 1%  +76.75% (p=0.000 n=20)

goos: darwin
goarch: arm64
pkg: math/rand/v2
                       │ bbb48afeb7.arm64 │           5cf807d1ea.arm64            │
                       │      sec/op      │    sec/op     vs base                 │
ChaCha8-8                     2.480n ± 0%    2.429n ± 0%    -2.04% (p=0.000 n=20)
PCG_DXSM-8                    2.531n ± 0%    2.530n ± 0%         ~ (p=0.877 n=20)
SourceUint64-8                2.534n ± 0%    2.533n ± 0%         ~ (p=0.732 n=20)
GlobalInt64-8                 2.172n ± 1%    4.794n ± 0%  +120.67% (p=0.000 n=20)
GlobalInt64Parallel-8        0.4320n ± 0%   0.9605n ± 0%  +122.32% (p=0.000 n=20)
GlobalUint64-8                2.182n ± 0%    4.770n ± 0%  +118.58% (p=0.000 n=20)
GlobalUint64Parallel-8       0.4307n ± 0%   0.9583n ± 0%  +122.51% (p=0.000 n=20)
Int64-8                       4.107n ± 0%    4.104n ± 0%         ~ (p=0.416 n=20)
Uint64-8                      4.080n ± 0%    4.080n ± 0%         ~ (p=0.052 n=20)
GlobalIntN1000-8              2.814n ± 2%    5.643n ± 0%  +100.50% (p=0.000 n=20)
IntN1000-8                    4.141n ± 0%    4.139n ± 0%         ~ (p=0.140 n=20)
Int64N1000-8                  4.140n ± 0%    4.140n ± 0%         ~ (p=0.313 n=20)
Int64N1e8-8                   4.140n ± 0%    4.139n ± 0%         ~ (p=0.103 n=20)
Int64N1e9-8                   4.139n ± 0%    4.140n ± 0%         ~ (p=0.761 n=20)
Int64N2e9-8                   4.140n ± 0%    4.140n ± 0%         ~ (p=0.636 n=20)
Int64N1e18-8                  5.266n ± 0%    5.326n ± 1%    +1.14% (p=0.001 n=20)
Int64N2e18-8                  6.052n ± 0%    6.167n ± 0%    +1.90% (p=0.000 n=20)
Int64N4e18-8                  8.826n ± 0%    9.051n ± 0%    +2.55% (p=0.000 n=20)
Int32N1000-8                  4.127n ± 0%    4.132n ± 0%    +0.12% (p=0.000 n=20)
Int32N1e8-8                   4.126n ± 0%    4.131n ± 0%    +0.12% (p=0.000 n=20)
Int32N1e9-8                   4.127n ± 0%    4.132n ± 0%    +0.12% (p=0.000 n=20)
Int32N2e9-8                   4.132n ± 0%    4.131n ± 0%         ~ (p=0.017 n=20)
Float32-8                     4.109n ± 0%    4.105n ± 0%         ~ (p=0.379 n=20)
Float64-8                     4.107n ± 0%    4.106n ± 0%         ~ (p=0.867 n=20)
ExpFloat64-8                  5.339n ± 0%    5.383n ± 0%    +0.82% (p=0.000 n=20)
NormFloat64-8                 5.735n ± 0%    5.737n ± 1%         ~ (p=0.856 n=20)
Perm3-8                       26.65n ± 0%    26.80n ± 1%    +0.58% (p=0.000 n=20)
Perm30-8                      194.8n ± 1%    197.0n ± 0%    +1.18% (p=0.000 n=20)
Perm30ViaShuffle-8            156.6n ± 0%    157.6n ± 1%    +0.61% (p=0.000 n=20)
ShuffleOverhead-8             124.9n ± 0%    125.5n ± 0%    +0.52% (p=0.000 n=20)
Concurrent-8                  2.434n ± 3%    5.066n ± 0%  +108.09% (p=0.000 n=20)

goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ bbb48afeb7.386 │            5cf807d1ea.386             │
                        │     sec/op     │    sec/op     vs base                 │
ChaCha8-32                  11.295n ± 1%    4.748n ± 2%   -57.96% (p=0.000 n=20)
PCG_DXSM-32                  7.693n ± 1%    7.738n ± 2%         ~ (p=0.542 n=20)
SourceUint64-32              7.658n ± 2%    7.622n ± 2%         ~ (p=0.344 n=20)
GlobalInt64-32               3.473n ± 2%    7.526n ± 2%  +116.73% (p=0.000 n=20)
GlobalInt64Parallel-32      0.3198n ± 0%   0.5444n ± 0%   +70.22% (p=0.000 n=20)
GlobalUint64-32              3.612n ± 0%    7.575n ± 1%  +109.69% (p=0.000 n=20)
GlobalUint64Parallel-32     0.3168n ± 0%   0.5403n ± 0%   +70.51% (p=0.000 n=20)
Int64-32                     7.673n ± 2%    7.789n ± 1%         ~ (p=0.122 n=20)
Uint64-32                    7.773n ± 1%    7.827n ± 2%         ~ (p=0.920 n=20)
GlobalIntN1000-32            6.268n ± 1%    9.581n ± 1%   +52.87% (p=0.000 n=20)
IntN1000-32                  10.33n ± 2%    10.45n ± 1%         ~ (p=0.233 n=20)
Int64N1000-32                10.98n ± 2%    11.01n ± 1%         ~ (p=0.401 n=20)
Int64N1e8-32                 11.19n ± 2%    10.97n ± 1%         ~ (p=0.033 n=20)
Int64N1e9-32                 11.06n ± 1%    11.08n ± 1%         ~ (p=0.498 n=20)
Int64N2e9-32                 11.10n ± 1%    11.01n ± 2%         ~ (p=0.995 n=20)
Int64N1e18-32                15.23n ± 2%    15.04n ± 1%         ~ (p=0.973 n=20)
Int64N2e18-32                15.89n ± 1%    15.85n ± 1%         ~ (p=0.409 n=20)
Int64N4e18-32                18.96n ± 2%    19.34n ± 2%         ~ (p=0.048 n=20)
Int32N1000-32                10.46n ± 2%    10.44n ± 2%         ~ (p=0.480 n=20)
Int32N1e8-32                 10.46n ± 2%    10.49n ± 2%         ~ (p=0.951 n=20)
Int32N1e9-32                 10.28n ± 2%    10.26n ± 1%         ~ (p=0.431 n=20)
Int32N2e9-32                 10.50n ± 2%    10.44n ± 2%         ~ (p=0.249 n=20)
Float32-32                   13.80n ± 2%    13.80n ± 2%         ~ (p=0.751 n=20)
Float64-32                   23.55n ± 2%    23.87n ± 0%         ~ (p=0.408 n=20)
ExpFloat64-32                15.36n ± 1%    15.29n ± 2%         ~ (p=0.316 n=20)
NormFloat64-32               13.57n ± 1%    13.79n ± 1%    +1.66% (p=0.005 n=20)
Perm3-32                     45.70n ± 2%    46.99n ± 2%    +2.81% (p=0.001 n=20)
Perm30-32                    399.0n ± 1%    403.8n ± 1%    +1.19% (p=0.006 n=20)
Perm30ViaShuffle-32          349.0n ± 1%    350.4n ± 1%         ~ (p=0.909 n=20)
ShuffleOverhead-32           322.3n ± 1%    323.8n ± 1%         ~ (p=0.410 n=20)
Concurrent-32                3.331n ± 1%    7.312n ± 1%  +119.50% (p=0.000 n=20)

For #61716.

Change-Id: Ibdddeed85c34d9ae397289dc899e04d4845f9ed2
Reviewed-on: https://go-review.googlesource.com/c/go/+/516860
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Filippo Valsorda <filippo@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-12-05 20:34:30 +00:00
Joel Sing 70c7fb75e9 cmd/compile: correct code generation for right shifts on riscv64
The code generation on riscv64 will currently result in incorrect
assembly when a 32 bit integer is right shifted by an amount that
exceeds the size of the type. In particular, this occurs when an
int32 or uint32 is cast to a 64 bit type and right shifted by a
value larger than 31.

Fix this by moving the SRAW/SRLW conversion into the right shift
rules and removing the SignExt32to64/ZeroExt32to64. Add additional
rules that rewrite to SRAIW/SRLIW when the shift is less than the
size of the type, or replace/eliminate the shift when it exceeds
the size of the type.

Add SSA tests that would have caught this issue. Also add additional
codegen tests to ensure that the resulting assembly is what we
expect in these overflow cases.

Fixes #64285

Change-Id: Ie97b05668597cfcb91413afefaab18ee1aa145ec
Reviewed-on: https://go-review.googlesource.com/c/go/+/545035
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: M Zhuo <mzh@golangcn.org>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-12-01 19:30:59 +00:00
Keith Randall bda1ef13f8 cmd/compile: fix memcombine pass for big endian, > 1 byte elements
The shift amounts were wrong in this case, leading to miscompilation
of load combining.

Also the store combining was not triggering when it should.

Fixes #64468

Change-Id: Iaeb08972c5fc1d6f628800334789c6af7216e87b
Reviewed-on: https://go-review.googlesource.com/c/go/+/546355
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-11-30 18:35:50 +00:00
Jes Cok 631a6c2abf all: add missing copyright header
Change-Id: Ic61fb181923159e80a86a41582e83ec466ab9bc4
GitHub-Last-Rev: 9246984566
GitHub-Pull-Request: golang/go#64080
Reviewed-on: https://go-review.googlesource.com/c/go/+/541741
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Run-TryBot: Jes Cok <xigua67damn@gmail.com>
2023-11-17 23:34:11 +00:00
Michael Pratt 42bd21be1c cmd/compile: support lookup of functions from export data
As of CL 539699, PGO-based devirtualization supports devirtualization of
function values in addition to interface method calls. As with CL
497175, we need to explicitly look up functions from export data that
may not be imported already.

Symbol naming is ambiguous (`foo.Bar.func1` could be a closure or a
method), so we simply attempt to do both types of lookup. That said,
closures are defined in export data only as OCLOSURE nodes in the
enclosing function, which this CL does not yet attempt to expand.

For #61577.

Change-Id: Ic7205b046218a4dfb8c4162ece3620ed1c3cb40a
Reviewed-on: https://go-review.googlesource.com/c/go/+/540258
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2023-11-13 18:17:57 +00:00
Michael Pratt fb6ff1e4ca cmd/compile: initial function value devirtualization
Today, PGO-based devirtualization only applies to interface calls. This
CL extends initial support to function values (i.e., function/closure
pointers passed as arguments or stored in a struct).

This CL is a minimal implementation with several limitations.

* Export data lookup of function value callees not implemented
  (equivalent of CL 497175; done in CL 540258).
* Callees must be standard static functions. Callees that are closures
  (requiring closure context) are not supported.

For #61577.

Change-Id: I7d328859035249e176294cd0d9885b2d08c853f6
Reviewed-on: https://go-review.googlesource.com/c/go/+/539699
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-11-13 18:17:47 +00:00
Michael Anthony Knyszek 43ffe2a892 runtime: add execution tracer v2 behind GOEXPERIMENT=exectracer2
This change mostly implements the design described in #60773 and
includes a new scalable parser for the new trace format, available in
internal/trace/v2. I'll leave this commit message short because this is
clearly an enormous CL with a lot of detail.

This change does not hook up the new tracer into cmd/trace yet. A
follow-up CL will handle that.

For #60773.

Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest,gotip-linux-amd64-longtest-race
Change-Id: I5d2aca2cc07580ed3c76a9813ac48ec96b157de0
Reviewed-on: https://go-review.googlesource.com/c/go/+/494187
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-11-10 15:49:59 +00:00
Michael Anthony Knyszek f7c5cbb820 runtime: fix user arena heap bits writing on big endian platforms
Currently the user arena code writes heap bits to the (*mspan).heapBits
space with the platform-specific byte ordering (the heap bits are
written and managed as uintptrs). However, the compiler always emits GC
metadata for types in little endian.

Because the scanning part of the code that loads through the type
pointer in the allocation header expects little endian ordering, we end
up with the wrong byte ordering in GC when trying to scan arena memory.

Fix this by writing out the user arena heap bits in little endian on big
endian platforms.

This means that the space returned by (*mspan).heapBits has a different
meaning for user arenas and small object spans, which is a little odd,
so I documented it. To reduce the chance of misuse of the writeHeapBits
API, which now writes out heap bits in a different ordering than
writeSmallHeapBits on big endian platforms, this change also renames
writeHeapBits to writeUserArenaHeapBits.

Much of this can be avoided in the future if the compiler were to write
out the pointer/scalar bits as an array of uintptr values instead of
plain bytes. That's too big of a change for right now though.

This change is a no-op on little endian platforms. I confirmed it by
checking for any assembly code differences in the runtime test binary.
There were none. With this change, the arena tests pass on ppc64.

Fixes #64048.

Change-Id: If077d003872fcccf5a154ff5d8441a58582061bb
Reviewed-on: https://go-review.googlesource.com/c/go/+/541315
Run-TryBot: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-11-10 04:46:18 +00:00
Michael Anthony Knyszek 38ac7c41aa runtime: implement experiment to replace heap bitmap with alloc headers
This change replaces the 1-bit-per-word heap bitmap for most size
classes with allocation headers for objects that contain pointers. The
header consists of a single pointer to a type. All allocations with
headers are treated as implicitly containing one or more instances of
the type in the header.

As the name implies, headers are usually stored as the first word of an
object. There are two additional exceptions to where headers are stored
and how they're used.

Objects smaller than 512 bytes do not have headers. Instead, a heap
bitmap is reserved at the end of spans for objects of this size. A full
word of overhead is too much for these small objects. The bitmap is of
the same format of the old bitmap, minus the noMorePtrs bits which are
unnecessary. All the objects <512 bytes have a bitmap less than a
pointer-word in size, and that was the granularity at which noMorePtrs
could stop scanning early anyway.

Objects that are larger than 32 KiB (which have their own span) have
their headers stored directly in the span, to allow power-of-two-sized
allocations to not spill over into an extra page.

The full implementation is behind GOEXPERIMENT=allocheaders.

The purpose of this change is performance. First and foremost, with
headers we no longer have to unroll pointer/scalar data at allocation
time for most size classes. Small size classes still need some
unrolling, but their bitmaps are small so we can optimize that case
fairly well. Larger objects effectively have their pointer/scalar data
unrolled on-demand from type data, which is much more compactly
represented and results in less TLB pressure. Furthermore, since the
headers are usually right next to the object and where we're about to
start scanning, we get an additional temporal locality benefit in the
data cache when looking up type metadata. The pointer/scalar data is
now effectively unrolled on-demand, but it's also simpler to unroll than
before; that unrolled data is never written anywhere, and for arrays we
get the benefit of retreading the same data per element, as opposed to
looking it up from scratch for each pointer-word of bitmap. Lastly,
because we no longer have a heap bitmap that spans the entire heap,
there's a flat 1.5% memory use reduction. This is balanced slightly by
some objects possibly being bumped up a size class, but most objects are
not tightly optimized to size class sizes so there's some memory to
spare, making the header basically free in those cases.

See the follow-up CL which turns on this experiment by default for
benchmark results. (CL 538217.)

Change-Id: I4c9034ee200650d06d8bdecd579d5f7c1bbf1fc5
Reviewed-on: https://go-review.googlesource.com/c/go/+/437955
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-11-09 19:58:08 +00:00
Cherry Mui 8c92897e15 cmd/compile: rework TestPGOHash to not rebuild dependencies
TestPGOHash may rebuild dependencies as we pass -trimpath to the
go command. This CL makes it pass -trimpath compiler flag to only
the current package instead, as we only need the current package
to have a stable source file path.

Also refactor buildPGOInliningTest to only take compiler flags,
not go flags, to avoid accidental rebuild.

Should fix #63733.

Change-Id: Iec6c4e90cf659790e21083ee2e697f518234c5b9
Reviewed-on: https://go-review.googlesource.com/c/go/+/535915
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
2023-10-27 17:54:18 +00:00
Michael Pratt bfb8924653 cmd/compile: lookup indirect callees from export data for devirtualization
Today, the PGO IR graph only contains entries for ir.Func loaded into
the package. This can include functions from transitive dependencies,
but only if they happen to be referenced by something in the current
package. If they are not referenced, noder never bothers to load them.

This leads to a deficiency in PGO devirtualization: some callee methods
are available in transitive dependencies but do not devirtualize because
they happen to not get loaded from export data.

Resolve this by adding an explicit lookup from export data of callees
mentioned in the profile.

I have chosen to do this during loading of the profile for simplicity:
the PGO IR graph always contains all of the functions we might need.
That said, it isn't strictly necessary. PGO devirtualization could do
the lookup lazily if it decides it actually needs a method. This saves
work at the expense of a bit more complexity, but I've chosen the
simpler approach for now as I measured the cost of this as significantly
less than the rest of PGO loading.

For #61577.

Change-Id: Ieafb2a549510587027270ee6b4c3aefd149a901f
Reviewed-on: https://go-review.googlesource.com/c/go/+/497175
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-10-13 13:56:32 +00:00
Keith Randall cbcf8efa5f cmd/compile: use cache in front of type assert runtime call
That way we don't need to call into the runtime for every
type assertion (to an interface type).

name           old time/op  new time/op  delta
TypeAssert-24  3.78ns ± 3%  1.00ns ± 1%  -73.53%  (p=0.000 n=10+8)

Change-Id: I0ba308aaf0f24a5495b4e13c814d35af0c58bfde
Reviewed-on: https://go-review.googlesource.com/c/go/+/529316
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2023-10-06 17:02:53 +00:00
Keith Randall 39263f34a3 cmd/compile: add a cache to interface type switches
That way we don't need to call into the runtime when the type being
switched on has been seen many times before.

The cache is just a hash table of a sample of all the concrete types
that have been switched on at that source location.  We record the
matching case number and the resulting itab for each concrete input
type.

The caches seldom get large. The only two in a run of all.bash that
get more than 100 entries, even with the sampling rate set to 1, are

test/fixedbugs/issue29264.go, with 101
test/fixedbugs/issue29312.go, with 254

Both happen at the type switch in fmt.(*pp).handleMethods, perhaps
unsurprisingly.

name                                 old time/op  new time/op  delta
SwitchInterfaceTypePredictable-24    25.8ns ± 2%   2.5ns ± 3%  -90.43%  (p=0.000 n=10+10)
SwitchInterfaceTypeUnpredictable-24  37.5ns ± 2%  11.2ns ± 1%  -70.02%  (p=0.000 n=10+10)

Change-Id: I4961ac9547b7f15b03be6f55cdcb972d176955eb
Reviewed-on: https://go-review.googlesource.com/c/go/+/526658
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Keith Randall <khr@google.com>
2023-10-06 15:44:08 +00:00
Keith Randall 28f4ea16a2 cmd/compile: improve interface type switches
For type switches where the targets are interface types,
call into the runtime once instead of doing a sequence
of assert* calls.

name                                 old time/op  new time/op  delta
SwitchInterfaceTypePredictable-24    26.6ns ± 1%  25.8ns ± 2%  -2.86%  (p=0.000 n=10+10)
SwitchInterfaceTypeUnpredictable-24  39.3ns ± 1%  37.5ns ± 2%  -4.57%  (p=0.000 n=10+10)

Not super helpful by itself, but this code organization allows
followon CLs that add caching to the lookups.

Change-Id: I7967f85a99171faa6c2550690e311bea8b54b01c
Reviewed-on: https://go-review.googlesource.com/c/go/+/526657
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
2023-10-06 15:42:30 +00:00
David Chase a903639608 cmd/compile: adjust GOSSAFUNC html dumping to be more ABI-aware
Uses ,ABI instead of <ABI> because of problems with shell escaping
and windows file names, however if someone goes to all the trouble
of escaping the linker syntax and uses that instead, that works too.

Examples:
```
GOSSAFUNC=runtime.exitsyscall go build main.go
\# runtime
dumped SSA for exitsyscall,0 to ../../src/loopvar/ssa.html
dumped SSA for exitsyscall,1 to ../../src/loopvar/ssa.html

GOSSADIR=`pwd` GOSSAFUNC=runtime.exitsyscall go build main.go
\# runtime
dumped SSA for exitsyscall,0 to ../../src/loopvar/runtime.exitsyscall,0.html
dumped SSA for exitsyscall,1 to ../../src/loopvar/runtime.exitsyscall,1.html

GOSSAFUNC=runtime.exitsyscall,0 go build main.go
\# runtime
dumped SSA for exitsyscall,0 to ../../src/loopvar/ssa.html

GOSSAFUNC=runtime.exitsyscall\<1\> go build main.go
\# runtime
dumped SSA for exitsyscall,1 to ../../src/loopvar/ssa.html
```

Change-Id: Ia1138b61c797d0de49dbfae702dc306b9650a7f8
Reviewed-on: https://go-review.googlesource.com/c/go/+/532475
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: David Chase <drchase@google.com>
2023-10-04 15:11:40 +00:00
Cherry Mui 3fb86fb864 cmd/compile: add pgohash for debugging/bisecting PGO optimizations
When a PGO build fails or produces incorrect program, it is often
unclear what the problem is. Add pgo hash so we can bisect to
individual optimization decisions, which often helps debugging.

Related to #58153.

Change-Id: I651ffd9c53bad60f2f28c8ec2a90a3f532982712
Reviewed-on: https://go-review.googlesource.com/c/go/+/528400
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
2023-09-19 15:42:58 +00:00
Matthew Dempsky c0e2a9dffd cmd/compile/internal/abi: use Type.Registers
Now that types can self-report how many registers they need, it's much
easier to determine whether a parameter will fit into the available
registers: simply compare the number of registers needed against the
number of registers still available.

This also eliminates the need for the NumParamRegs cache.

Also, the new code in NumParamRegs is stricter in only allowing it to
be called on types that can actually be passed in registers, which
requires a test to be corrected for that. While here, change mkstruct
to a variadic function, so the call sites require less boilerplate.

Change-Id: Iebe1a0456a8053a10e551e5da796014e5b1b695b
Reviewed-on: https://go-review.googlesource.com/c/go/+/527339
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Auto-Submit: Matthew Dempsky <mdempsky@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
2023-09-13 18:34:00 +00:00
Dmitri Shuralyov 0dfb22ed70 all: use ^TestName$ regular pattern for invoking a single test
Use ^ and $ in the -run flag regular expression value when the intention
is to invoke a single named test. This removes the reliance on there not
being another similarly named test to achieve the intended result.

In particular, package syscall has tests named TestUnshareMountNameSpace
and TestUnshareMountNameSpaceChroot that both trigger themselves setting
GO_WANT_HELPER_PROCESS=1 to run alternate code in a helper process. As a
consequence of overlap in their test names, the former was inadvertently
triggering one too many helpers.

Spotted while reviewing CL 525196. Apply the same change in other places
to make it easier for code readers to see that said tests aren't running
extraneous tests. The unlikely cases of -run=TestSomething intentionally
being used to run all tests that have the TestSomething substring in the
name can be better written as -run=^.*TestSomething.*$ or with a comment
so it is clear it wasn't an oversight.

Change-Id: Iba208aba3998acdbf8c6708e5d23ab88938bfc1e
Reviewed-on: https://go-review.googlesource.com/c/go/+/524948
Reviewed-by: Tobias Klauser <tobias.klauser@gmail.com>
Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Kirill Kolyshkin <kolyshkin@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-09-05 23:35:29 +00:00
Egon Elbre 1d84b02b22 runtime: introduce nextslicecap
This allows to reuse the slice cap computation across
specialized growslice funcs.

Updates #49480

Change-Id: Ie075d9c3075659ea14c11d51a9cd4ed46aa0e961
Reviewed-on: https://go-review.googlesource.com/c/go/+/495876
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Egon Elbre <egonelbre@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Ian Lance Taylor <iant@golang.org>
2023-09-04 17:50:50 +00:00
Keith Randall 4711299661 cmd/compile: use jump tables for large type switches
For large interface -> concrete type switches, we can use a jump
table on some bits of the type hash instead of a binary search on
the type hash.

name                        old time/op  new time/op  delta
SwitchTypePredictable-24    1.99ns ± 2%  1.78ns ± 5%  -10.87%  (p=0.000 n=10+10)
SwitchTypeUnpredictable-24  11.0ns ± 1%   9.1ns ± 2%  -17.55%  (p=0.000 n=7+9)

Change-Id: Ida4768e5d62c3ce1c2701288b72664aaa9e64259
Reviewed-on: https://go-review.googlesource.com/c/go/+/521497
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
2023-08-23 00:30:54 +00:00
Matthew Dempsky 7af28fa90e cmd/compile/internal/ir: remove AsNode
Except for a single call site in escape analysis, every use of
ir.AsNode involves a types.Object that's known to contain
an *ir.Name. Asserting directly to that type makes the code simpler
and more efficient.

The one use in escape analysis is extended to handle nil correctly
without it.

Change-Id: I694ae516903e541341d82c2f65a9155e4b0a9809
Reviewed-on: https://go-review.googlesource.com/c/go/+/520775
TryBot-Bypass: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Auto-Submit: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2023-08-18 22:41:47 +00:00
Matthew Dempsky e453971005 cmd/compile/internal/ir: add typ parameter to NewNameAt
Start making progress towards constructing IR with proper types.

Change-Id: Iad32c1cf60f30ceb8e07c31c8871b115570ac3bd
Reviewed-on: https://go-review.googlesource.com/c/go/+/520263
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2023-08-17 19:36:24 +00:00
Cherry Mui 38e2376f35 cmd/compile: adjust PGO devirtualization diagnostic message
Make it more consistent with the static devirtualization
diagnostic message. Keep the print of concrete callee's method
name, as it is clearer.

Change-Id: Ibe9b40253eaff2c0071353a2b388177213488822
Reviewed-on: https://go-review.googlesource.com/c/go/+/500960
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
2023-06-06 02:36:18 +00:00
thepudds 3637132233 cmd/compile/internal/devirtualize: devirtualize methods in other packages if current package has a concrete reference
The new PGO-driven indirect call specialization from CL 492436
in theory should allow for devirtualization on methods
in another package when those methods are directly referenced
in the current package.

However, inline.InlineImpossible was checking for a zero-length
fn.Body and would cause devirtualization to fail
with a debug log message like:

 "should not PGO devirtualize (*Speaker1).Speak: no function body"

Previously, the logic in inline.InlineImpossible was only
called on local functions, but with PGO-based devirtualization,
it can now be called on imported functions, where inlinable
imported functions will have a zero-length fn.Body but a
non-nil fn.Inl.

We update inline.InlineImpossible to handle imported functions
by adding a call to typecheck.HaveInlineBody in the check
that was previously failing.

For the test, we need to have a hopefully temporary workaround
of adding explicit references to the callees in another package
for devirtualization to work. CL 497175 or similar should
enable removing this workaround.

Fixes #60561
Updates #59959

Change-Id: I48449b7d8b329d84151bd3b506b8093c262eb2a3
GitHub-Last-Rev: 2d53c55fd8
GitHub-Pull-Request: golang/go#60565
Reviewed-on: https://go-review.googlesource.com/c/go/+/500155
Run-TryBot: thepudds <thepudds1460@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-06-03 01:13:08 +00:00
Michael Pratt 8c445b7c9f cmd/compile: enable PGO-driven call devirtualization
This CL is originally based on CL 484838 from rajbarik@uber.com.

Add a new PGO-based devirtualize pass. This pass conditionally
devirtualizes interface calls for the hottest callee. That is, it
performs a transformation like:

	type Iface interface {
		Foo()
	}

	type Concrete struct{}

	func (Concrete) Foo() {}

	func foo(i Iface) {
		i.Foo()
	}

to:

	func foo(i Iface) {
		if c, ok := i.(Concrete); ok {
			c.Foo()
		} else {
			i.Foo()
		}
	}

The primary benefit of this transformation is enabling inlining of the
direct calls.

Today this change has no impact on the escape behavior, as the fallback
interface always forces an escape. But improving escape analysis to take
advantage of this is an area of potential work.

This CL is the bare minimum of a devirtualization implementation. There
are still numerous limitations:

* Callees not directly referenced in the current package can be missed
  (even if they are in the transitive dependences).
* Callees not in the transitive dependencies of the current package are
  missed.
* Only interface method calls are supported, not other indirect function
  calls.
* Multiple calls to compatible interfaces on the same line cannot be
  distinguished and will use the same callee target.
* Callees that only partially implement an interface (they are embedded
  in another type that completes the interface) cannot be devirtualized.
* Others, mentioned in TODOs.

Fixes #59959

Change-Id: I8bedb516139695ee4069650b099d05957b7ce5ee
Reviewed-on: https://go-review.googlesource.com/c/go/+/492436
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-05-22 19:37:24 +00:00