Commit Graph

63309 Commits

Author SHA1 Message Date
thepudds c3bb27bbc7 cmd/compile/internal/walk: use global zeroVal in interface conversions for zero values
This is a small-ish adjustment to the change earlier in our
stack in CL 649555, which started creating read-only global storage
for a composite literal used in an interface conversion and setting
the interface data pointer to point to that global storage.

In some cases, there are execution-time performance benefits to point
to runtime.zeroVal in particular. In reflect, pointer checks against
the runtime.zeroVal memory address are used to side-step some work,
such as in reflect.Value.Set and reflect.Value.IsZero.

In this CL, we therefore dig up the zeroVal symbol, and we use the
machinery from earlier in our stack to use a pointer to zeroVal for
the interface data pointer if we see examples like:

    sink = S{}
or:
    s := S{}
    sink = s

CL 649076 (also earlier in our stack) added most of the tests
along with debug diagnostics in convert.go to make it easier
to test this change.

We add a benchmark in reflect to show examples of performance benefit.
The left column is our immediately prior CL 649555, and the right is
this CL. (The arrays of structs here do not seem to benefit, which
we attempt to address in our next CL).

goos: linux
goarch: amd64
pkg: reflect
cpu: Intel(R) Xeon(R) CPU @ 2.80GHz
                                          │  cl-649555   │           new                       │
                                          │    sec/op    │   sec/op     vs base                │
Zero/IsZero/ByteArray/size=16-4              4.176n ± 0%   4.171n ± 0%        ~ (p=0.151 n=20)
Zero/IsZero/ByteArray/size=64-4              6.921n ± 0%   3.864n ± 0%  -44.16% (p=0.000 n=20)
Zero/IsZero/ByteArray/size=1024-4           21.210n ± 0%   3.878n ± 0%  -81.72% (p=0.000 n=20)
Zero/IsZero/BigStruct/size=1024-4           25.505n ± 0%   5.061n ± 0%  -80.15% (p=0.000 n=20)
Zero/IsZero/SmallStruct/size=16-4            4.188n ± 0%   4.191n ± 0%        ~ (p=0.106 n=20)
Zero/IsZero/SmallStructArray/size=64-4       8.639n ± 0%   8.636n ± 0%        ~ (p=0.973 n=20)
Zero/IsZero/SmallStructArray/size=1024-4     79.99n ± 0%   80.06n ± 0%        ~ (p=0.213 n=20)
Zero/IsZero/Time/size=24-4                   7.232n ± 0%   3.865n ± 0%  -46.56% (p=0.000 n=20)
Zero/SetZero/ByteArray/size=16-4             13.47n ± 0%   13.09n ± 0%   -2.78% (p=0.000 n=20)
Zero/SetZero/ByteArray/size=64-4             14.14n ± 0%   13.70n ± 0%   -3.15% (p=0.000 n=20)
Zero/SetZero/ByteArray/size=1024-4           24.22n ± 0%   20.18n ± 0%  -16.68% (p=0.000 n=20)
Zero/SetZero/BigStruct/size=1024-4           24.24n ± 0%   20.18n ± 0%  -16.73% (p=0.000 n=20)
Zero/SetZero/SmallStruct/size=16-4           13.45n ± 0%   13.10n ± 0%   -2.60% (p=0.000 n=20)
Zero/SetZero/SmallStructArray/size=64-4      14.12n ± 0%   13.69n ± 0%   -3.05% (p=0.000 n=20)
Zero/SetZero/SmallStructArray/size=1024-4    24.62n ± 0%   21.61n ± 0%  -12.26% (p=0.000 n=20)
Zero/SetZero/Time/size=24-4                  13.59n ± 0%   13.40n ± 0%   -1.40% (p=0.000 n=20)
geomean                                      14.06n        10.19n       -27.54%

Finally, here are results from the benchmark example from #71323.
Note however that almost all the benefit shown here is from our earlier
CL 649555, which is a more general purpose change and eliminates
the allocation using a different read-only global than this CL.

             │   go1.24       │               new                    │
             │     sec/op     │    sec/op     vs base                │
InterfaceAny   112.6000n ± 5%   0.8078n ± 3%  -99.28% (p=0.000 n=20)
ReflectValue      11.63n ± 2%    11.59n ± 0%        ~ (p=0.330 n=20)

             │  go1.24.out  │                 new.out                 │
             │     B/op     │    B/op     vs base                     │
InterfaceAny   224.0 ± 0%       0.0 ± 0%  -100.00% (p=0.000 n=20)
ReflectValue   0.000 ± 0%     0.000 ± 0%         ~ (p=1.000 n=20) ¹

             │  go1.24.out  │                 new.out                 │
             │  allocs/op   │ allocs/op   vs base                     │
InterfaceAny   1.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=20)
ReflectValue   0.000 ± 0%     0.000 ± 0%         ~ (p=1.000 n=20) ¹

Updates #71359
Updates #71323

Change-Id: I64d8cf1a7900f011d2ec59b948388aeda1150676
Reviewed-on: https://go-review.googlesource.com/c/go/+/649078
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
2025-05-21 12:24:22 -07:00
thepudds f4de2ecffb cmd/compile/internal/walk: convert composite literals to interfaces without allocating
Today, this interface conversion causes the struct literal
to be heap allocated:

    var sink any

    func example1() {
        sink = S{1, 1}
    }

For basic literals like integers that are directly used in
an interface conversion that would otherwise allocate, the compiler
is able to use read-only global storage (see #18704).

This CL extends that to struct and array literals as well by creating
read-only global storage that is able to represent for example S{1, 1},
and then using a pointer to that storage in the interface
when the interface conversion happens.

A more challenging example is:

    func example2() {
        v := S{1, 1}
        sink = v
    }

In this case, the struct literal is not directly part of the
interface conversion, but is instead assigned to a local variable.

To still avoid heap allocation in cases like this, in walk we
construct a cache that maps from expressions used in interface
conversions to earlier expressions that can be used to represent the
same value (via ir.ReassignOracle.StaticValue). This is somewhat
analogous to how we avoided heap allocation for basic literals in
CL 649077 earlier in our stack, though here we also need to do a
little more work to create the read-only global.

CL 649076 (also earlier in our stack) added most of the tests
along with debug diagnostics in convert.go to make it easier
to test this change.

See the writeup in #71359 for details.

Fixes #71359
Fixes #71323
Updates #62653
Updates #53465
Updates #8618

Change-Id: I8924f0c69ff738ea33439bd6af7b4066af493b90
Reviewed-on: https://go-review.googlesource.com/c/go/+/649555
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
2025-05-21 12:23:26 -07:00
Filippo Valsorda ce46c9db86 internal/godebug,crypto/fips140: make fips140 setting immutable
Updates #70123

Co-authored-by: qmuntal <quimmuntal@gmail.com>
Change-Id: I6a6a4656fd23ecd82428cccbd7c48692287fc75a
Reviewed-on: https://go-review.googlesource.com/c/go/+/657116
Reviewed-by: Daniel McCarney <daniel@binaryparadox.net>
Reviewed-by: Roland Shoemaker <roland@golang.org>
Auto-Submit: Filippo Valsorda <filippo@golang.org>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Quim Muntal <quimmuntal@gmail.com>
2025-05-21 12:21:44 -07:00
Filippo Valsorda d327e52d43 crypto/internal/fips140: use hash.Hash
Since package hash is just the interface definition, not an
implementation, we can make a good argument that it doesn't impact the
security of the module and can be imported from outside.

For #69521

Change-Id: I6a6a4656b9c3cac8bb9ab8e8df11fa3238dc5d1d
Reviewed-on: https://go-review.googlesource.com/c/go/+/674917
Reviewed-by: Roland Shoemaker <roland@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Daniel McCarney <daniel@binaryparadox.net>
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: Filippo Valsorda <filippo@golang.org>
2025-05-21 12:20:10 -07:00
Junyang Shao d6c29c7156 cmd/compile: fix offset calculation error in memcombine
Fixes #73812

Change-Id: If7a6e103ae9e1442a2cf4a3c6b1270b6a1887196
Reviewed-on: https://go-review.googlesource.com/c/go/+/675175
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Junyang Shao <shaojunyang@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-21 12:17:08 -07:00
Daniel McCarney a21b71daf5 crypto/tls: have servers prefer TLS 1.3 when supported
Previously the common Config.mutualVersion() code prioritized the
selected version based on the provided peerVersions being sent in peer
preference order.

Instead we would prefer to see TLS 1.3 used whenever it is
supported, even if the peer would prefer an older protocol version.
This commit updates mutualVersions() to implement this policy change.

Our new behaviour matches the behaviour of other TLS stacks, notably
BoringSSL, and so also allows enabling the IgnoreClientVersionOrder BoGo
test that we otherwise must skip.

Updates #72006

Change-Id: I27a2cd231e4b8762b0d9e2dbd3d8ddd5b87fd5cb
Reviewed-on: https://go-review.googlesource.com/c/go/+/673236
Auto-Submit: Daniel McCarney <daniel@binaryparadox.net>
TryBot-Bypass: Daniel McCarney <daniel@binaryparadox.net>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Roland Shoemaker <roland@golang.org>
2025-05-21 12:17:01 -07:00
Roland Shoemaker c5a1fc1f97 crypto/tls: add GetEncryptedClientHelloKeys
This allows servers to rotate their ECH keys without needing to restart
the server.

Fixes #71920

Change-Id: I55591ab3303d5fde639038541c50edcf1fafc9aa
Reviewed-on: https://go-review.googlesource.com/c/go/+/670655
TryBot-Bypass: Roland Shoemaker <roland@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: Roland Shoemaker <roland@golang.org>
Reviewed-by: Daniel McCarney <daniel@binaryparadox.net>
2025-05-21 12:15:37 -07:00
Filippo Valsorda a731955f0f crypto/sha1: use cryptotest.TestAllImplementations and impl.Register
Not running TryBots on s390x because the new LUCI builder is broken.

Change-Id: I6a6a4656a8d52fa5ace9effa67a88fbfd7d19b04
Reviewed-on: https://go-review.googlesource.com/c/go/+/674915
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Roland Shoemaker <roland@golang.org>
Reviewed-by: Daniel McCarney <daniel@binaryparadox.net>
Auto-Submit: Filippo Valsorda <filippo@golang.org>
2025-05-21 12:11:13 -07:00
Roland Shoemaker 272262750f crypto: drop pre-AVX2 SHA assembly implementations
Drop the entire pre-AVX2 assembly implementation of SHA-1, SHA-256, and
SHA-512. This also technically impacts the SHA-1 AVX2 implementation,
since it previously called the pre-AVX2 implementation for the last
block if the number of blocks wasn't a multiple of 2.

Instead of keeping the entire implementation just for that case, we
just call the generic implementation for the last block. This will be a
little slower, but still seems like a win.

Updates #69587

Change-Id: Id5234c42910d8c6ec6b8df700a721c0953dff02b
Reviewed-on: https://go-review.googlesource.com/c/go/+/657716
Reviewed-by: Daniel McCarney <daniel@binaryparadox.net>
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: Roland Shoemaker <roland@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Filippo Valsorda <filippo@golang.org>
2025-05-21 12:10:59 -07:00
Roland Shoemaker 919d9858bc crypto/internal/fips140/sha3: remove usages of WORD for s390x
We support KIMD and KLMD now, paves the way for banning usage of BYTE
and WORD instructions in crypto assembly.

Change-Id: I0f93744663f23866b2269591db70389e0c77fa4a
Reviewed-on: https://go-review.googlesource.com/c/go/+/671095
Reviewed-by: Daniel McCarney <daniel@binaryparadox.net>
Reviewed-by: Filippo Valsorda <filippo@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2025-05-21 12:10:20 -07:00
Julian Zhu 3dbc775d60 cmd/compile/internal: intrinsify publicationBarrier on mipsx
This enables publicationBarrier to be used as an intrinsic on mipsx.

Change-Id: Ic199f34b84b3058bcfab79aac8f2399ff21a97ce
Reviewed-on: https://go-review.googlesource.com/c/go/+/674856
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
2025-05-21 12:08:18 -07:00
Mateusz Poliwczak 3a7a856951 crypto/x509: disallow negative path length
pathLenConstraint is restricted to unsigned integers.
Also the -1 value of cert.MaxPathLength has a special
meaning, so we shouldn't allow unmarshaling -1.

BasicConstraints ::= SEQUENCE {
     cA                      BOOLEAN DEFAULT FALSE,
     pathLenConstraint       INTEGER (0..MAX) OPTIONAL }

Change-Id: I485a6aa7223127becc86c423e1ef9ed2fbd48209
GitHub-Last-Rev: 75a11b47b9
GitHub-Pull-Request: golang/go#60706
Reviewed-on: https://go-review.googlesource.com/c/go/+/502076
Reviewed-by: Roland Shoemaker <roland@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Roland Shoemaker <roland@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2025-05-21 12:07:24 -07:00
Julian Zhu 94e3caeec1 cmd/compile/internal: intrinsify publicationBarrier on mips64x
This enables publicationBarrier to be used as an intrinsic on mips64x.

Change-Id: I4030ea65086c37ee1dcc1675d0d5d40ef8683851
Reviewed-on: https://go-review.googlesource.com/c/go/+/674855
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2025-05-21 12:06:44 -07:00
Michael Anthony Knyszek 77345f41ee internal/trace: skip clock snapshot checks on Windows in stress mode
Windows' monotonic and wall clock granularity is just too coarse to get
reasonable values out of stress mode, which is creating new trace
generations constantly.

Fixes #73813.

Change-Id: Id9cb2fed9775ce8d78a736d0164daa7bf45075e0
Reviewed-on: https://go-review.googlesource.com/c/go/+/675096
Reviewed-by: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com>
Reviewed-by: Roland Shoemaker <roland@golang.org>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-21 12:03:26 -07:00
thepudds ed24bb4e60 cmd/compile/internal/escape: propagate constants to interface conversions to avoid allocs
Currently, the integer value in the following interface conversion gets
heap allocated:

   v := 1000
   fmt.Println(v)

In contrast, this conversion does not currently cause the integer value
to be heap allocated:

   fmt.Println(1000)

The second example is able to avoid heap allocation because of an
optimization in walk (by Josh in #18704 and related issues) that
recognizes a literal is being used. In the first example, that
optimization is currently thwarted by the literal getting assigned
to a local variable prior to use in the interface conversion.

This CL propagates constants to interface conversions like
in the first example to avoid heap allocations, instead using
a read-only global. The net effect is roughly turning the first example
into the second.

One place this comes up in practice currently is with logging or
debug prints. For example, if we have something like:

   func conditionalDebugf(format string, args ...interface{}) {
   	if debugEnabled {
   		fmt.Fprintf(io.Discard, format, args...)
   	}
   }

Prior to this CL, this integer is heap allocated, even when the
debugEnabled flag is false, and even when the compiler
inlines conditionalDebugf:

   v := 1000
   conditionalDebugf("hello %d", v)

With this CL, the integer here is no longer heap allocated, even when
the debugEnabled flag is enabled, because the compiler can now see that
it can use a read-only global.

See the writeup in #71359 for more details.

CL 649076 (earlier in our stack) added most of the tests
along with debug diagnostics in convert.go to make it easier
to test this change.

Updates #71359
Updates #62653
Updates #53465
Updates #8618

Change-Id: I19a51e74b36576ebb0b9cf599267cbd2bd847ce4
Reviewed-on: https://go-review.googlesource.com/c/go/+/649079
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
2025-05-21 12:02:43 -07:00
thepudds e89791983a cmd/compile/internal/escape: use an ir.ReassignOracle
Using the new-ish ir.ReassignOracle is more efficient than calling
ir.StaticValue repeatedly.

This CL now uses an ir.ReassignOracle for the recent
make constant propagation introduced in CL 649035.

We also pull the main change from CL 649035 into a new function,
which we will update later in our stack. We will also use the
ReassignOracles introduced here later in our stack.

(We originally did most of this work in CL 649077, but we abandoned
that in favor of CL 649035).

We could also use an ir.ReassignOracle in the older processing of
ir.OCALLFUNC in (*escape).call, but for now, we just leave that
as a TODO.

Updates #71359

Change-Id: I6e02eeac269bde3a302622b4dfe0c8dc63ec9ffc
Reviewed-on: https://go-review.googlesource.com/c/go/+/673795
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
2025-05-21 12:01:01 -07:00
Damien Neil 4b7aa542eb os: add Root.ReadFile and Root.WriteFile
For #73126

Change-Id: Ie69cc274e7b59f958c239520318b89ff0141e26b
Reviewed-on: https://go-review.googlesource.com/c/go/+/674315
Reviewed-by: Alan Donovan <adonovan@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Damien Neil <dneil@google.com>
2025-05-21 11:59:27 -07:00
Sean Liao 3ae95aafb5 log/slog: add GroupAttrs
GroupAttrs is a more efficient version of Group
that takes a slice of Attr values.

Fixes #66365

Change-Id: Ic3046704825e17098f2fea5751f2959dce1073e2
Reviewed-on: https://go-review.googlesource.com/c/go/+/672915
Reviewed-by: Jonathan Amsterdam <jba@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-05-21 11:29:28 -07:00
Michael Pratt ce49eb488a runtime: skip windows stack tests in race mode
These became race instrumented in CL 643897, but race mode uses more
memory, so the test doesn't make much sense.

For #71395.

Change-Id: I6a6a636cf09ba29625aa9a22550314845fb2e611
Reviewed-on: https://go-review.googlesource.com/c/go/+/675077
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-21 11:05:35 -07:00
Michael Pratt d2f229db7a runtime: avoid register clobber in s390x racecall
This is a regression in CL 643875. Loading gsignal clobbers R8, which
contains the m pointer needed for loading g0.

For #71395.

Change-Id: I6a6a636ca95442767efe0eb1b358f2139d18c5b8
Reviewed-on: https://go-review.googlesource.com/c/go/+/675035
Auto-Submit: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-05-21 11:05:32 -07:00
Lokesh Kumar 304d9e2fd1 bufio: update buffer documentation
Fixes #73778

Change-Id: If6d87a92786c9b0ee2bd790b57937919afe0fc5c
GitHub-Last-Rev: 4b4c7595d5
GitHub-Pull-Request: golang/go#73804
Reviewed-on: https://go-review.googlesource.com/c/go/+/674695
Auto-Submit: Alan Donovan <adonovan@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Alan Donovan <adonovan@google.com>
2025-05-21 11:05:26 -07:00
Filippo Valsorda d3d22cc5e4 lib/fips140: set inprocess.txt to v1.0.0
Fixes #70200

Change-Id: I6a6a46567ce0834fb4b7f28bf06646326f8e5105
Reviewed-on: https://go-review.googlesource.com/c/go/+/674916
Reviewed-by: Daniel McCarney <daniel@binaryparadox.net>
Auto-Submit: Filippo Valsorda <filippo@golang.org>
Reviewed-by: Roland Shoemaker <roland@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-21 10:59:42 -07:00
Michael Pratt 419367969c cmd/link: require cgo internal linking in TestIssue33979
This was a typo regression in CL 643897, which accidentally dropped the
requirement for cgo internal linking. As a result, this test is
continuously failing on windows-arm64.

For #71395.

Cq-Include-Trybots: luci.golang.try:gotip-windows-arm64
Change-Id: I6a6a636c25fd399cda6649ef94655aa112f10f63
Reviewed-on: https://go-review.googlesource.com/c/go/+/675015
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Quim Muntal <quimmuntal@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-21 10:42:43 -07:00
Roland Shoemaker 360600b1d2 crypto/tls: replace custom intern cache with weak cache
Uses the new weak package to replace the existing custom intern cache
with a map of weak.Pointers instead. This simplifies the cache, and
means we don't need to store a slice of handles on the Conn anymore.

Change-Id: I5c2bf6ef35fac4255e140e184f4e48574b34174c
Reviewed-on: https://go-review.googlesource.com/c/go/+/644176
TryBot-Bypass: Roland Shoemaker <roland@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Roland Shoemaker <roland@golang.org>
2025-05-21 10:31:41 -07:00
Michael Pratt e6dacf91ff runtime: use cgroup CPU limit to set GOMAXPROCS
This CL adds two related features enabled by default via compatibility
GODEBUGs containermaxprocs and updatemaxprocs.

On Linux, containermaxprocs makes the Go runtime consider cgroup CPU
bandwidth limits (quota/period) when setting GOMAXPROCS. If the cgroup
limit is lower than the number of logical CPUs available, then the
cgroup limit takes precedence.

On all OSes, updatemaxprocs makes the Go runtime periodically
recalculate the default GOMAXPROCS value and update GOMAXPROCS if it has
changed. If GOMAXPROCS is set manually, this update does not occur. This
is intended primarily to detect changes to cgroup limits, but it applies
on all OSes because the CPU affinity mask can change as well.

The runtime only considers the limit in the leaf cgroup (the one that
actually contains the process), caching the CPU limit file
descriptor(s), which are periodically reread for updates. This is a
small departure from the original proposed design. It will not consider
limits of parent cgroups (which may be lower than the leaf), and it will
not detection cgroup migration after process start.

We can consider changing this in the future, but the simpler approach is
less invasive; less risk to packages that have some awareness of runtime
internals. e.g., if the runtime periodically opens new files during
execution, file descriptor leak detection is difficult to implement in a
stable way.

For #73193.

Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest
Change-Id: I6a6a636c631c1ae577fb8254960377ba91c5dc98
Reviewed-on: https://go-review.googlesource.com/c/go/+/670497
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-05-21 10:21:55 -07:00
Michael Pratt f12c66fbed internal/runtime/cgroup: CPU cgroup limit discovery
For #73193.

Change-Id: I6a6a636ca9fa9cba429cf053468c56c2939cb1ac
Reviewed-on: https://go-review.googlesource.com/c/go/+/668638
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-05-21 10:21:47 -07:00
Michael Pratt 06450a82b0 internal/runtime/cgroup: add line-by-line reader using a single scratch buffer
Change-Id: I6a6a636ca21edcc6f16705fbb72a5241d4f7f22d
Reviewed-on: https://go-review.googlesource.com/c/go/+/668637
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-21 10:21:38 -07:00
Damien Neil e59e128f90 os: add Root.MkdirAll
For #67002

Change-Id: Idd74b5b59e787e89bdfad82171b6a7719465f501
Reviewed-on: https://go-review.googlesource.com/c/go/+/674116
Reviewed-by: Alan Donovan <adonovan@google.com>
Auto-Submit: Damien Neil <dneil@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-21 10:14:43 -07:00
Roland Shoemaker 63dcc7b906 crypto/sha1: add sha-ni AMD64 implementation
Based on the Intel docs. Provides a ~44% speed-up compared to the AVX
implementation and a ~57% speed-up compared to the generic AMD64
assembly implementation.

                    │ /usr/local/google/home/bracewell/sha1-avx.bench │ /usr/local/google/home/bracewell/sha1-ni-stack.bench │
                    │                     sec/op                      │            sec/op             vs base                │
Hash8Bytes/New-24                                        157.60n ± 0%                    92.51n ± 0%  -41.30% (p=0.000 n=20)
Hash8Bytes/Sum-24                                        147.00n ± 0%                    85.06n ± 0%  -42.14% (p=0.000 n=20)
Hash320Bytes/New-24                                       625.3n ± 0%                    276.7n ± 0%  -55.75% (p=0.000 n=20)
Hash320Bytes/Sum-24                                       626.2n ± 0%                    272.4n ± 0%  -56.51% (p=0.000 n=20)
Hash1K/New-24                                            1206.5n ± 0%                    692.2n ± 0%  -42.63% (p=0.000 n=20)
Hash1K/Sum-24                                            1210.0n ± 0%                    688.2n ± 0%  -43.13% (p=0.000 n=20)
Hash8K/New-24                                             7.744µ ± 0%                    4.920µ ± 0%  -36.46% (p=0.000 n=20)
Hash8K/Sum-24                                             7.737µ ± 0%                    4.913µ ± 0%  -36.50% (p=0.000 n=20)
geomean                                                   971.5n                         536.1n       -44.81%

                    │ /usr/local/google/home/bracewell/sha1-avx.bench │ /usr/local/google/home/bracewell/sha1-ni-stack.bench │
                    │                       B/s                       │             B/s              vs base                 │
Hash8Bytes/New-24                                        48.41Mi ± 0%                  82.47Mi ± 0%   +70.37% (p=0.000 n=20)
Hash8Bytes/Sum-24                                        51.90Mi ± 0%                  89.70Mi ± 0%   +72.82% (p=0.000 n=20)
Hash320Bytes/New-24                                      488.0Mi ± 0%                 1103.0Mi ± 0%  +126.01% (p=0.000 n=20)
Hash320Bytes/Sum-24                                      487.4Mi ± 0%                 1120.5Mi ± 0%  +129.91% (p=0.000 n=20)
Hash1K/New-24                                            809.6Mi ± 0%                 1410.8Mi ± 0%   +74.26% (p=0.000 n=20)
Hash1K/Sum-24                                            806.9Mi ± 0%                 1419.1Mi ± 0%   +75.86% (p=0.000 n=20)
Hash8K/New-24                                           1008.9Mi ± 0%                 1588.0Mi ± 0%   +57.40% (p=0.000 n=20)
Hash8K/Sum-24                                           1009.8Mi ± 0%                 1590.1Mi ± 0%   +57.47% (p=0.000 n=20)
geomean                                                  375.8Mi                       680.9Mi        +81.20%

                    │ /usr/local/google/home/bracewell/sha1-amd64.bench │ /usr/local/google/home/bracewell/sha1-ni-stack.bench │
                    │                      sec/op                       │            sec/op             vs base                │
Hash8Bytes/New-24                                          153.90n ± 0%                    92.51n ± 0%  -39.89% (p=0.000 n=20)
Hash8Bytes/Sum-24                                          145.90n ± 0%                    85.06n ± 0%  -41.70% (p=0.000 n=20)
Hash320Bytes/New-24                                         666.8n ± 0%                    276.7n ± 0%  -58.50% (p=0.000 n=20)
Hash320Bytes/Sum-24                                         660.3n ± 0%                    272.4n ± 0%  -58.75% (p=0.000 n=20)
Hash1K/New-24                                              1810.5n ± 0%                    692.2n ± 0%  -61.77% (p=0.000 n=20)
Hash1K/Sum-24                                              1806.0n ± 0%                    688.2n ± 0%  -61.90% (p=0.000 n=20)
Hash8K/New-24                                              13.509µ ± 0%                    4.920µ ± 0%  -63.58% (p=0.000 n=20)
Hash8K/Sum-24                                              13.515µ ± 0%                    4.913µ ± 0%  -63.65% (p=0.000 n=20)
geomean                                                     1.248µ                         536.1n       -57.05%

                    │ /usr/local/google/home/bracewell/sha1-amd64.bench │ /usr/local/google/home/bracewell/sha1-ni-stack.bench │
                    │                        B/s                        │             B/s              vs base                 │
Hash8Bytes/New-24                                          49.57Mi ± 0%                  82.47Mi ± 0%   +66.37% (p=0.000 n=20)
Hash8Bytes/Sum-24                                          52.29Mi ± 0%                  89.70Mi ± 0%   +71.52% (p=0.000 n=20)
Hash320Bytes/New-24                                        457.7Mi ± 0%                 1103.0Mi ± 0%  +140.97% (p=0.000 n=20)
Hash320Bytes/Sum-24                                        462.2Mi ± 0%                 1120.5Mi ± 0%  +142.45% (p=0.000 n=20)
Hash1K/New-24                                              539.4Mi ± 0%                 1410.8Mi ± 0%  +161.57% (p=0.000 n=20)
Hash1K/Sum-24                                              540.7Mi ± 0%                 1419.1Mi ± 0%  +162.44% (p=0.000 n=20)
Hash8K/New-24                                              578.4Mi ± 0%                 1588.0Mi ± 0%  +174.57% (p=0.000 n=20)
Hash8K/Sum-24                                              578.1Mi ± 0%                 1590.1Mi ± 0%  +175.07% (p=0.000 n=20)
geomean                                                    292.4Mi                       680.9Mi       +132.86%

Change-Id: Ife90386ba410a80c2e6222c1fe4df2368c4e12b2
Reviewed-on: https://go-review.googlesource.com/c/go/+/642157
Reviewed-by: Filippo Valsorda <filippo@golang.org>
Auto-Submit: Roland Shoemaker <roland@golang.org>
Reviewed-by: Neal Patel <nealpatel@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-21 10:14:30 -07:00
Roland Shoemaker 40b19b56a9 runtime: add valgrind instrumentation
Add build tag gated Valgrind annotations to the runtime which let it
understand how the runtime manages memory. This allows for Go binaries
to be run under Valgrind without emitting spurious errors.

Instead of adding the Valgrind headers to the tree, and using cgo to
call the various Valgrind client request macros, we just add an assembly
function which emits the necessary instructions to trigger client
requests.

In particular we add instrumentation of the memory allocator, using a
two-level mempool structure (as described in the Valgrind manual [0]).
We also add annotations which allow Valgrind to track which memory we
use for stacks, which seems necessary to let it properly function.

We describe the memory model to Valgrind as follows: we treat heap
arenas as a "pool" created with VALGRIND_CREATE_MEMPOOL_EXT (so that we
can use VALGRIND_MEMPOOL_METAPOOL and VALGRIND_MEMPOOL_AUTO_FREE).
Within the pool we treat spans as "superblocks", annotated with
VALGRIND_MEMPOOL_ALLOC. We then allocate individual objects within spans
with VALGRIND_MALLOCLIKE_BLOCK.

It should be noted that running binaries under Valgrind can be _quite
slow_, and certain operations, such as running the GC, can be _very
slow_. It is recommended to run programs with GOGC=off. Additionally,
async preemption should be turned off, since it'll cause strange
behavior (GODEBUG=asyncpreemptoff=1).

Running Valgrind with --leak-check=yes will result in some errors
resulting from some things not being marked fully free'd. These likely
need more annotations to rectify, but for now it is recommended to run
with --leak-check=off.

Updates #73602

[0] https://valgrind.org/docs/manual/mc-manual.html#mc-manual.mempools

Change-Id: I71b26c47d7084de71ef1e03947ef6b1cc6d38301
Reviewed-on: https://go-review.googlesource.com/c/go/+/674077
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-05-21 10:08:08 -07:00
Michael Matloob 2a5ac1a993 cmd/doc: allow go doc -http without package in current directory
go doc tries to find a package to display documentation for. In the case
that no package is provided, it uses "." just like go list does. So if
go doc -http is run without any arguments, it tries to show the
documentation for the package in the current directory. As a special
case, if no arguments are provided, allow no package to match the
current directory and just open the root pkgsite page.

For #68106

Change-Id: I6d65b160a838591db953fac630eced6b09106877
Reviewed-on: https://go-review.googlesource.com/c/go/+/675075
Reviewed-by: Alan Donovan <adonovan@google.com>
Reviewed-by: Michael Matloob <matloob@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-21 09:58:00 -07:00
Michael Anthony Knyszek 8b45a3f78b runtime: guarantee checkfinalizers test allocates in a shared tiny block
Currently the checkfinalizers test (TestDetectCleanupOrFinalizerLeak)
only *tries* to ensure the tiny alloc with a cleanup attached shares a
block with other objects. However, what it does is insufficient, because
it could get unlucky and have the last object allocated be the first
object of a new block.

This change changes the test to guarantee that a tiny object is not at
the start of a fresh block by looking at the alignment of the object's
pointer. If the object's pointer is odd, then that's good enough to know
that it shares a block with something else, since the blocks themselves
are aligned to a much higher power of two.

This fixes a failure I've seen on the builders.

Fixes #73810.

Change-Id: Ieafdbb9cccb0d2dc3659a9a5d9d9233718461635
Reviewed-on: https://go-review.googlesource.com/c/go/+/674655
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
2025-05-21 09:50:31 -07:00
Damien Neil 8960970009 os: add Root.RemoveAll
For #67002

Change-Id: If59dab4fd934a115d8ff383826525330de750b54
Reviewed-on: https://go-review.googlesource.com/c/go/+/661595
Reviewed-by: Alan Donovan <adonovan@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Damien Neil <dneil@google.com>
2025-05-21 09:30:51 -07:00
Mark Freeman 26e05b95c2 internal/pkgbits: specify that RelIdx is an element index
Without this, it's not clear what this is relative to or the
granularity of the index.

Change-Id: Ibaabe47e089f0ba9b084523969c5347ed4c9dbee
Reviewed-on: https://go-review.googlesource.com/c/go/+/674636
Auto-Submit: Mark Freeman <mark@golang.org>
Reviewed-by: Robert Griesemer <gri@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-21 08:48:01 -07:00
Xiaolin Zhao 4ce1c8e9e1 cmd/compile: add rules about ORN and ANDN
Reduce the number of go toolchain instructions on loong64 as follows.

    file      before    after     Δ       %
    addr2line 279880    279776  -104   -0.0372%
    asm       556638    556410  -228   -0.0410%
    buildid   272272    272072  -200   -0.0735%
    cgo       481522    481318  -204   -0.0424%
    compile   2457788   2457580 -208   -0.0085%
    covdata   323384    323280  -104   -0.0322%
    cover     518450    518234  -216   -0.0417%
    dist      340790    340686  -104   -0.0305%
    distpack  282456    282252  -204   -0.0722%
    doc       789932    789688  -244   -0.0309%
    fix       324332    324228  -104   -0.0321%
    link      704622    704390  -232   -0.0329%
    nm        277132    277028  -104   -0.0375%
    objdump   507862    507758  -104   -0.0205%
    pack      221774    221674  -100   -0.0451%
    pprof     1469816   1469552 -264   -0.0180%
    test2json 254836    254732  -104   -0.0408%
    trace     1100002   1099738 -264   -0.0240%
    vet       781078    780874  -204   -0.0261%
    go        1529116   1528848 -268   -0.0175%
    gofmt     318556    318448  -108   -0.0339%
    total     13792238 13788566 -3672  -0.0266%

Change-Id: I23fb3ebd41309252c7075e57ea7094e79f8c4fef
Reviewed-on: https://go-review.googlesource.com/c/go/+/674335
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
2025-05-21 08:28:37 -07:00
Guoqi Chen 0810fd2d92 cmd/internal/obj/loong64: remove unused register alias definitions
Change-Id: Ie788747372cd47cb3780e75b35750bb08bd166fc
Reviewed-on: https://go-review.googlesource.com/c/go/+/542835
Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Auto-Submit: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-05-21 08:27:58 -07:00
Mark Freeman bfb8f13274 internal/pkgbits: indent productions and hoist some types up
The types being hoisted are those which cannot be referenced; that is,
where Ref[T] is illegal. These are most clearly owned by pkgbits. The
types which follow are those which can be referenced.

Referenceable types are more hazy due to the reference mechanism of UIR
- sections. These are a detail of the UIR file format and are surfaced
directly to importers.

I suspect that pkgbits would benefit from a reference mechanism not
dependent on sections. This would permit us to push down many types
from the noder into pkgbits, reducing the interface surface without
giving up deduplication.

Change-Id: Ifaf5cd9de20c767ad0941413385b308d628aac6c
Reviewed-on: https://go-review.googlesource.com/c/go/+/674635
Auto-Submit: Mark Freeman <mark@golang.org>
TryBot-Bypass: Mark Freeman <mark@golang.org>
Reviewed-by: Robert Griesemer <gri@google.com>
2025-05-21 08:25:32 -07:00
Felix Geisendörfer 07b94b2db2 internal/trace: add generator tests for sync events
Add generator tests that verify the timestamps for the sync events
emitted in the go1.25 trace format and earlier versions.

Add the ability to configure the properties of the per-generation sync
batches in testgen. Also refactor testgen to produce more realistic
timestamps by keeping track of lastTs and using it for structural
batches that don't have their own timestamps. Otherwise they default to
zero which means the minTs of the generation can't be controlled.

For #69869

Change-Id: I92a49b8281bc4169b63e13c030c1de7720cd6f26
Reviewed-on: https://go-review.googlesource.com/c/go/+/653876
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-05-21 08:23:41 -07:00
Felix Geisendörfer b22da3f544 internal/trace/internal/testgen: make generated trace version configurable
Replace hard coded references to version.Go122 with the trace version
passed to NewTrace. This allows writing testgen tests for newer trace
versions.

For #69869

Change-Id: Id25350cea1c397a09ca23465526ff259e34a4752
Reviewed-on: https://go-review.googlesource.com/c/go/+/653875
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-21 08:23:37 -07:00
Felix Geisendörfer 847f157166 internal/trace: add a validator test for the new clock snapshots
Check that the clock snapshots, when expected to be present, are
non-zero and monotonically increasing.

This required some refactoring to make the validator aware of the
version of the trace it is validating.

Change-Id: I04c4dd10fe6975cbac12bb0ddaebcec3a5284e7b
Reviewed-on: https://go-review.googlesource.com/c/go/+/669715
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: David Chase <drchase@google.com>
2025-05-21 08:22:16 -07:00
Felix Geisendörfer 2d216141a1 internal/trace: expose clock snapshot timestamps on sync event
Add ClockSnapshot field to the Sync event type and populate it with the
information from the new EvClockSnapshot event when available.

For #69869

Change-Id: I3b24b5bfa15cc7a7dba270f5e6bf189adb096840
Reviewed-on: https://go-review.googlesource.com/c/go/+/653576
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
2025-05-21 08:22:01 -07:00
Felix Geisendörfer 112c23612f runtime,internal/trace: emit clock snapshots at the start of trace generations
Replace the per-generation EvEventBatch containing a lone EvFrequency
event with a per-generation EvEventBatch containing a EvSync header
followed by an EvFrequency and EvClockSnapshot event.

The new EvClockSnapshot event contains trace, mono and wall clock
snapshots taken in close time proximity. Ignoring minor resolution
differences, the trace and mono clock are the same on linux, but not on
windows (which still uses a TSC based trace clock).

Emit the new sync batch at the very beginning of every new generation
rather than the end to be in harmony with the internal/trace reader
which emits a sync event at the beginning of every generation as well
and guarantees monotonically increasing event timestamps.

Bump the version of the trace file format to 1.25 since this change is
not backwards compatible.

Update the internal/trace reader implementation to decode the new
events, but do not expose them to the public reader API yet. This is
done in the next CL.

For #69869

Change-Id: I5bfedccdd23dc0adaf2401ec0970cbcc32363393
Reviewed-on: https://go-review.googlesource.com/c/go/+/653575
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-05-21 08:20:34 -07:00
Mark Ryan 0d7dc6842b cmd/internal/obj/riscv: fix vector integer multiply add
The RISC-V integer vector multiply add instructions are not encoded
correctly; the first and second arguments are swapped. For example,
the instruction

VMACCVV V1, V2, V3

encodes to

b620a1d7 or vmacc.vv v3,v1,v2

and not

b61121d7 or vmacc.vv v3,v2,v1

as expected.

This is inconsistent with the argument ordering we use for 3
argument vector instructions, in which the argument order, as given
in the RISC-V specifications, is reversed, and also with the vector
FMA instructions which have the same argument ordering as the vector
integer multiply add instructions in the "The RISC-V Instruction Set
Manual Volume I". For example, in the ISA manual we have the
following instruction definitions

; Integer multiply-add, overwrite addend
vmacc.vv vd, vs1, vs2, vm    # vd[i] = +(vs1[i] * vs2[i]) + vd[i]

; FP multiply-accumulate, overwrites addend
vfmacc.vv vd, vs1, vs2, vm    # vd[i] = +(vs1[i] * vs2[i]) + vd[i]

It's reasonable to expect that the Go assembler would use the same
argument ordering for both of these instructions. It currently does
not.

We fix the issue by switching the argument ordering for the vector
integer multiply add instructions to match those of the vector FMA
instructions.

Change-Id: Ib98e9999617f991969e5c831734b3bb3324439f6
Reviewed-on: https://go-review.googlesource.com/c/go/+/670335
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2025-05-21 07:19:19 -07:00
Damien Neil 0375edd901 os: skip TestOpenFileCreateExclDanglingSymlink when no symlinks
Skip this test on plan9, and any other platform that doesn't
have symlinks.

Fixes #73729

Change-Id: I8052db24ed54c3361530bd4f54c96c9d10c4714c
Reviewed-on: https://go-review.googlesource.com/c/go/+/674697
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Alan Donovan <adonovan@google.com>
Commit-Queue: Alan Donovan <adonovan@google.com>
Reviewed-by: Richard Miller <millerresearch@gmail.com>
Reviewed-by: Alan Donovan <adonovan@google.com>
2025-05-21 06:56:39 -07:00
Guoqi Chen 7f806c1052 runtime, internal/fuzz: optimize build tag combination on loong64
Change-Id: I971b789beb08e0c6b11169fd5547a8d4ab74fab5
Reviewed-on: https://go-review.googlesource.com/c/go/+/668155
Reviewed-by: Junyang Shao <shaojunyang@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2025-05-21 00:07:41 -07:00
Xiaolin Zhao 5b17e2f927 crypto/subtle: optimize function xorBytes using SIMD on loong64
On the Loongson-3A6000-HV and Loongson-3A5000, there has been
a significant improvement in all performance metrics except
for '8Bytes', which has experienced a decline, as follows.

goos: linux
goarch: loong64
pkg: crypto/subtle
cpu: Loongson-3A6000-HV @ 2500.00MHz
                                   |  bench.old   |              bench.new              |
                                   |    sec/op    |   sec/op     vs base                |
XORBytes/8Bytes                       7.282n ± 0%   8.805n ± 0%  +20.91% (p=0.000 n=10)
XORBytes/128Bytes                     14.43n ± 0%   10.01n ± 0%  -30.63% (p=0.000 n=10)
XORBytes/2048Bytes                   110.60n ± 0%   46.57n ± 0%  -57.89% (p=0.000 n=10)
XORBytes/8192Bytes                    418.7n ± 0%   161.8n ± 0%  -61.36% (p=0.000 n=10)
XORBytes/32768Bytes                   3.220µ ± 0%   1.673µ ± 0%  -48.04% (p=0.000 n=10)
XORBytesAlignment/8Bytes0Offset       7.621n ± 0%   9.305n ± 0%  +22.10% (p=0.000 n=10)
XORBytesAlignment/8Bytes1Offset       7.621n ± 0%   9.305n ± 0%  +22.10% (p=0.000 n=10)
XORBytesAlignment/8Bytes2Offset       7.621n ± 0%   9.305n ± 0%  +22.10% (p=0.000 n=10)
XORBytesAlignment/8Bytes3Offset       7.621n ± 0%   9.305n ± 0%  +22.10% (p=0.000 n=10)
XORBytesAlignment/8Bytes4Offset       7.621n ± 0%   9.305n ± 0%  +22.10% (p=0.000 n=10)
XORBytesAlignment/8Bytes5Offset       7.621n ± 0%   9.305n ± 0%  +22.10% (p=0.000 n=10)
XORBytesAlignment/8Bytes6Offset       7.621n ± 0%   9.305n ± 0%  +22.10% (p=0.000 n=10)
XORBytesAlignment/8Bytes7Offset       7.621n ± 0%   9.305n ± 0%  +22.10% (p=0.000 n=10)
XORBytesAlignment/128Bytes0Offset    14.430n ± 0%   9.973n ± 0%  -30.88% (p=0.000 n=10)
XORBytesAlignment/128Bytes1Offset     20.83n ± 0%   11.03n ± 0%  -47.05% (p=0.000 n=10)
XORBytesAlignment/128Bytes2Offset     20.83n ± 0%   11.03n ± 0%  -47.07% (p=0.000 n=10)
XORBytesAlignment/128Bytes3Offset     20.83n ± 0%   11.03n ± 0%  -47.07% (p=0.000 n=10)
XORBytesAlignment/128Bytes4Offset     20.83n ± 0%   11.03n ± 0%  -47.05% (p=0.000 n=10)
XORBytesAlignment/128Bytes5Offset     20.83n ± 0%   11.03n ± 0%  -47.05% (p=0.000 n=10)
XORBytesAlignment/128Bytes6Offset     20.83n ± 0%   11.03n ± 0%  -47.05% (p=0.000 n=10)
XORBytesAlignment/128Bytes7Offset     20.83n ± 0%   11.03n ± 0%  -47.05% (p=0.000 n=10)
XORBytesAlignment/2048Bytes0Offset   110.60n ± 0%   46.82n ± 0%  -57.67% (p=0.000 n=10)
XORBytesAlignment/2048Bytes1Offset    234.4n ± 0%   109.3n ± 0%  -53.37% (p=0.000 n=10)
XORBytesAlignment/2048Bytes2Offset    234.4n ± 0%   109.3n ± 0%  -53.37% (p=0.000 n=10)
XORBytesAlignment/2048Bytes3Offset    234.4n ± 0%   109.3n ± 0%  -53.37% (p=0.000 n=10)
XORBytesAlignment/2048Bytes4Offset    234.5n ± 0%   109.3n ± 0%  -53.39% (p=0.000 n=10)
XORBytesAlignment/2048Bytes5Offset    234.4n ± 0%   109.3n ± 0%  -53.37% (p=0.000 n=10)
XORBytesAlignment/2048Bytes6Offset    234.4n ± 0%   109.3n ± 0%  -53.37% (p=0.000 n=10)
XORBytesAlignment/2048Bytes7Offset    234.5n ± 0%   109.3n ± 0%  -53.39% (p=0.000 n=10)
geomean                               39.42n        26.00n       -34.05%

goos: linux
goarch: loong64
pkg: crypto/subtle
cpu: Loongson-3A5000 @ 2500.00MHz
                                   |  bench.old   |              bench.new              |
                                   |    sec/op    |   sec/op     vs base                |
XORBytes/8Bytes                       11.21n ± 0%   12.41n ± 1%  +10.70% (p=0.000 n=10)
XORBytes/128Bytes                     18.22n ± 0%   13.61n ± 0%  -25.30% (p=0.000 n=10)
XORBytes/2048Bytes                   162.20n ± 0%   48.46n ± 0%  -70.13% (p=0.000 n=10)
XORBytes/8192Bytes                    629.8n ± 0%   163.8n ± 0%  -73.99% (p=0.000 n=10)
XORBytes/32768Bytes                  4731.0n ± 1%   632.8n ± 0%  -86.63% (p=0.000 n=10)
XORBytesAlignment/8Bytes0Offset       11.61n ± 1%   12.42n ± 0%   +6.98% (p=0.000 n=10)
XORBytesAlignment/8Bytes1Offset       11.61n ± 0%   12.41n ± 0%   +6.89% (p=0.000 n=10)
XORBytesAlignment/8Bytes2Offset       11.61n ± 0%   12.42n ± 0%   +6.98% (p=0.000 n=10)
XORBytesAlignment/8Bytes3Offset       11.61n ± 0%   12.41n ± 0%   +6.89% (p=0.000 n=10)
XORBytesAlignment/8Bytes4Offset       11.61n ± 0%   12.42n ± 0%   +6.98% (p=0.000 n=10)
XORBytesAlignment/8Bytes5Offset       11.61n ± 0%   12.41n ± 0%   +6.89% (p=0.000 n=10)
XORBytesAlignment/8Bytes6Offset       11.61n ± 0%   12.41n ± 1%   +6.89% (p=0.000 n=10)
XORBytesAlignment/8Bytes7Offset       11.61n ± 0%   12.42n ± 0%   +6.98% (p=0.000 n=10)
XORBytesAlignment/128Bytes0Offset     17.82n ± 0%   13.62n ± 0%  -23.57% (p=0.000 n=10)
XORBytesAlignment/128Bytes1Offset     26.62n ± 0%   18.43n ± 0%  -30.78% (p=0.000 n=10)
XORBytesAlignment/128Bytes2Offset     26.64n ± 0%   18.43n ± 0%  -30.85% (p=0.000 n=10)
XORBytesAlignment/128Bytes3Offset     26.65n ± 0%   18.42n ± 0%  -30.90% (p=0.000 n=10)
XORBytesAlignment/128Bytes4Offset     26.65n ± 0%   18.42n ± 0%  -30.88% (p=0.000 n=10)
XORBytesAlignment/128Bytes5Offset     26.62n ± 0%   18.42n ± 0%  -30.82% (p=0.000 n=10)
XORBytesAlignment/128Bytes6Offset     26.63n ± 0%   18.42n ± 0%  -30.84% (p=0.000 n=10)
XORBytesAlignment/128Bytes7Offset     26.64n ± 0%   18.42n ± 0%  -30.86% (p=0.000 n=10)
XORBytesAlignment/2048Bytes0Offset   161.80n ± 0%   48.25n ± 0%  -70.18% (p=0.000 n=10)
XORBytesAlignment/2048Bytes1Offset    354.6n ± 0%   189.2n ± 0%  -46.64% (p=0.000 n=10)
XORBytesAlignment/2048Bytes2Offset    354.6n ± 0%   189.2n ± 0%  -46.64% (p=0.000 n=10)
XORBytesAlignment/2048Bytes3Offset    354.7n ± 0%   189.2n ± 0%  -46.66% (p=0.000 n=10)
XORBytesAlignment/2048Bytes4Offset    354.7n ± 0%   189.2n ± 1%  -46.66% (p=0.000 n=10)
XORBytesAlignment/2048Bytes5Offset    354.7n ± 0%   189.2n ± 0%  -46.66% (p=0.000 n=10)
XORBytesAlignment/2048Bytes6Offset    354.7n ± 0%   189.2n ± 0%  -46.66% (p=0.000 n=10)
XORBytesAlignment/2048Bytes7Offset    354.8n ± 0%   189.2n ± 0%  -46.67% (p=0.000 n=10)
geomean                               56.46n        36.46n       -35.42%

Change-Id: I66e150b132517e9ff4827abf796812ffe608c052
Reviewed-on: https://go-review.googlesource.com/c/go/+/673355
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-05-20 20:28:34 -07:00
limeidan a2eb643cbf cmd/dist, internal/platform: enable internal linking feature and test on loong64
Change-Id: Ifea676e9eb44281465832fc4050f6286e50f4543
Reviewed-on: https://go-review.googlesource.com/c/go/+/533717
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn>
Reviewed-by: Qiqi Huang <huangqiqi@loongson.cn>
2025-05-20 20:28:18 -07:00
Xiaolin Zhao d37a1bdd48 cmd/compile: fix the implementation of NORconst on loong64
In the loong64 instruction set, there is no NORI instruction,
so the immediate value in NORconst need to be stored in register
and then use the three-register NOR instruction.

Change-Id: I5ef697450619317218cb3ef47fc07e238bdc2139
Reviewed-on: https://go-review.googlesource.com/c/go/+/673836
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-20 20:24:09 -07:00
thepudds 74304cda29 cmd/compile/internal/escape: improve order of work to speed up analyzing many locations
For the package github.com/microsoft/typescript-go/internal/checker,
compilation currently spends most of its time in escape analysis.

Here, we re-order work to be more efficient when analyzing many
locations, and delay visiting some locations to prioritize locations
that might be more likely to reach a terminal point of reaching the
heap and possibly reduce the count of intermediate states for each location.

Action graph reported build times show roughly a 5x improvement for
compilation of the typescript-go/internal/checker package:

  go1.24.0:      91.792s
  cl-657179-ps1: 17.578s

with timing via:

  go build -a -debug-actiongraph=/tmp/actiongraph-cl-657179-ps1 -v github.com/microsoft/typescript-go/internal/checker

There are some additional adjustments to make here, including we can
consider a follow-on CL I have that parallelizes the operations of the
core loop, but this seems to be a nice win as is, and my understanding
is the desire is to merge this as it stands.

Updates #72815

Change-Id: I1753c5354b495b059f68fb97f3103ee7834f9eee
Reviewed-on: https://go-review.googlesource.com/c/go/+/657179
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-20 20:11:56 -07:00
khr@golang.org a070533633 reflect: turn off allocation test if instrumentation is on
Help fix the asan builders.

Change-Id: I980f5171519643c3543bdefc6ea46fd0fca17c28
Reviewed-on: https://go-review.googlesource.com/c/go/+/674616
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
2025-05-20 17:27:54 -07:00