Commit Graph

62732 Commits

Author SHA1 Message Date
Vasily Leonenko 5b36f61356 internal/bytealg: optimize Index/IndexString/IndexByte/IndexByteString on arm64
Introduce ABIInternal support for Index/IndexString/IndexByte/IndexByteString

goos: linux
goarch: arm64
pkg: bytes
                              │   base.txt    │                new.txt                │
                              │      B/s      │      B/s       vs base                │
IndexByte/10                     1.090Gi ± 0%    1.313Gi ± 0%  +20.51% (p=0.000 n=10)
IndexByte/32                     3.714Gi ± 0%    4.289Gi ± 0%  +15.47% (p=0.000 n=10)
IndexByte/4K                     22.92Gi ± 0%    23.01Gi ± 0%   +0.37% (p=0.000 n=10)
IndexByte/4M                     20.23Gi ± 0%    20.35Gi ± 0%   +0.60% (p=0.000 n=10)
IndexByte/64M                    23.82Gi ± 0%    23.81Gi ± 0%   -0.01% (p=0.002 n=10)
IndexBytePortable/10             788.5Mi ± 0%    788.5Mi ± 0%        ~ (p=0.722 n=10)
IndexBytePortable/32            1002.3Mi ± 0%   1002.3Mi ± 0%        ~ (p=0.137 n=10)
IndexBytePortable/4K             1.111Gi ± 0%    1.111Gi ± 0%        ~ (p=0.692 n=10)
IndexBytePortable/4M             1.116Gi ± 0%    1.116Gi ± 0%        ~ (p=0.158 n=10)
IndexBytePortable/64M            1.116Gi ± 0%    1.116Gi ± 0%   -0.01% (p=0.000 n=10)
IndexRune/10                     352.1Mi ± 0%    445.0Mi ± 0%  +26.38% (p=0.000 n=10)
IndexRune/32                     1.101Gi ± 0%    1.391Gi ± 0%  +26.43% (p=0.000 n=10)
IndexRune/4K                     21.07Gi ± 0%    21.25Gi ± 0%   +0.82% (p=0.000 n=10)
IndexRune/4M                     23.81Gi ± 0%    23.81Gi ± 0%        ~ (p=0.218 n=10)
IndexRune/64M                    23.81Gi ± 0%    23.81Gi ± 0%        ~ (p=0.271 n=10)
IndexRuneASCII/10                1.038Gi ± 0%    1.190Gi ± 1%  +14.63% (p=0.000 n=10)
IndexRuneASCII/32                3.643Gi ± 2%    4.203Gi ± 0%  +15.38% (p=0.000 n=10)
IndexRuneASCII/4K                22.90Gi ± 0%    22.98Gi ± 0%   +0.34% (p=0.000 n=10)
IndexRuneASCII/4M                23.81Gi ± 0%    23.81Gi ± 0%        ~ (p=0.108 n=10)
IndexRuneASCII/64M               23.82Gi ± 0%    23.81Gi ± 0%        ~ (p=0.105 n=10)
IndexRuneUnicode/Latin/10        404.4Mi ± 0%    493.7Mi ± 0%  +22.10% (p=0.000 n=10)
IndexRuneUnicode/Latin/32        1.261Gi ± 0%    1.543Gi ± 0%  +22.31% (p=0.000 n=10)
IndexRuneUnicode/Latin/4K        6.966Gi ± 0%    8.115Gi ± 0%  +16.50% (p=0.000 n=10)
IndexRuneUnicode/Latin/4M        6.599Gi ± 0%    7.576Gi ± 0%  +14.80% (p=0.000 n=10)
IndexRuneUnicode/Latin/64M       6.297Gi ± 0%    7.070Gi ± 2%  +12.28% (p=0.000 n=10)
IndexRuneUnicode/Cyrillic/10     385.9Mi ± 0%    440.1Mi ± 0%  +14.03% (p=0.000 n=10)
IndexRuneUnicode/Cyrillic/32     1.206Gi ± 0%    1.375Gi ± 0%  +14.05% (p=0.000 n=10)
IndexRuneUnicode/Cyrillic/4K     2.468Gi ± 0%    2.921Gi ± 0%  +18.37% (p=0.000 n=10)
IndexRuneUnicode/Cyrillic/4M     2.386Gi ± 0%    2.845Gi ± 0%  +19.23% (p=0.000 n=10)
IndexRuneUnicode/Cyrillic/64M    2.280Gi ± 0%    2.717Gi ± 0%  +19.14% (p=0.000 n=10)
IndexRuneUnicode/Han/10          307.1Mi ± 0%    331.5Mi ± 0%   +7.94% (p=0.000 n=10)
IndexRuneUnicode/Han/32          982.2Mi ± 0%   1060.2Mi ± 0%   +7.94% (p=0.000 n=10)
IndexRuneUnicode/Han/4K          4.986Gi ± 0%    5.957Gi ± 0%  +19.48% (p=0.000 n=10)
IndexRuneUnicode/Han/4M          3.822Gi ± 0%    4.198Gi ± 0%   +9.83% (p=0.000 n=10)
IndexRuneUnicode/Han/64M         3.765Gi ± 0%    4.140Gi ± 0%   +9.96% (p=0.000 n=10)
Index/10                         634.6Mi ± 0%    635.2Mi ± 0%   +0.09% (p=0.000 n=10)
Index/32                         375.3Mi ± 0%    385.1Mi ± 0%   +2.63% (p=0.000 n=10)
Index/4K                         754.8Mi ± 0%    755.2Mi ± 0%   +0.04% (p=0.001 n=10)
Index/4M                         746.5Mi ± 0%    746.3Mi ± 0%   -0.03% (p=0.000 n=10)
Index/64M                        746.5Mi ± 0%    746.3Mi ± 0%   -0.03% (p=0.000 n=10)
IndexEasy/10                     714.6Mi ± 0%    714.6Mi ± 0%   +0.00% (p=0.001 n=10)
IndexEasy/32                     1.221Gi ± 0%    1.524Gi ± 0%  +24.81% (p=0.000 n=10)
IndexEasy/4K                     21.06Gi ± 0%    21.47Gi ± 0%   +1.91% (p=0.000 n=10)
IndexEasy/4M                     20.23Gi ± 0%    20.24Gi ± 0%        ~ (p=0.684 n=10)
IndexEasy/64M                    13.07Gi ± 0%    12.58Gi ± 4%   -3.75% (p=0.000 n=10)
IndexHard1                       1.114Gi ± 0%    1.114Gi ± 0%        ~ (p=0.193 n=10)
IndexHard2                       1.111Gi ± 0%    1.112Gi ± 0%   +0.04% (p=0.001 n=10)
IndexHard3                       1.086Gi ± 0%    1.081Gi ± 0%   -0.37% (p=0.000 n=10)
IndexHard4                       607.9Mi ± 0%    607.9Mi ± 0%        ~ (p=0.136 n=10)
geomean                          2.536Gi         2.720Gi        +7.26%

Change-Id: I1fc246783ebb215882d7144d05dbe2433dc66751
Reviewed-on: https://go-review.googlesource.com/c/go/+/662415
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
2025-04-03 11:25:06 -07:00
Vasily Leonenko f2e9076764 internal/bytealg: optimize Count/CountString on arm64
Introduce ABIInternal support for Count/CountString
Move <32 size block from function end to beginning as fastpath

goos: linux
goarch: arm64
pkg: strings
                   │   base.txt   │               new.txt                │
                   │     B/s      │     B/s       vs base                │
CountByte/10         672.5Mi ± 0%   692.9Mi ± 0%   +3.04% (p=0.000 n=10)
CountByte/32         3.592Gi ± 0%   3.970Gi ± 0%  +10.53% (p=0.000 n=10)
CountByte/4096       16.63Gi ± 0%   16.73Gi ± 0%   +0.64% (p=0.000 n=10)
CountByte/4194304    14.97Gi ± 2%   15.02Gi ± 1%        ~ (p=0.190 n=10)
CountByte/67108864   12.50Gi ± 0%   12.50Gi ± 0%        ~ (p=0.853 n=10)
geomean              5.931Gi        6.099Gi        +2.83%

Change-Id: I5af1be2b117d9fb8d570739637499923de62251c
Reviewed-on: https://go-review.googlesource.com/c/go/+/662395
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Commit-Queue: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2025-04-03 09:21:11 -07:00
0x2b3bfa0 9326d9d012 make.bat: fix GOROOT_BOOTSTRAP detection
Due to a flaw in the %GOROOT_BOOTSTRAP% detection logic, the last Go
executable found by `where go` was taking precedence over the first one.

In batch scripts, environment variable expansion happens when each line
of the script is read, not when it is executed. Thus, the check in the
loop for GOROOT_BOOTSTRAP being unset would always be true, even when
the variable had been set in a previous loop iteration.

See SET /? for more information.

Change-Id: I15ddcbe771a902acb47a1f07ba7f4cb8a311e0dc
Reviewed-on: https://go-review.googlesource.com/c/go/+/653535
Auto-Submit: Carlos Amedee <carlos@golang.org>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Quim Muntal <quimmuntal@gmail.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-04-03 08:45:30 -07:00
Alan Donovan fd8f6cec21 api: move go1.25 to next/70250
My CL 645115 added the new entries in the wrong place,
prematurely creating the go1.25 file.

Also, add the missing release note.

Change-Id: Ib5b5ccfb42757a9ea9dc93e33b3e3ed8e8bd7d3f
Reviewed-on: https://go-review.googlesource.com/c/go/+/662615
Auto-Submit: Alan Donovan <adonovan@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
2025-04-03 08:07:11 -07:00
Joel Sing 06f82af183 cmd/internal/obj/arm64: return a bit shift from movcon
Return the shift in bits from movcon, rather than returning an index.
This allows a number of multiplications to be removed, making the code
more readable. Scale down to an index only when encoding.

Change-Id: I1be91eb526ad95d389e2f8ce97212311551790df
Reviewed-on: https://go-review.googlesource.com/c/go/+/650939
Auto-Submit: Joel Sing <joel@sing.id.au>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-04-03 06:00:28 -07:00
Joel Sing 4464a7d94f cmd/internal/obj/arm64: deduplicate con32class
Teach conclass how to handle 32 bit values and deduplicate the code
between con32class and conclass.

Change-Id: I9c5eea31d443fd4c2ce700c6ea21e1d0bef665b0
Reviewed-on: https://go-review.googlesource.com/c/go/+/650938
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Auto-Submit: Joel Sing <joel@sing.id.au>
2025-04-03 05:20:09 -07:00
Joel Sing c524db9ca8 cmd/internal/obj/arm64: simplify conclass
Reduce repetition by pulling some common conversions into variables.

Change-Id: I8c1cc806236b5ecdadf90f4507923718fa5de9b6
Reviewed-on: https://go-review.googlesource.com/c/go/+/650937
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-04-03 04:37:34 -07:00
Michael Pratt 0b31e6d4cc runtime: cleanup M vgetrandom state before dropping P
When an M is destroyed, we put its vgetrandom state back on the shared
list for another M to reuse. This list is simply a slice, so appending
to the slice may allocate. Currently this operation is performed in
mdestroy, after the P is released, meaning allocation is not allowed.

More the cleanup earlier in mdestroy when allocation is still OK.

Also add //go:nowritebarrierrec to mdestroy since it runs without a P,
which would have caught this bug.

Fixes #73141.

Change-Id: I6a6a636c3fbf5c6eec09d07a260e39dbb4d2db12
Reviewed-on: https://go-review.googlesource.com/c/go/+/662455
Reviewed-by: Jason Donenfeld <Jason@zx2c4.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
2025-04-03 01:25:08 -07:00
Julia Lapenko 13b1261175 cmd/compile/internal/devirtualize: do not select a zero-weight edge as the hottest one
When both a direct call and an interface call appear on the same line,
PGO devirtualization may make a suboptimal decision. In some cases,
the directly called function becomes a candidate for devirtualization
if no other relevant outgoing edges with non-zero weight exist for the
caller's IRNode in the WeightedCG. The edge to this candidate is
considered the hottest. Despite having zero weight, this edge still
causes the interface call to be devirtualized.

This CL prevents devirtualization when the weight of the hottest edge
is 0.

Fixes #72092

Change-Id: I06c0c5e080398d86f832e09244aceaa4aeb98721
Reviewed-on: https://go-review.googlesource.com/c/go/+/655475
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-04-02 16:39:13 -07:00
Sergey Slukin 116b82354c cmd/compile: changed variable name due to shadowing of package name min
Change-Id: I52e5de04d137238d6f6779edcc662f5c7433c61e
Reviewed-on: https://go-review.googlesource.com/c/go/+/660195
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Robert Griesemer <gri@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-04-02 11:25:15 -07:00
khr@golang.org 144d4e5d5f runtime: simplify needzero logic
We always need to zero allocations with pointers in them. So we don't
need some of the mallocs to take a needzero argument.

Change-Id: Ideaa7b738873ba6a93addb5169791b42e2baad7c
Reviewed-on: https://go-review.googlesource.com/c/go/+/660455
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
2025-04-02 10:52:28 -07:00
qmuntal 83bbf47863 crypto/tls: use crypto/hkdf
For consistency, prefer crypto/hkdf over crypto/internal/fips140/hkdf.
Both should have the same behavior given the constrained use of HKDF
in TLS.

Change-Id: Ia982b9f7a6ea66537d748eb5ecae1ac1eade68a5
Reviewed-on: https://go-review.googlesource.com/c/go/+/658217
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Roland Shoemaker <roland@golang.org>
2025-04-02 08:50:53 -07:00
Alexander Musman 3033ef0016 cmd/compile: Remove unused 'NoInline' field from CallExpr stucture
Remove the 'NoInline' field from CallExpr stucture, as it's no longer
used after enabling of tail call inlining.

Change-Id: Ief3ada9938589e7a2f181582ef2758ebc4d03aad
Reviewed-on: https://go-review.googlesource.com/c/go/+/655816
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
2025-04-02 05:55:35 -07:00
Alan Donovan dceb77a336 cmd/vet: add waitgroup analyzer
+ relnote

Fixes #18022

Change-Id: I92d1939e9d9f16824655c6c909a5f58ed9500014
Reviewed-on: https://go-review.googlesource.com/c/go/+/661519
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Commit-Queue: Alan Donovan <adonovan@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Auto-Submit: Alan Donovan <adonovan@google.com>
2025-04-01 15:09:39 -07:00
Alan Donovan 903d7b7862 cmd/vendor: update golang.org/x/tools to v0.31.1-0.20250328151535-a857356d5cc5
Also, sys@v0.31.1.

Updates #18022

Change-Id: I15a6d1979cc1e71d3065bc50f09dc8d3f6c6cdc0
Reviewed-on: https://go-review.googlesource.com/c/go/+/661518
Auto-Submit: Alan Donovan <adonovan@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Commit-Queue: Alan Donovan <adonovan@google.com>
2025-04-01 15:08:03 -07:00
qmuntal 75bf2a8c49 internal/poll: defer IOCP association until first IO operation
Defer the association of the IOCP to the handle until the first
I/O operation is performed.

A handle can only be associated with one IOCP at a time, so this allows
external code to associate the handle with their own IOCP and still be
able to use a FD (through os.NewFile) to pass the handle around
(e.g. to a child process standard input, output, and error) without
having to worry about the IOCP association.

This CL doesn't change any user-visible behavior, as os.NewFile still
initializes the FD as non-pollable.

For #19098.

Change-Id: Id22a49846d4fda3a66ffcc0bc1b48eb39b395dc5
Reviewed-on: https://go-review.googlesource.com/c/go/+/661955
Reviewed-by: Damien Neil <dneil@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-04-01 11:58:06 -07:00
喜欢 7177f24009 log/slog: log and logAttrs initialize ctx at top
In extreme cases (e.g., ctx = nil), it is recommended to initialize the
context only once at the entry point before using log and logAttrs.

Change-Id: Ib191963f52183406d7fcd5104b60fea1a9e1bc80
GitHub-Last-Rev: e1719b9539
GitHub-Pull-Request: golang/go#73066
Reviewed-on: https://go-review.googlesource.com/c/go/+/661255
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Jonathan Amsterdam <jba@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-04-01 06:38:57 -07:00
Joel Sing 4c1b8ca98c cmd/internal/obj/riscv: add support for vector fixed-point arithmetic instructions
Add support for vector fixed-point arithmetic instructions to the
RISC-V assembler. This includes single width saturating addition
and subtraction, averaging addition and subtraction and scaling
shift instructions.

Change-Id: I9aa27e9565ad016ba5bb2b479e1ba70db24e4ff5
Reviewed-on: https://go-review.googlesource.com/c/go/+/646776
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-04-01 05:31:22 -07:00
Russ Cox 4c32b1cc75 runtime: fix plan9 monotonic time, crypto randomness
Open /dev/bintime at process start on Plan 9,
marked close-on-exec, hold it open for the duration of the
process, and use it for obtaining time.

The change to using /dev/bintime also sets up for an upcoming
Plan 9 change to add monotonic time to that file. If the monotonic
field is available, then nanotime1 and time.now use that field.
Otherwise they fall back to using Unix nanoseconds as "monotonic",
as they always have.

Before this CL, monotonic time went backward any time
aux/timesync decided to adjust the system's time-of-day backward.

Also use /dev/random for randomness (once at startup).
Before this CL, there was no real randomness in the runtime
on Plan 9 (the crypto/rand package still had some). Now there will be.

Change-Id: I0c20ae79d3d96eff1a5f839a56cec5c4bc517e61
Reviewed-on: https://go-review.googlesource.com/c/go/+/656755
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Bypass: Russ Cox <rsc@golang.org>
Reviewed-by: Carlos Amedee <carlos@golang.org>
2025-03-31 17:06:53 -07:00
Damien Neil 6d418096b2 os: avoid symlink races in RemoveAll on Windows
Make the openat-using version of RemoveAll use the appropriate
Windows equivalent, via new portable (but internal) functions
added for os.Root.

We could reimplement everything in terms of os.Root,
but this is a bit simpler and keeps the existing code structure.

Fixes #52745

Change-Id: I0eba0286398b351f2ee9abaa60e1675173988787
Reviewed-on: https://go-review.googlesource.com/c/go/+/661575
Reviewed-by: Alan Donovan <adonovan@google.com>
Auto-Submit: Damien Neil <dneil@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-03-31 15:36:10 -07:00
Damien Neil c6a1dc4729 cmd/link: close file in tempdir so Windows can delete it
Fixes #73098

Change-Id: I9f5570903071b15df9e4f8a1820414f305db9d35
Reviewed-on: https://go-review.googlesource.com/c/go/+/661915
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Damien Neil <dneil@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2025-03-31 15:30:17 -07:00
qmuntal b9cbb65384 os,internal/poll: support I/O on overlapped handles not added to the poller
Calling syscall.ReadFile and syscall.WriteFile on overlapped handles
always need to be passed a valid *syscall.Overlapped structure, even if
the handle is not added to a IOCP (like the Go runtime poller). Else,
the syscall will fail with ERROR_INVALID_PARAMETER.

We also need to handle ERROR_IO_PENDING errors when the overlapped
handle is not added to the poller, in which case we need to block until
the operation completes.

Previous CLs already added support for overlapped handles to the poller,
mostly to keep track of the file offset independently of the file
pointer (which is not supported for overlapped handles).

Fixed #15388.
Updates #19098.

Change-Id: I2103ab892a37d0e326752ae8c2771a43c13ba42e
Reviewed-on: https://go-review.googlesource.com/c/go/+/661795
Auto-Submit: Quim Muntal <quimmuntal@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Damien Neil <dneil@google.com>
2025-03-31 12:01:49 -07:00
Mateusz Poliwczak eec3745bd7 cmd/compile/internal/ssa: replace uses of interface{} with Sym/Aux
Change-Id: I0a3ce2e823697eee5bb5e7d5ea0ef025132c0689
Reviewed-on: https://go-review.googlesource.com/c/go/+/661655
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
2025-03-31 08:20:16 -07:00
thepudds bfc209518e internal/runtime/maps: speed up small map lookups ~1.7x for unpredictable keys
On master, lookups on small Swiss Table maps (<= 8 elements) for
non-specialized key types are seemingly a performance regression
compared to the Go 1.23 map implementation (reported in #70849).
Currently, a linear scan is used for gets in these cases.

This CL changes (*Map).getWithKeySmall to instead use the SIMD or SWAR
match on the control bytes to then jump to candidate matching slots,
with sample results below for a 16-byte key. This especially helps the
hit case when the key is unpredictable, which previously had to scan an
unpredictable number of control bytes to find a candidate slot when the
key is unpredictable.

Separately, other CLs in this stack modify the main Swiss Table
benchmarks to randomize lookup key order (vs. previously most of the
benchmarks had a repeating lookup key ordering, which likely is
predictable until the map is too big). We have sample results for the
randomized key order benchmarks followed by results from the older
benchmarks.

The first table below is with randomized key order. For hits, the older
results get slower as there are more elements. With this CL, we see hits
for unpredictable key ordering (sizes 2-8) get a ~1.7x speedup from
~25ns to ~14ns, with a now consistent lookup time for the different
sizes. (The 1 element size map has a predictable key ordering because
there is only one key, and that reports a modest ~0.5ns or ~3%
performance penalty). Misses for unpredictable key order get a ~1.3x
speedup, from ~13ns to ~10ns, with similar results for the 1 element
size.

                                                   │ no-fix-new-bmarks  │ fix-with-new-bmarks   │
                                                   │     sec/op         │  sec/op       vs base │
MapSmallAccessHit/Key=smallType/Elem=int32/len=1-4        13.26n ±  0%   13.64n ±  0%   +2.90% (p=0.000 n=20)
MapSmallAccessHit/Key=smallType/Elem=int32/len=2-4        19.47n ±  0%   13.62n ±  0%  -30.05% (p=0.000 n=20)
MapSmallAccessHit/Key=smallType/Elem=int32/len=3-4        22.23n ±  0%   13.64n ±  0%  -38.68% (p=0.000 n=20)
MapSmallAccessHit/Key=smallType/Elem=int32/len=4-4        23.98n ±  0%   13.64n ±  0%  -43.11% (p=0.000 n=20)
MapSmallAccessHit/Key=smallType/Elem=int32/len=5-4        25.02n ±  0%   13.67n ±  0%  -45.35% (p=0.000 n=20)
MapSmallAccessHit/Key=smallType/Elem=int32/len=6-4        25.77n ±  1%   13.68n ±  2%  -46.89% (p=0.000 n=20)
MapSmallAccessHit/Key=smallType/Elem=int32/len=7-4        26.38n ±  0%   13.64n ±  0%  -48.28% (p=0.000 n=20)
MapSmallAccessHit/Key=smallType/Elem=int32/len=8-4        26.31n ±  0%   13.71n ± 21%  -47.90% (p=0.000 n=20)
MapSmallAccessMiss/Key=smallType/Elem=int32/len=1-4      13.055n ±  0%   9.815n ±  0%  -24.82% (p=0.000 n=20)
MapSmallAccessMiss/Key=smallType/Elem=int32/len=2-4      13.070n ±  0%   9.813n ±  0%  -24.92% (p=0.000 n=20)
MapSmallAccessMiss/Key=smallType/Elem=int32/len=3-4      13.060n ±  0%   9.819n ±  0%  -24.82% (p=0.000 n=20)
MapSmallAccessMiss/Key=smallType/Elem=int32/len=4-4      13.075n ±  0%   9.816n ±  0%  -24.92% (p=0.000 n=20)
MapSmallAccessMiss/Key=smallType/Elem=int32/len=5-4      13.060n ±  0%   9.826n ±  0%  -24.76% (p=0.000 n=20)
MapSmallAccessMiss/Key=smallType/Elem=int32/len=6-4      13.095n ± 19%   9.834n ± 31%  -24.90% (p=0.000 n=20)
MapSmallAccessMiss/Key=smallType/Elem=int32/len=7-4      13.075n ± 19%   9.822n ± 27%  -24.88% (p=0.000 n=20)
MapSmallAccessMiss/Key=smallType/Elem=int32/len=8-4       13.11n ± 16%   12.14n ± 19%   -7.43% (p=0.000 n=20)

The next table uses the original benchmarks from just before this CL
stack (i.e., without shuffling lookup keys).

With this CL, we see improvement that is directionally similar to the
above results but not as large, presumably because the branches in the
linear scan are fairly predictable with predictable keys. (The numbers
here also include the time from a mod in the benchmark code, which
seemed to take around ~1/3 of CPU time based on spot checking a couple
of examples, vs. the modified benchmarks shown above have removed that
mod).

                                                  │ master-8c3e391573 │   just-fix-with-old-bmarks       │
                                                  │      sec/op       │    sec/op     vs base            │
MapSmallAccessHit/Key=smallType/Elem=int32/len=1-4      20.85n ±  0%   21.69n ±  0%   +4.03% (p=0.000 n=20)
MapSmallAccessHit/Key=smallType/Elem=int32/len=2-4      21.22n ±  0%   21.70n ±  0%   +2.24% (p=0.000 n=20)
MapSmallAccessHit/Key=smallType/Elem=int32/len=3-4      21.73n ±  0%   21.71n ±  0%        ~ (p=0.158 n=20)
MapSmallAccessHit/Key=smallType/Elem=int32/len=4-4      22.06n ±  0%   21.71n ±  0%   -1.56% (p=0.000 n=20)
MapSmallAccessHit/Key=smallType/Elem=int32/len=5-4      22.41n ±  0%   21.73n ±  0%   -3.01% (p=0.000 n=20)
MapSmallAccessHit/Key=smallType/Elem=int32/len=6-4      22.71n ±  0%   21.72n ±  0%   -4.38% (p=0.000 n=20)
MapSmallAccessHit/Key=smallType/Elem=int32/len=7-4      22.98n ±  0%   21.71n ±  0%   -5.53% (p=0.000 n=20)
MapSmallAccessHit/Key=smallType/Elem=int32/len=8-4      23.20n ±  0%   21.72n ±  0%   -6.36% (p=0.000 n=20)
MapSmallAccessMiss/Key=smallType/Elem=int32/len=1-4     19.95n ±  0%   17.30n ±  0%  -13.28% (p=0.000 n=20)
MapSmallAccessMiss/Key=smallType/Elem=int32/len=2-4     19.96n ±  0%   17.31n ±  0%  -13.28% (p=0.000 n=20)
MapSmallAccessMiss/Key=smallType/Elem=int32/len=3-4     19.95n ±  0%   17.29n ±  0%  -13.33% (p=0.000 n=20)
MapSmallAccessMiss/Key=smallType/Elem=int32/len=4-4     19.95n ±  0%   17.30n ±  0%  -13.29% (p=0.000 n=20)
MapSmallAccessMiss/Key=smallType/Elem=int32/len=5-4     19.96n ± 25%   17.32n ±  0%  -13.22% (p=0.000 n=20)
MapSmallAccessMiss/Key=smallType/Elem=int32/len=6-4     19.99n ± 24%   17.29n ±  0%  -13.51% (p=0.000 n=20)
MapSmallAccessMiss/Key=smallType/Elem=int32/len=7-4     19.97n ± 20%   17.34n ± 16%  -13.14% (p=0.000 n=20)
MapSmallAccessMiss/Key=smallType/Elem=int32/len=8-4     20.02n ± 11%   17.33n ± 14%  -13.44% (p=0.000 n=20)
geomean                                                 21.02n         19.39n         -7.78%

See #70849 for additional benchmark results, including results for arm64
(which also means without SIMD support).

Updates #54766
Updates #70700
Fixes #70849

Change-Id: Ic2361bb6fc15b4436d1d1d5be7e4712e547f611b
Reviewed-on: https://go-review.googlesource.com/c/go/+/634396
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-03-30 21:39:07 -07:00
Joel Sing 391dde29a3 cmd/internal/obj/arm64: factor out constant classification code
This will allow for further improvements and deduplication.

Change-Id: I9374fc2d16168ced06f3fcc9e558a9c85e24fd01
Reviewed-on: https://go-review.googlesource.com/c/go/+/650936
Reviewed-by: Fannie Zhang <Fannie.Zhang@arm.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2025-03-29 06:30:04 -07:00
Joel Sing 535e0daefd cmd/internal/obj/riscv: add support for vector integer arithmetic instructions
Add support for vector integer arithmetic instructions to the RISC-V
assembler. This includes vector addition, subtraction, integer
extension, add-with-carry, subtract-with-borrow, bitwise logical
operations, comparison, min/max, integer division and multiplication
instructions.

Change-Id: I8c191ef8e31291e13743732903e4f12356133a46
Reviewed-on: https://go-review.googlesource.com/c/go/+/646775
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
2025-03-29 05:54:51 -07:00
Cherry Mui 5fb9e5dc19 cmd/link: handle Mach-O X86_64_RELOC_SUBTRACTOR in internal linking
With recent LLVM toolchain, on macOS/AMD64, the race detector syso
file built from it contains X86_64_RELOC_SUBTRACTOR relocations,
which the Go linker currently doesn't handle in internal linking
mode. To ensure internal linking mode continue to work with the
race detector syso, this CL adds support of X86_64_RELOC_SUBTRACTOR
relocations.

X86_64_RELOC_SUBTRACTOR is actually a pair of relocations that
resolves to the difference between two symbol addresses (each
relocation specifies a symbol). For the cases we care (the race
syso), the symbol being subtracted out is always in the current
section, so we can just convert it to a PC-relative relocation,
with the addend adjusted. If later we need the more general form,
we can introduce a new mechanism (say, objabi.R_DIFF) that works
as a pair of relocations like the Mach-O one.

As we expect the pair of relocations be consecutive, don't reorder
(sort) relocation records when loading Mach-O objects.

Change-Id: I757456b07270fb4b2a41fd0fef67a2b39dd6b238
Reviewed-on: https://go-review.googlesource.com/c/go/+/660715
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Than McIntosh <thanm@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2025-03-28 14:35:41 -07:00
qmuntal b9934d855c internal/poll: honor ERROR_OPERATION_ABORTED if pipe is not closed
FD.Read converts a syscall.ERROR_OPERATION_ABORTED error to
ErrFileClosing. It does that in case the pipe operation was aborted by
a CancelIoEx call in FD.Close.

It doesn't take into account that the operation might have been
aborted by a CancelIoEx call in external code. In that case, the
operation should return the error as is.

Change-Id: I75dcf0edaace8b57dc47b398ea591ca9f116112b
Reviewed-on: https://go-review.googlesource.com/c/go/+/661555
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Damien Neil <dneil@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2025-03-28 14:14:48 -07:00
Michael Anthony Knyszek 5ec76ae5aa weak: clarify Pointer equality semantics
The docs currently are imprecise about comparisons. This could lead
users to believe that objects of the same type, allocated at the same
address, could produce weak pointers that are equal to
previously-created weak pointers. This is not the case. Weak pointers
map to objects, not addresses.

Update the documentation to state precisely that if two pointers do not
compare equal, then two weak pointers created from those two pointers
are guaranteed not to compare equal. Since a future pointer pointing to
the same address is not comparable with a pointer produced *before* an
object at that address has been reclaimed, this is sufficient to explain
that weak pointers map 1:1 with object offsets, not addresses.

(An object slot cannot be reused unless that slot is unreachable, so
by construction, there's never an opportunity to compare an "old" and
"new" pointer unless one uses unsafe tricks that violate the
unsafe.Pointer rules.)

Fixes #71381.

Change-Id: I5509fd433cde013926d725694d480c697a8bc911
Reviewed-on: https://go-review.googlesource.com/c/go/+/643935
Reviewed-by: Carlos Amedee <carlos@golang.org>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Filippo Valsorda <filippo@golang.org>
2025-03-28 13:50:18 -07:00
Cherry Mui 8f6c083d7b cmd/link: choose one with larger size for duplicated BSS symbols
When two packages declare a variable with the same name (with
linkname at least on one side), the linker will choose one as the
actual definition of the symbol if one has content (i.e. a DATA
symbol) and the other does not (i.e. a BSS symbol). When both have
content, it is redefinition error. When neither has content,
currently the choice is sort of arbitrary (depending on symbol
loading order, etc. which are subject to change).

One use case for that is that one wants to reference a symbol
defined in another package, and the reference side just wants to
see some of the fields, so it may be declared with a smaller type.
In this case, we want to choose the one with the larger size as
the true definition. Otherwise the code accessing the larger
sized one may read/write out of bounds, corrupting the next
variable. This CL makes the linker do so.

Fixes #72032.

Change-Id: I160aa9e0234702066cb8f141c186eaa89d0fcfed
Reviewed-on: https://go-review.googlesource.com/c/go/+/660696
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Than McIntosh <thanm@golang.org>
2025-03-28 12:00:23 -07:00
Damien Neil 26fdb07d4c os: add Root.Symlink
For #67002

Change-Id: Ia1637b61eae49e97e1d07f058ad2390e74cd3403
Reviewed-on: https://go-review.googlesource.com/c/go/+/660635
Reviewed-by: Alan Donovan <adonovan@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Quim Muntal <quimmuntal@gmail.com>
Auto-Submit: Damien Neil <dneil@google.com>
2025-03-28 11:02:40 -07:00
qmuntal 656b5b3abe internal/poll: don't skip empty writes on Windows
Empty writes might be important for some protocols. Let Windows decide
what do with them rather than skipping them on our side. This is inline
with the behavior of other platforms.

While here, refactor the Read/Write/Pwrite methods to reduce one
indentation level and make the code easier to read.

Fixes #73084.

Change-Id: Ic5393358e237d53b8be6097cd7359ac0ff205309
Reviewed-on: https://go-review.googlesource.com/c/go/+/661435
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Damien Neil <dneil@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-03-28 09:44:35 -07:00
Joel Sing e6c2e12c63 cmd/compile/internal/ssa: optimise more branches with zero on riscv64
Optimise more branches with zero on riscv64. In particular, BLTU with
zero occurs with IsInBounds checks for index zero. This currently results
in two instructions and requires an additional register:

   li      t2, 0
   bltu    t2, t1, 0x174b4

This is equivalent to checking if the bounds is not equal to zero. With
this change:

   bnez    t1, 0x174c0

This removes more than 500 instructions from the Go binary on riscv64.

Change-Id: I6cd861d853e3ef270bd46dacecdfaa205b1c4644
Reviewed-on: https://go-review.googlesource.com/c/go/+/606715
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2025-03-28 01:27:22 -07:00
Damien Neil cfc784a152 os: avoid panic in Root when symlink references the root
We would panic when opening a symlink ending in ..,
where the symlink references the root itself.

Fixes #73081

Change-Id: I7dc3f041ca79df7942feec58c197fde6881ecae5
Reviewed-on: https://go-review.googlesource.com/c/go/+/661416
Reviewed-by: Alan Donovan <adonovan@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-03-27 22:25:37 -07:00
Junyang Shao b17a99d6fc cmd/compile: update GOSSAFUNC doc for printing CFG
Updates #30074

Change-Id: I160124afb65849c624a225d384c35313723f9f30
Reviewed-on: https://go-review.googlesource.com/c/go/+/661415
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Junyang Shao <shaojunyang@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
2025-03-27 15:39:38 -07:00
Keith Randall a645bc5eb9 maps: implement faster clone
│     base     │             experiment              │
            │    sec/op    │   sec/op     vs base                │
MapClone-24   66.802m ± 7%   3.348m ± 2%  -94.99% (p=0.000 n=10)

Fixes #70836

Change-Id: I9e192b1ee82e18f5580ff18918307042a337fdcc
Reviewed-on: https://go-review.googlesource.com/c/go/+/660175
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
2025-03-27 14:21:20 -07:00
Mark Freeman 6722c008c1 cmd/compile: rename some test packages in codegen
All other files here use the codegen package.

Change-Id: I714162941b9fa9051dacc29643e905fe60b9304b
Reviewed-on: https://go-review.googlesource.com/c/go/+/661135
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
2025-03-27 13:54:37 -07:00
Junyang Shao 4d6722a8fd cmd/compile: match more patterns for shortcircuit
This CL tries to generalize the pattern matching of certain
shortcircuit-able CFGs a bit more:
For a shortcircuit-able CFG:
p   q
 \ /
  b
 / \
t   u
Where the constant branch is t, and b has multiple phi values
other than the control phi.

For the non-constant branch target u, we try to match the
"diamond" shape CFG:
p   q
 \ /
  b
 / \
t   u
 \ /
  m

or

p   q
 \ /
  b
  |\
  | u
  |/
  m

Instead of matching u as a single block, we know try to
generalize it as a subgraph that satisfy condition:
it's a DAG that has a single entry point u, and has a
path to m.

compilebench stats:
                         │   old.txt   │              new.txt               │
                         │   sec/op    │   sec/op     vs base               │
Template                   109.4m ± 3%   109.8m ± 3%       ~ (p=0.796 n=10)
Unicode                    94.23m ± 1%   93.85m ± 1%       ~ (p=0.631 n=10)
GoTypes                    538.2m ± 1%   538.8m ± 1%       ~ (p=0.912 n=10)
Compiler                   90.21m ± 1%   90.02m ± 1%       ~ (p=0.436 n=10)
SSA                         3.318 ± 1%    3.323 ± 1%       ~ (p=1.000 n=10)
Flate                      69.38m ± 1%   69.57m ± 2%       ~ (p=0.529 n=10)
GoParser                   128.5m ± 1%   127.4m ± 1%       ~ (p=0.075 n=10)
Reflect                    267.4m ± 1%   267.2m ± 2%       ~ (p=0.739 n=10)
Tar                        127.7m ± 2%   126.4m ± 1%       ~ (p=0.353 n=10)
XML                        149.5m ± 1%   149.6m ± 2%       ~ (p=0.684 n=10)
LinkCompiler               390.0m ± 1%   388.4m ± 2%       ~ (p=0.353 n=10)
ExternalLinkCompiler        1.296 ± 0%    1.296 ± 1%       ~ (p=0.971 n=10)
LinkWithoutDebugCompiler   226.3m ± 1%   225.5m ± 1%       ~ (p=0.393 n=10)
StdCmd                      13.26 ± 0%    13.25 ± 1%       ~ (p=0.529 n=10)
geomean                    319.3m        318.8m       -0.17%

                         │   old.txt   │               new.txt               │
                         │ user-sec/op │ user-sec/op   vs base               │
Template                   293.1m ± 3%   291.4m ± 11%       ~ (p=0.436 n=10)
Unicode                    91.09m ± 5%   87.61m ±  7%       ~ (p=0.165 n=10)
GoTypes                     1.932 ± 3%    1.926 ±  3%       ~ (p=0.739 n=10)
Compiler                   125.8m ± 3%   121.5m ± 10%       ~ (p=0.481 n=10)
SSA                         18.93 ± 3%    18.89 ±  1%       ~ (p=0.684 n=10)
Flate                      158.5m ± 5%   160.0m ±  7%       ~ (p=0.971 n=10)
GoParser                   316.0m ± 9%   327.4m ±  7%       ~ (p=0.052 n=10)
Reflect                    845.6m ± 6%   861.6m ±  3%       ~ (p=0.579 n=10)
Tar                        358.1m ± 5%   348.5m ±  4%       ~ (p=0.089 n=10)
XML                        382.4m ± 4%   392.2m ±  3%       ~ (p=0.143 n=10)
LinkCompiler               609.1m ± 4%   627.9m ±  3%       ~ (p=0.123 n=10)
ExternalLinkCompiler        1.336 ± 2%    1.343 ±  4%       ~ (p=0.565 n=10)
LinkWithoutDebugCompiler   248.7m ± 3%   248.0m ±  1%       ~ (p=0.853 n=10)
geomean                    506.4m        506.8m        +0.08%

          │   old.txt    │               new.txt               │
          │  text-bytes  │  text-bytes   vs base               │
HelloSize   965.8Ki ± 0%   965.0Ki ± 0%  -0.08% (p=0.000 n=10)
CmdGoSize   12.30Mi ± 0%   12.29Mi ± 0%  -0.08% (p=0.000 n=10)
geomean     3.406Mi        3.403Mi       -0.08%

          │   old.txt    │                new.txt                │
          │  data-bytes  │  data-bytes   vs base                 │
HelloSize   15.08Ki ± 0%   15.08Ki ± 0%       ~ (p=1.000 n=10) ¹
CmdGoSize   408.5Ki ± 0%   408.5Ki ± 0%       ~ (p=1.000 n=10) ¹
geomean     78.49Ki        78.49Ki       +0.00%
¹ all samples are equal

          │   old.txt    │                new.txt                │
          │  bss-bytes   │  bss-bytes    vs base                 │
HelloSize   142.0Ki ± 0%   142.0Ki ± 0%       ~ (p=1.000 n=10) ¹
CmdGoSize   206.4Ki ± 0%   206.4Ki ± 0%       ~ (p=1.000 n=10) ¹
geomean     171.2Ki        171.2Ki       +0.00%
¹ all samples are equal

          │   old.txt    │               new.txt               │
          │  exe-bytes   │  exe-bytes    vs base               │
HelloSize   1.466Mi ± 0%   1.462Mi ± 0%  -0.27% (p=0.000 n=10)
CmdGoSize   18.19Mi ± 0%   18.17Mi ± 0%  -0.10% (p=0.000 n=10)
geomean     5.164Mi        5.154Mi       -0.18%

Fixes #72132

Change-Id: I3d1cb10b6a158c5750adc23c79709d63dbd771f3
Reviewed-on: https://go-review.googlesource.com/c/go/+/656255
Auto-Submit: Junyang Shao <shaojunyang@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
2025-03-27 12:30:03 -07:00
Alan Donovan b3aff930cf go/types: LookupSelection: returns LookupFieldOrMethod as a Selection
Also, rewrite some uses of LookupFieldOrMethod in terms of it.

+ doc, relnote

Fixes #70737

Change-Id: I58a6dd78ee78560d8b6ea2d821381960a72660ab
Reviewed-on: https://go-review.googlesource.com/c/go/+/647196
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Robert Griesemer <gri@google.com>
2025-03-27 12:29:28 -07:00
Mateusz Poliwczak e9242ee812 cmd/compile: remove references to *gc.Node in docs
ssa.Sym is only implemented by *ir.Name or *obj.LSym.

Change-Id: Ia171db618abd8b438fcc2cf402f40f3fe3ec6833
Reviewed-on: https://go-review.googlesource.com/c/go/+/660995
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2025-03-27 11:25:27 -07:00
qmuntal af53bd2c03 internal/syscall/windows: run go generate
CL 660595 manually edited zsyscall_windows.go, making it be out of sync
with its associated //sys directive. Longtest builders are failing
due to that.

Fixes #73069.

Change-Id: If7256ef4b831423e4925fb6e5656fe3f7ef77fea
Reviewed-on: https://go-review.googlesource.com/c/go/+/661275
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2025-03-27 08:58:05 -07:00
Joel Sing d88f93f720 cmd/internal/obj/riscv,internal/bytealg: synthesize MIN/MAX/MINU/MAXU instructions
Provide a synthesized version of the MIN/MAX/MINU/MAXU instructions
if they're not natively available. This allows these instructions to
be used in assembly unconditionally.

Use MIN in internal/bytealg.compare.

Cq-Include-Trybots: luci.golang.try:gotip-linux-riscv64
Change-Id: I8a5a3a59f0a9205e136fc3d673b23eaf3ca469f8
Reviewed-on: https://go-review.googlesource.com/c/go/+/653295
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-03-27 05:52:28 -07:00
Joel Sing d37624881f cmd/internal/obj/riscv: improve constant construction
Attempt to construct large constants that have a consecutive sequence
of ones from a small negative constant, with a logical right and/or
left shift. This allows for a large range of mask like constants to be
constructed with only two or three instructions, avoiding the need to
load from memory.

Change-Id: I35a77fecdd2df0ed3f33b772d518f85119d4ff66
Reviewed-on: https://go-review.googlesource.com/c/go/+/652778
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
2025-03-27 04:26:47 -07:00
Guoqi Chen 1763ee199d runtime: optimizing memclrNoHeapPointers implementation using SIMD on loong64
goos: linux
goarch: loong64
pkg: runtime
cpu: Loongson-3A6000 @ 2500.00MHz
                        |  bench.old   |            bench.new.256             |
                        |    sec/op    |    sec/op     vs base                |
Memclr/5                   3.204n ± 0%    2.804n ± 0%  -12.48% (p=0.000 n=10)
Memclr/16                  3.204n ± 0%    3.204n ± 0%        ~ (p=0.465 n=10)
Memclr/64                  5.267n ± 0%    4.005n ± 0%  -23.96% (p=0.000 n=10)
Memclr/256                10.280n ± 0%    5.400n ± 0%  -47.47% (p=0.000 n=10)
Memclr/4096               107.00n ± 1%    30.24n ± 0%  -71.74% (p=0.000 n=10)
Memclr/65536              1675.0n ± 0%    431.1n ± 0%  -74.26% (p=0.000 n=10)
Memclr/1M                  52.61µ ± 0%    32.82µ ± 0%  -37.62% (p=0.000 n=10)
Memclr/4M                  210.3µ ± 0%    131.3µ ± 0%  -37.59% (p=0.000 n=10)
Memclr/8M                  420.3µ ± 0%    262.5µ ± 0%  -37.54% (p=0.000 n=10)
Memclr/16M                 857.4µ ± 1%    542.9µ ± 3%  -36.68% (p=0.000 n=10)
Memclr/64M                 3.658m ± 3%    2.173m ± 0%  -40.59% (p=0.000 n=10)
MemclrUnaligned/0_5        4.264n ± 1%    4.359n ± 0%   +2.23% (p=0.000 n=10)
MemclrUnaligned/0_16       4.595n ± 0%    4.599n ± 0%   +0.10% (p=0.020 n=10)
MemclrUnaligned/0_64       5.356n ± 0%    5.122n ± 0%   -4.37% (p=0.000 n=10)
MemclrUnaligned/0_256     10.370n ± 0%    5.907n ± 1%  -43.03% (p=0.000 n=10)
MemclrUnaligned/0_4096    107.10n ± 0%    37.35n ± 0%  -65.13% (p=0.000 n=10)
MemclrUnaligned/0_65536   1694.0n ± 0%    441.7n ± 0%  -73.93% (p=0.000 n=10)
MemclrUnaligned/1_5        4.272n ± 0%    4.348n ± 0%   +1.76% (p=0.000 n=10)
MemclrUnaligned/1_16       4.593n ± 0%    4.608n ± 0%   +0.33% (p=0.002 n=10)
MemclrUnaligned/1_64       7.610n ± 0%    5.293n ± 0%  -30.45% (p=0.000 n=10)
MemclrUnaligned/1_256     12.230n ± 0%    9.012n ± 0%  -26.31% (p=0.000 n=10)
MemclrUnaligned/1_4096    114.10n ± 0%    39.50n ± 0%  -65.38% (p=0.000 n=10)
MemclrUnaligned/1_65536   1705.0n ± 0%    468.8n ± 0%  -72.50% (p=0.000 n=10)
MemclrUnaligned/4_5        4.283n ± 1%    4.346n ± 0%   +1.48% (p=0.000 n=10)
MemclrUnaligned/4_16       4.599n ± 0%    4.605n ± 0%   +0.12% (p=0.000 n=10)
MemclrUnaligned/4_64       7.572n ± 1%    5.283n ± 0%  -30.24% (p=0.000 n=10)
MemclrUnaligned/4_256     12.215n ± 0%    9.212n ± 0%  -24.58% (p=0.000 n=10)
MemclrUnaligned/4_4096    114.35n ± 0%    39.48n ± 0%  -65.47% (p=0.000 n=10)
MemclrUnaligned/4_65536   1705.0n ± 0%    469.2n ± 0%  -72.48% (p=0.000 n=10)
MemclrUnaligned/7_5        4.296n ± 1%    4.349n ± 0%   +1.22% (p=0.000 n=10)
MemclrUnaligned/7_16       4.601n ± 0%    4.606n ± 0%   +0.11% (p=0.004 n=10)
MemclrUnaligned/7_64       7.609n ± 0%    5.296n ± 1%  -30.39% (p=0.000 n=10)
MemclrUnaligned/7_256     12.200n ± 0%    9.011n ± 0%  -26.14% (p=0.000 n=10)
MemclrUnaligned/7_4096    114.00n ± 0%    39.51n ± 0%  -65.34% (p=0.000 n=10)
MemclrUnaligned/7_65536   1704.0n ± 0%    469.5n ± 0%  -72.45% (p=0.000 n=10)
MemclrUnaligned/0_1M       52.57µ ± 0%    32.83µ ± 0%  -37.54% (p=0.000 n=10)
MemclrUnaligned/0_4M       210.1µ ± 0%    131.3µ ± 0%  -37.53% (p=0.000 n=10)
MemclrUnaligned/0_8M       420.8µ ± 0%    262.5µ ± 0%  -37.62% (p=0.000 n=10)
MemclrUnaligned/0_16M      846.2µ ± 0%    528.4µ ± 0%  -37.56% (p=0.000 n=10)
MemclrUnaligned/0_64M      3.425m ± 1%    2.187m ± 3%  -36.16% (p=0.000 n=10)
MemclrUnaligned/1_1M       52.56µ ± 0%    32.84µ ± 0%  -37.52% (p=0.000 n=10)
MemclrUnaligned/1_4M       210.5µ ± 0%    131.3µ ± 0%  -37.62% (p=0.000 n=10)
MemclrUnaligned/1_8M       420.5µ ± 0%    262.7µ ± 0%  -37.53% (p=0.000 n=10)
MemclrUnaligned/1_16M      845.2µ ± 0%    528.3µ ± 0%  -37.49% (p=0.000 n=10)
MemclrUnaligned/1_64M      3.381m ± 0%    2.243m ± 3%  -33.66% (p=0.000 n=10)
MemclrUnaligned/4_1M       52.56µ ± 0%    32.85µ ± 0%  -37.50% (p=0.000 n=10)
MemclrUnaligned/4_4M       210.1µ ± 0%    131.3µ ± 0%  -37.49% (p=0.000 n=10)
MemclrUnaligned/4_8M       420.0µ ± 0%    262.6µ ± 0%  -37.48% (p=0.000 n=10)
MemclrUnaligned/4_16M      844.8µ ± 0%    528.7µ ± 0%  -37.41% (p=0.000 n=10)
MemclrUnaligned/4_64M      3.382m ± 1%    2.211m ± 4%  -34.63% (p=0.000 n=10)
MemclrUnaligned/7_1M       52.59µ ± 0%    32.84µ ± 0%  -37.56% (p=0.000 n=10)
MemclrUnaligned/7_4M       210.2µ ± 0%    131.3µ ± 0%  -37.54% (p=0.000 n=10)
MemclrUnaligned/7_8M       420.1µ ± 0%    262.7µ ± 0%  -37.47% (p=0.000 n=10)
MemclrUnaligned/7_16M      845.1µ ± 0%    528.7µ ± 0%  -37.43% (p=0.000 n=10)
MemclrUnaligned/7_64M      3.369m ± 0%    2.313m ± 1%  -31.34% (p=0.000 n=10)
MemclrRange/1K_2K         2707.0n ± 0%    972.4n ± 0%  -64.08% (p=0.000 n=10)
MemclrRange/2K_8K          8.816µ ± 0%    2.519µ ± 0%  -71.43% (p=0.000 n=10)
MemclrRange/4K_16K         8.333µ ± 0%    2.240µ ± 0%  -73.12% (p=0.000 n=10)
MemclrRange/160K_228K      83.47µ ± 0%    31.27µ ± 0%  -62.54% (p=0.000 n=10)
MemclrKnownSize1          0.4003n ± 0%   0.4004n ± 0%        ~ (p=0.119 n=10)
MemclrKnownSize2          0.4003n ± 0%   0.4005n ± 0%        ~ (p=0.069 n=10)
MemclrKnownSize4          0.4003n ± 0%   0.4005n ± 0%        ~ (p=0.100 n=10)
MemclrKnownSize8          0.4003n ± 0%   0.4004n ± 0%   +0.04% (p=0.047 n=10)
MemclrKnownSize16         0.8011n ± 0%   0.8012n ± 0%        ~ (p=0.926 n=10)
MemclrKnownSize32          1.602n ± 0%    1.602n ± 0%        ~ (p=0.772 n=10)
MemclrKnownSize64          2.405n ± 0%    2.404n ± 0%        ~ (p=0.780 n=10)
MemclrKnownSize112         2.804n ± 0%    2.804n ± 0%        ~ (p=0.538 n=10)
MemclrKnownSize128         3.204n ± 0%    3.205n ± 0%        ~ (p=0.105 n=10)
MemclrKnownSize192         4.808n ± 0%    4.807n ± 0%        ~ (p=0.688 n=10)
MemclrKnownSize248         6.347n ± 0%    6.346n ± 0%        ~ (p=0.133 n=10)
MemclrKnownSize256         6.560n ± 0%    6.573n ± 0%   +0.19% (p=0.001 n=10)
MemclrKnownSize512        13.010n ± 0%    6.809n ± 0%  -47.66% (p=0.000 n=10)
MemclrKnownSize1024       25.830n ± 0%    8.412n ± 0%  -67.43% (p=0.000 n=10)
MemclrKnownSize4096       102.70n ± 0%    27.64n ± 0%  -73.09% (p=0.000 n=10)
MemclrKnownSize512KiB      26.30µ ± 0%    16.42µ ± 0%  -37.59% (p=0.000 n=10)
geomean                    629.8n         393.2n       -37.57%

Change-Id: I2b9fe834c31d786d2e30cc02c65a6f9c455c4e8d
Reviewed-on: https://go-review.googlesource.com/c/go/+/657835
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
2025-03-26 17:58:32 -07:00
qmuntal d69ab99f3f net: run unix socket stream tests on Windows
The net package supports Unix domain sockets on Windows, but most of
the tests related to them are skipped.

This CL unskip the SOCK_STREAM tests. SOCK_DGRAM probablye can also
make to work, but that will come in a follow-up CL.

Change-Id: If9506a8af57e9bfe58bd7b48a98fc39335627a61
Reviewed-on: https://go-review.googlesource.com/c/go/+/660915
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Damien Neil <dneil@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2025-03-26 13:22:29 -07:00
qmuntal 440a8f7024 internal/poll: support async file operations on Windows
This CL adds support for async file operations on Windows. The affected
functions are Read, Write, Pread, and Pwrite.

The code has been slightly refactored to avoid duplication. Both the
async and sync variants follow the same code path, with the exception of
the async variant passes an overlapped structure to the syscalls
and supports the use of a completion port.

This doesn't change any user-facing behavior, as the os package still
sets the pollable parameter to false when calling FD.Init.

For #19098.

Change-Id: Iead6e51fa8f57e83456eb5ccdce28c2ea3846cc2
Reviewed-on: https://go-review.googlesource.com/c/go/+/660595
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Damien Neil <dneil@google.com>
2025-03-26 13:05:03 -07:00
Jonathan Amsterdam b138f8e4d2 log/slog: Handler doc points to handler guide
There's a link in the package doc, but there should be one here too.

For #73057.

Change-Id: I8f8fe73f20bb6dd49cdf23b5f7634a92d4f7add9
Reviewed-on: https://go-review.googlesource.com/c/go/+/661015
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Sean Liao <sean@liao.dev>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-03-26 11:06:06 -07:00
Joel Sing 9ce47e66e8 cmd/internal/obj/arm64: add support for BTI instruction
Add support for the `BTI' instruction to the arm64 assembler. This
instruction provides Branch Target Identification for targets of
indirect branches. A BTI can be marked with a target type of
'C' (call), 'J' (jump) or 'JC' (jump or call).

Updates #66054

Change-Id: I1cf31a0382207bb75b9b2deb49ac298a59c00d8a
Reviewed-on: https://go-review.googlesource.com/c/go/+/646781
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Marvin Drees <marvin.drees@9elements.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2025-03-26 03:48:50 -07:00
junya koyama 99d97d7c4f testing/slogtest: test nested groups in empty record
Updates #62280

Change-Id: I1c80cb18bb174b47ff156974f72c37baf6b73635
GitHub-Last-Rev: d98b6cd57e
GitHub-Pull-Request: golang/go#65597
Reviewed-on: https://go-review.googlesource.com/c/go/+/562635
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Auto-Submit: Sean Liao <sean@liao.dev>
Reviewed-by: Jonathan Amsterdam <jba@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Sean Liao <sean@liao.dev>
2025-03-25 19:19:09 -07:00
Xiaolin Zhao 0095b5d098 cmd/internal/obj/loong64: add [X]VSHUF4I.{B/H/W/D} instructions support
Go asm syntax:
	 VSHUF4I{B/H/W/V}	$1, V1, V2
	XVSHUF4I{B/H/W/V}	$2, X1, X2

Equivalent platform assembler syntax:
	 vshuf4i.{b/h/w/d}	v2, v1, $1
	xvshuf4i.{b/h/w/d}	x2, x1, $2

Change-Id: I6a847ccbd2c93432d87bd1390b5cf1508da06496
Reviewed-on: https://go-review.googlesource.com/c/go/+/658376
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-03-25 18:02:50 -07:00