Commit Graph

62741 Commits

Author SHA1 Message Date
Mark Wakefield a053e79024 net/http: support TCP half-close when HTTP is upgraded in ReverseProxy
This CL propagates closing the write stream from either side of the
reverse proxy and ensures the proxy waits for both copy-to and the
copy-from the backend to complete.

The new unit test checks communication through the reverse proxy when
the backend or frontend closes either the read or write streams.
That closing the write stream is propagated through the proxy from
either the backend or the frontend. That closing the read stream is
not propagated through the proxy.

Fixes #35892

Change-Id: I83ce377df66a0f17b9ba2b53caf9e4991a95f6a0
Reviewed-on: https://go-review.googlesource.com/c/go/+/637939
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Sean Liao <sean@liao.dev>
Auto-Submit: Sean Liao <sean@liao.dev>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Damien Neil <dneil@google.com>
Reviewed-by: Matej Kramny <matejkramny@gmail.com>
2025-03-04 04:59:32 -08:00
wineandchord 7181118a85 net/http: check server shutting down before processing the request
The root cause of issue #65802 is a small race condition that occurs between
two events:

1. During the HTTP server shutdown, a connection in an idle state is identified
and closed.
2. The connection, although idle, has just finished reading a complete request
before being closed and hasn't yet updated its state to active.

In this scenario, despite the connection being closed, the request continues to
be processed. This not only wastes server resources but also prevents the
client request from being retried.

Fixes #65802

Change-Id: Ic22abb4497be04f6c84dff059df00f2c319d8652
GitHub-Last-Rev: 426099a3e7
GitHub-Pull-Request: golang/go#65805
Reviewed-on: https://go-review.googlesource.com/c/go/+/565277
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Sean Liao <sean@liao.dev>
Reviewed-by: Damien Neil <dneil@google.com>
Auto-Submit: Sean Liao <sean@liao.dev>
2025-03-04 04:56:10 -08:00
Than McIntosh bef2bb80a9 cmd/compile,cmd/link: move to DWARF5-style location lists
This patch updates the compiler to generate DWARF5-style location
lists (e.g. entries that feed into .debug_loclists) as opposed to
DWARF4-style location lists (which wind up in .debug_loc). The DWARF5
format is much more compact, and can make indirect references to text
addresses via the .debug_addr section for further space savings.

Updates #26379.

Change-Id: If2e6fce1136d9cba5125ea51a71419596d1d1691
Reviewed-on: https://go-review.googlesource.com/c/go/+/635836
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2025-03-04 04:48:53 -08:00
Than McIntosh 4c0a47a8ff cmd/compile,cmd/link: move to DWARF5-style range lists
This patch updates the compiler to generate DWARF5-style range lists
(e.g. entries that feed into .debug_rnglists) as opposed to
DWARF4-style range lists (which wind up in .debug_ranges). The DWARF5
format is much more compact, and can make indirect references to text
address via the .debug_addr section for further space savings.

Updates #26379.

Change-Id: I273a6283484b7fe33d79d5412e31c5155b22a7c0
Reviewed-on: https://go-review.googlesource.com/c/go/+/635345
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
2025-03-04 04:48:46 -08:00
Cuong Manh Le 4f45b2b7e0 cmd/compile: fix out of memory when inlining closure
CL 629195 strongly favor closure inlining, allowing closures to be
inlined more aggressively.

However, if the closure body contains a call to a function, which itself
is one of the call arguments, it causes the infinite inlining.

Fixing this by prevent this kind of functions from being inlinable.

Fixes #72063

Change-Id: I5fb5723a819b1e2c5aadb57c1023ec84ca9fa53c
Reviewed-on: https://go-review.googlesource.com/c/go/+/654195
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-03-04 03:10:17 -08:00
Egon Elbre 82791889cc crypto/internal/fips140/bigmod/_asm: update avo to v0.6.0
avo v0.4.0 x/tools dependency crashes while parsing with Go 1.25.

Change-Id: Ic951066b0b39b477887ad0e32be44f4d88d4c2f1
Reviewed-on: https://go-review.googlesource.com/c/go/+/653175
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-03-04 03:02:05 -08:00
Sean Liao 3b9d10cce7 net/textproto: document enforcement of RFC 9112 for headers
Fixes #68590

Change-Id: Ie7cf1fe8379182f86317d5ebb7f45a404ecd70e9
Reviewed-on: https://go-review.googlesource.com/c/go/+/601555
Auto-Submit: Damien Neil <dneil@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Damien Neil <dneil@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
2025-03-03 22:12:55 -08:00
Xiaolin Zhao 2ce1fb4220 cmd/internal/obj/loong64: add F{MAXA/MINA}.{S/D} instructions
Go asm syntax:
	F{MAXA/MINA}{F/D}	FK, FJ, FD

Equivalent platform assembler syntax:
	f{maxa/mina}.{s/d}	fd, fj, fk

Ref: https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html

Change-Id: I6790657d2f36bdf5e6818b6c0aaa48117e782b8d
Reviewed-on: https://go-review.googlesource.com/c/go/+/653915
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
2025-03-03 17:01:33 -08:00
Xiaolin Zhao 47fd73a51c cmd/internal/obj/loong64: add {V,XV}{SLL/SRL/SRA/ROTR}[I].{B/H/W/D} instructions support
Go asm syntax:
	 V{SLL/SRL/SRA/ROTR}{B/H/W/V}	$1, V2, V3
	XV{SLL/SRL/SRA/ROTR}{B/H/W/V}	$1, X2, X3
	 V{SLL/SRL/SRA/ROTR}{B/H/W/V}	VK, VJ, VD
	XV{SLL/SRL/SRA/ROTR}{B/H/W/V}	XK, XJ, XD

Equivalent platform assembler syntax:
	 v{sll/srl/sra/rotr}i.{b/h/w/d}	v3, v2, $1
	xv{sll/srl/sra/rotr}i.{b/h/w/d}	x3, x2, $1
	 v{sll/srl/sra/rotr}.{b/h/w/d}	vd, vj, vk
	xv{sll/srl/sra/rotr}.{b/h/w/d}	xd, xj, xk

Change-Id: Ie4f04de1c77491a71688d226f7d91cd1a699ab47
Reviewed-on: https://go-review.googlesource.com/c/go/+/637775
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
2025-03-03 17:00:29 -08:00
Robert Griesemer 7e81bcf39f go/types, types2: remove remaining mentions of core type in error messages
The implementatiom still calls coreType in places and refers to
"core types" in comments, but user-visible error messages don't
know about core types anymore.

This brings the user-visible part of the implementation in sync with
the spec which doesn't have the notion of core types anymore.

For #70128.

Change-Id: I14bc6767a83e8f54b10ebe99a7df0b98cd9fca87
Reviewed-on: https://go-review.googlesource.com/c/go/+/654395
Auto-Submit: Robert Griesemer <gri@google.com>
Reviewed-by: Robert Findley <rfindley@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Robert Griesemer <gri@google.com>
2025-03-03 13:41:04 -08:00
Damien Neil 0a0e6af39b context: use atomic operation in ctx.Err
oos: darwin
goarch: arm64
pkg: context
cpu: Apple M1 Pro
               │ /tmp/bench.0.mac │          /tmp/bench.1.mac           │
               │      sec/op      │   sec/op     vs base                │
ErrOK-10             13.750n ± 1%   2.080n ± 0%  -84.87% (p=0.000 n=10)
ErrCanceled-10       13.530n ± 1%   3.248n ± 1%  -76.00% (p=0.000 n=10)
geomean               13.64n        2.599n       -80.94%

goos: linux
goarch: amd64
pkg: context
cpu: Intel(R) Xeon(R) CPU @ 2.30GHz
               │ /tmp/bench.0.linux │         /tmp/bench.1.linux          │
               │       sec/op       │   sec/op     vs base                │
ErrOK-16               21.435n ± 0%   4.243n ± 0%  -80.21% (p=0.000 n=10)
ErrCanceled-16         21.445n ± 0%   5.070n ± 0%  -76.36% (p=0.000 n=10)
geomean                 21.44n        4.638n       -78.37%

Fixes #72040

Change-Id: I3b337ab1934689d2da4134492ee7c5aac8f92845
Reviewed-on: https://go-review.googlesource.com/c/go/+/653795
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Damien Neil <dneil@google.com>
Reviewed-by: Alan Donovan <adonovan@google.com>
2025-03-03 12:32:57 -08:00
qmuntal 14647b0ac8 os: only call GetConsoleMode for char devices
There is no need to call GetConsoleMode if we know that the file
type is not FILE_TYPE_CHAR. This is a tiny performance optimization,
as I sometimes see this call in profiles.

Change-Id: I9e9237908585d0ec8360930a0406b26f52699b92
Reviewed-on: https://go-review.googlesource.com/c/go/+/654155
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Damien Neil <dneil@google.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
2025-03-03 11:33:48 -08:00
Robert Griesemer 05354fc3b4 go/types, types2: remove remaining references to coreType in literals.go
For now, use commonUnder (formerly called sharedUnder) and update
error messages and comments. We can provide better error messages
in individual cases eventually.

For #70128.

Change-Id: I906ba9a0c768f6499c1683dc9be3ad27da8007a9
Reviewed-on: https://go-review.googlesource.com/c/go/+/653156
Reviewed-by: Robert Griesemer <gri@google.com>
Auto-Submit: Robert Griesemer <gri@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Robert Findley <rfindley@google.com>
2025-03-03 11:26:34 -08:00
Robert Griesemer 26ba61dfad go/types, types2: remove most remaining references to coreType in builtin.go
For now, use commonUnder (formerly called sharedUnder) and update
error messages and comments. We can provide better error messages
in individual cases eventually.

Kepp using coreType for make built-in for now because it must accept
different channel types with non-conflicting directions and identical
element types. Added extra test cases.

While at it, rename sharedUnder, sharedUnderOrChan to commonUnder
and commonUnderOrChan, respectively (per suggestion from rfindley).

For #70128.

Change-Id: I11f3d5ce858746574f4302271d8cb763c2cdcf98
Reviewed-on: https://go-review.googlesource.com/c/go/+/653139
Reviewed-by: Robert Findley <rfindley@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Robert Griesemer <gri@google.com>
Reviewed-by: Robert Griesemer <gri@google.com>
2025-03-03 11:26:31 -08:00
Filippo Valsorda 19d0b3e81f crypto/rsa: use Div instead of GCD for trial division
Div is way faster. We could actually test a lot more primes and still
gain performance despite the diminishing returns, but necessarily it
would have marginal impact overall.

fips140: off
goos: linux
goarch: amd64
pkg: crypto/rsa
cpu: AMD Ryzen 7 PRO 8700GE w/ Radeon 780M Graphics
                    │  e325b41ad1  │             0f611af2e1              │
                    │    sec/op    │   sec/op     vs base                │
GenerateKey/2048-16   124.19m ± 0%   39.93m ± 0%  -67.85% (p=0.000 n=20)

Surprisingly, the performance gain is similar on ARM64, which doesn't
have intrinsified math.Div.

fips140: off
goos: darwin
goarch: arm64
pkg: crypto/rsa
cpu: Apple M2
                   │  e325b41ad1  │             6276161a7f              │
                   │    sec/op    │   sec/op     vs base                │
GenerateKey/2048-8   136.49m ± 0%   47.97m ± 1%  -64.86% (p=0.000 n=20)

Change-Id: I6a6a46560331198312bd09c1cbe4d2b3c370c552
Reviewed-on: https://go-review.googlesource.com/c/go/+/639955
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Auto-Submit: Filippo Valsorda <filippo@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Daniel McCarney <daniel@binaryparadox.net>
2025-03-03 11:17:41 -08:00
Robert Griesemer 4b1ac7bbfe go/types, types2: remove references to core type in append
Writing explicit code for this case turned out to be simpler
and easier to reason about then relying on a helper functions
(except for typeset).

While at it, make append error messages more consistent.

For #70128.

Change-Id: I3dc79774249929de5061b4301ab2506d4b3da0d0
Reviewed-on: https://go-review.googlesource.com/c/go/+/653095
Reviewed-by: Robert Findley <rfindley@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Robert Griesemer <gri@google.com>
Reviewed-by: Robert Griesemer <gri@google.com>
2025-03-03 11:07:14 -08:00
Junyang Shao f48b53f0f6 testing: fix testing.B.Loop doc on loop condition
As mentioned by
https://github.com/golang/go/issues/61515#issuecomment-2656656554,
the documentation should be relaxed.

Change-Id: I9f18301e1a4e4d9a72c9fa0b1132b1ba3cc57b03
Reviewed-on: https://go-review.googlesource.com/c/go/+/651435
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Commit-Queue: Junyang Shao <shaojunyang@google.com>
2025-03-03 10:37:29 -08:00
Brad Fitzpatrick 0312e31ed1 net: fix parsing of interfaces on plan9 without associated devices
Fixes #72060
Updates #39908

Change-Id: I7d5bda1654753acebc8aa9937d010b41c5722b36
Reviewed-on: https://go-review.googlesource.com/c/go/+/654055
Reviewed-by: Ian Lance Taylor <iant@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Russ Cox <rsc@golang.org>
Auto-Submit: Brad Fitzpatrick <bradfitz@golang.org>
2025-03-03 08:07:04 -08:00
Jakob Ackermann bda9f85e5c net/http: allocate CloseNotifier channel lazily
The CloseNotifier interface is deprecated. We can defer allocating the
backing channel until the first use of CloseNotifier.

goos: linux
goarch: amd64
pkg: net/http
cpu: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
                   │   before    │               after                │
                   │   sec/op    │   sec/op     vs base               │
Server-8             160.8µ ± 2%   160.1µ ± 1%       ~ (p=0.353 n=10)
CloseNotifier/h1-8   222.1µ ± 4%   226.4µ ± 7%       ~ (p=0.143 n=10)
geomean              189.0µ        190.4µ       +0.75%

                   │    before    │                after                │
                   │     B/op     │     B/op      vs base               │
Server-8             2.292Ki ± 0%   2.199Ki ± 0%  -4.07% (p=0.000 n=10)
CloseNotifier/h1-8   3.224Ki ± 0%   3.241Ki ± 0%  +0.51% (p=0.000 n=10)
geomean              2.718Ki        2.669Ki       -1.80%

                   │   before   │                after                │
                   │ allocs/op  │ allocs/op   vs base                 │
Server-8             21.00 ± 0%   20.00 ± 0%  -4.76% (p=0.000 n=10)
CloseNotifier/h1-8   50.00 ± 0%   50.00 ± 0%       ~ (p=1.000 n=10) ¹
geomean              32.40        31.62       -2.41%
¹ all samples are equal

Change-Id: I3f35d56b8356fb660589b7708a023e4480f32067
GitHub-Last-Rev: c75696b9b8
GitHub-Pull-Request: golang/go#71163
Reviewed-on: https://go-review.googlesource.com/c/go/+/640598
Reviewed-by: Damien Neil <dneil@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Damien Neil <dneil@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-03-03 07:45:07 -08:00
Joel Sing b199d9766a runtime: add padding to m struct for 64 bit architectures
CL 652276 reduced the m struct by 8 bytes, which has changed the
allocation class on 64 bit OpenBSD platforms. This results in build
failures due to:

    M structure uses sizeclass 1792/0x700 bytes; incompatible with mutex flag mask 0x3ff

Add 128 bytes of padding when spinbitmutex is enabled on 64 bit
architectures, moving the size to the half point between the
1792 and 2048 allocation size.

Change-Id: I71623a1f75714543c302217e619d20cf0e717aeb
Reviewed-on: https://go-review.googlesource.com/c/go/+/653335
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2025-03-01 09:22:41 -08:00
Mark Ryan 5a7db813a6 cmd/internal/obj/riscv: add riscv64 CSR map
The map is automatically generated by running the latest version of
parse.py from github.com/riscv/riscv-opcodes.

Change-Id: I05e00ab27ec583750752c25e1835c2578b339fbf
Reviewed-on: https://go-review.googlesource.com/c/go/+/630518
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Pengcheng Wang <wangpengcheng.pp@bytedance.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: Michael Pratt <mpratt@google.com>
2025-03-01 04:59:21 -08:00
qmuntal f4750a6cfb cmd/go/internal/work: use par.Cache to cache tool IDs.
The tool IDs can be calculated once and reused across multiple
threads. This is a small optimization that helps optimize system
resources.

On a normal Windows machine with 12 virtual CPUs, the time to build
a hello world program is reduced from over 1 second, with spikes of 2
seconds, to a consistent 0.7 seconds.

Updates #71981.

Change-Id: I85f4a19f8ad4230afa32213780c761b7eb22fa29
Reviewed-on: https://go-review.googlesource.com/c/go/+/653715
Reviewed-by: Michael Matloob <matloob@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-03-01 00:29:04 -08:00
Xiaolin Zhao 039b3ebeba runtime: use ABIInternal on syscall and other sys.stuff for loong64
Change-Id: I6b2942c413eab58c457980131022dace036cd76c
Reviewed-on: https://go-review.googlesource.com/c/go/+/623475
Reviewed-by: Meidan Li <limeidan@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
2025-02-28 17:10:54 -08:00
limeidan f24d2e175e cmd/internal/obj, cmd/asm: reclassify 32-bit immediate value of loong64
Change-Id: If9fd257ca0837a8c8597889c4f5ed3d4edc602c1
Reviewed-on: https://go-review.googlesource.com/c/go/+/636995
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-02-28 17:10:24 -08:00
Michael Matloob 6894904974 cmd/go/internal/modindex: clean modroot and pkgdir for openIndexPackage
GetPackage is sometimes called with a modroot or pkgdir that ends with a
path separator, and sometimes without. Clean them before passing them to
openIndexPackage to deduplicate calls to mcache.Do and action entry
files.

This shouldn't affect #71698 but was discovered while debugging that
issue.

Change-Id: I6a7fa4de93f45801504abea11bd97f6c6577f296
Reviewed-on: https://go-review.googlesource.com/c/go/+/652435
Reviewed-by: Alan Donovan <adonovan@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-02-28 14:09:39 -08:00
guoguangwu 64feded8af cmd/covdata: close output meta-data file
Change-Id: Idd2a324eb51ffa3f40cb3df03a82a1d6d882295a
GitHub-Last-Rev: 62e22b309d
GitHub-Pull-Request: golang/go#71993
Reviewed-on: https://go-review.googlesource.com/c/go/+/653140
Reviewed-by: Than McIntosh <thanm@golang.org>
Commit-Queue: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Ian Lance Taylor <iant@golang.org>
2025-02-28 12:43:43 -08:00
qmuntal 2298215f5b cmd/link: use __got as the .got section name
The __nl_symbol_ptr is not a common section name anymore. LLVM prefers
__got for GOT symbols in the __DATA_CONST segment.

Note that the Go linker used to place the GOT section in the __DATA
segment, but since CL 644055 we place it in the __DATA_CONST segment.

Updates #71416.

Cq-Include-Trybots: luci.golang.try:gotip-darwin-amd64-longtest
Change-Id: Icb776e19855eaabb4777a9b1eb433497842413b4
Reviewed-on: https://go-review.googlesource.com/c/go/+/652555
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
2025-02-28 12:43:36 -08:00
Jes Cok 74ba2164a0 reflect: add more tests for Type.{CanSeq,CanSeq2}
For #71874.

Change-Id: I3850edfb3104305f3bf4847a73cdd826cc99837f
GitHub-Last-Rev: 574c1edb7a
GitHub-Pull-Request: golang/go#71890
Reviewed-on: https://go-review.googlesource.com/c/go/+/651775
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-02-28 11:09:39 -08:00
Michael Anthony Knyszek 1dafccaaf3 runtime: increase timeout in TestSpuriousWakeupsNeverHangSemasleep
This change tries increasing the timeout in
TestSpuriousWakeupsNeverHangSemasleep. I'm not entirely sure of the
mechanism, but GODEBUG=gcstoptheworld=2 and GODEBUG=gccheckmark=1 can
cause this test to fail at it's regular timeout. It does not seem to
indicate a deadlock, because bumping the timeout 10x make the problem
go away. I suspect the problem is due to the long STW times these two
modes can induce, plus the fact this test runs in parallel with others.

Let's just bump the timeout. The test is fundamentally sound, and it's
unclear to me how else to test for a deadlock here.

Fixes #71691.
Fixes #71548.

Change-Id: I649531eeec8a8408ba90823ce5223f3a17863124
Reviewed-on: https://go-review.googlesource.com/c/go/+/652756
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2025-02-28 10:17:23 -08:00
Filippo Valsorda 6e8d7a113c crypto/x509: avoid crypto/rand.Int to generate serial number
It's probabyl safe enough, but just reading bytes from rand and then
using SetBytes is simpler, and doesn't require allowing calls from
crypto into math/big's Lsh, Sub, and Cmp.

Change-Id: I6a6a4656761f7073f9e149f288c48e97048ab13c
Reviewed-on: https://go-review.googlesource.com/c/go/+/643278
Auto-Submit: Filippo Valsorda <filippo@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Daniel McCarney <daniel@binaryparadox.net>
Reviewed-by: Roland Shoemaker <roland@golang.org>
2025-02-28 08:54:13 -08:00
KangJi 555974734f cmd/cgo: update generated headers for compatibility with latest MSVC C++ standards
Updates #71921

Change-Id: Idfbb72e259b169121c8ced6d89ee2f13d6254d0d
GitHub-Last-Rev: fcf12e5a22
GitHub-Pull-Request: golang/go#72004
Reviewed-on: https://go-review.googlesource.com/c/go/+/653141
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-02-28 08:48:12 -08:00
Jes Cok b3e36364b9 flag: replace interface{} -> any for textValue.Get method
Make it literally match the Getter interface.

Change-Id: I73f03780ba1d3fd2230e0e5e2343d40530d9e6d8
GitHub-Last-Rev: 398b90b2fb
GitHub-Pull-Request: golang/go#71975
Reviewed-on: https://go-review.googlesource.com/c/go/+/652795
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-02-28 08:43:46 -08:00
Damien Neil d31c805535 net/http: reject newlines in chunk-size lines
Unlike request headers, where we are allowed to leniently accept
a bare LF in place of a CRLF, chunked bodies must always use CRLF
line terminators. We were already enforcing this for chunk-data lines;
do so for chunk-size lines as well. Also reject bare CRs anywhere
other than as part of the CRLF terminator.

Fixes CVE-2025-22871
Fixes #71988

Change-Id: Ib0e21af5a8ba28c2a1ca52b72af8e2265ec79e4a
Reviewed-on: https://go-review.googlesource.com/c/go/+/652998
Reviewed-by: Jonathan Amsterdam <jba@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-02-27 09:23:42 -08:00
Lin Lin 0135120922 cmd/go: update c document
Fixes: #11875

Change-Id: I0ea2c3e94d7d1647c2aaa3d488ac3c1f5fb6cb18
GitHub-Last-Rev: 7512b33f05
GitHub-Pull-Request: golang/go#71966
Reviewed-on: https://go-review.googlesource.com/c/go/+/652675
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Michael Matloob <matloob@golang.org>
2025-02-27 07:57:37 -08:00
Russ Cox 2b73707358 math/big: add tests for allocation during multiply
Test that big.Int.Mul reusing the same target is not allocating
temporary garbage during its computation. That code is going
to be modified in an upcoming CL.

Change-Id: I3ed55c06da030282233c29cd7af2a04f395dc7a2
Reviewed-on: https://go-review.googlesource.com/c/go/+/652056
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Alan Donovan <adonovan@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
2025-02-27 06:05:02 -08:00
Russ Cox 0ab2253ce1 math/big: move multiplication to natmul.go
No code changes.

This CL moves the multiplication (and squaring) code into natmul.go,
in preparation for cleaning up Karatsuba and then adding Toom-Cook
and FFT-based multiplication.

Change-Id: I7f84328284cc4e1ca4da0ebb9f666a5535e8d7f2
Reviewed-on: https://go-review.googlesource.com/c/go/+/652055
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Alan Donovan <adonovan@google.com>
2025-02-27 06:05:00 -08:00
Russ Cox 1f7e28acdf math/big: optimize atoi of base 2, 4, 16
Avoid multiplies when converting base 2, 4, 16 inputs,
reducing conversion time from O(N²) to O(N).

The Base8 and Base10 code paths should be unmodified,
but the base-2,4,16 changes tickle the compiler to generate
better (amd64) or worse (arm64) when really it should not.
This is described in detail in #71868 and should be ignored
for the purposes of this CL.

goos: linux
goarch: amd64
pkg: math/big
cpu: Intel(R) Xeon(R) CPU @ 3.10GHz
                      │     old      │                 new                 │
                      │    sec/op    │   sec/op     vs base                │
Scan/10/Base2-16         324.4n ± 0%   258.7n ± 0%  -20.25% (p=0.000 n=15)
Scan/100/Base2-16        2.376µ ± 0%   1.968µ ± 0%  -17.17% (p=0.000 n=15)
Scan/1000/Base2-16       23.89µ ± 0%   19.16µ ± 0%  -19.80% (p=0.000 n=15)
Scan/10000/Base2-16      311.5µ ± 0%   190.4µ ± 0%  -38.86% (p=0.000 n=15)
Scan/100000/Base2-16    10.508m ± 0%   1.904m ± 0%  -81.88% (p=0.000 n=15)
Scan/10/Base8-16         138.3n ± 0%   127.9n ± 0%   -7.52% (p=0.000 n=15)
Scan/100/Base8-16        886.1n ± 0%   790.2n ± 0%  -10.82% (p=0.000 n=15)
Scan/1000/Base8-16       9.227µ ± 0%   8.234µ ± 0%  -10.76% (p=0.000 n=15)
Scan/10000/Base8-16      165.8µ ± 0%   155.6µ ± 0%   -6.19% (p=0.000 n=15)
Scan/100000/Base8-16     9.044m ± 0%   8.935m ± 0%   -1.20% (p=0.000 n=15)
Scan/10/Base10-16        129.9n ± 0%   120.0n ± 0%   -7.62% (p=0.000 n=15)
Scan/100/Base10-16       816.3n ± 0%   730.0n ± 0%  -10.57% (p=0.000 n=15)
Scan/1000/Base10-16      8.518µ ± 0%   7.628µ ± 0%  -10.45% (p=0.000 n=15)
Scan/10000/Base10-16     158.6µ ± 0%   149.4µ ± 0%   -5.80% (p=0.000 n=15)
Scan/100000/Base10-16    8.962m ± 0%   8.855m ± 0%   -1.20% (p=0.000 n=15)
Scan/10/Base16-16        114.5n ± 0%   108.6n ± 0%   -5.15% (p=0.000 n=15)
Scan/100/Base16-16       648.3n ± 0%   525.0n ± 0%  -19.02% (p=0.000 n=15)
Scan/1000/Base16-16      7.375µ ± 0%   5.636µ ± 0%  -23.58% (p=0.000 n=15)
Scan/10000/Base16-16    171.18µ ± 0%   66.99µ ± 0%  -60.87% (p=0.000 n=15)
Scan/100000/Base16-16   9490.9µ ± 0%   682.8µ ± 0%  -92.81% (p=0.000 n=15)
geomean                  20.11µ        13.69µ       -31.94%

goos: linux
goarch: amd64
pkg: math/big
cpu: Intel(R) Xeon(R) Platinum 8481C CPU @ 2.70GHz
                      │      old      │                 new                 │
                      │    sec/op     │   sec/op     vs base                │
Scan/10/Base2-88          275.4n ± 0%   215.0n ± 0%  -21.93% (p=0.000 n=15)
Scan/100/Base2-88         1.869µ ± 0%   1.629µ ± 0%  -12.84% (p=0.000 n=15)
Scan/1000/Base2-88        18.56µ ± 0%   15.81µ ± 0%  -14.82% (p=0.000 n=15)
Scan/10000/Base2-88       270.0µ ± 0%   157.2µ ± 0%  -41.77% (p=0.000 n=15)
Scan/100000/Base2-88     11.518m ± 0%   1.571m ± 0%  -86.36% (p=0.000 n=15)
Scan/10/Base8-88          108.9n ± 0%   106.0n ± 0%   -2.66% (p=0.000 n=15)
Scan/100/Base8-88         655.2n ± 0%   594.9n ± 0%   -9.20% (p=0.000 n=15)
Scan/1000/Base8-88        6.467µ ± 0%   5.966µ ± 0%   -7.75% (p=0.000 n=15)
Scan/10000/Base8-88       151.2µ ± 0%   147.4µ ± 0%   -2.53% (p=0.000 n=15)
Scan/100000/Base8-88      10.33m ± 0%   10.30m ± 0%   -0.25% (p=0.000 n=15)
Scan/10/Base10-88        100.20n ± 0%   98.53n ± 0%   -1.67% (p=0.000 n=15)
Scan/100/Base10-88        596.9n ± 0%   543.3n ± 0%   -8.98% (p=0.000 n=15)
Scan/1000/Base10-88       5.904µ ± 0%   5.485µ ± 0%   -7.10% (p=0.000 n=15)
Scan/10000/Base10-88      145.7µ ± 0%   142.0µ ± 0%   -2.55% (p=0.000 n=15)
Scan/100000/Base10-88     10.26m ± 0%   10.24m ± 0%   -0.18% (p=0.000 n=15)
Scan/10/Base16-88         90.33n ± 0%   87.60n ± 0%   -3.02% (p=0.000 n=15)
Scan/100/Base16-88        506.4n ± 0%   437.7n ± 0%  -13.57% (p=0.000 n=15)
Scan/1000/Base16-88       5.056µ ± 0%   4.007µ ± 0%  -20.75% (p=0.000 n=15)
Scan/10000/Base16-88     163.35µ ± 0%   65.37µ ± 0%  -59.98% (p=0.000 n=15)
Scan/100000/Base16-88   11027.2µ ± 0%   735.1µ ± 0%  -93.33% (p=0.000 n=15)
geomean                   17.13µ        11.74µ       -31.46%

goos: linux
goarch: arm64
pkg: math/big
                      │     old      │                 new                  │
                      │    sec/op    │    sec/op     vs base                │
Scan/10/Base2-16         324.7n ± 0%    348.4n ± 0%   +7.30% (p=0.000 n=15)
Scan/100/Base2-16        2.604µ ± 0%    3.031µ ± 0%  +16.40% (p=0.000 n=15)
Scan/1000/Base2-16       26.15µ ± 0%    29.94µ ± 0%  +14.52% (p=0.000 n=15)
Scan/10000/Base2-16      334.3µ ± 0%    298.8µ ± 0%  -10.64% (p=0.000 n=15)
Scan/100000/Base2-16    10.664m ± 0%    2.991m ± 0%  -71.95% (p=0.000 n=15)
Scan/10/Base8-16         144.4n ± 1%    162.2n ± 1%  +12.33% (p=0.000 n=15)
Scan/100/Base8-16        917.2n ± 0%   1084.0n ± 0%  +18.19% (p=0.000 n=15)
Scan/1000/Base8-16       9.367µ ± 0%   10.901µ ± 0%  +16.38% (p=0.000 n=15)
Scan/10000/Base8-16      164.2µ ± 0%    181.2µ ± 0%  +10.34% (p=0.000 n=15)
Scan/100000/Base8-16     8.871m ± 1%    9.140m ± 0%   +3.04% (p=0.000 n=15)
Scan/10/Base10-16        134.6n ± 1%    148.3n ± 1%  +10.18% (p=0.000 n=15)
Scan/100/Base10-16       837.1n ± 0%    986.6n ± 0%  +17.86% (p=0.000 n=15)
Scan/1000/Base10-16      8.563µ ± 0%    9.936µ ± 0%  +16.03% (p=0.000 n=15)
Scan/10000/Base10-16     156.5µ ± 1%    171.3µ ± 0%   +9.41% (p=0.000 n=15)
Scan/100000/Base10-16    8.863m ± 1%    9.011m ± 0%   +1.66% (p=0.000 n=15)
Scan/10/Base16-16        115.7n ± 2%    129.1n ± 1%  +11.58% (p=0.000 n=15)
Scan/100/Base16-16       708.6n ± 0%    796.8n ± 0%  +12.45% (p=0.000 n=15)
Scan/1000/Base16-16      7.314µ ± 0%    7.554µ ± 0%   +3.28% (p=0.000 n=15)
Scan/10000/Base16-16    149.05µ ± 0%    74.60µ ± 0%  -49.95% (p=0.000 n=15)
Scan/100000/Base16-16   9091.6µ ± 0%    741.5µ ± 0%  -91.84% (p=0.000 n=15)
geomean                  20.39µ         17.65µ       -13.44%

goos: darwin
goarch: arm64
pkg: math/big
cpu: Apple M3 Pro
                      │     old      │                 new                 │
                      │    sec/op    │   sec/op     vs base                │
Scan/10/Base2-12         193.8n ± 2%   157.3n ± 1%  -18.83% (p=0.000 n=15)
Scan/100/Base2-12        1.445µ ± 2%   1.362µ ± 1%   -5.74% (p=0.000 n=15)
Scan/1000/Base2-12       14.28µ ± 0%   13.51µ ± 0%   -5.42% (p=0.000 n=15)
Scan/10000/Base2-12      177.1µ ± 0%   134.6µ ± 0%  -24.04% (p=0.000 n=15)
Scan/100000/Base2-12     5.429m ± 1%   1.333m ± 0%  -75.45% (p=0.000 n=15)
Scan/10/Base8-12         75.52n ± 2%   76.09n ± 1%        ~ (p=0.010 n=15)
Scan/100/Base8-12        528.4n ± 1%   532.1n ± 1%        ~ (p=0.003 n=15)
Scan/1000/Base8-12       5.423µ ± 1%   5.427µ ± 0%        ~ (p=0.183 n=15)
Scan/10000/Base8-12      89.26µ ± 1%   89.37µ ± 0%        ~ (p=0.237 n=15)
Scan/100000/Base8-12     4.543m ± 2%   4.560m ± 1%        ~ (p=0.595 n=15)
Scan/10/Base10-12        69.87n ± 1%   70.51n ± 0%        ~ (p=0.002 n=15)
Scan/100/Base10-12       488.4n ± 1%   491.2n ± 0%        ~ (p=0.060 n=15)
Scan/1000/Base10-12      5.014µ ± 1%   5.008µ ± 0%        ~ (p=0.783 n=15)
Scan/10000/Base10-12     84.90µ ± 0%   85.10µ ± 0%        ~ (p=0.109 n=15)
Scan/100000/Base10-12    4.516m ± 1%   4.521m ± 1%        ~ (p=0.713 n=15)
Scan/10/Base16-12        59.21n ± 1%   57.70n ± 1%   -2.55% (p=0.000 n=15)
Scan/100/Base16-12       380.0n ± 1%   360.7n ± 1%   -5.08% (p=0.000 n=15)
Scan/1000/Base16-12      3.775µ ± 0%   3.421µ ± 0%   -9.38% (p=0.000 n=15)
Scan/10000/Base16-12     80.62µ ± 0%   34.44µ ± 1%  -57.28% (p=0.000 n=15)
Scan/100000/Base16-12   4826.4µ ± 2%   450.9µ ± 2%  -90.66% (p=0.000 n=15)
geomean                  11.05µ        8.448µ       -23.52%

Change-Id: Ifdb2049545f34072aa75cdbb72bed4cf465f0ad7
Reviewed-on: https://go-review.googlesource.com/c/go/+/650640
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Robert Griesemer <gri@google.com>
2025-02-27 06:04:57 -08:00
Russ Cox f8f08bfd7c math/big: improve scan test and benchmark
Add a few more test cases for scanning (integer conversion),
which were helpful in debugging some upcoming changes.

BenchmarkScan currently times converting the value 10**N
represented in base B back into []Word form.
When B = 10, the text is 1 followed by many zeros, which
could hit a "multiply by zero" special case when processing
many digit chunks, misrepresenting the actual time required
depending on whether that case is optimized.

Change the benchmark to use 9**N, which is about as big and
will not cause runs of zeros in any of the tested bases.

The benchmark comparison below is not showing faster code,
since of course the code is not changing at all here. Instead,
it is showing that the new benchmark work is roughly the same
size as the old benchmark work.

goos: darwin
goarch: arm64
pkg: math/big
cpu: Apple M3 Pro
                      │     old     │                new                 │
                      │   sec/op    │   sec/op     vs base               │
ScanPi-12               43.35µ ± 1%   43.59µ ± 1%       ~ (p=0.069 n=15)
Scan/10/Base2-12        202.3n ± 2%   193.7n ± 1%  -4.25% (p=0.000 n=15)
Scan/100/Base2-12       1.512µ ± 3%   1.447µ ± 1%  -4.30% (p=0.000 n=15)
Scan/1000/Base2-12      15.06µ ± 2%   14.33µ ± 0%  -4.83% (p=0.000 n=15)
Scan/10000/Base2-12     188.0µ ± 5%   177.3µ ± 1%  -5.65% (p=0.000 n=15)
Scan/100000/Base2-12    5.814m ± 3%   5.382m ± 1%  -7.43% (p=0.000 n=15)
Scan/10/Base8-12        78.57n ± 2%   75.02n ± 1%  -4.52% (p=0.000 n=15)
Scan/100/Base8-12       548.2n ± 2%   526.8n ± 1%  -3.90% (p=0.000 n=15)
Scan/1000/Base8-12      5.674µ ± 2%   5.421µ ± 0%  -4.46% (p=0.000 n=15)
Scan/10000/Base8-12     94.42µ ± 1%   88.61µ ± 1%  -6.15% (p=0.000 n=15)
Scan/100000/Base8-12    4.906m ± 2%   4.498m ± 3%  -8.31% (p=0.000 n=15)
Scan/10/Base10-12       73.42n ± 1%   69.56n ± 0%  -5.26% (p=0.000 n=15)
Scan/100/Base10-12      511.9n ± 1%   488.2n ± 0%  -4.63% (p=0.000 n=15)
Scan/1000/Base10-12     5.254µ ± 2%   5.009µ ± 0%  -4.66% (p=0.000 n=15)
Scan/10000/Base10-12    90.22µ ± 2%   84.52µ ± 0%  -6.32% (p=0.000 n=15)
Scan/100000/Base10-12   4.842m ± 3%   4.471m ± 3%  -7.65% (p=0.000 n=15)
Scan/10/Base16-12       62.28n ± 1%   58.70n ± 1%  -5.75% (p=0.000 n=15)
Scan/100/Base16-12      398.6n ± 0%   377.9n ± 1%  -5.19% (p=0.000 n=15)
Scan/1000/Base16-12     4.108µ ± 1%   3.782µ ± 0%  -7.94% (p=0.000 n=15)
Scan/10000/Base16-12    83.78µ ± 2%   80.51µ ± 1%  -3.90% (p=0.000 n=15)
Scan/100000/Base16-12   5.080m ± 3%   4.698m ± 3%  -7.53% (p=0.000 n=15)
geomean                 12.41µ        11.74µ       -5.36%

Change-Id: If3ce290ecc7f38672f11b42fd811afb53dee665d
Reviewed-on: https://go-review.googlesource.com/c/go/+/650639
Reviewed-by: Alan Donovan <adonovan@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Russ Cox <rsc@golang.org>
2025-02-27 05:58:49 -08:00
Russ Cox 872496da2b math/big: replace nat pool with Word stack
In the early days of math/big, algorithms that needed more space
grew the result larger than it needed to be and then used the
high words as extra space. This made results their own temporary
space caches, at the cost that saving a result in a data structure
might hold significantly more memory than necessary.
Specifically, new(big.Int).Mul(x, y) returned a big.Int with a
backing slice 3X as big as it strictly needed to be.
If you are storing many multiplication results, or even a single
large result, the 3X overhead can add up.

This approach to storage for temporaries also requires being able
to analyze the algorithms to predict the exact amount they need,
which can be difficult.

For both these reasons, the implementation of recursive long division,
which came later, introduced a “nat pool” where temporaries could be
stored and reused, or reclaimed by the GC when no longer used.
This avoids the storage and bookkeeping overheads but introduces a
per-temporary sync.Pool overhead. divRecursiveStep takes an array
of cached temporaries to remove some of that overhead.
The nat pool was better but is still not quite right.

This CL introduces something even better than the nat pool
(still probably not quite right, but the best I can see for now):
a sync.Pool holding stacks for allocating temporaries.
Now an operation can get one stack out of the pool and then
allocate as many temporaries as it needs during the operation,
eventually returning the stack back to the pool. The sync.Pool
operations are now per-exported-operation (like big.Int.Mul),
not per-temporary.

This CL converts both the pre-allocation in nat.mul and the
uses of the nat pool to use stack pools instead. This simplifies
some code and sets us up better for more complex algorithms
(such as Toom-Cook or FFT-based multiplication) that need
more temporaries. It is also a little bit faster.

goos: linux
goarch: amd64
pkg: math/big
cpu: Intel(R) Xeon(R) CPU @ 3.10GHz
                         │     old     │                 new                 │
                         │   sec/op    │   sec/op     vs base                │
Div/20/10-16               23.68n ± 0%   22.21n ± 0%   -6.21% (p=0.000 n=15)
Div/40/20-16               23.68n ± 0%   22.21n ± 0%   -6.21% (p=0.000 n=15)
Div/100/50-16              56.65n ± 0%   55.53n ± 0%   -1.98% (p=0.000 n=15)
Div/200/100-16             194.6n ± 1%   172.8n ± 0%  -11.20% (p=0.000 n=15)
Div/400/200-16             232.1n ± 0%   206.7n ± 0%  -10.94% (p=0.000 n=15)
Div/1000/500-16            405.3n ± 1%   383.8n ± 0%   -5.30% (p=0.000 n=15)
Div/2000/1000-16           810.4n ± 1%   795.2n ± 0%   -1.88% (p=0.000 n=15)
Div/20000/10000-16         25.88µ ± 0%   25.39µ ± 0%   -1.89% (p=0.000 n=15)
Div/200000/100000-16       931.5µ ± 0%   924.3µ ± 0%   -0.77% (p=0.000 n=15)
Div/2000000/1000000-16     37.77m ± 0%   37.75m ± 0%        ~ (p=0.098 n=15)
Div/20000000/10000000-16    1.367 ± 0%    1.377 ± 0%   +0.72% (p=0.003 n=15)
NatMul/10-16               168.5n ± 3%   164.0n ± 4%        ~ (p=0.751 n=15)
NatMul/100-16              6.086µ ± 3%   5.380µ ± 3%  -11.60% (p=0.000 n=15)
NatMul/1000-16             238.1µ ± 3%   228.3µ ± 1%   -4.12% (p=0.000 n=15)
NatMul/10000-16            8.721m ± 2%   8.518m ± 1%   -2.33% (p=0.000 n=15)
NatMul/100000-16           369.6m ± 0%   371.1m ± 0%   +0.42% (p=0.000 n=15)
geomean                    19.57µ        18.74µ        -4.21%

                 │     old      │                  new                   │
                 │     B/op     │     B/op      vs base                  │
NatMul/10-16         192.0 ± 0%     192.0 ± 0%        ~ (p=1.000 n=15) ¹
NatMul/100-16      4.750Ki ± 0%   1.751Ki ± 0%  -63.14% (p=0.000 n=15)
NatMul/1000-16     48.16Ki ± 0%   16.02Ki ± 0%  -66.73% (p=0.000 n=15)
NatMul/10000-16    482.9Ki ± 1%   165.4Ki ± 3%  -65.75% (p=0.000 n=15)
NatMul/100000-16   5.747Mi ± 7%   4.197Mi ± 0%  -26.97% (p=0.000 n=15)
geomean            41.42Ki        20.63Ki       -50.18%
¹ all samples are equal

                 │     old     │                 new                  │
                 │  allocs/op  │  allocs/op   vs base                 │
NatMul/10-16       1.000 ±  0%   1.000 ±  0%       ~ (p=1.000 n=15) ¹
NatMul/100-16      1.000 ±  0%   1.000 ±  0%       ~ (p=1.000 n=15) ¹
NatMul/1000-16     1.000 ±  0%   1.000 ±  0%       ~ (p=1.000 n=15) ¹
NatMul/10000-16    1.000 ±  0%   1.000 ±  0%       ~ (p=1.000 n=15) ¹
NatMul/100000-16   7.000 ± 14%   7.000 ± 14%       ~ (p=0.668 n=15)
geomean            1.476         1.476        +0.00%
¹ all samples are equal

goos: linux
goarch: amd64
pkg: math/big
cpu: Intel(R) Xeon(R) Platinum 8481C CPU @ 2.70GHz
                         │     old     │                 new                 │
                         │   sec/op    │   sec/op     vs base                │
Div/20/10-88               15.84n ± 1%   13.12n ± 0%  -17.17% (p=0.000 n=15)
Div/40/20-88               15.88n ± 1%   13.12n ± 0%  -17.38% (p=0.000 n=15)
Div/100/50-88              26.42n ± 0%   25.47n ± 0%   -3.60% (p=0.000 n=15)
Div/200/100-88             132.4n ± 0%   114.9n ± 0%  -13.22% (p=0.000 n=15)
Div/400/200-88             150.1n ± 0%   135.6n ± 0%   -9.66% (p=0.000 n=15)
Div/1000/500-88            275.5n ± 0%   264.1n ± 0%   -4.14% (p=0.000 n=15)
Div/2000/1000-88           586.5n ± 0%   581.1n ± 0%   -0.92% (p=0.000 n=15)
Div/20000/10000-88         25.87µ ± 0%   25.72µ ± 0%   -0.59% (p=0.000 n=15)
Div/200000/100000-88       772.2µ ± 0%   779.0µ ± 0%   +0.88% (p=0.000 n=15)
Div/2000000/1000000-88     33.36m ± 0%   33.63m ± 0%   +0.80% (p=0.000 n=15)
Div/20000000/10000000-88    1.307 ± 0%    1.320 ± 0%   +1.03% (p=0.000 n=15)
NatMul/10-88               140.4n ± 0%   148.8n ± 4%   +5.98% (p=0.000 n=15)
NatMul/100-88              4.663µ ± 1%   4.388µ ± 1%   -5.90% (p=0.000 n=15)
NatMul/1000-88             207.7µ ± 0%   205.8µ ± 0%   -0.89% (p=0.000 n=15)
NatMul/10000-88            8.456m ± 0%   8.468m ± 0%   +0.14% (p=0.021 n=15)
NatMul/100000-88           295.1m ± 0%   297.9m ± 0%   +0.94% (p=0.000 n=15)
geomean                    14.96µ        14.33µ        -4.23%

                 │     old      │                   new                   │
                 │     B/op     │     B/op       vs base                  │
NatMul/10-88         192.0 ± 0%     192.0 ±  0%        ~ (p=1.000 n=15) ¹
NatMul/100-88      4.750Ki ± 0%   1.758Ki ±  0%  -62.99% (p=0.000 n=15)
NatMul/1000-88     48.44Ki ± 0%   16.08Ki ±  0%  -66.80% (p=0.000 n=15)
NatMul/10000-88    489.7Ki ± 1%   166.1Ki ±  3%  -66.08% (p=0.000 n=15)
NatMul/100000-88   5.546Mi ± 0%   3.819Mi ± 60%  -31.15% (p=0.000 n=15)
geomean            41.29Ki        20.30Ki        -50.85%
¹ all samples are equal

                 │     old     │                 new                  │
                 │  allocs/op  │  allocs/op   vs base                 │
NatMul/10-88       1.000 ±  0%   1.000 ±  0%       ~ (p=1.000 n=15) ¹
NatMul/100-88      1.000 ±  0%   1.000 ±  0%       ~ (p=1.000 n=15) ¹
NatMul/1000-88     1.000 ±  0%   1.000 ±  0%       ~ (p=1.000 n=15) ¹
NatMul/10000-88    1.000 ±  0%   1.000 ±  0%       ~ (p=1.000 n=15) ¹
NatMul/100000-88   5.000 ± 20%   6.000 ± 67%       ~ (p=0.672 n=15)
geomean            1.380         1.431        +3.71%
¹ all samples are equal

goos: linux
goarch: arm64
pkg: math/big
                         │     old     │                 new                 │
                         │   sec/op    │   sec/op     vs base                │
Div/20/10-16               15.85n ± 0%   15.23n ± 0%   -3.91% (p=0.000 n=15)
Div/40/20-16               15.88n ± 0%   15.22n ± 0%   -4.16% (p=0.000 n=15)
Div/100/50-16              29.69n ± 0%   26.39n ± 0%  -11.11% (p=0.000 n=15)
Div/200/100-16             149.2n ± 0%   123.3n ± 0%  -17.36% (p=0.000 n=15)
Div/400/200-16             160.3n ± 0%   139.2n ± 0%  -13.16% (p=0.000 n=15)
Div/1000/500-16            271.0n ± 0%   256.1n ± 0%   -5.50% (p=0.000 n=15)
Div/2000/1000-16           545.3n ± 0%   527.0n ± 0%   -3.36% (p=0.000 n=15)
Div/20000/10000-16         22.60µ ± 0%   22.20µ ± 0%   -1.77% (p=0.000 n=15)
Div/200000/100000-16       889.0µ ± 0%   892.2µ ± 0%   +0.35% (p=0.000 n=15)
Div/2000000/1000000-16     38.01m ± 0%   38.12m ± 0%   +0.30% (p=0.000 n=15)
Div/20000000/10000000-16    1.437 ± 0%    1.444 ± 0%   +0.50% (p=0.000 n=15)
NatMul/10-16               166.4n ± 2%   169.5n ± 1%   +1.86% (p=0.000 n=15)
NatMul/100-16              5.733µ ± 1%   5.570µ ± 1%   -2.84% (p=0.000 n=15)
NatMul/1000-16             232.6µ ± 1%   229.8µ ± 0%   -1.22% (p=0.000 n=15)
NatMul/10000-16            9.039m ± 1%   8.969m ± 0%   -0.77% (p=0.000 n=15)
NatMul/100000-16           367.0m ± 0%   368.8m ± 0%   +0.48% (p=0.000 n=15)
geomean                    16.15µ        15.50µ        -4.01%

                 │     old      │                  new                   │
                 │     B/op     │     B/op      vs base                  │
NatMul/10-16         192.0 ± 0%     192.0 ± 0%        ~ (p=1.000 n=15) ¹
NatMul/100-16      4.750Ki ± 0%   1.751Ki ± 0%  -63.14% (p=0.000 n=15)
NatMul/1000-16     48.33Ki ± 0%   16.02Ki ± 0%  -66.85% (p=0.000 n=15)
NatMul/10000-16    536.5Ki ± 1%   165.7Ki ± 3%  -69.12% (p=0.000 n=15)
NatMul/100000-16   6.078Mi ± 6%   4.197Mi ± 0%  -30.94% (p=0.000 n=15)
geomean            42.81Ki        20.64Ki       -51.78%
¹ all samples are equal

                 │     old     │                  new                  │
                 │  allocs/op  │  allocs/op   vs base                  │
NatMul/10-16       1.000 ±  0%   1.000 ±  0%        ~ (p=1.000 n=15) ¹
NatMul/100-16      1.000 ±  0%   1.000 ±  0%        ~ (p=1.000 n=15) ¹
NatMul/1000-16     1.000 ±  0%   1.000 ±  0%        ~ (p=1.000 n=15) ¹
NatMul/10000-16    2.000 ± 50%   1.000 ±  0%  -50.00% (p=0.001 n=15)
NatMul/100000-16   9.000 ± 11%   8.000 ± 12%  -11.11% (p=0.001 n=15)
geomean            1.783         1.516        -14.97%
¹ all samples are equal

goos: darwin
goarch: arm64
pkg: math/big
cpu: Apple M3 Pro
                         │     old      │                new                 │
                         │    sec/op    │   sec/op     vs base               │
Div/20/10-12                9.850n ± 1%   9.405n ± 1%  -4.52% (p=0.000 n=15)
Div/40/20-12                9.858n ± 0%   9.403n ± 1%  -4.62% (p=0.000 n=15)
Div/100/50-12               16.40n ± 1%   14.81n ± 0%  -9.70% (p=0.000 n=15)
Div/200/100-12              88.48n ± 2%   80.88n ± 0%  -8.59% (p=0.000 n=15)
Div/400/200-12             107.90n ± 1%   99.28n ± 1%  -7.99% (p=0.000 n=15)
Div/1000/500-12             188.8n ± 1%   178.6n ± 1%  -5.40% (p=0.000 n=15)
Div/2000/1000-12            399.9n ± 0%   389.1n ± 0%  -2.70% (p=0.000 n=15)
Div/20000/10000-12          13.94µ ± 2%   13.81µ ± 1%       ~ (p=0.574 n=15)
Div/200000/100000-12        523.8µ ± 0%   521.7µ ± 0%  -0.40% (p=0.000 n=15)
Div/2000000/1000000-12      21.46m ± 0%   21.48m ± 0%       ~ (p=0.067 n=15)
Div/20000000/10000000-12    812.5m ± 0%   812.9m ± 0%       ~ (p=0.061 n=15)
NatMul/10-12                77.14n ± 0%   78.35n ± 1%  +1.57% (p=0.000 n=15)
NatMul/100-12               2.999µ ± 0%   2.871µ ± 1%  -4.27% (p=0.000 n=15)
NatMul/1000-12              126.2µ ± 0%   126.8µ ± 0%  +0.51% (p=0.011 n=15)
NatMul/10000-12             5.099m ± 0%   5.125m ± 0%  +0.51% (p=0.000 n=15)
NatMul/100000-12            206.7m ± 0%   208.4m ± 0%  +0.80% (p=0.000 n=15)
geomean                     9.512µ        9.236µ       -2.91%

                 │     old      │                   new                    │
                 │     B/op     │      B/op       vs base                  │
NatMul/10-12         192.0 ± 0%     192.0 ±   0%        ~ (p=1.000 n=15) ¹
NatMul/100-12      4.750Ki ± 0%   1.750Ki ±   0%  -63.16% (p=0.000 n=15)
NatMul/1000-12     48.13Ki ± 0%   16.01Ki ±   0%  -66.73% (p=0.000 n=15)
NatMul/10000-12    483.5Ki ± 1%   163.2Ki ±   2%  -66.24% (p=0.000 n=15)
NatMul/100000-12   5.480Mi ± 4%   1.532Mi ± 104%  -72.05% (p=0.000 n=15)
geomean            41.03Ki        16.82Ki         -59.01%
¹ all samples are equal

                 │    old     │                  new                   │
                 │ allocs/op  │  allocs/op    vs base                  │
NatMul/10-12       1.000 ± 0%   1.000 ±   0%        ~ (p=1.000 n=15) ¹
NatMul/100-12      1.000 ± 0%   1.000 ±   0%        ~ (p=1.000 n=15) ¹
NatMul/1000-12     1.000 ± 0%   1.000 ±   0%        ~ (p=1.000 n=15) ¹
NatMul/10000-12    1.000 ± 0%   1.000 ±   0%        ~ (p=1.000 n=15) ¹
NatMul/100000-12   5.000 ± 0%   1.000 ± 400%  -80.00% (p=0.007 n=15)
geomean            1.380        1.000         -27.52%
¹ all samples are equal

Change-Id: I7efa6fe37971ed26ae120a32250fcb47ece0a011
Reviewed-on: https://go-review.googlesource.com/c/go/+/650638
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Alan Donovan <adonovan@google.com>
2025-02-27 05:58:09 -08:00
Russ Cox 763766e9ab math/big: report allocs in BenchmarkNatMul, BenchmarkNatSqr
Change-Id: I112f55c0e3ee3b75e615a06b27552de164565c04
Reviewed-on: https://go-review.googlesource.com/c/go/+/650637
Reviewed-by: Robert Griesemer <gri@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
2025-02-27 05:50:45 -08:00
Russ Cox b66d83ccea math/big: clean up GCD a little
The GCD code was setting one *Int to the value of another
by smashing one struct on top of the other, instead of using Set.
That was safe in this one case, but it's not idiomatic in math/big
nor safe in general, so rewrite the code not to do that.
(In one case, by swapping variables around; in another, by calling Set.)

The added Set call does slow down GCDs by a small amount,
since the answer has to be copied out. To compensate for that,
optimize a bit: remove the s, t temporaries entirely and handle
vector x word multiplication directly. The net result is that almost
all GCDs are faster, except for small ones, which are a few
nanoseconds slower.

goos: darwin
goarch: arm64
pkg: math/big
cpu: Apple M3 Pro
                              │ bench.before │             bench.after             │
                              │    sec/op    │   sec/op     vs base                │
GCD10x10/WithoutXY-12            23.80n ± 1%   31.71n ± 1%  +33.24% (p=0.000 n=10)
GCD10x10/WithXY-12              100.40n ± 0%   92.14n ± 1%   -8.22% (p=0.000 n=10)
GCD10x100/WithoutXY-12           63.70n ± 0%   70.73n ± 0%  +11.05% (p=0.000 n=10)
GCD10x100/WithXY-12              278.6n ± 0%   233.1n ± 1%  -16.35% (p=0.000 n=10)
GCD10x1000/WithoutXY-12          153.4n ± 0%   162.2n ± 1%   +5.74% (p=0.000 n=10)
GCD10x1000/WithXY-12             456.0n ± 0%   411.8n ± 1%   -9.69% (p=0.000 n=10)
GCD10x10000/WithoutXY-12         1.002µ ± 1%   1.036µ ± 0%   +3.39% (p=0.000 n=10)
GCD10x10000/WithXY-12            2.330µ ± 1%   2.210µ ± 0%   -5.13% (p=0.000 n=10)
GCD10x100000/WithoutXY-12        8.894µ ± 0%   8.889µ ± 1%        ~ (p=0.754 n=10)
GCD10x100000/WithXY-12           20.84µ ± 0%   20.24µ ± 0%   -2.84% (p=0.000 n=10)
GCD100x100/WithoutXY-12          373.3n ± 3%   314.4n ± 0%  -15.76% (p=0.000 n=10)
GCD100x100/WithXY-12             662.5n ± 0%   572.4n ± 1%  -13.59% (p=0.000 n=10)
GCD100x1000/WithoutXY-12         641.8n ± 0%   598.1n ± 1%   -6.81% (p=0.000 n=10)
GCD100x1000/WithXY-12            1.123µ ± 0%   1.019µ ± 1%   -9.26% (p=0.000 n=10)
GCD100x10000/WithoutXY-12        2.870µ ± 0%   2.831µ ± 0%   -1.38% (p=0.000 n=10)
GCD100x10000/WithXY-12           4.930µ ± 1%   4.675µ ± 0%   -5.16% (p=0.000 n=10)
GCD100x100000/WithoutXY-12       24.08µ ± 0%   23.97µ ± 0%   -0.48% (p=0.007 n=10)
GCD100x100000/WithXY-12          43.66µ ± 0%   42.52µ ± 0%   -2.61% (p=0.001 n=10)
GCD1000x1000/WithoutXY-12        3.999µ ± 0%   3.569µ ± 1%  -10.75% (p=0.000 n=10)
GCD1000x1000/WithXY-12           6.397µ ± 0%   5.534µ ± 0%  -13.49% (p=0.000 n=10)
GCD1000x10000/WithoutXY-12       6.875µ ± 0%   6.450µ ± 0%   -6.18% (p=0.000 n=10)
GCD1000x10000/WithXY-12          20.75µ ± 1%   19.17µ ± 1%   -7.64% (p=0.000 n=10)
GCD1000x100000/WithoutXY-12      36.38µ ± 0%   35.60µ ± 1%   -2.13% (p=0.000 n=10)
GCD1000x100000/WithXY-12         172.1µ ± 0%   174.4µ ± 3%        ~ (p=0.052 n=10)
GCD10000x10000/WithoutXY-12      79.89µ ± 1%   75.16µ ± 2%   -5.92% (p=0.000 n=10)
GCD10000x10000/WithXY-12         160.1µ ± 0%   150.0µ ± 0%   -6.33% (p=0.000 n=10)
GCD10000x100000/WithoutXY-12     213.2µ ± 1%   209.0µ ± 1%   -1.98% (p=0.000 n=10)
GCD10000x100000/WithXY-12        1.399m ± 0%   1.342m ± 3%   -4.08% (p=0.002 n=10)
GCD100000x100000/WithoutXY-12    5.463m ± 1%   5.504m ± 2%        ~ (p=0.190 n=10)
GCD100000x100000/WithXY-12       11.36m ± 0%   11.46m ± 1%   +0.86% (p=0.000 n=10)
geomean                          6.953µ        6.695µ        -3.71%

goos: linux
goarch: amd64
pkg: math/big
cpu: AMD Ryzen 9 7950X 16-Core Processor
                              │ bench.before │             bench.after             │
                              │    sec/op    │   sec/op     vs base                │
GCD10x10/WithoutXY-32           39.66n ±  4%   44.34n ± 4%  +11.77% (p=0.000 n=10)
GCD10x10/WithXY-32              156.7n ± 12%   130.8n ± 2%  -16.53% (p=0.000 n=10)
GCD10x100/WithoutXY-32          115.8n ±  5%   120.2n ± 2%   +3.89% (p=0.000 n=10)
GCD10x100/WithXY-32             465.3n ±  3%   368.1n ± 2%  -20.91% (p=0.000 n=10)
GCD10x1000/WithoutXY-32         201.1n ±  1%   210.8n ± 2%   +4.82% (p=0.000 n=10)
GCD10x1000/WithXY-32            652.9n ±  4%   605.0n ± 1%   -7.32% (p=0.002 n=10)
GCD10x10000/WithoutXY-32        1.046µ ±  2%   1.143µ ± 1%   +9.33% (p=0.000 n=10)
GCD10x10000/WithXY-32           3.360µ ±  1%   3.258µ ± 1%   -3.04% (p=0.000 n=10)
GCD10x100000/WithoutXY-32       9.391µ ±  3%   9.997µ ± 1%   +6.46% (p=0.000 n=10)
GCD10x100000/WithXY-32          27.92µ ±  1%   28.21µ ± 0%   +1.04% (p=0.043 n=10)
GCD100x100/WithoutXY-32         443.7n ±  5%   320.0n ± 2%  -27.88% (p=0.000 n=10)
GCD100x100/WithXY-32            789.9n ±  2%   690.4n ± 1%  -12.60% (p=0.000 n=10)
GCD100x1000/WithoutXY-32        718.4n ±  3%   600.0n ± 1%  -16.48% (p=0.000 n=10)
GCD100x1000/WithXY-32           1.388µ ±  4%   1.175µ ± 1%  -15.28% (p=0.000 n=10)
GCD100x10000/WithoutXY-32       2.750µ ±  1%   2.668µ ± 1%   -2.96% (p=0.000 n=10)
GCD100x10000/WithXY-32          6.016µ ±  1%   5.590µ ± 1%   -7.09% (p=0.000 n=10)
GCD100x100000/WithoutXY-32      21.40µ ±  1%   22.30µ ± 1%   +4.21% (p=0.000 n=10)
GCD100x100000/WithXY-32         47.02µ ±  4%   48.80µ ± 0%   +3.78% (p=0.015 n=10)
GCD1000x1000/WithoutXY-32       3.417µ ±  4%   3.020µ ± 1%  -11.65% (p=0.000 n=10)
GCD1000x1000/WithXY-32          5.752µ ±  0%   5.418µ ± 2%   -5.81% (p=0.000 n=10)
GCD1000x10000/WithoutXY-32      6.150µ ±  0%   6.246µ ± 1%   +1.55% (p=0.000 n=10)
GCD1000x10000/WithXY-32         24.68µ ±  3%   25.07µ ± 1%        ~ (p=0.051 n=10)
GCD1000x100000/WithoutXY-32     34.60µ ±  2%   36.85µ ± 1%   +6.51% (p=0.000 n=10)
GCD1000x100000/WithXY-32        209.5µ ±  4%   227.4µ ± 0%   +8.56% (p=0.000 n=10)
GCD10000x10000/WithoutXY-32     90.69µ ±  0%   88.48µ ± 0%   -2.44% (p=0.000 n=10)
GCD10000x10000/WithXY-32        197.1µ ±  0%   200.5µ ± 0%   +1.73% (p=0.000 n=10)
GCD10000x100000/WithoutXY-32    239.1µ ±  0%   242.5µ ± 0%   +1.42% (p=0.000 n=10)
GCD10000x100000/WithXY-32       1.963m ±  3%   2.028m ± 0%   +3.28% (p=0.000 n=10)
GCD100000x100000/WithoutXY-32   7.466m ±  0%   7.412m ± 0%   -0.71% (p=0.000 n=10)
GCD100000x100000/WithXY-32      16.10m ±  2%   16.47m ± 0%   +2.25% (p=0.000 n=10)
geomean                         8.388µ         8.127µ        -3.12%

Change-Id: I161dc409bad11bcc553bc8116449905ae5b06742
Reviewed-on: https://go-review.googlesource.com/c/go/+/650636
Reviewed-by: Robert Griesemer <gri@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Alan Donovan <adonovan@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
2025-02-27 05:49:50 -08:00
Joel Sing 37e9c5eaba cmd/internal/obj/riscv: implement vector load/store instructions
Implement vector unit stride, vector strided, vector indexed and
vector whole register load and store instructions.

The vector unit stride instructions take an optional vector mask
register, which if specified must be register V0. If only two
operands are given, the instruction is encoded as unmasked.

The vector strided and vector indexed instructions also take an
optional vector mask register, which if specified must be register
V0. If only three operands are given, the instruction is encoded as
unmasked.

Cq-Include-Trybots: luci.golang.try:gotip-linux-riscv64
Change-Id: I35e43bb8f1cf6ae8826fbeec384b95ac945da50f
Reviewed-on: https://go-review.googlesource.com/c/go/+/631937
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Pengcheng Wang <wangpengcheng.pp@bytedance.com>
2025-02-27 03:47:20 -08:00
Joel Sing 927fdb7843 cmd/compile: simplify intrinsification of TrailingZeros16 and TrailingZeros8
Decompose Ctz16 and Ctz8 within the SSA rules for LOONG64, MIPS, PPC64
and S390X, rather than having a custom intrinsic. Note that for PPC64 this
actually allows the existing Ctz16 and Ctz8 rules to be used.

Change-Id: I27a5e978f852b9d75396d2a80f5d7dfcb5ef7dd4
Reviewed-on: https://go-review.googlesource.com/c/go/+/651816
Reviewed-by: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Joel Sing <joel@sing.id.au>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
2025-02-27 03:45:44 -08:00
Guoqi Chen 01ba8bfe86 runtime/cgo: use standard ABI call setg_gcc in crosscall1 on loong64
Change-Id: Ie38583d667d579751d643b2da2aa56390b69904c
Reviewed-on: https://go-review.googlesource.com/c/go/+/652255
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
2025-02-26 17:36:57 -08:00
Guoqi Chen 11c847e536 runtime: use correct memory barrier in exitThread function on loong64
In the runtime.exitThread function, a storeRelease barrier
is required instead of a full barrier.

Change-Id: I2815ddb03e4984c891d71811ccf650a82325e10d
Reviewed-on: https://go-review.googlesource.com/c/go/+/631915
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-02-26 17:35:25 -08:00
Keith Randall c594762ad5 runtime: remove ret field from gobuf
It's not used for anything.

Change-Id: I031b3cdfe52b6b1cff4b3cb6713ffe588084542f
Reviewed-on: https://go-review.googlesource.com/c/go/+/652276
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-02-26 14:48:55 -08:00
Ian Lance Taylor 2e71ae33ca runtime/cgo: avoid errors from -Wdeclaration-after-statement
CL 652181 accidentally missed this iPhone only code.

For #71961

Change-Id: I567f8bb38958907442e69494da330d5199d11f54
Reviewed-on: https://go-review.googlesource.com/c/go/+/653135
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Commit-Queue: Ian Lance Taylor <iant@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
2025-02-26 14:39:37 -08:00
Mateusz Poliwczak c55c1cbd04 net: return proper error from Context
Sadly err was a named parameter so this did not cause
compile error.

Fixes #71974

Change-Id: I10cf29ae14c52d48a793c9a6cb01b01d79b1b356
GitHub-Last-Rev: 4dc0e6670a
GitHub-Pull-Request: golang/go#71976
Reviewed-on: https://go-review.googlesource.com/c/go/+/652815
Auto-Submit: Ian Lance Taylor <iant@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Damien Neil <dneil@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Commit-Queue: Ian Lance Taylor <iant@google.com>
2025-02-26 14:16:12 -08:00
Ian Lance Taylor 9cceaf8736 cmd/link: require cgo for -linkmode=external test
For #71416
Fixes #71957

Change-Id: I2180dada34d9dd2d3f5b0aaf8525951fd2e86a27
Reviewed-on: https://go-review.googlesource.com/c/go/+/652277
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Quim Muntal <quimmuntal@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2025-02-26 14:12:58 -08:00
Egon Elbre 983e30bd3b crypto/internal/fips140/edwards25519/field: optimize AMD64
Replace constant multiplication with shift and adds,
this reduces pressure on multiplications, making things overall
faster.

goos: windows
goarch: amd64
pkg: crypto/internal/fips140/edwards25519/field
cpu: AMD Ryzen Threadripper 2950X 16-Core Processor
            │   v0.log~   │              v1.log~               │
            │   sec/op    │   sec/op     vs base               │
Add-32        4.768n ± 1%   4.763n ± 0%       ~ (p=0.683 n=20)
Multiply-32   20.93n ± 0%   19.48n ± 0%  -6.88% (p=0.000 n=20)
Square-32     15.88n ± 0%   15.00n ± 0%  -5.51% (p=0.000 n=20)
Invert-32     4.291µ ± 0%   4.072µ ± 0%  -5.10% (p=0.000 n=20)
Mult32-32     5.184n ± 0%   5.169n ± 0%  -0.30% (p=0.032 n=20)
Bytes-32      11.36n ± 0%   11.34n ± 0%       ~ (p=0.106 n=20)
geomean       27.15n        26.32n       -3.06%

Change-Id: I9c2f588fad29d89c3e6c712c092b32b66479f596
Reviewed-on: https://go-review.googlesource.com/c/go/+/652716
Reviewed-by: Filippo Valsorda <filippo@golang.org>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Commit-Queue: Junyang Shao <shaojunyang@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Filippo Valsorda <filippo@golang.org>
2025-02-26 13:58:20 -08:00