mirror of https://github.com/golang/go.git
62741 Commits
| Author | SHA1 | Message | Date |
|---|---|---|---|
|
|
6fb7bdc96d |
cmd/compile: intrinsify math/bits.TrailingZeros on riscv64
For riscv64/rva22u64 and above, we can intrinsify math/bits.TrailingZeros
using the CTZ/CTZW machine instructions.
On a StarFive VisionFive 2 with GORISCV64=rva22u64:
│ ctz.b.1 │ ctz.b.2 │
│ sec/op │ sec/op vs base │
TrailingZeros-4 25.500n ± 0% 8.052n ± 0% -68.42% (p=0.000 n=10)
TrailingZeros8-4 14.76n ± 0% 10.74n ± 0% -27.24% (p=0.000 n=10)
TrailingZeros16-4 26.84n ± 0% 10.74n ± 0% -59.99% (p=0.000 n=10)
TrailingZeros32-4 25.500n ± 0% 8.052n ± 0% -68.42% (p=0.000 n=10)
TrailingZeros64-4 25.500n ± 0% 8.052n ± 0% -68.42% (p=0.000 n=10)
geomean 23.09n 9.035n -60.88%
Change-Id: I71edf2b988acb7a68e797afda4ee66d7a57d587e
Reviewed-on: https://go-review.googlesource.com/c/go/+/652320
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
|
|
|
|
e6ffe764cf |
strings: add FuzzReplace test
While reviewing CL 657935 I noticed a couple of tricky reslices that depend on multiple things being correct. Might as well fuzz it. Change-Id: Id78921bcb252e73a8a06e6deb4c920445a87d525 Reviewed-on: https://go-review.googlesource.com/c/go/+/658075 Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Ian Lance Taylor <iant@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> |
|
|
|
e0edd3e155 |
html/template: replace end-of-life link
Fix #65044 Change-Id: I5bf9c1cf2e9d3ae1e4bbb8f2653512c710db370b Reviewed-on: https://go-review.googlesource.com/c/go/+/555815 Auto-Submit: Sean Liao <sean@liao.dev> Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Ian Lance Taylor <iant@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> |
|
|
|
bb6a400028 |
os: use slices.Clone
Change-Id: I5a3de1b2fe2ebbb6437df5e7cc55e0d8d69c9cd7 Reviewed-on: https://go-review.googlesource.com/c/go/+/657915 Auto-Submit: Tobias Klauser <tobias.klauser@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Reviewed-by: David Chase <drchase@google.com> |
|
|
|
35139d6e45 |
runtime: log profile when mutex profile test fails
For #70602 Change-Id: I3f723ebc17ef690d5be7f4f948c9dd1f890196fd Reviewed-on: https://go-review.googlesource.com/c/go/+/658095 Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com> Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> |
|
|
|
21417518a9 |
cmd/compile: combine negation and word sign extension on riscv64
Use NEGW to produce a negated and sign extended word, rather than doing the same via two instructions:
neg t0, t0
sext.w a0, t0
Becomes:
negw t0, t0
Change-Id: I824ab25001bd3304bdbd435e7b244fcc036ef212
Reviewed-on: https://go-review.googlesource.com/c/go/+/652319
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com> |
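A small sketch of a Go expression with the semantics described above: negate, then keep the low 32 bits as a signed word. That `int32(-x)` is the source pattern targeted by this change is an assumption, and the function name `negW` is hypothetical.

```go
package main

import "fmt"

// negW negates a 64-bit value and keeps the low 32 bits as a signed
// word, which is the "negate, then sign extend the word" operation.
func negW(x int64) int32 {
	return int32(-x)
}

func main() {
	fmt.Println(negW(1))         // prints -1
	fmt.Println(negW(1<<32 + 5)) // high bits are discarded, prints -5
}
```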
|
|
|
10d070668c |
cmd/compile/internal/ssa: remove double negation with addition on riscv64
On riscv64, subtraction from a constant is typically implemented as an ADDI with the negative constant, followed by a negation. However this can lead to multiple NEG/ADDI/NEG sequences that can be optimised out.
For example, runtime.(*_panic).nextDefer currently contains:
lbu t0, 0(t0)
addi t0, t0, -8
neg t0, t0
addi t0, t0, -7
neg t0, t0
Which is now optimised to:
lbu t0, 0(t0)
addi t0, t0, -1
Change-Id: Idf5815e6db2e3705cc4a4811ca9130a064ae3d80
Reviewed-on: https://go-review.googlesource.com/c/go/+/652318
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: David Chase <drchase@google.com> |
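The equivalence of the two instruction sequences in this message can be checked with ordinary arithmetic: -(-(x - 8) - 7) = x - 1. The sketch below mirrors each instruction as one Go statement; the function names `before` and `after` are hypothetical.

```go
package main

import "fmt"

// before mirrors the ADDI/NEG/ADDI/NEG sequence from the commit message,
// starting from the loaded value x.
func before(x int64) int64 {
	t := x
	t += -8 // addi t0, t0, -8
	t = -t  // neg  t0, t0
	t += -7 // addi t0, t0, -7
	t = -t  // neg  t0, t0
	return t
}

// after mirrors the optimised single ADDI.
func after(x int64) int64 {
	return x + -1 // addi t0, t0, -1
}

func main() {
	for _, x := range []int64{0, 1, 8, 255} {
		fmt.Println(x, before(x), after(x))
	}
}
```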
|
|
|
a8f2e63f2f |
test/codegen: add a test for negation and conversion to int32
Codify the current code generation used on riscv64 in this case. Change-Id: If4152e3652fc19d0aa28b79dba08abee2486d5ae Reviewed-on: https://go-review.googlesource.com/c/go/+/652317 Reviewed-by: Mark Ryan <markdryan@rivosinc.com> Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> |
|
|
|
e1f9013a58 |
test/codegen: add riscv64 codegen for arithmetic tests
Codify the current riscv64 code generation for various subtract from constant and addition/subtraction tests. Change-Id: I54ad923280a0578a338bc4431fa5bdc0644c4729 Reviewed-on: https://go-review.googlesource.com/c/go/+/652316 Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Mark Ryan <markdryan@rivosinc.com> Reviewed-by: David Chase <drchase@google.com> |
|
|
|
c01fa0cc21 |
test/codegen: add riscv64/rva23u64 specifiers to existing tests
Tests that exist for riscv64/rva22u64 should also be applied to riscv64/rva23u64. Change-Id: Ia529fdf0ac55b8bcb3dcd24fa80efef2351f3842 Reviewed-on: https://go-review.googlesource.com/c/go/+/652315 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Mark Ryan <markdryan@rivosinc.com> Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com> Reviewed-by: David Chase <drchase@google.com> |
|
|
|
3c3b8dd4f0 |
internal/runtime/atomic: add Xchg8 for s390x and wasm
This makes the single-byte atomic.Xchg8 operation available on all GOARCHes, including those without direct / single-instruction support. Fixes #69735 Change-Id: Icb6aff8f907257db81ea440dc4d29f96b3cff6c4 Reviewed-on: https://go-review.googlesource.com/c/go/+/657936 Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com> TryBot-Result: Gopher Robot <gobot@golang.org> |
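One common way to provide a byte-wide exchange where only a wider atomic primitive is available is a compare-and-swap loop on the containing 32-bit word. The sketch below illustrates that general technique with `sync/atomic`; it is an assumption for illustration and not the runtime's implementation. The name `xchg8` and the byte-index parameter are hypothetical.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// xchg8 atomically replaces byte number idx (0 = least significant) of
// *word with val and returns the previous byte. It retries a 32-bit
// compare-and-swap until no other writer changed the word in between.
func xchg8(word *uint32, idx uint, val uint8) uint8 {
	shift := (idx & 3) * 8
	mask := uint32(0xff) << shift
	for {
		old := atomic.LoadUint32(word)
		updated := old&^mask | uint32(val)<<shift
		if atomic.CompareAndSwapUint32(word, old, updated) {
			return uint8(old >> shift)
		}
	}
}

func main() {
	word := uint32(0x11223344)
	prev := xchg8(&word, 1, 0xAB)
	fmt.Printf("%#x %#x\n", prev, word) // prints 0x33 0x1122ab44
}
```

The other three bytes of the word are preserved because the new value is built from the freshly loaded word on every retry.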
|
|
|
580b6ee646 |
cmd/go: enable fuzz testing on OpenBSD
This change provides support for the -fuzz flag on OpenBSD. According to #46554 the flag was unsupported on some OSes due to lack of proper testing.
Fixes: #60491
Change-Id: I49835131d3ee23f6482583b518b9c5c224fc4efe
GitHub-Last-Rev:
|
|
|
|
5bb73e6504 |
debug/elf: add riscv attributes definitions
This CL adds `riscv.attributes` related ELF section header type and program header type according to the [RISC-V ELF Specification](https://github.com/riscv-non-isa/riscv-elf-psabi-doc/releases/download/v1.0/riscv-abi.pdf).
Also adds a riscv64/linux testcase binary built from:
```
gcc -march=rv64g -no-pie -o gcc-riscv64-linux-exec hello.c
strip gcc-riscv64-linux-exec
```
Fixes #72843
Change-Id: I7710a0516f69141c0efaba71dd997f05b4c88421
Reviewed-on: https://go-review.googlesource.com/c/go/+/657515
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com> |
|
|
|
853b514417 |
time: optimize quote using byte(c) for ASCII
Since c < runeSelf && c >= ' ' (i.e., 32 <= c < 128), using buf = append(buf, byte(c)) instead of buf = append(buf, string(c)...) is a better choice, as it provides better performance.
Change-Id: Ic0ab25c71634a1814267f4d85be2ebd8a3d44676
GitHub-Last-Rev:
|
|
|
|
c1c7e5902f |
test/codegen: tighten the TrailingZeros64 test on 386
Make the TrailingZeros64 code generation check more specific for 386. Just checking for BSFL will match both the generic 64 bit decomposition and the custom 386 lowering. Change-Id: I62076f1889af0ef1f29704cba01ab419cae0c6e3 Reviewed-on: https://go-review.googlesource.com/c/go/+/656996 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> |
|
|
|
577bb3d0ce |
runtime: only set isExtraInC if there are no Go frames left
mp.isExtraInC is intended to indicate that this M has no Go frames at all; it is entirely executing in C.
If there was a cgocallback to Go and then a cgocall to C, such that the leaf frames are C, that is fine. e.g., traceback can handle this fine with SetCgoTraceback (or by simply skipping the C frames).
However, we currently mismanage isExtraInC, unconditionally setting it on return from cgocallback. This means that if there are two levels of cgocallback, we end up running Go code with isExtraInC set.
1. C-created thread calls into Go function 1 (via cgocallback).
2. Go function 1 calls into C function 1 (via cgocall).
3. C function 1 calls into Go function 2 (via cgocallback).
4. Go function 2 returns back to C function 1 (returning via the remainder of cgocallback).
5. C function 1 returns back to Go function 1 (returning via the remainder of cgocall).
6. Go function 1 is now running with mp.isExtraInC == true.
The fix is simple; only set isExtraInC on return from cgocallback if there are no more Go frames. There can't be more Go frames unless there is an active cgocall out of the Go frames.
Fixes #72870.
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest
Change-Id: I6a6a636c4e7ba75a29639d7036c5af3738033467
Reviewed-on: https://go-review.googlesource.com/c/go/+/658035
Reviewed-by: Cherry Mui <cherryyz@google.com>
Commit-Queue: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> |
|
|
|
7e3d2aa69f |
encoding/asn1: make sure implicit fields roundtrip
Make sure Marshal and Unmarshal support the same field tags for implicit encoding choices. In particular this adds support for Unmarshalling implicitly tagged GeneralizedTime fields. Also add tests and update the docs. Fixes #72078 Change-Id: I21465ee4bcd73a7db0d0c36b2df53cabfc480185 Reviewed-on: https://go-review.googlesource.com/c/go/+/654275 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Ian Lance Taylor <iant@google.com> |
|
|
|
a1ddbdd3ef |
cmd/compile: don't move nilCheck operations during tighten
Nil checks need to stay in their original blocks. They cannot be moved to a following conditionally-executed block. Fixes #72860 Change-Id: Ic2d66cdf030357d91f8a716a004152ba4c016f77 Reviewed-on: https://go-review.googlesource.com/c/go/+/657715 Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com> |
|
|
|
80f068928f |
cmd/internal/obj/loong64: add {V,XV}{FSQRT/FRECIP/FRSQRT}.{S/D} instructions support
Go asm syntax:
V{FSQRT/FRECIP/FRSQRT}{F/D} VJ, VD
XV{FSQRT/FRECIP/FRSQRT}{F/D} XJ, XD
Equivalent platform assembler syntax:
v{fsqrt/frecip/frsqrt}.{s/d} vd, vj
xv{fsqrt/frecip/frsqrt}.{s/d} xd, xj
Change-Id: I3fdbe3193659d7532164451b087ccf725053172f
Reviewed-on: https://go-review.googlesource.com/c/go/+/636395
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
|
|
|
|
937368f84e |
crypto/x509: change how we retrieve chains on darwin
Instead of using the deprecated SecTrustGetCertificateAtIndex and SecTrustGetCertificateCount methods, use the SecTrustCopyCertificateChain method. This method requires macOS 12+, which will be the minimum supported version in 1.25. Change-Id: I9a5ef75431cdb84f1cbe4eee47e6e9e2da4dea03 Reviewed-on: https://go-review.googlesource.com/c/go/+/654376 Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Daniel McCarney <daniel@binaryparadox.net> |
|
|
|
3b456ff421 |
crypto/x509,encoding/asn1: better handling of weird encodings
For various cursed reasons we need to support the BMPString and T61String ASN.1 string encodings. These types use the defunct UCS-2 and T.61 character encodings respectively. This change rejects some characters when decoding BMPStrings which are not valid in UCS-2, and properly parses T61Strings instead of treating them as plain UTF-8. While still not perfect, this matches the behavior of most other implementations, particularly BoringSSL. Ideally we'd just remove support for these ASN.1 types (particularly in crypto/x509, where we don't actually expose any API), but doing so is likely to break some deployed certificates which unfortunately still use these types in DNs, despite them being deprecated since 1999/2002. Fixes #71862 Change-Id: Ib8f392656a35171e48eaf71a200be6d7605b2f02 Reviewed-on: https://go-review.googlesource.com/c/go/+/651275 Reviewed-by: Daniel McCarney <daniel@binaryparadox.net> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> |
|
|
|
d704ef7606 |
crypto/tls/internal/fips140tls: use crypto/fips140
There is no need for fips140tls to depend on an internal package, it can use crypto/fips140 directly. Both approaches are equivalent, but using crypto/fips140 makes us exercise a public API and sets a precedent. Change-Id: I668e80ee62b711bc60821cee3a54232a33295ee1 Reviewed-on: https://go-review.googlesource.com/c/go/+/642035 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Filippo Valsorda <filippo@golang.org> Reviewed-by: Junyang Shao <shaojunyang@google.com> |
|
|
|
6114b69e0c |
crypto/tls: relax native FIPS 140-3 mode
We are going to stick to BoringSSL's policy for Go+BoringCrypto, but when using the native FIPS 140-3 module we can allow Ed25519, ML-KEM, and P-521. NIST SP 800-52r2 is stricter, but it only applies to some entities, so they can restrict the profile with Config. Fixes #71757 Change-Id: I6a6a4656eb02e56d079f0a22f98212275a40a679 Reviewed-on: https://go-review.googlesource.com/c/go/+/650576 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Junyang Shao <shaojunyang@google.com> Auto-Submit: Filippo Valsorda <filippo@golang.org> Reviewed-by: Daniel McCarney <daniel@binaryparadox.net> Reviewed-by: David Chase <drchase@google.com> |
|
|
|
59afdd3ed0 |
crypto/tls: clean up supported/default/allowed parameters
Cleaned up a lot of the plumbing to make it consistently follow this logic: clone the preference order; filter by user preference; filter by FIPS policy. There should be no behavior changes. Updates #71757 Change-Id: I6a6a4656eb02e56d079f0a22f98212275a400000 Reviewed-on: https://go-review.googlesource.com/c/go/+/657096 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Filippo Valsorda <filippo@golang.org> Reviewed-by: Daniel McCarney <daniel@binaryparadox.net> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Junyang Shao <shaojunyang@google.com> |
|
|
|
0f5d86c5a0 |
cmd/go: permit additional cflags when compiling
In CL 475375 the Go command started to generate the "preferlinkext" token file for "strange/dangerous" compiler flags. This serves as a hint to the Go linker whether to call the external linker or not. Permit compiler flag used by the hermetic_cc_toolchain bzlmod. As a side effect, it also allows these flags to appear in #cgo directives in source code. We don't know of any cases where that is actually useful, but it appears to be harmless and simplifies the implementation of the internal linking change. Fixes #72842 Change-Id: Ic6de29b535a4e2c0720f383567ea6b3c7ca4f541 Reviewed-on: https://go-review.googlesource.com/c/go/+/657575 Auto-Submit: Ian Lance Taylor <iant@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Reviewed-by: Michael Matloob <matloob@golang.org> |
|
|
|
6bd5741a4c |
crypto/tls: add ConnectionState.CurveID
This required adding a new field to SessionState for TLS 1.0–1.2, since the key exchange is not repeated on resumption. The additional field is unfortunately not backwards compatible because current Go versions check that the encoding has no extra data at the end, but will cause cross-version tickets to be ignored. Relaxed that so we can add fields in a backwards compatible way the next time. For the cipher suite, we check that the session's is still acceptable per the Config. That would arguably make sense here, too: if a Config for example requires PQ, we should reject resumptions of connections that didn't use PQ. However, that only applies to pre-TLS 1.3 connections, since in TLS 1.3 we always do a fresh key exchange on resumption. Since PQ is the only main differentiator between key exchanges (aside from off-by-default non-PFS RSA, which are controlled by the cipher suite in TLS 1.0–1.2) and it's PQ-only, we can skip that check. Fixes #67516 Change-Id: I6a6a465681a6292edf66c7b8df8f4aba4171a76b Reviewed-on: https://go-review.googlesource.com/c/go/+/653315 Reviewed-by: David Chase <drchase@google.com> Auto-Submit: Filippo Valsorda <filippo@golang.org> Reviewed-by: Daniel McCarney <daniel@binaryparadox.net> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Roland Shoemaker <roland@golang.org> |
|
|
|
fbdd994166 |
crypto/tls: allow P-521 in FIPS 140-3 mode and Go+BoringCrypto
Partially reverts CL 587296, restoring the Go+BoringCrypto 1.23 behavior in terms of supported curves. Updates #71757 Change-Id: I6a6a465651a8407056fd0fae091d10a945b37997 Reviewed-on: https://go-review.googlesource.com/c/go/+/657095 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Daniel McCarney <daniel@binaryparadox.net> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Roland Shoemaker <roland@golang.org> Auto-Submit: Filippo Valsorda <filippo@golang.org> |
|
|
|
12ea4f7785 |
doc/next: add release note for new toolchain line behavior
The go command will now no longer update the toolchain line implicitly to the local toolchain version when updating the go line. Document that in a release note. For #65847 Change-Id: I4e970d881a43c22292fe9fa65a9835d0214ef7bf Reviewed-on: https://go-review.googlesource.com/c/go/+/657178 Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org> Reviewed-by: Alan Donovan <adonovan@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> |
|
|
|
f3c69c2e78 |
cmd/go/internal/load,cmd/link/internal/ld: use objabi.LookupPkgSpecial(pkg).Runtime
As suggested by Michael in CL 655515.
Change-Id: Idf0b879287bd777d03443aebc7351fcb0d724885
GitHub-Last-Rev:
|
|
|
|
fb8691edae |
syscall: use testing.T.Context
Change-Id: I62763878d51598bf1ae0a4e75441e1d3a4b86aa3 Reviewed-on: https://go-review.googlesource.com/c/go/+/656955 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Tobias Klauser <tobias.klauser@gmail.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> |
|
|
|
af92bb594d |
test/codegen: remove plan9/amd64 specific array zeroing/copying tests
The compiler previously avoided the use of MOVUPS on plan9/amd64. This was changed in CL 655875, however the codegen tests were not updated and now fail (seemingly the full codegen tests do not run anywhere, not even on the longtest builders). Change-Id: I388b60e7b0911048d4949c5029347f9801c018a9 Reviewed-on: https://go-review.googlesource.com/c/go/+/656997 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Auto-Submit: Keith Randall <khr@google.com> |
|
|
|
bdfa604b2e |
cmd/internal/dwarf: always use AT_ranges for scopes with DWARF 5
This patch extends the change in CL 657175 to apply the same abbrev selection strategy to single-range lexical scopes that we're now using for inlined routine bodies, when DWARF 5 is in effect. Ranges are more compact and use fewer relocations than explicit hi/lo PC values, so we might as well always use them. Updates #26379. Change-Id: Ieeaddf50e82acc4866010e29af32bcd1fb3b4f02 Reviewed-on: https://go-review.googlesource.com/c/go/+/657177 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> |
|
|
|
d7f58834cb |
doc/next: add tentative DWARF 5 release note fragment
Add a small fragment describing the move to DWARF 5 for this release, along with the name of the GOEXPERIMENT. Updates #26379. Change-Id: I3a30a71436133e2e0a5edf1ba0db84b9cc17cc5c Reviewed-on: https://go-review.googlesource.com/c/go/+/657176 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> |
|
|
|
8cdef129fb |
cmd/link: only check PIE size difference when the linkmode is the same
Currently we check the size difference between non-PIE and PIE binaries without specifying a linkmode (and that is presumed to be internal). However, on some platforms (like openbsd/arm64), the use of -buildmode=pie results in external linking. Ensure that we only test internally linked non-PIE against internally linked PIE and externally linked non-PIE against externally linked PIE, avoiding unexpected differences. Fixes #72818 Change-Id: I7e1da0976a4b5de387a59d0d6c04f58498a8eca0 Reviewed-on: https://go-review.googlesource.com/c/go/+/657035 Reviewed-by: Ian Lance Taylor <iant@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Than McIntosh <thanm@golang.org> |
|
|
|
b143c98169 |
cmd/compile: simplify bounded shift on loong64
Use the shiftIsBounded function to generate more efficient shift instructions.
This change also optimizes shift ops when the shift value is v&63 and v&31.
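For readers unfamiliar with bounded shifts, a short sketch of the distinction at the Go source level. Go defines a shift by a count at or above the operand width as producing 0, so an unbounded shift needs extra handling on hardware that only looks at the low bits of the count. Masking the count with 63 proves it is in range. The function names here are hypothetical.

```go
package main

import "fmt"

// shlMasked shifts by s&63, so the count is provably in [0, 63] and no
// out-of-range handling is needed.
func shlMasked(x uint64, s uint) uint64 {
	return x << (s & 63)
}

// shlUnbounded shifts by an arbitrary count; Go semantics require the
// result to be 0 once s reaches 64.
func shlUnbounded(x uint64, s uint) uint64 {
	return x << s
}

func main() {
	fmt.Println(shlMasked(1, 3), shlUnbounded(1, 3))   // prints 8 8
	fmt.Println(shlMasked(1, 64), shlUnbounded(1, 64)) // prints 1 0
}
```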
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A6000-HV @ 2500.00MHz
| CL 627855 | this CL |
| sec/op | sec/op vs base |
LeadingZeros 1.1005n ± 0% 0.8425n ± 1% -23.44% (p=0.000 n=10)
LeadingZeros8 1.502n ± 0% 1.501n ± 0% -0.07% (p=0.001 n=10)
LeadingZeros16 1.502n ± 0% 1.501n ± 0% -0.07% (p=0.000 n=10)
LeadingZeros32 0.9511n ± 0% 0.8050n ± 0% -15.36% (p=0.000 n=10)
LeadingZeros64 1.1195n ± 0% 0.8423n ± 0% -24.76% (p=0.000 n=10)
TrailingZeros 0.8086n ± 0% 0.8005n ± 0% -1.00% (p=0.000 n=10)
TrailingZeros8 1.031n ± 1% 1.035n ± 1% ~ (p=0.136 n=10)
TrailingZeros16 0.8114n ± 0% 0.8254n ± 1% +1.73% (p=0.000 n=10)
TrailingZeros32 0.8090n ± 0% 0.8005n ± 0% -1.05% (p=0.000 n=10)
TrailingZeros64 0.8089n ± 1% 0.8005n ± 0% -1.04% (p=0.000 n=10)
OnesCount 0.8677n ± 0% 1.2010n ± 0% +38.41% (p=0.000 n=10)
OnesCount8 0.8009n ± 0% 0.8004n ± 0% -0.06% (p=0.000 n=10)
OnesCount16 0.9344n ± 0% 1.2010n ± 0% +28.53% (p=0.000 n=10)
OnesCount32 0.8677n ± 0% 1.2010n ± 0% +38.41% (p=0.000 n=10)
OnesCount64 1.2010n ± 0% 0.8671n ± 0% -27.80% (p=0.000 n=10)
RotateLeft 0.8009n ± 0% 0.6671n ± 0% -16.71% (p=0.000 n=10)
RotateLeft8 1.202n ± 0% 1.327n ± 0% +10.40% (p=0.000 n=10)
RotateLeft16 0.8036n ± 0% 0.8218n ± 0% +2.26% (p=0.000 n=10)
RotateLeft32 0.6674n ± 0% 0.8004n ± 0% +19.94% (p=0.000 n=10)
RotateLeft64 0.6674n ± 0% 0.8004n ± 0% +19.94% (p=0.000 n=10)
Reverse 0.4067n ± 1% 0.4122n ± 1% +1.38% (p=0.001 n=10)
Reverse8 0.8009n ± 0% 0.8004n ± 0% -0.06% (p=0.000 n=10)
Reverse16 0.8009n ± 0% 0.8005n ± 0% -0.05% (p=0.000 n=10)
Reverse32 0.8009n ± 0% 0.8004n ± 0% -0.06% (p=0.001 n=10)
Reverse64 0.8009n ± 0% 0.8004n ± 0% -0.06% (p=0.008 n=10)
ReverseBytes 0.4057n ± 1% 0.4133n ± 1% +1.90% (p=0.000 n=10)
ReverseBytes16 0.8009n ± 0% 0.8004n ± 0% -0.07% (p=0.000 n=10)
ReverseBytes32 0.8009n ± 0% 0.8005n ± 0% -0.05% (p=0.000 n=10)
ReverseBytes64 0.8009n ± 0% 0.8004n ± 0% -0.06% (p=0.000 n=10)
Add 1.201n ± 0% 1.201n ± 0% ~ (p=1.000 n=10)
Add32 1.201n ± 0% 1.201n ± 0% ~ (p=0.474 n=10)
Add64 1.201n ± 0% 1.201n ± 0% ~ (p=1.000 n=10)
Add64multiple 1.832n ± 0% 1.828n ± 0% -0.22% (p=0.001 n=10)
Sub 1.201n ± 0% 1.201n ± 0% ~ (p=1.000 n=10)
Sub32 1.602n ± 0% 1.601n ± 0% -0.06% (p=0.000 n=10)
Sub64 1.201n ± 0% 1.201n ± 0% ~ (p=0.474 n=10)
Sub64multiple 2.402n ± 0% 2.400n ± 0% -0.10% (p=0.000 n=10)
Mul 0.8009n ± 0% 0.8004n ± 0% -0.06% (p=0.000 n=10)
Mul32 0.8009n ± 0% 0.8004n ± 0% -0.06% (p=0.000 n=10)
Mul64 0.8008n ± 0% 0.8004n ± 0% -0.05% (p=0.000 n=10)
Div 9.083n ± 0% 7.638n ± 0% -15.91% (p=0.000 n=10)
Div32 4.011n ± 0% 4.009n ± 0% -0.05% (p=0.000 n=10)
Div64 9.711n ± 0% 8.204n ± 0% -15.51% (p=0.000 n=10)
geomean 1.083n 1.078n -0.40%
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A5000 @ 2500.00MHz
| CL 627855 | this CL |
| sec/op | sec/op vs base |
LeadingZeros 1.341n ± 4% 1.331n ± 2% -0.71% (p=0.008 n=10)
LeadingZeros8 1.781n ± 0% 1.766n ± 1% -0.84% (p=0.011 n=10)
LeadingZeros16 1.782n ± 0% 1.767n ± 0% -0.79% (p=0.001 n=10)
LeadingZeros32 1.341n ± 1% 1.333n ± 0% -0.52% (p=0.001 n=10)
LeadingZeros64 1.338n ± 0% 1.333n ± 0% -0.37% (p=0.008 n=10)
TrailingZeros 0.9025n ± 0% 0.8077n ± 0% -10.50% (p=0.000 n=10)
TrailingZeros8 1.056n ± 0% 1.089n ± 1% +3.17% (p=0.001 n=10)
TrailingZeros16 1.101n ± 0% 1.102n ± 0% +0.09% (p=0.011 n=10)
TrailingZeros32 0.9024n ± 1% 0.8083n ± 0% -10.43% (p=0.000 n=10)
TrailingZeros64 0.9028n ± 1% 0.8087n ± 0% -10.43% (p=0.000 n=10)
OnesCount 1.482n ± 1% 1.302n ± 0% -12.15% (p=0.000 n=10)
OnesCount8 1.206n ± 0% 1.207n ± 2% +0.12% (p=0.000 n=10)
OnesCount16 1.534n ± 0% 1.402n ± 0% -8.58% (p=0.000 n=10)
OnesCount32 1.531n ± 1% 1.302n ± 0% -14.99% (p=0.000 n=10)
OnesCount64 1.302n ± 0% 1.538n ± 1% +18.16% (p=0.000 n=10)
RotateLeft 0.8083n ± 0% 0.8087n ± 1% ~ (p=0.579 n=10)
RotateLeft8 1.310n ± 0% 1.323n ± 0% +0.95% (p=0.001 n=10)
RotateLeft16 1.149n ± 0% 1.165n ± 1% +1.35% (p=0.001 n=10)
RotateLeft32 0.8093n ± 0% 0.8105n ± 0% ~ (p=0.393 n=10)
RotateLeft64 0.8088n ± 0% 0.8090n ± 0% ~ (p=0.739 n=10)
Reverse 0.5109n ± 0% 0.5172n ± 1% +1.25% (p=0.000 n=10)
Reverse8 0.8010n ± 0% 0.8011n ± 0% +0.01% (p=0.000 n=10)
Reverse16 0.8010n ± 0% 0.8011n ± 0% +0.01% (p=0.002 n=10)
Reverse32 0.8010n ± 0% 0.8011n ± 0% +0.01% (p=0.000 n=10)
Reverse64 0.8010n ± 0% 0.8011n ± 0% +0.01% (p=0.005 n=10)
ReverseBytes 0.5122n ± 2% 0.5182n ± 1% ~ (p=0.060 n=10)
ReverseBytes16 0.8010n ± 0% 0.8011n ± 0% +0.01% (p=0.005 n=10)
ReverseBytes32 0.8010n ± 0% 0.8011n ± 0% +0.01% (p=0.005 n=10)
ReverseBytes64 0.8010n ± 0% 0.8011n ± 0% +0.01% (p=0.001 n=10)
Add 1.201n ± 4% 1.202n ± 0% +0.08% (p=0.028 n=10)
Add32 1.201n ± 0% 1.202n ± 2% +0.08% (p=0.014 n=10)
Add64 1.201n ± 1% 1.202n ± 0% +0.08% (p=0.025 n=10)
Add64multiple 1.902n ± 0% 1.913n ± 0% +0.55% (p=0.004 n=10)
Sub 1.201n ± 0% 1.202n ± 3% +0.08% (p=0.001 n=10)
Sub32 1.654n ± 0% 1.656n ± 1% ~ (p=0.117 n=10)
Sub64 1.201n ± 0% 1.202n ± 0% +0.08% (p=0.001 n=10)
Sub64multiple 2.180n ± 4% 2.159n ± 1% -0.96% (p=0.006 n=10)
Mul 0.9345n ± 0% 0.9346n ± 0% +0.01% (p=0.000 n=10)
Mul32 1.030n ± 0% 1.050n ± 1% +1.94% (p=0.000 n=10)
Mul64 0.9345n ± 0% 0.9346n ± 1% +0.01% (p=0.000 n=10)
Div 11.57n ± 1% 11.12n ± 0% -3.85% (p=0.000 n=10)
Div32 4.337n ± 1% 4.341n ± 1% ~ (p=0.286 n=10)
Div64 12.76n ± 0% 12.02n ± 3% -5.80% (p=0.000 n=10)
geomean 1.252n 1.235n -1.32%
Change-Id: Iec4cfd2b83bb0f946068c1d657369ff081d95b04
Reviewed-on: https://go-review.googlesource.com/c/go/+/628575
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
|
|
b10c35945d |
cmd/internal/obj/loong64: add {V,XV}DIV{B/H/W/V}[U] and {V,XV}MOD{B/H/W/V}[U] instructions support
Go asm syntax:
VDIV{B/H/W/V}[U] VK, VJ, VD
XVDIV{B/H/W/V}[U] XK, XJ, XD
VMOD{B/H/W/V}[U] VK, VJ, VD
XVMOD{B/H/W/V}[U] XK, XJ, XD
Equivalent platform assembler syntax:
vdiv.{b/h/w/d}[u] vd, vj, vk
xvdiv.{b/h/w/d}[u] xd, xj, xk
vmod.{b/h/w/d}[u] vd, vj, vk
xvmod.{b/h/w/d}[u] xd, xj, xk
Change-Id: I3676721c3c415de0f2ebbd480ecd1b2400a28dba
Reviewed-on: https://go-review.googlesource.com/c/go/+/636376
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
|
|
d729053edf |
mime/multipart: add helper to build content-disposition header contents
This PR adds a helper FileContentDisposition that builds multipart
Content-Disposition header contents with field name and file name,
escaping quotes and escape characters.
The function is then called in the related helper CreateFormFile.
The new function allows users to add other custom MIMEHeaders,
without having to rewrite the char escaping logic of field name and
file name, which is provided by the new helper.
Fixes #46771
Change-Id: Ifc82a79583feb6dd609ca1e6024e612fb58c05ce
GitHub-Last-Rev:
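A minimal sketch of the kind of escaping this message describes: backslash-escaping quotes and backslashes in the field name and file name before placing them in a Content-Disposition value. It does not call the new helper and is not its implementation; `formFileDisposition` and `quoteEscaper` are hypothetical names used for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteEscaper backslash-escapes backslashes and double quotes so the
// values can sit safely inside a quoted header parameter.
var quoteEscaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`)

// formFileDisposition is a hypothetical stand-in that builds a
// form-data Content-Disposition value with escaped names.
func formFileDisposition(fieldname, filename string) string {
	return fmt.Sprintf(`form-data; name="%s"; filename="%s"`,
		quoteEscaper.Replace(fieldname), quoteEscaper.Replace(filename))
}

func main() {
	fmt.Println(formFileDisposition("upload", `my "report".txt`))
	// prints: form-data; name="upload"; filename="my \"report\".txt"
}
```

With the value available as a string, a caller can set it on a MIME header alongside other custom headers before creating the part.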
|
|
|
|
a68bf75d34 |
cmd/go: don't write own toolchain line when updating go line
The Go command had a behavior of writing its own toolchain name when updating the go line in a go.mod (for example when a user runs go get go@version). This behavior was often undesirable and the toolchain line was often removed by users before checking in go.mod files (including in the x/ repos). It also led to user confusion.
This change removes that behavior. A toolchain line will not be added if one wasn't present before. The toolchain line can still be removed though: the toolchain line must be at least the go version, so if the go version is increased above the toolchain version, the toolchain version will be bumped up to that go version. The toolchain line will then be dropped because go <version> implies toolchain <version>.
Making this change slightly hurts reproducibility because future go commands run on the go.mod file may be run with a different toolchain than the one that used it, but that doesn't seem to be worth the confusion the behavior resulted in.
We expect this change will not have negative consequences, but it could be possible, and we would like to hear from any users that depended on the previous behavior in case we need to roll it back before the release.
Fixes #65847
Change-Id: Id795b7f762e4f90ba0fa8c7935d03f32dfc8590e
Reviewed-on: https://go-review.googlesource.com/c/go/+/656835
Reviewed-by: Alan Donovan <adonovan@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> |
|
|
|
485480faaa |
net: deflake recently added TestCloseUnblocksReadUDP
Fixes #72802 Change-Id: I0dd457ef81a354f61c9de306e4609efdbe3d69b4 Reviewed-on: https://go-review.googlesource.com/c/go/+/656857 Auto-Submit: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Damien Neil <dneil@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> TryBot-Bypass: Brad Fitzpatrick <bradfitz@golang.org> |
|
|
|
955cf0873f |
cmd/internal/dwarf: fix bug in inlined func DIE range DWARF 5 info
This patch changes the strategy we use in the compiler for handling range information for inlined subroutine bodies, fixing a bug in how this was handled for DWARF 5. The high and low PC values being emitted for DW_TAG_inlined_subroutine DIEs were incorrect, pointing to the start of functions instead of the proper location. The fix in this patch is to move to unconditionally using DW_AT_ranges for inlined subroutines, even those with only a single range.
Background: prior to this point, if a given inlined function body had a single contiguous range, we'd pick an abbrev entry for it with explicit DW_AT_low_pc and DW_AT_high_pc attributes. If the extent of the code for the inlined body was not contiguous (which can happen), we'd select an abbrev that used a DW_AT_ranges attribute instead.
This strategy (preferring explicit hi/lo PC attrs for a single-range func) made sense for DWARF 4, since in DWARF 4 the representation used in the .debug_ranges section was especially heavyweight (lots of space, lots of relocations), so having explicit hi/lo PC attrs was less expensive.
With DWARF 5, range info is written to the .debug_rnglists section, and the representation here is much more compact. Specifically, a single hi/lo range can be represented using a base address in addrx format (max of 4 bytes, but more likely 2 or 3) followed by start and endpoints of the range in ULEB128 format. This combination is more compact spacewise than the explicit hi/lo values, and has fewer relocations (0 as opposed to 2).
Note: we should at some point consider applying this same strategy to lexical scopes, since we can probably reap some of the same benefits there as well.
Updates #26379.
Fixes #72821.
Change-Id: Ifb65ecc6221601bad2ca3939f9b69964c1fafc7c
Reviewed-on: https://go-review.googlesource.com/c/go/+/657175
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Alessandro Arzilli <alessandro.arzilli@gmail.com> |
|
|
|
bec12f153a |
log/slog: optimize appendKey to reduce allocations
This change introduces a new method, `appendTwoStrings`, which
optimizes the `appendKey` function by avoiding the allocation of a
temporary string (string concatenation of prefix and key). Instead, it
directly appends the prefix and key to the buffer.
Additionally, this change adds `BenchmarkAppendKey` benchmarks to validate the performance improvements.
This change improves performance in cases where large prefixes are used,
as verified by the following benchmarks:
goos: darwin
goarch: arm64
pkg: log/slog
cpu: Apple M1 Max
│ old.out │ new.out │
│ sec/op │ sec/op vs base │
AppendKey/prefix_size_5-10 44.41n ± 0% 35.62n ± 0% -19.80% (p=0.000 n=10)
AppendKey/prefix_size_10-10 48.17n ± 0% 39.12n ± 0% -18.80% (p=0.000 n=10)
AppendKey/prefix_size_30-10 84.50n ± 0% 62.30n ± 0% -26.28% (p=0.000 n=10)
AppendKey/prefix_size_50-10 124.9n ± 0% 102.3n ± 0% -18.09% (p=0.000 n=10)
AppendKey/prefix_size_100-10 203.6n ± 1% 168.7n ± 0% -17.14% (p=0.000 n=10)
geomean 85.61n 68.41n -20.09%
│ old.out │ new.out │
│ B/op │ B/op vs base │
AppendKey/prefix_size_5-10 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹
AppendKey/prefix_size_10-10 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹
AppendKey/prefix_size_30-10 48.00 ± 0% 0.00 ± 0% -100.00% (p=0.000 n=10)
AppendKey/prefix_size_50-10 128.00 ± 0% 64.00 ± 0% -50.00% (p=0.000 n=10)
AppendKey/prefix_size_100-10 224.0 ± 0% 112.0 ± 0% -50.00% (p=0.000 n=10)
geomean ² ? ² ³
¹ all samples are equal
² summaries must be >0 to compute geomean
³ ratios must be >0 to compute geomean
│ old.out │ new.out │
│ allocs/op │ allocs/op vs base │
AppendKey/prefix_size_5-10 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹
AppendKey/prefix_size_10-10 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹
AppendKey/prefix_size_30-10 1.000 ± 0% 0.000 ± 0% -100.00% (p=0.000 n=10)
AppendKey/prefix_size_50-10 2.000 ± 0% 1.000 ± 0% -50.00% (p=0.000 n=10)
AppendKey/prefix_size_100-10 2.000 ± 0% 1.000 ± 0% -50.00% (p=0.000 n=10)
geomean ² ? ² ³
¹ all samples are equal
² summaries must be >0 to compute geomean
³ ratios must be >0 to compute geomean
This patch improves performance without altering the external behavior of the `slog` package.
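The allocation-avoiding idea described in this message can be sketched as follows. This is a simplified illustration, not the code from the CL: the function names, signatures, and the plain `[]byte` buffer are assumptions made for the example, and the real `slog` handler does more work (such as key escaping) than is shown here.

```go
package main

import "fmt"

// appendKeyConcat is a hypothetical "before" version: prefix+key builds a
// temporary string, which may require a heap allocation when the result is
// too large for the compiler's small stack buffer.
func appendKeyConcat(buf []byte, prefix, key string) []byte {
	return append(buf, prefix+key...)
}

// appendTwoStrings is a hypothetical "after" version: the two strings are
// appended to the buffer one after the other, so no temporary string exists.
func appendTwoStrings(buf []byte, prefix, key string) []byte {
	buf = append(buf, prefix...)
	return append(buf, key...)
}

func main() {
	buf := make([]byte, 0, 64) // preallocated so the appends do not grow it

	before := appendKeyConcat(buf, "request.", "id")
	after := appendTwoStrings(buf, "request.", "id")

	// Both versions produce the same bytes; only the allocation behavior differs.
	fmt.Println(string(before) == string(after))
}
```

Both helpers return identical bytes, which matches the claim that the external behavior is unchanged; the difference is only whether an intermediate string is materialized.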
Change-Id: I8b47718de522196f06e0ddac48af73e352d2e5cd
Reviewed-on: https://go-review.googlesource.com/c/go/+/631415
Reviewed-by: Alan Donovan <adonovan@google.com>
Reviewed-by: Jonathan Amsterdam <jba@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
|
|
7e8ceadf85 |
cmd/compile/internal/ssagen: use an alias for math/bits.Len
Rather than using a specific intrinsic for math/bits.Len, use a pair of aliases instead. This requires less code and automatically adapts when platforms have a math/bits.Len32 or math/bits.Len64 intrinsic. Change-Id: I28b300172daaee26ef82a7530d9e96123663f541 Reviewed-on: https://go-review.googlesource.com/c/go/+/656995 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Junyang Shao <shaojunyang@google.com> Auto-Submit: Jorropo <jorropo.pgm@gmail.com> Reviewed-by: Jorropo <jorropo.pgm@gmail.com> |
|
|
|
a812e5f3c3 |
math/big: update calibration tests and recalibrate
Refactor calibration tests to use the same logic for all.
Choosing thresholds that are broadly appropriate for all systems is part science but also part guesswork and judgement. We could instead set per-GOOS/GOARCH thresholds, but that seems like too much work, and even then there would be variation between different chips within a GOOS/GOARCH. (For example see the three linux/amd64 systems benchmarked below.)
The thresholds chosen in this CL are:
karatsubaThreshold = 40 // unchanged
basicSqrThreshold = 12 // was 20
karatsubaSqrThreshold = 80 // was 260
divRecursiveThreshold = 40 // was 100
The new file calibrate.md explains the calibration process and links to graphs justifying those values. (The graphs are hosted on swtch.com to avoid adding a megabyte of extra data to the Go repo and Go distributions.) A rendered copy of calibrate.md is at https://swtch.com/math/big/calibrate.html.
goos: linux
goarch: amd64
pkg: math/big
cpu: Intel(R) Xeon(R) Platinum 8481C CPU @ 2.70GHz
│ old │ new │
│ sec/op │ sec/op vs base │
Div/20/10-88 13.13n ± 2% 13.14n ± 2% ~ (p=0.494 n=15)
Div/40/20-88 13.13n ± 2% 13.14n ± 2% ~ (p=0.137 n=15)
Div/100/50-88 25.50n ± 0% 25.51n ± 0% ~ (p=0.038 n=15)
Div/200/100-88 113.1n ± 1% 116.0n ± 3% +2.56% (p=0.000 n=15)
Div/400/200-88 135.3n ± 0% 137.1n ± 1% ~ (p=0.004 n=15)
Div/1000/500-88 259.9n ± 1% 259.0n ± 2% ~ (p=0.182 n=15)
Div/2000/1000-88 568.8n ± 1% 564.7n ± 3% ~ (p=0.927 n=15)
Div/20000/10000-88 25.79µ ± 1% 22.11µ ± 2% -14.26% (p=0.000 n=15)
Div/200000/100000-88 755.1µ ± 1% 737.6µ ± 1% -2.32% (p=0.000 n=15)
Div/2000000/1000000-88 31.30m ± 0% 31.20m ± 1% ~ (p=0.081 n=15)
Div/20000000/10000000-88 1.268 ± 0% 1.265 ± 0% ~ (p=0.011 n=15)
NatMul/10-88 142.6n ± 0% 142.9n ± 7% ~ (p=0.145 n=15)
NatMul/100-88 4.347µ ± 0% 4.350µ ± 3% ~ (p=0.430 n=15)
NatMul/1000-88 187.6µ ± 0% 188.4µ ± 2% ~ (p=0.004 n=15)
NatMul/10000-88 8.052m ± 0% 8.057m ± 1% ~ (p=0.148 n=15)
NatMul/100000-88 260.6m ± 0% 260.7m ± 0% ~ (p=0.512 n=15)
NatSqr/1-88 26.58n ± 5% 27.96n ± 8% ~ (p=0.574 n=15)
NatSqr/2-88 42.35n ± 7% 44.87n ± 6% ~ (p=0.690 n=15)
NatSqr/3-88 53.28n ± 4% 55.62n ± 5% ~ (p=0.151 n=15)
NatSqr/5-88 76.26n ± 6% 81.43n ± 6% +6.78% (p=0.000 n=15)
NatSqr/8-88 110.8n ± 5% 116.4n ± 6% ~ (p=0.040 n=15)
NatSqr/10-88 141.4n ± 4% 147.8n ± 4% ~ (p=0.011 n=15)
NatSqr/20-88 325.8n ± 3% 341.7n ± 4% +4.88% (p=0.000 n=15)
NatSqr/30-88 536.8n ± 3% 556.1n ± 4% ~ (p=0.027 n=15)
NatSqr/50-88 1.168µ ± 3% 1.197µ ± 3% ~ (p=0.442 n=15)
NatSqr/80-88 2.527µ ± 2% 2.480µ ± 2% -1.86% (p=0.000 n=15)
NatSqr/100-88 3.771µ ± 2% 3.535µ ± 2% -6.26% (p=0.000 n=15)
NatSqr/200-88 14.03µ ± 2% 10.57µ ± 3% -24.68% (p=0.000 n=15)
NatSqr/300-88 24.06µ ± 2% 20.57µ ± 2% -14.52% (p=0.000 n=15)
NatSqr/500-88 65.43µ ± 1% 45.45µ ± 1% -30.55% (p=0.000 n=15)
NatSqr/800-88 126.41µ ± 1% 94.13µ ± 2% -25.54% (p=0.000 n=15)
NatSqr/1000-88 196.4µ ± 1% 135.1µ ± 1% -31.18% (p=0.000 n=15)
NatSqr/10000-88 6.404m ± 0% 5.326m ± 1% -16.84% (p=0.000 n=15)
NatSqr/100000-88 267.2m ± 0% 198.7m ± 0% -25.64% (p=0.000 n=15)
geomean 7.318µ 6.948µ -5.06%
goos: linux
goarch: amd64
pkg: math/big
cpu: Intel(R) Xeon(R) CPU @ 3.10GHz
│ old │ new │
│ sec/op │ sec/op vs base │
Div/20/10-16 22.23n ± 0% 22.23n ± 0% ~ (p=0.973 n=15)
Div/40/20-16 22.23n ± 0% 22.23n ± 0% ~ (p=0.226 n=15)
Div/100/50-16 55.27n ± 1% 55.59n ± 0% ~ (p=0.004 n=15)
Div/200/100-16 174.7n ± 3% 175.9n ± 2% ~ (p=0.645 n=15)
Div/400/200-16 208.8n ± 1% 209.5n ± 2% ~ (p=0.169 n=15)
Div/1000/500-16 378.7n ± 2% 380.5n ± 2% ~ (p=0.091 n=15)
Div/2000/1000-16 778.4n ± 1% 781.1n ± 2% ~ (p=0.104 n=15)
Div/20000/10000-16 25.16µ ± 1% 24.93µ ± 1% -0.91% (p=0.000 n=15)
Div/200000/100000-16 926.4µ ± 0% 927.7µ ± 1% ~ (p=0.436 n=15)
Div/2000000/1000000-16 35.58m ± 0% 35.53m ± 0% ~ (p=0.267 n=15)
Div/20000000/10000000-16 1.333 ± 0% 1.330 ± 0% ~ (p=0.126 n=15)
NatMul/10-16 172.6n ± 0% 165.4n ± 0% -4.17% (p=0.000 n=15)
NatMul/100-16 5.706µ ± 0% 5.503µ ± 0% -3.56% (p=0.000 n=15)
NatMul/1000-16 220.8µ ± 0% 219.1µ ± 0% -0.76% (p=0.000 n=15)
NatMul/10000-16 8.688m ± 0% 8.621m ± 0% -0.77% (p=0.000 n=15)
NatMul/100000-16 333.3m ± 0% 333.5m ± 0% ~ (p=0.512 n=15)
NatSqr/1-16 28.66n ± 1% 28.42n ± 3% -0.84% (p=0.000 n=15)
NatSqr/2-16 48.29n ± 2% 48.19n ± 2% ~ (p=0.042 n=15)
NatSqr/3-16 59.93n ± 0% 59.64n ± 2% -0.48% (p=0.000 n=15)
NatSqr/5-16 88.05n ± 0% 87.89n ± 3% ~ (p=0.066 n=15)
NatSqr/8-16 127.7n ± 0% 126.9n ± 3% -0.63% (p=0.000 n=15)
NatSqr/10-16 170.4n ± 0% 169.7n ± 3% ~ (p=0.004 n=15)
NatSqr/20-16 388.8n ± 0% 392.9n ± 3% ~ (p=0.123 n=15)
NatSqr/30-16 635.2n ± 0% 641.7n ± 3% ~ (p=0.123 n=15)
NatSqr/50-16 1.304µ ± 1% 1.314µ ± 3% ~ (p=0.927 n=15)
NatSqr/80-16 2.709µ ± 1% 2.899µ ± 4% +7.01% (p=0.000 n=15)
NatSqr/100-16 3.885µ ± 0% 3.981µ ± 4% ~ (p=0.123 n=15)
NatSqr/200-16 13.29µ ± 2% 12.14µ ± 4% -8.67% (p=0.000 n=15)
NatSqr/300-16 23.39µ ± 0% 22.51µ ± 3% -3.78% (p=0.000 n=15)
NatSqr/500-16 58.13µ ± 1% 50.56µ ± 2% -13.02% (p=0.000 n=15)
NatSqr/800-16 118.4µ ± 1% 107.6µ ± 2% -9.11% (p=0.000 n=15)
NatSqr/1000-16 172.7µ ± 1% 151.8µ ± 2% -12.11% (p=0.000 n=15)
NatSqr/10000-16 6.065m ± 1% 5.757m ± 1% -5.08% (p=0.000 n=15)
NatSqr/100000-16 240.9m ± 0% 228.1m ± 0% -5.32% (p=0.000 n=15)
geomean 8.601µ 8.453µ -1.71%
goos: linux
goarch: amd64
pkg: math/big
cpu: AMD Ryzen 9 7950X 16-Core Processor
│ old │ new │
│ sec/op │ sec/op vs base │
Div/20/10-32 11.11n ± 0% 11.11n ± 1% ~ (p=0.532 n=15)
Div/40/20-32 11.08n ± 1% 11.11n ± 0% ~ (p=0.815 n=15)
Div/100/50-32 16.81n ± 0% 16.84n ± 29% ~ (p=0.020 n=15)
Div/200/100-32 73.91n ± 0% 76.85n ± 11% +3.98% (p=0.000 n=15)
Div/400/200-32 87.35n ± 0% 88.91n ± 34% +1.79% (p=0.000 n=15)
Div/1000/500-32 169.3n ± 1% 168.9n ± 1% ~ (p=0.049 n=15)
Div/2000/1000-32 369.3n ± 0% 369.0n ± 0% ~ (p=0.108 n=15)
Div/20000/10000-32 15.92µ ± 0% 13.55µ ± 2% -14.91% (p=0.000 n=15)
Div/200000/100000-32 491.4µ ± 0% 482.4µ ± 1% -1.84% (p=0.000 n=15)
Div/2000000/1000000-32 20.09m ± 0% 19.96m ± 0% -0.69% (p=0.000 n=15)
Div/20000000/10000000-32 756.5m ± 0% 755.5m ± 0% ~ (p=0.089 n=15)
NatMul/10-32 125.4n ± 5% 124.8n ± 1% ~ (p=0.588 n=15)
NatMul/100-32 2.952µ ± 3% 2.969µ ± 0% ~ (p=0.237 n=15)
NatMul/1000-32 120.7µ ± 0% 121.1µ ± 0% +0.30% (p=0.000 n=15)
NatMul/10000-32 4.845m ± 0% 4.839m ± 1% ~ (p=0.653 n=15)
NatMul/100000-32 173.3m ± 0% 173.3m ± 0% ~ (p=0.838 n=15)
NatSqr/1-32 31.18n ± 23% 32.08n ± 2% ~ (p=0.015 n=15)
NatSqr/2-32 57.22n ± 28% 58.88n ± 2% ~ (p=0.054 n=15)
NatSqr/3-32 61.34n ± 18% 64.33n ± 2% ~ (p=0.237 n=15)
NatSqr/5-32 72.47n ± 17% 79.81n ± 3% ~ (p=0.067 n=15)
NatSqr/8-32 83.26n ± 26% 100.10n ± 3% ~ (p=0.016 n=15)
NatSqr/10-32 87.31n ± 43% 125.50n ± 2% ~ (p=0.003 n=15)
NatSqr/20-32 193.5n ± 25% 244.4n ± 13% ~ (p=0.002 n=15)
NatSqr/30-32 323.9n ± 17% 380.9n ± 6% ~ (p=0.003 n=15)
NatSqr/50-32 713.4n ± 9% 761.7n ± 8% ~ (p=0.419 n=15)
NatSqr/80-32 1.486µ ± 7% 1.609µ ± 5% +8.28% (p=0.000 n=15)
NatSqr/100-32 2.115µ ± 9% 2.253µ ± 1% ~ (p=0.104 n=15)
NatSqr/200-32 7.201µ ± 4% 6.610µ ± 1% -8.21% (p=0.000 n=15)
NatSqr/300-32 13.08µ ± 2% 12.37µ ± 1% -5.41% (p=0.000 n=15)
NatSqr/500-32 32.56µ ± 2% 27.83µ ± 2% -14.52% (p=0.000 n=15)
NatSqr/800-32 66.83µ ± 3% 59.59µ ± 1% -10.83% (p=0.000 n=15)
NatSqr/1000-32 98.09µ ± 1% 83.59µ ± 1% -14.78% (p=0.000 n=15)
NatSqr/10000-32 3.445m ± 1% 3.245m ± 0% -5.81% (p=0.000 n=15)
NatSqr/100000-32 137.3m ± 0% 127.0m ± 0% -7.54% (p=0.000 n=15)
geomean 4.897µ 4.972µ +1.52%
goos: linux
goarch: arm64
pkg: math/big
│ old │ new │
│ sec/op │ sec/op vs base │
Div/20/10-16 15.26n ± 2% 15.14n ± 1% ~ (p=0.212 n=15)
Div/40/20-16 15.22n ± 1% 15.16n ± 0% ~ (p=0.190 n=15)
Div/100/50-16 26.53n ± 2% 26.42n ± 0% -0.41% (p=0.000 n=15)
Div/200/100-16 124.3n ± 0% 124.0n ± 0% ~ (p=0.704 n=15)
Div/400/200-16 142.4n ± 0% 141.8n ± 0% ~ (p=0.074 n=15)
Div/1000/500-16 262.0n ± 1% 261.3n ± 1% ~ (p=0.046 n=15)
Div/2000/1000-16 532.6n ± 0% 532.5n ± 1% ~ (p=0.798 n=15)
Div/20000/10000-16 22.27µ ± 0% 22.88µ ± 0% +2.73% (p=0.000 n=15)
Div/200000/100000-16 890.4µ ± 0% 902.8µ ± 0% +1.39% (p=0.000 n=15)
Div/2000000/1000000-16 35.03m ± 0% 35.10m ± 0% ~ (p=0.305 n=15)
Div/20000000/10000000-16 1.380 ± 0% 1.385 ± 0% ~ (p=0.019 n=15)
NatMul/10-16 177.6n ± 1% 175.6n ± 3% ~ (p=0.480 n=15)
NatMul/100-16 5.675µ ± 0% 5.669µ ± 1% ~ (p=0.705 n=15)
NatMul/1000-16 224.3µ ± 0% 224.6µ ± 0% ~ (p=0.653 n=15)
NatMul/10000-16 8.735m ± 0% 8.739m ± 0% ~ (p=0.567 n=15)
NatMul/100000-16 331.6m ± 0% 331.6m ± 1% ~ (p=0.412 n=15)
NatSqr/1-16 43.69n ± 2% 42.77n ± 6% ~ (p=0.383 n=15)
NatSqr/2-16 65.26n ± 2% 63.91n ± 5% ~ (p=0.285 n=15)
NatSqr/3-16 73.95n ± 1% 72.25n ± 6% ~ (p=0.198 n=15)
NatSqr/5-16 95.06n ± 1% 94.21n ± 3% ~ (p=0.721 n=15)
NatSqr/8-16 155.5n ± 1% 153.4n ± 4% ~ (p=0.170 n=15)
NatSqr/10-16 175.4n ± 1% 174.0n ± 2% ~ (p=0.271 n=15)
NatSqr/20-16 360.8n ± 0% 358.5n ± 2% ~ (p=0.170 n=15)
NatSqr/30-16 584.7n ± 0% 582.9n ± 1% ~ (p=0.170 n=15)
NatSqr/50-16 1.323µ ± 0% 1.322µ ± 0% ~ (p=0.627 n=15)
NatSqr/80-16 2.916µ ± 0% 2.674µ ± 0% -8.30% (p=0.000 n=15)
NatSqr/100-16 4.365µ ± 0% 3.802µ ± 0% -12.90% (p=0.000 n=15)
NatSqr/200-16 16.42µ ± 0% 11.29µ ± 0% -31.26% (p=0.000 n=15)
NatSqr/300-16 28.07µ ± 0% 22.83µ ± 0% -18.68% (p=0.000 n=15)
NatSqr/500-16 76.30µ ± 0% 50.06µ ± 0% -34.39% (p=0.000 n=15)
NatSqr/800-16 147.5µ ± 0% 101.2µ ± 1% -31.41% (p=0.000 n=15)
NatSqr/1000-16 228.6µ ± 0% 149.5µ ± 0% -34.61% (p=0.000 n=15)
NatSqr/10000-16 7.417m ± 0% 6.025m ± 0% -18.76% (p=0.000 n=15)
NatSqr/100000-16 309.2m ± 0% 214.9m ± 0% -30.50% (p=0.000 n=15)
geomean 8.559µ 7.906µ -7.63%
goos: darwin
goarch: arm64
pkg: math/big
cpu: Apple M3 Pro
│ old │ new │
│ sec/op │ sec/op vs base │
Div/20/10-12 9.577n ± 6% 9.473n ± 5% ~ (p=0.384 n=15)
Div/40/20-12 9.480n ± 1% 9.430n ± 1% ~ (p=0.019 n=15)
Div/100/50-12 14.82n ± 0% 14.82n ± 0% ~ (p=0.845 n=15)
Div/200/100-12 83.94n ± 1% 84.35n ± 4% ~ (p=0.512 n=15)
Div/400/200-12 102.7n ± 1% 102.9n ± 0% ~ (p=0.845 n=15)
Div/1000/500-12 185.3n ± 1% 181.9n ± 1% -1.83% (p=0.000 n=15)
Div/2000/1000-12 397.0n ± 1% 396.7n ± 0% ~ (p=0.959 n=15)
Div/20000/10000-12 14.05µ ± 0% 13.70µ ± 1% ~ (p=0.002 n=15)
Div/200000/100000-12 529.4µ ± 3% 526.7µ ± 2% ~ (p=0.967 n=15)
Div/2000000/1000000-12 20.05m ± 0% 20.05m ± 0% ~ (p=0.653 n=15)
Div/20000000/10000000-12 788.2m ± 1% 789.0m ± 1% ~ (p=0.412 n=15)
NatMul/10-12 79.95n ± 1% 80.87n ± 1% +1.15% (p=0.000 n=15)
NatMul/100-12 2.973µ ± 0% 2.986µ ± 2% ~ (p=0.051 n=15)
NatMul/1000-12 122.6µ ± 5% 123.0µ ± 1% ~ (p=0.783 n=15)
NatMul/10000-12 4.990m ± 1% 5.000m ± 1% ~ (p=0.653 n=15)
NatMul/100000-12 185.3m ± 3% 190.3m ± 1% ~ (p=0.089 n=15)
NatSqr/1-12 11.84n ± 1% 11.88n ± 1% ~ (p=0.735 n=15)
NatSqr/2-12 21.01n ± 1% 21.44n ± 6% ~ (p=0.039 n=15)
NatSqr/3-12 25.59n ± 0% 26.74n ± 9% +4.49% (p=0.000 n=15)
NatSqr/5-12 36.78n ± 0% 37.04n ± 1% +0.71% (p=0.000 n=15)
NatSqr/8-12 63.09n ± 3% 63.22n ± 1% ~ (p=0.846 n=15)
NatSqr/10-12 79.98n ± 0% 79.78n ± 0% ~ (p=0.100 n=15)
NatSqr/20-12 174.0n ± 0% 175.5n ± 1% ~ (p=0.361 n=15)
NatSqr/30-12 290.0n ± 0% 291.4n ± 0% ~ (p=0.002 n=15)
NatSqr/50-12 655.2n ± 4% 658.1n ± 0% ~ (p=0.060 n=15)
NatSqr/80-12 1.506µ ± 0% 1.397µ ± 5% -7.24% (p=0.000 n=15)
NatSqr/100-12 2.273µ ± 0% 2.005µ ± 5% -11.79% (p=0.000 n=15)
NatSqr/200-12 8.833µ ± 6% 6.109µ ± 0% -30.84% (p=0.000 n=15)
NatSqr/300-12 15.15µ ± 4% 12.37µ ± 0% -18.34% (p=0.000 n=15)
NatSqr/500-12 41.89µ ± 0% 27.70µ ± 1% -33.88% (p=0.000 n=15)
NatSqr/800-12 80.72µ ± 0% 56.40µ ± 0% -30.12% (p=0.000 n=15)
NatSqr/1000-12 127.06µ ± 1% 84.06µ ± 1% -33.84% (p=0.000 n=15)
NatSqr/10000-12 4.130m ± 0% 3.390m ± 0% -17.91% (p=0.000 n=15)
NatSqr/100000-12 173.2m ± 0% 131.2m ± 6% -24.25% (p=0.000 n=15)
geomean 4.489µ 4.189µ -6.68%
Change-Id: Iaf65fd85457b003ebf07a787c875cda321b40cc9
Reviewed-on: https://go-review.googlesource.com/c/go/+/652058
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Robert Griesemer <gri@google.com>
Reviewed-by: Alan Donovan <adonovan@google.com>
Auto-Submit: Russ Cox <rsc@golang.org> |
|
|
|
40c953cd46 |
runtime: remove nextSampleNoFP from plan9
Plan 9 can use floating point now. Change-Id: If721b243daa31853609cb3d2c535d86c106a1ee1 Reviewed-on: https://go-review.googlesource.com/c/go/+/655879 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Russ Cox <rsc@golang.org> |
|
|
|
d037ed62bc |
math/big: simplify, speed up Karatsuba multiplication
The old Karatsuba implementation only operated on lengths that are
a power of two times a number smaller than karatsubaThreshold.
For example, when karatsubaThreshold = 40, multiplying a pair
of 99-word numbers runs karatsuba on the low 96 (= 24<<2) words
and then has to fix up the answer to include the high 3 words of each.
I suspect this requirement was needed to make the analysis of
how many temporary words to reserve easier, back when the
answer was 3*n and depended on exactly halving the size at
each Karatsuba step.
Now that we have the more flexible temporary allocation stack,
we can change Karatsuba to accept operands of odd length.
Doing so avoids most of the fixup that the old approach required.
For example, multiplying a pair of 99-word numbers runs
karatsuba on all 99 words now.
This is simpler and about the same speed or, for large cases, faster.
goos: linux
goarch: amd64
pkg: math/big
cpu: Intel(R) Xeon(R) CPU @ 3.10GHz
│ old │ new │
│ sec/op │ sec/op vs base │
GCD10x10/WithoutXY-16 99.62n ± 3% 99.10n ± 3% ~ (p=0.009 n=15)
GCD10x10/WithXY-16 243.4n ± 1% 245.2n ± 1% ~ (p=0.009 n=15)
GCD100x100/WithoutXY-16 921.9n ± 1% 919.2n ± 1% ~ (p=0.076 n=15)
GCD100x100/WithXY-16 1.527µ ± 1% 1.526µ ± 0% ~ (p=0.813 n=15)
GCD1000x1000/WithoutXY-16 9.704µ ± 1% 9.696µ ± 0% ~ (p=0.532 n=15)
GCD1000x1000/WithXY-16 14.03µ ± 1% 13.96µ ± 0% ~ (p=0.014 n=15)
GCD10000x10000/WithoutXY-16 206.5µ ± 2% 206.5µ ± 0% ~ (p=0.967 n=15)
GCD10000x10000/WithXY-16 398.0µ ± 1% 397.4µ ± 0% ~ (p=0.683 n=15)
Div/20/10-16 22.22n ± 0% 22.23n ± 0% ~ (p=0.105 n=15)
Div/40/20-16 22.23n ± 0% 22.23n ± 0% ~ (p=0.307 n=15)
Div/100/50-16 55.47n ± 0% 55.47n ± 0% ~ (p=0.573 n=15)
Div/200/100-16 174.9n ± 1% 174.6n ± 1% ~ (p=0.814 n=15)
Div/400/200-16 209.5n ± 1% 210.5n ± 1% ~ (p=0.454 n=15)
Div/1000/500-16 379.9n ± 0% 383.5n ± 2% ~ (p=0.123 n=15)
Div/2000/1000-16 780.1n ± 0% 784.6n ± 1% +0.58% (p=0.000 n=15)
Div/20000/10000-16 25.22µ ± 1% 25.15µ ± 0% ~ (p=0.213 n=15)
Div/200000/100000-16 921.8µ ± 1% 926.1µ ± 0% ~ (p=0.009 n=15)
Div/2000000/1000000-16 37.91m ± 0% 35.63m ± 0% -6.02% (p=0.000 n=15)
Div/20000000/10000000-16 1.378 ± 0% 1.336 ± 0% -3.03% (p=0.000 n=15)
NatMul/10-16 166.8n ± 4% 168.9n ± 3% ~ (p=0.008 n=15)
NatMul/100-16 5.519µ ± 2% 5.548µ ± 4% ~ (p=0.032 n=15)
NatMul/1000-16 230.4µ ± 1% 220.2µ ± 1% -4.43% (p=0.000 n=15)
NatMul/10000-16 8.569m ± 1% 8.640m ± 1% ~ (p=0.005 n=15)
NatMul/100000-16 376.5m ± 1% 334.1m ± 0% -11.26% (p=0.000 n=15)
NatSqr/1-16 27.85n ± 5% 28.60n ± 2% ~ (p=0.123 n=15)
NatSqr/2-16 47.99n ± 2% 48.84n ± 1% ~ (p=0.008 n=15)
NatSqr/3-16 59.41n ± 2% 60.87n ± 2% +2.46% (p=0.001 n=15)
NatSqr/5-16 87.27n ± 2% 89.31n ± 3% ~ (p=0.087 n=15)
NatSqr/8-16 124.6n ± 3% 128.9n ± 3% ~ (p=0.006 n=15)
NatSqr/10-16 166.3n ± 3% 172.7n ± 3% ~ (p=0.002 n=15)
NatSqr/20-16 385.2n ± 2% 394.7n ± 3% ~ (p=0.036 n=15)
NatSqr/30-16 622.7n ± 3% 642.9n ± 3% ~ (p=0.032 n=15)
NatSqr/50-16 1.274µ ± 3% 1.323µ ± 4% ~ (p=0.003 n=15)
NatSqr/80-16 2.606µ ± 4% 2.714µ ± 4% ~ (p=0.044 n=15)
NatSqr/100-16 3.731µ ± 4% 3.871µ ± 4% ~ (p=0.038 n=15)
NatSqr/200-16 12.99µ ± 2% 13.09µ ± 3% ~ (p=0.838 n=15)
NatSqr/300-16 22.87µ ± 2% 23.25µ ± 2% ~ (p=0.285 n=15)
NatSqr/500-16 58.43µ ± 1% 58.25µ ± 2% ~ (p=0.345 n=15)
NatSqr/800-16 115.3µ ± 3% 116.2µ ± 3% ~ (p=0.126 n=15)
NatSqr/1000-16 173.9µ ± 1% 174.3µ ± 1% ~ (p=0.935 n=15)
NatSqr/10000-16 6.133m ± 2% 6.034m ± 1% -1.62% (p=0.000 n=15)
NatSqr/100000-16 253.8m ± 1% 241.5m ± 0% -4.87% (p=0.000 n=15)
geomean 7.745µ 7.760µ +0.19%
goos: linux
goarch: amd64
pkg: math/big
cpu: Intel(R) Xeon(R) Platinum 8481C CPU @ 2.70GHz
│ old │ new │
│ sec/op │ sec/op vs base │
GCD10x10/WithoutXY-88 62.17n ± 4% 61.44n ± 0% -1.17% (p=0.000 n=15)
GCD10x10/WithXY-88 173.4n ± 2% 172.4n ± 4% ~ (p=0.615 n=15)
GCD100x100/WithoutXY-88 584.0n ± 1% 582.9n ± 0% ~ (p=0.009 n=15)
GCD100x100/WithXY-88 1.098µ ± 1% 1.091µ ± 2% ~ (p=0.002 n=15)
GCD1000x1000/WithoutXY-88 6.055µ ± 0% 6.049µ ± 0% ~ (p=0.007 n=15)
GCD1000x1000/WithXY-88 9.430µ ± 0% 9.417µ ± 1% ~ (p=0.123 n=15)
GCD10000x10000/WithoutXY-88 153.4µ ± 2% 149.0µ ± 2% -2.85% (p=0.000 n=15)
GCD10000x10000/WithXY-88 350.6µ ± 3% 349.0µ ± 2% ~ (p=0.126 n=15)
Div/20/10-88 13.12n ± 0% 13.12n ± 1% 0.00% (p=0.042 n=15)
Div/40/20-88 13.12n ± 0% 13.13n ± 0% ~ (p=0.004 n=15)
Div/100/50-88 25.49n ± 0% 25.49n ± 0% ~ (p=0.452 n=15)
Div/200/100-88 115.7n ± 2% 113.8n ± 2% ~ (p=0.212 n=15)
Div/400/200-88 135.0n ± 1% 136.1n ± 1% ~ (p=0.005 n=15)
Div/1000/500-88 257.5n ± 1% 259.9n ± 1% ~ (p=0.004 n=15)
Div/2000/1000-88 567.5n ± 1% 572.4n ± 2% ~ (p=0.616 n=15)
Div/20000/10000-88 25.65µ ± 0% 25.77µ ± 1% ~ (p=0.032 n=15)
Div/200000/100000-88 777.4µ ± 1% 754.3µ ± 1% -2.97% (p=0.000 n=15)
Div/2000000/1000000-88 33.66m ± 0% 31.37m ± 0% -6.81% (p=0.000 n=15)
Div/20000000/10000000-88 1.320 ± 0% 1.266 ± 0% -4.04% (p=0.000 n=15)
NatMul/10-88 151.9n ± 7% 143.3n ± 7% ~ (p=0.878 n=15)
NatMul/100-88 4.418µ ± 2% 4.337µ ± 3% ~ (p=0.512 n=15)
NatMul/1000-88 206.8µ ± 1% 189.8µ ± 1% -8.25% (p=0.000 n=15)
NatMul/10000-88 8.531m ± 1% 8.095m ± 0% -5.12% (p=0.000 n=15)
NatMul/100000-88 298.9m ± 0% 260.5m ± 1% -12.85% (p=0.000 n=15)
NatSqr/1-88 27.55n ± 6% 28.25n ± 7% ~ (p=0.024 n=15)
NatSqr/2-88 44.71n ± 6% 46.21n ± 9% ~ (p=0.024 n=15)
NatSqr/3-88 55.44n ± 4% 58.41n ± 10% ~ (p=0.126 n=15)
NatSqr/5-88 80.71n ± 5% 81.41n ± 5% ~ (p=0.032 n=15)
NatSqr/8-88 115.7n ± 4% 115.4n ± 5% ~ (p=0.814 n=15)
NatSqr/10-88 147.4n ± 4% 147.3n ± 4% ~ (p=0.505 n=15)
NatSqr/20-88 337.8n ± 3% 337.3n ± 4% ~ (p=0.814 n=15)
NatSqr/30-88 556.9n ± 3% 557.6n ± 4% ~ (p=0.814 n=15)
NatSqr/50-88 1.208µ ± 4% 1.208µ ± 3% ~ (p=0.910 n=15)
NatSqr/80-88 2.591µ ± 3% 2.581µ ± 3% ~ (p=0.705 n=15)
NatSqr/100-88 3.870µ ± 3% 3.858µ ± 3% ~ (p=0.846 n=15)
NatSqr/200-88 14.43µ ± 3% 14.28µ ± 2% ~ (p=0.383 n=15)
NatSqr/300-88 24.68µ ± 2% 24.49µ ± 2% ~ (p=0.624 n=15)
NatSqr/500-88 66.27µ ± 1% 66.18µ ± 1% ~ (p=0.735 n=15)
NatSqr/800-88 128.7µ ± 1% 127.4µ ± 1% ~ (p=0.050 n=15)
NatSqr/1000-88 198.7µ ± 1% 197.7µ ± 1% ~ (p=0.229 n=15)
NatSqr/10000-88 6.582m ± 1% 6.426m ± 1% -2.37% (p=0.000 n=15)
NatSqr/100000-88 274.3m ± 0% 267.3m ± 0% -2.57% (p=0.000 n=15)
geomean 6.518µ 6.438µ -1.22%
goos: linux
goarch: arm64
pkg: math/big
│ old │ new │
│ sec/op │ sec/op vs base │
GCD10x10/WithoutXY-16 61.70n ± 1% 61.32n ± 1% ~ (p=0.361 n=15)
GCD10x10/WithXY-16 217.3n ± 1% 217.0n ± 1% ~ (p=0.395 n=15)
GCD100x100/WithoutXY-16 569.7n ± 0% 572.6n ± 2% ~ (p=0.213 n=15)
GCD100x100/WithXY-16 1.241µ ± 1% 1.236µ ± 1% ~ (p=0.157 n=15)
GCD1000x1000/WithoutXY-16 5.558µ ± 0% 5.566µ ± 0% ~ (p=0.228 n=15)
GCD1000x1000/WithXY-16 9.319µ ± 0% 9.326µ ± 0% ~ (p=0.233 n=15)
GCD10000x10000/WithoutXY-16 126.4µ ± 2% 128.7µ ± 3% ~ (p=0.081 n=15)
GCD10000x10000/WithXY-16 279.3µ ± 0% 278.3µ ± 5% ~ (p=0.187 n=15)
Div/20/10-16 15.12n ± 1% 15.21n ± 1% ~ (p=0.490 n=15)
Div/40/20-16 15.11n ± 0% 15.23n ± 1% ~ (p=0.107 n=15)
Div/100/50-16 26.53n ± 0% 26.50n ± 0% ~ (p=0.299 n=15)
Div/200/100-16 123.7n ± 0% 124.0n ± 0% ~ (p=0.086 n=15)
Div/400/200-16 142.5n ± 0% 142.4n ± 0% ~ (p=0.039 n=15)
Div/1000/500-16 259.9n ± 1% 261.2n ± 1% ~ (p=0.044 n=15)
Div/2000/1000-16 539.4n ± 1% 532.3n ± 1% -1.32% (p=0.001 n=15)
Div/20000/10000-16 22.43µ ± 0% 22.32µ ± 0% -0.49% (p=0.000 n=15)
Div/200000/100000-16 898.3µ ± 0% 889.6µ ± 0% -0.96% (p=0.000 n=15)
Div/2000000/1000000-16 38.37m ± 0% 35.11m ± 0% -8.49% (p=0.000 n=15)
Div/20000000/10000000-16 1.449 ± 0% 1.384 ± 0% -4.48% (p=0.000 n=15)
NatMul/10-16 182.0n ± 1% 177.8n ± 1% -2.31% (p=0.000 n=15)
NatMul/100-16 5.537µ ± 0% 5.693µ ± 0% +2.82% (p=0.000 n=15)
NatMul/1000-16 229.9µ ± 0% 224.8µ ± 0% -2.24% (p=0.000 n=15)
NatMul/10000-16 8.985m ± 0% 8.751m ± 0% -2.61% (p=0.000 n=15)
NatMul/100000-16 371.1m ± 0% 331.5m ± 0% -10.66% (p=0.000 n=15)
NatSqr/1-16 46.77n ± 6% 42.76n ± 1% -8.57% (p=0.000 n=15)
NatSqr/2-16 66.99n ± 4% 63.62n ± 1% -5.03% (p=0.000 n=15)
NatSqr/3-16 76.79n ± 4% 73.42n ± 1% ~ (p=0.007 n=15)
NatSqr/5-16 99.00n ± 3% 95.35n ± 1% -3.69% (p=0.000 n=15)
NatSqr/8-16 160.0n ± 3% 155.1n ± 1% -3.06% (p=0.001 n=15)
NatSqr/10-16 178.4n ± 2% 175.9n ± 0% -1.40% (p=0.001 n=15)
NatSqr/20-16 361.9n ± 2% 361.3n ± 0% ~ (p=0.083 n=15)
NatSqr/30-16 584.7n ± 0% 586.8n ± 0% +0.36% (p=0.000 n=15)
NatSqr/50-16 1.327µ ± 0% 1.329µ ± 0% ~ (p=0.349 n=15)
NatSqr/80-16 2.893µ ± 1% 2.925µ ± 0% +1.11% (p=0.000 n=15)
NatSqr/100-16 4.330µ ± 1% 4.381µ ± 0% +1.18% (p=0.000 n=15)
NatSqr/200-16 16.25µ ± 1% 16.43µ ± 0% +1.07% (p=0.000 n=15)
NatSqr/300-16 27.85µ ± 1% 28.06µ ± 0% +0.77% (p=0.000 n=15)
NatSqr/500-16 76.01µ ± 0% 76.34µ ± 0% ~ (p=0.002 n=15)
NatSqr/800-16 146.8µ ± 0% 148.1µ ± 0% +0.83% (p=0.000 n=15)
NatSqr/1000-16 228.2µ ± 0% 228.6µ ± 0% ~ (p=0.123 n=15)
NatSqr/10000-16 7.524m ± 0% 7.426m ± 0% -1.31% (p=0.000 n=15)
NatSqr/100000-16 316.7m ± 0% 309.2m ± 0% -2.36% (p=0.000 n=15)
geomean 7.264µ 7.172µ -1.27%
goos: darwin
goarch: arm64
pkg: math/big
cpu: Apple M3 Pro
│ old │ new │
│ sec/op │ sec/op vs base │
GCD10x10/WithoutXY-12 32.61n ± 1% 32.42n ± 1% ~ (p=0.021 n=15)
GCD10x10/WithXY-12 87.70n ± 1% 88.42n ± 1% ~ (p=0.010 n=15)
GCD100x100/WithoutXY-12 305.9n ± 0% 306.4n ± 0% ~ (p=0.003 n=15)
GCD100x100/WithXY-12 560.3n ± 2% 556.6n ± 1% ~ (p=0.018 n=15)
GCD1000x1000/WithoutXY-12 3.509µ ± 2% 3.464µ ± 1% ~ (p=0.145 n=15)
GCD1000x1000/WithXY-12 5.347µ ± 2% 5.372µ ± 1% ~ (p=0.046 n=15)
GCD10000x10000/WithoutXY-12 73.75µ ± 1% 73.99µ ± 1% ~ (p=0.004 n=15)
GCD10000x10000/WithXY-12 148.4µ ± 0% 147.8µ ± 1% ~ (p=0.076 n=15)
Div/20/10-12 9.481n ± 0% 9.462n ± 1% ~ (p=0.631 n=15)
Div/40/20-12 9.457n ± 0% 9.462n ± 1% ~ (p=0.798 n=15)
Div/100/50-12 14.91n ± 0% 14.79n ± 1% -0.80% (p=0.000 n=15)
Div/200/100-12 84.56n ± 1% 84.60n ± 1% ~ (p=0.271 n=15)
Div/400/200-12 103.8n ± 0% 102.8n ± 0% -0.96% (p=0.000 n=15)
Div/1000/500-12 181.3n ± 1% 184.2n ± 2% ~ (p=0.091 n=15)
Div/2000/1000-12 397.5n ± 0% 397.4n ± 0% ~ (p=0.299 n=15)
Div/20000/10000-12 14.04µ ± 1% 13.99µ ± 0% ~ (p=0.221 n=15)
Div/200000/100000-12 523.1µ ± 0% 514.0µ ± 3% ~ (p=0.775 n=15)
Div/2000000/1000000-12 21.58m ± 0% 20.01m ± 1% -7.29% (p=0.000 n=15)
Div/20000000/10000000-12 813.5m ± 0% 796.2m ± 1% -2.13% (p=0.000 n=15)
NatMul/10-12 80.46n ± 1% 80.02n ± 1% ~ (p=0.063 n=15)
NatMul/100-12 2.904µ ± 0% 2.979µ ± 1% +2.58% (p=0.000 n=15)
NatMul/1000-12 127.8µ ± 0% 122.3µ ± 0% -4.28% (p=0.000 n=15)
NatMul/10000-12 5.141m ± 0% 4.975m ± 1% -3.23% (p=0.000 n=15)
NatMul/100000-12 208.8m ± 0% 189.6m ± 3% -9.21% (p=0.000 n=15)
NatSqr/1-12 11.90n ± 1% 11.76n ± 1% ~ (p=0.059 n=15)
NatSqr/2-12 21.33n ± 1% 21.12n ± 0% ~ (p=0.063 n=15)
NatSqr/3-12 26.05n ± 1% 25.79n ± 0% ~ (p=0.002 n=15)
NatSqr/5-12 37.31n ± 0% 36.98n ± 1% ~ (p=0.008 n=15)
NatSqr/8-12 63.07n ± 0% 62.75n ± 1% ~ (p=0.061 n=15)
NatSqr/10-12 79.48n ± 0% 79.59n ± 0% ~ (p=0.455 n=15)
NatSqr/20-12 173.1n ± 0% 173.2n ± 1% ~ (p=0.518 n=15)
NatSqr/30-12 288.6n ± 1% 289.2n ± 0% ~ (p=0.030 n=15)
NatSqr/50-12 653.3n ± 0% 653.3n ± 0% ~ (p=0.361 n=15)
NatSqr/80-12 1.492µ ± 0% 1.496µ ± 0% ~ (p=0.018 n=15)
NatSqr/100-12 2.270µ ± 1% 2.270µ ± 0% ~ (p=0.326 n=15)
NatSqr/200-12 8.776µ ± 1% 8.784µ ± 1% ~ (p=0.083 n=15)
NatSqr/300-12 15.07µ ± 0% 15.09µ ± 0% ~ (p=0.455 n=15)
NatSqr/500-12 41.71µ ± 0% 41.77µ ± 1% ~ (p=0.305 n=15)
NatSqr/800-12 80.77µ ± 1% 80.59µ ± 0% ~ (p=0.113 n=15)
NatSqr/1000-12 126.4µ ± 1% 126.5µ ± 0% ~ (p=0.683 n=15)
NatSqr/10000-12 4.204m ± 0% 4.119m ± 0% -2.02% (p=0.000 n=15)
NatSqr/100000-12 177.0m ± 0% 172.9m ± 0% -2.31% (p=0.000 n=15)
geomean 3.790µ 3.757µ -0.87%
Change-Id: Ifc7a9b61f678df216690511ac8bb9143189a795e
Reviewed-on: https://go-review.googlesource.com/c/go/+/652057
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Robert Griesemer <gri@google.com>
|
|
|
|
26040b1dd7 |
cmd/compile: remove noDuffDevice
noDuffDevice was for Plan 9, but Plan 9 doesn't need it anymore. It was also being set in s390x, mips, mipsle, and wasm, but on those systems it had no effect since the SSA rules for those architectures don't refer to it at all. Change-Id: Ib85c0832674c714f3ad5091f0a022eb7cd3ebcdf Reviewed-on: https://go-review.googlesource.com/c/go/+/655878 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Russ Cox <rsc@golang.org> |
|
|
|
c9b07e8871 |
cmd/compile: use FMA on plan9, and drop UseFMA
Every OS uses FMA now. Change-Id: Ia7ffa77c52c45aefca611ddc54e9dfffb27a48da Reviewed-on: https://go-review.googlesource.com/c/go/+/655877 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Russ Cox <rsc@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> |
|
|
|
35cb497d6e |
cmd/compile: remove useSSE
Every OS uses SSE now. Change-Id: I4df7e2fbc8e5ccb1fc84a884d4c922b7a2a628e4 Reviewed-on: https://go-review.googlesource.com/c/go/+/655876 Auto-Submit: Russ Cox <rsc@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> |
|
|
|
644b984027 |
cmd/compile: compute bitsize from type size in prove to clean some switches
Change-Id: I215adda9050d214576433700aed4c371a36aaaed Reviewed-on: https://go-review.googlesource.com/c/go/+/656335 Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Jorropo <jorropo.pgm@gmail.com> |
|
|
|
b60b9cf21f |
cmd/compile: add constant folding for bits.Add64
Change-Id: I0ed4ebeaaa68e274e5902485ccc1165c039440bd Reviewed-on: https://go-review.googlesource.com/c/go/+/656275 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Auto-Submit: Jorropo <jorropo.pgm@gmail.com> |