mirror of https://github.com/golang/go.git
5183 Commits
| Author | SHA1 | Message | Date |
|---|---|---|---|
|
|
558f5372fc |
cmd/compile,testing: implement one-time rampup logic for testing.B.Loop
testing.B.Loop now does its own loop scheduling without interaction with b.N. b.N will be updated to the actual iterations b.Loop controls when b.Loop returns false. This CL also added tests for fixed iteration count (benchtime=100x case). This CL also ensured that b.Loop() is inlined. For #61515 Change-Id: Ia15f4462f4830ef4ec51327520ff59910eb4bb58 Reviewed-on: https://go-review.googlesource.com/c/go/+/627755 Reviewed-by: Michael Pratt <mpratt@google.com> Commit-Queue: Junyang Shao <shaojunyang@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com> |
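A minimal sketch of the behavior described above, assuming a Go toolchain that already ships `testing.B.Loop`. The helper name `runLoopBenchmark` and the loop body are illustrative only and are not taken from the CL.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// runLoopBenchmark (hypothetical helper) drives a b.Loop-style benchmark via
// testing.Benchmark and reports how often the body ran, together with the
// value of b.N observed after b.Loop returned false.
func runLoopBenchmark() (iterations, finalN int) {
	testing.Benchmark(func(b *testing.B) {
		iterations = 0 // reset in case the function is invoked more than once
		for b.Loop() {
			_ = strings.Repeat("x", 16)
			iterations++
		}
		// Per the commit message, b.N now holds the iteration count
		// that b.Loop actually controlled.
		finalN = b.N
	})
	return iterations, finalN
}

func main() {
	iters, n := runLoopBenchmark()
	fmt.Println("body ran", iters, "times; b.N after the loop:", n)
}
```

The benchmark runs for the default benchtime, so this takes about a second.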
|
|
|
566cf1c108 |
crypto/ecdh: move implementation to crypto/internal/fips/ecdh
This intentionally gives up on the property of not computing the public key until requested. It was nice, but it was making the code too complex. The average use case is to call PublicKey immediately after GenerateKey anyway. Added support in the module for P-224, just in case we'd ever want to support it in crypto/ecdh. Tried various ways to fix test/fixedbugs/issue52193.go to be meaningful, but crypto/ecdh is pretty complex and all the solutions would end up locking in crypto/ecdh structure rather than compiler behavior. The rest of that test is good enough on its own anyway. If we do the work in the future of making crypto/ecdh zero-allocations using the affordances of the compiler, we can add a more robust TestAllocations on our side. For #69536 Change-Id: I68ac3955180cb31f6f96a0ef57604aaed88ab311 Reviewed-on: https://go-review.googlesource.com/c/go/+/628315 Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Daniel McCarney <daniel@binaryparadox.net> Reviewed-by: Russ Cox <rsc@golang.org> Auto-Submit: Filippo Valsorda <filippo@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> |
|
|
|
170436c045 |
cmd/compile: strongly favor closure inlining
This tweaks the inlining cost knob for closures specifically, they receive a doubled budget. The rationale for this is that closures have a lot of "crud" in their IR that will disappear after inlining, so the standard budget penalizes them unnecessarily. This is also the cause of these bugs -- looking at the code involved, these closures "should" be inlineable, therefore tweak the parameters until behavior matches expectations. It's not costly in binary size, because the only-called-from-one-site case is common (especially for rangefunc iterators). I can imagine better fixes and I am going to try to get that done, but this one is small and makes things better. Fixes #69411, #69539. Change-Id: I8a892c40323173a723799e0ddad69dcc2724a8f9 Reviewed-on: https://go-review.googlesource.com/c/go/+/629195 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Robert Griesemer <gri@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> |
|
|
|
53b2b64b64 |
sync: add explicit noCopy fields to Map, Mutex, and Once
Following CLs will refactor Mutex and change the internals of Map. This ends up breaking tests in x/tools for the copylock vet check, because the error message changes. Let's insulate ourselves from such things permanently by adding an explicit noCopy field. We'll update the vet check to accept that as the problem, rather than depend on less explicit internals. We capture Once here too to clean up the error message as well. Change-Id: Iead985fc8ec9ef3ea5ff615f26dde17bb03aeadb Reviewed-on: https://go-review.googlesource.com/c/go/+/627777 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Knyszek <mknyszek@google.com> Reviewed-by: Tim King <taking@google.com> |
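The noCopy pattern referred to above can be sketched in user code as follows. This is an illustration modeled on the convention, not the sync package's source; the `Counter` type is hypothetical.

```go
package main

import "fmt"

// noCopy is a zero-size marker type. Because *noCopy has Lock and Unlock
// methods, go vet's copylocks check treats any struct containing it as
// something that must not be copied.
type noCopy struct{}

func (*noCopy) Lock()   {}
func (*noCopy) Unlock() {}

// Counter is a hypothetical type that must not be copied after first use.
type Counter struct {
	_ noCopy
	n int
}

// Inc increments the counter and returns the new value.
func (c *Counter) Inc() int {
	c.n++
	return c.n
}

func main() {
	var c Counter
	c.Inc()
	fmt.Println(c.Inc())
	// d := c // a copy like this is what the vet check is meant to report
}
```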
|
|
|
5a0f2a7a7c |
cmd/compile: remove gc programs from stack frame objects
This is a two-pronged approach. First, try to keep large objects off the stack frame. Second, if they do manage to appear anyway, use straight bitmasks instead of gc programs. Generally probably a good idea to keep large objects out of stack frames. But particularly keeping gc programs off the stack simplifies runtime code a bit. This CL sets the limit of most stack objects to 131072 bytes (on 64-bit archs). There can still be large objects if allocated by a late pass, like order, or they are required to be on the stack, like function arguments. But the size for the bitmasks for these objects isn't a huge deal, as we already have (probably several) bitmasks for the frame liveness map itself. Change-Id: I6d2bed0e9aa9ac7499955562c6154f9264061359 Reviewed-on: https://go-review.googlesource.com/c/go/+/542815 Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com> |
|
|
|
e30ce3c498 |
sync/atomic: make intrinsics noescape except 64bits op on 32bits arch and unsafe.Pointer
Fixes #16241 I made 64-bit ops on 32-bit arches still leak since it was kind of promised. The promised leaks were wider than this, but I don't believe its effect can be observed in a breaking manner without using unsafe the way it's currently set up. Change-Id: I66d8df47bfe49bce3efa64ac668a2a55f70733a3 Reviewed-on: https://go-review.googlesource.com/c/go/+/462298 Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com> Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> |
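As a rough illustration of the kind of code this affects, the sketch below passes the address of a function-local variable to sync/atomic operations. The function `bump` is hypothetical; whether `n` stays on the stack depends on the toolchain and architecture, and could be checked with `-gcflags=-m`.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// bump increments a function-local counter atomically and returns it.
func bump(times int) int32 {
	var n int32 // local whose address is handed to the atomic operations
	for i := 0; i < times; i++ {
		atomic.AddInt32(&n, 1)
	}
	return atomic.LoadInt32(&n)
}

func main() {
	fmt.Println(bump(3))
}
```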
|
|
|
2eac154b1c |
cmd/compile: better error message when offending/missing token is a keyword
Prefix keywords (type, default, case, etc.) with "keyword" in error messages to make them less ambiguous. Fixes #68589. Change-Id: I1eb92d1382f621b934167b3a4c335045da26be9f Reviewed-on: https://go-review.googlesource.com/c/go/+/623819 Auto-Submit: Robert Griesemer <gri@google.com> Reviewed-by: Robert Griesemer <gri@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Tim King <taking@google.com> |
|
|
|
ab55465098 |
cmd/compile: wire up math/bits.TrailingZeros intrinsics for loong64
Micro-benchmark results on Loongson 3A5000 and 3A6000:
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A6000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
TrailingZeros 1.7240n ± 0% 0.8120n ± 0% -52.90% (p=0.000 n=20)
TrailingZeros8 1.0530n ± 0% 0.8015n ± 0% -23.88% (p=0.000 n=20)
TrailingZeros16 2.072n ± 0% 1.015n ± 0% -51.01% (p=0.000 n=20)
TrailingZeros32 1.7160n ± 0% 0.8122n ± 0% -52.67% (p=0.000 n=20)
TrailingZeros64 2.0060n ± 0% 0.8125n ± 0% -59.50% (p=0.000 n=20)
geomean 1.669n 0.8470n -49.25%
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A5000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
TrailingZeros 2.6275n ± 0% 0.9120n ± 0% -65.29% (p=0.000 n=20)
TrailingZeros8 1.451n ± 0% 1.163n ± 0% -19.85% (p=0.000 n=20)
TrailingZeros16 3.069n ± 0% 1.201n ± 0% -60.87% (p=0.000 n=20)
TrailingZeros32 2.9060n ± 0% 0.9115n ± 0% -68.63% (p=0.000 n=20)
TrailingZeros64 2.6305n ± 0% 0.9115n ± 0% -65.35% (p=0.000 n=20)
geomean 2.456n 1.011n -58.83%
This patch is a copy of CL 479498.
Co-authored-by: WANG Xuerui <git@xen0n.name>
Change-Id: I1a5b2114a844dc0d02c8e68f41ce2443ac3b5fda
Reviewed-on: https://go-review.googlesource.com/c/go/+/624356
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
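For context, a typical use of the functions benchmarked above looks like the following sketch. The helper `lowestSetBit` is illustrative and does not come from the CL.

```go
package main

import (
	"fmt"
	"math/bits"
)

// lowestSetBit returns the index of the least significant set bit of x.
// The second result is false when x has no bits set.
func lowestSetBit(x uint64) (int, bool) {
	if x == 0 {
		return 0, false
	}
	return bits.TrailingZeros64(x), true
}

func main() {
	idx, ok := lowestSetBit(0b101000)
	fmt.Println(idx, ok)
}
```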
|
|
|
|
1f8fa4941f |
runtime: fix iterator returns map entries after clear (pre-swissmap)
Fixes #70189 Fixes #59411 Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest-noswissmap Change-Id: I4ef7ecd7e996330189309cb2a658cf34bf9e1119 Reviewed-on: https://go-review.googlesource.com/c/go/+/625275 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> |
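The expected semantics can be sketched with a small map: once `clear` removes every entry during iteration, the loop should not produce any further entries. This example uses a map small enough that it likely does not exercise the exact code path fixed here; it only shows the behavior the language specification calls for.

```go
package main

import "fmt"

// visitsAfterClear clears the map during the first iteration and reports how
// many iterations the range loop performed in total.
func visitsAfterClear() int {
	m := map[int]string{1: "a", 2: "b", 3: "c", 4: "d"}
	visits := 0
	for range m {
		visits++
		clear(m) // removes every entry not yet reached
	}
	return visits
}

func main() {
	fmt.Println(visitsAfterClear())
}
```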
|
|
|
745ec75719 |
cmd/compile/internal/ssa: improve carry addition rules on PPC64
Fold constant int16 addends for usages of math/bits.Add64(x,const,0) on PPC64. This usage shows up in a few crypto implementations; notably the go wrapper for CL 626176. Change-Id: I6963163330487d04e0479b4fdac235f97bb96889 Reviewed-on: https://go-review.googlesource.com/c/go/+/625899 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Emmanuel Odeke <emmanuel@orijtech.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> |
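The usage pattern named above, `math/bits.Add64(x, const, 0)`, might appear in code such as this sketch of a 128-bit addition. The function `addSmallConst` and the constant 42 are chosen for illustration only.

```go
package main

import (
	"fmt"
	"math/bits"
)

// addSmallConst adds the constant 42 to a 128-bit value stored as (hi, lo).
func addSmallConst(hi, lo uint64) (uint64, uint64) {
	// Constant addend with a zero carry-in: the shape the new rules target.
	lo, carry := bits.Add64(lo, 42, 0)
	// Propagate the carry into the high word.
	hi, _ = bits.Add64(hi, 0, carry)
	return hi, lo
}

func main() {
	hi, lo := addSmallConst(0, ^uint64(0))
	fmt.Println(hi, lo)
}
```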
|
|
|
fb9b946adc |
cmd/compile: optimize math/bits.OnesCount{16,32,64} implementation on loong64
Use Loong64's LSX instruction VPCNT to implement math/bits.OnesCount{16,32,64}
and make it intrinsic.
Benchmark results on loongson 3A5000 and 3A6000 machines:
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A5000-HV @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
OnesCount 4.413n ± 0% 1.401n ± 0% -68.25% (p=0.000 n=10)
OnesCount8 1.364n ± 0% 1.363n ± 0% ~ (p=0.130 n=10)
OnesCount16 2.112n ± 0% 1.534n ± 0% -27.37% (p=0.000 n=10)
OnesCount32 4.533n ± 0% 1.529n ± 0% -66.27% (p=0.000 n=10)
OnesCount64 4.565n ± 0% 1.531n ± 1% -66.46% (p=0.000 n=10)
geomean 3.048n 1.470n -51.78%
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A6000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
OnesCount 3.553n ± 0% 1.201n ± 0% -66.20% (p=0.000 n=10)
OnesCount8 0.8021n ± 0% 0.8004n ± 0% -0.21% (p=0.000 n=10)
OnesCount16 1.216n ± 0% 1.000n ± 0% -17.76% (p=0.000 n=10)
OnesCount32 3.006n ± 0% 1.035n ± 0% -65.57% (p=0.000 n=10)
OnesCount64 3.503n ± 0% 1.035n ± 0% -70.45% (p=0.000 n=10)
geomean 2.053n 1.006n -51.01%
Change-Id: I07a5b8da2bb48711b896387ec7625145804affc8
Reviewed-on: https://go-review.googlesource.com/c/go/+/620978
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
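A common use of OnesCount is computing a Hamming distance, sketched below. The helper name `hamming` is an assumption of this example.

```go
package main

import (
	"fmt"
	"math/bits"
)

// hamming returns the number of bit positions in which a and b differ.
func hamming(a, b uint64) int {
	return bits.OnesCount64(a ^ b)
}

func main() {
	fmt.Println(hamming(0b1011, 0b0010))
}
```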
|
|
|
|
fe2da30cb5 |
cmd/compile: keep variables alive in testing.B.Loop loops
For the loop body guarded by testing.B.Loop, we disable function inlining and devirtualization inside. The only legal form to be matched is `for b.Loop() {...}`.
For #61515
Change-Id: I2e226f08cb4614667cbded498a7821dffe3f72d8
Reviewed-on: https://go-review.googlesource.com/c/go/+/612043
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Bypass: Junyang Shao <shaojunyang@google.com>
Commit-Queue: Junyang Shao <shaojunyang@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
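The matched loop form can be sketched as follows, assuming a toolchain that provides `testing.B.Loop`. The function `square` is a made-up example of a small pure call whose unused result might otherwise be optimized away.

```go
package main

import (
	"fmt"
	"testing"
)

// square is small and has no side effects.
func square(x int) int { return x * x }

// benchSquare uses the `for b.Loop() {...}` form, which is the only form the
// commit message says is matched by the compiler.
func benchSquare(b *testing.B) {
	for b.Loop() {
		square(12)
	}
}

func main() {
	res := testing.Benchmark(benchSquare)
	fmt.Println("iterations:", res.N)
}
```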
|
|
|
|
5a9aeef9d5 |
cmd/compile: allow more types for wasmimport/wasmexport parameters and results
As proposed on #66984, this CL allows more types to be used as wasmimport/wasmexport function parameters and results. Specifically, bool, string, and uintptr are now allowed, and also pointer types that point to allowed element types. Allowed element types include sized integer and floating point types (including small integer types like uint8 which are not directly allowed as a parameter type), bool, array whose element type is allowed, and struct whose fields are allowed element type and also include a structs.HostLayout field. For #66984. Change-Id: Ie5452a1eda21c089780dfb4d4246de6008655c84 Reviewed-on: https://go-review.googlesource.com/c/go/+/626615 Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> |
|
|
|
583d750fa1 |
cmd/compile: wire up bits.Reverse intrinsics for loong64
Micro-benchmark results on Loongson 3A5000 and 3A6000:
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A6000 @ 2500.00MHz
| CL 624576 | this CL |
| sec/op | sec/op vs base |
Reverse 2.8130n ± 0% 0.8008n ± 0% -71.53% (p=0.000 n=20)
Reverse8 0.7014n ± 0% 0.4040n ± 0% -42.40% (p=0.000 n=20)
Reverse16 1.2975n ± 0% 0.6632n ± 1% -48.89% (p=0.000 n=20)
Reverse32 2.7520n ± 0% 0.4042n ± 0% -85.31% (p=0.000 n=20)
Reverse64 2.8970n ± 0% 0.4041n ± 0% -86.05% (p=0.000 n=20)
geomean 1.828n 0.5116n -72.01%
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A5000 @ 2500.00MHz
| CL 624576 | this CL |
| sec/op | sec/op vs base |
Reverse 4.0050n ± 0% 0.8011n ± 0% -80.00% (p=0.000 n=20)
Reverse8 0.8010n ± 0% 0.5210n ± 1% -34.96% (p=0.000 n=20)
Reverse16 1.6160n ± 0% 0.6008n ± 0% -62.82% (p=0.000 n=20)
Reverse32 3.8550n ± 0% 0.5179n ± 0% -86.57% (p=0.000 n=20)
Reverse64 3.8050n ± 0% 0.5177n ± 0% -86.40% (p=0.000 n=20)
geomean 2.378n 0.5828n -75.49%
Updates #59120
This patch is a copy of CL 483656.
Co-authored-by: WANG Xuerui <git@xen0n.name>
Change-Id: I98681091763279279c8404bd0295785f13ea1c8e
Reviewed-on: https://go-review.googlesource.com/c/go/+/624276
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
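As a usage illustration for the Reverse family, the sketch below checks whether a byte reads the same when its bits are reversed. The helper `isBitPalindrome8` is hypothetical.

```go
package main

import (
	"fmt"
	"math/bits"
)

// isBitPalindrome8 reports whether x is unchanged by reversing its 8 bits.
func isBitPalindrome8(x uint8) bool {
	return bits.Reverse8(x) == x
}

func main() {
	fmt.Println(isBitPalindrome8(0b10000001), isBitPalindrome8(0b00000001))
}
```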
|
|
|
|
e6cc9d228a |
cmd/compile: implement FMA codegen for loong64
Benchmark results on Loongson 3A5000 and 3A6000:
goos: linux
goarch: loong64
pkg: math
cpu: Loongson-3A6000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
FMA 25.930n ± 0% 2.002n ± 0% -92.28% (p=0.000 n=10)
goos: linux
goarch: loong64
pkg: math
cpu: Loongson-3A5000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
FMA 32.840n ± 0% 2.002n ± 0% -93.90% (p=0.000 n=10)
Updates #59120
This patch is a copy of CL 483355.
Co-authored-by: WANG Xuerui <git@xen0n.name>
Change-Id: I88b89d23f00864f9173a182a47ee135afec7ed6e
Reviewed-on: https://go-review.googlesource.com/c/go/+/625335
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
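One place `math.FMA` shows up is Horner-style polynomial evaluation, sketched here. The function `horner` and the sample polynomial are assumptions for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// horner evaluates a polynomial at x. Coefficients are ordered from the
// highest degree to the constant term, and each step is one fused
// multiply-add: acc = acc*x + c.
func horner(coeffs []float64, x float64) float64 {
	acc := 0.0
	for _, c := range coeffs {
		acc = math.FMA(acc, x, c)
	}
	return acc
}

func main() {
	// 2x^2 + 3x + 4 evaluated at x = 2
	fmt.Println(horner([]float64{2, 3, 4}, 2))
}
```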
|
|
|
|
d6fb0ab2c7 |
cmd/compile: wire up Bswap/ReverseBytes intrinsics for loong64
Micro-benchmark results on Loongson 3A5000 and 3A6000:
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A6000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
ReverseBytes 2.0020n ± 0% 0.4040n ± 0% -79.82% (p=0.000 n=20)
ReverseBytes16 0.8866n ± 1% 0.8007n ± 0% -9.69% (p=0.000 n=20)
ReverseBytes32 1.2195n ± 0% 0.8007n ± 0% -34.34% (p=0.000 n=20)
ReverseBytes64 2.0705n ± 0% 0.8008n ± 0% -61.32% (p=0.000 n=20)
geomean 1.455n 0.6749n -53.62%
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A5000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
ReverseBytes 2.8040n ± 0% 0.5205n ± 0% -81.44% (p=0.000 n=20)
ReverseBytes16 0.7066n ± 0% 0.8011n ± 0% +13.37% (p=0.000 n=20)
ReverseBytes32 1.5500n ± 0% 0.8010n ± 0% -48.32% (p=0.000 n=20)
ReverseBytes64 2.7665n ± 0% 0.8010n ± 0% -71.05% (p=0.000 n=20)
geomean 1.707n 0.7192n -57.87%
Updates #59120
This patch is a copy of CL 483357.
Co-authored-by: WANG Xuerui <git@xen0n.name>
Change-Id: If355354cd031533df91991fcc3392e5a6c314295
Reviewed-on: https://go-review.googlesource.com/c/go/+/624576
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
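ReverseBytes is typically used for byte-order conversion, as in this sketch. The helper name `swapEndian32` is made up for the example.

```go
package main

import (
	"fmt"
	"math/bits"
)

// swapEndian32 converts a 32-bit value between big- and little-endian order.
func swapEndian32(x uint32) uint32 {
	return bits.ReverseBytes32(x)
}

func main() {
	fmt.Printf("%#x\n", swapEndian32(0x11223344))
}
```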
|
|
|
|
d98c51809d |
cmd/compile: wire up math/bits.Len intrinsics for loong64
For the SubFromLen64 codegen test case to work as intended, we need
to fold c-(-(x-d)) into x+(c-d).
Still, some instances of LeadingZeros are not optimized into single
CLZ instructions right now (actually, the LeadingZeros micro-benchmarks
are currently still compiled with redundant adds/subs of 64, due to
interference of loop optimizations before lowering), but perf numbers
indicate it's not that bad after all.
Micro-benchmark results on Loongson 3A5000 and 3A6000:
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A5000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
LeadingZeros 3.660n ± 0% 1.348n ± 0% -63.17% (p=0.000 n=20)
LeadingZeros8 1.777n ± 0% 1.767n ± 0% -0.56% (p=0.000 n=20)
LeadingZeros16 2.816n ± 0% 1.770n ± 0% -37.14% (p=0.000 n=20)
LeadingZeros32 5.293n ± 1% 1.683n ± 0% -68.21% (p=0.000 n=20)
LeadingZeros64 3.622n ± 0% 1.349n ± 0% -62.76% (p=0.000 n=20)
geomean 3.229n 1.571n -51.35%
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A6000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
LeadingZeros 2.410n ± 0% 1.103n ± 1% -54.23% (p=0.000 n=20)
LeadingZeros8 1.236n ± 0% 1.501n ± 0% +21.44% (p=0.000 n=20)
LeadingZeros16 2.106n ± 0% 1.501n ± 0% -28.73% (p=0.000 n=20)
LeadingZeros32 2.860n ± 0% 1.324n ± 0% -53.72% (p=0.000 n=20)
LeadingZeros64 2.6135n ± 0% 0.9509n ± 0% -63.62% (p=0.000 n=20)
geomean 2.159n 1.256n -41.81%
Updates #59120
This patch is a copy of CL 483356.
Co-authored-by: WANG Xuerui <git@xen0n.name>
Change-Id: Iee81a17f7da06d77a427e73dfcc016f2b15ae556
Reviewed-on: https://go-review.googlesource.com/c/go/+/624575
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
|
|
|
|
cb163ff60b |
cmd/compile: init limit for newly created value in prove pass
Fixes: #70156 Change-Id: I2e5dc2a39a8e54ec5f18c5f9d1644208cffb2e9a Reviewed-on: https://go-review.googlesource.com/c/go/+/624695 Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com> Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> |
|
|
|
5f88755f43 |
cmd/compile: add loong64-specific inlining for runtime.memmove
goos: linux
goarch: loong64
pkg: runtime
cpu: Loongson-3A6000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
Memmove/0 0.8004n ± 0% 0.4002n ± 0% -50.00% (p=0.000 n=20)
Memmove/1 2.494n ± 0% 2.136n ± 0% -14.35% (p=0.000 n=20)
Memmove/2 2.802n ± 0% 2.512n ± 0% -10.35% (p=0.000 n=20)
Memmove/3 2.802n ± 0% 2.497n ± 0% -10.92% (p=0.000 n=20)
Memmove/4 3.202n ± 0% 2.808n ± 0% -12.30% (p=0.000 n=20)
Memmove/5 2.821n ± 0% 2.658n ± 0% -5.76% (p=0.000 n=20)
Memmove/6 2.819n ± 0% 2.657n ± 0% -5.73% (p=0.000 n=20)
Memmove/7 2.820n ± 0% 2.654n ± 0% -5.87% (p=0.000 n=20)
Memmove/8 3.202n ± 0% 2.814n ± 0% -12.12% (p=0.000 n=20)
Memmove/9 3.202n ± 0% 3.009n ± 0% -6.03% (p=0.000 n=20)
Memmove/10 3.202n ± 0% 3.009n ± 0% -6.03% (p=0.000 n=20)
Memmove/11 3.202n ± 0% 3.009n ± 0% -6.03% (p=0.000 n=20)
Memmove/12 3.202n ± 0% 3.010n ± 0% -6.01% (p=0.000 n=20)
Memmove/13 3.202n ± 0% 3.009n ± 0% -6.03% (p=0.000 n=20)
Memmove/14 3.202n ± 0% 3.009n ± 0% -6.03% (p=0.000 n=20)
Memmove/15 3.202n ± 0% 3.010n ± 0% -6.01% (p=0.000 n=20)
Memmove/16 3.202n ± 0% 3.009n ± 0% -6.03% (p=0.000 n=20)
Memmove/32 3.602n ± 0% 3.603n ± 0% +0.03% (p=0.000 n=20)
Memmove/64 4.202n ± 0% 4.204n ± 0% +0.05% (p=0.000 n=20)
Memmove/128 8.005n ± 0% 8.007n ± 0% +0.02% (p=0.000 n=20)
Memmove/256 11.21n ± 0% 10.81n ± 0% -3.57% (p=0.000 n=20)
Memmove/512 17.65n ± 0% 17.96n ± 0% +1.73% (p=0.000 n=20)
Memmove/1024 30.48n ± 0% 30.46n ± 0% -0.07% (p=0.000 n=20)
Memmove/2048 56.43n ± 0% 56.30n ± 0% -0.24% (p=0.000 n=20)
Memmove/4096 107.7n ± 0% 107.6n ± 0% -0.09% (p=0.000 n=20)
MemmoveOverlap/32 4.002n ± 0% 4.003n ± 0% +0.02% (p=0.002 n=20)
MemmoveOverlap/64 4.603n ± 0% 4.603n ± 0% ~ (p=0.286 n=20)
MemmoveOverlap/128 8.704n ± 0% 8.699n ± 0% ~ (p=0.180 n=20)
MemmoveOverlap/256 12.01n ± 0% 11.76n ± 0% -2.08% (p=0.000 n=20)
MemmoveOverlap/512 18.42n ± 0% 18.36n ± 0% -0.33% (p=0.000 n=20)
MemmoveOverlap/1024 31.23n ± 0% 31.16n ± 0% -0.21% (p=0.000 n=20)
MemmoveOverlap/2048 57.42n ± 0% 56.82n ± 0% -1.04% (p=0.000 n=20)
MemmoveOverlap/4096 108.5n ± 0% 108.0n ± 0% -0.46% (p=0.000 n=20)
MemmoveUnalignedDst/0 2.804n ± 0% 2.447n ± 0% -12.70% (p=0.000 n=20)
MemmoveUnalignedDst/1 2.802n ± 0% 2.491n ± 0% -11.12% (p=0.000 n=20)
MemmoveUnalignedDst/2 3.202n ± 0% 2.808n ± 0% -12.29% (p=0.000 n=20)
MemmoveUnalignedDst/3 3.202n ± 0% 2.814n ± 0% -12.12% (p=0.000 n=20)
MemmoveUnalignedDst/4 3.602n ± 0% 3.202n ± 0% -11.10% (p=0.000 n=20)
MemmoveUnalignedDst/5 3.202n ± 0% 3.203n ± 0% +0.03% (p=0.014 n=20)
MemmoveUnalignedDst/6 3.202n ± 0% 3.202n ± 0% ~ (p=1.000 n=20)
MemmoveUnalignedDst/7 3.202n ± 0% 3.202n ± 0% ~ (p=1.000 n=20)
MemmoveUnalignedDst/8 3.602n ± 0% 3.202n ± 0% -11.10% (p=0.000 n=20)
MemmoveUnalignedDst/9 3.602n ± 0% 3.602n ± 0% ~ (p=1.000 n=20)
MemmoveUnalignedDst/10 3.602n ± 0% 3.602n ± 0% ~ (p=0.091 n=20)
MemmoveUnalignedDst/11 3.602n ± 0% 3.602n ± 0% ~ (p=0.613 n=20)
MemmoveUnalignedDst/12 3.602n ± 0% 3.602n ± 0% ~ (p=0.165 n=20)
MemmoveUnalignedDst/13 3.602n ± 0% 3.602n ± 0% ~ (p=1.000 n=20)
MemmoveUnalignedDst/14 3.602n ± 0% 3.602n ± 0% ~ (p=1.000 n=20)
MemmoveUnalignedDst/15 3.602n ± 0% 3.602n ± 0% 0.00% (p=0.027 n=20)
MemmoveUnalignedDst/16 3.602n ± 0% 3.602n ± 0% ~ (p=0.661 n=20)
MemmoveUnalignedDst/32 4.002n ± 0% 4.002n ± 0% ~ (p=1.000 n=20)
MemmoveUnalignedDst/64 6.804n ± 0% 6.804n ± 0% ~ (p=0.204 n=20)
MemmoveUnalignedDst/128 12.61n ± 0% 12.61n ± 0% ~ (p=1.000 n=20) ¹
MemmoveUnalignedDst/256 16.33n ± 2% 16.32n ± 2% ~ (p=0.839 n=20)
MemmoveUnalignedDst/512 25.61n ± 0% 24.71n ± 0% -3.51% (p=0.000 n=20)
MemmoveUnalignedDst/1024 42.81n ± 0% 42.82n ± 0% ~ (p=0.973 n=20)
MemmoveUnalignedDst/2048 74.86n ± 0% 76.03n ± 0% +1.56% (p=0.000 n=20)
MemmoveUnalignedDst/4096 152.0n ± 11% 152.0n ± 0% 0.00% (p=0.013 n=20)
MemmoveUnalignedDstOverlap/32 5.319n ± 0% 5.558n ± 1% +4.50% (p=0.000 n=20)
MemmoveUnalignedDstOverlap/64 8.006n ± 0% 8.025n ± 0% +0.24% (p=0.000 n=20)
MemmoveUnalignedDstOverlap/128 9.631n ± 0% 9.601n ± 0% -0.31% (p=0.000 n=20)
MemmoveUnalignedDstOverlap/256 13.79n ± 2% 13.58n ± 1% ~ (p=0.234 n=20)
MemmoveUnalignedDstOverlap/512 21.38n ± 0% 21.30n ± 0% -0.37% (p=0.000 n=20)
MemmoveUnalignedDstOverlap/1024 41.71n ± 0% 41.70n ± 0% ~ (p=0.887 n=20)
MemmoveUnalignedDstOverlap/2048 81.63n ± 0% 81.61n ± 0% ~ (p=0.481 n=20)
MemmoveUnalignedDstOverlap/4096 162.6n ± 0% 162.6n ± 0% ~ (p=0.171 n=20)
MemmoveUnalignedSrc/0 2.808n ± 0% 2.482n ± 0% -11.61% (p=0.000 n=20)
MemmoveUnalignedSrc/1 2.804n ± 0% 2.577n ± 0% -8.08% (p=0.000 n=20)
MemmoveUnalignedSrc/2 3.202n ± 0% 2.806n ± 0% -12.37% (p=0.000 n=20)
MemmoveUnalignedSrc/3 3.202n ± 0% 2.808n ± 0% -12.30% (p=0.000 n=20)
MemmoveUnalignedSrc/4 3.602n ± 0% 3.202n ± 0% -11.10% (p=0.000 n=20)
MemmoveUnalignedSrc/5 3.202n ± 0% 3.202n ± 0% ~ (p=1.000 n=20)
MemmoveUnalignedSrc/6 3.202n ± 0% 3.202n ± 0% ~ (p=1.000 n=20)
MemmoveUnalignedSrc/7 3.202n ± 0% 3.202n ± 0% ~ (p=1.000 n=20)
MemmoveUnalignedSrc/8 3.602n ± 0% 3.202n ± 0% -11.10% (p=0.000 n=20)
MemmoveUnalignedSrc/9 3.602n ± 0% 3.602n ± 0% ~ (p=1.000 n=20)
MemmoveUnalignedSrc/10 3.602n ± 0% 3.602n ± 0% ~ (p=1.000 n=20)
MemmoveUnalignedSrc/11 3.602n ± 0% 3.602n ± 0% ~ (p=0.746 n=20)
MemmoveUnalignedSrc/12 3.602n ± 0% 3.602n ± 0% ~ (p=0.407 n=20)
MemmoveUnalignedSrc/13 3.603n ± 0% 3.602n ± 0% -0.03% (p=0.001 n=20)
MemmoveUnalignedSrc/14 3.603n ± 0% 3.602n ± 0% -0.01% (p=0.013 n=20)
MemmoveUnalignedSrc/15 3.602n ± 0% 3.602n ± 0% ~ (p=1.000 n=20)
MemmoveUnalignedSrc/16 3.602n ± 0% 3.602n ± 0% ~ (p=1.000 n=20)
MemmoveUnalignedSrc/32 4.002n ± 0% 4.002n ± 0% ~ (p=1.000 n=20)
MemmoveUnalignedSrc/64 4.803n ± 0% 4.803n ± 0% 0.00% (p=0.008 n=20)
MemmoveUnalignedSrc/128 8.405n ± 0% 8.405n ± 0% 0.00% (p=0.003 n=20)
MemmoveUnalignedSrc/256 12.04n ± 3% 12.20n ± 2% ~ (p=0.151 n=20)
MemmoveUnalignedSrc/512 19.11n ± 0% 19.10n ± 3% ~ (p=0.621 n=20)
MemmoveUnalignedSrc/1024 35.62n ± 0% 35.62n ± 0% ~ (p=0.407 n=20)
MemmoveUnalignedSrc/2048 68.04n ± 0% 68.35n ± 0% +0.46% (p=0.000 n=20)
MemmoveUnalignedSrc/4096 133.2n ± 1% 133.3n ± 0% ~ (p=0.131 n=20)
MemmoveUnalignedSrcDst/f_16_0 4.202n ± 0% 4.202n ± 0% ~ (p=1.000 n=20)
MemmoveUnalignedSrcDst/b_16_0 4.202n ± 0% 4.202n ± 0% ~ (p=1.000 n=20)
MemmoveUnalignedSrcDst/f_16_1 4.202n ± 0% 4.202n ± 0% ~ (p=1.000 n=20)
MemmoveUnalignedSrcDst/b_16_1 4.202n ± 0% 4.202n ± 0% ~ (p=1.000 n=20)
MemmoveUnalignedSrcDst/f_16_4 4.202n ± 0% 4.202n ± 0% ~ (p=1.000 n=20)
MemmoveUnalignedSrcDst/b_16_4 4.202n ± 0% 4.202n ± 0% ~ (p=0.661 n=20)
MemmoveUnalignedSrcDst/f_16_7 4.202n ± 0% 4.202n ± 0% ~ (p=1.000 n=20)
MemmoveUnalignedSrcDst/b_16_7 4.203n ± 0% 4.202n ± 0% -0.02% (p=0.008 n=20)
MemmoveUnalignedSrcDst/f_64_0 6.103n ± 0% 6.100n ± 0% ~ (p=0.595 n=20)
MemmoveUnalignedSrcDst/b_64_0 6.103n ± 0% 6.102n ± 0% ~ (p=0.973 n=20)
MemmoveUnalignedSrcDst/f_64_1 7.419n ± 0% 7.226n ± 0% -2.59% (p=0.000 n=20)
MemmoveUnalignedSrcDst/b_64_1 6.745n ± 0% 6.941n ± 0% +2.89% (p=0.000 n=20)
MemmoveUnalignedSrcDst/f_64_4 7.420n ± 0% 7.223n ± 0% -2.65% (p=0.000 n=20)
MemmoveUnalignedSrcDst/b_64_4 6.753n ± 0% 6.941n ± 0% +2.79% (p=0.000 n=20)
MemmoveUnalignedSrcDst/f_64_7 7.423n ± 0% 7.204n ± 0% -2.96% (p=0.000 n=20)
MemmoveUnalignedSrcDst/b_64_7 6.750n ± 0% 6.941n ± 0% +2.83% (p=0.000 n=20)
MemmoveUnalignedSrcDst/f_256_0 12.96n ± 0% 12.99n ± 0% +0.27% (p=0.000 n=20)
MemmoveUnalignedSrcDst/b_256_0 12.91n ± 0% 12.94n ± 0% +0.23% (p=0.000 n=20)
MemmoveUnalignedSrcDst/f_256_1 17.21n ± 0% 17.21n ± 0% ~ (p=1.000 n=20)
MemmoveUnalignedSrcDst/b_256_1 17.61n ± 0% 17.61n ± 0% ~ (p=1.000 n=20)
MemmoveUnalignedSrcDst/f_256_4 16.21n ± 0% 16.21n ± 0% ~ (p=1.000 n=20)
MemmoveUnalignedSrcDst/b_256_4 16.41n ± 0% 16.41n ± 0% ~ (p=1.000 n=20)
MemmoveUnalignedSrcDst/f_256_7 14.12n ± 0% 14.10n ± 0% ~ (p=0.307 n=20)
MemmoveUnalignedSrcDst/b_256_7 14.81n ± 0% 14.81n ± 0% ~ (p=1.000 n=20) ¹
MemmoveUnalignedSrcDst/f_4096_0 109.3n ± 0% 109.4n ± 0% +0.09% (p=0.004 n=20)
MemmoveUnalignedSrcDst/b_4096_0 109.6n ± 0% 109.6n ± 0% ~ (p=1.000 n=20)
MemmoveUnalignedSrcDst/f_4096_1 113.5n ± 0% 113.5n ± 0% ~ (p=1.000 n=20)
MemmoveUnalignedSrcDst/b_4096_1 113.7n ± 0% 113.7n ± 0% ~ (p=1.000 n=20) ¹
MemmoveUnalignedSrcDst/f_4096_4 112.3n ± 0% 112.3n ± 0% ~ (p=0.763 n=20)
MemmoveUnalignedSrcDst/b_4096_4 112.6n ± 0% 112.9n ± 1% +0.31% (p=0.032 n=20)
MemmoveUnalignedSrcDst/f_4096_7 110.6n ± 0% 110.6n ± 0% ~ (p=1.000 n=20) ¹
MemmoveUnalignedSrcDst/b_4096_7 111.1n ± 0% 111.1n ± 0% ~ (p=1.000 n=20) ¹
MemmoveUnalignedSrcDst/f_65536_0 4.801µ ± 0% 4.818µ ± 0% +0.34% (p=0.000 n=20)
MemmoveUnalignedSrcDst/b_65536_0 5.027µ ± 0% 5.036µ ± 0% +0.19% (p=0.007 n=20)
MemmoveUnalignedSrcDst/f_65536_1 4.815µ ± 0% 4.729µ ± 0% -1.78% (p=0.000 n=20)
MemmoveUnalignedSrcDst/b_65536_1 4.659µ ± 0% 4.737µ ± 1% +1.69% (p=0.000 n=20)
MemmoveUnalignedSrcDst/f_65536_4 4.807µ ± 0% 4.721µ ± 0% -1.78% (p=0.000 n=20)
MemmoveUnalignedSrcDst/b_65536_4 4.659µ ± 0% 4.601µ ± 0% -1.23% (p=0.000 n=20)
MemmoveUnalignedSrcDst/f_65536_7 4.868µ ± 0% 4.759µ ± 0% -2.23% (p=0.000 n=20)
MemmoveUnalignedSrcDst/b_65536_7 4.665µ ± 0% 4.709µ ± 0% +0.93% (p=0.000 n=20)
MemmoveUnalignedSrcOverlap/32 6.804n ± 0% 6.810n ± 0% +0.09% (p=0.000 n=20)
MemmoveUnalignedSrcOverlap/64 10.41n ± 0% 10.42n ± 0% +0.10% (p=0.000 n=20)
MemmoveUnalignedSrcOverlap/128 11.59n ± 0% 11.58n ± 0% ~ (p=0.414 n=20)
MemmoveUnalignedSrcOverlap/256 14.22n ± 0% 14.29n ± 0% +0.46% (p=0.000 n=20)
MemmoveUnalignedSrcOverlap/512 23.11n ± 0% 23.04n ± 0% -0.28% (p=0.001 n=20)
MemmoveUnalignedSrcOverlap/1024 41.44n ± 0% 41.47n ± 0% ~ (p=0.693 n=20)
MemmoveUnalignedSrcOverlap/2048 81.25n ± 0% 81.25n ± 0% ~ (p=0.405 n=20)
MemmoveUnalignedSrcOverlap/4096 166.1n ± 0% 166.1n ± 0% ~ (p=0.451 n=20)
geomean 13.02n 12.69n -2.51%
¹ all samples are equal
Change-Id: I712adc7670f6ae360714ec5a770d00d76c8700ed
Reviewed-on: https://go-review.googlesource.com/c/go/+/618815
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
|
|
|
|
324f41b748 |
cmd/compile: fix inlining name mangling for blank label
Fixes #70175 Change-Id: I13767d951455854b03ad6707ff9292cfe9097ee9 Reviewed-on: https://go-review.googlesource.com/c/go/+/624377 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Keith Randall <khr@golang.org> |
|
|
|
63ba2b9d84 |
cmd/compile,internal/runtime/maps: stack allocated maps and small alloc
The compiler will stack allocate the Map struct and initial group if possible. Stack maps are initialized inline without calling into the runtime. Small heap allocated maps use makemap_small. These are the same heuristics as existing maps. For #54766. Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest-swissmap Change-Id: I6c371d1309716fd1c38a3212d417b3c76db5c9b9 Reviewed-on: https://go-review.googlesource.com/c/go/+/622042 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Michael Pratt <mpratt@google.com> |
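A candidate for this optimization might look like the following sketch: a small map with a size hint that never leaves the function. Whether the compiler actually places it on the stack is an assumption here and depends on escape analysis in the toolchain used. `countDistinct` is a hypothetical helper.

```go
package main

import "fmt"

// countDistinct returns the number of distinct values in xs. The map is
// small and local, so it is a plausible candidate for stack allocation.
func countDistinct(xs []int) int {
	seen := make(map[int]struct{}, 8)
	for _, x := range xs {
		seen[x] = struct{}{}
	}
	return len(seen)
}

func main() {
	fmt.Println(countDistinct([]int{1, 2, 2, 3, 1}))
}
```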
|
|
|
f782e16162 |
runtime,internal/runtime/maps: specialized swissmaps
Add all the specialized variants that exist for the existing maps. Like the existing maps, the fast variants do not support indirect key/elem. Note that as of this CL, the Get and Put methods on Map/table are effectively dead. They are only reachable from the internal/runtime/maps unit tests. For #54766. Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest-swissmap Change-Id: I95297750be6200f34ec483e4cfc897f048c26db7 Reviewed-on: https://go-review.googlesource.com/c/go/+/616463 Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Pratt <mpratt@google.com> Reviewed-by: Keith Randall <khr@google.com> |
|
|
|
4dcbb00be2 |
cmd/compile: teach prove about min/max phi operations
If there is a phi that is computing the minimum of its two inputs, then we know the result of the phi is smaller than or equal to both of its inputs. Similarly for maximum (although max seems less useful). This pattern happens for the case `n := copy(a, b)`: n is the minimum of len(a) and len(b), so with this optimization we know both n <= len(a) and n <= len(b). That extra information is helpful for subsequent slicing of a or b. Fixes #16833 Change-Id: Ib4238fd1edae0f2940f62a5516a6b363bbe7928c Reviewed-on: https://go-review.googlesource.com/c/go/+/622240 Reviewed-by: Carlos Amedee <carlos@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: David Chase <drchase@google.com> |
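The copy pattern from the message can be written out as a short sketch. The function `copyPrefix` is illustrative; whether the bounds checks on the two slicing expressions are actually removed depends on the compiler version.

```go
package main

import "fmt"

// copyPrefix copies as much of src as fits into dst and returns the copied
// prefix of each slice. Since n is min(len(dst), len(src)), both slicing
// expressions are in range.
func copyPrefix(dst, src []byte) ([]byte, []byte) {
	n := copy(dst, src)
	return dst[:n], src[:n]
}

func main() {
	d, s := copyPrefix(make([]byte, 3), []byte("hello"))
	fmt.Println(string(d), string(s))
}
```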
|
|
|
aef81a7551 |
cmd/compile: add rules to optimize go codes to constant 0 on loong64
goos: linux
goarch: loong64
pkg: test/bench/go1
cpu: Loongson-3A6000 @ 2500.00MHz
│ old.bench │ new.bench │
│ sec/op │ sec/op vs base │
BinaryTree17 7.735 ± 1% 7.716 ± 1% -0.23% (p=0.041 n=15)
Fannkuch11 2.645 ± 0% 2.646 ± 0% +0.05% (p=0.013 n=15)
FmtFprintfEmpty 35.87n ± 0% 35.89n ± 0% +0.06% (p=0.000 n=15)
FmtFprintfString 59.54n ± 0% 59.47n ± 0% ~ (p=0.213 n=15)
FmtFprintfInt 62.23n ± 0% 62.06n ± 0% ~ (p=0.212 n=15)
FmtFprintfIntInt 98.16n ± 0% 97.90n ± 0% -0.26% (p=0.000 n=15)
FmtFprintfPrefixedInt 117.0n ± 0% 116.7n ± 0% -0.26% (p=0.000 n=15)
FmtFprintfFloat 204.6n ± 0% 204.2n ± 0% -0.20% (p=0.000 n=15)
FmtManyArgs 456.3n ± 0% 455.4n ± 0% -0.20% (p=0.000 n=15)
GobDecode 7.210m ± 0% 7.156m ± 1% -0.75% (p=0.000 n=15)
GobEncode 8.143m ± 1% 8.177m ± 1% ~ (p=0.806 n=15)
Gzip 280.2m ± 0% 279.7m ± 0% -0.19% (p=0.005 n=15)
Gunzip 32.71m ± 0% 32.65m ± 0% -0.19% (p=0.000 n=15)
HTTPClientServer 53.76µ ± 0% 53.65µ ± 0% ~ (p=0.083 n=15)
JSONEncode 9.297m ± 0% 9.295m ± 0% ~ (p=0.806 n=15)
JSONDecode 46.97m ± 1% 47.07m ± 1% ~ (p=0.683 n=15)
Mandelbrot200 4.602m ± 0% 4.600m ± 0% -0.05% (p=0.001 n=15)
GoParse 4.682m ± 0% 4.670m ± 1% -0.25% (p=0.001 n=15)
RegexpMatchEasy0_32 59.80n ± 0% 59.63n ± 0% -0.28% (p=0.000 n=15)
RegexpMatchEasy0_1K 458.3n ± 0% 457.3n ± 0% -0.22% (p=0.001 n=15)
RegexpMatchEasy1_32 59.39n ± 0% 59.23n ± 0% -0.27% (p=0.000 n=15)
RegexpMatchEasy1_1K 557.9n ± 0% 556.6n ± 0% -0.23% (p=0.001 n=15)
RegexpMatchMedium_32 803.6n ± 0% 801.8n ± 0% -0.22% (p=0.001 n=15)
RegexpMatchMedium_1K 27.32µ ± 0% 27.26µ ± 0% -0.21% (p=0.000 n=15)
RegexpMatchHard_32 1.385µ ± 0% 1.382µ ± 0% -0.22% (p=0.000 n=15)
RegexpMatchHard_1K 40.93µ ± 0% 40.83µ ± 0% -0.24% (p=0.000 n=15)
Revcomp 474.8m ± 0% 474.3m ± 0% ~ (p=0.250 n=15)
Template 77.41m ± 1% 76.63m ± 1% -1.01% (p=0.023 n=15)
TimeParse 271.1n ± 0% 271.2n ± 0% +0.04% (p=0.022 n=15)
TimeFormat 290.0n ± 0% 289.8n ± 0% ~ (p=0.118 n=15)
geomean 51.73µ 51.64µ -0.18%
Change-Id: I45a1e6c85bb3cea0f62766ec932432803e9af10a
Reviewed-on: https://go-review.googlesource.com/c/go/+/619315
Reviewed-by: Qiqi Huang <huangqiqi@loongson.cn>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
|
|
|
|
bbdc65bb38 |
test: add a test for wasm memory usage
Test that a small Wasm program uses 8 MB of linear memory. This reflects the current allocator. We test an exact value, but if the allocator changes, we can update or relax this. Updates #69018. Change-Id: Ifc0bb420af008bd30cde4745b3efde3ce091b683 Reviewed-on: https://go-review.googlesource.com/c/go/+/622378 Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> |
|
|
|
bb07aa644b |
cmd/compile: add shift optimization test
For #69635 Change-Id: Id5696dc9724c3b3afcd7b60a6994f98c5309eb0e Reviewed-on: https://go-review.googlesource.com/c/go/+/621755 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Pratt <mpratt@google.com> |
|
|
|
711552e98a |
cmd/compile: optimize type switch for a single runtime known type with a case var
Change-Id: I03ba70076d6dd3c0b9624d14699b7dd91a3c0e9b Reviewed-on: https://go-review.googlesource.com/c/go/+/618476 Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com> |
|
|
|
1846dd5a31 |
cmd/compile/internal/ssa: fix PPC64 shift codegen regression
CL 621357 introduced new generic lowering rules which caused several shift related codegen test failures. Add new rules to fix the test regressions, and clean up tests which are changed but not regressed. Some CLRLSLDI tests are removed as they no longer test CLRLSLDI rules. Fixes #70003 Change-Id: I1ecc5a7e63ab709a4a0cebf11fa078d5cf164034 Reviewed-on: https://go-review.googlesource.com/c/go/+/622236 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> |
|
|
|
91d07ac71c |
cmd/compile: inline constant sized memclrNoHeapPointers calls on loong64
Tested that on loong64, the optimization effect is negative for
constant size cases greater than 512.
So only enable inlining for constant size cases less than 512.
goos: linux
goarch: loong64
pkg: runtime
cpu: Loongson-3A6000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
MemclrKnownSize1 2.4070n ± 0% 0.4004n ± 0% -83.37% (p=0.000 n=20)
MemclrKnownSize2 2.1365n ± 0% 0.4004n ± 0% -81.26% (p=0.000 n=20)
MemclrKnownSize4 2.4445n ± 0% 0.4004n ± 0% -83.62% (p=0.000 n=20)
MemclrKnownSize8 2.4200n ± 0% 0.4004n ± 0% -83.45% (p=0.000 n=20)
MemclrKnownSize16 2.8030n ± 0% 0.8007n ± 0% -71.43% (p=0.000 n=20)
MemclrKnownSize32 2.803n ± 0% 1.602n ± 0% -42.85% (p=0.000 n=20)
MemclrKnownSize64 3.250n ± 0% 2.402n ± 0% -26.08% (p=0.000 n=20)
MemclrKnownSize112 6.006n ± 0% 2.819n ± 0% -53.06% (p=0.000 n=20)
MemclrKnownSize128 6.006n ± 0% 3.240n ± 0% -46.05% (p=0.000 n=20)
MemclrKnownSize192 6.807n ± 0% 5.205n ± 0% -23.53% (p=0.000 n=20)
MemclrKnownSize248 7.608n ± 0% 6.301n ± 0% -17.19% (p=0.000 n=20)
MemclrKnownSize256 7.608n ± 0% 6.707n ± 0% -11.84% (p=0.000 n=20)
MemclrKnownSize512 13.61n ± 0% 13.61n ± 0% ~ (p=0.374 n=20)
MemclrKnownSize1024 26.43n ± 0% 26.43n ± 0% ~ (p=0.826 n=20)
MemclrKnownSize4096 103.3n ± 0% 103.3n ± 0% ~ (p=1.000 n=20)
MemclrKnownSize512KiB 26.29µ ± 0% 26.29µ ± 0% -0.00% (p=0.012 n=20)
geomean 10.05n 5.006n -50.18%
| bench.old | bench.new |
| B/s | B/s vs base |
MemclrKnownSize1 396.2Mi ± 0% 2381.9Mi ± 0% +501.21% (p=0.000 n=20)
MemclrKnownSize2 892.8Mi ± 0% 4764.0Mi ± 0% +433.59% (p=0.000 n=20)
MemclrKnownSize4 1.524Gi ± 0% 9.305Gi ± 0% +510.56% (p=0.000 n=20)
MemclrKnownSize8 3.079Gi ± 0% 18.609Gi ± 0% +504.42% (p=0.000 n=20)
MemclrKnownSize16 5.316Gi ± 0% 18.609Gi ± 0% +250.05% (p=0.000 n=20)
MemclrKnownSize32 10.63Gi ± 0% 18.61Gi ± 0% +75.00% (p=0.000 n=20)
MemclrKnownSize64 18.34Gi ± 0% 24.81Gi ± 0% +35.27% (p=0.000 n=20)
MemclrKnownSize112 17.37Gi ± 0% 37.01Gi ± 0% +113.08% (p=0.000 n=20)
MemclrKnownSize128 19.85Gi ± 0% 36.80Gi ± 0% +85.39% (p=0.000 n=20)
MemclrKnownSize192 26.27Gi ± 0% 34.35Gi ± 0% +30.77% (p=0.000 n=20)
MemclrKnownSize248 30.36Gi ± 0% 36.66Gi ± 0% +20.75% (p=0.000 n=20)
MemclrKnownSize256 31.34Gi ± 0% 35.55Gi ± 0% +13.43% (p=0.000 n=20)
MemclrKnownSize512 35.02Gi ± 0% 35.03Gi ± 0% +0.00% (p=0.030 n=20)
MemclrKnownSize1024 36.09Gi ± 0% 36.09Gi ± 0% ~ (p=0.101 n=20)
MemclrKnownSize4096 36.93Gi ± 0% 36.93Gi ± 0% +0.00% (p=0.003 n=20)
MemclrKnownSize512KiB 18.57Gi ± 0% 18.57Gi ± 0% +0.00% (p=0.041 n=20)
geomean 10.13Gi 20.33Gi +100.72%
Change-Id: I460a56f7ccc9f820ca2c1934c1c517b9614809ac
Reviewed-on: https://go-review.googlesource.com/c/go/+/621355
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Michael Pratt <mpratt@google.com>
|
|
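For context on the commit above, the kind of source that produces a constant-size clear is a zeroing loop over a fixed-size array. The sketch below is modeled loosely on the shape of the MemclrKnownSize benchmarks; the function name `resetCounters` and the array size are hypothetical, and whether the loop is lowered to a memclrNoHeapPointers call or to inline stores is an assumption that depends on the toolchain version and target architecture.

```go
package main

import "fmt"

// resetCounters zeroes a fixed-size array with the range-loop idiom.
// The array length (64) is a compile-time constant, which is the
// "constant size" case the commit message refers to.
func resetCounters() [64]int8 {
	x := [64]int8{1, 2, 3}
	for i := range x {
		x[i] = 0 // a zeroing loop the compiler may turn into a memory clear
	}
	return x
}

func main() {
	fmt.Println(resetCounters() == [64]int8{}) // prints "true"
}
```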
|
|
7d9802ac5e |
go/types, types2: qualify named types in error messages with type kind
Change the description of an operand x that has a named type of sorts
by providing a description of the type structure (array, struct, slice,
pointer, etc).
For instance, given a (variable) operand x of a struct type T, the
operand is mentioned as (new):
x (variable of struct type T)
instead of (old):
x (variable of type T)
This approach is also used when a basic type is renamed, for instance
as in:
x (value of uint type big.Word)
which makes it clear that big.Word is a uint.
This change is expected to produce more informative error messages.
Fixes #69955.
Change-Id: I544b0698f753a522c3b6e1800a492a94974fbab7
Reviewed-on: https://go-review.googlesource.com/c/go/+/621458
Reviewed-by: Alan Donovan <adonovan@google.com>
Auto-Submit: Robert Griesemer <gri@google.com>
Reviewed-by: Robert Griesemer <gri@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
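One way to observe the operand descriptions discussed in the commit above is to type-check a small snippet with the go/types API and collect the reported errors. This is a sketch under stated assumptions: the helper `typeErrors` and the sample source are hypothetical, and the exact wording printed (with or without the type kind) depends on the Go release used to build the program.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
)

// typeErrors type-checks src as a single-file package and returns the
// error messages reported by the checker. typeErrors is a hypothetical
// helper for illustration only.
func typeErrors(src string) []string {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "p.go", src, 0)
	if err != nil {
		return []string{err.Error()}
	}
	var msgs []string
	conf := types.Config{
		// Collect every error instead of stopping at the first one.
		Error: func(err error) { msgs = append(msgs, err.Error()) },
	}
	conf.Check("p", fset, []*ast.File{file}, nil) // errors arrive via conf.Error
	return msgs
}

func main() {
	// x has the named struct type T, so using it as an int is an error
	// whose message describes the operand x.
	const src = `package p

type T struct{}

var x T
var _ int = x
`
	for _, msg := range typeErrors(src) {
		fmt.Println(msg)
	}
}
```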
|
|
bdc6dbbc64 |
go/types: improve recursive type error message
This change improves the error message for recursive types.
Currently, compilation of the [following program](https://go.dev/play/p/3ef84ObpzfG):
package main
type T1[T T2] struct{}
type T2[T T1] struct{}
returns an error:
./prog.go:3:6: invalid recursive type T1
./prog.go:3:6: T1 refers to
./prog.go:4:6: T2 refers to
./prog.go:3:6: T1
With the patch applied the error message looks like:
./prog.go:3:6: invalid recursive type T1
./prog.go:3:6: T1 refers to T2
./prog.go:4:6: T2 refers to T1
Change-Id: Ic07cdffcffb1483c672b241fede4e694269b5b79
GitHub-Last-Rev:
|
|
|
|
74163c895a |
cmd/compile: use STP/LDP around morestack on arm64
The spill/restore code around morestack is almost never executed, so we should make it as small as possible. Using 2-register loads/stores makes sense here. Also, the offsets from SP are pretty small so the offset almost always fits in the (smaller than a normal load/store) offset field of the instruction. Makes cmd/go 0.6% smaller. Change-Id: I8845283c1b269a259498153924428f6173bda293 Reviewed-on: https://go-review.googlesource.com/c/go/+/621556 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> |
|
|
|
8668d7bbb9 |
test: split non-regabi stack map test
CL 594596 already did this for regabi, but missed non-regabi. Stack allocated swiss maps don't call rand32. For #54766. Change-Id: I312ea77532ecc6fa860adfea58ea00b01683ca69 Reviewed-on: https://go-review.googlesource.com/c/go/+/621615 Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> |
|
|
|
20ed603118 |
cmd/link: generate Mach-O UUID when -B flag is specified
Currently, on Mach-O, the Go linker doesn't generate LC_UUID in internal linking mode. This leaves some macOS system tools unable to track the binary and, in some cases, leaves the binary unable to access the local network on macOS 15. This CL makes the linker start generating LC_UUID. Currently, the UUID is generated if the -B flag is specified. We'll make it generate a UUID by default in a later CL. The -B flag is currently for generating a GNU build ID on ELF, which is a similar concept to Mach-O's UUID. Instead of introducing another flag, we just use the same flag and the same setting. Specifically, "-B gobuildid" will generate a UUID based on the Go build ID. For #68678. Cq-Include-Trybots: luci.golang.try:gotip-darwin-amd64_14,gotip-darwin-arm64_13 Change-Id: I90089a78ba144110bf06c1c6836daf2d737ff10a Reviewed-on: https://go-review.googlesource.com/c/go/+/618595 Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Ingo Oeser <nightlyone@googlemail.com> Reviewed-by: Than McIntosh <thanm@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> |
|
|
|
d94b7a1876 |
cmd/compile,internal/runtime/maps: add extendible hashing
Extendible hashing splits a swisstable map into many swisstables. This keeps grow operations small. For #54766. Cq-Include-Trybots: luci.golang.try:gotip-linux-ppc64_power10,gotip-linux-amd64-longtest-swissmap Change-Id: Id91f34af9e686bf35eb8882ee479956ece89e821 Reviewed-on: https://go-review.googlesource.com/c/go/+/604936 Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> |
|
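The general idea of extendible hashing mentioned in the commit above can be sketched in a few lines. This is a textbook-style illustration and not the internal/runtime/maps implementation: the names `dirMap`, `table`, `tableCap`, and `mix` are hypothetical, ordinary Go maps stand in for the individual swisstables, and the tiny per-table capacity is chosen only to force splits. The point it shows is that a full table is split on its own, so growth never rehashes the whole map.

```go
package main

import "fmt"

const tableCap = 4 // hypothetical per-table capacity before a split

// table stands in for one small hash table.
type table struct {
	localDepth uint           // number of leading hash bits this table covers
	entries    map[uint64]int // key -> value
}

// dirMap is a hypothetical extendible-hashing map; len(dir) == 1<<globalDepth.
type dirMap struct {
	globalDepth uint
	dir         []*table
}

func newDirMap() *dirMap {
	return &dirMap{dir: []*table{{entries: map[uint64]int{}}}}
}

// mix is a simple bijective 64-bit mixer used as the hash function here.
func mix(k uint64) uint64 {
	k ^= k >> 30
	k *= 0xbf58476d1ce4e5b9
	k ^= k >> 27
	k *= 0x94d049bb133111eb
	k ^= k >> 31
	return k
}

// index selects a directory slot from the top globalDepth bits of the hash.
func (m *dirMap) index(h uint64) uint64 { return h >> (64 - m.globalDepth) }

func (m *dirMap) Get(k uint64) (int, bool) {
	v, ok := m.dir[m.index(mix(k))].entries[k]
	return v, ok
}

func (m *dirMap) Put(k uint64, v int) {
	for {
		t := m.dir[m.index(mix(k))]
		if _, ok := t.entries[k]; ok || len(t.entries) < tableCap {
			t.entries[k] = v
			return
		}
		m.split(t) // only the full table is rehashed, then we retry
	}
}

// split divides one full table in two, doubling the directory if needed.
func (m *dirMap) split(t *table) {
	if t.localDepth == m.globalDepth {
		// Old slot i becomes slots 2i and 2i+1, both pointing at the same table.
		newDir := make([]*table, 2*len(m.dir))
		for i, p := range m.dir {
			newDir[2*i], newDir[2*i+1] = p, p
		}
		m.dir = newDir
		m.globalDepth++
	}
	depth := t.localDepth + 1
	left := &table{localDepth: depth, entries: map[uint64]int{}}
	right := &table{localDepth: depth, entries: map[uint64]int{}}
	for k, v := range t.entries {
		if (mix(k)>>(64-depth))&1 == 0 {
			left.entries[k] = v
		} else {
			right.entries[k] = v
		}
	}
	// Repoint every directory slot that referenced the old table.
	for i := range m.dir {
		if m.dir[i] != t {
			continue
		}
		if (uint64(i)>>(m.globalDepth-depth))&1 == 0 {
			m.dir[i] = left
		} else {
			m.dir[i] = right
		}
	}
}

func main() {
	m := newDirMap()
	for k := uint64(0); k < 1000; k++ {
		m.Put(k, int(k)*2)
	}
	v, ok := m.Get(21)
	fmt.Println(v, ok) // prints "42 true"
}
```

Each split touches at most `tableCap` entries plus the directory slots, which is the sense in which grow operations stay small in this sketch.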
|
|
ef3e1dae2f |
cmd/compile: optimize loong64 with register indexed load/store
goos: linux
goarch: loong64
pkg: test/bench/go1
cpu: Loongson-3A6000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
BinaryTree17 7.766 ± 1% 7.640 ± 2% -1.62% (p=0.000 n=20)
Fannkuch11 2.649 ± 0% 2.358 ± 0% -10.96% (p=0.000 n=20)
FmtFprintfEmpty 35.89n ± 0% 35.87n ± 0% -0.06% (p=0.000 n=20)
FmtFprintfString 59.44n ± 0% 57.25n ± 2% -3.68% (p=0.000 n=20)
FmtFprintfInt 62.07n ± 0% 60.04n ± 0% -3.27% (p=0.000 n=20)
FmtFprintfIntInt 97.90n ± 0% 97.26n ± 0% -0.65% (p=0.000 n=20)
FmtFprintfPrefixedInt 116.7n ± 0% 119.2n ± 0% +2.14% (p=0.000 n=20)
FmtFprintfFloat 204.5n ± 0% 201.9n ± 0% -1.30% (p=0.000 n=20)
FmtManyArgs 455.9n ± 0% 466.8n ± 0% +2.39% (p=0.000 n=20)
GobDecode 7.458m ± 1% 7.138m ± 1% -4.28% (p=0.000 n=20)
GobEncode 8.573m ± 1% 8.473m ± 1% ~ (p=0.091 n=20)
Gzip 280.2m ± 0% 284.9m ± 0% +1.67% (p=0.000 n=20)
Gunzip 32.68m ± 0% 32.67m ± 0% ~ (p=0.211 n=20)
HTTPClientServer 54.22µ ± 0% 53.24µ ± 0% -1.80% (p=0.000 n=20)
JSONEncode 9.427m ± 1% 9.152m ± 0% -2.92% (p=0.000 n=20)
JSONDecode 47.08m ± 1% 46.85m ± 1% -0.49% (p=0.007 n=20)
Mandelbrot200 4.601m ± 0% 4.605m ± 0% +0.08% (p=0.000 n=20)
GoParse 4.776m ± 0% 4.655m ± 1% -2.52% (p=0.000 n=20)
RegexpMatchEasy0_32 59.77n ± 0% 57.59n ± 0% -3.66% (p=0.000 n=20)
RegexpMatchEasy0_1K 458.1n ± 0% 458.8n ± 0% +0.15% (p=0.000 n=20)
RegexpMatchEasy1_32 59.36n ± 0% 59.24n ± 0% -0.20% (p=0.000 n=20)
RegexpMatchEasy1_1K 557.7n ± 0% 560.2n ± 0% +0.46% (p=0.000 n=20)
RegexpMatchMedium_32 803.1n ± 0% 772.8n ± 0% -3.77% (p=0.000 n=20)
RegexpMatchMedium_1K 27.29µ ± 0% 25.88µ ± 0% -5.18% (p=0.000 n=20)
RegexpMatchHard_32 1.385µ ± 0% 1.304µ ± 0% -5.85% (p=0.000 n=20)
RegexpMatchHard_1K 40.92µ ± 0% 39.58µ ± 0% -3.27% (p=0.000 n=20)
Revcomp 474.3m ± 0% 410.0m ± 0% -13.56% (p=0.000 n=20)
Template 78.16m ± 0% 76.32m ± 1% -2.36% (p=0.000 n=20)
TimeParse 271.8n ± 0% 272.1n ± 0% +0.11% (p=0.000 n=20)
TimeFormat 292.3n ± 0% 294.8n ± 0% +0.86% (p=0.000 n=20)
geomean 51.98µ 50.82µ -2.22%
Change-Id: Ia78f1ddee8f1d9ec7192a4b8d2a4ec6058679956
Reviewed-on: https://go-review.googlesource.com/c/go/+/615918
Reviewed-by: Qiqi Huang <huangqiqi@loongson.cn>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
|
|
|
|
c39bc22c14 |
all: wire up swisstable maps
Use the new SwissTable-based map in internal/runtime/maps as the basis for the runtime map when GOEXPERIMENT=swissmap. Integration is complete enough to pass all.bash. Notable missing features: * Race integration / concurrent write detection * Stack-allocated maps * Specialized "fast" map variants * Indirect key / elem For #54766. Cq-Include-Trybots: luci.golang.try:gotip-linux-ppc64_power10,gotip-linux-amd64-longtest-swissmap Change-Id: Ie97b656b6d8e05c0403311ae08fef9f51756a639 Reviewed-on: https://go-review.googlesource.com/c/go/+/594596 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> |
|
|
|
5428570af7 |
cmd/compile: use call block instead of entry block for tail call expansion
The expand-calls pass assumed that tail calls were always done in the entry block. That used to be true, but with tail calls in wrappers (enabled by CL 578235) and libfuzzer instrumentation, that is no longer the case. Libfuzzer instrumentation adds an IF statement to the start of the wrapper function. Fixes #69825 Change-Id: I9ab7133691d8235f9df128be39bff154b0b8853b Reviewed-on: https://go-review.googlesource.com/c/go/+/619075 Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@google.com> |
|
|
|
7e2487cf65 |
cmd/compile: avoid dynamic type when possible
If the expression type is a single compile-time known type, use that type instead of the dynamic one, so the later passes of the compiler could skip un-necessary runtime calls. Thanks Youlin Feng for writing the original test case. Change-Id: I3f65ab90f041474a9731338a82136c1d394c1773 Reviewed-on: https://go-review.googlesource.com/c/go/+/616975 Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> |
|
|
|
e470a00cdf |
test: fix test issue 69434 for riscv64
CL 615915 simplified the test for issue 69434 by using the maymorestack gcflag to force stack moving, making the program fail with an invalid stack pointer. However, it seems that maymorestack is broken on riscv64; at least gotip-linux-riscv64 is currently broken. This CL fixes the problem by returning to the initial approach of growing the stack large enough to force stack moving. Updates #69434 Fixes #69714 Cq-Include-Trybots: luci.golang.try:gotip-linux-riscv64 Change-Id: I95255fba884a200f75bcda34d58e9717e4a952ad Reviewed-on: https://go-review.googlesource.com/c/go/+/616698 Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> |
|
|
|
bae2e968e2 |
go/parser, syntax: better error message for parameter missing type
Fixes #69506. Change-Id: I18215e11f214b12d5f65be1d1740181e427f8817 Reviewed-on: https://go-review.googlesource.com/c/go/+/617015 Reviewed-by: Alan Donovan <adonovan@google.com> Reviewed-by: Robert Griesemer <gri@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> |
|
|
|
677b6cc175 |
test: simplify issue 69434 test
Updates #69434 Change-Id: I780c5ed63561eb8fa998bb0e6cdc77a904ff29c8 Reviewed-on: https://go-review.googlesource.com/c/go/+/615915 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com> Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> |
|
|
|
7ba074fe43 |
reflect: remove calling mapiterkey, mapiterelem
It makes use of the hiter structure which matches runtime.hiter's.
This change mainly improves the performance of Next method of MapIter.
goos: darwin
goarch: arm64
pkg: reflect
cpu: Apple M2
│ ./old.txt │ ./new.txt │
│ sec/op │ sec/op vs base │
MapIterNext-8 61.95n ± 0% 54.95n ± 0% -11.28% (p=0.000 n=10)
For the change to `test/escape_reflect.go`:
removing mapiterkey, mapiterelem would cause leaking MapIter content
when calling SetIterKey and SetIterValue,
and this may cause map bucket to be allocated on heap instead of stack.
Reproduce:
```
{
m := map[int]int{1: 2} // escapes to heap after this change
it := reflect.ValueOf(m).MapRange()
it.Next()
var k, v int
reflect.ValueOf(&k).Elem().SetIterKey(it)
reflect.ValueOf(&v).Elem().SetIterValue(it)
println(k, v)
}
```
This CL would not introduce abi.NoEscape to fix this. It may need further
optimization and tests on hiter field usage and its escape analysis.
Fixes #69416
Change-Id: Ibaa33bcf86228070b4a505b9512680791aa59f04
Reviewed-on: https://go-review.googlesource.com/c/go/+/612616
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
|
|
db40d1a4c4 |
cmd/compile: fix wrong escape analysis for rangefunc
CL 584596 added a "-range<N>" suffix to the name of the closure generated for a rangefunc loop body. However, this breaks the condition that escape analysis uses to check whether a closure is contained within a function, which is "F.funcN" for outer function "F" and closure "funcN". Fix this by adding the new "-rangeN" suffix to the condition. Fixes #69434 Fixes #69507 Change-Id: I411de8f63b69a6514a9e9504d49d62e00ce4115d Reviewed-on: https://go-review.googlesource.com/c/go/+/614096 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com> |
|
|
|
4f881115d4 |
runtime: move getcallersp to internal/runtime/sys
Moving these intrinsics to a base package enables other internal/runtime packages to use them. For #54766. Change-Id: I45a530422207dd94b5ad4eee51216c9410a84040 Reviewed-on: https://go-review.googlesource.com/c/go/+/613261 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> |
|
|
|
81c92352a7 |
runtime: move getcallerpc to internal/runtime/sys
Moving these intrinsics to a base package enables other internal/runtime packages to use them. For #54766. Change-Id: I0b3eded3bb45af53e3eb5bab93e3792e6a8beb46 Reviewed-on: https://go-review.googlesource.com/c/go/+/613260 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com> |
|
|
|
f117d1c9b5 |
test: add test for issue 24755
Fixes #24755 Change-Id: I00b276c5c2acb227d42a069d1af6027e4b499d31 Reviewed-on: https://go-review.googlesource.com/c/go/+/613115 Auto-Submit: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Tim King <taking@google.com> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com> |
|
|
|
8343980c70 |
all: add test for issue 20027
Fixes #20027 Change-Id: Ia616d43c0affa7b927ddfb53755072c94ba27917 Reviewed-on: https://go-review.googlesource.com/c/go/+/612618 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Tim King <taking@google.com> |
|
|
|
f243cf6016 |
cmd/compile: optimize math.Float64(32)bits and math.Float64(32)frombits on loong64
Use float <-> int register moves without conversion instead of stores
and loads to move float <-> int values like arm64 and mips64.
goos: linux
goarch: loong64
pkg: math
cpu: Loongson-3A6000 @ 2500.00MHz
│ bench.old │ bench.new │
│ sec/op │ sec/op vs base │
Acos 15.98n ± 0% 15.94n ± 0% -0.25% (p=0.000 n=20)
Acosh 27.75n ± 0% 25.56n ± 0% -7.89% (p=0.000 n=20)
Asin 15.85n ± 0% 15.76n ± 0% -0.57% (p=0.000 n=20)
Asinh 39.79n ± 0% 37.69n ± 0% -5.28% (p=0.000 n=20)
Atan 7.261n ± 0% 7.242n ± 0% -0.27% (p=0.000 n=20)
Atanh 28.30n ± 0% 27.62n ± 0% -2.40% (p=0.000 n=20)
Atan2 15.85n ± 0% 15.75n ± 0% -0.63% (p=0.000 n=20)
Cbrt 27.02n ± 0% 21.08n ± 0% -21.98% (p=0.000 n=20)
Ceil 2.830n ± 1% 2.896n ± 1% +2.31% (p=0.000 n=20)
Copysign 0.8022n ± 0% 0.8004n ± 0% -0.22% (p=0.000 n=20)
Cos 11.64n ± 0% 11.61n ± 0% -0.26% (p=0.000 n=20)
Cosh 35.98n ± 0% 33.44n ± 0% -7.05% (p=0.000 n=20)
Erf 10.09n ± 0% 10.08n ± 0% -0.10% (p=0.000 n=20)
Erfc 11.40n ± 0% 11.35n ± 0% -0.44% (p=0.000 n=20)
Erfinv 12.31n ± 0% 12.29n ± 0% -0.16% (p=0.000 n=20)
Erfcinv 12.16n ± 0% 12.17n ± 0% +0.08% (p=0.000 n=20)
Exp 28.41n ± 0% 26.44n ± 0% -6.95% (p=0.000 n=20)
ExpGo 28.68n ± 0% 27.07n ± 0% -5.60% (p=0.000 n=20)
Expm1 17.21n ± 0% 16.75n ± 0% -2.67% (p=0.000 n=20)
Exp2 24.71n ± 0% 23.01n ± 0% -6.88% (p=0.000 n=20)
Exp2Go 25.17n ± 0% 23.91n ± 0% -4.99% (p=0.000 n=20)
Abs 0.8004n ± 0% 0.8004n ± 0% ~ (p=0.224 n=20)
Dim 1.201n ± 0% 1.201n ± 0% ~ (p=1.000 n=20) ¹
Floor 2.848n ± 0% 2.859n ± 0% +0.39% (p=0.000 n=20)
Max 3.074n ± 0% 3.071n ± 0% ~ (p=0.481 n=20)
Min 3.179n ± 0% 3.176n ± 0% -0.09% (p=0.003 n=20)
Mod 49.62n ± 0% 44.82n ± 0% -9.67% (p=0.000 n=20)
Frexp 7.604n ± 0% 6.803n ± 0% -10.53% (p=0.000 n=20)
Gamma 18.01n ± 0% 17.61n ± 0% -2.22% (p=0.000 n=20)
Hypot 7.204n ± 0% 7.604n ± 0% +5.55% (p=0.000 n=20)
HypotGo 7.204n ± 0% 7.604n ± 0% +5.56% (p=0.000 n=20)
Ilogb 6.003n ± 0% 6.003n ± 0% ~ (p=0.407 n=20)
J0 76.43n ± 0% 76.24n ± 0% -0.25% (p=0.000 n=20)
J1 76.44n ± 0% 76.44n ± 0% ~ (p=1.000 n=20)
Jn 168.2n ± 0% 168.5n ± 0% +0.18% (p=0.000 n=20)
Ldexp 8.804n ± 0% 7.604n ± 0% -13.63% (p=0.000 n=20)
Lgamma 19.01n ± 0% 19.01n ± 0% ~ (p=0.695 n=20)
Log 19.38n ± 0% 19.12n ± 0% -1.34% (p=0.000 n=20)
Logb 6.003n ± 0% 6.003n ± 0% ~ (p=1.000 n=20)
Log1p 18.57n ± 0% 16.72n ± 0% -9.96% (p=0.000 n=20)
Log10 20.67n ± 0% 20.45n ± 0% -1.06% (p=0.000 n=20)
Log2 9.605n ± 0% 8.804n ± 0% -8.34% (p=0.000 n=20)
Modf 4.402n ± 0% 4.402n ± 0% ~ (p=1.000 n=20)
Nextafter32 7.204n ± 0% 5.603n ± 0% -22.22% (p=0.000 n=20)
Nextafter64 6.803n ± 0% 6.003n ± 0% -11.76% (p=0.000 n=20)
PowInt 39.62n ± 0% 37.22n ± 0% -6.06% (p=0.000 n=20)
PowFrac 120.9n ± 0% 108.9n ± 0% -9.93% (p=0.000 n=20)
Pow10Pos 1.601n ± 0% 1.601n ± 0% ~ (p=0.487 n=20)
Pow10Neg 2.675n ± 0% 2.675n ± 0% ~ (p=1.000 n=20)
Round 3.018n ± 0% 2.401n ± 0% -20.46% (p=0.000 n=20)
RoundToEven 3.822n ± 0% 3.001n ± 0% -21.48% (p=0.000 n=20)
Remainder 45.62n ± 0% 42.42n ± 0% -7.01% (p=0.000 n=20)
Signbit 0.9075n ± 0% 0.8004n ± 0% -11.81% (p=0.000 n=20)
Sin 12.65n ± 0% 12.65n ± 0% ~ (p=0.503 n=20)
Sincos 14.81n ± 0% 14.60n ± 0% -1.42% (p=0.000 n=20)
Sinh 36.75n ± 0% 35.11n ± 0% -4.46% (p=0.000 n=20)
SqrtIndirect 1.201n ± 0% 1.201n ± 0% ~ (p=1.000 n=20) ¹
SqrtLatency 4.002n ± 0% 4.002n ± 0% ~ (p=1.000 n=20)
SqrtIndirectLatency 4.002n ± 0% 4.002n ± 0% ~ (p=1.000 n=20)
SqrtGoLatency 52.85n ± 0% 40.82n ± 0% -22.76% (p=0.000 n=20)
SqrtPrime 887.4n ± 0% 887.4n ± 0% ~ (p=0.751 n=20)
Tan 13.95n ± 0% 13.97n ± 0% +0.18% (p=0.000 n=20)
Tanh 36.79n ± 0% 34.89n ± 0% -5.16% (p=0.000 n=20)
Trunc 2.849n ± 0% 2.861n ± 0% +0.42% (p=0.000 n=20)
Y0 77.44n ± 0% 77.64n ± 0% +0.26% (p=0.000 n=20)
Y1 74.41n ± 0% 74.33n ± 0% -0.11% (p=0.000 n=20)
Yn 158.7n ± 0% 159.0n ± 0% +0.19% (p=0.000 n=20)
Float64bits 0.8774n ± 0% 0.4002n ± 0% -54.39% (p=0.000 n=20)
Float64frombits 0.8042n ± 0% 0.4002n ± 0% -50.24% (p=0.000 n=20)
Float32bits 1.1230n ± 0% 0.5336n ± 0% -52.48% (p=0.000 n=20)
Float32frombits 1.0670n ± 0% 0.8004n ± 0% -24.99% (p=0.000 n=20)
FMA 2.001n ± 0% 2.001n ± 0% ~ (p=0.605 n=20)
geomean 10.87n 10.10n -7.15%
¹ all samples are equal
goos: linux
goarch: loong64
pkg: math
cpu: Loongson-3A5000 @ 2500.00MHz
│ bench.old │ bench.new │
│ sec/op │ sec/op vs base │
Acos 33.10n ± 0% 31.95n ± 2% -3.46% (p=0.000 n=20)
Acosh 58.38n ± 0% 50.44n ± 0% -13.60% (p=0.000 n=20)
Asin 32.70n ± 0% 31.94n ± 0% -2.32% (p=0.000 n=20)
Asinh 57.65n ± 0% 50.83n ± 0% -11.82% (p=0.000 n=20)
Atan 14.21n ± 0% 14.21n ± 0% ~ (p=0.501 n=20)
Atanh 60.86n ± 0% 54.44n ± 0% -10.56% (p=0.000 n=20)
Atan2 32.02n ± 0% 34.02n ± 0% +6.25% (p=0.000 n=20)
Cbrt 55.58n ± 0% 40.64n ± 0% -26.88% (p=0.000 n=20)
Ceil 9.566n ± 0% 9.566n ± 0% ~ (p=0.463 n=20)
Copysign 0.8005n ± 0% 0.8005n ± 0% ~ (p=0.806 n=20)
Cos 18.02n ± 0% 18.02n ± 0% ~ (p=0.191 n=20)
Cosh 64.44n ± 0% 65.64n ± 0% +1.86% (p=0.000 n=20)
Erf 16.15n ± 0% 16.16n ± 0% ~ (p=0.770 n=20)
Erfc 18.71n ± 0% 18.83n ± 0% +0.61% (p=0.000 n=20)
Erfinv 19.33n ± 0% 19.34n ± 0% ~ (p=0.513 n=20)
Erfcinv 18.90n ± 0% 19.78n ± 0% +4.63% (p=0.000 n=20)
Exp 50.04n ± 0% 49.66n ± 0% -0.75% (p=0.000 n=20)
ExpGo 50.03n ± 0% 50.03n ± 0% ~ (p=0.723 n=20)
Expm1 28.41n ± 0% 28.27n ± 0% -0.49% (p=0.000 n=20)
Exp2 50.08n ± 0% 51.23n ± 0% +2.31% (p=0.000 n=20)
Exp2Go 49.77n ± 0% 49.89n ± 0% +0.24% (p=0.000 n=20)
Abs 0.8009n ± 0% 0.8006n ± 0% ~ (p=0.317 n=20)
Dim 1.987n ± 0% 1.993n ± 0% +0.28% (p=0.001 n=20)
Floor 8.543n ± 0% 8.548n ± 0% ~ (p=0.509 n=20)
Max 6.670n ± 0% 6.672n ± 0% ~ (p=0.335 n=20)
Min 6.694n ± 0% 6.694n ± 0% ~ (p=0.459 n=20)
Mod 56.44n ± 0% 53.23n ± 0% -5.70% (p=0.000 n=20)
Frexp 8.409n ± 0% 7.606n ± 0% -9.55% (p=0.000 n=20)
Gamma 35.64n ± 0% 35.23n ± 0% -1.15% (p=0.000 n=20)
Hypot 11.21n ± 0% 10.61n ± 0% -5.31% (p=0.000 n=20)
HypotGo 11.50n ± 0% 11.01n ± 0% -4.30% (p=0.000 n=20)
Ilogb 7.606n ± 0% 6.804n ± 0% -10.54% (p=0.000 n=20)
J0 125.3n ± 0% 126.5n ± 0% +0.96% (p=0.000 n=20)
J1 124.9n ± 0% 125.3n ± 0% +0.32% (p=0.000 n=20)
Jn 264.3n ± 0% 265.9n ± 0% +0.61% (p=0.000 n=20)
Ldexp 9.606n ± 0% 9.204n ± 0% -4.19% (p=0.000 n=20)
Lgamma 38.82n ± 0% 38.85n ± 0% +0.06% (p=0.019 n=20)
Log 38.44n ± 0% 28.04n ± 0% -27.06% (p=0.000 n=20)
Logb 8.405n ± 0% 7.605n ± 0% -9.52% (p=0.000 n=20)
Log1p 31.62n ± 0% 27.11n ± 0% -14.26% (p=0.000 n=20)
Log10 38.83n ± 0% 28.42n ± 0% -26.81% (p=0.000 n=20)
Log2 11.21n ± 0% 10.41n ± 0% -7.14% (p=0.000 n=20)
Modf 5.204n ± 0% 5.205n ± 0% ~ (p=0.983 n=20)
Nextafter32 8.809n ± 0% 7.208n ± 0% -18.18% (p=0.000 n=20)
Nextafter64 8.405n ± 0% 8.406n ± 0% +0.01% (p=0.007 n=20)
PowInt 48.83n ± 0% 44.78n ± 0% -8.28% (p=0.000 n=20)
PowFrac 146.9n ± 0% 142.1n ± 0% -3.23% (p=0.000 n=20)
Pow10Pos 2.334n ± 0% 2.333n ± 0% ~ (p=0.110 n=20)
Pow10Neg 4.803n ± 0% 4.803n ± 0% ~ (p=0.130 n=20)
Round 4.816n ± 0% 3.819n ± 0% -20.70% (p=0.000 n=20)
RoundToEven 5.735n ± 0% 5.204n ± 0% -9.26% (p=0.000 n=20)
Remainder 52.05n ± 0% 49.64n ± 0% -4.63% (p=0.000 n=20)
Signbit 1.201n ± 0% 1.001n ± 0% -16.65% (p=0.000 n=20)
Sin 20.63n ± 0% 20.64n ± 0% +0.05% (p=0.040 n=20)
Sincos 23.82n ± 0% 24.62n ± 0% +3.36% (p=0.000 n=20)
Sinh 71.25n ± 0% 68.44n ± 0% -3.94% (p=0.000 n=20)
SqrtIndirect 2.001n ± 0% 2.001n ± 0% ~ (p=0.182 n=20)
SqrtLatency 4.003n ± 0% 4.003n ± 0% ~ (p=0.754 n=20)
SqrtIndirectLatency 4.003n ± 0% 4.003n ± 0% ~ (p=0.773 n=20)
SqrtGoLatency 60.84n ± 0% 81.26n ± 0% +33.56% (p=0.000 n=20)
SqrtPrime 1.791µ ± 0% 1.791µ ± 0% ~ (p=0.784 n=20)
Tan 27.22n ± 0% 27.22n ± 0% ~ (p=0.819 n=20)
Tanh 70.88n ± 0% 69.04n ± 0% -2.60% (p=0.000 n=20)
Trunc 8.543n ± 0% 8.543n ± 0% ~ (p=0.784 n=20)
Y0 122.9n ± 0% 122.9n ± 0% ~ (p=0.559 n=20)
Y1 123.3n ± 0% 121.7n ± 0% -1.30% (p=0.000 n=20)
Yn 263.0n ± 0% 262.6n ± 0% -0.15% (p=0.000 n=20)
Float64bits 1.2010n ± 0% 0.6004n ± 0% -50.01% (p=0.000 n=20)
Float64frombits 1.2010n ± 0% 0.6004n ± 0% -50.01% (p=0.000 n=20)
Float32bits 1.7010n ± 0% 0.8005n ± 0% -52.94% (p=0.000 n=20)
Float32frombits 1.5010n ± 0% 0.8005n ± 0% -46.67% (p=0.000 n=20)
FMA 2.001n ± 0% 2.001n ± 0% ~ (p=0.238 n=20)
geomean 17.41n 16.15n -7.19%
Change-Id: I0a0c263af2f07203eab1782e69c706f20c689d8d
Reviewed-on: https://go-review.googlesource.com/c/go/+/604737
Auto-Submit: Tim King <taking@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: Tim King <taking@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
|
|
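The functions optimized by the commit above reinterpret a floating-point value as its IEEE 754 bit pattern and back, without any numeric conversion. A minimal sketch of their use follows; the helper name `roundTrip` is hypothetical, and the program says nothing about which instructions a given backend emits.

```go
package main

import (
	"fmt"
	"math"
)

// roundTrip reinterprets f as its IEEE 754 bit pattern and converts it back.
// No rounding or numeric conversion takes place in either direction.
func roundTrip(f float64) (uint64, float64) {
	bits := math.Float64bits(f)
	return bits, math.Float64frombits(bits)
}

func main() {
	bits, back := roundTrip(1.0)
	fmt.Printf("%#x %v\n", bits, back)         // prints "0x3ff0000000000000 1"
	fmt.Printf("%#x\n", math.Float32bits(1.0)) // prints "0x3f800000"
}
```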
|
|
2c5b707b3b |
cmd/compile: optimize RotateLeft8/16 on loong64
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A6000 @ 2500.00MHz
│ bench.old │ bench.new │
│ sec/op │ sec/op vs base │
RotateLeft8 1.401n ± 0% 1.201n ± 0% -14.28% (p=0.000 n=20)
RotateLeft16 1.4010n ± 0% 0.8032n ± 0% -42.67% (p=0.000 n=20)
geomean 1.401n 0.9822n -29.90%
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A5000 @ 2500.00MHz
│ bench.old │ bench.new │
│ sec/op │ sec/op vs base │
RotateLeft8 1.576n ± 0% 1.310n ± 0% -16.88% (p=0.000 n=20)
RotateLeft16 1.576n ± 0% 1.166n ± 0% -26.02% (p=0.000 n=20)
geomean 1.576n 1.236n -21.58%
Change-Id: I39c18306be0b8fd31b57bd0911714abd1783b50e
Reviewed-on: https://go-review.googlesource.com/c/go/+/604738
Auto-Submit: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Tim King <taking@google.com>
|