go/src
Guoqi Chen 1763ee199d runtime: optimizing memclrNoHeapPointers implementation using SIMD on loong64
goos: linux
goarch: loong64
pkg: runtime
cpu: Loongson-3A6000 @ 2500.00MHz
                        |  bench.old   |            bench.new.256             |
                        |    sec/op    |    sec/op     vs base                |
Memclr/5                   3.204n ± 0%    2.804n ± 0%  -12.48% (p=0.000 n=10)
Memclr/16                  3.204n ± 0%    3.204n ± 0%        ~ (p=0.465 n=10)
Memclr/64                  5.267n ± 0%    4.005n ± 0%  -23.96% (p=0.000 n=10)
Memclr/256                10.280n ± 0%    5.400n ± 0%  -47.47% (p=0.000 n=10)
Memclr/4096               107.00n ± 1%    30.24n ± 0%  -71.74% (p=0.000 n=10)
Memclr/65536              1675.0n ± 0%    431.1n ± 0%  -74.26% (p=0.000 n=10)
Memclr/1M                  52.61µ ± 0%    32.82µ ± 0%  -37.62% (p=0.000 n=10)
Memclr/4M                  210.3µ ± 0%    131.3µ ± 0%  -37.59% (p=0.000 n=10)
Memclr/8M                  420.3µ ± 0%    262.5µ ± 0%  -37.54% (p=0.000 n=10)
Memclr/16M                 857.4µ ± 1%    542.9µ ± 3%  -36.68% (p=0.000 n=10)
Memclr/64M                 3.658m ± 3%    2.173m ± 0%  -40.59% (p=0.000 n=10)
MemclrUnaligned/0_5        4.264n ± 1%    4.359n ± 0%   +2.23% (p=0.000 n=10)
MemclrUnaligned/0_16       4.595n ± 0%    4.599n ± 0%   +0.10% (p=0.020 n=10)
MemclrUnaligned/0_64       5.356n ± 0%    5.122n ± 0%   -4.37% (p=0.000 n=10)
MemclrUnaligned/0_256     10.370n ± 0%    5.907n ± 1%  -43.03% (p=0.000 n=10)
MemclrUnaligned/0_4096    107.10n ± 0%    37.35n ± 0%  -65.13% (p=0.000 n=10)
MemclrUnaligned/0_65536   1694.0n ± 0%    441.7n ± 0%  -73.93% (p=0.000 n=10)
MemclrUnaligned/1_5        4.272n ± 0%    4.348n ± 0%   +1.76% (p=0.000 n=10)
MemclrUnaligned/1_16       4.593n ± 0%    4.608n ± 0%   +0.33% (p=0.002 n=10)
MemclrUnaligned/1_64       7.610n ± 0%    5.293n ± 0%  -30.45% (p=0.000 n=10)
MemclrUnaligned/1_256     12.230n ± 0%    9.012n ± 0%  -26.31% (p=0.000 n=10)
MemclrUnaligned/1_4096    114.10n ± 0%    39.50n ± 0%  -65.38% (p=0.000 n=10)
MemclrUnaligned/1_65536   1705.0n ± 0%    468.8n ± 0%  -72.50% (p=0.000 n=10)
MemclrUnaligned/4_5        4.283n ± 1%    4.346n ± 0%   +1.48% (p=0.000 n=10)
MemclrUnaligned/4_16       4.599n ± 0%    4.605n ± 0%   +0.12% (p=0.000 n=10)
MemclrUnaligned/4_64       7.572n ± 1%    5.283n ± 0%  -30.24% (p=0.000 n=10)
MemclrUnaligned/4_256     12.215n ± 0%    9.212n ± 0%  -24.58% (p=0.000 n=10)
MemclrUnaligned/4_4096    114.35n ± 0%    39.48n ± 0%  -65.47% (p=0.000 n=10)
MemclrUnaligned/4_65536   1705.0n ± 0%    469.2n ± 0%  -72.48% (p=0.000 n=10)
MemclrUnaligned/7_5        4.296n ± 1%    4.349n ± 0%   +1.22% (p=0.000 n=10)
MemclrUnaligned/7_16       4.601n ± 0%    4.606n ± 0%   +0.11% (p=0.004 n=10)
MemclrUnaligned/7_64       7.609n ± 0%    5.296n ± 1%  -30.39% (p=0.000 n=10)
MemclrUnaligned/7_256     12.200n ± 0%    9.011n ± 0%  -26.14% (p=0.000 n=10)
MemclrUnaligned/7_4096    114.00n ± 0%    39.51n ± 0%  -65.34% (p=0.000 n=10)
MemclrUnaligned/7_65536   1704.0n ± 0%    469.5n ± 0%  -72.45% (p=0.000 n=10)
MemclrUnaligned/0_1M       52.57µ ± 0%    32.83µ ± 0%  -37.54% (p=0.000 n=10)
MemclrUnaligned/0_4M       210.1µ ± 0%    131.3µ ± 0%  -37.53% (p=0.000 n=10)
MemclrUnaligned/0_8M       420.8µ ± 0%    262.5µ ± 0%  -37.62% (p=0.000 n=10)
MemclrUnaligned/0_16M      846.2µ ± 0%    528.4µ ± 0%  -37.56% (p=0.000 n=10)
MemclrUnaligned/0_64M      3.425m ± 1%    2.187m ± 3%  -36.16% (p=0.000 n=10)
MemclrUnaligned/1_1M       52.56µ ± 0%    32.84µ ± 0%  -37.52% (p=0.000 n=10)
MemclrUnaligned/1_4M       210.5µ ± 0%    131.3µ ± 0%  -37.62% (p=0.000 n=10)
MemclrUnaligned/1_8M       420.5µ ± 0%    262.7µ ± 0%  -37.53% (p=0.000 n=10)
MemclrUnaligned/1_16M      845.2µ ± 0%    528.3µ ± 0%  -37.49% (p=0.000 n=10)
MemclrUnaligned/1_64M      3.381m ± 0%    2.243m ± 3%  -33.66% (p=0.000 n=10)
MemclrUnaligned/4_1M       52.56µ ± 0%    32.85µ ± 0%  -37.50% (p=0.000 n=10)
MemclrUnaligned/4_4M       210.1µ ± 0%    131.3µ ± 0%  -37.49% (p=0.000 n=10)
MemclrUnaligned/4_8M       420.0µ ± 0%    262.6µ ± 0%  -37.48% (p=0.000 n=10)
MemclrUnaligned/4_16M      844.8µ ± 0%    528.7µ ± 0%  -37.41% (p=0.000 n=10)
MemclrUnaligned/4_64M      3.382m ± 1%    2.211m ± 4%  -34.63% (p=0.000 n=10)
MemclrUnaligned/7_1M       52.59µ ± 0%    32.84µ ± 0%  -37.56% (p=0.000 n=10)
MemclrUnaligned/7_4M       210.2µ ± 0%    131.3µ ± 0%  -37.54% (p=0.000 n=10)
MemclrUnaligned/7_8M       420.1µ ± 0%    262.7µ ± 0%  -37.47% (p=0.000 n=10)
MemclrUnaligned/7_16M      845.1µ ± 0%    528.7µ ± 0%  -37.43% (p=0.000 n=10)
MemclrUnaligned/7_64M      3.369m ± 0%    2.313m ± 1%  -31.34% (p=0.000 n=10)
MemclrRange/1K_2K         2707.0n ± 0%    972.4n ± 0%  -64.08% (p=0.000 n=10)
MemclrRange/2K_8K          8.816µ ± 0%    2.519µ ± 0%  -71.43% (p=0.000 n=10)
MemclrRange/4K_16K         8.333µ ± 0%    2.240µ ± 0%  -73.12% (p=0.000 n=10)
MemclrRange/160K_228K      83.47µ ± 0%    31.27µ ± 0%  -62.54% (p=0.000 n=10)
MemclrKnownSize1          0.4003n ± 0%   0.4004n ± 0%        ~ (p=0.119 n=10)
MemclrKnownSize2          0.4003n ± 0%   0.4005n ± 0%        ~ (p=0.069 n=10)
MemclrKnownSize4          0.4003n ± 0%   0.4005n ± 0%        ~ (p=0.100 n=10)
MemclrKnownSize8          0.4003n ± 0%   0.4004n ± 0%   +0.04% (p=0.047 n=10)
MemclrKnownSize16         0.8011n ± 0%   0.8012n ± 0%        ~ (p=0.926 n=10)
MemclrKnownSize32          1.602n ± 0%    1.602n ± 0%        ~ (p=0.772 n=10)
MemclrKnownSize64          2.405n ± 0%    2.404n ± 0%        ~ (p=0.780 n=10)
MemclrKnownSize112         2.804n ± 0%    2.804n ± 0%        ~ (p=0.538 n=10)
MemclrKnownSize128         3.204n ± 0%    3.205n ± 0%        ~ (p=0.105 n=10)
MemclrKnownSize192         4.808n ± 0%    4.807n ± 0%        ~ (p=0.688 n=10)
MemclrKnownSize248         6.347n ± 0%    6.346n ± 0%        ~ (p=0.133 n=10)
MemclrKnownSize256         6.560n ± 0%    6.573n ± 0%   +0.19% (p=0.001 n=10)
MemclrKnownSize512        13.010n ± 0%    6.809n ± 0%  -47.66% (p=0.000 n=10)
MemclrKnownSize1024       25.830n ± 0%    8.412n ± 0%  -67.43% (p=0.000 n=10)
MemclrKnownSize4096       102.70n ± 0%    27.64n ± 0%  -73.09% (p=0.000 n=10)
MemclrKnownSize512KiB      26.30µ ± 0%    16.42µ ± 0%  -37.59% (p=0.000 n=10)
geomean                    629.8n         393.2n       -37.57%

Change-Id: I2b9fe834c31d786d2e30cc02c65a6f9c455c4e8d
Reviewed-on: https://go-review.googlesource.com/c/go/+/657835
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
2025-03-26 17:58:32 -07:00
..
archive archive/zip: error on ReadDir if there are invalid file names 2025-03-11 15:19:38 -07:00
arena
bufio
builtin
bytes bytes: speed up Replace 2025-03-25 13:02:53 -07:00
cmd cmd/internal/obj/arm64: add support for BTI instruction 2025-03-26 03:48:50 -07:00
cmp cmp: add examples for Compare and Less 2025-03-11 15:22:39 -07:00
compress compress/flate,compress/lzw: fix incorrect godoc links 2025-03-07 11:33:01 -08:00
container
context context: skip allocs test with -asan 2025-03-04 16:37:03 -08:00
crypto crypto/tls: add missing RUnlock in ticketKeys 2025-03-20 08:08:47 -07:00
database/sql
debug debug/elf: add riscv attributes definitions 2025-03-14 15:08:23 -07:00
embed
encoding encoding/asn1: make sure implicit fields roundtrip 2025-03-14 11:40:43 -07:00
errors
expvar
flag flag: replace interface{} -> any for textValue.Get method 2025-02-28 08:43:46 -08:00
fmt
go go/build: prioritize build constraints in docs 2025-03-24 10:22:30 -07:00
hash
html html/template: document comment stripping 2025-03-17 08:52:14 -07:00
image
index/suffixarray
internal internal/poll: support async file operations on Windows 2025-03-26 13:05:03 -07:00
io std: add //go:fix inline directives to some deprecated functions 2025-02-15 08:06:58 -08:00
iter
log log/slog: Handler doc points to handler guide 2025-03-26 11:06:06 -07:00
maps
math cmd/asm: add LCDBR instruction on s390x 2025-03-24 07:05:55 -07:00
mime mime/multipart: add helper to build content-disposition header contents 2025-03-12 16:20:01 -07:00
net net: run unix socket stream tests on Windows 2025-03-26 13:22:29 -07:00
os os: add Root.Link 2025-03-24 07:53:38 -07:00
path path/filepath: use RtlIsDosDeviceName_U to detect Windows devices 2025-02-19 09:41:00 -08:00
plugin
reflect reflect: document Method(ByName) w.r.t dead code elimination 2025-03-17 13:09:29 -07:00
regexp
runtime runtime: optimizing memclrNoHeapPointers implementation using SIMD on loong64 2025-03-26 17:58:32 -07:00
slices
sort
strconv strconv: use builtin min function in commonPrefixLenIgnoreCase 2025-02-24 17:38:00 -08:00
strings strings: don't assert on Replace's allocs for ASAN 2025-03-20 14:49:24 -07:00
structs
sync sync: document behavior of Map.Delete when key is not present 2025-03-05 03:28:07 -08:00
syscall syscall: use testing.T.Context 2025-03-13 05:24:54 -07:00
testdata
testing testing/slogtest: test nested groups in empty record 2025-03-25 19:19:09 -07:00
text text/template: add an if func example 2025-03-07 11:32:13 -08:00
time runtime, time: don't use monotonic clock inside synctest bubbles 2025-03-18 10:50:51 -07:00
unicode unicode/utf8: use builtin max function to simplify code 2025-02-25 10:16:36 -08:00
unique unique: use runtime.AddCleanup instead of runtime.SetFinalizer 2025-02-24 09:11:32 -08:00
unsafe
vendor all: update golang.org/x/net 2025-03-04 13:19:15 -08:00
weak weak: test the use of runtime.AddCleanup 2025-02-25 08:44:32 -08:00
Make.dist
README.vendor
all.bash
all.bat
all.rc
bootstrap.bash
buildall.bash
clean.bash
clean.bat
clean.rc
cmp.bash
go.mod all: update golang.org/x/net 2025-03-04 13:19:15 -08:00
go.sum all: update golang.org/x/net 2025-03-04 13:19:15 -08:00
make.bash
make.bat
make.rc
race.bash
race.bat
run.bash
run.bat
run.rc

README.vendor

Vendoring in std and cmd
========================

The Go command maintains copies of external packages needed by the
standard library in the src/vendor and src/cmd/vendor directories.

There are two modules, std and cmd, defined in src/go.mod and
src/cmd/go.mod. When a package outside std or cmd is imported
by a package inside std or cmd, the import path is interpreted
as if it had a "vendor/" prefix. For example, within "crypto/tls",
an import of "golang.org/x/crypto/cryptobyte" resolves to
"vendor/golang.org/x/crypto/cryptobyte". When a package with the
same path is imported from a package outside std or cmd, it will
be resolved normally. Consequently, a binary may be built with two
copies of a package at different versions if the package is
imported normally and vendored by the standard library.

Vendored packages are internally renamed with a "vendor/" prefix
to preserve the invariant that all packages have distinct paths.
This is necessary to avoid compiler and linker conflicts. Adding
a "vendor/" prefix also maintains the invariant that standard
library packages begin with a dotless path element.

The module requirements of std and cmd do not influence version
selection in other modules. They are only considered when running
module commands like 'go get' and 'go mod vendor' from a directory
in GOROOT/src.

Maintaining vendor directories
==============================

Before updating vendor directories, ensure that module mode is enabled.
Make sure that GO111MODULE is not set in the environment, or that it is
set to 'on' or 'auto', and if you use a go.work file, set GOWORK=off.

Also, ensure that 'go env GOROOT' shows the root of this Go source
tree. Otherwise, the results are undefined. It's recommended to build
Go from source and use that 'go' binary to update its source tree.

Requirements may be added, updated, and removed with 'go get'.
The vendor directory may be updated with 'go mod vendor'.
A typical sequence might be:

    cd src  # or src/cmd
    go get golang.org/x/net@master
    go mod tidy
    go mod vendor

Use caution when passing '-u' to 'go get'. The '-u' flag updates
modules providing all transitively imported packages, not only
the module providing the target package.

Note that 'go mod vendor' only copies packages that are transitively
imported by packages in the current module. If a new package is needed,
it should be imported before running 'go mod vendor'.