Commit Graph

5200 Commits

Author SHA1 Message Date
thepudds 8163ea1458 weak: prevent unsafe conversions using weak pointers
Prevent conversions between Pointer types,
like we do for sync/atomic.Pointer.

Fixes #71583

Change-Id: I20e83106d8a27996f221e6cd9d52637b0442cea4
Reviewed-on: https://go-review.googlesource.com/c/go/+/647195
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2025-02-06 13:16:59 -08:00
Jakub Ciolek cd595be6d6 cmd/compile: prefer an add when shifting left by 1
ADD(Q|L) has generally twice the throughput.

Came up in CL 626998.

Throughput by arch:

Zen 4:

SHLL (R64, 1):   0.5
ADD  (R64, R64): 0.25

Intel Alder Lake:

SHLL (R64, 1):   0.5
ADD  (R64, R64): 0.2

Intel Haswell:

SHLL (R64, 1):   0.5
ADD  (R64, R64): 0.25

Also include a minor opt for:

(x + x) << c -> x << (c + 1)

Before this, the code:

func addShift(x int64) int64 {
    return (x + x) << 1
}

emitted two instructions:

        ADDQ    AX, AX
        SHLQ    $1, AX

but we can do it in a single shift:

        SHLQ    $2, AX

Add a codegen test for clearing the last bit.

compilecmp linux/amd64:

math
math.sqrt 243 -> 242  (-0.41%)

math [cmd/compile]
math.sqrt 243 -> 242  (-0.41%)

runtime
runtime.selectgo 5455 -> 5445  (-0.18%)
runtime.sysargs 665 -> 662  (-0.45%)
runtime.isPinned 145 -> 141  (-2.76%)
runtime.atoi64 198 -> 194  (-2.02%)
runtime.setPinned 714 -> 709  (-0.70%)

runtime [cmd/compile]
runtime.sysargs 665 -> 662  (-0.45%)
runtime.setPinned 714 -> 709  (-0.70%)
runtime.atoi64 198 -> 194  (-2.02%)
runtime.isPinned 145 -> 141  (-2.76%)

strconv
strconv.computeBounds 109 -> 107  (-1.83%)
strconv.FormatInt 201 -> 197  (-1.99%)
strconv.ryuFtoaShortest 1298 -> 1266  (-2.47%)
strconv.small 144 -> 134  (-6.94%)
strconv.AppendInt 357 -> 344  (-3.64%)
strconv.ryuDigits32 490 -> 488  (-0.41%)
strconv.AppendUint 342 -> 340  (-0.58%)

strconv [cmd/compile]
strconv.FormatInt 201 -> 197  (-1.99%)
strconv.ryuFtoaShortest 1298 -> 1266  (-2.47%)
strconv.ryuDigits32 490 -> 488  (-0.41%)
strconv.AppendUint 342 -> 340  (-0.58%)
strconv.computeBounds 109 -> 107  (-1.83%)
strconv.small 144 -> 134  (-6.94%)
strconv.AppendInt 357 -> 344  (-3.64%)

image
image.Rectangle.Inset 101 -> 97  (-3.96%)

regexp/syntax
regexp/syntax.inCharClass.func1 111 -> 110  (-0.90%)
regexp/syntax.(*compiler).quest 586 -> 573  (-2.22%)
regexp/syntax.ranges.Less 153 -> 150  (-1.96%)
regexp/syntax.(*compiler).loop 583 -> 568  (-2.57%)

time
time.Time.Before 179 -> 161  (-10.06%)
time.Time.Compare 189 -> 166  (-12.17%)
time.Time.Sub 444 -> 425  (-4.28%)
time.Time.UnixMicro 106 -> 95  (-10.38%)
time.div 592 -> 587  (-0.84%)
time.Time.UnixNano 85 -> 78  (-8.24%)
time.(*Time).UnixMilli 141 -> 140  (-0.71%)
time.Time.UnixMilli 106 -> 95  (-10.38%)
time.(*Time).UnixMicro 141 -> 140  (-0.71%)
time.Time.After 179 -> 161  (-10.06%)
time.Time.Equal 170 -> 150  (-11.76%)
time.Time.AppendBinary 766 -> 757  (-1.17%)
time.Time.IsZero 74 -> 66  (-10.81%)
time.(*Time).UnixNano 124 -> 113  (-8.87%)
time.(*Time).IsZero 113 -> 108  (-4.42%)

regexp
regexp.(*Regexp).FindAllStringSubmatch.func1 590 -> 569  (-3.56%)
regexp.QuoteMeta 485 -> 469  (-3.30%)

regexp/syntax [cmd/compile]
regexp/syntax.inCharClass.func1 111 -> 110  (-0.90%)
regexp/syntax.(*compiler).loop 583 -> 568  (-2.57%)
regexp/syntax.(*compiler).quest 586 -> 573  (-2.22%)
regexp/syntax.ranges.Less 153 -> 150  (-1.96%)

encoding/base64
encoding/base64.decodedLen 92 -> 90  (-2.17%)
encoding/base64.(*Encoding).DecodedLen 99 -> 97  (-2.02%)

time [cmd/compile]
time.(*Time).IsZero 113 -> 108  (-4.42%)
time.Time.IsZero 74 -> 66  (-10.81%)
time.(*Time).UnixNano 124 -> 113  (-8.87%)
time.Time.UnixMilli 106 -> 95  (-10.38%)
time.Time.Equal 170 -> 150  (-11.76%)
time.Time.UnixMicro 106 -> 95  (-10.38%)
time.(*Time).UnixMicro 141 -> 140  (-0.71%)
time.Time.Before 179 -> 161  (-10.06%)
time.Time.UnixNano 85 -> 78  (-8.24%)
time.Time.AppendBinary 766 -> 757  (-1.17%)
time.div 592 -> 587  (-0.84%)
time.Time.After 179 -> 161  (-10.06%)
time.Time.Compare 189 -> 166  (-12.17%)
time.(*Time).UnixMilli 141 -> 140  (-0.71%)
time.Time.Sub 444 -> 425  (-4.28%)

index/suffixarray
index/suffixarray.sais_8_32 1677 -> 1645  (-1.91%)
index/suffixarray.sais_32 1677 -> 1645  (-1.91%)
index/suffixarray.sais_64 1677 -> 1654  (-1.37%)
index/suffixarray.sais_8_64 1677 -> 1654  (-1.37%)
index/suffixarray.writeInt 249 -> 247  (-0.80%)

os
os.Expand 1070 -> 1051  (-1.78%)
os.Chtimes 787 -> 774  (-1.65%)

regexp [cmd/compile]
regexp.(*Regexp).FindAllStringSubmatch.func1 590 -> 569  (-3.56%)
regexp.QuoteMeta 485 -> 469  (-3.30%)

encoding/base64 [cmd/compile]
encoding/base64.decodedLen 92 -> 90  (-2.17%)
encoding/base64.(*Encoding).DecodedLen 99 -> 97  (-2.02%)

encoding/hex
encoding/hex.Encode 138 -> 136  (-1.45%)
encoding/hex.(*decoder).Read 830 -> 824  (-0.72%)

crypto/des
crypto/des.initFeistelBox 235 -> 229  (-2.55%)
crypto/des.cryptBlock 549 -> 538  (-2.00%)

os [cmd/compile]
os.Chtimes 787 -> 774  (-1.65%)
os.Expand 1070 -> 1051  (-1.78%)

math/big
math/big.newFloat 238 -> 223  (-6.30%)
math/big.nat.mul 2138 -> 2122  (-0.75%)
math/big.karatsubaSqr 1372 -> 1369  (-0.22%)
math/big.(*Float).sqrtInverse 895 -> 878  (-1.90%)
math/big.basicSqr 1032 -> 1017  (-1.45%)

cmd/vendor/golang.org/x/sys/unix
cmd/vendor/golang.org/x/sys/unix.TimeToTimespec 72 -> 66  (-8.33%)

encoding/json
encoding/json.Indent 404 -> 403  (-0.25%)
encoding/json.MarshalIndent 303 -> 297  (-1.98%)

testing
testing.(*T).Deadline 84 -> 82  (-2.38%)
testing.(*M).Run 3545 -> 3525  (-0.56%)

archive/zip
archive/zip.headerFileInfo.ModTime 229 -> 223  (-2.62%)

encoding/gob
encoding/gob.(*encoderState).encodeInt 474 -> 469  (-1.05%)

crypto/elliptic
crypto/elliptic.Marshal 728 -> 714  (-1.92%)

debug/buildinfo
debug/buildinfo.readString 325 -> 315  (-3.08%)

image/png
image/png.(*decoder).readImagePass 10866 -> 10834  (-0.29%)

archive/tar
archive/tar.Header.allowedFormats.func3 1768 -> 1736  (-1.81%)
archive/tar.formatPAXTime 389 -> 358  (-7.97%)
archive/tar.(*Writer).writeGNUHeader 741 -> 727  (-1.89%)
archive/tar.readGNUSparseMap0x1 709 -> 695  (-1.97%)
archive/tar.(*Writer).templateV7Plus 915 -> 909  (-0.66%)

crypto/internal/cryptotest
crypto/internal/cryptotest.TestHash.func4 890 -> 879  (-1.24%)
crypto/internal/cryptotest.TestStream.func6.1 646 -> 645  (-0.15%)
crypto/internal/cryptotest.testCipher.func3 1300 -> 1289  (-0.85%)

internal/pkgbits
internal/pkgbits.(*Encoder).Int64 113 -> 103  (-8.85%)
internal/pkgbits.(*Encoder).rawVarint 74 -> 72  (-2.70%)

testing/quick
testing/quick.(*Config).getRand 316 -> 315  (-0.32%)

log/slog
log/slog.TimeValue 489 -> 479  (-2.04%)

runtime/pprof
runtime/pprof.(*profileBuilder).build 2341 -> 2322  (-0.81%)

internal/coverage/cfile
internal/coverage/cfile.(*emitState).openMetaFile 824 -> 822  (-0.24%)
internal/coverage/cfile.(*emitState).openCounterFile 904 -> 892  (-1.33%)

cmd/internal/objabi
cmd/internal/objabi.expandArgs 1177 -> 1169  (-0.68%)

crypto/ecdsa
crypto/ecdsa.pointFromAffine 1162 -> 1144  (-1.55%)

net
net.minNonzeroTime 313 -> 308  (-1.60%)
net.cgoLookupAddrPTR 812 -> 797  (-1.85%)
net.(*IPNet).String 851 -> 827  (-2.82%)
net.IP.AppendText 488 -> 471  (-3.48%)
net.IPMask.String 281 -> 270  (-3.91%)
net.partialDeadline 374 -> 366  (-2.14%)
net.hexString 249 -> 240  (-3.61%)
net.IP.String 454 -> 453  (-0.22%)

internal/fuzz
internal/fuzz.newPcgRand 240 -> 234  (-2.50%)

crypto/x509
crypto/x509.(*Certificate).isValid 2642 -> 2611  (-1.17%)

cmd/internal/obj/s390x
cmd/internal/obj/s390x.buildop 33676 -> 33644  (-0.10%)

encoding/hex [cmd/compile]
encoding/hex.(*decoder).Read 830 -> 824  (-0.72%)
encoding/hex.Encode 138 -> 136  (-1.45%)

cmd/internal/objabi [cmd/compile]
cmd/internal/objabi.expandArgs 1177 -> 1169  (-0.68%)

math/big [cmd/compile]
math/big.(*Float).sqrtInverse 895 -> 878  (-1.90%)
math/big.nat.mul 2138 -> 2122  (-0.75%)
math/big.karatsubaSqr 1372 -> 1369  (-0.22%)
math/big.basicSqr 1032 -> 1017  (-1.45%)
math/big.newFloat 238 -> 223  (-6.30%)

encoding/json [cmd/compile]
encoding/json.MarshalIndent 303 -> 297  (-1.98%)
encoding/json.Indent 404 -> 403  (-0.25%)

cmd/covdata
main.(*metaMerge).emitCounters 985 -> 973  (-1.22%)

runtime/pprof [cmd/compile]
runtime/pprof.(*profileBuilder).build 2341 -> 2322  (-0.81%)

cmd/compile/internal/syntax
cmd/compile/internal/syntax.(*source).fill 722 -> 703  (-2.63%)

cmd/dist
main.runInstall 19081 -> 19049  (-0.17%)

crypto/tls
crypto/tls.extractPadding 176 -> 175  (-0.57%)
slices.Clone[[]crypto/tls.SignatureScheme,crypto/tls.SignatureScheme] 253 -> 247  (-2.37%)
slices.Clone[[]uint16,uint16] 253 -> 247  (-2.37%)
slices.Clone[[]crypto/tls.CurveID,crypto/tls.CurveID] 253 -> 247  (-2.37%)
crypto/tls.(*Config).cipherSuites 335 -> 326  (-2.69%)
slices.DeleteFunc[go.shape.[]crypto/tls.CurveID,go.shape.uint16] 437 -> 434  (-0.69%)
crypto/tls.dial 1349 -> 1339  (-0.74%)
slices.DeleteFunc[go.shape.[]uint16,go.shape.uint16] 437 -> 434  (-0.69%)

internal/pkgbits [cmd/compile]
internal/pkgbits.(*Encoder).Int64 113 -> 103  (-8.85%)
internal/pkgbits.(*Encoder).rawVarint 74 -> 72  (-2.70%)

cmd/compile/internal/syntax [cmd/compile]
cmd/compile/internal/syntax.(*source).fill 722 -> 703  (-2.63%)

cmd/internal/obj/s390x [cmd/compile]
cmd/internal/obj/s390x.buildop 33676 -> 33644  (-0.10%)

cmd/go/internal/trace
cmd/go/internal/trace.Flow 910 -> 886  (-2.64%)
cmd/go/internal/trace.(*Span).Done 311 -> 304  (-2.25%)
cmd/go/internal/trace.StartSpan 620 -> 615  (-0.81%)

cmd/internal/script
cmd/internal/script.(*Engine).Execute.func2 534 -> 528  (-1.12%)

cmd/link/internal/loader
cmd/link/internal/loader.(*Loader).SetSymSect 344 -> 338  (-1.74%)

net/http
net/http.(*Transport).queueForIdleConn 1797 -> 1766  (-1.73%)
net/http.(*Transport).getConn 2149 -> 2131  (-0.84%)
net/http.(*http2ClientConn).tooIdleLocked 207 -> 197  (-4.83%)
net/http.(*http2responseWriter).SetWriteDeadline.func1 520 -> 508  (-2.31%)
net/http.(*Cookie).Valid 837 -> 818  (-2.27%)
net/http.(*http2responseWriter).SetReadDeadline 373 -> 357  (-4.29%)
net/http.checkIfRange 701 -> 690  (-1.57%)
net/http.(*http2SettingsFrame).Value 325 -> 298  (-8.31%)
net/http.(*http2SettingsFrame).HasDuplicates 777 -> 767  (-1.29%)
net/http.(*Server).Serve 1746 -> 1739  (-0.40%)
net/http.http2traceGotConn 569 -> 556  (-2.28%)

net/http/pprof
net/http/pprof.collectProfile 242 -> 239  (-1.24%)

cmd/compile/internal/coverage
cmd/compile/internal/coverage.metaHashAndLen 439 -> 438  (-0.23%)

cmd/vendor/golang.org/x/telemetry/internal/upload
cmd/vendor/golang.org/x/telemetry/internal/upload.(*uploader).findWork 4570 -> 4540  (-0.66%)
cmd/vendor/golang.org/x/telemetry/internal/upload.(*uploader).reports 3604 -> 3572  (-0.89%)

cmd/compile/internal/coverage [cmd/compile]
cmd/compile/internal/coverage.metaHashAndLen 439 -> 438  (-0.23%)

cmd/vendor/golang.org/x/text/language
cmd/vendor/golang.org/x/text/language.regionGroupDist 287 -> 284  (-1.05%)

cmd/go/internal/vcweb
cmd/go/internal/vcweb.(*Server).overview.func1 1045 -> 1041  (-0.38%)

cmd/go/internal/vcs
cmd/go/internal/vcs.expand 761 -> 741  (-2.63%)

cmd/compile/internal/inline/inlheur
slices.stableCmpFunc[go.shape.struct 2300 -> 2284  (-0.70%)

cmd/compile/internal/inline/inlheur [cmd/compile]
slices.stableCmpFunc[go.shape.struct 2300 -> 2284  (-0.70%)

cmd/go/internal/modfetch/codehost
cmd/go/internal/modfetch/codehost.bzrParseStat 2217 -> 2213  (-0.18%)

cmd/link/internal/ld
cmd/link/internal/ld.decodetypeStructFieldCount 157 -> 152  (-3.18%)
cmd/link/internal/ld.(*Link).address 12559 -> 12495  (-0.51%)
cmd/link/internal/ld.(*dodataState).allocateDataSections 18345 -> 18205  (-0.76%)
cmd/link/internal/ld.elfshreloc 618 -> 616  (-0.32%)
cmd/link/internal/ld.(*deadcodePass).decodetypeMethods 794 -> 779  (-1.89%)
cmd/link/internal/ld.(*dodataState).assignDsymsToSection 668 -> 663  (-0.75%)
cmd/link/internal/ld.relocSectFn 285 -> 284  (-0.35%)
cmd/link/internal/ld.decodetypeIfaceMethodCount 146 -> 144  (-1.37%)
cmd/link/internal/ld.decodetypeArrayLen 157 -> 152  (-3.18%)

cmd/link/internal/arm64
cmd/link/internal/arm64.gensymlate.func1 895 -> 888  (-0.78%)

cmd/go/internal/modload
cmd/go/internal/modload.queryProxy.func3 1029 -> 1012  (-1.65%)

cmd/go/internal/load
cmd/go/internal/load.(*Package).setBuildInfo 8453 -> 8447  (-0.07%)

cmd/go/internal/clean
cmd/go/internal/clean.runClean 2120 -> 2104  (-0.75%)

cmd/compile/internal/ssa
cmd/compile/internal/ssa.(*poset).aliasnodes 2010 -> 1978  (-1.59%)
cmd/compile/internal/ssa.rewriteValueARM64_OpARM64MOVHstoreidx2 730 -> 719  (-1.51%)
cmd/compile/internal/ssa.(*debugState).buildLocationLists 3326 -> 3294  (-0.96%)
cmd/compile/internal/ssa.rewriteValueAMD64_OpAMD64ADDLconst 3069 -> 2941  (-4.17%)
cmd/compile/internal/ssa.(*debugState).processValue 9756 -> 9724  (-0.33%)
cmd/compile/internal/ssa.rewriteValueAMD64_OpAMD64ADDQconst 3069 -> 2941  (-4.17%)
cmd/compile/internal/ssa.(*poset).mergeroot 1079 -> 1054  (-2.32%)

cmd/compile/internal/ssa [cmd/compile]
cmd/compile/internal/ssa.rewriteValueARM64_OpARM64MOVHstoreidx2 730 -> 719  (-1.51%)
cmd/compile/internal/ssa.(*poset).aliasnodes 2010 -> 1978  (-1.59%)
cmd/compile/internal/ssa.(*poset).mergeroot 1079 -> 1054  (-2.32%)
cmd/compile/internal/ssa.rewriteValueAMD64_OpAMD64ADDQconst 3069 -> 2941  (-4.17%)
cmd/compile/internal/ssa.rewriteValueAMD64_OpAMD64ADDLconst 3069 -> 2941  (-4.17%)

file                                                before   after    Δ       %
math/bits.s                                         2352     2354     +2      +0.085%
math/bits [cmd/compile].s                           2352     2354     +2      +0.085%
math.s                                              35675    35674    -1      -0.003%
math [cmd/compile].s                                35675    35674    -1      -0.003%
runtime.s                                           577251   577245   -6      -0.001%
runtime [cmd/compile].s                             642419   642438   +19     +0.003%
sort.s                                              37434    37435    +1      +0.003%
strconv.s                                           48391    48343    -48     -0.099%
sort [cmd/compile].s                                37434    37435    +1      +0.003%
bufio.s                                             21386    21418    +32     +0.150%
strconv [cmd/compile].s                             48391    48343    -48     -0.099%
image.s                                             34978    35022    +44     +0.126%
regexp/syntax.s                                     81719    81781    +62     +0.076%
time.s                                              94341    94184    -157    -0.166%
regexp.s                                            60411    60399    -12     -0.020%
bufio [cmd/compile].s                               21512    21544    +32     +0.149%
encoding/binary.s                                   34062    34087    +25     +0.073%
regexp/syntax [cmd/compile].s                       81719    81781    +62     +0.076%
encoding/base64.s                                   11907    11903    -4      -0.034%
time [cmd/compile].s                                94341    94184    -157    -0.166%
index/suffixarray.s                                 41633    41527    -106    -0.255%
os.s                                                101770   101738   -32     -0.031%
regexp [cmd/compile].s                              60411    60399    -12     -0.020%
encoding/binary [cmd/compile].s                     37173    37198    +25     +0.067%
encoding/base64 [cmd/compile].s                     11907    11903    -4      -0.034%
os/exec.s                                           23900    23907    +7      +0.029%
encoding/hex.s                                      6038     6030     -8      -0.132%
crypto/des.s                                        5073     5056     -17     -0.335%
os [cmd/compile].s                                  102030   101998   -32     -0.031%
vendor/golang.org/x/net/http2/hpack.s               22027    22033    +6      +0.027%
math/big.s                                          164808   164753   -55     -0.033%
cmd/vendor/golang.org/x/sys/unix.s                  121450   121444   -6      -0.005%
encoding/json.s                                     110294   110287   -7      -0.006%
testing.s                                           115303   115281   -22     -0.019%
archive/zip.s                                       65329    65325    -4      -0.006%
os/user.s                                           10078    10080    +2      +0.020%
encoding/gob.s                                      143788   143783   -5      -0.003%
crypto/elliptic.s                                   30686    30704    +18     +0.059%
go/doc/comment.s                                    49401    49433    +32     +0.065%
debug/buildinfo.s                                   9095     9085     -10     -0.110%
image/png.s                                         36113    36081    -32     -0.089%
archive/tar.s                                       71994    71897    -97     -0.135%
crypto/internal/cryptotest.s                        60872    60849    -23     -0.038%
internal/pkgbits.s                                  20441    20429    -12     -0.059%
testing/quick.s                                     8236     8235     -1      -0.012%
log/slog.s                                          77568    77558    -10     -0.013%
internal/trace/internal/oldtrace.s                  52885    52896    +11     +0.021%
runtime/pprof.s                                     123978   123969   -9      -0.007%
internal/coverage/cfile.s                           25198    25184    -14     -0.056%
cmd/internal/objabi.s                               19954    19946    -8      -0.040%
crypto/ecdsa.s                                      29159    29141    -18     -0.062%
log/slog/internal/benchmarks.s                      6694     6695     +1      +0.015%
net.s                                               299569   299503   -66     -0.022%
os/exec [cmd/compile].s                             23888    23895    +7      +0.029%
internal/trace.s                                    179226   179240   +14     +0.008%
internal/fuzz.s                                     86190    86191    +1      +0.001%
crypto/x509.s                                       177195   177164   -31     -0.017%
cmd/internal/obj/s390x.s                            121642   121610   -32     -0.026%
cmd/internal/obj/ppc64.s                            140118   140122   +4      +0.003%
encoding/hex [cmd/compile].s                        6149     6141     -8      -0.130%
cmd/internal/objabi [cmd/compile].s                 19954    19946    -8      -0.040%
cmd/internal/obj/arm64.s                            158523   158555   +32     +0.020%
go/doc/comment [cmd/compile].s                      49512    49544    +32     +0.065%
math/big [cmd/compile].s                            166394   166339   -55     -0.033%
encoding/json [cmd/compile].s                       110712   110705   -7      -0.006%
cmd/covdata.s                                       39699    39687    -12     -0.030%
runtime/pprof [cmd/compile].s                       125209   125200   -9      -0.007%
cmd/compile/internal/syntax.s                       181755   181736   -19     -0.010%
cmd/dist.s                                          177893   177861   -32     -0.018%
crypto/tls.s                                        389157   389113   -44     -0.011%
internal/pkgbits [cmd/compile].s                    41644    41632    -12     -0.029%
cmd/compile/internal/syntax [cmd/compile].s         196105   196086   -19     -0.010%
cmd/compile/internal/types.s                        71315    71345    +30     +0.042%
cmd/internal/obj/s390x [cmd/compile].s              121733   121701   -32     -0.026%
cmd/go/internal/trace.s                             4796     4760     -36     -0.751%
cmd/internal/obj/arm64 [cmd/compile].s              168120   168147   +27     +0.016%
cmd/internal/obj/ppc64 [cmd/compile].s              140219   140223   +4      +0.003%
cmd/internal/script.s                               83442    83436    -6      -0.007%
cmd/link/internal/loader.s                          93299    93294    -5      -0.005%
net/http.s                                          620639   620472   -167    -0.027%
net/http/pprof.s                                    35016    35013    -3      -0.009%
cmd/compile/internal/coverage.s                     6668     6667     -1      -0.015%
cmd/vendor/golang.org/x/telemetry/internal/upload.s 34210    34148    -62     -0.181%
cmd/compile/internal/coverage [cmd/compile].s       6664     6663     -1      -0.015%
cmd/vendor/golang.org/x/text/language.s             48077    48074    -3      -0.006%
cmd/go/internal/vcweb.s                             45193    45189    -4      -0.009%
cmd/go/internal/vcs.s                               44749    44729    -20     -0.045%
cmd/compile/internal/inline/inlheur.s               83758    83742    -16     -0.019%
cmd/compile/internal/inline/inlheur [cmd/compile].s 84773    84757    -16     -0.019%
cmd/go/internal/modfetch/codehost.s                 89098    89094    -4      -0.004%
cmd/trace.s                                         257550   257564   +14     +0.005%
cmd/link/internal/ld.s                              641945   641706   -239    -0.037%
cmd/link/internal/arm64.s                           34805    34798    -7      -0.020%
cmd/go/internal/modload.s                           328971   328954   -17     -0.005%
cmd/go/internal/load.s                              178877   178871   -6      -0.003%
cmd/go/internal/clean.s                             11006    10990    -16     -0.145%
cmd/compile/internal/ssa.s                          3552843  3553347  +504    +0.014%
cmd/compile/internal/ssa [cmd/compile].s            3752511  3753123  +612    +0.016%
total                                               36179015 36178687 -328    -0.001%

Change-Id: I251c2898ccf3c9931d162d87dabbd49cf4ec73a5
Reviewed-on: https://go-review.googlesource.com/c/go/+/641757
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-02-06 09:05:20 -08:00
Jakub Ciolek b7c9cdd53c cmd/compile: establish limits of bool to uint8 conversions
Improves bound check elimination for:

func arrayLargeEnough(b bool, a [2]int64) int64 {
    c := byte(0)
    if b {
        c = 1
    }

    // this bound check gets elided
    return a[c]
}

We also detect never true branches like:

func cCanOnlyBe0or1(b bool) byte {
    var c byte
    if b {
        c = 1
    }
    // this statement can never be true so we can elide it
    if c == 2 {
        c = 3
    }

    return c
}

Hits a few times:

crypto/internal/sysrand
crypto/internal/sysrand.Read 357 -> 349  (-2.24%)

testing
testing.(*F).Fuzz.func1.1 837 -> 828  (-1.08%)

image/png
image/png.(*Encoder).Encode 1735 -> 1733  (-0.12%)

vendor/golang.org/x/crypto/cryptobyte
vendor/golang.org/x/crypto/cryptobyte.(*Builder).callContinuation 187 -> 185  (-1.07%)

crypto/internal/sysrand [cmd/compile]
crypto/internal/sysrand.Read 357 -> 349  (-2.24%)

go/parser
go/parser.(*parser).parseType 463 -> 457  (-1.30%)
go/parser.(*parser).embeddedElem 633 -> 626  (-1.11%)
go/parser.(*parser).parseFuncDecl 917 -> 914  (-0.33%)
go/parser.(*parser).parseDotsType 393 -> 391  (-0.51%)
go/parser.(*parser).error 1061 -> 1054  (-0.66%)
go/parser.(*parser).parseTypeName 537 -> 532  (-0.93%)
go/parser.(*parser).parseParamDecl 1478 -> 1451  (-1.83%)
go/parser.(*parser).parseFuncTypeOrLit 498 -> 495  (-0.60%)
go/parser.(*parser).parseValue 375 -> 371  (-1.07%)
go/parser.(*parser).parseElementList 594 -> 593  (-0.17%)
go/parser.(*parser).parseResult 593 -> 583  (-1.69%)
go/parser.(*parser).parseElement 506 -> 504  (-0.40%)
go/parser.(*parser).parseImportSpec 1110 -> 1108  (-0.18%)
go/parser.(*parser).parseStructType 741 -> 735  (-0.81%)
go/parser.(*parser).parseTypeSpec 1054 -> 1048  (-0.57%)
go/parser.(*parser).parseIdentList 625 -> 623  (-0.32%)
go/parser.(*parser).parseOperand 1221 -> 1199  (-1.80%)
go/parser.(*parser).parseIndexOrSliceOrInstance 2713 -> 2694  (-0.70%)
go/parser.(*parser).parseSwitchStmt 1458 -> 1447  (-0.75%)
go/parser.(*parser).parseArrayFieldOrTypeInstance 1865 -> 1861  (-0.21%)
go/parser.(*parser).parseExpr 307 -> 305  (-0.65%)
go/parser.(*parser).parseSelector 427 -> 425  (-0.47%)
go/parser.(*parser).parseTypeInstance 1433 -> 1420  (-0.91%)
go/parser.(*parser).parseCaseClause 629 -> 626  (-0.48%)
go/parser.(*parser).parseParameterList 4212 -> 4189  (-0.55%)
go/parser.(*parser).parsePointerType 393 -> 391  (-0.51%)
go/parser.(*parser).parseFuncType 465 -> 463  (-0.43%)
go/parser.(*parser).parseTypeAssertion 559 -> 557  (-0.36%)
go/parser.(*parser).parseSimpleStmt 2443 -> 2388  (-2.25%)
go/parser.(*parser).parseCallOrConversion 1093 -> 1087  (-0.55%)
go/parser.(*parser).parseForStmt 2168 -> 2159  (-0.42%)
go/parser.(*parser).embeddedTerm 657 -> 649  (-1.22%)
go/parser.(*parser).parseCommClause 1509 -> 1501  (-0.53%)

cmd/internal/objfile
cmd/internal/objfile.(*goobjFile).symbols 5299 -> 5274  (-0.47%)

net
net.initConfVal 378 -> 374  (-1.06%)
net.(*conf).hostLookupOrder 269 -> 267  (-0.74%)
net.(*conf).addrLookupOrder 261 -> 255  (-2.30%)

cmd/internal/obj/loong64
cmd/internal/obj/loong64.(*ctxt0).oplook 1829 -> 1813  (-0.87%)

cmd/internal/obj/mips
cmd/internal/obj/mips.(*ctxt0).oplook 1428 -> 1400  (-1.96%)

go/types
go/types.(*typeWriter).signature 605 -> 601  (-0.66%)
go/types.(*Checker).instantiateSignature 1469 -> 1467  (-0.14%)

go/parser [cmd/compile]
go/parser.(*parser).parseSwitchStmt 1458 -> 1447  (-0.75%)
go/parser.(*parser).parseDotsType 393 -> 391  (-0.51%)
go/parser.(*parser).embeddedElem 633 -> 626  (-1.11%)
go/parser.(*parser).parseTypeAssertion 559 -> 557  (-0.36%)
go/parser.(*parser).parseCommClause 1509 -> 1501  (-0.53%)
go/parser.(*parser).parseCaseClause 629 -> 626  (-0.48%)
go/parser.(*parser).parseImportSpec 1110 -> 1108  (-0.18%)
go/parser.(*parser).parseTypeSpec 1054 -> 1048  (-0.57%)
go/parser.(*parser).parseElementList 594 -> 593  (-0.17%)
go/parser.(*parser).parseParamDecl 1478 -> 1451  (-1.83%)
go/parser.(*parser).parseType 463 -> 457  (-1.30%)
go/parser.(*parser).parseSimpleStmt 2443 -> 2388  (-2.25%)
go/parser.(*parser).parseIdentList 625 -> 623  (-0.32%)
go/parser.(*parser).parseTypeInstance 1433 -> 1420  (-0.91%)
go/parser.(*parser).parseResult 593 -> 583  (-1.69%)
go/parser.(*parser).parseValue 375 -> 371  (-1.07%)
go/parser.(*parser).parseFuncDecl 917 -> 914  (-0.33%)
go/parser.(*parser).error 1061 -> 1054  (-0.66%)
go/parser.(*parser).parseElement 506 -> 504  (-0.40%)
go/parser.(*parser).parseFuncType 465 -> 463  (-0.43%)
go/parser.(*parser).parsePointerType 393 -> 391  (-0.51%)
go/parser.(*parser).parseTypeName 537 -> 532  (-0.93%)
go/parser.(*parser).parseExpr 307 -> 305  (-0.65%)
go/parser.(*parser).parseFuncTypeOrLit 498 -> 495  (-0.60%)
go/parser.(*parser).parseStructType 741 -> 735  (-0.81%)
go/parser.(*parser).parseOperand 1221 -> 1199  (-1.80%)
go/parser.(*parser).parseIndexOrSliceOrInstance 2713 -> 2694  (-0.70%)
go/parser.(*parser).parseForStmt 2168 -> 2159  (-0.42%)
go/parser.(*parser).parseParameterList 4212 -> 4189  (-0.55%)
go/parser.(*parser).parseArrayFieldOrTypeInstance 1865 -> 1861  (-0.21%)
go/parser.(*parser).parseSelector 427 -> 425  (-0.47%)
go/parser.(*parser).parseCallOrConversion 1093 -> 1087  (-0.55%)
go/parser.(*parser).embeddedTerm 657 -> 649  (-1.22%)

crypto/tls
crypto/tls.(*Conn).clientHandshake 3430 -> 3421  (-0.26%)

cmd/internal/obj/mips [cmd/compile]
cmd/internal/obj/mips.(*ctxt0).oplook 1428 -> 1400  (-1.96%)

cmd/internal/obj/loong64 [cmd/compile]
cmd/internal/obj/loong64.(*ctxt0).oplook 1829 -> 1813  (-0.87%)

cmd/compile/internal/types2
cmd/compile/internal/types2.(*typeWriter).signature 605 -> 601  (-0.66%)
cmd/compile/internal/types2.(*Checker).infer 10646 -> 10614  (-0.30%)
cmd/compile/internal/types2.(*Checker).instantiateSignature 1567 -> 1561  (-0.38%)

cmd/compile/internal/types2 [cmd/compile]
cmd/compile/internal/types2.(*Checker).instantiateSignature 1567 -> 1561  (-0.38%)
cmd/compile/internal/types2.(*typeWriter).signature 605 -> 601  (-0.66%)
cmd/compile/internal/types2.(*Checker).infer 10718 -> 10654  (-0.60%)

cmd/vendor/golang.org/x/arch/s390x/s390xasm
cmd/vendor/golang.org/x/arch/s390x/s390xasm.GoSyntax 36778 -> 36682  (-0.26%)

net/http
net/http.(*Client).do 4202 -> 4170  (-0.76%)
net/http.(*http2clientStream).writeRequest 3692 -> 3686  (-0.16%)

cmd/vendor/github.com/ianlancetaylor/demangle
cmd/vendor/github.com/ianlancetaylor/demangle.(*rustState).genericArgs 466 -> 463  (-0.64%)

cmd/compile/internal/devirtualize
cmd/compile/internal/devirtualize.ProfileGuided.func1 1364 -> 1357  (-0.51%)

cmd/compile/internal/inline/interleaved
cmd/compile/internal/inline/interleaved.DevirtualizeAndInlinePackage.func2 533 -> 526  (-1.31%)

cmd/compile/internal/devirtualize [cmd/compile]
cmd/compile/internal/devirtualize.ProfileGuided.func1 1343 -> 1332  (-0.82%)

cmd/compile/internal/inline/interleaved [cmd/compile]
cmd/compile/internal/inline/interleaved.DevirtualizeAndInlinePackage.func2 533 -> 526  (-1.31%)

cmd/link/internal/ld
cmd/link/internal/ld.mustLinkExternal 2739 -> 2674  (-2.37%)

cmd/compile/internal/ssa
cmd/compile/internal/ssa.(*poset).Ordered 391 -> 389  (-0.51%)
cmd/compile/internal/ssa.(*poset).Equal 318 -> 313  (-1.57%)
cmd/compile/internal/ssa.(*poset).Undo 1842 -> 1832  (-0.54%)
cmd/compile/internal/ssa.(*expandState).decomposeAsNecessary 4587 -> 4555  (-0.70%)
cmd/compile/internal/ssa.(*poset).OrderedOrEqual 390 -> 389  (-0.26%)
cmd/compile/internal/ssa.(*poset).NonEqual 613 -> 606  (-1.14%)

cmd/compile/internal/ssa [cmd/compile]
cmd/compile/internal/ssa.(*poset).OrderedOrEqual 368 -> 365  (-0.82%)
cmd/compile/internal/ssa.(*poset).Equal 318 -> 313  (-1.57%)
cmd/compile/internal/ssa.(*expandState).decomposeAsNecessary 4952 -> 4938  (-0.28%)
cmd/compile/internal/ssa.(*poset).NonEqual 613 -> 606  (-1.14%)
cmd/compile/internal/ssa.(*poset).SetEqual 2533 -> 2505  (-1.11%)
cmd/compile/internal/ssa.(*poset).SetNonEqual 785 -> 777  (-1.02%)
cmd/compile/internal/ssa.(*poset).Ordered 370 -> 366  (-1.08%)

cmd/compile/internal/gc [cmd/compile]
cmd/compile/internal/gc.Main.DevirtualizeAndInlinePackage.func2 492 -> 489  (-0.61%)

file                                                    before   after    Δ       %
crypto/internal/sysrand.s                               1553     1545     -8      -0.515%
internal/zstd.s                                         49179    49190    +11     +0.022%
testing.s                                               115197   115188   -9      -0.008%
image/png.s                                             36109    36107    -2      -0.006%
vendor/golang.org/x/crypto/cryptobyte.s                 30980    30978    -2      -0.006%
crypto/internal/sysrand [cmd/compile].s                 1553     1545     -8      -0.515%
go/parser.s                                             112638   112354   -284    -0.252%
cmd/internal/objfile.s                                  49994    49969    -25     -0.050%
net.s                                                   299558   299546   -12     -0.004%
cmd/internal/obj/loong64.s                              71651    71635    -16     -0.022%
cmd/internal/obj/mips.s                                 59681    59653    -28     -0.047%
go/types.s                                              558839   558833   -6      -0.001%
cmd/compile/internal/types.s                            71305    71306    +1      +0.001%
go/parser [cmd/compile].s                               112749   112465   -284    -0.252%
crypto/tls.s                                            388859   388850   -9      -0.002%
cmd/internal/obj/mips [cmd/compile].s                   59792    59764    -28     -0.047%
cmd/internal/obj/loong64 [cmd/compile].s                71762    71746    -16     -0.022%
cmd/compile/internal/types2.s                           540608   540566   -42     -0.008%
cmd/compile/internal/types2 [cmd/compile].s             577428   577354   -74     -0.013%
cmd/vendor/golang.org/x/arch/s390x/s390xasm.s           267664   267568   -96     -0.036%
net/http.s                                              620704   620666   -38     -0.006%
cmd/vendor/github.com/ianlancetaylor/demangle.s         299991   299988   -3      -0.001%
cmd/compile/internal/devirtualize.s                     21452    21445    -7      -0.033%
cmd/compile/internal/inline/interleaved.s               8358     8351     -7      -0.084%
cmd/compile/internal/devirtualize [cmd/compile].s       20994    20983    -11     -0.052%
cmd/compile/internal/inline/interleaved [cmd/compile].s 8328     8321     -7      -0.084%
cmd/link/internal/ld.s                                  641802   641737   -65     -0.010%
cmd/compile/internal/ssa.s                              3552939  3552957  +18     +0.001%
cmd/compile/internal/ssa [cmd/compile].s                3752191  3752197  +6      +0.000%
cmd/compile/internal/ssagen.s                           405780   405786   +6      +0.001%
cmd/compile/internal/ssagen [cmd/compile].s             434472   434496   +24     +0.006%
cmd/compile/internal/gc [cmd/compile].s                 38499    38496    -3      -0.008%
total                                                   36185267 36184243 -1024   -0.003%

Change-Id: I867222b0f907b29d32b2676e55c6b5789ec56511
Reviewed-on: https://go-review.googlesource.com/c/go/+/642716
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2025-02-05 09:04:25 -08:00
Youlin Feng 0825475599 cmd/compile: do not treat OpLocalAddr as load in DSE
Fixes #70409
Fixes #47107

Change-Id: I82a66c46f6b76c68e156b5d937273b0316975d44
Reviewed-on: https://go-review.googlesource.com/c/go/+/629016
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2025-02-04 12:52:01 -08:00
Ian Lance Taylor 82337de9f2 test/issue71226: add cast to avoid clang error
Change-Id: I2d8ecb7b5f48943697d454d09947fdb1817809d0
Reviewed-on: https://go-review.googlesource.com/c/go/+/646295
TryBot-Bypass: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Robert Griesemer <gri@google.com>
Auto-Submit: Ian Lance Taylor <iant@golang.org>
2025-02-03 10:14:38 -08:00
Jakub Ciolek e57769d5ad cmd/compile: on AMD64, prefer XOR/AND for (x & 1) == 0 check
It's shorter to encode. Additionally, XOR and AND generally
have higher throughput than BT/SET*.

compilecmp:

runtime
runtime.(*sweepClass).split 58 -> 56  (-3.45%)
runtime.sweepClass.split 14 -> 11  (-21.43%)

runtime [cmd/compile]
runtime.(*sweepClass).split 58 -> 56  (-3.45%)
runtime.sweepClass.split 14 -> 11  (-21.43%)

strconv
strconv.ryuFtoaShortest changed

strconv [cmd/compile]
strconv.ryuFtoaShortest changed

math/big
math/big.(*Int).MulRange 255 -> 252  (-1.18%)

testing/quick
testing/quick.sizedValue changed

internal/fuzz
internal/fuzz.(*pcgRand).bool 69 -> 70  (+1.45%)

cmd/internal/obj/x86
cmd/internal/obj/x86.(*AsmBuf).asmevex changed

math/big [cmd/compile]
math/big.(*Int).MulRange 255 -> 252  (-1.18%)

cmd/internal/obj/x86 [cmd/compile]
cmd/internal/obj/x86.(*AsmBuf).asmevex changed

net/http
net/http.(*http2stream).isPushed 11 -> 10  (-9.09%)

cmd/vendor/github.com/google/pprof/internal/binutils
cmd/vendor/github.com/google/pprof/internal/binutils.(*file).computeBase changed

Change-Id: I9cb2987eb263c85ee4e93d6f8455c91a55273173
Reviewed-on: https://go-review.googlesource.com/c/go/+/640975
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-02-03 08:42:01 -08:00
Ian Lance Taylor 6adf54a3eb cmd/cgo: declare _GoString{Len,Ptr} in _cgo_export.h
Fixes #71226

Change-Id: I91c46a4310a9c7a9fcd1e3a131ca16e46949edb3
Reviewed-on: https://go-review.googlesource.com/c/go/+/642235
Auto-Submit: Ian Lance Taylor <iant@golang.org>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
2025-02-03 08:24:48 -08:00
Ian Lance Taylor bf351677c4 cmd/cgo: add C declaration parameter unused attribute
Fixes #71225

Change-Id: I3e60fdf632f2aa0e63b24225f13e4ace49906925
Reviewed-on: https://go-review.googlesource.com/c/go/+/642196
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
2025-02-03 08:24:47 -08:00
Michael Pratt 78e6f2a1c8 runtime: rename mapiterinit and mapiternext
mapiterinit allows external linkname. These users must allocate their
own iter struct for initialization by mapiterinit. Since the type is
unexported, they also must define the struct themselves. As a result,
they of course define the struct matching the old hiter definition (in
map_noswiss.go).

The old definition is smaller on 32-bit platforms. On those platforms,
mapiternext will clobber memory outside of the caller's allocation.

On all platforms, the pointer layout between the old hiter and new
maps.Iter does not match. Thus the GC may miss pointers and free
reachable objects early, or it may see non-pointers that look like heap
pointers and throw due to invalid references to free objects.

To avoid these issues, we must keep mapiterinit and mapiternext with the
old hiter definition. The most straightforward way to do this is to use
mapiterinit and mapiternext as a compatibility layer between the old and
new iter types.

The first step to that is to move normal map use off of these functions,
which is what this CL does.

Introduce new mapIterStart and mapIterNext functions that replace the
former functions everywhere in the toolchain. These have the same
behavior as the old functions.

This CL temporarily makes the old functions throw to ensure we don't
have hidden dependencies on them. We cannot remove them entirely because
GOEXPERIMENT=noswissmap still uses the old names, and internal/goobj
requires all builtins to exist regardless of GOEXPERIMENT. The next CL
will introduce the compatibility layer.

I want to avoid using linkname between runtime and reflect, as that
would also allow external linknames. So mapIterStart and mapIterNext are
duplicated in reflect, which can be done trivially, as it imports
internal/runtime/maps.

For #71408.

Change-Id: I6a6a636c6d4bd1392618c67ca648d3f061afe669
Reviewed-on: https://go-review.googlesource.com/c/go/+/643898
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
2025-01-28 10:54:43 -08:00
Keith Randall c5e205e928 internal/runtime/maps: re-enable some tests
Re-enable tests for stack-allocated maps and fast map accessors.
Those are implemented now.

Update #54766

Change-Id: I8c019702bd9fb077b2fe3f7c78e8e9e10d2263a6
Reviewed-on: https://go-review.googlesource.com/c/go/+/642376
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
2025-01-14 09:55:06 -08:00
Keith Randall 44a6f817ea cmd/compile: fix write barrier coalescing
We can't coalesce a non-WB store with a subsequent Move, as the
result of the store might be the source of the move.

There's a simple codegen test. Not sure how we might do a real test,
as all the repro's I've come up with are very expensive and unreliable.

Fixes #71228

Change-Id: If18bf181a266b9b90964e2591cd2e61a7168371c
Reviewed-on: https://go-review.googlesource.com/c/go/+/642197
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
2025-01-12 22:49:39 -08:00
Damien Neil 4da905bcf0 cmd/internal/objabi, internal/runtime: increase nosplit limit on OpenBSD
OpenBSD is bumping up against the nosplit limit, and openbsd/ppc64
is over it. Increase StackGuardMultiplier on OpenBSD, matching AIX.

Change-Id: I61e17c99ce77e1fd3f368159dc4615aeae99e913
Reviewed-on: https://go-review.googlesource.com/c/go/+/632996
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Damien Neil <dneil@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-12-06 00:35:41 +00:00
Jorropo d241ea8d5c sync/atomic: add missing leak tests for And & Or
Theses tests were forgot because when CL 462298 was originally written
And & Or atomics were not available in go.
Git were smart enough to rebase over And's & Or's addition.
After most reviews and before merging it were pointed I should
make theses new intrinsics noescape.
When doing this last minute addition I forgot to add tests.

Change-Id: I457f98315c0aee91d5743058ab76f256856cb782
Reviewed-on: https://go-review.googlesource.com/c/go/+/633416
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-12-04 17:30:39 +00:00
David Chase d524c954b1 cmd/compile: use very high budget for once-called closures
This should make it much more likely that rangefunc
iterators become "plain inline code".

Change-Id: I8026603afdc5249f60cc663c4bc15cb1d26d1c83
Reviewed-on: https://go-review.googlesource.com/c/go/+/630696
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-11-22 02:04:41 +00:00
Youlin Feng c4e6ab9750 cmd/compile: modify CSE to remove redundant OpLocalAddrs
Remove the OpLocalAddrs that are unnecessary in the CSE pass, so the
following passes like DSE and memcombine can do its work better.

Fixes #70300

Change-Id: I600025d49eeadb3ca4f092d614428399750f69bc
Reviewed-on: https://go-review.googlesource.com/c/go/+/628075
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
2024-11-22 00:12:03 +00:00
Keith Randall f0b0109242 cmd/compile: pull multiple adds out of an unsafe.Pointer<->uintptr conversion
This came up in some swissmap code.

Change-Id: I3c6705a5cafec8cb4953dfa9535cf0b45255cc83
Reviewed-on: https://go-review.googlesource.com/c/go/+/629516
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
2024-11-21 22:57:04 +00:00
Keith Randall 0a0a7a5642 cmd/compile: fix rewrite rules for multiply/add
x - (y - c) == (x - y) + c, not (x - y) - c. Oops.

Fixes #70481

Change-Id: I0e54d8e65dd9843c6b92c543ac69d69ee21f617c
Reviewed-on: https://go-review.googlesource.com/c/go/+/630397
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Jakub Ciolek <jakub@ciolek.dev>
Auto-Submit: Keith Randall <khr@golang.org>
2024-11-21 15:06:30 +00:00
Junyang Shao 558f5372fc cmd/compile,testing: implement one-time rampup logic for testing.B.Loop
testing.B.Loop now does its own loop scheduling without interaction with b.N.
b.N will be updated to the actual iterations b.Loop controls when b.Loop returns false.

This CL also added tests for fixed iteration count (benchtime=100x case).

This CL also ensured that b.Loop() is inlined.

For #61515

Change-Id: Ia15f4462f4830ef4ec51327520ff59910eb4bb58
Reviewed-on: https://go-review.googlesource.com/c/go/+/627755
Reviewed-by: Michael Pratt <mpratt@google.com>
Commit-Queue: Junyang Shao <shaojunyang@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-11-20 23:19:48 +00:00
Filippo Valsorda 566cf1c108 crypto/ecdh: move implementation to crypto/internal/fips/ecdh
This intentionally gives up on the property of not computing the public
key until requested. It was nice, but it was making the code too
complex. The average use case is to call PublicKey immediately after
GenerateKey anyway.

Added support in the module for P-224, just in case we'd ever want to
support it in crypto/ecdh.

Tried various ways to fix test/fixedbugs/issue52193.go to be meaningful,
but crypto/ecdh is pretty complex and all the solutions would end up
locking in crypto/ecdh structure rather than compiler behavior. The rest
of that test is good enough on its own anyway. If we do the work in the
future of making crypto/ecdh zero-allocations using the affordances of
the compiler, we can add a more robust TestAllocations on our side.

For #69536

Change-Id: I68ac3955180cb31f6f96a0ef57604aaed88ab311
Reviewed-on: https://go-review.googlesource.com/c/go/+/628315
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Daniel McCarney <daniel@binaryparadox.net>
Reviewed-by: Russ Cox <rsc@golang.org>
Auto-Submit: Filippo Valsorda <filippo@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-11-19 23:01:37 +00:00
David Chase 170436c045 cmd/compile: strongly favor closure inlining
This tweaks the inlining cost knob for closures
specifically, they receive a doubled budget.  The
rationale for this is that closures have a lot of
"crud" in their IR that will disappear after inlining,
so the standard budget penalizes them unnecessarily.

This is also the cause of these bugs -- looking at the
code involved, these closures "should" be inlineable,
therefore tweak the parameters until behavior matches
expectations.  It's not costly in binary size, because
the only-called-from-one-site case is common (especially
for rangefunc iterators).

I can imagine better fixes and I am going to try to
get that done, but this one is small and makes things
better.

Fixes #69411, #69539.

Change-Id: I8a892c40323173a723799e0ddad69dcc2724a8f9
Reviewed-on: https://go-review.googlesource.com/c/go/+/629195
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Robert Griesemer <gri@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-11-19 00:04:51 +00:00
Michael Anthony Knyszek 53b2b64b64 sync: add explicit noCopy fields to Map, Mutex, and Once
Following CLs will refactor Mutex and change the internals of Map. This
ends up breaking tests in x/tools for the copylock vet check, because
the error message changes. Let's insulate ourselves from such things
permanently by adding an explicit noCopy field. We'll update the vet
check to accept that as the problem, rather than depend on less explicit
internals.

We capture Once here too to clean up the error message as well.

Change-Id: Iead985fc8ec9ef3ea5ff615f26dde17bb03aeadb
Reviewed-on: https://go-review.googlesource.com/c/go/+/627777
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Tim King <taking@google.com>
2024-11-18 18:52:54 +00:00
Keith Randall 5a0f2a7a7c cmd/compile: remove gc programs from stack frame objects
This is a two-pronged approach. First, try to keep large objects
off the stack frame. Second, if they do manage to appear anyway,
use straight bitmasks instead of gc programs.

Generally probably a good idea to keep large objects out of stack frames.
But particularly keeping gc programs off the stack simplifies
runtime code a bit.

This CL sets the limit of most stack objects to 131072 bytes (on 64-bit archs).
There can still be large objects if allocated by a late pass, like order, or
they are required to be on the stack, like function arguments.
But the size for the bitmasks for these objects isn't a huge deal,
as we have already have (probably several) bitmasks for the frame
liveness map itself.

Change-Id: I6d2bed0e9aa9ac7499955562c6154f9264061359
Reviewed-on: https://go-review.googlesource.com/c/go/+/542815
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-11-18 18:43:25 +00:00
Jorropo e30ce3c498 sync/atomic: make intrinsics noescape except 64bits op on 32bits arch and unsafe.Pointer
Fixes #16241

I made 64 bits op on 32 bits arches still leak since it was kinda promised.

The promised leaks were wider than this but I don't belive it's effect can
be observed in an breaking maner without using unsafe the way it's currently
setup.

Change-Id: I66d8df47bfe49bce3efa64ac668a2a55f70733a3
Reviewed-on: https://go-review.googlesource.com/c/go/+/462298
Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2024-11-15 17:11:46 +00:00
Robert Griesemer 2eac154b1c cmd/compile: better error message when offending/missing token is a keyword
Prefix keywords (type, default, case, etc.) with "keyword" in error
messages to make them less ambiguous.

Fixes #68589.

Change-Id: I1eb92d1382f621b934167b3a4c335045da26be9f
Reviewed-on: https://go-review.googlesource.com/c/go/+/623819
Auto-Submit: Robert Griesemer <gri@google.com>
Reviewed-by: Robert Griesemer <gri@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Tim King <taking@google.com>
2024-11-14 02:14:13 +00:00
Xiaolin Zhao ab55465098 cmd/compile: wire up math/bits.TrailingZeros intrinsics for loong64
Micro-benchmark results on Loongson 3A5000 and 3A6000:

goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A6000 @ 2500.00MHz
                |  bench.old   |              bench.new               |
                |    sec/op    |    sec/op     vs base                |
TrailingZeros     1.7240n ± 0%   0.8120n ± 0%  -52.90% (p=0.000 n=20)
TrailingZeros8    1.0530n ± 0%   0.8015n ± 0%  -23.88% (p=0.000 n=20)
TrailingZeros16    2.072n ± 0%    1.015n ± 0%  -51.01% (p=0.000 n=20)
TrailingZeros32   1.7160n ± 0%   0.8122n ± 0%  -52.67% (p=0.000 n=20)
TrailingZeros64   2.0060n ± 0%   0.8125n ± 0%  -59.50% (p=0.000 n=20)
geomean            1.669n        0.8470n       -49.25%

goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A5000 @ 2500.00MHz
                |  bench.old   |              bench.new               |
                |    sec/op    |    sec/op     vs base                |
TrailingZeros     2.6275n ± 0%   0.9120n ± 0%  -65.29% (p=0.000 n=20)
TrailingZeros8     1.451n ± 0%    1.163n ± 0%  -19.85% (p=0.000 n=20)
TrailingZeros16    3.069n ± 0%    1.201n ± 0%  -60.87% (p=0.000 n=20)
TrailingZeros32   2.9060n ± 0%   0.9115n ± 0%  -68.63% (p=0.000 n=20)
TrailingZeros64   2.6305n ± 0%   0.9115n ± 0%  -65.35% (p=0.000 n=20)
geomean            2.456n         1.011n       -58.83%

This patch is a copy of CL 479498.
Co-authored-by: WANG Xuerui <git@xen0n.name>

Change-Id: I1a5b2114a844dc0d02c8e68f41ce2443ac3b5fda
Reviewed-on: https://go-review.googlesource.com/c/go/+/624356
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
2024-11-13 00:57:25 +00:00
Youlin Feng 1f8fa4941f runtime: fix iterator returns map entries after clear (pre-swissmap)
Fixes #70189
Fixes #59411

Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest-noswissmap
Change-Id: I4ef7ecd7e996330189309cb2a658cf34bf9e1119
Reviewed-on: https://go-review.googlesource.com/c/go/+/625275
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-11-12 21:08:08 +00:00
Paul E. Murphy 745ec75719 cmd/compile/internal/ssa: improve carry addition rules on PPC64
Fold constant int16 addends for usages of math/bits.Add64(x,const,0)
on PPC64. This usage shows up in a few crypto implementations;
notably the go wrapper for CL 626176.

Change-Id: I6963163330487d04e0479b4fdac235f97bb96889
Reviewed-on: https://go-review.googlesource.com/c/go/+/625899
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Emmanuel Odeke <emmanuel@orijtech.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2024-11-12 17:40:44 +00:00
Guoqi Chen fb9b946adc cmd/compile: optimize math/bits.OnesCount{16,32,64} implementation on loong64
Use Loong64's LSX instruction VPCNT to implement math/bits.OnesCount{16,32,64}
and make it intrinsic.

Benchmark results on loongson 3A5000 and 3A6000 machines:

goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A5000-HV @ 2500.00MHz
            |   bench.old   |   bench.new                          |
            |    sec/op     |    sec/op       vs base               |
OnesCount      4.413n ± 0%     1.401n ± 0%   -68.25% (p=0.000 n=10)
OnesCount8     1.364n ± 0%     1.363n ± 0%         ~ (p=0.130 n=10)
OnesCount16    2.112n ± 0%     1.534n ± 0%   -27.37% (p=0.000 n=10)
OnesCount32    4.533n ± 0%     1.529n ± 0%   -66.27% (p=0.000 n=10)
OnesCount64    4.565n ± 0%     1.531n ± 1%   -66.46% (p=0.000 n=10)
geomean        3.048n          1.470n        -51.78%

goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A6000 @ 2500.00MHz
            |   bench.old   |   bench.new                          |
            |    sec/op     |    sec/op       vs base              |
OnesCount       3.553n ± 0%     1.201n ± 0%  -66.20% (p=0.000 n=10)
OnesCount8     0.8021n ± 0%    0.8004n ± 0%   -0.21% (p=0.000 n=10)
OnesCount16     1.216n ± 0%     1.000n ± 0%  -17.76% (p=0.000 n=10)
OnesCount32     3.006n ± 0%     1.035n ± 0%  -65.57% (p=0.000 n=10)
OnesCount64     3.503n ± 0%     1.035n ± 0%  -70.45% (p=0.000 n=10)
geomean         2.053n          1.006n       -51.01%

Change-Id: I07a5b8da2bb48711b896387ec7625145804affc8
Reviewed-on: https://go-review.googlesource.com/c/go/+/620978
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-11-12 00:48:04 +00:00
sunnymilk fe2da30cb5 cmd/compile: keep variables alive in testing.B.Loop loops
For the loop body guarded by testing.B.Loop, we disable function inlining and devirtualization inside. The only legal form to be matched is `for b.Loop() {...}`.

For #61515

Change-Id: I2e226f08cb4614667cbded498a7821dffe3f72d8
Reviewed-on: https://go-review.googlesource.com/c/go/+/612043
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Bypass: Junyang Shao <shaojunyang@google.com>
Commit-Queue: Junyang Shao <shaojunyang@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-11-11 21:52:00 +00:00
Cherry Mui 5a9aeef9d5 cmd/compile: allow more types for wasmimport/wasmexport parameters and results
As proposed on #66984, this CL allows more types to be used as
wasmimport/wasmexport function parameters and results.
Specifically, bool, string, and uintptr are now allowed, and also
pointer types that point to allowed element types. Allowed element
types includes sized integer and floating point types (including
small integer types like uint8 which are not directly allowed as
a parameter type), bool, array whose element type is allowed, and
struct whose fields are allowed element type and also include a
struct.HostLayout field.

For #66984.

Change-Id: Ie5452a1eda21c089780dfb4d4246de6008655c84
Reviewed-on: https://go-review.googlesource.com/c/go/+/626615
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-11-11 17:11:50 +00:00
Xiaolin Zhao 583d750fa1 cmd/compile: wire up bits.Reverse intrinsics for loong64
Micro-benchmark results on Loongson 3A5000 and 3A6000:

goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A6000 @ 2500.00MHz
          |  CL 624576   |               this CL                |
          |    sec/op    |    sec/op     vs base                |
Reverse     2.8130n ± 0%   0.8008n ± 0%  -71.53% (p=0.000 n=20)
Reverse8    0.7014n ± 0%   0.4040n ± 0%  -42.40% (p=0.000 n=20)
Reverse16   1.2975n ± 0%   0.6632n ± 1%  -48.89% (p=0.000 n=20)
Reverse32   2.7520n ± 0%   0.4042n ± 0%  -85.31% (p=0.000 n=20)
Reverse64   2.8970n ± 0%   0.4041n ± 0%  -86.05% (p=0.000 n=20)
geomean      1.828n        0.5116n       -72.01%

goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A5000 @ 2500.00MHz
          |  CL 624576   |               this CL                |
          |    sec/op    |    sec/op     vs base                |
Reverse     4.0050n ± 0%   0.8011n ± 0%  -80.00% (p=0.000 n=20)
Reverse8    0.8010n ± 0%   0.5210n ± 1%  -34.96% (p=0.000 n=20)
Reverse16   1.6160n ± 0%   0.6008n ± 0%  -62.82% (p=0.000 n=20)
Reverse32   3.8550n ± 0%   0.5179n ± 0%  -86.57% (p=0.000 n=20)
Reverse64   3.8050n ± 0%   0.5177n ± 0%  -86.40% (p=0.000 n=20)
geomean      2.378n        0.5828n       -75.49%

Updates #59120

This patch is a copy of CL 483656.
Co-authored-by: WANG Xuerui <git@xen0n.name>

Change-Id: I98681091763279279c8404bd0295785f13ea1c8e
Reviewed-on: https://go-review.googlesource.com/c/go/+/624276
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
2024-11-11 00:08:45 +00:00
Xiaolin Zhao e6cc9d228a cmd/compile: implement FMA codegen for loong64
Benchmark results on Loongson 3A5000 and 3A6000:

goos: linux
goarch: loong64
pkg: math
cpu: Loongson-3A6000 @ 2500.00MHz
    |  bench.old   |              bench.new              |
    |    sec/op    |   sec/op     vs base                |
FMA   25.930n ± 0%   2.002n ± 0%  -92.28% (p=0.000 n=10)

goos: linux
goarch: loong64
pkg: math
cpu: Loongson-3A5000 @ 2500.00MHz
    |  bench.old   |              bench.new              |
    |    sec/op    |   sec/op     vs base                |
FMA   32.840n ± 0%   2.002n ± 0%  -93.90% (p=0.000 n=10)

Updates #59120

This patch is a copy of CL 483355.
Co-authored-by: WANG Xuerui <git@xen0n.name>

Change-Id: I88b89d23f00864f9173a182a47ee135afec7ed6e
Reviewed-on: https://go-review.googlesource.com/c/go/+/625335
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
2024-11-08 01:05:48 +00:00
Xiaolin Zhao d6fb0ab2c7 cmd/compile: wire up Bswap/ReverseBytes intrinsics for loong64
Micro-benchmark results on Loongson 3A5000 and 3A6000:

goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A6000 @ 2500.00MHz
               |  bench.old   |              bench.new               |
               |    sec/op    |    sec/op     vs base                |
ReverseBytes     2.0020n ± 0%   0.4040n ± 0%  -79.82% (p=0.000 n=20)
ReverseBytes16   0.8866n ± 1%   0.8007n ± 0%   -9.69% (p=0.000 n=20)
ReverseBytes32   1.2195n ± 0%   0.8007n ± 0%  -34.34% (p=0.000 n=20)
ReverseBytes64   2.0705n ± 0%   0.8008n ± 0%  -61.32% (p=0.000 n=20)
geomean           1.455n        0.6749n       -53.62%

goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A5000 @ 2500.00MHz
               |  bench.old   |              bench.new               |
               |    sec/op    |    sec/op     vs base                |
ReverseBytes     2.8040n ± 0%   0.5205n ± 0%  -81.44% (p=0.000 n=20)
ReverseBytes16   0.7066n ± 0%   0.8011n ± 0%  +13.37% (p=0.000 n=20)
ReverseBytes32   1.5500n ± 0%   0.8010n ± 0%  -48.32% (p=0.000 n=20)
ReverseBytes64   2.7665n ± 0%   0.8010n ± 0%  -71.05% (p=0.000 n=20)
geomean           1.707n        0.7192n       -57.87%

Updates #59120

This patch is a copy of CL 483357.
Co-authored-by: WANG Xuerui <git@xen0n.name>

Change-Id: If355354cd031533df91991fcc3392e5a6c314295
Reviewed-on: https://go-review.googlesource.com/c/go/+/624576
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
2024-11-06 03:12:50 +00:00
Xiaolin Zhao d98c51809d cmd/compile: wire up math/bits.Len intrinsics for loong64
For the SubFromLen64 codegen test case to work as intended, we need
to fold c-(-(x-d)) into x+(c-d).

Still, some instances of LeadingZeros are not optimized into single
CLZ instructions right now (actually, the LeadingZeros micro-benchmarks
are currently still compiled with redundant adds/subs of 64, due to
interference of loop optimizations before lowering), but perf numbers
indicate it's not that bad after all.

Micro-benchmark results on Loongson 3A5000 and 3A6000:

goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A5000 @ 2500.00MHz
               |  bench.old  |              bench.new              |
               |   sec/op    |   sec/op     vs base                |
LeadingZeros     3.660n ± 0%   1.348n ± 0%  -63.17% (p=0.000 n=20)
LeadingZeros8    1.777n ± 0%   1.767n ± 0%   -0.56% (p=0.000 n=20)
LeadingZeros16   2.816n ± 0%   1.770n ± 0%  -37.14% (p=0.000 n=20)
LeadingZeros32   5.293n ± 1%   1.683n ± 0%  -68.21% (p=0.000 n=20)
LeadingZeros64   3.622n ± 0%   1.349n ± 0%  -62.76% (p=0.000 n=20)
geomean          3.229n        1.571n       -51.35%

goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A6000 @ 2500.00MHz
               |  bench.old   |              bench.new               |
               |    sec/op    |    sec/op     vs base                |
LeadingZeros      2.410n ± 0%    1.103n ± 1%  -54.23% (p=0.000 n=20)
LeadingZeros8     1.236n ± 0%    1.501n ± 0%  +21.44% (p=0.000 n=20)
LeadingZeros16    2.106n ± 0%    1.501n ± 0%  -28.73% (p=0.000 n=20)
LeadingZeros32    2.860n ± 0%    1.324n ± 0%  -53.72% (p=0.000 n=20)
LeadingZeros64   2.6135n ± 0%   0.9509n ± 0%  -63.62% (p=0.000 n=20)
geomean           2.159n         1.256n       -41.81%

Updates #59120

This patch is a copy of CL 483356.
Co-authored-by: WANG Xuerui <git@xen0n.name>

Change-Id: Iee81a17f7da06d77a427e73dfcc016f2b15ae556
Reviewed-on: https://go-review.googlesource.com/c/go/+/624575
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
2024-11-06 00:40:40 +00:00
Youlin Feng cb163ff60b cmd/compile: init limit for newly created value in prove pass
Fixes: #70156

Change-Id: I2e5dc2a39a8e54ec5f18c5f9d1644208cffb2e9a
Reviewed-on: https://go-review.googlesource.com/c/go/+/624695
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
2024-11-05 16:55:14 +00:00
Xiaolin Zhao 5f88755f43 cmd/compile: add loong64-specific inlining for runtime.memmove
goos: linux
goarch: loong64
pkg: runtime
cpu: Loongson-3A6000 @ 2500.00MHz
                                 |   bench.old   |               bench.new                |
                                 |    sec/op     |    sec/op     vs base                  |
Memmove/0                          0.8004n ±  0%   0.4002n ± 0%  -50.00% (p=0.000 n=20)
Memmove/1                           2.494n ±  0%    2.136n ± 0%  -14.35% (p=0.000 n=20)
Memmove/2                           2.802n ±  0%    2.512n ± 0%  -10.35% (p=0.000 n=20)
Memmove/3                           2.802n ±  0%    2.497n ± 0%  -10.92% (p=0.000 n=20)
Memmove/4                           3.202n ±  0%    2.808n ± 0%  -12.30% (p=0.000 n=20)
Memmove/5                           2.821n ±  0%    2.658n ± 0%   -5.76% (p=0.000 n=20)
Memmove/6                           2.819n ±  0%    2.657n ± 0%   -5.73% (p=0.000 n=20)
Memmove/7                           2.820n ±  0%    2.654n ± 0%   -5.87% (p=0.000 n=20)
Memmove/8                           3.202n ±  0%    2.814n ± 0%  -12.12% (p=0.000 n=20)
Memmove/9                           3.202n ±  0%    3.009n ± 0%   -6.03% (p=0.000 n=20)
Memmove/10                          3.202n ±  0%    3.009n ± 0%   -6.03% (p=0.000 n=20)
Memmove/11                          3.202n ±  0%    3.009n ± 0%   -6.03% (p=0.000 n=20)
Memmove/12                          3.202n ±  0%    3.010n ± 0%   -6.01% (p=0.000 n=20)
Memmove/13                          3.202n ±  0%    3.009n ± 0%   -6.03% (p=0.000 n=20)
Memmove/14                          3.202n ±  0%    3.009n ± 0%   -6.03% (p=0.000 n=20)
Memmove/15                          3.202n ±  0%    3.010n ± 0%   -6.01% (p=0.000 n=20)
Memmove/16                          3.202n ±  0%    3.009n ± 0%   -6.03% (p=0.000 n=20)
Memmove/32                          3.602n ±  0%    3.603n ± 0%   +0.03% (p=0.000 n=20)
Memmove/64                          4.202n ±  0%    4.204n ± 0%   +0.05% (p=0.000 n=20)
Memmove/128                         8.005n ±  0%    8.007n ± 0%   +0.02% (p=0.000 n=20)
Memmove/256                         11.21n ±  0%    10.81n ± 0%   -3.57% (p=0.000 n=20)
Memmove/512                         17.65n ±  0%    17.96n ± 0%   +1.73% (p=0.000 n=20)
Memmove/1024                        30.48n ±  0%    30.46n ± 0%   -0.07% (p=0.000 n=20)
Memmove/2048                        56.43n ±  0%    56.30n ± 0%   -0.24% (p=0.000 n=20)
Memmove/4096                        107.7n ±  0%    107.6n ± 0%   -0.09% (p=0.000 n=20)
MemmoveOverlap/32                   4.002n ±  0%    4.003n ± 0%   +0.02% (p=0.002 n=20)
MemmoveOverlap/64                   4.603n ±  0%    4.603n ± 0%        ~ (p=0.286 n=20)
MemmoveOverlap/128                  8.704n ±  0%    8.699n ± 0%        ~ (p=0.180 n=20)
MemmoveOverlap/256                  12.01n ±  0%    11.76n ± 0%   -2.08% (p=0.000 n=20)
MemmoveOverlap/512                  18.42n ±  0%    18.36n ± 0%   -0.33% (p=0.000 n=20)
MemmoveOverlap/1024                 31.23n ±  0%    31.16n ± 0%   -0.21% (p=0.000 n=20)
MemmoveOverlap/2048                 57.42n ±  0%    56.82n ± 0%   -1.04% (p=0.000 n=20)
MemmoveOverlap/4096                 108.5n ±  0%    108.0n ± 0%   -0.46% (p=0.000 n=20)
MemmoveUnalignedDst/0               2.804n ±  0%    2.447n ± 0%  -12.70% (p=0.000 n=20)
MemmoveUnalignedDst/1               2.802n ±  0%    2.491n ± 0%  -11.12% (p=0.000 n=20)
MemmoveUnalignedDst/2               3.202n ±  0%    2.808n ± 0%  -12.29% (p=0.000 n=20)
MemmoveUnalignedDst/3               3.202n ±  0%    2.814n ± 0%  -12.12% (p=0.000 n=20)
MemmoveUnalignedDst/4               3.602n ±  0%    3.202n ± 0%  -11.10% (p=0.000 n=20)
MemmoveUnalignedDst/5               3.202n ±  0%    3.203n ± 0%   +0.03% (p=0.014 n=20)
MemmoveUnalignedDst/6               3.202n ±  0%    3.202n ± 0%        ~ (p=1.000 n=20)
MemmoveUnalignedDst/7               3.202n ±  0%    3.202n ± 0%        ~ (p=1.000 n=20)
MemmoveUnalignedDst/8               3.602n ±  0%    3.202n ± 0%  -11.10% (p=0.000 n=20)
MemmoveUnalignedDst/9               3.602n ±  0%    3.602n ± 0%        ~ (p=1.000 n=20)
MemmoveUnalignedDst/10              3.602n ±  0%    3.602n ± 0%        ~ (p=0.091 n=20)
MemmoveUnalignedDst/11              3.602n ±  0%    3.602n ± 0%        ~ (p=0.613 n=20)
MemmoveUnalignedDst/12              3.602n ±  0%    3.602n ± 0%        ~ (p=0.165 n=20)
MemmoveUnalignedDst/13              3.602n ±  0%    3.602n ± 0%        ~ (p=1.000 n=20)
MemmoveUnalignedDst/14              3.602n ±  0%    3.602n ± 0%        ~ (p=1.000 n=20)
MemmoveUnalignedDst/15              3.602n ±  0%    3.602n ± 0%    0.00% (p=0.027 n=20)
MemmoveUnalignedDst/16              3.602n ±  0%    3.602n ± 0%        ~ (p=0.661 n=20)
MemmoveUnalignedDst/32              4.002n ±  0%    4.002n ± 0%        ~ (p=1.000 n=20)
MemmoveUnalignedDst/64              6.804n ±  0%    6.804n ± 0%        ~ (p=0.204 n=20)
MemmoveUnalignedDst/128             12.61n ±  0%    12.61n ± 0%        ~ (p=1.000 n=20) ¹
MemmoveUnalignedDst/256             16.33n ±  2%    16.32n ± 2%        ~ (p=0.839 n=20)
MemmoveUnalignedDst/512             25.61n ±  0%    24.71n ± 0%   -3.51% (p=0.000 n=20)
MemmoveUnalignedDst/1024            42.81n ±  0%    42.82n ± 0%        ~ (p=0.973 n=20)
MemmoveUnalignedDst/2048            74.86n ±  0%    76.03n ± 0%   +1.56% (p=0.000 n=20)
MemmoveUnalignedDst/4096            152.0n ± 11%    152.0n ± 0%    0.00% (p=0.013 n=20)
MemmoveUnalignedDstOverlap/32       5.319n ±  0%    5.558n ± 1%   +4.50% (p=0.000 n=20)
MemmoveUnalignedDstOverlap/64       8.006n ±  0%    8.025n ± 0%   +0.24% (p=0.000 n=20)
MemmoveUnalignedDstOverlap/128      9.631n ±  0%    9.601n ± 0%   -0.31% (p=0.000 n=20)
MemmoveUnalignedDstOverlap/256      13.79n ±  2%    13.58n ± 1%        ~ (p=0.234 n=20)
MemmoveUnalignedDstOverlap/512      21.38n ±  0%    21.30n ± 0%   -0.37% (p=0.000 n=20)
MemmoveUnalignedDstOverlap/1024     41.71n ±  0%    41.70n ± 0%        ~ (p=0.887 n=20)
MemmoveUnalignedDstOverlap/2048     81.63n ±  0%    81.61n ± 0%        ~ (p=0.481 n=20)
MemmoveUnalignedDstOverlap/4096     162.6n ±  0%    162.6n ± 0%        ~ (p=0.171 n=20)
MemmoveUnalignedSrc/0               2.808n ±  0%    2.482n ± 0%  -11.61% (p=0.000 n=20)
MemmoveUnalignedSrc/1               2.804n ±  0%    2.577n ± 0%   -8.08% (p=0.000 n=20)
MemmoveUnalignedSrc/2               3.202n ±  0%    2.806n ± 0%  -12.37% (p=0.000 n=20)
MemmoveUnalignedSrc/3               3.202n ±  0%    2.808n ± 0%  -12.30% (p=0.000 n=20)
MemmoveUnalignedSrc/4               3.602n ±  0%    3.202n ± 0%  -11.10% (p=0.000 n=20)
MemmoveUnalignedSrc/5               3.202n ±  0%    3.202n ± 0%        ~ (p=1.000 n=20)
MemmoveUnalignedSrc/6               3.202n ±  0%    3.202n ± 0%        ~ (p=1.000 n=20)
MemmoveUnalignedSrc/7               3.202n ±  0%    3.202n ± 0%        ~ (p=1.000 n=20)
MemmoveUnalignedSrc/8               3.602n ±  0%    3.202n ± 0%  -11.10% (p=0.000 n=20)
MemmoveUnalignedSrc/9               3.602n ±  0%    3.602n ± 0%        ~ (p=1.000 n=20)
MemmoveUnalignedSrc/10              3.602n ±  0%    3.602n ± 0%        ~ (p=1.000 n=20)
MemmoveUnalignedSrc/11              3.602n ±  0%    3.602n ± 0%        ~ (p=0.746 n=20)
MemmoveUnalignedSrc/12              3.602n ±  0%    3.602n ± 0%        ~ (p=0.407 n=20)
MemmoveUnalignedSrc/13              3.603n ±  0%    3.602n ± 0%   -0.03% (p=0.001 n=20)
MemmoveUnalignedSrc/14              3.603n ±  0%    3.602n ± 0%   -0.01% (p=0.013 n=20)
MemmoveUnalignedSrc/15              3.602n ±  0%    3.602n ± 0%        ~ (p=1.000 n=20)
MemmoveUnalignedSrc/16              3.602n ±  0%    3.602n ± 0%        ~ (p=1.000 n=20)
MemmoveUnalignedSrc/32              4.002n ±  0%    4.002n ± 0%        ~ (p=1.000 n=20)
MemmoveUnalignedSrc/64              4.803n ±  0%    4.803n ± 0%    0.00% (p=0.008 n=20)
MemmoveUnalignedSrc/128             8.405n ±  0%    8.405n ± 0%    0.00% (p=0.003 n=20)
MemmoveUnalignedSrc/256             12.04n ±  3%    12.20n ± 2%        ~ (p=0.151 n=20)
MemmoveUnalignedSrc/512             19.11n ±  0%    19.10n ± 3%        ~ (p=0.621 n=20)
MemmoveUnalignedSrc/1024            35.62n ±  0%    35.62n ± 0%        ~ (p=0.407 n=20)
MemmoveUnalignedSrc/2048            68.04n ±  0%    68.35n ± 0%   +0.46% (p=0.000 n=20)
MemmoveUnalignedSrc/4096            133.2n ±  1%    133.3n ± 0%        ~ (p=0.131 n=20)
MemmoveUnalignedSrcDst/f_16_0       4.202n ±  0%    4.202n ± 0%        ~ (p=1.000 n=20)
MemmoveUnalignedSrcDst/b_16_0       4.202n ±  0%    4.202n ± 0%        ~ (p=1.000 n=20)
MemmoveUnalignedSrcDst/f_16_1       4.202n ±  0%    4.202n ± 0%        ~ (p=1.000 n=20)
MemmoveUnalignedSrcDst/b_16_1       4.202n ±  0%    4.202n ± 0%        ~ (p=1.000 n=20)
MemmoveUnalignedSrcDst/f_16_4       4.202n ±  0%    4.202n ± 0%        ~ (p=1.000 n=20)
MemmoveUnalignedSrcDst/b_16_4       4.202n ±  0%    4.202n ± 0%        ~ (p=0.661 n=20)
MemmoveUnalignedSrcDst/f_16_7       4.202n ±  0%    4.202n ± 0%        ~ (p=1.000 n=20)
MemmoveUnalignedSrcDst/b_16_7       4.203n ±  0%    4.202n ± 0%   -0.02% (p=0.008 n=20)
MemmoveUnalignedSrcDst/f_64_0       6.103n ±  0%    6.100n ± 0%        ~ (p=0.595 n=20)
MemmoveUnalignedSrcDst/b_64_0       6.103n ±  0%    6.102n ± 0%        ~ (p=0.973 n=20)
MemmoveUnalignedSrcDst/f_64_1       7.419n ±  0%    7.226n ± 0%   -2.59% (p=0.000 n=20)
MemmoveUnalignedSrcDst/b_64_1       6.745n ±  0%    6.941n ± 0%   +2.89% (p=0.000 n=20)
MemmoveUnalignedSrcDst/f_64_4       7.420n ±  0%    7.223n ± 0%   -2.65% (p=0.000 n=20)
MemmoveUnalignedSrcDst/b_64_4       6.753n ±  0%    6.941n ± 0%   +2.79% (p=0.000 n=20)
MemmoveUnalignedSrcDst/f_64_7       7.423n ±  0%    7.204n ± 0%   -2.96% (p=0.000 n=20)
MemmoveUnalignedSrcDst/b_64_7       6.750n ±  0%    6.941n ± 0%   +2.83% (p=0.000 n=20)
MemmoveUnalignedSrcDst/f_256_0      12.96n ±  0%    12.99n ± 0%   +0.27% (p=0.000 n=20)
MemmoveUnalignedSrcDst/b_256_0      12.91n ±  0%    12.94n ± 0%   +0.23% (p=0.000 n=20)
MemmoveUnalignedSrcDst/f_256_1      17.21n ±  0%    17.21n ± 0%        ~ (p=1.000 n=20)
MemmoveUnalignedSrcDst/b_256_1      17.61n ±  0%    17.61n ± 0%        ~ (p=1.000 n=20)
MemmoveUnalignedSrcDst/f_256_4      16.21n ±  0%    16.21n ± 0%        ~ (p=1.000 n=20)
MemmoveUnalignedSrcDst/b_256_4      16.41n ±  0%    16.41n ± 0%        ~ (p=1.000 n=20)
MemmoveUnalignedSrcDst/f_256_7      14.12n ±  0%    14.10n ± 0%        ~ (p=0.307 n=20)
MemmoveUnalignedSrcDst/b_256_7      14.81n ±  0%    14.81n ± 0%        ~ (p=1.000 n=20) ¹
MemmoveUnalignedSrcDst/f_4096_0     109.3n ±  0%    109.4n ± 0%   +0.09% (p=0.004 n=20)
MemmoveUnalignedSrcDst/b_4096_0     109.6n ±  0%    109.6n ± 0%        ~ (p=1.000 n=20)
MemmoveUnalignedSrcDst/f_4096_1     113.5n ±  0%    113.5n ± 0%        ~ (p=1.000 n=20)
MemmoveUnalignedSrcDst/b_4096_1     113.7n ±  0%    113.7n ± 0%        ~ (p=1.000 n=20) ¹
MemmoveUnalignedSrcDst/f_4096_4     112.3n ±  0%    112.3n ± 0%        ~ (p=0.763 n=20)
MemmoveUnalignedSrcDst/b_4096_4     112.6n ±  0%    112.9n ± 1%   +0.31% (p=0.032 n=20)
MemmoveUnalignedSrcDst/f_4096_7     110.6n ±  0%    110.6n ± 0%        ~ (p=1.000 n=20) ¹
MemmoveUnalignedSrcDst/b_4096_7     111.1n ±  0%    111.1n ± 0%        ~ (p=1.000 n=20) ¹
MemmoveUnalignedSrcDst/f_65536_0    4.801µ ±  0%    4.818µ ± 0%   +0.34% (p=0.000 n=20)
MemmoveUnalignedSrcDst/b_65536_0    5.027µ ±  0%    5.036µ ± 0%   +0.19% (p=0.007 n=20)
MemmoveUnalignedSrcDst/f_65536_1    4.815µ ±  0%    4.729µ ± 0%   -1.78% (p=0.000 n=20)
MemmoveUnalignedSrcDst/b_65536_1    4.659µ ±  0%    4.737µ ± 1%   +1.69% (p=0.000 n=20)
MemmoveUnalignedSrcDst/f_65536_4    4.807µ ±  0%    4.721µ ± 0%   -1.78% (p=0.000 n=20)
MemmoveUnalignedSrcDst/b_65536_4    4.659µ ±  0%    4.601µ ± 0%   -1.23% (p=0.000 n=20)
MemmoveUnalignedSrcDst/f_65536_7    4.868µ ±  0%    4.759µ ± 0%   -2.23% (p=0.000 n=20)
MemmoveUnalignedSrcDst/b_65536_7    4.665µ ±  0%    4.709µ ± 0%   +0.93% (p=0.000 n=20)
MemmoveUnalignedSrcOverlap/32       6.804n ±  0%    6.810n ± 0%   +0.09% (p=0.000 n=20)
MemmoveUnalignedSrcOverlap/64       10.41n ±  0%    10.42n ± 0%   +0.10% (p=0.000 n=20)
MemmoveUnalignedSrcOverlap/128      11.59n ±  0%    11.58n ± 0%        ~ (p=0.414 n=20)
MemmoveUnalignedSrcOverlap/256      14.22n ±  0%    14.29n ± 0%   +0.46% (p=0.000 n=20)
MemmoveUnalignedSrcOverlap/512      23.11n ±  0%    23.04n ± 0%   -0.28% (p=0.001 n=20)
MemmoveUnalignedSrcOverlap/1024     41.44n ±  0%    41.47n ± 0%        ~ (p=0.693 n=20)
MemmoveUnalignedSrcOverlap/2048     81.25n ±  0%    81.25n ± 0%        ~ (p=0.405 n=20)
MemmoveUnalignedSrcOverlap/4096     166.1n ±  0%    166.1n ± 0%        ~ (p=0.451 n=20)
geomean                             13.02n          12.69n        -2.51%
¹ all samples are equal

Change-Id: I712adc7670f6ae360714ec5a770d00d76c8700ed
Reviewed-on: https://go-review.googlesource.com/c/go/+/618815
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
2024-11-05 00:44:11 +00:00
Cuong Manh Le 324f41b748 cmd/compile: fix inlining name mangling for blank label
Fixes #70175

Change-Id: I13767d951455854b03ad6707ff9292cfe9097ee9
Reviewed-on: https://go-review.googlesource.com/c/go/+/624377
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
2024-11-04 16:46:35 +00:00
Michael Pratt 63ba2b9d84 cmd/compile,internal/runtime/maps: stack allocated maps and small alloc
The compiler will stack allocate the Map struct and initial group if
possible.

Stack maps are initialized inline without calling into the runtime.
Small heap allocated maps use makemap_small.

These are the same heuristics as existing maps.

For #54766.

Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest-swissmap
Change-Id: I6c371d1309716fd1c38a3212d417b3c76db5c9b9
Reviewed-on: https://go-review.googlesource.com/c/go/+/622042
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
2024-10-30 15:43:54 +00:00
Michael Pratt f782e16162 runtime,internal/runtime/maps: specialized swissmaps
Add all the specialized variants that exist for the existing maps.

Like the existing maps, the fast variants do not support indirect
key/elem.

Note that as of this CL, the Get and Put methods on Map/table are
effectively dead. They are only reachable from the internal/runtime/maps
unit tests.

For #54766.

Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest-swissmap
Change-Id: I95297750be6200f34ec483e4cfc897f048c26db7
Reviewed-on: https://go-review.googlesource.com/c/go/+/616463
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Keith Randall <khr@google.com>
2024-10-30 15:14:31 +00:00
Keith Randall 4dcbb00be2 cmd/compile: teach prove about min/max phi operations
If there is a phi that is computing the minimum of its two inputs,
then we know the result of the phi is smaller than or equal to both
of its inputs. Similarly for maxiumum (although max seems less useful).

This pattern happens for the case

  n := copy(a, b)

n is the minimum of len(a) and len(b), so with this optimization we
know both n <= len(a) and n <= len(b). That extra information is
helpful for subsequent slicing of a or b.

Fixes #16833

Change-Id: Ib4238fd1edae0f2940f62a5516a6b363bbe7928c
Reviewed-on: https://go-review.googlesource.com/c/go/+/622240
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
2024-10-29 16:46:48 +00:00
Xiaolin Zhao aef81a7551 cmd/compile: add rules to optimize go codes to constant 0 on loong64
goos: linux
goarch: loong64
pkg: test/bench/go1
cpu: Loongson-3A6000 @ 2500.00MHz
                      │  old.bench  │             new.bench              │
                      │   sec/op    │   sec/op     vs base               │
BinaryTree17             7.735 ± 1%    7.716 ± 1%  -0.23% (p=0.041 n=15)
Fannkuch11               2.645 ± 0%    2.646 ± 0%  +0.05% (p=0.013 n=15)
FmtFprintfEmpty         35.87n ± 0%   35.89n ± 0%  +0.06% (p=0.000 n=15)
FmtFprintfString        59.54n ± 0%   59.47n ± 0%       ~ (p=0.213 n=15)
FmtFprintfInt           62.23n ± 0%   62.06n ± 0%       ~ (p=0.212 n=15)
FmtFprintfIntInt        98.16n ± 0%   97.90n ± 0%  -0.26% (p=0.000 n=15)
FmtFprintfPrefixedInt   117.0n ± 0%   116.7n ± 0%  -0.26% (p=0.000 n=15)
FmtFprintfFloat         204.6n ± 0%   204.2n ± 0%  -0.20% (p=0.000 n=15)
FmtManyArgs             456.3n ± 0%   455.4n ± 0%  -0.20% (p=0.000 n=15)
GobDecode               7.210m ± 0%   7.156m ± 1%  -0.75% (p=0.000 n=15)
GobEncode               8.143m ± 1%   8.177m ± 1%       ~ (p=0.806 n=15)
Gzip                    280.2m ± 0%   279.7m ± 0%  -0.19% (p=0.005 n=15)
Gunzip                  32.71m ± 0%   32.65m ± 0%  -0.19% (p=0.000 n=15)
HTTPClientServer        53.76µ ± 0%   53.65µ ± 0%       ~ (p=0.083 n=15)
JSONEncode              9.297m ± 0%   9.295m ± 0%       ~ (p=0.806 n=15)
JSONDecode              46.97m ± 1%   47.07m ± 1%       ~ (p=0.683 n=15)
Mandelbrot200           4.602m ± 0%   4.600m ± 0%  -0.05% (p=0.001 n=15)
GoParse                 4.682m ± 0%   4.670m ± 1%  -0.25% (p=0.001 n=15)
RegexpMatchEasy0_32     59.80n ± 0%   59.63n ± 0%  -0.28% (p=0.000 n=15)
RegexpMatchEasy0_1K     458.3n ± 0%   457.3n ± 0%  -0.22% (p=0.001 n=15)
RegexpMatchEasy1_32     59.39n ± 0%   59.23n ± 0%  -0.27% (p=0.000 n=15)
RegexpMatchEasy1_1K     557.9n ± 0%   556.6n ± 0%  -0.23% (p=0.001 n=15)
RegexpMatchMedium_32    803.6n ± 0%   801.8n ± 0%  -0.22% (p=0.001 n=15)
RegexpMatchMedium_1K    27.32µ ± 0%   27.26µ ± 0%  -0.21% (p=0.000 n=15)
RegexpMatchHard_32      1.385µ ± 0%   1.382µ ± 0%  -0.22% (p=0.000 n=15)
RegexpMatchHard_1K      40.93µ ± 0%   40.83µ ± 0%  -0.24% (p=0.000 n=15)
Revcomp                 474.8m ± 0%   474.3m ± 0%       ~ (p=0.250 n=15)
Template                77.41m ± 1%   76.63m ± 1%  -1.01% (p=0.023 n=15)
TimeParse               271.1n ± 0%   271.2n ± 0%  +0.04% (p=0.022 n=15)
TimeFormat              290.0n ± 0%   289.8n ± 0%       ~ (p=0.118 n=15)
geomean                 51.73µ        51.64µ       -0.18%

Change-Id: I45a1e6c85bb3cea0f62766ec932432803e9af10a
Reviewed-on: https://go-review.googlesource.com/c/go/+/619315
Reviewed-by: Qiqi Huang <huangqiqi@loongson.cn>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
2024-10-29 01:17:54 +00:00
Cherry Mui bbdc65bb38 test: add a test for wasm memory usage
Test that a small Wasm program uses 8 MB of linear memory. This
reflects the current allocator. We test an exact value, but if the
allocator changes, we can update or relax this.

Updates #69018.

Change-Id: Ifc0bb420af008bd30cde4745b3efde3ce091b683
Reviewed-on: https://go-review.googlesource.com/c/go/+/622378
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-10-28 13:27:58 +00:00
Youlin Feng bb07aa644b cmd/compile: add shift optimization test
For #69635

Change-Id: Id5696dc9724c3b3afcd7b60a6994f98c5309eb0e
Reviewed-on: https://go-review.googlesource.com/c/go/+/621755
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
2024-10-25 15:35:29 +00:00
Youlin Feng 711552e98a cmd/compile: optimize type switch for a single runtime known type with a case var
Change-Id: I03ba70076d6dd3c0b9624d14699b7dd91a3c0e9b
Reviewed-on: https://go-review.googlesource.com/c/go/+/618476
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2024-10-25 02:56:11 +00:00
Paul E. Murphy 1846dd5a31 cmd/compile/internal/ssa: fix PPC64 shift codegen regression
CL 621357 introduced new generic lowering rules which caused
several shift related codegen test failures.

Add new rules to fix the test regressions, and cleanup tests
which are changed but not regressed. Some CLRLSLDI tests are
removed as they are no test CLRLSLDI rules.

Fixes #70003

Change-Id: I1ecc5a7e63ab709a4a0cebf11fa078d5cf164034
Reviewed-on: https://go-review.googlesource.com/c/go/+/622236
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-10-24 17:32:18 +00:00
Xiaolin Zhao 91d07ac71c cmd/compile: inline constant sized memclrNoHeapPointers calls on loong64
Tested that on loong64, the optimization effect is negative for
constant size cases greater than 512.
So only enable inlining for constant size cases less than 512.

goos: linux
goarch: loong64
pkg: runtime
cpu: Loongson-3A6000 @ 2500.00MHz
                      |  bench.old   |              bench.new               |
                      |    sec/op    |    sec/op     vs base                |
MemclrKnownSize1        2.4070n ± 0%   0.4004n ± 0%  -83.37% (p=0.000 n=20)
MemclrKnownSize2        2.1365n ± 0%   0.4004n ± 0%  -81.26% (p=0.000 n=20)
MemclrKnownSize4        2.4445n ± 0%   0.4004n ± 0%  -83.62% (p=0.000 n=20)
MemclrKnownSize8        2.4200n ± 0%   0.4004n ± 0%  -83.45% (p=0.000 n=20)
MemclrKnownSize16       2.8030n ± 0%   0.8007n ± 0%  -71.43% (p=0.000 n=20)
MemclrKnownSize32        2.803n ± 0%    1.602n ± 0%  -42.85% (p=0.000 n=20)
MemclrKnownSize64        3.250n ± 0%    2.402n ± 0%  -26.08% (p=0.000 n=20)
MemclrKnownSize112       6.006n ± 0%    2.819n ± 0%  -53.06% (p=0.000 n=20)
MemclrKnownSize128       6.006n ± 0%    3.240n ± 0%  -46.05% (p=0.000 n=20)
MemclrKnownSize192       6.807n ± 0%    5.205n ± 0%  -23.53% (p=0.000 n=20)
MemclrKnownSize248       7.608n ± 0%    6.301n ± 0%  -17.19% (p=0.000 n=20)
MemclrKnownSize256       7.608n ± 0%    6.707n ± 0%  -11.84% (p=0.000 n=20)
MemclrKnownSize512       13.61n ± 0%    13.61n ± 0%        ~ (p=0.374 n=20)
MemclrKnownSize1024      26.43n ± 0%    26.43n ± 0%        ~ (p=0.826 n=20)
MemclrKnownSize4096      103.3n ± 0%    103.3n ± 0%        ~ (p=1.000 n=20)
MemclrKnownSize512KiB    26.29µ ± 0%    26.29µ ± 0%   -0.00% (p=0.012 n=20)
geomean                  10.05n         5.006n       -50.18%

                      |  bench.old   |               bench.new                |
                      |     B/s      |      B/s       vs base                 |
MemclrKnownSize1        396.2Mi ± 0%   2381.9Mi ± 0%  +501.21% (p=0.000 n=20)
MemclrKnownSize2        892.8Mi ± 0%   4764.0Mi ± 0%  +433.59% (p=0.000 n=20)
MemclrKnownSize4        1.524Gi ± 0%    9.305Gi ± 0%  +510.56% (p=0.000 n=20)
MemclrKnownSize8        3.079Gi ± 0%   18.609Gi ± 0%  +504.42% (p=0.000 n=20)
MemclrKnownSize16       5.316Gi ± 0%   18.609Gi ± 0%  +250.05% (p=0.000 n=20)
MemclrKnownSize32       10.63Gi ± 0%    18.61Gi ± 0%   +75.00% (p=0.000 n=20)
MemclrKnownSize64       18.34Gi ± 0%    24.81Gi ± 0%   +35.27% (p=0.000 n=20)
MemclrKnownSize112      17.37Gi ± 0%    37.01Gi ± 0%  +113.08% (p=0.000 n=20)
MemclrKnownSize128      19.85Gi ± 0%    36.80Gi ± 0%   +85.39% (p=0.000 n=20)
MemclrKnownSize192      26.27Gi ± 0%    34.35Gi ± 0%   +30.77% (p=0.000 n=20)
MemclrKnownSize248      30.36Gi ± 0%    36.66Gi ± 0%   +20.75% (p=0.000 n=20)
MemclrKnownSize256      31.34Gi ± 0%    35.55Gi ± 0%   +13.43% (p=0.000 n=20)
MemclrKnownSize512      35.02Gi ± 0%    35.03Gi ± 0%    +0.00% (p=0.030 n=20)
MemclrKnownSize1024     36.09Gi ± 0%    36.09Gi ± 0%         ~ (p=0.101 n=20)
MemclrKnownSize4096     36.93Gi ± 0%    36.93Gi ± 0%    +0.00% (p=0.003 n=20)
MemclrKnownSize512KiB   18.57Gi ± 0%    18.57Gi ± 0%    +0.00% (p=0.041 n=20)
geomean                 10.13Gi         20.33Gi       +100.72%

Change-Id: I460a56f7ccc9f820ca2c1934c1c517b9614809ac
Reviewed-on: https://go-review.googlesource.com/c/go/+/621355
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Michael Pratt <mpratt@google.com>
2024-10-24 08:55:31 +00:00
Robert Griesemer 7d9802ac5e go/types, types2: qualify named types in error messages with type kind
Change the description of an operand x that has a named type of sorts
by providing a description of the type structure (array, struct, slice,
pointer, etc).

For instance, given a (variable) operand x of a struct type T, the
operand is mentioned as (new):

        x (variable of struct type T)

instead of (old):

        x (variable of type T)

This approach is also used when a basic type is renamed, for instance
as in:

        x (value of uint type big.Word)

which makes it clear that big.Word is a uint.

This change is expected to produce more informative error messages.

Fixes #69955.

Change-Id: I544b0698f753a522c3b6e1800a492a94974fbab7
Reviewed-on: https://go-review.googlesource.com/c/go/+/621458
Reviewed-by: Alan Donovan <adonovan@google.com>
Auto-Submit: Robert Griesemer <gri@google.com>
Reviewed-by: Robert Griesemer <gri@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-10-23 04:43:01 +00:00
Max Neverov bdc6dbbc64 go/types: improve recursive type error message
This change improves error message for recursive types.
Currently, compilation of the [following program](https://go.dev/play/p/3ef84ObpzfG):

package main

type T1[T T2] struct{}
type T2[T T1] struct{}

returns an error:

./prog.go:3:6: invalid recursive type T1
	./prog.go:3:6: T1 refers to
	./prog.go:4:6: T2 refers to
	./prog.go:3:6: T1

With the patch applied the error message looks like:

./prog.go:3:6: invalid recursive type T1
	./prog.go:3:6: T1 refers to T2
	./prog.go:4:6: T2 refers to T1

Change-Id: Ic07cdffcffb1483c672b241fede4e694269b5b79
GitHub-Last-Rev: cd042fdc38
GitHub-Pull-Request: golang/go#69574
Reviewed-on: https://go-review.googlesource.com/c/go/+/614084
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Tim King <taking@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-10-22 22:20:29 +00:00
Keith Randall 74163c895a cmd/compile: use STP/LDP around morestack on arm64
The spill/restore code around morestack is almost never exectued, so
we should make it as small as possible. Using 2-register loads/stores
makes sense here. Also, the offsets from SP are pretty small so the
offset almost always fits in the (smaller than a normal load/store)
offset field of the instruction.

Makes cmd/go 0.6% smaller.

Change-Id: I8845283c1b269a259498153924428f6173bda293
Reviewed-on: https://go-review.googlesource.com/c/go/+/621556
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-10-22 16:23:12 +00:00
Michael Pratt 8668d7bbb9 test: split non-regabi stack map test
CL 594596 already did this for regabi, but missed non-regabi.

Stack allocated swiss maps don't call rand32.

For #54766.

Change-Id: I312ea77532ecc6fa860adfea58ea00b01683ca69
Reviewed-on: https://go-review.googlesource.com/c/go/+/621615
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
2024-10-21 20:54:57 +00:00