Unlike C's memmove, Go's memmove must be careful to do indivisible
writes of pointer values because it may be racing with the garbage
collector reading the heap.
We've had various bugs related to this over the years (#36101, #13160,
#12552). Indeed, memmove is a great target for optimization and it's
easy to forget the special requirements of Go's memmove.
The CL documents these (currently unwritten!) requirements. We're also
adding a test that should hopefully keep everyone honest going
forward, though it's hard to be sure we're hitting all cases of
memmove.
Change-Id: I2f59f8d8d6fb42d2f10006b55d605b5efd8ddc24
Reviewed-on: https://go-review.googlesource.com/c/go/+/213418
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Add an "offset_" prefix to all cpu feature variable offset constants to
signify that they are not boolean cpu feature variables.
Remove _ from offset constant names.
Change-Id: I6e22a79ebcbe6e2ae54c4ac8764f9260bb3223ff
Reviewed-on: https://go-review.googlesource.com/131215
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Using internal/cpu variables has the benefit of avoiding false sharing
(as those are padded) and allows memory and cache usage for these variables
to be shared by multiple packages.
Change-Id: I2bf68d03091bf52b466cf689230d5d25d5950037
Reviewed-on: https://go-review.googlesource.com/126599
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
For memmove/memclr using jump tables only reduces overall
function performance for both amd64 and 386.
Benchmarks for 32-bit memclr:
name old time/op new time/op delta
Memclr/5-8 8.01ns ± 0% 8.94ns ± 2% +11.59% (p=0.000 n=9+9)
Memclr/16-8 9.05ns ± 0% 9.49ns ± 0% +4.81% (p=0.000 n=8+8)
Memclr/64-8 9.15ns ± 0% 9.49ns ± 0% +3.76% (p=0.000 n=9+10)
Memclr/256-8 16.6ns ± 0% 16.6ns ± 0% ~ (p=1.140 n=10+9)
Memclr/4096-8 179ns ± 0% 166ns ± 0% -7.26% (p=0.000 n=9+8)
Memclr/65536-8 3.36µs ± 1% 3.31µs ± 1% -1.48% (p=0.000 n=10+9)
Memclr/1M-8 59.5µs ± 3% 60.5µs ± 2% +1.67% (p=0.009 n=10+10)
Memclr/4M-8 239µs ± 3% 245µs ± 0% +2.49% (p=0.004 n=10+8)
Memclr/8M-8 618µs ± 2% 614µs ± 1% ~ (p=0.315 n=10+8)
Memclr/16M-8 1.49ms ± 2% 1.47ms ± 1% -1.11% (p=0.029 n=10+10)
Memclr/64M-8 7.06ms ± 1% 7.05ms ± 0% ~ (p=0.573 n=10+8)
[Geo mean] 3.36µs 3.39µs +1.14%
For less predictable data, like loop iteration dependant sizes,
branch table still shows 2-5% worse results.
It also makes code slightly more complicated.
This CL removes TODO note that directly suggest trying this
optimization out. That encourages people to spend their time
in a quite hopeless endeavour.
The code used to implement branch table used a 32/64-entry table
with pointers to TEXT blocks that implemented every associated
label work. Most last entries point to "loop" code that is
a fallthrough for all other sizes that do not map into specialized
routines. The only inefficiency is extra MOVL/MOVQ required
to fetch table pointer itself as MOVL $sym<>(SB)(AX*4) is not valid
in Go asm (it works in other assemblers):
TEXT ·memclrNew(SB), NOSPLIT, $0-8
MOVL ptr+0(FP), DI
MOVL n+4(FP), BX
// Handle 0 separately.
TESTL BX, BX
JEQ _0
LEAL -1(BX), CX // n-1
BSRL CX, CX
// AX or X0 zeroed inside every text block.
MOVL $memclrTable<>(SB), AX
JMP (AX)(CX*4)
_0:
RET
Change-Id: I4f706931b8127f85a8439b95834d5c2485a5d1bf
Reviewed-on: https://go-review.googlesource.com/115678
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
The function signatures in the comments used a C-like style. Using
Go function signatures is cleaner.
Change-Id: I1a093ed8fe5df59f3697c613cf3fce58bba4f5c1
Reviewed-on: https://go-review.googlesource.com/113876
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Changes all cpu features to be detected and stored in bools in rt0_go.
Updates: #15403
Change-Id: I5a9961cdec789b331d09c44d86beb53833d5dc3e
Reviewed-on: https://go-review.googlesource.com/41950
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
Reviewed-by: Keith Randall <khr@golang.org>
Fixes#16911.
Fix obsolete inferno-os links, since code.google.com shutdown.
This CL points to the right files by replacing
http://code.google.com/p/inferno-os/source/browse
with
https://bitbucket.org/inferno-os/inferno-os/src/default
To implement the change I wrote and ran this script in the root:
$ grep -Rn 'http://code.google.com/p/inferno-os/source/browse' * \
| cut -d":" -f1 | while read F;do perl -pi -e \
's/http:\/\/code.google.com\/p\/inferno-os\/source\/browse/https:\/\/bitbucket.org\/inferno-os\/inferno-os\/src\/default/g'
$F;done
I excluded any cmd/vendor changes from the commit.
Change-Id: Iaaf828ac8f6fc949019fd01832989d00b29b6749
Reviewed-on: https://go-review.googlesource.com/27994
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Only use REP;MOVSB if:
1) The CPUID flag says it is fast, and
2) The pointers are unaligned
Otherwise, use REP;MOVSQ.
Update #14630
Change-Id: I946b28b87880c08e5eed1ce2945016466c89db66
Reviewed-on: https://go-review.googlesource.com/21300
Reviewed-by: Nigel Tao <nigeltao@golang.org>
MOVSB is quite a bit faster for unaligned moves.
Possibly we should use MOVSB all of the time, but Intel folks
say it might be a bit faster to use MOVSQ on some processors
(but not any I have access to at the moment).
benchmark old ns/op new ns/op delta
BenchmarkMemmove4096-8 93.9 93.2 -0.75%
BenchmarkMemmoveUnalignedDst4096-8 256 151 -41.02%
BenchmarkMemmoveUnalignedSrc4096-8 175 90.5 -48.29%
Fixes#14630
Change-Id: I568e6d6590eb3615e6a699fb474020596be665ff
Reviewed-on: https://go-review.googlesource.com/20293
Reviewed-by: Ian Lance Taylor <iant@golang.org>
The tree's pretty inconsistent about single space vs double space
after a period in documentation. Make it consistently a single space,
per earlier decisions. This means contributors won't be confused by
misleading precedence.
This CL doesn't use go/doc to parse. It only addresses // comments.
It was generated with:
$ perl -i -npe 's,^(\s*// .+[a-z]\.) +([A-Z]),$1 $2,' $(git grep -l -E '^\s*//(.+\.) +([A-Z])')
$ go test go/doc -update
Change-Id: Iccdb99c37c797ef1f804a94b22ba5ee4b500c4f7
Reviewed-on: https://go-review.googlesource.com/20022
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Dave Day <djd@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Make sure that we're moving or zeroing pointers atomically.
Anything that is a multiple of pointer size and at least
pointer aligned might have pointers in it. All the code looks
ok except for the 1-pointer-sized moves.
Fixes#13160
Update #12552
Change-Id: Ib97d9b918fa9f4cc5c56c67ed90255b7fdfb7b45
Reviewed-on: https://go-review.googlesource.com/16668
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>