6a and 8a rearrange memmove such that the fallthrough from move_1or2 to move_0 ends up being a JMP to a RET. Insert an explicit RET to prevent such silliness.
Do the same for memclr as prophylaxis.
benchmark old ns/op new ns/op delta
BenchmarkMemmove1 4.59 4.13 -10.02%
BenchmarkMemmove2 4.58 4.13 -9.83%
LGTM=khr
R=golang-codereviews, dvyukov, minux, ruiu, bradfitz, khr
CC=golang-codereviews
https://golang.org/cl/120930043
MOV with SSE registers seems faster than REP MOVSQ if the
size being copied is less than about 2K. Previously we
didn't use MOV if the memory region is larger than 256
byte. This patch improves the performance of 257 ~ 2048
byte non-overlapping copy by using MOV.
Here is the benchmark result on Intel Xeon 3.5GHz (Nehalem).
benchmark old ns/op new ns/op delta
BenchmarkMemmove16 4 4 +0.42%
BenchmarkMemmove32 5 5 -0.20%
BenchmarkMemmove64 6 6 -0.81%
BenchmarkMemmove128 7 7 -0.82%
BenchmarkMemmove256 10 10 +1.92%
BenchmarkMemmove512 29 16 -44.90%
BenchmarkMemmove1024 37 25 -31.55%
BenchmarkMemmove2048 55 44 -19.46%
BenchmarkMemmove4096 92 91 -0.76%
benchmark old MB/s new MB/s speedup
BenchmarkMemmove16 3370.61 3356.88 1.00x
BenchmarkMemmove32 6368.68 6386.99 1.00x
BenchmarkMemmove64 10367.37 10462.62 1.01x
BenchmarkMemmove128 17551.16 17713.48 1.01x
BenchmarkMemmove256 24692.81 24142.99 0.98x
BenchmarkMemmove512 17428.70 31687.72 1.82x
BenchmarkMemmove1024 27401.82 40009.45 1.46x
BenchmarkMemmove2048 36884.86 45766.98 1.24x
BenchmarkMemmove4096 44295.91 44627.86 1.01x
LGTM=khr
R=golang-codereviews, gobot, khr
CC=golang-codereviews
https://golang.org/cl/90500043
On Plan 9, the kernel disallows the use of floating point
instructions while handling a note. Previously, we worked
around this by using a simple loop in place of memmove.
When I added that work-around, I verified that all paths
from the note handler didn't end up calling memmove. Now
that memclr is using SSE instructions, the same process
will have to be done again.
Instead of doing that, however, this CL just punts and
uses unoptimized functions everywhere on Plan 9.
LGTM=rsc
R=rsc, 0intro
CC=golang-codereviews
https://golang.org/cl/73830044
Remove NOPROF/DUPOK from everything.
Edits done with a script, except pclinetest.asm which depended
on the DUPOK flag on main().
R=golang-dev, bradfitz
CC=golang-dev
https://golang.org/cl/12613044
I have not done the system call stubs in sys_*.s.
I hope to avoid that, because those do not block, so those
frames will not appear in stack traces during garbage
collection.
R=golang-dev, dvyukov, khr
CC=golang-dev
https://golang.org/cl/11360043
Collapse the arch,os-specific directories into the main directory
by renaming xxx/foo.c to foo_xxx.c, and so on.
There are no substantial edits here, except to the Makefile.
The assumption is that the Go tool will #define GOOS_darwin
and GOARCH_amd64 and will make any file named something
like signals_darwin.h available as signals_GOOS.h during the
build. This replaces what used to be done with -I$(GOOS).
There is still work to be done to make runtime build with
standard tools, but this is a big step. After this we will have
to write a script to generate all the generated files so they
can be checked in (instead of generated during the build).
R=r, iant, r, lucio.dere
CC=golang-dev
https://golang.org/cl/5490053
* move memmove to arch-specific subdirectories
* add memmove for arm
* add copyright notices marking them as copied from Inferno
R=ken2
https://golang.org/cl/156061