Commit Graph

11 Commits

Author SHA1 Message Date
Josh Bleecher Snyder bc9e160443 runtime: prevent pointless jmp in amd64 and 386 memmove
6a and 8a rearrange memmove such that the fallthrough from move_1or2 to move_0 ends up being a JMP to a RET. Insert an explicit RET to prevent such silliness.

Do the same for memclr as prophylaxis.

benchmark                old ns/op     new ns/op     delta
BenchmarkMemmove1        4.59          4.13          -10.02%
BenchmarkMemmove2        4.58          4.13          -9.83%

LGTM=khr
R=golang-codereviews, dvyukov, minux, ruiu, bradfitz, khr
CC=golang-codereviews
https://golang.org/cl/120930043
2014-08-01 06:21:08 -07:00
Keith Randall 0c6b55e76b runtime: convert map implementation to Go.
It's a bit slower, but not painfully so.  There is still room for
improvement (saving space so we can use nosplit, and removing the
requirement for hash/eq stubs).

benchmark                              old ns/op     new ns/op     delta
BenchmarkMegMap                        23.5          24.2          +2.98%
BenchmarkMegOneMap                     14.9          15.7          +5.37%
BenchmarkMegEqMap                      71668         72234         +0.79%
BenchmarkMegEmptyMap                   4.05          4.93          +21.73%
BenchmarkSmallStrMap                   21.9          22.5          +2.74%
BenchmarkMapStringKeysEight_16         23.1          26.3          +13.85%
BenchmarkMapStringKeysEight_32         21.9          25.0          +14.16%
BenchmarkMapStringKeysEight_64         21.9          25.1          +14.61%
BenchmarkMapStringKeysEight_1M         21.9          25.0          +14.16%
BenchmarkIntMap                        21.8          12.5          -42.66%
BenchmarkRepeatedLookupStrMapKey32     39.3          30.2          -23.16%
BenchmarkRepeatedLookupStrMapKey1M     322353        322675        +0.10%
BenchmarkNewEmptyMap                   129           136           +5.43%
BenchmarkMapIter                       137           107           -21.90%
BenchmarkMapIterEmpty                  7.14          8.71          +21.99%
BenchmarkSameLengthMap                 5.24          6.82          +30.15%
BenchmarkBigKeyMap                     34.5          35.3          +2.32%
BenchmarkBigValMap                     36.1          36.1          +0.00%
BenchmarkSmallKeyMap                   26.9          26.7          -0.74%

LGTM=rsc
R=golang-codereviews, dave, dvyukov, rsc, gobot, khr
CC=golang-codereviews
https://golang.org/cl/99380043
2014-07-16 14:16:19 -07:00
Rui Ueyama a712e20a1d runtime: speed up amd64 memmove
MOV with SSE registers seems faster than REP MOVSQ if the
size being copied is less than about 2K. Previously we
didn't use MOV if the memory region is larger than 256
byte. This patch improves the performance of 257 ~ 2048
byte non-overlapping copy by using MOV.

Here is the benchmark result on Intel Xeon 3.5GHz (Nehalem).

benchmark               old ns/op    new ns/op    delta
BenchmarkMemmove16              4            4   +0.42%
BenchmarkMemmove32              5            5   -0.20%
BenchmarkMemmove64              6            6   -0.81%
BenchmarkMemmove128             7            7   -0.82%
BenchmarkMemmove256            10           10   +1.92%
BenchmarkMemmove512            29           16  -44.90%
BenchmarkMemmove1024           37           25  -31.55%
BenchmarkMemmove2048           55           44  -19.46%
BenchmarkMemmove4096           92           91   -0.76%

benchmark                old MB/s     new MB/s  speedup
BenchmarkMemmove16        3370.61      3356.88    1.00x
BenchmarkMemmove32        6368.68      6386.99    1.00x
BenchmarkMemmove64       10367.37     10462.62    1.01x
BenchmarkMemmove128      17551.16     17713.48    1.01x
BenchmarkMemmove256      24692.81     24142.99    0.98x
BenchmarkMemmove512      17428.70     31687.72    1.82x
BenchmarkMemmove1024     27401.82     40009.45    1.46x
BenchmarkMemmove2048     36884.86     45766.98    1.24x
BenchmarkMemmove4096     44295.91     44627.86    1.01x

LGTM=khr
R=golang-codereviews, gobot, khr
CC=golang-codereviews
https://golang.org/cl/90500043
2014-06-23 12:06:26 -07:00
Anthony Martin 8303a13bb8 runtime: use unoptimized memmove and memclr on Plan 9
On Plan 9, the kernel disallows the use of floating point
instructions while handling a note. Previously, we worked
around this by using a simple loop in place of memmove.

When I added that work-around, I verified that all paths
from the note handler didn't end up calling memmove. Now
that memclr is using SSE instructions, the same process
will have to be done again.

Instead of doing that, however, this CL just punts and
uses unoptimized functions everywhere on Plan 9.

LGTM=rsc
R=rsc, 0intro
CC=golang-codereviews
https://golang.org/cl/73830044
2014-03-12 18:12:25 -07:00
Keith Randall 0273dc131e runtime: convert .s textflags from numbers to symbolic constants.
Remove NOPROF/DUPOK from everything.

Edits done with a script, except pclinetest.asm which depended
on the DUPOK flag on main().

R=golang-dev, bradfitz
CC=golang-dev
https://golang.org/cl/12613044
2013-08-07 12:20:05 -07:00
Russ Cox 9ddfb64365 runtime: record argument size in assembly functions
I have not done the system call stubs in sys_*.s.
I hope to avoid that, because those do not block, so those
frames will not appear in stack traces during garbage
collection.

R=golang-dev, dvyukov, khr
CC=golang-dev
https://golang.org/cl/11360043
2013-07-16 16:24:09 -04:00
Keith Randall 6021449236 runtime: faster x86 memmove (a.k.a. built-in copy())
REP instructions have a high startup cost, so we handle small
sizes with some straightline code.  The REP MOVSx instructions
are really fast for large sizes.  The cutover is approximately
1K.  We implement up to 128/256 because that is the maximum
SSE register load (loading all data into registers before any
stores lets us ignore copy direction).

(on a Sandy Bridge E5-1650 @ 3.20GHz)
benchmark               old ns/op    new ns/op    delta
BenchmarkMemmove0               3            3   +0.86%
BenchmarkMemmove1               5            5   +5.40%
BenchmarkMemmove2              18            8  -56.84%
BenchmarkMemmove3              18            7  -58.45%
BenchmarkMemmove4              36            7  -78.63%
BenchmarkMemmove5              36            8  -77.91%
BenchmarkMemmove6              36            8  -77.76%
BenchmarkMemmove7              36            8  -77.82%
BenchmarkMemmove8              18            8  -56.33%
BenchmarkMemmove9              18            7  -58.34%
BenchmarkMemmove10             18            7  -58.34%
BenchmarkMemmove11             18            7  -58.45%
BenchmarkMemmove12             36            7  -78.51%
BenchmarkMemmove13             36            7  -78.48%
BenchmarkMemmove14             36            7  -78.56%
BenchmarkMemmove15             36            7  -78.56%
BenchmarkMemmove16             18            7  -58.24%
BenchmarkMemmove32             18            8  -54.33%
BenchmarkMemmove64             18            8  -53.37%
BenchmarkMemmove128            20            9  -55.93%
BenchmarkMemmove256            25           11  -55.16%
BenchmarkMemmove512            33           33   -1.19%
BenchmarkMemmove1024           43           44   +2.06%
BenchmarkMemmove2048           61           61   +0.16%
BenchmarkMemmove4096           95           95   +0.00%

R=golang-dev, bradfitz, remyoudompheng, khr, iant, dominik.honnef
CC=golang-dev
https://golang.org/cl/9038048
2013-05-17 12:53:49 -07:00
Rémy Oudompheng 1b8f51c917 runtime: fix integer overflow in amd64 memmove.
Fixes #4981.

R=bradfitz, fullung, rsc, minux.ma
CC=golang-dev
https://golang.org/cl/7474047
2013-03-09 00:41:03 +01:00
Russ Cox 851f30136d runtime: make more build-friendly
Collapse the arch,os-specific directories into the main directory
by renaming xxx/foo.c to foo_xxx.c, and so on.

There are no substantial edits here, except to the Makefile.
The assumption is that the Go tool will #define GOOS_darwin
and GOARCH_amd64 and will make any file named something
like signals_darwin.h available as signals_GOOS.h during the
build.  This replaces what used to be done with -I$(GOOS).

There is still work to be done to make runtime build with
standard tools, but this is a big step.  After this we will have
to write a script to generate all the generated files so they
can be checked in (instead of generated during the build).

R=r, iant, r, lucio.dere
CC=golang-dev
https://golang.org/cl/5490053
2011-12-16 15:33:58 -05:00
Russ Cox ed0beea27b copy tweaks
* move memmove to arch-specific subdirectories
  * add memmove for arm
  * add copyright notices marking them as copied from Inferno

R=ken2
https://golang.org/cl/156061
2009-11-17 22:16:55 -08:00
Ken Thompson c4606d05da install copy predefined
did not test 386, but should work
shouldnt matter if copy is not used

R=rsc
https://golang.org/cl/156055
2009-11-17 20:41:44 -08:00