Commit Graph

11 Commits

Author SHA1 Message Date
Lynn Boger bb1fd3b5ff cmd/compile: add rules to improve consecutive byte loads and stores on ppc64le
This adds new rules to recognize consecutive byte loads and
stores and lowers them to loads and stores such as lhz, lwz, ld,
sth, stw, std. This change only covers the little endian cases
on little endian machines, such as is found in encoding/binary
UintXX or PutUintXX for little endian. Big endian will be done
later.

Updates were also made to binary_test.go to allow the benchmark
for Uint and PutUint to actually use those functions because
the way they were written, those functions were being
optimized out.

Testcases were also added to cmd/compile/internal/gc/asm_test.go.

Updates #22496

The following improvement can be found in golang.org/x/crypto

poly1305:

Benchmark64-16              142           114           -19.72%
Benchmark1K-16              1717          1424          -17.06%
Benchmark64Unaligned-16     142           113           -20.42%
Benchmark1KUnaligned-16     1721          1428          -17.02%

chacha20poly1305:

BenchmarkChacha20Poly1305Open_64-16     1012       885   -12.55%
BenchmarkChacha20Poly1305Seal_64-16     971        836   -13.90%
BenchmarkChacha20Poly1305Open_1350-16   11113      9539  -14.16%
BenchmarkChacha20Poly1305Seal_1350-16   11013      9392  -14.72%
BenchmarkChacha20Poly1305Open_8K-16     61074      53431 -12.51%
BenchmarkChacha20Poly1305Seal_8K-16     61214      54806 -10.47%

Other improvements of around 10% found in crypto/tls.

Results after updating encoding/binary/binary_test.go:

BenchmarkLittleEndianPutUint64-16     1.87      0.93      -50.27%
BenchmarkLittleEndianPutUint32-16     1.19      0.93      -21.85%
BenchmarkLittleEndianPutUint16-16     1.16      1.03      -11.21%

Change-Id: I7bbe2fbcbd11362d58662fecd907a0c07e6ca2fb
Reviewed-on: https://go-review.googlesource.com/74410
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Munday <mike.munday@ibm.com>
2017-11-03 18:46:59 +00:00
Kirill Smelkov 4477fd097f cmd/compile/internal/ssa: combine 2 byte loads + shifts into word load + rolw 8 on AMD64
... and same for stores. This does for binary.BigEndian.Uint16() what
was already done for Uint32 and Uint64 with BSWAP in 10f75748 (CL 32222).

Here is how generated code changes e.g. for the following function
(omitting saying the same prologue/epilogue):

	func get16(b [2]byte) uint16 {
		return binary.BigEndian.Uint16(b[:])
	}

"".get16 t=1 size=21 args=0x10 locals=0x0

	// before
        0x0000 00000 (x.go:15)  MOVBLZX "".b+9(FP), AX
        0x0005 00005 (x.go:15)  MOVBLZX "".b+8(FP), CX
        0x000a 00010 (x.go:15)  SHLL    $8, CX
        0x000d 00013 (x.go:15)  ORL     CX, AX

	// after
	0x0000 00000 (x.go:15)	MOVWLZX	"".b+8(FP), AX
	0x0005 00005 (x.go:15)	ROLW	$8, AX

encoding/binary is speedup overall a bit:

name                    old time/op    new time/op    delta
ReadSlice1000Int32s-4     4.83µs ± 0%    4.83µs ± 0%     ~     (p=0.206 n=4+5)
ReadStruct-4              1.29µs ± 2%    1.28µs ± 1%   -1.27%  (p=0.032 n=4+5)
ReadInts-4                 384ns ± 1%     385ns ± 1%     ~     (p=0.968 n=4+5)
WriteInts-4                534ns ± 3%     526ns ± 0%   -1.54%  (p=0.048 n=4+5)
WriteSlice1000Int32s-4    5.02µs ± 0%    5.11µs ± 3%     ~     (p=0.175 n=4+5)
PutUint16-4               0.59ns ± 0%    0.49ns ± 2%  -16.95%  (p=0.016 n=4+5)
PutUint32-4               0.52ns ± 0%    0.52ns ± 0%     ~     (all equal)
PutUint64-4               0.53ns ± 0%    0.53ns ± 0%     ~     (all equal)
PutUvarint32-4            19.9ns ± 0%    19.9ns ± 1%     ~     (p=0.556 n=4+5)
PutUvarint64-4            54.5ns ± 1%    54.2ns ± 0%     ~     (p=0.333 n=4+5)

name                    old speed      new speed      delta
ReadSlice1000Int32s-4    829MB/s ± 0%   828MB/s ± 0%     ~     (p=0.190 n=4+5)
ReadStruct-4            58.0MB/s ± 2%  58.7MB/s ± 1%   +1.30%  (p=0.032 n=4+5)
ReadInts-4              78.0MB/s ± 1%  77.8MB/s ± 1%     ~     (p=0.968 n=4+5)
WriteInts-4             56.1MB/s ± 3%  57.0MB/s ± 0%     ~     (p=0.063 n=4+5)
WriteSlice1000Int32s-4   797MB/s ± 0%   783MB/s ± 3%     ~     (p=0.190 n=4+5)
PutUint16-4             3.37GB/s ± 0%  4.07GB/s ± 2%  +20.83%  (p=0.016 n=4+5)
PutUint32-4             7.73GB/s ± 0%  7.72GB/s ± 0%     ~     (p=0.556 n=4+5)
PutUint64-4             15.1GB/s ± 0%  15.1GB/s ± 0%     ~     (p=0.905 n=4+5)
PutUvarint32-4           201MB/s ± 0%   201MB/s ± 0%     ~     (p=0.905 n=4+5)
PutUvarint64-4           147MB/s ± 1%   147MB/s ± 0%     ~     (p=0.286 n=4+5)

( "a bit" only because most of the time is spent in reflection-like things
  there, not actual bytes decoding. Even for direct PutUint16 benchmark the
  looping adds overhead and lowers visible benefit. For code-generated encoders /
  decoders actual effect is more than 20% )

Adding Uint32 and Uint64 raw benchmarks too for completeness.

NOTE I had to adjust load-combining rule for bswap case to match first 2 bytes
loads as result of "2-bytes load+shift" -> "loadw + rorw 8" rewrite. Reason is:
for loads+shift, even e.g. into uint16 var

	var b []byte
	var v uin16
	v = uint16(b[1]) | uint16(b[0])<<8

the compiler eventually generates L(ong) shift - SHLLconst [8], probably
because it is more straightforward / other reasons to work on the whole
register. This way 2 bytes rewriting rule is using SHLLconst (not SHLWconst) in
its pattern, and then it always gets matched first, even if 2-byte rule comes
syntactically after 4-byte rule in AMD64.rules because 4-bytes rule seemingly
needs more applyRewrite() cycles to trigger. If 2-bytes rule gets matched for
inner half of

	var b []byte
	var v uin32
	v = uint32(b[3]) | uint32(b[2])<<8 | uint32(b[1])<<16 | uint32(b[0])<<24

and we keep 4-byte load rule unchanged, the result will be MOVW + RORW $8 and
then series of byte loads and shifts - not one MOVL + BSWAPL.

There is no such problem for stores: there compiler, since it probably knows
store destination is 2 bytes wide, uses SHRWconst 8 (not SHRLconst 8) and thus
2-byte store rule is not a subset of rule for 4-byte stores.

Fixes #17151  (int16 was last missing piece there)

Change-Id: Idc03ba965bfce2b94fef456b02ff6742194748f6
Reviewed-on: https://go-review.googlesource.com/34636
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-14 22:17:08 +00:00
Blixt 456a01ac47 encoding/binary: add bool support
This change adds support for decoding and encoding the bool type. The
encoding is a single byte, with a zero value for false and a non-zero
value for true.

Closes #16856.

Change-Id: I1d1114b320263691473bb100cad0f380e0204186
Reviewed-on: https://go-review.googlesource.com/28514
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
2016-09-28 16:20:41 +00:00
Alexandru Moșoi 478b594d51 encoding/binary: fix bound check
The inserted early bound checks cause the slice
to expand beyond the original length of the slice.

Change-Id: Ib38891605f4a9a12d3b9e2071a5f77640b083d2d
Reviewed-on: https://go-review.googlesource.com/20981
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
2016-03-21 19:22:22 +00:00
Marcel van Lohuizen 6e2deaa1e1 encoding/binary: don't assume b.N > 0
Change-Id: I9e887a0b32baf0adc85fa9e4b85b319e8ef333e9
Reviewed-on: https://go-review.googlesource.com/20853
Reviewed-by: Russ Cox <rsc@golang.org>
2016-03-18 15:54:51 +00:00
Marcel van Lohuizen 705be76b6f encoding/binary: improve error messages for benchmarks
Change-Id: I0f4b6752ecc8b4945ecfde627cdec13fc4bb6a69
Reviewed-on: https://go-review.googlesource.com/20850
Reviewed-by: Russ Cox <rsc@golang.org>
2016-03-18 15:38:58 +00:00
Brad Fitzpatrick 5fea2ccc77 all: single space after period.
The tree's pretty inconsistent about single space vs double space
after a period in documentation. Make it consistently a single space,
per earlier decisions. This means contributors won't be confused by
misleading precedence.

This CL doesn't use go/doc to parse. It only addresses // comments.
It was generated with:

$ perl -i -npe 's,^(\s*// .+[a-z]\.)  +([A-Z]),$1 $2,' $(git grep -l -E '^\s*//(.+\.)  +([A-Z])')
$ go test go/doc -update

Change-Id: Iccdb99c37c797ef1f804a94b22ba5ee4b500c4f7
Reviewed-on: https://go-review.googlesource.com/20022
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Dave Day <djd@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-02 00:13:47 +00:00
Brad Fitzpatrick 519474451a all: make copyright headers consistent with one space after period
This is a subset of https://golang.org/cl/20022 with only the copyright
header lines, so the next CL will be smaller and more reviewable.

Go policy has been single space after periods in comments for some time.

The copyright header template at:

    https://golang.org/doc/contribute.html#copyright

also uses a single space.

Make them all consistent.

Change-Id: Icc26c6b8495c3820da6b171ca96a74701b4a01b0
Reviewed-on: https://go-review.googlesource.com/20111
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-01 23:34:33 +00:00
Joe Tsai b72a4a07c2 encoding/binary: document that Read returns io.EOF iff zero bytes are read
Also add a unit test to lock this behavior into the API.

Fixes #12016

Change-Id: Ib6ec6e7948f0705f3504ede9143b5dc4e790fc44
Reviewed-on: https://go-review.googlesource.com/15171
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-09-30 22:10:44 +00:00
Robert Griesemer 1dba6eb464 encoding/binary: fix error message
In the process, simplified internal sizeOf and
dataSize functions. Minor positive impact on
performance. Added test case.

benchmark                         old ns/op     new ns/op     delta
BenchmarkReadSlice1000Int32s      14006         14122         +0.83%
BenchmarkReadStruct               2508          2447          -2.43%
BenchmarkReadInts                 921           928           +0.76%
BenchmarkWriteInts                2086          2081          -0.24%
BenchmarkWriteSlice1000Int32s     13440         13497         +0.42%
BenchmarkPutUvarint32             28.5          26.3          -7.72%
BenchmarkPutUvarint64             81.3          76.7          -5.66%

benchmark                         old MB/s     new MB/s     speedup
BenchmarkReadSlice1000Int32s      285.58       283.24       0.99x
BenchmarkReadStruct               27.90        28.60        1.03x
BenchmarkReadInts                 32.57        32.31        0.99x
BenchmarkWriteInts                14.38        14.41        1.00x
BenchmarkWriteSlice1000Int32s     297.60       296.36       1.00x
BenchmarkPutUvarint32             140.55       151.92       1.08x
BenchmarkPutUvarint64             98.36        104.33       1.06x

Fixes #6818.

LGTM=r
R=r
CC=golang-codereviews
https://golang.org/cl/149290045
2014-10-02 12:53:51 -07:00
Russ Cox c007ce824d build: move package sources from src/pkg to src
Preparation was in CL 134570043.
This CL contains only the effect of 'hg mv src/pkg/* src'.
For more about the move, see golang.org/s/go14nopkg.
2014-09-08 00:08:51 -04:00