Commit Graph

7 Commits

Author SHA1 Message Date
Diogo Pinela 19ed0d993c cmd/compile: use staticuint64s instead of staticbytes
There are still two places in src/runtime/string.go that use
staticbytes, so we cannot delete it just yet.

There is a new codegen test to verify that the index calculation
is constant-folded, at least on amd64. ppc64, mips[64] and s390x
cannot currently do that.
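
The index calculation can be illustrated with a rough sketch. The table static64 and the helpers byteOffset, byteAt and hostBigEndian are hypothetical stand-ins, not the runtime's declarations, and the +7 adjustment for big-endian targets is an assumption about where the low-order byte of an 8-byte entry sits.

```go
package main

import (
	"fmt"
	"unsafe"
)

// static64 is a hypothetical stand-in table in which entry i holds the value i.
var static64 [256]uint64

func init() {
	for i := range static64 {
		static64[i] = uint64(i)
	}
}

// hostBigEndian reports whether the machine running this program is big-endian.
func hostBigEndian() bool {
	x := uint16(1)
	return *(*byte)(unsafe.Pointer(&x)) == 0
}

// byteOffset returns the byte offset inside static64 at which a single
// byte equal to v can be found. Each entry is 8 bytes wide; on big-endian
// machines the low-order byte is the last byte of the entry (an assumption
// made for this sketch).
func byteOffset(v uint8, bigEndian bool) int {
	off := int(v) << 3
	if bigEndian {
		off += 7
	}
	return off
}

// byteAt reads one byte of the table, viewed as raw memory.
func byteAt(off int) byte {
	raw := (*[256 * 8]byte)(unsafe.Pointer(&static64))
	return raw[off]
}

func main() {
	for _, v := range []uint8{0, 1, 42, 255} {
		off := byteOffset(v, hostBigEndian())
		fmt.Println(v, off, byteAt(off))
	}
}
```

When v is a constant, every term of the offset is known at compile time, which is the situation in which constant folding of the index becomes possible.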

There is also a new runtime benchmark to ensure that this does not
regress performance (measured against the parent commit):

name                      old time/op  new time/op  delta
ConvT2EByteSized/bool-4   1.07ns ± 1%  1.07ns ± 1%   ~     (p=0.060 n=14+15)
ConvT2EByteSized/uint8-4  1.06ns ± 1%  1.07ns ± 1%   ~     (p=0.095 n=14+15)

Updates #37612

Change-Id: I5ec30738edaa48cda78dfab4a78e24a32fa7fd6a
Reviewed-on: https://go-review.googlesource.com/c/go/+/221957
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2020-03-04 21:43:01 +00:00
Josh Bleecher Snyder 9828c43288 runtime: prevent allocation when converting small ints to interfaces
Prior to this change, we avoided allocation when
converting 0 to an interface.

This change extends that optimization to larger value types
whose values happen to be in the range 0 to 255.
This is marginally more expensive in the case of a 0 value,
in that the address is computed rather than fixed.
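
The idea can be sketched in ordinary Go, outside the runtime. The names smallVals and boxUint64 are hypothetical and only illustrate the technique of serving small values from a shared table; they are not the runtime's own code.

```go
package main

import "fmt"

// smallVals is a hypothetical shared, read-only table: smallVals[i] == i.
var smallVals [256]uint64

func init() {
	for i := range smallVals {
		smallVals[i] = uint64(i)
	}
}

// boxUint64 returns a pointer to storage holding v. Values in 0..255 reuse
// a table slot, so no new storage is needed; other values get a fresh copy.
func boxUint64(v uint64) *uint64 {
	if v < uint64(len(smallVals)) {
		return &smallVals[v] // the address is computed from v, not fixed
	}
	p := new(uint64)
	*p = v
	return p
}

func main() {
	a, b := boxUint64(7), boxUint64(7)
	c, d := boxUint64(1000), boxUint64(1000)
	fmt.Println(a == b, *a) // true 7
	fmt.Println(c == d, *c) // false 1000
}
```

Callers of such a helper must treat the returned storage as read-only, since the table slots are shared.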

name                         old time/op  new time/op  delta
ConvT2ESmall-8               2.36ns ± 4%  2.65ns ± 4%  +12.23%  (p=0.000 n=87+91)
ConvT2EUintptr-8             2.36ns ± 4%  2.84ns ± 6%  +20.05%  (p=0.000 n=96+99)
ConvT2ELarge-8               23.8ns ± 2%  23.1ns ± 3%   -2.94%  (p=0.000 n=93+95)
ConvT2ISmall-8               2.67ns ± 5%  2.74ns ±27%     ~     (p=0.214 n=99+100)
ConvT2IUintptr-8             2.65ns ± 5%  2.46ns ± 5%   -7.19%  (p=0.000 n=98+98)
ConvT2ILarge-8               24.2ns ± 2%  23.5ns ± 4%   -3.16%  (p=0.000 n=91+97)
ConvT2Ezero/zero/16-8        2.79ns ± 6%  2.99ns ± 4%   +7.52%  (p=0.000 n=94+88)
ConvT2Ezero/zero/32-8        2.34ns ± 3%  2.65ns ± 3%  +13.06%  (p=0.000 n=92+98)
ConvT2Ezero/zero/64-8        2.35ns ± 4%  2.65ns ± 6%  +12.86%  (p=0.000 n=99+94)
ConvT2Ezero/zero/str-8       2.55ns ± 4%  2.54ns ± 4%     ~     (p=0.063 n=97+99)
ConvT2Ezero/zero/slice-8     2.82ns ± 4%  2.85ns ± 5%   +1.00%  (p=0.000 n=99+95)
ConvT2Ezero/zero/big-8       94.3ns ± 5%  93.4ns ± 4%   -0.94%  (p=0.000 n=88+90)
ConvT2Ezero/nonzero/str-8    29.6ns ± 3%  27.7ns ± 3%   -6.69%  (p=0.000 n=98+97)
ConvT2Ezero/nonzero/slice-8  36.6ns ± 2%  37.1ns ± 2%   +1.31%  (p=0.000 n=94+90)
ConvT2Ezero/nonzero/big-8    93.4ns ± 3%  92.7ns ± 3%   -0.74%  (p=0.000 n=88+84)
ConvT2Ezero/smallint/16-8    13.3ns ± 4%   2.7ns ± 6%  -79.82%  (p=0.000 n=100+97)
ConvT2Ezero/smallint/32-8    12.5ns ± 1%   2.9ns ± 5%  -77.17%  (p=0.000 n=85+96)
ConvT2Ezero/smallint/64-8    14.7ns ± 3%   2.6ns ± 3%  -82.05%  (p=0.000 n=94+94)
ConvT2Ezero/largeint/16-8    14.0ns ± 4%  13.2ns ± 7%   -5.44%  (p=0.000 n=95+99)
ConvT2Ezero/largeint/32-8    12.8ns ± 4%  12.9ns ± 3%     ~     (p=0.096 n=99+87)
ConvT2Ezero/largeint/64-8    15.5ns ± 2%  15.0ns ± 2%   -3.46%  (p=0.000 n=95+96)

An example of a program for which this makes a perceptible difference
is running the compiler with the -S flag:

name        old time/op       new time/op       delta
Template          349ms ± 2%        344ms ± 2%   -1.48%  (p=0.000 n=23+25)
Unicode           138ms ± 4%        136ms ± 3%   -1.67%  (p=0.003 n=25+25)
GoTypes           1.25s ± 2%        1.24s ± 2%   -1.11%  (p=0.001 n=24+25)
Compiler          5.73s ± 2%        5.67s ± 2%   -1.09%  (p=0.002 n=25+24)
SSA               20.2s ± 2%        19.9s ± 2%   -1.45%  (p=0.000 n=25+23)
Flate             216ms ± 4%        210ms ± 2%   -2.77%  (p=0.000 n=25+24)
GoParser          283ms ± 2%        278ms ± 3%   -1.58%  (p=0.000 n=23+23)
Reflect           757ms ± 2%        745ms ± 2%   -1.58%  (p=0.000 n=25+25)
Tar               303ms ± 4%        296ms ± 2%   -2.20%  (p=0.000 n=22+23)
XML               415ms ± 2%        411ms ± 3%   -0.94%  (p=0.002 n=25+22)
[Geo mean]        726ms             715ms        -1.59%

name        old user-time/op  new user-time/op  delta
Template          434ms ± 3%        427ms ± 2%   -1.66%  (p=0.000 n=23+24)
Unicode           204ms ±12%        198ms ±12%   -2.83%  (p=0.032 n=25+25)
GoTypes           1.59s ± 2%        1.56s ± 2%   -1.64%  (p=0.000 n=22+25)
Compiler          7.50s ± 1%        7.40s ± 2%   -1.32%  (p=0.000 n=25+25)
SSA               27.2s ± 2%        26.8s ± 2%   -1.50%  (p=0.000 n=24+23)
Flate             266ms ± 6%        254ms ± 3%   -4.38%  (p=0.000 n=25+25)
GoParser          357ms ± 2%        351ms ± 2%   -1.90%  (p=0.000 n=24+23)
Reflect           966ms ± 2%        947ms ± 2%   -1.94%  (p=0.000 n=24+25)
Tar               387ms ± 2%        380ms ± 3%   -1.83%  (p=0.000 n=22+24)
XML               538ms ± 1%        532ms ± 1%   -1.15%  (p=0.000 n=24+20)
[Geo mean]        942ms             923ms        -2.02%

name        old alloc/op      new alloc/op      delta
Template         54.1MB ± 0%       52.9MB ± 0%   -2.26%  (p=0.000 n=25+25)
Unicode          33.5MB ± 0%       33.1MB ± 0%   -1.03%  (p=0.000 n=25+24)
GoTypes           189MB ± 0%        185MB ± 0%   -2.27%  (p=0.000 n=25+25)
Compiler          875MB ± 0%        858MB ± 0%   -1.99%  (p=0.000 n=23+25)
SSA              3.19GB ± 0%       3.13GB ± 0%   -1.95%  (p=0.000 n=25+25)
Flate            32.9MB ± 0%       32.2MB ± 0%   -2.26%  (p=0.000 n=25+25)
GoParser         44.0MB ± 0%       42.9MB ± 0%   -2.33%  (p=0.000 n=25+25)
Reflect           117MB ± 0%        114MB ± 0%   -2.60%  (p=0.000 n=25+25)
Tar              48.6MB ± 0%       47.5MB ± 0%   -2.18%  (p=0.000 n=25+24)
XML              65.7MB ± 0%       64.4MB ± 0%   -1.96%  (p=0.000 n=23+25)
[Geo mean]        118MB             115MB        -2.08%

name        old allocs/op     new allocs/op     delta
Template          1.07M ± 0%        0.92M ± 0%  -14.29%  (p=0.000 n=25+25)
Unicode            539k ± 0%         494k ± 0%   -8.27%  (p=0.000 n=25+25)
GoTypes           3.97M ± 0%        3.43M ± 0%  -13.71%  (p=0.000 n=24+25)
Compiler          17.6M ± 0%        15.4M ± 0%  -12.69%  (p=0.000 n=25+24)
SSA               66.1M ± 0%        58.1M ± 0%  -12.17%  (p=0.000 n=25+25)
Flate              629k ± 0%         536k ± 0%  -14.73%  (p=0.000 n=24+24)
GoParser           929k ± 0%         799k ± 0%  -13.96%  (p=0.000 n=25+25)
Reflect           2.49M ± 0%        2.11M ± 0%  -15.28%  (p=0.000 n=25+25)
Tar                919k ± 0%         788k ± 0%  -14.30%  (p=0.000 n=25+25)
XML               1.28M ± 0%        1.11M ± 0%  -12.85%  (p=0.000 n=24+25)
[Geo mean]        2.32M             2.01M       -13.24%

There is a slight increase in binary size from this change:

file      before    after     Δ       %
addr2line 4307728   4307760   +32     +0.001%
api       5972680   5972728   +48     +0.001%
asm       5114200   5114232   +32     +0.001%
buildid   2843720   2847848   +4128   +0.145%
cgo       4823736   4827864   +4128   +0.086%
compile   24912056  24912104  +48     +0.000%
cover     5259800   5259832   +32     +0.001%
dist      3665080   3665128   +48     +0.001%
doc       4672712   4672744   +32     +0.001%
fix       3376952   3376984   +32     +0.001%
link      6618008   6622152   +4144   +0.063%
nm        4253280   4257424   +4144   +0.097%
objdump   4655376   4659504   +4128   +0.089%
pack      2294280   2294328   +48     +0.002%
pprof     14747476  14751620  +4144   +0.028%
test2json 2819320   2823448   +4128   +0.146%
trace     11665068  11669212  +4144   +0.036%
vet       8342360   8342408   +48     +0.001%

Change-Id: I38ef70244e23069bfd14334061d43ae22a294519
Reviewed-on: https://go-review.googlesource.com/c/go/+/216401
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-03-02 23:14:55 +00:00
Josh Bleecher Snyder 504bc3ed24 cmd/compile, runtime: specialize convT2x, don't alloc for zero vals
Prior to this CL, all runtime conversions
from a concrete value to an interface went
through one of two runtime calls: convT2E or convT2I.
However, in practice, basic types are very common.
Specializing convT2x for those basic types allows
for a more efficient implementation for those types.
For basic scalars and strings, allocation and copying
can use the same methods as normal code.
For pointer-free types, allocation can occur without
zeroing, and copying can take place without GC calls.
For slices, copying is cheaper and simpler.

This CL adds twelve runtime routines:

convT2E16, convT2I16
convT2E32, convT2I32
convT2E64, convT2I64
convT2Estring, convT2Istring
convT2Eslice, convT2Islice
convT2Enoptr, convT2Inoptr

While compiling make.bash, 93% of all convT2x calls
are now to one of these specialized convT2x calls.

Within specialized convT2x routines, it is cheap to check
for a zero value, in a way that it is not in general.
When we detect a zero value there, we return a pointer
to zeroVal, rather than allocating.
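
A minimal sketch of the zero-value check, written as ordinary Go rather than runtime code. The names zeroString and boxString are hypothetical; zeroString merely stands in for the shared zero value described above.

```go
package main

import "fmt"

// zeroString is a hypothetical shared zero value, standing in for zeroVal.
var zeroString string

// boxString mimics the shape of a conversion routine specialized for
// strings: the zero check is a single comparison, and the zero value is
// served from shared storage instead of a fresh allocation.
func boxString(s string) *string {
	if s == "" {
		return &zeroString
	}
	p := new(string)
	*p = s
	return p
}

func main() {
	fmt.Println(boxString("") == boxString(""))   // true
	fmt.Println(boxString("a") == boxString("a")) // false
	fmt.Println(*boxString("go"))                 // go
}
```

The check is cheap here because the routine knows the exact type it handles; a generic routine would have to compare an arbitrary number of bytes.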

name                         old time/op  new time/op  delta
ConvT2Ezero/zero/16-8        17.9ns ± 2%   3.0ns ± 3%  -83.20%  (p=0.000 n=56+56)
ConvT2Ezero/zero/32-8        17.8ns ± 2%   3.0ns ± 3%  -83.15%  (p=0.000 n=59+60)
ConvT2Ezero/zero/64-8        20.1ns ± 1%   3.0ns ± 2%  -84.98%  (p=0.000 n=57+57)
ConvT2Ezero/zero/str-8       32.6ns ± 1%   3.0ns ± 4%  -90.70%  (p=0.000 n=59+60)
ConvT2Ezero/zero/slice-8     36.7ns ± 2%   3.0ns ± 2%  -91.78%  (p=0.000 n=59+59)
ConvT2Ezero/zero/big-8       91.9ns ± 2%  85.9ns ± 2%   -6.52%  (p=0.000 n=57+57)
ConvT2Ezero/nonzero/16-8     17.7ns ± 2%  12.7ns ± 3%  -28.38%  (p=0.000 n=55+60)
ConvT2Ezero/nonzero/32-8     17.8ns ± 1%  12.7ns ± 1%  -28.44%  (p=0.000 n=54+57)
ConvT2Ezero/nonzero/64-8     20.0ns ± 1%  15.0ns ± 1%  -24.90%  (p=0.000 n=56+58)
ConvT2Ezero/nonzero/str-8    32.6ns ± 1%  25.7ns ± 1%  -21.17%  (p=0.000 n=58+55)
ConvT2Ezero/nonzero/slice-8  36.8ns ± 2%  30.4ns ± 1%  -17.32%  (p=0.000 n=60+52)
ConvT2Ezero/nonzero/big-8    92.1ns ± 2%  85.9ns ± 2%   -6.70%  (p=0.000 n=57+59)

Benchmarks on a real program (the compiler):

name       old time/op      new time/op      delta
Template        227ms ± 5%       221ms ± 2%  -2.48%  (p=0.000 n=30+26)
Unicode         102ms ± 5%       100ms ± 3%  -1.30%  (p=0.009 n=30+26)
GoTypes         656ms ± 5%       659ms ± 4%    ~     (p=0.208 n=30+30)
Compiler        2.82s ± 2%       2.82s ± 1%    ~     (p=0.614 n=29+27)
Flate           128ms ± 2%       128ms ± 5%    ~     (p=0.783 n=27+28)
GoParser        158ms ± 3%       158ms ± 3%    ~     (p=0.261 n=28+30)
Reflect         408ms ± 7%       401ms ± 3%    ~     (p=0.075 n=30+30)
Tar             123ms ± 6%       121ms ± 8%    ~     (p=0.287 n=29+30)
XML             220ms ± 2%       220ms ± 4%    ~     (p=0.805 n=29+29)

name       old user-ns/op   new user-ns/op   delta
Template   281user-ms ± 4%  279user-ms ± 3%  -0.87%  (p=0.044 n=28+28)
Unicode    142user-ms ± 4%  141user-ms ± 3%  -1.04%  (p=0.015 n=30+27)
GoTypes    884user-ms ± 3%  886user-ms ± 2%    ~     (p=0.532 n=30+30)
Compiler   3.94user-s ± 3%  3.92user-s ± 1%    ~     (p=0.185 n=30+28)
Flate      165user-ms ± 2%  165user-ms ± 4%    ~     (p=0.780 n=27+29)
GoParser   209user-ms ± 2%  208user-ms ± 3%    ~     (p=0.453 n=28+30)
Reflect    533user-ms ± 6%  526user-ms ± 3%    ~     (p=0.057 n=30+30)
Tar        156user-ms ± 6%  154user-ms ± 6%    ~     (p=0.133 n=29+30)
XML        288user-ms ± 4%  288user-ms ± 4%    ~     (p=0.633 n=30+30)

name       old alloc/op     new alloc/op     delta
Template       41.0MB ± 0%      40.9MB ± 0%  -0.11%  (p=0.000 n=29+29)
Unicode        32.6MB ± 0%      32.6MB ± 0%    ~     (p=0.572 n=29+30)
GoTypes         122MB ± 0%       122MB ± 0%  -0.10%  (p=0.000 n=30+30)
Compiler        482MB ± 0%       481MB ± 0%  -0.07%  (p=0.000 n=30+29)
Flate          26.6MB ± 0%      26.6MB ± 0%    ~     (p=0.096 n=30+30)
GoParser       32.7MB ± 0%      32.6MB ± 0%  -0.06%  (p=0.011 n=28+28)
Reflect        84.2MB ± 0%      84.1MB ± 0%  -0.17%  (p=0.000 n=29+30)
Tar            27.7MB ± 0%      27.7MB ± 0%  -0.05%  (p=0.032 n=27+28)
XML            44.7MB ± 0%      44.7MB ± 0%    ~     (p=0.131 n=28+30)

name       old allocs/op    new allocs/op    delta
Template         373k ± 1%        370k ± 1%  -0.76%  (p=0.000 n=30+30)
Unicode          325k ± 1%        325k ± 1%    ~     (p=0.383 n=29+30)
GoTypes         1.16M ± 0%       1.15M ± 0%  -0.75%  (p=0.000 n=29+30)
Compiler        4.15M ± 0%       4.13M ± 0%  -0.59%  (p=0.000 n=30+29)
Flate            238k ± 1%        237k ± 1%  -0.62%  (p=0.000 n=30+30)
GoParser         304k ± 1%        302k ± 1%  -0.64%  (p=0.000 n=30+28)
Reflect         1.00M ± 0%       0.99M ± 0%  -1.10%  (p=0.000 n=29+30)
Tar              245k ± 1%        244k ± 1%  -0.59%  (p=0.000 n=27+29)
XML              391k ± 1%        389k ± 1%  -0.59%  (p=0.000 n=29+30)

Change-Id: Id7f456d690567c2b0a96b0d6d64de8784b6e305f
Reviewed-on: https://go-review.googlesource.com/36476
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-02-28 19:23:33 +00:00
David Chase 2270133981 cmd/gc: allocate backing storage for non-escaping interfaces on stack
Extend escape analysis to convT2E and convT2I. If the interface value
does not escape, supply the runtime with a stack buffer for the object copy.
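
The distinction between escaping and non-escaping interface values can be sketched as follows. The types and function names are hypothetical, and whether a given compiler version actually places the copy on the stack is not guaranteed by this example; running go build -gcflags=-m prints the compiler's escape analysis decisions for inspection.

```go
package main

import "fmt"

type shape interface{ area() int }

type rect struct{ w, h int }

func (r rect) area() int { return r.w * r.h }

// kept holds an interface value beyond the call, forcing it to escape.
var kept shape

// localArea converts r to an interface that is used only inside the
// function, so the copy of r does not need to outlive the call.
func localArea(r rect) int {
	var s shape = r
	return s.area()
}

// keptArea stores the interface in a package-level variable, so the copy
// of r must survive after the function returns.
func keptArea(r rect) int {
	kept = r
	return kept.area()
}

func main() {
	fmt.Println(localArea(rect{2, 3})) // 6
	fmt.Println(keptArea(rect{4, 5}))  // 20
}
```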

This is a straight port from .c to .go of Dmitry's patch.

Change-Id: Ic315dd50d144d94dd3324227099c116be5ca70b6
Reviewed-on: https://go-review.googlesource.com/8201
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-03-30 16:11:22 +00:00
Josh Bleecher Snyder 25e793d7ea cmd/internal/gc, runtime: speed up some cases of _, ok := i.(T)
Some type assertions of the form _, ok := i.(T) allow efficient inlining.
Such type assertions commonly show up in type switches.
For example, with this optimization, using 6g, the length of
encoding/binary's intDataSize function shrinks from 2224 to 1728 bytes (-22%).
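
For illustration, these are the source-level forms in question. The helper names isInt and kind are hypothetical; the sketch shows only what the assertions look like, not the code the compiler generates for them.

```go
package main

import "fmt"

// isInt reports whether i holds an int. The asserted value is discarded,
// so only the dynamic type needs to be examined.
func isInt(i interface{}) bool {
	_, ok := i.(int)
	return ok
}

// kind uses a type switch whose cases bind no variable, which amounts to
// a sequence of blank type assertions.
func kind(i interface{}) string {
	switch i.(type) {
	case int:
		return "int"
	case string:
		return "string"
	default:
		return "other"
	}
}

func main() {
	fmt.Println(isInt(3), isInt("3"))          // true false
	fmt.Println(kind(3), kind("3"), kind(3.0)) // int string other
}
```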

benchmark                    old ns/op     new ns/op     delta
BenchmarkAssertI2E2Blank     4.67          0.82          -82.44%
BenchmarkAssertE2T2Blank     4.38          0.83          -81.05%
BenchmarkAssertE2E2Blank     3.88          0.83          -78.61%
BenchmarkAssertE2E2          14.2          14.4          +1.41%
BenchmarkAssertE2T2          10.3          10.4          +0.97%
BenchmarkAssertI2E2          13.4          13.3          -0.75%

Change-Id: Ie9798c3e85432bb8e0f2c723afc376e233639df7
Reviewed-on: https://go-review.googlesource.com/7697
Reviewed-by: Keith Randall <khr@golang.org>
2015-03-19 16:20:32 +00:00
Josh Bleecher Snyder 77a2113925 cmd/gc: evaluate concrete == interface without allocating
Consider an interface value i of type I and concrete value c of type C.

Prior to this CL, i==c was evaluated as
	I(c) == i

Evaluating I(c) can allocate.

This CL changes the evaluation of i==c to
	x, ok := i.(C); ok && x == c

The new generated code is shorter and does not allocate directly.

If C is small, as it is in every instance in the stdlib,
the new code also uses less stack space
and makes one runtime call instead of two.

If C is very large, the original implementation is used.
The cutoff for "very large" is 1<<16,
following the stack vs heap cutoff used elsewhere.
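
The equivalence of the two evaluation strategies can be checked at the source level. The functions eqConvert and eqAssert are hypothetical helpers that spell out each strategy by hand for C = int; the compiler performs the rewrite internally, so ordinary code simply writes i == c.

```go
package main

import "fmt"

// eqConvert compares by converting c to an interface first, which is the
// evaluation order described as the old behavior.
func eqConvert(i interface{}, c int) bool {
	return interface{}(c) == i
}

// eqAssert compares by asserting i to the concrete type and then
// comparing concrete values, as in the rewritten form.
func eqAssert(i interface{}, c int) bool {
	x, ok := i.(int)
	return ok && x == c
}

func main() {
	// The two columns printed for each input should always agree.
	for _, i := range []interface{}{7, 8, "7", int64(7), nil} {
		fmt.Println(eqConvert(i, 7), eqAssert(i, 7))
	}
}
```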

This kind of comparison occurs in 38 places in the stdlib,
mostly in the net and os packages.

benchmark                     old ns/op     new ns/op     delta
BenchmarkEqEfaceConcrete      29.5          7.92          -73.15%
BenchmarkEqIfaceConcrete      32.1          7.90          -75.39%
BenchmarkNeEfaceConcrete      29.9          7.90          -73.58%
BenchmarkNeIfaceConcrete      35.9          7.90          -77.99%

Fixes #9370.

Change-Id: I7c4555950bcd6406ee5c613be1f2128da2c9a2b7
Reviewed-on: https://go-review.googlesource.com/2096
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
2015-02-12 22:23:38 +00:00
Russ Cox c007ce824d build: move package sources from src/pkg to src
Preparation was in CL 134570043.
This CL contains only the effect of 'hg mv src/pkg/* src'.
For more about the move, see golang.org/s/go14nopkg.
2014-09-08 00:08:51 -04:00