Commit Graph

1842 Commits

Author SHA1 Message Date
Dave Cheney 5c7ae10f66 runtime: merge 64bit lfstack impls
Merge all the 64bit lfstack impls into one file, adjust build tags to
match.

Merge all the comments on the various lfstack implementations for
posterity.

lfstack_amd64.go can probably be merged, but it is slightly different so
that will happen in a followup.

Change-Id: I5362d5e127daa81c9cb9d4fa8a0cc5c5e5c2707c
Reviewed-on: https://go-review.googlesource.com/21591
Run-TryBot: Dave Cheney <dave@cheney.net>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
2016-04-06 06:47:31 +00:00
Brad Fitzpatrick 8455f3a3d5 os: consolidate os{1,2}_*.go files
Change-Id: I463ca59f486b2842f67f151a55f530ee10663830
Reviewed-on: https://go-review.googlesource.com/21568
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Dave Cheney <dave@cheney.net>
Reviewed-by: Minux Ma <minux@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-06 05:04:47 +00:00
Brad Fitzpatrick fd2bb1e30a runtime: rename os1_windows.go to os_windows.go
Change-Id: I11172f3d0e28f17b812e67a4db9cfe513b8e1974
Reviewed-on: https://go-review.googlesource.com/21565
Reviewed-by: Minux Ma <minux@golang.org>
2016-04-06 04:26:01 +00:00
Brad Fitzpatrick e095f53e9b runtime: merge os{,2}_windows.go into os1_windows.go.
A future CL will rename os1_windows.go to os_windows.go.

Change-Id: I223e76002dd1e9c9d1798fb0beac02c7d3bf4812
Reviewed-on: https://go-review.googlesource.com/21564
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
2016-04-06 04:25:44 +00:00
Michael Munday 0f08dd2183 runtime: add s390x support (modified files only)
Change-Id: Ib79ad4a890994ad64edb1feb79bd242d26b5b08a
Reviewed-on: https://go-review.googlesource.com/20945
Reviewed-by: Minux Ma <minux@golang.org>
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-06 04:25:06 +00:00
Shenghou Ma a2eded3421 runtime: get randomness from AT_RANDOM AUXV on linux/arm64
Fixes #15147.

Change-Id: Ibfe46c747dea987787a51eb0c95ccd8c5f24f366
Reviewed-on: https://go-review.googlesource.com/21580
Run-TryBot: Minux Ma <minux@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-04-06 04:23:37 +00:00
Brad Fitzpatrick 34c58065e5 runtime: rename os1_linux.go to os_linux.go
Change-Id: I938f61763c3256a876d62aeb54ef8c25cc4fc90e
Reviewed-on: https://go-review.googlesource.com/21567
Reviewed-by: Minux Ma <minux@golang.org>
2016-04-06 03:55:38 +00:00
Brad Fitzpatrick 5103fbfdb2 runtime: merge os_linux.go into os1_linux.go
Change-Id: I791c47014fe69e8529c7b2f0b9a554e47902d46c
Reviewed-on: https://go-review.googlesource.com/21566
Reviewed-by: Minux Ma <minux@golang.org>
2016-04-06 03:54:47 +00:00
Brad Fitzpatrick 8556c76f88 runtime: minor Windows cleanup
Change-Id: I9a8081ef1109469e9577c642156aa635188d8954
Reviewed-on: https://go-review.googlesource.com/21538
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
2016-04-06 02:23:29 +00:00
David Chase d8c815d8b5 cmd/compile: note escape of parts of closured-capture vars
Missed a case for closure calls (OCALLFUNC && indirect) in
esc.go:esccall.

Cleanup to runtime code for windows to more thoroughly hide
a technical escape.  Also made code pickier about failing
to late non-optional kernel32.dll.

Fixes #14409.

Change-Id: Ie75486a2c8626c4583224e02e4872c2875f7bca5
Reviewed-on: https://go-review.googlesource.com/20102
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-04-05 18:10:09 +00:00
Dmitry Vyukov 475d113b53 runtime: don't burn CPU unnecessarily
Two GC-related functions, scang and casgstatus, wait in an active spin loop.
Active spinning is never a good idea in user-space. Once we wait several
times more than the expected wait time, something unexpected is happenning
(e.g. the thread we are waiting for is descheduled or handling a page fault)
and we need to yield to OS scheduler. Moreover, the expected wait time is
very high for these functions: scang wait time can be tens of milliseconds,
casgstatus can be hundreds of microseconds. It does not make sense to spin
even for that time.

go install -a std profile on a 4-core machine shows that 11% of time is spent
in the active spin in scang:

  6.12%    compile  compile                [.] runtime.scang
  3.27%    compile  compile                [.] runtime.readgstatus
  1.72%    compile  compile                [.] runtime/internal/atomic.Load

The active spin also increases tail latency in the case of the slightest
oversubscription: GC goroutines spend whole quantum in the loop instead of
executing user code.

Here is scang wait time histogram during go install -a std:

13707.0000 - 1815442.7667 [   118]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎...
1815442.7667 - 3617178.5333 [     9]: ∎∎∎∎∎∎∎∎∎
3617178.5333 - 5418914.3000 [    11]: ∎∎∎∎∎∎∎∎∎∎∎
5418914.3000 - 7220650.0667 [     5]: ∎∎∎∎∎
7220650.0667 - 9022385.8333 [    12]: ∎∎∎∎∎∎∎∎∎∎∎∎
9022385.8333 - 10824121.6000 [    13]: ∎∎∎∎∎∎∎∎∎∎∎∎∎
10824121.6000 - 12625857.3667 [    15]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
12625857.3667 - 14427593.1333 [    18]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
14427593.1333 - 16229328.9000 [    18]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
16229328.9000 - 18031064.6667 [    32]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
18031064.6667 - 19832800.4333 [    28]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
19832800.4333 - 21634536.2000 [     6]: ∎∎∎∎∎∎
21634536.2000 - 23436271.9667 [    15]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
23436271.9667 - 25238007.7333 [    11]: ∎∎∎∎∎∎∎∎∎∎∎
25238007.7333 - 27039743.5000 [    27]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
27039743.5000 - 28841479.2667 [    20]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
28841479.2667 - 30643215.0333 [    10]: ∎∎∎∎∎∎∎∎∎∎
30643215.0333 - 32444950.8000 [     7]: ∎∎∎∎∎∎∎
32444950.8000 - 34246686.5667 [     4]: ∎∎∎∎
34246686.5667 - 36048422.3333 [     4]: ∎∎∎∎
36048422.3333 - 37850158.1000 [     1]: ∎
37850158.1000 - 39651893.8667 [     5]: ∎∎∎∎∎
39651893.8667 - 41453629.6333 [     2]: ∎∎
41453629.6333 - 43255365.4000 [     2]: ∎∎
43255365.4000 - 45057101.1667 [     2]: ∎∎
45057101.1667 - 46858836.9333 [     1]: ∎
46858836.9333 - 48660572.7000 [     2]: ∎∎
48660572.7000 - 50462308.4667 [     3]: ∎∎∎
50462308.4667 - 52264044.2333 [     2]: ∎∎
52264044.2333 - 54065780.0000 [     2]: ∎∎

and the zoomed-in first part:

13707.0000 - 19916.7667 [     2]: ∎∎
19916.7667 - 26126.5333 [     2]: ∎∎
26126.5333 - 32336.3000 [     9]: ∎∎∎∎∎∎∎∎∎
32336.3000 - 38546.0667 [     8]: ∎∎∎∎∎∎∎∎
38546.0667 - 44755.8333 [    12]: ∎∎∎∎∎∎∎∎∎∎∎∎
44755.8333 - 50965.6000 [    10]: ∎∎∎∎∎∎∎∎∎∎
50965.6000 - 57175.3667 [     5]: ∎∎∎∎∎
57175.3667 - 63385.1333 [     6]: ∎∎∎∎∎∎
63385.1333 - 69594.9000 [     5]: ∎∎∎∎∎
69594.9000 - 75804.6667 [     6]: ∎∎∎∎∎∎
75804.6667 - 82014.4333 [     6]: ∎∎∎∎∎∎
82014.4333 - 88224.2000 [     4]: ∎∎∎∎
88224.2000 - 94433.9667 [     1]: ∎
94433.9667 - 100643.7333 [     1]: ∎
100643.7333 - 106853.5000 [     2]: ∎∎
106853.5000 - 113063.2667 [     0]:
113063.2667 - 119273.0333 [     2]: ∎∎
119273.0333 - 125482.8000 [     2]: ∎∎
125482.8000 - 131692.5667 [     1]: ∎
131692.5667 - 137902.3333 [     1]: ∎
137902.3333 - 144112.1000 [     0]:
144112.1000 - 150321.8667 [     2]: ∎∎
150321.8667 - 156531.6333 [     1]: ∎
156531.6333 - 162741.4000 [     1]: ∎
162741.4000 - 168951.1667 [     0]:
168951.1667 - 175160.9333 [     0]:
175160.9333 - 181370.7000 [     1]: ∎
181370.7000 - 187580.4667 [     1]: ∎
187580.4667 - 193790.2333 [     2]: ∎∎
193790.2333 - 200000.0000 [     0]:

Here is casgstatus wait time histogram:

  631.0000 -  5276.6333 [     3]: ∎∎∎
 5276.6333 -  9922.2667 [     5]: ∎∎∎∎∎
 9922.2667 - 14567.9000 [     2]: ∎∎
14567.9000 - 19213.5333 [     6]: ∎∎∎∎∎∎
19213.5333 - 23859.1667 [     5]: ∎∎∎∎∎
23859.1667 - 28504.8000 [     6]: ∎∎∎∎∎∎
28504.8000 - 33150.4333 [     6]: ∎∎∎∎∎∎
33150.4333 - 37796.0667 [     2]: ∎∎
37796.0667 - 42441.7000 [     1]: ∎
42441.7000 - 47087.3333 [     3]: ∎∎∎
47087.3333 - 51732.9667 [     0]:
51732.9667 - 56378.6000 [     1]: ∎
56378.6000 - 61024.2333 [     0]:
61024.2333 - 65669.8667 [     0]:
65669.8667 - 70315.5000 [     0]:
70315.5000 - 74961.1333 [     1]: ∎
74961.1333 - 79606.7667 [     0]:
79606.7667 - 84252.4000 [     0]:
84252.4000 - 88898.0333 [     0]:
88898.0333 - 93543.6667 [     0]:
93543.6667 - 98189.3000 [     0]:
98189.3000 - 102834.9333 [     0]:
102834.9333 - 107480.5667 [     1]: ∎
107480.5667 - 112126.2000 [     0]:
112126.2000 - 116771.8333 [     0]:
116771.8333 - 121417.4667 [     0]:
121417.4667 - 126063.1000 [     0]:
126063.1000 - 130708.7333 [     0]:
130708.7333 - 135354.3667 [     0]:
135354.3667 - 140000.0000 [     1]: ∎

Ideally we eliminate the waiting by switching to async
state machine for GC, but for now just yield to OS scheduler
after a reasonable wait time.

To choose yielding parameters I've measured
golang.org/x/benchmarks/http tail latencies with different yield
delays and oversubscription levels.

With no oversubscription (to the degree possible):

scang yield delay = 1, casgstatus yield delay = 1
Latency-50   1.41ms ±15%  1.41ms ± 5%    ~     (p=0.611 n=13+12)
Latency-95   5.21ms ± 2%  5.15ms ± 2%  -1.15%  (p=0.012 n=13+13)
Latency-99   7.16ms ± 2%  7.05ms ± 2%  -1.54%  (p=0.002 n=13+13)
Latency-999  10.7ms ± 9%  10.2ms ±10%  -5.46%  (p=0.004 n=12+13)

scang yield delay = 5000, casgstatus yield delay = 3000
Latency-50   1.41ms ±15%  1.41ms ± 8%    ~     (p=0.511 n=13+13)
Latency-95   5.21ms ± 2%  5.14ms ± 2%  -1.23%  (p=0.006 n=13+13)
Latency-99   7.16ms ± 2%  7.02ms ± 2%  -1.94%  (p=0.000 n=13+13)
Latency-999  10.7ms ± 9%  10.1ms ± 8%  -6.14%  (p=0.000 n=12+13)

scang yield delay = 10000, casgstatus yield delay = 5000
Latency-50   1.41ms ±15%  1.45ms ± 6%    ~     (p=0.724 n=13+13)
Latency-95   5.21ms ± 2%  5.18ms ± 1%    ~     (p=0.287 n=13+13)
Latency-99   7.16ms ± 2%  7.05ms ± 2%  -1.64%  (p=0.002 n=13+13)
Latency-999  10.7ms ± 9%  10.0ms ± 5%  -6.72%  (p=0.000 n=12+13)

scang yield delay = 30000, casgstatus yield delay = 10000
Latency-50   1.41ms ±15%  1.51ms ± 7%  +6.57%  (p=0.002 n=13+13)
Latency-95   5.21ms ± 2%  5.21ms ± 2%    ~     (p=0.960 n=13+13)
Latency-99   7.16ms ± 2%  7.06ms ± 2%  -1.50%  (p=0.012 n=13+13)
Latency-999  10.7ms ± 9%  10.0ms ± 6%  -6.49%  (p=0.000 n=12+13)

scang yield delay = 100000, casgstatus yield delay = 50000
Latency-50   1.41ms ±15%  1.53ms ± 6%  +8.48%  (p=0.000 n=13+12)
Latency-95   5.21ms ± 2%  5.23ms ± 2%    ~     (p=0.287 n=13+13)
Latency-99   7.16ms ± 2%  7.08ms ± 2%  -1.21%  (p=0.004 n=13+13)
Latency-999  10.7ms ± 9%   9.9ms ± 3%  -7.99%  (p=0.000 n=12+12)

scang yield delay = 200000, casgstatus yield delay = 100000
Latency-50   1.41ms ±15%  1.47ms ± 5%    ~     (p=0.072 n=13+13)
Latency-95   5.21ms ± 2%  5.17ms ± 2%    ~     (p=0.091 n=13+13)
Latency-99   7.16ms ± 2%  7.02ms ± 2%  -1.99%  (p=0.000 n=13+13)
Latency-999  10.7ms ± 9%   9.9ms ± 5%  -7.86%  (p=0.000 n=12+13)

With slight oversubscription (another instance of http benchmark
was running in background with reduced GOMAXPROCS):

scang yield delay = 1, casgstatus yield delay = 1
Latency-50    840µs ± 3%   804µs ± 3%  -4.37%  (p=0.000 n=15+18)
Latency-95   6.52ms ± 4%  6.03ms ± 4%  -7.51%  (p=0.000 n=18+18)
Latency-99   10.8ms ± 7%  10.0ms ± 4%  -7.33%  (p=0.000 n=18+14)
Latency-999  18.0ms ± 9%  16.8ms ± 7%  -6.84%  (p=0.000 n=18+18)

scang yield delay = 5000, casgstatus yield delay = 3000
Latency-50    840µs ± 3%   809µs ± 3%  -3.71%  (p=0.000 n=15+17)
Latency-95   6.52ms ± 4%  6.11ms ± 4%  -6.29%  (p=0.000 n=18+18)
Latency-99   10.8ms ± 7%   9.9ms ± 6%  -7.55%  (p=0.000 n=18+18)
Latency-999  18.0ms ± 9%  16.5ms ±11%  -8.49%  (p=0.000 n=18+18)

scang yield delay = 10000, casgstatus yield delay = 5000
Latency-50    840µs ± 3%   823µs ± 5%  -2.06%  (p=0.002 n=15+18)
Latency-95   6.52ms ± 4%  6.32ms ± 3%  -3.05%  (p=0.000 n=18+18)
Latency-99   10.8ms ± 7%  10.2ms ± 4%  -5.22%  (p=0.000 n=18+18)
Latency-999  18.0ms ± 9%  16.7ms ±10%  -7.09%  (p=0.000 n=18+18)

scang yield delay = 30000, casgstatus yield delay = 10000
Latency-50    840µs ± 3%   836µs ± 5%    ~     (p=0.442 n=15+18)
Latency-95   6.52ms ± 4%  6.39ms ± 3%  -2.00%  (p=0.000 n=18+18)
Latency-99   10.8ms ± 7%  10.2ms ± 6%  -5.15%  (p=0.000 n=18+17)
Latency-999  18.0ms ± 9%  16.6ms ± 8%  -7.48%  (p=0.000 n=18+18)

scang yield delay = 100000, casgstatus yield delay = 50000
Latency-50    840µs ± 3%   836µs ± 6%    ~     (p=0.401 n=15+18)
Latency-95   6.52ms ± 4%  6.40ms ± 4%  -1.79%  (p=0.010 n=18+18)
Latency-99   10.8ms ± 7%  10.2ms ± 5%  -4.95%  (p=0.000 n=18+18)
Latency-999  18.0ms ± 9%  16.5ms ±14%  -8.17%  (p=0.000 n=18+18)

scang yield delay = 200000, casgstatus yield delay = 100000
Latency-50    840µs ± 3%   828µs ± 2%  -1.49%  (p=0.001 n=15+17)
Latency-95   6.52ms ± 4%  6.38ms ± 4%  -2.04%  (p=0.001 n=18+18)
Latency-99   10.8ms ± 7%  10.2ms ± 4%  -4.77%  (p=0.000 n=18+18)
Latency-999  18.0ms ± 9%  16.9ms ± 9%  -6.23%  (p=0.000 n=18+18)

With significant oversubscription (background http benchmark
was running with full GOMAXPROCS):

scang yield delay = 1, casgstatus yield delay = 1
Latency-50   1.32ms ±12%  1.30ms ±13%    ~     (p=0.454 n=14+14)
Latency-95   16.3ms ±10%  15.3ms ± 7%  -6.29%  (p=0.001 n=14+14)
Latency-99   29.4ms ±10%  27.9ms ± 5%  -5.04%  (p=0.001 n=14+12)
Latency-999  49.9ms ±19%  45.9ms ± 5%  -8.00%  (p=0.008 n=14+13)

scang yield delay = 5000, casgstatus yield delay = 3000
Latency-50   1.32ms ±12%  1.29ms ± 9%    ~     (p=0.227 n=14+14)
Latency-95   16.3ms ±10%  15.4ms ± 5%  -5.27%  (p=0.002 n=14+14)
Latency-99   29.4ms ±10%  27.9ms ± 6%  -5.16%  (p=0.001 n=14+14)
Latency-999  49.9ms ±19%  46.8ms ± 8%  -6.21%  (p=0.050 n=14+14)

scang yield delay = 10000, casgstatus yield delay = 5000
Latency-50   1.32ms ±12%  1.35ms ± 9%     ~     (p=0.401 n=14+14)
Latency-95   16.3ms ±10%  15.0ms ± 4%   -7.67%  (p=0.000 n=14+14)
Latency-99   29.4ms ±10%  27.4ms ± 5%   -6.98%  (p=0.000 n=14+14)
Latency-999  49.9ms ±19%  44.7ms ± 5%  -10.56%  (p=0.000 n=14+11)

scang yield delay = 30000, casgstatus yield delay = 10000
Latency-50   1.32ms ±12%  1.36ms ±10%     ~     (p=0.246 n=14+14)
Latency-95   16.3ms ±10%  14.9ms ± 5%   -8.31%  (p=0.000 n=14+14)
Latency-99   29.4ms ±10%  27.4ms ± 7%   -6.70%  (p=0.000 n=14+14)
Latency-999  49.9ms ±19%  44.9ms ±15%  -10.13%  (p=0.003 n=14+14)

scang yield delay = 100000, casgstatus yield delay = 50000
Latency-50   1.32ms ±12%  1.41ms ± 9%  +6.37%  (p=0.008 n=14+13)
Latency-95   16.3ms ±10%  15.1ms ± 8%  -7.45%  (p=0.000 n=14+14)
Latency-99   29.4ms ±10%  27.5ms ±12%  -6.67%  (p=0.002 n=14+14)
Latency-999  49.9ms ±19%  45.9ms ±16%  -8.06%  (p=0.019 n=14+14)

scang yield delay = 200000, casgstatus yield delay = 100000
Latency-50   1.32ms ±12%  1.42ms ±10%   +7.21%  (p=0.003 n=14+14)
Latency-95   16.3ms ±10%  15.0ms ± 7%   -7.59%  (p=0.000 n=14+14)
Latency-99   29.4ms ±10%  27.3ms ± 8%   -7.20%  (p=0.000 n=14+14)
Latency-999  49.9ms ±19%  44.8ms ± 8%  -10.21%  (p=0.001 n=14+13)

All numbers are on 8 cores and with GOGC=10 (http benchmark has
tiny heap, few goroutines and low allocation rate, so by default
GC barely affects tail latency).

10us/5us yield delays seem to provide a reasonable compromise
and give 5-10% tail latency reduction. That's what used in this change.

go install -a std results on 4 core machine:

name      old time/op  new time/op  delta
Time       8.39s ± 2%   7.94s ± 2%  -5.34%  (p=0.000 n=47+49)
UserTime   24.6s ± 2%   22.9s ± 2%  -6.76%  (p=0.000 n=49+49)
SysTime    1.77s ± 9%   1.89s ±11%  +7.00%  (p=0.000 n=49+49)
CpuLoad    315ns ± 2%   313ns ± 1%  -0.59%  (p=0.000 n=49+48) # %CPU
MaxRSS    97.1ms ± 4%  97.5ms ± 9%    ~     (p=0.838 n=46+49) # bytes

Update #14396
Update #14189

Change-Id: I3f4109bf8f7fd79b39c466576690a778232055a2
Reviewed-on: https://go-review.googlesource.com/21503
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2016-04-05 15:52:03 +00:00
Dmitry Vyukov 3b246fa863 runtime: sleep less when we can do work
Usleep(100) in runqgrab negatively affects latency and throughput
of parallel application. We are sleeping instead of doing useful work.
This is effect is particularly visible on windows where minimal
sleep duration is 1-15ms.

Reduce sleep from 100us to 3us and use osyield on windows.
Sync chan send/recv takes ~50ns, so 3us gives us ~50x overshoot.

benchmark                    old ns/op     new ns/op     delta
BenchmarkChanSync-12         216           217           +0.46%
BenchmarkChanSyncWork-12     27213         25816         -5.13%

CPU consumption goes up from 106% to 108% in the first case,
and from 107% to 125% in the second case.

Test case from #14790 on windows:

BenchmarkDefaultResolution-8  4583372   29720    -99.35%
Benchmark1ms-8                992056    30701    -96.91%

99-th latency percentile for HTTP request serving is improved by up to 15%
(see http://golang.org/cl/20835 for details).

The following benchmarks are from the change that originally added this sleep
(see https://golang.org/s/go15gomaxprocs):

name        old time/op  new time/op  delta
Chain       22.6µs ± 2%  22.7µs ± 6%    ~      (p=0.905 n=9+10)
ChainBuf    22.4µs ± 3%  22.5µs ± 4%    ~      (p=0.780 n=9+10)
Chain-2     23.5µs ± 4%  24.9µs ± 1%  +5.66%   (p=0.000 n=10+9)
ChainBuf-2  23.7µs ± 1%  24.4µs ± 1%  +3.31%   (p=0.000 n=9+10)
Chain-4     24.2µs ± 2%  25.1µs ± 3%  +3.70%   (p=0.000 n=9+10)
ChainBuf-4  24.4µs ± 5%  25.0µs ± 2%  +2.37%  (p=0.023 n=10+10)
Powser       2.37s ± 1%   2.37s ± 1%    ~       (p=0.423 n=8+9)
Powser-2     2.48s ± 2%   2.57s ± 2%  +3.74%   (p=0.000 n=10+9)
Powser-4     2.66s ± 1%   2.75s ± 1%  +3.40%  (p=0.000 n=10+10)
Sieve        13.3s ± 2%   13.3s ± 2%    ~      (p=1.000 n=10+9)
Sieve-2      7.00s ± 2%   7.44s ±16%    ~      (p=0.408 n=8+10)
Sieve-4      4.13s ±21%   3.85s ±22%    ~       (p=0.113 n=9+9)

Fixes #14790

Change-Id: Ie7c6a1c4f9c8eb2f5d65ab127a3845386d6f8b5d
Reviewed-on: https://go-review.googlesource.com/20835
Reviewed-by: Austin Clements <austin@google.com>
2016-04-05 15:32:06 +00:00
Alex Brainman ffeae198d0 runtime: leave directory before removing it in TestDLLPreloadMitigation
Fixes #15120

Change-Id: I1d9a192ac163826bad8b46e8c0b0b9e218e69570
Reviewed-on: https://go-review.googlesource.com/21520
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-05 04:32:49 +00:00
Alex Brainman fcac88098b runtime: remove race out of BenchmarkChanToSyscallPing1ms
Fixes #15119

Change-Id: I31445bf282a5e2a160ff4e66c5a592b989a5798f
Reviewed-on: https://go-review.googlesource.com/21448
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-04-05 00:52:40 +00:00
Austin Clements 61f56e925e runtime: fix pagesInUse accounting
When we grow the heap, we create a temporary "in use" span for the
memory acquired from the OS and then free that span to link it into
the heap. Hence, we (1) increase pagesInUse when we make the temporary
span so that (2) freeing the span will correctly decrease it.

However, currently step (1) increases pagesInUse by the number of
pages requested from the heap, while step (2) decreases it by the
number of pages requested from the OS (the size of the temporary
span). These aren't necessarily the same, since we round up the number
of pages we request from the OS, so steps 1 and 2 don't necessarily
cancel out like they're supposed to. Over time, this can add up and
cause pagesInUse to underflow and wrap around to 2^64. The garbage
collector computes the sweep ratio from this, so if this happens, the
sweep ratio becomes effectively infinite, causing the first allocation
on each P in a sweep cycle to sweep the entire heap. This makes
sweeping effectively STW.

Fix this by increasing pagesInUse in step 1 by the number of pages
requested from the OS, so that the two steps correctly cancel out. We
add a test that checks that the running total matches the actual state
of the heap.

Fixes #15022. For 1.6.x.

Change-Id: Iefd9d6abe37d0d447cbdbdf9941662e4f18eeffc
Reviewed-on: https://go-review.googlesource.com/21280
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2016-04-04 15:33:26 +00:00
Alex Brainman 1f5b1b2b66 runtime: change osyield to use Windows SwitchToThread
It appears that windows osyield is just 15ms sleep on my computer
(see benchmarks below). Replace NtWaitForSingleObject in osyield
with SwitchToThread (as suggested by Dmitry).

Also add issue #14790 related benchmarks, so we can track perfomance
changes in CL 20834 and CL 20835 and beyond.

Update #14790

benchmark                             old ns/op     new ns/op     delta
BenchmarkChanToSyscallPing1ms         1953200       1953000       -0.01%
BenchmarkChanToSyscallPing15ms        31562904      31248400      -1.00%
BenchmarkSyscallToSyscallPing1ms      5247          4202          -19.92%
BenchmarkSyscallToSyscallPing15ms     5260          4374          -16.84%
BenchmarkChanToChanPing1ms            474           494           +4.22%
BenchmarkChanToChanPing15ms           468           489           +4.49%
BenchmarkOsYield1ms                   980018        75.5          -99.99%
BenchmarkOsYield15ms                  15625200      75.8          -100.00%

Change-Id: I1b4cc7caca784e2548ee3c846ca07ef152ebedce
Reviewed-on: https://go-review.googlesource.com/21294
Run-TryBot: Alex Brainman <alex.brainman@gmail.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-04 10:05:05 +00:00
Christopher Nelson ed8f0e5c33 cmd/go: fix -buildmode=c-archive should work on windows
Add supporting code for runtime initialization, including both
32- and 64-bit x86 architectures.

Add .ctors section on Windows to PE .o files, and INITENTRY to .ctors
section to plug in to the GCC C/C++ startup initialization mechanism.
This allows the Go runtime to initialize itself. Add .text section
symbol for .ctor relocations. Note: This is unlikely to be useful for
MSVC-based toolchains.

Fixes #13494

Change-Id: I4286a96f70e5f5228acae88eef46e2bed95813f3
Reviewed-on: https://go-review.googlesource.com/18057
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
2016-04-04 03:38:25 +00:00
Eric Engestrom 7a8caf7d43 all: fix spelling mistakes
Signed-off-by: Eric Engestrom <eric@engestrom.ch>

Change-Id: I91873aaebf79bdf1c00d38aacc1a1fb8d79656a7
Reviewed-on: https://go-review.googlesource.com/21433
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-03 17:03:15 +00:00
Brad Fitzpatrick 683448a304 runtime, syscall: only search for Windows DLLs in the System32 directory
Make sure that for any DLL that Go uses itself, we only look for the
DLL in the Windows System32 directory, guarding against DLL preloading
attacks.

(Unless the Windows version is ancient and LoadLibraryEx is
unavailable, in which case the user probably has bigger security
problems anyway.)

This does not change the behavior of syscall.LoadLibrary or NewLazyDLL
if the DLL name is something unused by Go itself.

This change also intentionally does not add any new API surface. Instead,
x/sys is updated with a LoadLibraryEx function and LazyDLL.Flags in:
    https://golang.org/cl/21388

Updates #14959

Change-Id: I8d29200559cc19edf8dcf41dbdd39a389cd6aeb9
Reviewed-on: https://go-review.googlesource.com/21140
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-01 22:55:36 +00:00
Ian Lance Taylor 59fc42b230 runtime: allocate mp.cgocallers earlier
Fixes #15061.

Change-Id: I71f69f398d1c5f3a884bbd044786f1a5600d0fae
Reviewed-on: https://go-review.googlesource.com/21398
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-01 22:23:13 +00:00
Ian Lance Taylor 58394fd7d5 runtime/cgo: only build _cgo_callers if x_cgo_callers is defined
Fixes a problem when using the external linker on Solaris.  The Solaris
external linker still doesn't work due to issue #14957.

The problem is, for example, with `go test cmd/objdump`:

        objdump_test.go:71: go build fmthello.go: exit status 2
                # command-line-arguments
                /var/gcc/iant/go/pkg/tool/solaris_amd64/link: running gcc failed: exit status 1
                Undefined                       first referenced
                 symbol                             in file
                x_cgo_callers                       /tmp/go-link-355600608/go.o
                ld: fatal: symbol referencing errors
                collect2: error: ld returned 1 exit status

Change-Id: I54917cfd5c288ee77ea25c439489bd2c9124fe73
Reviewed-on: https://go-review.googlesource.com/21392
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2016-04-01 16:13:45 +00:00
Ian Lance Taylor ea306ae625 runtime: support symbolic backtrace of C code in a cgo crash
The new function runtime.SetCgoTraceback may be used to register stack
traceback and symbolizer functions, written in C, to do a stack
traceback from cgo code.

There is a sample implementation of runtime.SetCgoSymbolizer at
github.com/ianlancetaylor/cgosymbolizer.  Just importing that package is
sufficient to get symbolic C backtraces.

Currently only supported on linux/amd64.

Change-Id: If96ee2eb41c6c7379d407b9561b87557bfe47341
Reviewed-on: https://go-review.googlesource.com/17761
Reviewed-by: Austin Clements <austin@google.com>
2016-04-01 04:13:44 +00:00
Matthew Dempsky e6066711a0 cmd/compile, runtime: fix pedantic int->string conversions
Previously, cmd/compile rejected constant int->string conversions if
the integer value did not fit into an "int" value. Also, runtime
incorrectly truncated 64-bit values to 32-bit before checking if
they're a valid Unicode code point. According to the Go spec, both of
these cases should instead yield "\uFFFD".

Fixes #15039.

Change-Id: I3c8a3ad9a0780c0a8dc1911386a523800fec9764
Reviewed-on: https://go-review.googlesource.com/21344
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-03-31 10:28:23 +00:00
Keith Randall 4b209dbf0b runtime: don't use REP;MOVSB if CPUID doesn't say it is fast
Only use REP;MOVSB if:
 1) The CPUID flag says it is fast, and
 2) The pointers are unaligned
Otherwise, use REP;MOVSQ.

Update #14630

Change-Id: I946b28b87880c08e5eed1ce2945016466c89db66
Reviewed-on: https://go-review.googlesource.com/21300
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2016-03-31 02:54:10 +00:00
Austin Clements 17f6e5396b runtime: print sweep ratio if gcpacertrace>0
Change-Id: I5217bf4b75e110ca2946e1abecac6310ed84dad5
Reviewed-on: https://go-review.googlesource.com/21205
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-03-30 02:27:58 +00:00
Michel Lespinasse 859b63cc09 cmd/compile: optimize remaining convT2I calls
See #14874
Updates #6853

This change adds a compiler optimization for non pointer shaped convT2I.
Since itab symbols are now emitted by the compiler, the itab address can
be passed directly to convT2I instead of passing the iface type and a
cache pointer argument.

Compilebench results for the 5-commits series ending here:

name       old time/op     new time/op     delta
Template       336ms ± 4%      344ms ± 4%   +2.61%          (p=0.027 n=9+8)
Unicode        165ms ± 6%      173ms ± 7%   +5.11%          (p=0.014 n=9+9)
GoTypes        1.09s ± 1%      1.06s ± 2%   -3.29%          (p=0.000 n=9+9)
Compiler       5.09s ±10%      4.75s ±10%   -6.64%        (p=0.011 n=10+10)
MakeBash       31.1s ± 5%      30.3s ± 3%     ~           (p=0.089 n=10+10)

name       old text-bytes  new text-bytes  delta
HelloSize       558k ± 0%       558k ± 0%   +0.02%        (p=0.000 n=10+10)
CmdGoSize      6.24M ± 0%      6.11M ± 0%   -2.11%        (p=0.000 n=10+10)

name       old data-bytes  new data-bytes  delta
HelloSize      3.66k ± 0%      3.74k ± 0%   +2.41%        (p=0.000 n=10+10)
CmdGoSize       134k ± 0%       162k ± 0%  +20.76%        (p=0.000 n=10+10)

name       old bss-bytes   new bss-bytes   delta
HelloSize       126k ± 0%       126k ± 0%     ~     (all samples are equal)
CmdGoSize       149k ± 0%       146k ± 0%   -2.17%        (p=0.000 n=10+10)

name       old exe-bytes   new exe-bytes   delta
HelloSize       924k ± 0%       924k ± 0%   +0.05%        (p=0.000 n=10+10)
CmdGoSize      9.77M ± 0%      9.62M ± 0%   -1.47%        (p=0.000 n=10+10)

Change-Id: Ib230ddc04988824035c32287ae544a965fedd344
Reviewed-on: https://go-review.googlesource.com/20902
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Run-TryBot: Michel Lespinasse <walken@google.com>
2016-03-29 02:21:50 +00:00
Michel Lespinasse 79688ca58f cmd/link: collect itablinks as a slice in moduledata
See #14874

This change tells the linker to collect all the itablink symbols and
collect them so that moduledata can have a slice of all compiler
generated itabs.

The logic is shamelessly adapted from what is done with typelink symbols.

Change-Id: Ie93b59acf0fcba908a876d506afbf796f222dbac
Reviewed-on: https://go-review.googlesource.com/20889
Reviewed-by: Keith Randall <khr@golang.org>
2016-03-29 02:18:56 +00:00
Michel Lespinasse f00bbd5f81 cmd/compile: emit itabs and itablinks
See #14874

This change tells the compiler to emit itab and itablink symbols in
situations where they could be useful; however the compiled code does
not actually make use of the new symbols yet.

Change-Id: I0db3e6ec0cb1f3b7cebd4c60229e4a48372fe586
Reviewed-on: https://go-review.googlesource.com/20888
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Run-TryBot: Michel Lespinasse <walken@google.com>
2016-03-29 02:18:03 +00:00
Michel Lespinasse 7043d2bb5e runtime: insert itabs into hash table during init
See #14874

This change makes the runtime register all compiler generated itabs
(as obtained from the moduledata) during init.

Change-Id: I9969a0985b99b8bda820a631f7fe4c78f1174cdf
Reviewed-on: https://go-review.googlesource.com/20900
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Michel Lespinasse <walken@google.com>
2016-03-29 02:14:49 +00:00
Matthew Dempsky deb83d0639 cmd/compile: remove unused write barrier helpers
These have been unused since CL 10316.

Passes toolstash -cmp.

Change-Id: Icc19f3fcc7275fbee1c665f704e10a110ecce2a5
Reviewed-on: https://go-review.googlesource.com/21242
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2016-03-29 00:16:55 +00:00
Shinji Tanaka 0f86d1edfb runtime: use set_thread_area instead of modify_ldt on linux/386
linux/386 depends on modify_ldt system call, but recent Linux kernels
can disable this system call. Any Go programs built as linux/386
crash with the message 'Trace/breakpoint trap'.

The kernel config CONFIG_MODIFY_LDT_SYSCALL, which control
enable/disable modify_ldt, is disabled on Amazon Linux 2016.03.

This fixes this problem by using set_thread_area instead of modify_ldt
on linux/386.

Fixes #14795.

Change-Id: I0cc5139e40e9e5591945164156a77b6bdff2c7f1
Reviewed-on: https://go-review.googlesource.com/21190
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Minux Ma <minux@golang.org>
2016-03-28 16:56:38 +00:00
David Chase 8eec2bbfbc cmd/compile: added some intrinsics to SSA back end
One intrinsic was needed to help get the very best
performance out of a future GC; as long as that one was
being added, I also added Bswap since that is sometimes
a handy thing to have.  I had intended to fill out the
bit-scan intrinsic family, but the mismatch between the
"scan forward" instruction and "count leading zeroes"
was large enough to cause me to leave it out -- it poses
a dilemma that I'd rather dodge right now.

These intrinsics are not exposed for general use.
That's a separate issue requiring an API proposal change
( https://github.com/golang/proposal )

All intrinsics are tested, both that they are substituted
on the appropriate architecture, and that they produce the
expected result.

Change-Id: I5848037cfd97de4f75bdc33bdd89bba00af4a8ee
Reviewed-on: https://go-review.googlesource.com/20564
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-28 16:29:59 +00:00
Matthew Dempsky 995fb0319e cmd/compile: fix stringtoslicebytetmp optimization
Fixes #14973.

Change-Id: Iea68c9deca9429bde465c9ae05639209fe0ccf72
Reviewed-on: https://go-review.googlesource.com/21175
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-03-27 19:12:37 +00:00
Shenghou Ma a7f7a9cca7 runtime, runtime/cgo: save callee-saved FP registers on arm64
For #14876.

Change-Id: I0992859264cbaf9c9b691fad53345bbb01b4cf3b
Reviewed-on: https://go-review.googlesource.com/21085
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-25 23:04:44 +00:00
Shenghou Ma 71916437fb runtime/cgo: save callee-saved xmm registers on windows/amd64
For #14876.

Change-Id: I33947f74e8058437a784862f1f064974afc99250
Reviewed-on: https://go-review.googlesource.com/21084
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-03-25 23:04:33 +00:00
Joe Sylve 562e38c0af runtime: fix signal handling on Solaris
This fixes the problems with signal handling that were inadvertently
introduced in https://go-review.googlesource.com/21006.

Fixes #14899

Change-Id: Ia746914dcb3146a52413d32c57b089af763f0810
Reviewed-on: https://go-review.googlesource.com/21145
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-03-25 21:38:47 +00:00
Marvin Stenger 6b0688f742 runtime: speed up growslice by avoiding divisions 2
This is a follow-up of https://go-review.googlesource.com/#/c/20653/

Special case computation for slices with elements of byte size or
pointer size.

name                      old time/op  new time/op  delta
GrowSliceBytes-4          86.2ns ± 3%  75.4ns ± 2%  -12.50%  (p=0.000 n=20+20)
GrowSliceInts-4            161ns ± 3%   136ns ± 3%  -15.59%  (p=0.000 n=19+19)
GrowSlicePtr-4             239ns ± 2%   233ns ± 2%   -2.52%  (p=0.000 n=20+20)
GrowSliceStruct24Bytes-4   258ns ± 3%   256ns ± 3%     ~     (p=0.134 n=20+20)

Change-Id: Ice5fa648058fe9d7fa89dee97ca359966f671128
Reviewed-on: https://go-review.googlesource.com/21101
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-25 16:47:56 +00:00
Richard Miller 967b9940b4 runtime: avoid fork/exit race in plan9
There's a race between runtime.goexitsall killing all OS processes
of a go program in order to exit, and runtime.newosproc forking a
new one.  If the new process has been created but not yet stored
its pid in m.procid, it will not be killed by goexitsall and
deadlock results.

This CL prevents the race by making the newly forked process
check whether the program is exiting.  It also prevents a
potential "shoot-out" if multiple goroutines call Exit at
the same time, which could possibly lead to two processes
killing each other and leaving the rest deadlocked.

Change-Id: I3170b4a62d2461f6b029b3d6aad70373714ed53e
Reviewed-on: https://go-review.googlesource.com/21135
Run-TryBot: David du Colombier <0intro@gmail.com>
Reviewed-by: Marvin Stenger <marvin.stenger94@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David du Colombier <0intro@gmail.com>
2016-03-25 15:04:45 +00:00
Dmitry Vyukov ea0386f85f runtime: improve randomized stealing logic
During random stealing we steal 4*GOMAXPROCS times from random procs.
One would expect that most of the time we check all procs this way,
but due to low quality PRNG we actually miss procs with frightening
probability. Below are modelling experiment results for 1e6 tries:

GOMAXPROCS = 2 : missed 1 procs 7944 times

GOMAXPROCS = 3 : missed 1 procs 101620 times
GOMAXPROCS = 3 : missed 2 procs 3571 times

GOMAXPROCS = 4 : missed 1 procs 63916 times
GOMAXPROCS = 4 : missed 2 procs 61 times
GOMAXPROCS = 4 : missed 3 procs 16 times

GOMAXPROCS = 5 : missed 1 procs 133136 times
GOMAXPROCS = 5 : missed 2 procs 1025 times
GOMAXPROCS = 5 : missed 3 procs 101 times
GOMAXPROCS = 5 : missed 4 procs 15 times

GOMAXPROCS = 8 : missed 1 procs 151765 times
GOMAXPROCS = 8 : missed 2 procs 5057 times
GOMAXPROCS = 8 : missed 3 procs 1726 times
GOMAXPROCS = 8 : missed 4 procs 68 times

GOMAXPROCS = 12 : missed 1 procs 199081 times
GOMAXPROCS = 12 : missed 2 procs 27489 times
GOMAXPROCS = 12 : missed 3 procs 3113 times
GOMAXPROCS = 12 : missed 4 procs 233 times
GOMAXPROCS = 12 : missed 5 procs 9 times

GOMAXPROCS = 16 : missed 1 procs 237477 times
GOMAXPROCS = 16 : missed 2 procs 30037 times
GOMAXPROCS = 16 : missed 3 procs 9466 times
GOMAXPROCS = 16 : missed 4 procs 1334 times
GOMAXPROCS = 16 : missed 5 procs 192 times
GOMAXPROCS = 16 : missed 6 procs 5 times
GOMAXPROCS = 16 : missed 7 procs 1 times
GOMAXPROCS = 16 : missed 8 procs 1 times

A missed proc won't lead to underutilization because we check all procs
again after dropping P. But it can lead to an unpleasant situation
when we miss a proc, drop P, check all procs, discover work, acquire P,
miss the proc again, repeat.

Improve stealing logic to cover all procs.
Also don't enter spinning mode and try to steal when there is nobody around.

Change-Id: Ibb6b122cc7fb836991bad7d0639b77c807aab4c2
Reviewed-on: https://go-review.googlesource.com/20836
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Marvin Stenger <marvin.stenger94@gmail.com>
2016-03-25 11:00:48 +00:00
David Crawshaw 24ce64d1a9 cmd/compile, runtime: new static name encoding
Create a byte encoding designed for static Go names.

It is intended to be a compact representation of a name
and optional tag data that can be turned into a Go string
without allocating, and describes whether or not it is
exported without unicode table.

The encoding is described in reflect/type.go:

// The first byte is a bit field containing:
//
//	1<<0 the name is exported
//	1<<1 tag data follows the name
//	1<<2 pkgPath *string follow the name and tag
//
// The next two bytes are the data length:
//
//	 l := uint16(data[1])<<8 | uint16(data[2])
//
// Bytes [3:3+l] are the string data.
//
// If tag data follows then bytes 3+l and 3+l+1 are the tag length,
// with the data following.
//
// If the import path follows, then ptrSize bytes at the end of
// the data form a *string. The import path is only set for concrete
// methods that are defined in a different package than their type.

Shrinks binary sizes:

	cmd/go: 164KB (1.6%)
	jujud:  1.0MB (1.5%)

For #6853.

Change-Id: I46b6591015b17936a443c9efb5009de8dfe8b609
Reviewed-on: https://go-review.googlesource.com/20968
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-03-25 00:13:49 +00:00
Joe Sylve df2b2eb63d runtime: improve last ditch signal forwarding for Unix libraries
The current runtime attempts to forward signals generated by non-Go
code to the original signal handler.  If it can't call the original
handler directly, it currently attempts to re-raise the signal after
resetting the handler.  In this case, the original context is lost.

This fix prevents that problem by simply returning from the go signal
handler after resetting the original handler.  It only does this when
the original handler is the system default handler, which in all cases
is known to not recover.  The signal is not reset, so it is retriggered
and the original handler takes over with the proper context.

Fixes #14899

Change-Id: Ib1c19dfa4b50d9732d7a453de3784c8141e1cbb3
Reviewed-on: https://go-review.googlesource.com/21006
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-03-24 19:34:17 +00:00
Marvin Stenger 44532f1a9d runtime: fix inconsistency in slice.go
Fixes #14938.

Additionally some simplifications along the way.

Change-Id: I2c5fb7e32dcc6fab68fff36a49cb72e715756abe
Reviewed-on: https://go-review.googlesource.com/21046
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
2016-03-24 18:17:28 +00:00
Elias Naur be10c51500 runtime/cgo: block signals to the iOS mach exception handler
For darwin/arm{,64} a non-Go thread is created to convert
EXC_BAD_ACCESS to panics. However, the Go signal handler refuse to
handle signals that would otherwise be ignored if they arrive at
non-Go threads.

Block all (posix) signals to that thread, making sure that
no unexpected signals arrive to it. At least one test, TestStop in
os/signal, depends on signals not arriving on any non-Go threads.

For #14318

Change-Id: I901467fb53bdadb0d03b0f1a537116c7f4754423
Reviewed-on: https://go-review.googlesource.com/21047
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2016-03-24 14:12:12 +00:00
Lynn Boger baec148767 bytes: Equal perf improvements on ppc64le/ppc64
The existing implementation for Equal and similar
functions in the bytes package operate on one byte at
at time.  This performs poorly on ppc64/ppc64le especially
when the byte buffers are large.  This change improves
those functions by loading and comparing double words where
possible.  The common code has been moved to a function
that can be shared by the other functions in this
file which perform the same type of comparison.
Further optimizations are done for the case where
>= 32 bytes are being compared.  The new function
memeqbody is used by memeq_varlen, Equal, and eqstring.

When running the bytes test with -test.bench=Equal

benchmark                     old MB/s     new MB/s     speedup
BenchmarkEqual1               164.83       129.49       0.79x
BenchmarkEqual6               563.51       445.47       0.79x
BenchmarkEqual9               656.15       1099.00      1.67x
BenchmarkEqual15              591.93       1024.30      1.73x
BenchmarkEqual16              613.25       1914.12      3.12x
BenchmarkEqual20              682.37       1687.04      2.47x
BenchmarkEqual32              807.96       3843.29      4.76x
BenchmarkEqual4K              1076.25      23280.51     21.63x
BenchmarkEqual4M              1079.30      13120.14     12.16x
BenchmarkEqual64M             1073.28      10876.92     10.13x

It was determined that the degradation in the smaller byte tests
were due to unfavorable code alignment of the single byte loop.

Fixes #14368

Change-Id: I0dd87382c28887c70f4fbe80877a8ba03c31d7cd
Reviewed-on: https://go-review.googlesource.com/20249
Reviewed-by: Minux Ma <minux@golang.org>
2016-03-23 14:21:15 +00:00
Keith Randall 6a33f7765f runtime: use MOVSB instead of MOVSQ for unaligned moves
MOVSB is quite a bit faster for unaligned moves.
Possibly we should use MOVSB all of the time, but Intel folks
say it might be a bit faster to use MOVSQ on some processors
(but not any I have access to at the moment).

benchmark                              old ns/op     new ns/op     delta
BenchmarkMemmove4096-8                 93.9          93.2          -0.75%
BenchmarkMemmoveUnalignedDst4096-8     256           151           -41.02%
BenchmarkMemmoveUnalignedSrc4096-8     175           90.5          -48.29%

Fixes #14630

Change-Id: I568e6d6590eb3615e6a699fb474020596be665ff
Reviewed-on: https://go-review.googlesource.com/20293
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-03-21 19:10:24 +00:00
Michael Munday cc17d1ba76 runtime/internal/sys: add s390x support
Change-Id: I928532b406a3457d2c5f75f4de7d46a3f795192e
Reviewed-on: https://go-review.googlesource.com/20939
Reviewed-by: Minux Ma <minux@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
2016-03-21 08:04:38 +00:00
Keith Randall 377eaa7494 runtime: add space
Missed this in review of 20812

Change-Id: I01e220499dcd58e1a7205e2a577dd9630a8b7174
Reviewed-on: https://go-review.googlesource.com/20819
Reviewed-by: Keith Randall <khr@golang.org>
2016-03-18 21:31:49 +00:00
Keith Randall 15ea61146e runtime: use unaligned loads on ppc64
benchmark                      old ns/op     new ns/op     delta
BenchmarkAlignedLoad-160       8.67          7.42          -14.42%
BenchmarkUnalignedLoad-160     8.63          7.37          -14.60%

Change-Id: Id4609d7b4038c4d2ec332efc4fe6f1adfb61b82b
Reviewed-on: https://go-review.googlesource.com/20812
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-18 19:21:53 +00:00
Marcel van Lohuizen 872ca73cad runtime: don't assume b.N > 0
Change-Id: I2e26717f2563d7633ffd15f4adf63c3d0ee3403f
Reviewed-on: https://go-review.googlesource.com/20856
Run-TryBot: Marcel van Lohuizen <mpvl@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2016-03-18 15:55:02 +00:00
Austin Clements 857d0b48db runtime: document sudog
Change-Id: I85c0bcf02842cc32dbc9bfdcea27efe871173574
Reviewed-on: https://go-review.googlesource.com/20774
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-03-17 18:57:10 +00:00