mirror/go - go - Git Fam. Sieh

Commit Graph

Author	SHA1	Message	Date
Rick Hudson	913db7685e	runtime: run background mark helpers only if work is available Prior to this CL whenever the GC marking was enabled and a P was looking for work we supplied a G to help the GC do its marking tasks. Once this G finished all the marking available it would release the P to find another available G. In the case where there was no work the P would drop into findrunnable which would execute the mark helper G which would immediately return and the P would drop into findrunnable again repeating the process. Since the P was always given a G to run it never blocks. This CL first checks if the GC mark helper G has available work and if not the P immediately falls through to its blocking logic. Fixes #10901 Change-Id: I94ac9646866ba64b7892af358888bc9950de23b5 Reviewed-on: https://go-review.googlesource.com/10189 Reviewed-by: Austin Clements <austin@google.com>	2015-05-19 15:57:50 +00:00
Austin Clements	f0dd002895	runtime: use separate count and note for forEachP Currently, forEachP reuses the stopwait and stopnote fields from stopTheWorld to track how many Ps have not responded to the safe-point request and to sleep until all Ps have responded. It was assumed this was safe because both stopTheWorld and forEachP must occur under the worldsema and hence stopwait and stopnote cannot be used for both purposes simultaneously and callers could always determine the appropriate use based on sched.gcwaiting (which is only set by stopTheWorld). However, this is not the case, since it's possible for there to be a window between when an M observes that gcwaiting is set and when it checks stopwait during which stopwait could have changed meanings. When this happens, the M decrements stopwait and may wake up stopnote, but does not otherwise participate in the forEachP protocol. As a result, stopwait is decremented too many times, so it may reach zero before all Ps have run the safe-point function, causing forEachP to wake up early. It will then either observe that some P has not run the safe-point function and panic with "P did not run fn", or the remaining P (or Ps) will run the safe-point function before it wakes up and it will observe that stopwait is negative and panic with "not stopped". Fix this problem by giving forEachP its own safePointWait and safePointNote fields. One known sequence of events that can cause this race is as follows. It involves three actors: G1 is running on M1 on P1. P1 has an empty run queue. G2/M2 is in a blocked syscall and has lost its P. (The details of this don't matter, it just needs to be in a position where it needs to grab an idle P.) GC just started on G3/M3/P3. (These aren't very involved, they just have to be separate from the other G's, M's, and P's.) 1. GC calls stopTheWorld(), which sets sched.gcwaiting to 1. Now G1/M1 begins to enter a syscall: 2. G1/M1 invokes reentersyscall, which sets the P1's status to _Psyscall. 
3. G1/M1's reentersyscall observes gcwaiting != 0 and calls entersyscall_gcwait. 4. G1/M1's entersyscall_gcwait blocks acquiring sched.lock. Back on GC: 5. stopTheWorld cas's P1's status to _Pgcstop, does other stuff, and returns. 6. GC does stuff and then calls startTheWorld(). 7. startTheWorld() calls procresize(), which sets P1's status to _Pidle and puts P1 on the idle list. Now G2/M2 returns from its syscall and takes over P1: 8. G2/M2 returns from its blocked syscall and gets P1 from the idle list. 9. G2/M2 acquires P1, which sets P1's status to _Prunning. 10. G2/M2 starts a new syscall and invokes reentersyscall, which sets P1's status to _Psyscall. Back on G1/M1: 11. G1/M1 finally acquires sched.lock in entersyscall_gcwait. At this point, G1/M1 still thinks it's running on P1. P1's status is _Psyscall, which is consistent with what G1/M1 is doing, but it's _Psyscall because G2/M2 put it into _Psyscall, not G1/M1. This is basically an ABA race on P1's status. Because forEachP currently shares stopwait with stopTheWorld, G1/M1's entersyscall_gcwait observes the non-zero stopwait set by forEachP, but mistakes it for a stopTheWorld. It cas's P1's status from _Psyscall (set by G2/M2) to _Pgcstop and proceeds to decrement stopwait one more time than forEachP was expecting. Fixes #10618. (See the issue for details on why the above race is safe when forEachP is not involved.) Prior to this commit, the command stress ./runtime.test -test.run TestFutexsleep\\|TestGoroutineProfile would reliably fail after a few hundred runs. With this commit, it ran for over 2 million runs and never crashed. Change-Id: I9a91ea20035b34b6e5f07ef135b144115f281f30 Reviewed-on: https://go-review.googlesource.com/10157 Reviewed-by: Russ Cox <rsc@golang.org>	2015-05-18 14:55:47 +00:00
Austin Clements	277acca286	runtime: hold worldsema while starting the world Currently, startTheWorld releases worldsema before starting the world. Since startTheWorld can change gomaxprocs after allowing Ps to run, this means that gomaxprocs can change while another P holds worldsema. Unfortunately, the garbage collector and forEachP assume that holding worldsema protects against changes in gomaxprocs (which it almost does). In particular, this is causing somewhat frequent "P did not run fn" crashes in forEachP in the runtime tests because gomaxprocs is changing between the several loops that forEachP does over all the Ps. Fix this by only releasing worldsema after the world is started. This relates to issue #10618. forEachP still fails under stress testing, but much less frequently. Change-Id: I085d627b70cca9ebe9af28fe73b9872f1bb224ff Reviewed-on: https://go-review.googlesource.com/10156 Reviewed-by: Russ Cox <rsc@golang.org>	2015-05-18 14:55:37 +00:00
Austin Clements	9c44a41dd5	runtime: disallow preemption during startTheWorld Currently, startTheWorld clears preemptoff for the current M before starting the world. A few callers increment m.locks around startTheWorld, presumably to prevent preemption any time during starting the world. This is almost certainly pointless (none of the other callers do this), but there's no harm in making startTheWorld keep preemption disabled until it's all done, which definitely lets us drop these m.locks manipulations. Change-Id: I8a93658abd0c72276c9bafa3d2c7848a65b4691a Reviewed-on: https://go-review.googlesource.com/10155 Reviewed-by: Russ Cox <rsc@golang.org>	2015-05-18 14:55:31 +00:00
Austin Clements	a1da255aa0	runtime: factor stoptheworld/starttheworld pattern There are several steps to stopping and starting the world and currently they're open-coded in several places. The garbage collector is the only thing that needs to stop and start the world in a non-trivial pattern. Replace all other uses with calls to higher-level functions that implement the entire pattern necessary to stop and start the world. This is a pure refactoring and should not change any code semantics. In the following commits, we'll make changes that are easier to do with this abstraction in place. This commit renames the old starttheworld to startTheWorldWithSema. This is a slight misnomer right now because the callers release worldsema just before calling this. However, a later commit will swap these and I don't want to think of another name in the meantime. Change-Id: I5dc97f87b44fb98963c49c777d7053653974c911 Reviewed-on: https://go-review.googlesource.com/10154 Reviewed-by: Russ Cox <rsc@golang.org>	2015-05-18 14:55:25 +00:00
Austin Clements	a0fc306023	runtime: eliminate runqvictims and a copy from runqsteal Currently, runqsteal steals Gs from another P into an intermediate buffer and then copies those Gs into the current P's run queue. This intermediate buffer itself was moved from the stack to the P in commit `c4fe503` to eliminate the cost of zeroing it on every steal. This commit follows up `c4fe503` by stealing directly into the current P's run queue, which eliminates the copy and the need for the intermediate buffer. The update to the tail pointer is only committed once the entire steal operation has succeeded, so the semantics of stealing do not change. Change-Id: Icdd7a0eb82668980bf42c0154b51eef6419fdd51 Reviewed-on: https://go-review.googlesource.com/9998 Reviewed-by: Russ Cox <rsc@golang.org> Run-TryBot: Austin Clements <austin@google.com>	2015-05-17 01:08:42 +00:00
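The idea in a0fc306023 of stealing straight into the destination queue and publishing the tail only at the end can be sketched roughly as follows. This is a simplified, single-threaded illustration, not the runtime's code: the `runq` type, the `put` and `stealHalf` methods, and the `ringSize` constant are all hypothetical names, goroutines are represented by plain integers, and the atomic operations the real scheduler needs are omitted.

```go
package main

import "fmt"

// ringSize is an arbitrary capacity chosen for this sketch.
const ringSize = 8

// runq is a hypothetical stand-in for a P's local run queue.
// head and tail only ever increase; slots are indexed modulo ringSize.
type runq struct {
	head, tail uint32
	buf        [ringSize]int // goroutine IDs stand in for *g values
}

// put appends g at the tail and reports whether there was room.
func (q *runq) put(g int) bool {
	if q.tail-q.head == ringSize {
		return false
	}
	q.buf[q.tail%ringSize] = g
	q.tail++
	return true
}

// stealHalf copies the larger half of victim's entries directly into q's
// ring, starting at q's current tail. q.tail is updated only after every
// entry has been copied, so no intermediate buffer is needed and a reader
// of q never sees a partially completed steal.
func (q *runq) stealHalf(victim *runq) int {
	n := victim.tail - victim.head
	n -= n / 2
	if free := ringSize - (q.tail - q.head); n > free {
		n = free
	}
	for i := uint32(0); i < n; i++ {
		q.buf[(q.tail+i)%ringSize] = victim.buf[(victim.head+i)%ringSize]
	}
	victim.head += n // the victim gives up the stolen entries
	q.tail += n      // commit: the stolen entries become visible in q
	return int(n)
}

func main() {
	victim, thief := &runq{}, &runq{}
	for g := 1; g <= 5; g++ {
		victim.put(g)
	}
	n := thief.stealHalf(victim)
	fmt.Println("stolen:", n, "thief now holds:", thief.buf[:thief.tail])
}
```

In the actual scheduler the victim's head is advanced with a compare-and-swap and the copy is retried if it fails; the sketch leaves that out to keep the commit-at-the-end structure visible.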
Rick Hudson	c4fe503119	runtime: reduce thrashing of gs between ps One important use case is a pipeline computation that passes values from one Goroutine to the next and then exits or is placed in a wait state. If GOMAXPROCS > 1 a Goroutine running on P1 will enable another Goroutine and then immediately make P1 available to execute it. We need to prevent other Ps from stealing the G that P1 is about to execute. Otherwise the Gs can thrash between Ps causing unneeded synchronization and slowing down throughput. Fix this by changing the stealing logic so that when a P attempts to steal the only G on some other P's run queue, it will pause momentarily to allow the victim P to schedule the G. As part of optimizing stealing we also use a per-P victim queue to move stolen gs. This eliminates the zeroing of a stack local victim queue which turned out to be expensive. This CL is a necessary but not sufficient prerequisite to changing the default value of GOMAXPROCS to something > 1 which is another CL/discussion. For highly serialized programs, such as GoroutineRing below this can make a large difference. For larger and more parallel programs such as the x/benchmarks there is no noticeable detriment. ~/work/code/src/rsc.io/benchstat/benchstat old.txt new.txt name old mean new mean delta GoroutineRing 30.2µs × (0.98,1.01) 30.1µs × (0.97,1.04) ~ (p=0.941) GoroutineRing-2 113µs × (0.91,1.07) 30µs × (0.98,1.03) -73.17% (p=0.004) GoroutineRing-4 144µs × (0.98,1.02) 32µs × (0.98,1.01) -77.69% (p=0.000) GoroutineRingBuf 32.7µs × (0.97,1.03) 32.5µs × (0.97,1.02) ~ (p=0.795) GoroutineRingBuf-2 120µs × (0.92,1.08) 33µs × (1.00,1.00) -72.48% (p=0.004) GoroutineRingBuf-4 138µs × (0.92,1.06) 33µs × (1.00,1.00) -76.21% (p=0.003) The bench benchmarks show little impact. 
old new garbage 7032879 7011696 httpold 25509 25301 splayold 1022073 1019499 jsonold 28230624 28081433 Change-Id: I228c48fed8d85c9bbef16a7edc53ab7898506f50 Reviewed-on: https://go-review.googlesource.com/9872 Reviewed-by: Austin Clements <austin@google.com>	2015-05-13 12:55:24 +00:00
Austin Clements	350fd548b3	runtime: don't run runq tests on the system stack Running these tests on the system stack is problematic because they allocate Ps, which are large enough to overflow the system stack if they are stack-allocated. It used to be necessary to run these tests on the system stack because they were written in C, but since this is no longer the case, we can fix this problem by simply not running the tests on the system stack. This also means we no longer need the hack in one of these tests that forces the allocated Ps to escape to the heap, so eliminate that as well. Change-Id: I9064f5f8fd7f7b446ff39a22a70b172cfcb2dc57 Reviewed-on: https://go-review.googlesource.com/9923 Reviewed-by: Rick Hudson <rlh@golang.org> Run-TryBot: Austin Clements <austin@google.com>	2015-05-12 19:58:08 +00:00
Russ Cox	1635ab7dfe	runtime: remove wbshadow mode The write barrier shadow heap was very useful for developing the write barriers initially, but it's no longer used, clunky, and dragging the rest of the implementation down. The gccheckmark mode will find bugs due to missed barriers when they result in missed marks; wbshadow mode found the missed barriers more aggressively, but it required an entire separate copy of the heap. The gccheckmark mode requires no extra memory, making it more useful in practice. Compared to previous CL: name old mean new mean delta BinaryTree17 5.91s × (0.96,1.06) 5.72s × (0.97,1.03) -3.12% (p=0.000) Fannkuch11 4.32s × (1.00,1.00) 4.36s × (1.00,1.00) +0.91% (p=0.000) FmtFprintfEmpty 89.0ns × (0.93,1.10) 86.6ns × (0.96,1.11) ~ (p=0.077) FmtFprintfString 298ns × (0.98,1.06) 283ns × (0.99,1.04) -4.90% (p=0.000) FmtFprintfInt 286ns × (0.98,1.03) 283ns × (0.98,1.04) -1.09% (p=0.032) FmtFprintfIntInt 498ns × (0.97,1.06) 480ns × (0.99,1.02) -3.65% (p=0.000) FmtFprintfPrefixedInt 408ns × (0.98,1.02) 396ns × (0.99,1.01) -3.00% (p=0.000) FmtFprintfFloat 587ns × (0.98,1.01) 562ns × (0.99,1.01) -4.34% (p=0.000) FmtManyArgs 1.94µs × (0.99,1.02) 1.89µs × (0.99,1.01) -2.85% (p=0.000) GobDecode 15.8ms × (0.98,1.03) 15.7ms × (0.99,1.02) ~ (p=0.251) GobEncode 12.0ms × (0.96,1.09) 11.8ms × (0.98,1.03) -1.87% (p=0.024) Gzip 648ms × (0.99,1.01) 647ms × (0.99,1.01) ~ (p=0.688) Gunzip 143ms × (1.00,1.01) 143ms × (1.00,1.01) ~ (p=0.203) HTTPClientServer 90.3µs × (0.98,1.01) 89.1µs × (0.99,1.02) -1.30% (p=0.000) JSONEncode 31.6ms × (0.99,1.01) 31.7ms × (0.98,1.02) ~ (p=0.219) JSONDecode 107ms × (1.00,1.01) 111ms × (0.99,1.01) +3.58% (p=0.000) Mandelbrot200 6.03ms × (1.00,1.01) 6.01ms × (1.00,1.00) ~ (p=0.077) GoParse 6.53ms × (0.99,1.03) 6.54ms × (0.99,1.02) ~ (p=0.585) RegexpMatchEasy0_32 161ns × (1.00,1.01) 161ns × (0.98,1.05) ~ (p=0.948) RegexpMatchEasy0_1K 541ns × (0.99,1.01) 559ns × (0.98,1.01) +3.32% (p=0.000) RegexpMatchEasy1_32 138ns × (1.00,1.00) 137ns × 
(0.99,1.01) -0.55% (p=0.001) RegexpMatchEasy1_1K 887ns × (0.99,1.01) 878ns × (0.99,1.01) -0.98% (p=0.000) RegexpMatchMedium_32 253ns × (0.99,1.01) 252ns × (0.99,1.01) -0.39% (p=0.001) RegexpMatchMedium_1K 72.8µs × (1.00,1.00) 72.7µs × (1.00,1.00) ~ (p=0.485) RegexpMatchHard_32 3.85µs × (1.00,1.01) 3.85µs × (1.00,1.01) ~ (p=0.283) RegexpMatchHard_1K 117µs × (1.00,1.01) 117µs × (1.00,1.00) ~ (p=0.175) Revcomp 922ms × (0.97,1.08) 903ms × (0.98,1.05) -2.15% (p=0.021) Template 126ms × (0.99,1.01) 126ms × (0.99,1.01) ~ (p=0.943) TimeParse 628ns × (0.99,1.01) 634ns × (0.99,1.01) +0.92% (p=0.000) TimeFormat 668ns × (0.99,1.01) 698ns × (0.98,1.03) +4.53% (p=0.000) It's nice that the microbenchmarks are the ones helped the most, because those were the ones hurt the most by the conversion from 4-bit to 2-bit heap bitmaps. This CL brings the overall effect of that process to (compared to CL 9706 patch set 1): name old mean new mean delta BinaryTree17 5.87s × (0.94,1.09) 5.72s × (0.97,1.03) -2.57% (p=0.011) Fannkuch11 4.32s × (1.00,1.00) 4.36s × (1.00,1.00) +0.87% (p=0.000) FmtFprintfEmpty 89.1ns × (0.95,1.16) 86.6ns × (0.96,1.11) ~ (p=0.090) FmtFprintfString 283ns × (0.98,1.02) 283ns × (0.99,1.04) ~ (p=0.681) FmtFprintfInt 284ns × (0.98,1.04) 283ns × (0.98,1.04) ~ (p=0.620) FmtFprintfIntInt 486ns × (0.98,1.03) 480ns × (0.99,1.02) -1.27% (p=0.002) FmtFprintfPrefixedInt 400ns × (0.99,1.02) 396ns × (0.99,1.01) -0.84% (p=0.001) FmtFprintfFloat 566ns × (0.99,1.01) 562ns × (0.99,1.01) -0.80% (p=0.000) FmtManyArgs 1.91µs × (0.99,1.02) 1.89µs × (0.99,1.01) -1.10% (p=0.000) GobDecode 15.5ms × (0.98,1.05) 15.7ms × (0.99,1.02) +1.55% (p=0.005) GobEncode 11.9ms × (0.97,1.03) 11.8ms × (0.98,1.03) -0.97% (p=0.048) Gzip 648ms × (0.99,1.01) 647ms × (0.99,1.01) ~ (p=0.627) Gunzip 143ms × (1.00,1.00) 143ms × (1.00,1.01) ~ (p=0.482) HTTPClientServer 89.2µs × (0.99,1.02) 89.1µs × (0.99,1.02) ~ (p=0.740) JSONEncode 32.3ms × (0.97,1.06) 31.7ms × (0.98,1.02) -1.95% (p=0.002) JSONDecode 106ms × 
(0.99,1.01) 111ms × (0.99,1.01) +4.22% (p=0.000) Mandelbrot200 6.02ms × (1.00,1.00) 6.01ms × (1.00,1.00) ~ (p=0.417) GoParse 6.57ms × (0.97,1.06) 6.54ms × (0.99,1.02) ~ (p=0.404) RegexpMatchEasy0_32 162ns × (1.00,1.00) 161ns × (0.98,1.05) ~ (p=0.088) RegexpMatchEasy0_1K 561ns × (0.99,1.02) 559ns × (0.98,1.01) -0.47% (p=0.034) RegexpMatchEasy1_32 145ns × (0.95,1.04) 137ns × (0.99,1.01) -5.56% (p=0.000) RegexpMatchEasy1_1K 864ns × (0.99,1.04) 878ns × (0.99,1.01) +1.57% (p=0.000) RegexpMatchMedium_32 255ns × (0.99,1.04) 252ns × (0.99,1.01) -1.43% (p=0.001) RegexpMatchMedium_1K 73.9µs × (0.98,1.04) 72.7µs × (1.00,1.00) -1.55% (p=0.004) RegexpMatchHard_32 3.92µs × (0.98,1.04) 3.85µs × (1.00,1.01) -1.80% (p=0.003) RegexpMatchHard_1K 120µs × (0.98,1.04) 117µs × (1.00,1.00) -2.13% (p=0.001) Revcomp 936ms × (0.95,1.08) 903ms × (0.98,1.05) -3.58% (p=0.002) Template 130ms × (0.98,1.04) 126ms × (0.99,1.01) -2.98% (p=0.000) TimeParse 638ns × (0.98,1.05) 634ns × (0.99,1.01) ~ (p=0.198) TimeFormat 674ns × (0.99,1.01) 698ns × (0.98,1.03) +3.69% (p=0.000) Change-Id: Ia0e9b50b1d75a3c0c7556184cd966305574fe07c Reviewed-on: https://go-review.googlesource.com/9706 Reviewed-by: Rick Hudson <rlh@golang.org>	2015-05-11 14:55:11 +00:00
Daniel Morsing	db6f88a84b	runtime: enable profiling on g0 Since we now have stack information for code running on the systemstack, we can traceback over it. To make cpu profiles useful, add a case in gentraceback to jump over systemstack switches. Fixes #10609. Change-Id: I21f47fcc802c07c5d4a1ada56374314e388a6dc7 Reviewed-on: https://go-review.googlesource.com/9506 Reviewed-by: Dmitry Vyukov <dvyukov@google.com>	2015-05-11 08:44:30 +00:00
Michael Hudson-Doyle	fa896733b5	runtime: check consistency of all module data objects Current code just checks the consistency (that the functab is correctly sorted by PC, etc) of the moduledata object that the runtime belongs to. Change to check all of them. Change-Id: I544a44c5de7445fff87d3cdb4840ff04c5e2bf75 Reviewed-on: https://go-review.googlesource.com/9773 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2015-05-07 15:06:08 +00:00
Keith Randall	5a828cfcde	runtime: let freezetheworld work even when gomaxprocs=1 Freezetheworld still has stuff to do when gomaxprocs=1. In particular, signals can come in on other Ms (like the GC M, say) and the single user M is still running. Fixes #10546 Change-Id: I2f07f17d1c81e93cf905df2cb087112d436ca7e7 Reviewed-on: https://go-review.googlesource.com/9551 Reviewed-by: Dmitry Vyukov <dvyukov@google.com>	2015-05-05 15:11:10 +00:00
Keith Randall	a55b131393	cmd/dist, runtime: Make stack guard larger for non-optimized builds Kind of a hack, but makes the non-optimized builds pass. Fixes #10079 Change-Id: I26f41c546867f8f3f16d953dc043e784768f2aff Reviewed-on: https://go-review.googlesource.com/9552 Reviewed-by: Russ Cox <rsc@golang.org>	2015-05-01 15:41:55 +00:00
David Chase	7fbb1b36c3	cmd/internal/gc: improve flow of input params to output params This includes the following information in the per-function summary: outK = paramJ encoded in outK bits for paramJ outK = *paramJ encoded in outK bits for paramJ heap = paramJ EscHeap heap = *paramJ EscContentEscapes Note that (currently) if the address of a parameter is taken and returned, necessarily a heap allocation occurred to contain that reference, and the heap can never refer to stack, therefore the parameter and everything downstream from it escapes to the heap. The per-function summary information now has a tuneable number of bits (2 is probably noticeably better than 1, 3 is likely overkill, but it is now easy to check and the -m debugging output includes information that allows you to figure out if more would be better.) A new test was added to check pointer flow through struct-typed and *struct-typed parameters and returns; some of these are sensitive to the number of summary bits, and ought to yield better results with a more competent escape analysis algorithm. Another new test checks (some) correctness with array parameters, results, and operations. The old analysis inferred a piece of plan9 runtime was non-escaping by counteracting overconservative analysis with buggy analysis; with the bug fixed, the result was too conservative (and it's not easy to fix in this framework) so the source code was tweaked to get the desired result. A test was added against the discovered bug. The escape analysis was further improved splitting the "level" into 3 parts, one tracking the conventional "level" and the other two computing the highest-level-suffix-from-copy, which is used to generally model the cancelling effect of indirection applied to address-of. With the improved escape analysis enabled, it was necessary to modify one of the runtime tests because it now attempts to allocate too much on the (small, fixed-size) G0 (system) stack and this failed the test. 
Compiling src/std after touching src/runtime/*.go with -m logging turned on shows 420 fewer heap allocation sites (10538 vs 10968). Profiling allocations in src/html/template with for i in {1..5} ; do go tool 6g -memprofile=mastx.${i}.prof -memprofilerate=1 *.go; go tool pprof -alloc_objects -text mastx.${i}.prof ; done showed a 15% reduction in allocations performed by the compiler. Update #3753 Update #4720 Fixes #10466 Change-Id: I0fd97d5f5ac527b45f49e2218d158a6e89951432 Reviewed-on: https://go-review.googlesource.com/8202 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>	2015-05-01 13:47:20 +00:00
Austin Clements	33e0f3d853	runtime: fix some out of date comments and typos Change-Id: I061057414c722c5a0f03c709528afc8554114db6 Reviewed-on: https://go-review.googlesource.com/9367 Reviewed-by: Rick Hudson <rlh@golang.org>	2015-04-27 20:08:38 +00:00
Austin Clements	1b01910c06	runtime: rename gcController.findRunnable to findRunnableGCWorker This avoids confusion with the main findrunnable in the scheduler. Change-Id: I8cf40657557a8610a2fe5a2f74598518256ca7f0 Reviewed-on: https://go-review.googlesource.com/9305 Reviewed-by: Rick Hudson <rlh@golang.org>	2015-04-27 19:26:42 +00:00
Austin Clements	bb6320535d	runtime: replace STW for enabling write barriers with ragged barrier Currently, we use a full stop-the-world around enabling write barriers. This is to ensure that all Gs have enabled write barriers before any blackening occurs (either in gcBgMarkWorker() or in gcAssistAlloc()). However, there's no need to bring the whole world to a synchronous stop to ensure this. This change replaces the STW with a ragged barrier that ensures each P has individually observed that write barriers should be enabled before GC performs any blackening. Change-Id: If2f129a6a55bd8bdd4308067af2b739f3fb41955 Reviewed-on: https://go-review.googlesource.com/8207 Reviewed-by: Russ Cox <rsc@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>	2015-04-27 19:26:37 +00:00
Austin Clements	57afa76471	runtime: add ragged global barrier function This adds forEachP, which performs a general-purpose ragged global barrier. forEachP takes a callback and invokes it for every P at a GC safe point. Ps that are idle or in a syscall are considered to be at a continuous safe point. forEachP ensures that these Ps do not change state by forcing all syscall Ps into idle and holding the sched.lock. To ensure that Ps do not enter syscall or idle without running the safe-point function, this adds checks for a pending callback every place there is currently a gcwaiting check. We'll use forEachP to replace the STW around enabling the write barrier and to replace the current asynchronous per-M wbuf cache with a cooperatively managed per-P gcWork cache. Change-Id: Ie944f8ce1fead7c79bf271d2f42fcd61a41bb3cc Reviewed-on: https://go-review.googlesource.com/8206 Reviewed-by: Russ Cox <rsc@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>	2015-04-27 19:26:33 +00:00
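The ragged-barrier idea described in 57afa76471 can be approximated in ordinary Go as a rough sketch. Goroutines stand in for Ps, each one polls a per-worker flag at its own "safe point", and the caller waits until every worker has run the callback once without any worker ever stopping. All names here (`pool`, `worker`, `safePoint`, `forEach`, `runBarrier`) are hypothetical, and the sketch leaves out the runtime's handling of idle and syscall Ps, which the caller runs the function for itself.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// worker is a hypothetical stand-in for a P.
type worker struct {
	id      int
	pending int32 // 1 while a safe-point function is waiting to run here
}

// pool holds the workers and the state of the barrier in progress.
type pool struct {
	workers []*worker
	fn      func(id int)   // set before any pending flag is raised
	wg      sync.WaitGroup // counts workers that have not yet run fn
}

// safePoint is called by a worker between units of work. If a barrier is
// pending for this worker, it runs fn exactly once and acknowledges.
func (p *pool) safePoint(w *worker) {
	if atomic.CompareAndSwapInt32(&w.pending, 1, 0) {
		p.fn(w.id)
		p.wg.Done()
	}
}

// forEach asks every worker to run fn at its next safe point and returns
// once all of them have done so. Workers pass the barrier one at a time.
func (p *pool) forEach(fn func(id int)) {
	p.fn = fn
	p.wg.Add(len(p.workers))
	for _, w := range p.workers {
		atomic.StoreInt32(&w.pending, 1)
	}
	p.wg.Wait()
}

// run is the worker loop: do some work, then check for a pending barrier.
func (p *pool) run(w *worker, stop *int32, done *sync.WaitGroup) {
	defer done.Done()
	for atomic.LoadInt32(stop) == 0 {
		// A unit of ordinary work would go here.
		p.safePoint(w)
		runtime.Gosched()
	}
}

// runBarrier starts n workers, performs one barrier, and reports how many
// workers ran the safe-point function.
func runBarrier(n int) int64 {
	p := &pool{}
	for i := 0; i < n; i++ {
		p.workers = append(p.workers, &worker{id: i})
	}
	var stop int32
	var done sync.WaitGroup
	for _, w := range p.workers {
		done.Add(1)
		go p.run(w, &stop, &done)
	}
	var ran int64
	p.forEach(func(id int) { atomic.AddInt64(&ran, 1) })
	atomic.StoreInt32(&stop, 1)
	done.Wait()
	return atomic.LoadInt64(&ran)
}

func main() {
	fmt.Println("workers that ran the safe-point function:", runBarrier(4))
}
```

Note that the sketch gives the barrier its own counter (`wg`) rather than sharing one with some other stop mechanism, which is the separation the later fix in f0dd002895 introduces for forEachP.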
Austin Clements	b0b1a66052	runtime: reset spinning in mspinning if work was ready()ed This fixes a bug where the runtime ready()s a goroutine while setting up a new M that's initially marked as spinning, causing the scheduler to later panic when it finds work in the run queue of a P associated with a spinning M. Specifically, the sequence of events that can lead to this is: 1) sysmon calls handoffp to hand off a P stolen from a syscall. 2) handoffp sees no pending work on the P, so it calls startm with spinning set. 3) startm calls newm, which in turn calls allocm to allocate a new M. 4) allocm "borrows" the P we're handing off in order to do allocation and performs this allocation. 5) This allocation may assist the garbage collector, and this assist may detect the end of concurrent mark and ready() the main GC goroutine to signal this. 6) This ready()ing puts the GC goroutine on the run queue of the borrowed P. 7) newm starts the OS thread, which runs mstart and subsequently mstart1, which marks the M spinning because startm was called with spinning set. 8) mstart1 enters the scheduler, which panics because there's work on the run queue, but the M is marked spinning. To fix this, before marking the M spinning in step 7, add a check to see if work has been added to the P's run queue. If this is the case, undo the spinning instead. Fixes #10573. Change-Id: I4670495ae00582144a55ce88c45ae71de597cfa5 Reviewed-on: https://go-review.googlesource.com/9332 Reviewed-by: Russ Cox <rsc@golang.org> Run-TryBot: Austin Clements <austin@google.com>	2015-04-27 12:49:54 +00:00
Austin Clements	2a46f55b35	runtime: panic when idling a P with runnable Gs This adds a check that we never put a P on the idle list when it has work on its local run queue. Change-Id: Ifcfab750de60c335148a7f513d4eef17be03b6a7 Reviewed-on: https://go-review.googlesource.com/9324 Reviewed-by: Rick Hudson <rlh@golang.org> Reviewed-by: Dmitry Vyukov <dvyukov@google.com>	2015-04-27 12:49:49 +00:00
Austin Clements	0e6a6c510f	runtime: simplify process for starting GC goroutine Currently, when allocation reaches the GC trigger, the runtime uses readyExecute to start the GC goroutine immediately rather than wait for the scheduler to get around to the GC goroutine while the mutator continues to grow the heap. Now that the scheduler runs the most recently readied goroutine when a goroutine yields its time slice, this rigmarole is no longer necessary. The runtime can simply ready the GC goroutine and yield from the readying goroutine. Change-Id: I3b4ebadd2a72a923b1389f7598f82973dd5c8710 Reviewed-on: https://go-review.googlesource.com/9292 Reviewed-by: Rick Hudson <rlh@golang.org> Reviewed-by: Russ Cox <rsc@golang.org> Run-TryBot: Austin Clements <austin@google.com>	2015-04-24 15:13:05 +00:00
Austin Clements	e870f06c3f	runtime: yield time slice to most recently readied G Currently, when the runtime ready()s a G, it adds it to the end of the current P's run queue and continues running. If there are many other things in the run queue, this can result in a significant delay before the ready()d G actually runs and can hurt fairness when other Gs in the run queue are CPU hogs. For example, if there are three Gs sharing a P, one of which is a CPU hog that never voluntarily gives up the P and the other two of which are doing small amounts of work and communicating back and forth on an unbuffered channel, the two communicating Gs will get very little CPU time. Change this so that when G1 ready()s G2 and then blocks, the scheduler immediately hands off the remainder of G1's time slice to G2. In the above example, the two communicating Gs will now act as a unit and together get half of the CPU time, while the CPU hog gets the other half of the CPU time. This fixes the problem demonstrated by the ping-pong benchmark added in the previous commit: benchmark old ns/op new ns/op delta BenchmarkPingPongHog 684287 825 -99.88% On the x/benchmarks suite, this change improves the performance of garbage by ~6% (for GOMAXPROCS=1 and 4), and json by 28% and 36% for GOMAXPROCS=1 and 4. It has negligible effect on heap size. This has no effect on the go1 benchmark suite since those benchmarks are mostly single-threaded. Change-Id: I858a08eaa78f702ea98a5fac99d28a4ac91d339f Reviewed-on: https://go-review.googlesource.com/9289 Reviewed-by: Rick Hudson <rlh@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>	2015-04-24 15:12:52 +00:00
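The scenario in e870f06c3f (two goroutines exchanging values on unbuffered channels while a CPU hog shares the same P) can be reproduced with a small program along these lines. It is a sketch, not the benchmark from the Go tree: `pingPong` is a hypothetical helper, the round count is arbitrary, and the busy loop assumes a Go release with asynchronous preemption (Go 1.14 or later) so that the hog cannot monopolize a single P forever.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
	"time"
)

// pingPong bounces a value between two goroutines over unbuffered channels
// for the given number of rounds while a third goroutine spins without ever
// blocking. It returns the number of completed round trips and the time
// they took.
func pingPong(rounds int) (int, time.Duration) {
	var stop int32
	hogDone := make(chan struct{})
	go func() { // CPU hog: never gives up the P voluntarily
		defer close(hogDone)
		for atomic.LoadInt32(&stop) == 0 {
		}
	}()

	ping, pong := make(chan int), make(chan int)
	go func() { // echo goroutine: readied by each send on ping
		for v := range ping {
			pong <- v + 1
		}
	}()

	start := time.Now()
	count := 0
	for i := 0; i < rounds; i++ {
		ping <- i // readies the echo goroutine, then this goroutine blocks
		<-pong
		count++
	}
	elapsed := time.Since(start)

	close(ping)
	atomic.StoreInt32(&stop, 1)
	<-hogDone
	return count, elapsed
}

func main() {
	// Force all three goroutines onto one P, as in the commit's example,
	// and restore the previous setting afterwards.
	defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(1))

	n, d := pingPong(1000)
	fmt.Printf("%d round trips in %v (%v per round trip)\n", n, d, d/time.Duration(n))
}
```

With the hand-off behaviour the commit describes, each send passes the rest of the sender's time slice to the goroutine it just readied, so the communicating pair makes progress as a unit instead of waiting behind the hog after every exchange. Timings depend on the machine and Go version, so none are quoted here.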
Austin Clements	e5e52f4f2c	runtime: factor checking if P run queue is empty There are a variety of places where we check if a P's run queue is empty. This test is about to get slightly more complicated, so factor it out into a new function, runqempty. This function is inlinable, so this has no effect on performance. Change-Id: If4a0b01ffbd004937de90d8d686f6ded4aad2c6b Reviewed-on: https://go-review.googlesource.com/9287 Reviewed-by: Rick Hudson <rlh@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>	2015-04-24 15:12:42 +00:00
Austin Clements	e0c3d85f08	runtime: fix background marking at 25% utilization Currently, in accordance with the GC pacing proposal, we schedule background marking with a goal of achieving 25% utilization total between mutator assists and background marking. This is stricter than was set out in the Go 1.5 proposal, which suggests that the garbage collector can use 25% just for itself and anything the mutator does to help out is on top of that. It also has several technical drawbacks. Because mutator assist time is constantly changing and we can't have instantaneous information on background marking time, it effectively requires hitting a moving target based on out-of-date information. This works out in the long run, but works poorly for short GC cycles and on short time scales. Also, this requires time-multiplexing all Ps between the mutator and background GC since the goal utilization of background GC constantly fluctuates. This results in a complicated scheduling algorithm, poor affinity, and extra overheads from context switching. This change modifies the way we schedule and run background marking so that background marking always consumes 25% of GOMAXPROCS and mutator assist is in addition to this. This enables a much more robust scheduling algorithm where we pre-determine the number of Ps we should dedicate to background marking as well as the utilization goal for a single floating "remainder" mark worker. Change-Id: I187fa4c03ab6fe78012a84d95975167299eb9168 Reviewed-on: https://go-review.googlesource.com/9013 Reviewed-by: Rick Hudson <rlh@golang.org>	2015-04-21 15:35:50 +00:00
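The arithmetic behind e0c3d85f08 (a fixed 25% of GOMAXPROCS for background marking, split into whole dedicated Ps plus one fractional "remainder" worker) can be illustrated as follows. This is a sketch of the calculation as the commit message describes it; `markWorkerPlan`, `planMarkWorkers` and `backgroundUtilization` are hypothetical names rather than the runtime's identifiers.

```go
package main

import "fmt"

// backgroundUtilization is the 25% goal named in the commit message.
const backgroundUtilization = 0.25

// markWorkerPlan is a hypothetical result type for this sketch.
type markWorkerPlan struct {
	dedicated  int     // Ps given over entirely to background marking
	fractional float64 // utilization goal for the single remainder worker
}

// planMarkWorkers splits 25% of gomaxprocs into a whole number of
// dedicated workers plus a fractional remainder in the range [0, 1).
func planMarkWorkers(gomaxprocs int) markWorkerPlan {
	total := float64(gomaxprocs) * backgroundUtilization
	dedicated := int(total) // truncates toward zero
	return markWorkerPlan{
		dedicated:  dedicated,
		fractional: total - float64(dedicated),
	}
}

func main() {
	for _, n := range []int{1, 2, 4, 6, 8} {
		p := planMarkWorkers(n)
		fmt.Printf("GOMAXPROCS=%d: %d dedicated, %.2f fractional\n",
			n, p.dedicated, p.fractional)
	}
}
```

Because the split depends only on GOMAXPROCS, it can be computed once at the start of a cycle instead of being re-derived from constantly changing assist times, which is the robustness argument the commit makes.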
Austin Clements	8d03acce54	runtime: multi-threaded, utilization-scheduled background mark Currently, the concurrent mark phase is performed by the main GC goroutine. Prior to the previous commit enabling preemption, this caused marking to always consume 1/GOMAXPROCS of the available CPU time. If GOMAXPROCS=1, this meant background GC would consume 100% of the CPU (effectively a STW). If GOMAXPROCS>4, background GC would use less than the goal of 25%. If GOMAXPROCS=4, background GC would use the goal 25%, but if the mutator wasn't using the remaining 75%, background marking wouldn't take advantage of the idle time. Enabling preemption in the previous commit made GC miss CPU targets in completely different ways, but set us up to bring everything back in line. This change replaces the fixed GC goroutine with per-P background mark goroutines. Once started, these goroutines don't go in the standard run queues; instead, they are scheduled specially such that the time spent in mutator assists and the background mark goroutines totals 25% of the CPU time available to the program. Furthermore, this lets background marking take advantage of idle Ps, which significantly boosts GC performance for applications that under-utilize the CPU. This requires also changing how time is reported for gctrace, so this change splits the concurrent mark CPU time into assist/background/idle scanning. This also requires increasing the size of the StackRecord slice used in a GoroutineProfile test. Change-Id: I0936ff907d2cee6cb687a208f2df47e8988e3157 Reviewed-on: https://go-review.googlesource.com/8850 Reviewed-by: Rick Hudson <rlh@golang.org>	2015-04-21 15:35:32 +00:00
Russ Cox	181e26b9fa	runtime: replace func-based write barrier skipping with type-based This CL revises CL 7504 to use explicitly uintptr types for the struct fields that are going to be updated sometimes without write barriers. The result is that the fields are now updated always without write barriers. This approach has two important properties: 1) Now the GC never looks at the field, so if the missing reference could cause a problem, it will do so all the time, not just when the write barrier is missed at just the right moment. 2) Now a write barrier never happens for the field, avoiding the (correct) detection of inconsistent write barriers when GODEBUG=wbshadow=1. Change-Id: Iebd3962c727c0046495cc08914a8dc0808460e0e Reviewed-on: https://go-review.googlesource.com/9019 Reviewed-by: Austin Clements <austin@google.com> Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2015-04-20 20:20:09 +00:00
Ian Lance Taylor	725aa3451a	runtime: no deadlock error if buildmode=c-archive or c-shared Change-Id: I4ee6dac32bd3759aabdfdc92b235282785fbcca9 Reviewed-on: https://go-review.googlesource.com/9083 Reviewed-by: David Crawshaw <crawshaw@golang.org>	2015-04-20 17:31:44 +00:00
Austin Clements	c1c667542c	runtime: fix dangling pointer in readyExecute readyExecute passes a closure to mcall that captures an argument to readyExecute. Since mcall is marked noescape, this closure lives on the stack of the calling goroutine. However, the closure puts the calling goroutine on the run queue (and switches to a new goroutine). If the calling goroutine gets scheduled before the mcall returns, this stack-allocated closure will become invalid while it's still executing. One consequence of this we've observed is that the captured gp variable can get overwritten before the call to execute(gp), causing execute(gp) to segfault. Fix this by passing the currently captured gp variable through a field in the calling goroutine's g struct so that the func is no longer a closure. To prevent problems like this in the future, this change also removes the go:noescape annotation from mcall. Due to a compiler bug, this will currently cause a func closure passed to mcall to be implicitly allocated rather than refusing the implicit allocation. However, this is okay because there are no other closures passed to mcall right now and the compiler bug will be fixed shortly. Fixes #10428. Change-Id: I49b48b85de5643323b89e9eaa4df63854e968c32 Reviewed-on: https://go-review.googlesource.com/8866 TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Russ Cox <rsc@golang.org>	2015-04-17 17:59:14 +00:00
Austin Clements	a23a341e10	runtime: make time slice a const A G will be preempted if it runs for 10ms without blocking. Currently this constant is hard-coded in retake. Move it to a global const. We'll use the time slice length in scheduling background GC. Change-Id: I79a979948af2fad3afe5df9d4af4062f166554b7 Reviewed-on: https://go-review.googlesource.com/8838 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>	2015-04-14 22:06:32 +00:00
Austin Clements	4b956ae317	runtime: start concurrent GC promptly when we reach its trigger Currently, when allocation reaches the concurrent GC trigger size, we start the concurrent collector by ready'ing its G. This simply puts it on the end of the P's run queue, which means we may not actually start GC for some time as the current G continues to run and then the P drains other Gs already on its run queue. Since the mutator can continue to allocate, the heap can potentially be much larger than we intended by the time GC actually starts. Furthermore, how much larger is difficult to predict since it depends on the scheduler. Fix this by preempting the current G and switching directly to the concurrent GC G as soon as we reach the trigger heap size. On the garbage benchmark from the benchmarks subrepo with GOMAXPROCS=4, this reduces the time from triggering the GC to the beginning of sweep termination by 10 to 30 milliseconds, which reduces allocation after the trigger by up to 10MB (a large fraction of the 64MB live heap the benchmark tries to maintain). One other known source of delay before we "really" start GC is the sweep finalization performed before sweep termination. This has similar negative effects on heap size and predictability, but is an orthogonal problem. This change adds a TODO for this. Change-Id: I8bae98cb43685c1bf353ff55868e4647e3743c47 Reviewed-on: https://go-review.googlesource.com/8513 Reviewed-by: Rick Hudson <rlh@golang.org>	2015-04-10 18:22:52 +00:00
Austin Clements	7c37249639	runtime: make test for freezetheworld more precise exitsyscallfast checks for freezetheworld, but does so only by checking if stopwait is positive. This can also happen during stoptheworld, which is harmless, but confusing. Shortly, it will be important that we get to the p.status cas even if stopwait is set. Hence, make this test more specific so it only triggers with freezetheworld and not other uses of stopwait. Change-Id: Ibb722cd8360c3ed5a9654482519e3ceb87a8274d Reviewed-on: https://go-review.googlesource.com/8205 Reviewed-by: Russ Cox <rsc@golang.org>	2015-04-10 18:02:55 +00:00
Dmitry Vyukov	089d363a91	runtime: fix tracing of syscall exit Fix tracing of syscall exit after: https://go-review.googlesource.com/#/c/7504/ Change-Id: Idcde2aa826d2b9a05d0a90a80242b6bfa78846ab Reviewed-on: https://go-review.googlesource.com/8728 Reviewed-by: Rick Hudson <rlh@golang.org> Run-TryBot: Dmitry Vyukov <dvyukov@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2015-04-10 17:39:06 +00:00
Michael Hudson-Doyle	a1f57598cc	runtime, cmd/internal/ld: rename themoduledata to firstmoduledata 'themoduledata' doesn't really make sense now we support multiple moduledata objects. Change-Id: I8263045d8f62a42cb523502b37289b0fba054f62 Reviewed-on: https://go-review.googlesource.com/8521 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2015-04-10 05:11:49 +00:00
Michael Hudson-Doyle	3a84e3305b	runtime, cmd/internal/ld: initialize themoduledata slices directly This CL is quite conservative in some ways. It continues to define symbols that have no real purpose (e.g. epclntab). These could be deleted if there is no concern that external tools might look for them. It would also now be possible to make some changes to the pcln data but I get the impression that would definitely require some thought and discussion. Change-Id: Ib33cde07e4ec38ecc1d6c319a10138c9347933a3 Reviewed-on: https://go-review.googlesource.com/7616 Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>	2015-04-08 16:20:57 +00:00
Austin Clements	f244a1471d	runtime: add cumulative GC CPU % to gctrace line This tracks both total CPU time used by GC and the total time available to all Ps since the beginning of the program and uses this to derive a cumulative CPU usage percent for the gctrace line. Change-Id: Ica85372b8dd45f7621909b325d5ac713a9b0d015 Reviewed-on: https://go-review.googlesource.com/8350 Reviewed-by: Russ Cox <rsc@golang.org>	2015-04-02 23:37:13 +00:00
Michael Hudson-Doyle	67426a8a9e	runtime, cmd/internal/ld: change runtime to use a single linker symbol In preparation for being able to run a go program that has code in several objects, this changes from having several linker symbols used by the runtime into having one linker symbol that points at a structure containing the needed data. Multiple object support will construct a linked list of such structures. A follow up will initialize the slices in the themoduledata structure directly from the linker but I was aiming for a minimal diff for now. Change-Id: I613cce35309801cf265a1d5ae5aaca8d689c5cbf Reviewed-on: https://go-review.googlesource.com/7441 Reviewed-by: Ian Lance Taylor <iant@golang.org>	2015-03-31 22:45:07 +00:00
Austin Clements	9e6f7aac28	runtime: make "write barriers are not allowed" comments more precise Currently, various functions are marked with the comment // May run without a P, so write barriers are not allowed. However, "running without a P" is ambiguous. We intended these to mean that m.p may be nil (which is the condition checked by the write barrier). The comment could also be taken to mean that a stop-the-world may happen, which is not the case for these functions because they run in situations where there is in fact a function on the stack holding a P locally, it just isn't in m.p. Change these comments to state precisely what we mean, that m.p may be nil. Change-Id: I4a4a1d26aebd455e5067540e13b9f96a7482146c Reviewed-on: https://go-review.googlesource.com/8209 Reviewed-by: Minux Ma <minux@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>	2015-03-30 15:13:53 +00:00
Austin Clements	392336f94e	runtime: disallow write barriers in handoffp and callees handoffp by definition runs without a P, so it's not allowed to have write barriers. It doesn't have any right now, but mark it nowritebarrier to disallow any creeping in in the future. handoffp in turn calls startm, newm, and newosproc, all of which are "below Go" and make sense to run without a P, so disallow write barriers in these as well. For most functions, we've done this because they may race with stoptheworld() and hence must not have write barriers. For these functions, it's a little different: the world can't stop while we're in handoffp, so this race isn't present. But we implement this restriction with a somewhat broader rule that you can't have a write barrier without a P. We like this rule because it's simple and means that our write barriers can depend on there being a P, even though this rule is actually a little broader than necessary. Hence, even though there's no danger of the race in these functions, we want to adhere to the broader rule. Change-Id: Ie22319c30eea37d703eb52f5c7ca5da872030b88 Reviewed-on: https://go-review.googlesource.com/8130 Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Minux Ma <minux@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>	2015-03-26 20:38:59 +00:00
David Crawshaw	e9d9d0befc	runtime, runtime/cgo: make needextram a bool Also invert it, which means it no longer needs to cross the cgo package boundary. Change-Id: I393cd073bda02b591a55d6bc6b8bb94970ea71cd Reviewed-on: https://go-review.googlesource.com/8082 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: David Crawshaw <crawshaw@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2015-03-26 11:12:25 +00:00
Alex Brainman	2420926a8a	runtime: remove obsolete comment We do not use SEH to handle Windows exception anymore. Change-Id: I0ac807a0fed7a5b4c745454246764c524460472b Reviewed-on: https://go-review.googlesource.com/8071 Reviewed-by: Minux Ma <minux@golang.org>	2015-03-25 02:55:56 +00:00
David Crawshaw	b8caed823b	runtime: initialize extra M for cgo during mstart Previously the extra m needed for cgo callbacks was created on the first callback. This works for cgo, however the cgocallback mechanism is also borrowed by badsignal which can run before any cgo calls are made. Now we initialize the extra M at runtime startup before any signal handlers are registered, so badsignal cannot be called until the extra M is ready. Updates #10207. Change-Id: Iddda2c80db6dc52d8b60e2b269670fbaa704c7b3 Reviewed-on: https://go-review.googlesource.com/7978 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: David Crawshaw <crawshaw@golang.org>	2015-03-24 19:39:46 +00:00
Rick Hudson	41dbcc19ef	runtime: Remove write barriers during STW. The GC assumes that there will be no asynchronous write barriers when the world is stopped. This keeps the synchronization between write barriers and the GC simple. However, currently, there are a few places in runtime code where this assumption does not hold. The GC stops the world by collecting all Ps, which stops all user Go code, but small parts of the runtime can run without a P. For example, the code that releases a P must still deschedule its G onto a runnable queue before stopping. Similarly, when a G returns from a long-running syscall, it must run code to reacquire a P. Currently, this code can contain write barriers. This can lead to the GC collecting reachable objects if something like the following sequence of events happens: 1. GC stops the world by collecting all Ps. 2. G #1 returns from a syscall (for example), tries to install a pointer to object X, and calls greyobject on X. 3. greyobject on G #1 marks X, but does not yet add it to a work buffer. At this point, X is effectively black, not grey, even though it may point to white objects. 4. GC reaches X through some other path and calls greyobject on X, but greyobject does nothing because X is already marked. 5. GC completes. 6. greyobject on G #1 adds X to a work buffer, but it's too late. 7. Objects that were reachable only through X are incorrectly collected. To fix this, we check the invariant that no asynchronous write barriers happen when the world is stopped by checking that write barriers always have a P, and modify all currently known sources of these writes to disable the write barrier. In all modified cases this is safe because the object in question will always be reachable via some other path. Some of the trace code was turned off, in particular the code that traces returning from a syscall. The GC assumes that as far as the heap is concerned the thread is stopped when it is in a syscall. Upon returning, the trace code must not do any heap writes for the same reasons discussed above. Fixes #10098 Fixes #9953 Fixes #9951 Fixes #9884 May relate to #9610 #9771 Change-Id: Ic2e70b7caffa053e56156838eb8d89503e3c0c8a Reviewed-on: https://go-review.googlesource.com/7504 Reviewed-by: Austin Clements <austin@google.com>	2015-03-17 17:33:21 +00:00
Aram Hăvărneanu	846ee0465b	runtime: add support for linux/arm64 Change-Id: Ibda6a5bedaff57fd161d63fc04ad260931d34413 Reviewed-on: https://go-review.googlesource.com/7142 Reviewed-by: Russ Cox <rsc@golang.org>	2015-03-16 18:45:54 +00:00
Dmitry Vyukov	919fd24884	runtime: remove runtime frames from stacks in traces Strip uninteresting bottom and top frames from trace stacks. This makes both binary and json trace files smaller, and also makes stacks shorter and more readable in the viewer. Change-Id: Ib9c80ccc280504f0e235f867f53f1d2652c41583 Reviewed-on: https://go-review.googlesource.com/5523 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Dmitry Vyukov <dvyukov@google.com>	2015-03-10 14:46:15 +00:00
Dmitry Vyukov	6c58d28ca4	runtime: cleanup Cleanup after https://go-review.googlesource.com/3742 Change-Id: Iff3ceffc31b778b1ed0b730696fce6d1b5124447 Reviewed-on: https://go-review.googlesource.com/6761 Reviewed-by: Minux Ma <minux@golang.org>	2015-03-05 07:45:17 +00:00
Dmitry Vyukov	b759e225f5	runtime: bound defer pools (try 2) The unbounded list-based defer pool can grow infinitely. This can happen if a goroutine routinely allocates a defer; then blocks on one P; and is then unblocked, scheduled, and frees the defer on another P. The scenario was reported on golang-nuts list. We've been here several times. Any unbounded local caches are bad and grow to infinite size. This change introduces a central defer pool; local pools become fixed-size with the only purpose of amortizing accesses to the central pool. Freedefer now executes on system stack to not consume nosplit stack space. Change-Id: I1a27695838409259d1586a0adfa9f92bccf7ceba Reviewed-on: https://go-review.googlesource.com/3967 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Dmitry Vyukov <dvyukov@google.com>	2015-03-04 14:29:58 +00:00
Dmitry Vyukov	5ef145c809	runtime: bound sudog cache The unbounded list-based sudog cache can grow infinitely. This can happen if a goroutine is routinely blocked on one P and then unblocked and scheduled on another P. The scenario was reported on golang-nuts list. We've been here several times. Any unbounded local caches are bad and grow to infinite size. This change introduces a central sudog cache; local caches become fixed-size with the only purpose of amortizing accesses to the central cache. The change required moving the sudog cache from mcache to P, because mcache is not scanned by GC. Change-Id: I3bb7b14710354c026dcba28b3d3c8936a8db4e90 Reviewed-on: https://go-review.googlesource.com/3742 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Dmitry Vyukov <dvyukov@google.com>	2015-03-04 14:14:29 +00:00
Dmitry Vyukov	894024f478	runtime: fix traceback from goexit1 We used to not call traceback from goexit1. But now tracer does it and crashes on amd64p32: runtime: unexpected return pc for runtime.getg called from 0x108a4240 goroutine 18 [runnable, locked to thread]: runtime.traceGoEnd() src/runtime/trace.go:758 fp=0x10818fe0 sp=0x10818fdc runtime.goexit1() src/runtime/proc1.go:1540 +0x20 fp=0x10818fe8 sp=0x10818fe0 runtime.getg(0x0) src/runtime/asm_386.s:2414 fp=0x10818fec sp=0x10818fe8 created by runtime/pprof_test.TestTraceStress src/runtime/pprof/trace_test.go:123 +0x500 Return PC from goexit1 points right after goexit (+0x6). It happens to work most of the time somehow. This change fixes traceback from goexit1 by adding an additional NOP to goexit. Fixes #9931 Change-Id: Ied25240a181b0a2d7bc98127b3ed9068e9a1a13e Reviewed-on: https://go-review.googlesource.com/5460 Reviewed-by: Russ Cox <rsc@golang.org>	2015-02-28 23:19:57 +00:00
Dmitry Vyukov	f47e581e02	runtime: do not do futile netpolls There is no sense in trying to netpoll while there is already a thread blocked in netpoll. And in most cases there must be a thread blocked in netpoll, because the first otherwise idle thread does blocking netpoll. On some program I see that netpoll called from findrunnable consumes 3% of time. Change-Id: I0af1a73d637bffd9770ea50cb9278839716e8816 Reviewed-on: https://go-review.googlesource.com/4553 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Dmitry Vyukov <dvyukov@google.com>	2015-02-26 11:03:07 +00:00
Matthew Dempsky	3c8a89daf3	runtime: simplify CPU profiling code This makes Go's CPU profiling code somewhat more idiomatic; e.g., using := instead of forward declaring variables, using "int" for element counts instead of "uintptr", and slices instead of C-style pointer+length. This makes the code easier to read and eliminates a lot of type conversion clutter. Additionally, in sigprof we can collect just maxCPUProfStack stack frames, as cpuprof won't use more than that anyway. Change-Id: I0235b5ae552191bcbb453b14add6d8c01381bd06 Reviewed-on: https://go-review.googlesource.com/6072 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Dmitry Vyukov <dvyukov@google.com>	2015-02-26 08:59:24 +00:00
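
Commits b759e225f5 and 5ef145c809 above both describe the same pattern: fixed-size local caches backed by a shared central pool, so that objects allocated on one P and freed on another cannot pile up without bound. The following is a minimal sketch of that idea in ordinary Go, not the runtime's actual code. The names `item`, `central`, `local`, and `localCap`, and the choice to move half a cache at a time, are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// item stands in for a pooled object such as a defer record or a sudog.
type item struct{}

// localCap is a hypothetical fixed capacity for each local cache.
const localCap = 4

// central is the shared pool; it is the only part that needs a lock.
type central struct {
	mu    sync.Mutex
	items []*item
}

// local is a fixed-size cache owned by one worker (analogous to a P).
type local struct {
	items []*item
	c     *central
}

func newLocal(c *central) *local {
	return &local{items: make([]*item, 0, localCap), c: c}
}

// get pops from the local cache. When the cache is empty it first refills
// up to half its capacity from the central pool, and allocates only if the
// central pool is empty as well.
func (l *local) get() *item {
	if len(l.items) == 0 {
		l.c.mu.Lock()
		for len(l.items) < localCap/2 && len(l.c.items) > 0 {
			n := len(l.c.items) - 1
			l.items = append(l.items, l.c.items[n])
			l.c.items[n] = nil
			l.c.items = l.c.items[:n]
		}
		l.c.mu.Unlock()
	}
	if len(l.items) == 0 {
		return new(item)
	}
	n := len(l.items) - 1
	it := l.items[n]
	l.items[n] = nil
	l.items = l.items[:n]
	return it
}

// put pushes onto the local cache. When the cache is full it first spills
// half of its contents to the central pool, so the local cache never holds
// more than localCap items.
func (l *local) put(it *item) {
	if len(l.items) == localCap {
		l.c.mu.Lock()
		for len(l.items) > localCap/2 {
			n := len(l.items) - 1
			l.c.items = append(l.c.items, l.items[n])
			l.items[n] = nil
			l.items = l.items[:n]
		}
		l.c.mu.Unlock()
	}
	l.items = append(l.items, it)
}

func main() {
	c := &central{}
	a, b := newLocal(c), newLocal(c)
	// Reproduce the problem scenario: always allocate on a, always free on b.
	for i := 0; i < 1000; i++ {
		b.put(a.get())
	}
	fmt.Println("local a:", len(a.items), "local b:", len(b.items), "central:", len(c.items))
}
```

Moving items in batches of half the capacity is one way to amortize lock acquisitions on the central pool: a worker that alternates between `get` and `put` near a boundary does not take the lock on every call.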

84 Commits