Commit Graph

59 Commits

Author SHA1 Message Date
Russ Cox 2b62e1eaec runtime: fix hang in GC due to shrinkstack vs netpoll race
During garbage collection, after scanning a stack, we think about
shrinking it to reclaim some memory. The shrinking code (called
while the world is stopped) checked that the status was Gwaiting
or Grunnable and then changed the state to Gcopystack, to essentially
lock the stack so that no other GC thread is scanning it.
The same locking happens for stack growth (and is more necessary there).

        oldstatus = runtime·readgstatus(gp);
        oldstatus &= ~Gscan;
        if(oldstatus == Gwaiting || oldstatus == Grunnable)
                runtime·casgstatus(gp, oldstatus, Gcopystack); // oldstatus is Gwaiting or Grunnable
        else
                runtime·throw("copystack: bad status, not Gwaiting or Grunnable");

Unfortunately, "stop the world" doesn't stop everything. It stops all
normal goroutine execution, but the network polling thread is still
blocked in epoll and may wake up. If it does, and it chooses a goroutine
to mark runnable, and that goroutine is the one whose stack is shrinking,
then it can happen that between readgstatus and casgstatus, the status
changes from Gwaiting to Grunnable.

casgstatus assumes that if the status is not what is expected, it is a
transient change (like from Gwaiting to Gscanwaiting and back, or like
from Gwaiting to Gcopystack and back), and it loops until the status
has been restored to the expected value. In this case, the status has
changed semi-permanently from Gwaiting to Grunnable - it won't
change again until the GC is done and the world can continue, but the
GC is waiting for the status to change back. This wedges the program.

To fix, call a special variant of casgstatus that accepts either Gwaiting
or Grunnable as valid statuses.

Without the fix bug with the extra check+throw in casgstatus, the
program below dies in a few seconds (2-10) with GOMAXPROCS=8
on a 2012 Retina MacBook Pro. With the fix, it runs for minutes
and minutes.

package main

import (
        "io"
        "log"
        "net"
        "runtime"
)

func main() {
        const N = 100
        for i := 0; i < N; i++ {
                l, err := net.Listen("tcp", "127.0.0.1:0")
                if err != nil {
                        log.Fatal(err)
                }
                ch := make(chan net.Conn, 1)
                go func() {
                        var err error
                        c1, err := net.Dial("tcp", l.Addr().String())
                        if err != nil {
                                log.Fatal(err)
                        }
                        ch <- c1
                }()
                c2, err := l.Accept()
                if err != nil {
                        log.Fatal(err)
                }
                c1 := <-ch
                l.Close()
                go netguy(c1, c2)
                go netguy(c2, c1)
                c1.Write(make([]byte, 100))
        }
        for {
                runtime.GC()
        }
}

func netguy(r, w net.Conn) {
        buf := make([]byte, 100)
        for {
                bigstack(1000)
                _, err := io.ReadFull(r, buf)
                if err != nil {
                        log.Fatal(err)
                }
                w.Write(buf)
        }
}

var g int

func bigstack(n int) {
        var buf [100]byte
        if n > 0 {
                bigstack(n - 1)
        }
        g = int(buf[0]) + int(buf[99])
}

Fixes #9186.

LGTM=rlh
R=austin, rlh
CC=dvyukov, golang-codereviews, iant, khr, r
https://golang.org/cl/179680043
2014-12-01 16:32:06 -05:00
Russ Cox 6ad16c4a48 runtime: fix initial gp->sched.pc in newextram
CL 170720043 missed this one when adding +PCQuantum.

LGTM=iant
R=r, iant
CC=golang-codereviews
https://golang.org/cl/168090043
2014-11-06 09:37:04 -05:00
Russ Cox a5a0733144 runtime: change top-most return PC from goexit to goexit+PCQuantum
If you get a stack of PCs from Callers, it would be expected
that every PC is immediately after a call instruction, so to find
the line of the call, you look up the line for PC-1.
CL 163550043 now explicitly documents that.

The most common exception to this is the top-most return PC
on the stack, which is the entry address of the runtime.goexit
function. Subtracting 1 from that PC will end up in a different
function entirely.

To remove this special case, make the top-most return PC
goexit+PCQuantum and then implement goexit in assembly
so that the first instruction can be skipped.

Fixes #7690.

LGTM=r
R=r
CC=golang-codereviews
https://golang.org/cl/170720043
2014-10-29 20:37:44 -04:00
Russ Cox a22c11b995 runtime: fix line number in first stack frame in printed stack trace
Originally traceback was only used for printing the stack
when an unexpected signal came in. In that case, the
initial PC is taken from the signal and should be used
unaltered. For the callers, the PC is the return address,
which might be on the line after the call; we subtract 1
to get to the CALL instruction.

Traceback is now used for a variety of things, and for
almost all of those the initial PC is a return address,
whether from getcallerpc, or gp->sched.pc, or gp->syscallpc.
In those cases, we need to subtract 1 from this initial PC,
but the traceback code had a hard rule "never subtract 1
from the initial PC", left over from the signal handling days.

Change gentraceback to take a flag that specifies whether
we are tracing a trap.

Change traceback to default to "starting with a return PC",
which is the overwhelmingly common case.

Add tracebacktrap, like traceback but starting with a trap PC.

Use tracebacktrap in signal handlers.

Fixes #7690.

LGTM=iant, r
R=r, iant
CC=golang-codereviews
https://golang.org/cl/167810044
2014-10-29 15:14:24 -04:00
Shenghou Ma 2fe9482343 runtime: add fake time support back.
Revived from CL 15690048.

Fixes #5356.

LGTM=rsc
R=adg, dvyukov, rsc
CC=golang-codereviews
https://golang.org/cl/101400043
2014-10-27 20:35:15 -04:00
Russ Cox 1ba977ccca undo CL 159990043 / 421fadcef39a
Dmitriy believes this broke Windows.
It looks like build.golang.org stopped before that,
but it's worth a shot.

««« original CL description
runtime: make pprof a little nicer

Update #8942

This does not fully address issue 8942 but it does make
the profiles much more useful, until that issue can be
fixed completely.

LGTM=dvyukov
R=r, dvyukov
CC=golang-codereviews
https://golang.org/cl/159990043
»»»

TBR=dvyukov
CC=golang-codereviews
https://golang.org/cl/160030043
2014-10-17 10:11:03 -04:00
Russ Cox 7ed8723d49 runtime: make pprof a little nicer
Update #8942

This does not fully address issue 8942 but it does make
the profiles much more useful, until that issue can be
fixed completely.

LGTM=dvyukov
R=r, dvyukov
CC=golang-codereviews
https://golang.org/cl/159990043
2014-10-16 14:44:55 -04:00
Russ Cox 3ffd29fb2c cmd/cc, runtime: disallow structs without tags
Structs without tags have no unique name to use in the
Go definitions generated from the C types.
This caused issue 8812, fixed by CL 149260043.
Avoid future problems by requiring struct tags.

Update runtime as needed.
(There is no other C code in the tree.)

LGTM=bradfitz, iant
R=golang-codereviews, bradfitz, dave, iant
CC=golang-codereviews, khr, r
https://golang.org/cl/150360043
2014-10-03 12:44:20 -04:00
Keith Randall 70b2da98ca runtime: initialize traceback variables earlier
Our traceback code needs to know the PC of several special
functions, including goexit, mcall, etc.  Make sure that
these PCs are initialized before any traceback occurs.

Fixes #8766

LGTM=rsc
R=golang-codereviews, rsc, khr, bradfitz
CC=golang-codereviews
https://golang.org/cl/145570043
2014-09-29 21:21:36 -07:00
Russ Cox 193daab988 cmd/cc, cmd/ld, runtime: disallow conservative data/bss objects
In linker, refuse to write conservative (array of pointers) as the
garbage collection type for any variable in the data/bss GC program.

In the linker, attach the Go type to an already-read C declaration
during dedup. This gives us Go types for C globals for free as long
as the cmd/dist-generated Go code contains the declaration.
(Most runtime C declarations have a corresponding Go declaration.
Both are bss declarations and so the linker dedups them.)

In cmd/dist, add a few more C files to the auto-Go-declaration list
in order to get Go type information for the C declarations into the linker.

In C compiler, mark all non-pointer-containing global declarations
and all string data as NOPTR. This allows them to exist in C files
without any corresponding Go declaration. Count C function pointers
as "non-pointer-containing", since we have no heap-allocated C functions.

In runtime, add NOPTR to the remaining pointer-containing declarations,
none of which refer to Go heap objects.

In runtime, also move os.Args and syscall.envs data into runtime-owned
variables. Otherwise, in programs that do not import os or syscall, the
runtime variables named os.Args and syscall.envs will be missing type
information.

I believe that this CL eliminates the final source of conservative GC scanning
in non-SWIG Go programs, and therefore...

Fixes #909.

LGTM=iant
R=iant
CC=golang-codereviews
https://golang.org/cl/149770043
2014-09-24 16:55:26 -04:00
Hector Martin Cantero 7283e08cbf runtime: keep g->syscallsp consistent after cgo->Go callbacks
Normally, the caller to runtime.entersyscall() must not return before
calling runtime.exitsyscall(), lest g->syscallsp become a dangling
pointer. runtime.cgocallbackg() violates this constraint. To work around
this, save g->syscallsp and g->syscallpc around cgo->Go callbacks, then
restore them after calling runtime.entersyscall(), which restores the
syscall stack frame pointer saved by cgocall. This allows the GC to
correctly trace a goroutine that is currently returning from a
Go->cgo->Go chain.

This also adds a check to proc.c that panics if g->syscallsp is clearly
invalid. It is not 100% foolproof, as it will not catch a case where the
stack was popped then pushed back beyond g->syscallsp, but it does catch
the present cgo issue and makes existing tests fail without the bugfix.

Fixes #7978.

LGTM=dvyukov, rsc
R=golang-codereviews, dvyukov, minux, bradfitz, iant, gobot, rsc
CC=golang-codereviews, rsc
https://golang.org/cl/131910043
2014-09-24 13:20:25 -04:00
Russ Cox c7f6bd795a runtime: rename SchedType to SchedT
CL 144940043 renamed it from Sched to SchedType
to avoid a lowercasing conflict in the Go code with
the variable named sched.
We've been using just T resolve those conflicts, not Type.

The FooType pattern is already taken for the kind-specific
variants of the runtime Type structure: ChanType, MapType,
and so on. SchedType isn't a Type.

LGTM=bradfitz, khr
R=khr, bradfitz
CC=golang-codereviews
https://golang.org/cl/145180043
2014-09-18 23:51:22 -04:00
Russ Cox c3b5db895b runtime: delete panicstring; move its checks into gopanic
In Go 1.3 the runtime called panicstring to report errors like
divide by zero or memory faults. Now we call panic (gopanic)
with pre-allocated error values. That new path is missing the
checking that panicstring did, so add it there.

The only call to panicstring left is in cnew, which is problematic
because if it fails, probably the heap is corrupt. In that case,
calling panicstring creates a new errorCString (no allocation there),
but then panic tries to print it, invoking errorCString.Error, which
does a string concatenation (allocating), which then dies.
Replace that one panicstring with a throw: cnew is for allocating
runtime data structures and should never ask for an inappropriate
amount of memory.

With panicstring gone, delete newErrorCString, errorCString.
While we're here, delete newErrorString, not called by anyone.
(It can't be: that would be C code calling Go code that might
block or grow the stack.)

Found while debugging a malloc corruption.
This resulted in 'panic during panic' instead of a more useful message.

LGTM=khr
R=khr
CC=golang-codereviews
https://golang.org/cl/138290045
2014-09-18 14:49:24 -04:00
Keith Randall 6c934238c9 runtime: change minimum stack size to 2K.
It will be 8K on windows because it needs 4K for the OS.
Similarly, plan9 will be 4K.

On linux/amd64, reduces size of 100,000 goroutines
from ~819MB to ~245MB.

Update #7514

LGTM=dvyukov
R=golang-codereviews, dvyukov, khr, aram
CC=golang-codereviews
https://golang.org/cl/145790043
2014-09-17 08:32:15 -07:00
Keith Randall da8cf5438a runtime: always run semacquire on the G stack
semacquire might need to park the currently running G.  It can
only park if called from the G stack (because it has no way of
saving the M stack state).  So all calls to semacquire must come
from the G stack.

The three violators are GOMAXPROCS, ReadMemStats, and WriteHeapDump.
This change moves the semacquire call earlier, out of their C code
and into their Go code.

This seldom caused bugs because semacquire seldom actually had
to park the caller.  But it did happen intermittently.

Fixes #8749

LGTM=dvyukov
R=golang-codereviews, dvyukov, bradfitz
CC=golang-codereviews
https://golang.org/cl/144940043
2014-09-16 17:26:16 -07:00
Russ Cox 44753479c6 runtime: remove a few untyped allocations
LGTM=iant, khr, rlh
R=khr, iant, bradfitz, rlh
CC=dvyukov, golang-codereviews
https://golang.org/cl/142030044
2014-09-12 16:12:39 -04:00
Russ Cox 15a5c35cec runtime: move gosched to Go, to add stack frame information
LGTM=khr
R=khr
CC=golang-codereviews
https://golang.org/cl/134520044
2014-09-11 16:22:21 -04:00
Russ Cox 1d550b87db runtime: allow crash from gsignal stack
The uses of onM in dopanic/startpanic are okay even from the signal stack.

Fixes #8666.

LGTM=khr
R=khr
CC=golang-codereviews
https://golang.org/cl/134710043
2014-09-11 12:08:30 -04:00
Anthony Martin 9f012e1002 runtime: call rfork on scheduler stack on Plan 9
A race exists between the parent and child processes after a fork.
The child needs to access the new M pointer passed as an argument
but the parent may have already returned and clobbered it.

Previously, we avoided this by saving the necessary data into
registers before the rfork system call but this isn't guaranteed
to work because Plan 9 makes no promises about the register state
after a system call. Only the 386 kernel seems to save them.
For amd64 and arm, this method won't work.

We eliminate the race by allocating stack space for the scheduler
goroutines (g0) in the per-process copy-on-write stack segment and
by only calling rfork on the scheduler stack.

LGTM=aram, 0intro, rsc
R=aram, 0intro, mischief, rsc
CC=golang-codereviews
https://golang.org/cl/110680044
2014-09-09 17:19:01 -07:00
Russ Cox 16c59acb97 runtime: avoid read overrun in heapdump
Start the stack a few words below the actual top, so that
if something tries to read goexit's caller PC from the stack,
it won't fault on a bad memory address.
Today, heapdump does that.
Maybe tomorrow, traceback or something else will do that.
Make it not a bug.

TBR=khr
R=khr
CC=golang-codereviews
https://golang.org/cl/136450043
2014-09-09 15:38:55 -04:00
Russ Cox 15b76ad94b runtime: assume precisestack, copystack, StackCopyAlways, ScanStackByFrames
Commit to stack copying for stack growth.

We're carrying around a surprising amount of cruft from older schemes.
I am confident that precise stack scans and stack copying are here to stay.

Delete fallback code for when precise stack info is disabled.
Delete fallback code for when copying stacks is disabled.
Delete fallback code for when StackCopyAlways is disabled.
Delete Stktop chain - there is only one stack segment now.
Delete M.moreargp, M.moreargsize, M.moreframesize, M.cret.
Delete G.writenbuf (unrelated, just dead).
Delete runtime.lessstack, runtime.oldstack.
Delete many amd64 morestack variants.
Delete initialization of morestack frame/arg sizes (shortens split prologue!).

Replace G's stackguard/stackbase/stack0/stacksize/
syscallstack/syscallguard/forkstackguard with simple stack
bounds (lo, hi).

Update liblink, runtime/cgo for adjustments to G.

LGTM=khr
R=khr, bradfitz
CC=golang-codereviews, iant, r
https://golang.org/cl/137410043
2014-09-09 13:39:57 -04:00
Russ Cox bffb0590c1 runtime: merge mallocgc, gomallocgc
I assumed they were the same when I wrote
cgocallback.go earlier today. Merge them
to eliminate confusion.

I can't tell what gomallocgc did before with
a nil type but without FlagNoScan.
I created a call like that in cgocallback.go
this morning, translating from a C file.
It was supposed to do what the C version did,
namely treat the block conservatively.
Now it will.

LGTM=khr
R=khr
CC=golang-codereviews
https://golang.org/cl/141810043
2014-09-09 01:08:34 -04:00
Russ Cox c81a0ed3c5 liblink, runtime: diagnose and fix C code running on Go stack
This CL contains compiler+runtime changes that detect C code
running on Go (not g0, not gsignal) stacks, and it contains
corrections for what it detected.

The detection works by changing the C prologue to use a different
stack guard word in the G than Go prologue does. On the g0 and
gsignal stacks, that stack guard word is set to the usual
stack guard value. But on ordinary Go stacks, that stack
guard word is set to ^0, which will make any stack split
check fail. The C prologue then calls morestackc instead
of morestack, and morestackc aborts the program with
a message about running C code on a Go stack.

This check catches all C code running on the Go stack
except NOSPLIT code. The NOSPLIT code is allowed,
so the check is complete. Since it is a dynamic check,
the code must execute to be caught. But unlike the static
checks we've been using in cmd/ld, the dynamic check
works with function pointers and other indirect calls.
For example it caught sigpanic being pushed onto Go
stacks in the signal handlers.

Fixes #8667.

LGTM=khr, iant
R=golang-codereviews, khr, iant
CC=golang-codereviews, r
https://golang.org/cl/133700043
2014-09-08 14:05:23 -04:00
Russ Cox c007ce824d build: move package sources from src/pkg to src
Preparation was in CL 134570043.
This CL contains only the effect of 'hg mv src/pkg/* src'.
For more about the move, see golang.org/s/go14nopkg.
2014-09-08 00:08:51 -04:00
Russ Cox 3f6acf1120 move src/runtime -> src/lib/runtime;
only automatic g4 mv here.

R=r
OCL=30002
CL=30007
2009-06-06 22:04:39 -07:00
Russ Cox aa3222d88f 32-bit fixes in lessstack.
avoid tight coupling between deferreturn and jmpdefer.
before, jmpdefer knew the exact frame size of deferreturn
in order to pop it off the stack.  now, deferreturn passes
jmpdefer a pointer to the frame above it explicitly.
that avoids a magic constant and should be less fragile.

R=r
DELTA=32  (6 added, 3 deleted, 23 changed)
OCL=29801
CL=29804
2009-06-02 23:02:12 -07:00
Russ Cox 5273868f67 32-bit stack switching bug fix
R=ken
OCL=29412
CL=29412
2009-05-26 17:13:39 -07:00
Russ Cox 918afd9491 move things out of sys into os and runtime
R=r
OCL=28569
CL=28573
2009-05-08 15:21:41 -07:00
Russ Cox a9996d0f89 runtime nits: variable name and comments
R=r
DELTA=10  (0 added, 0 deleted, 10 changed)
OCL=27374
CL=27388
2009-04-13 15:22:36 -07:00
Russ Cox d6c59ad7b8 clarification suggested by rob
R=r
DELTA=4  (4 added, 0 deleted, 0 changed)
OCL=26983
CL=27041
2009-04-02 16:41:53 -07:00
Russ Cox 95100344d3 fix runtime stack overflow bug that gri ran into:
160 - 75 was just barely not enough for deferproc + morestack.

added enum names and bumped to 256 - 128.
added explanation.

changed a few mal() (garbage-collected) to
malloc()/free() (manually collected).

R=ken
OCL=26981
CL=26981
2009-04-01 00:26:00 -07:00
Russ Cox 0d3a043de9 more 386 runtime - can run tiny c programs.
R=r
DELTA=1926  (1727 added, 168 deleted, 31 changed)
OCL=26876
CL=26878
2009-03-30 00:01:07 -07:00
Russ Cox 9f726c2c8b Use explicit allspan list instead of
trying to find all the places where
spans might be recorded.

Free can cascade into complicated
span manipulations that move them
from list to list; the old code had the
possibility of accidentally processing
a span twice or jumping to a different
list, causing an infinite loop.

R=r
DELTA=70  (28 added, 25 deleted, 17 changed)
OCL=23704
CL=23710
2009-01-28 15:22:16 -08:00
Ken Thompson e90314d024 pragma textflag
fixes latent bugs in go and defer

R=r
OCL=23613
CL=23613
2009-01-27 14:12:35 -08:00
Russ Cox 53e69e1db5 various race conditions.
R=r
DELTA=43  (29 added, 5 deleted, 9 changed)
OCL=23608
CL=23611
2009-01-27 14:01:20 -08:00
Ken Thompson 1e1cc4eb57 defer
R=r
OCL=23592
CL=23592
2009-01-27 12:03:53 -08:00
Russ Cox 1ce17918e3 gc #0. mark and sweep collector.
R=r,gri
DELTA=472  (423 added, 2 deleted, 47 changed)
OCL=23522
CL=23541
2009-01-26 17:37:05 -08:00
Russ Cox 360962420c casify, cleanup sys
R=r
OCL=22978
CL=22984
2009-01-16 14:58:14 -08:00
Russ Cox da0a7d7b8f malloc bug fixes.
use malloc by default.
free stacks.

R=r
DELTA=424  (333 added, 29 deleted, 62 changed)
OCL=21553
CL=21584
2008-12-19 03:13:39 -08:00
Russ Cox e29ce175ed malloc in runtime (not used by default)
R=r
DELTA=1551  (1550 added, 0 deleted, 1 changed)
OCL=21404
CL=21538
2008-12-18 15:42:28 -08:00
Russ Cox be629138ab use Note sched.stopped correctly
R=r
DELTA=6  (5 added, 0 deleted, 1 changed)
OCL=20777
CL=20779
2008-12-08 17:14:08 -08:00
Russ Cox 3f8aa662e9 add support for ref counts to memory allocator.
mark and sweep, stop the world garbage collector
(intermediate step in the way to ref counting).
can run pretty with an explicit gc after each file.

R=r
DELTA=502  (346 added, 143 deleted, 13 changed)
OCL=20630
CL=20635
2008-12-05 15:24:18 -08:00
Russ Cox 79e1db2da1 add stub routines stackalloc() and stackfree().
run oldstack on g0's stack, just like newstack does,
so that oldstack can free the old stack.

R=r
DELTA=53  (44 added, 0 deleted, 9 changed)
OCL=20404
CL=20433
2008-12-04 08:30:54 -08:00
Russ Cox efc86a74e4 change meaning of $GOMAXPROCS to number of cpus to use,
not number of threads.  can still starve all the other threads,
but only by looping, not by waiting in a system call.

fix darwin syscall.Syscall6 bug.

fix chanclient bug.

delete $GOMAXPROCS from network tests.

add stripped down printf, sys.printhex to runtime.

R=r
DELTA=355  (217 added, 36 deleted, 102 changed)
OCL=20017
CL=20019
2008-11-25 16:48:10 -08:00
Russ Cox 72154b042f go/acid/go
R=r
DELTA=99  (95 added, 1 deleted, 3 changed)
OCL=15983
CL=15992
2008-09-26 14:10:26 -07:00
Russ Cox a61bb95497 get rid of per-G Note, avoids per-G kernel semaphore on Mac.
2.14u 19.82s 22.17r 	 6.out 100000	# old
1.87u 0.43s 2.31r 	 6.out 100000	# new

R=r
OCL=15762
CL=15772
2008-09-24 14:13:07 -07:00
Russ Cox a67258f380 proper handling of signals.
do not run init on g0.

R=r
DELTA=161  (124 added, 23 deleted, 14 changed)
OCL=15490
CL=15497
2008-09-18 15:56:46 -07:00
Russ Cox 9350ef4eea add network listening & tests
R=r,presotto
OCL=15410
CL=15440
2008-09-17 13:49:23 -07:00
Russ Cox 376898ca8b go threads for OS X
R=r
OCL=14944
CL=15013
2008-09-09 11:50:14 -07:00
Rob Pike 24838a2df6 fix bug in stack limit calculation - was setting limit reg in wrong place.
R=ken
OCL=14981
CL=14981
2008-09-08 19:30:14 -07:00