go/src/runtime
Russ Cox 28208eb8e3 [release-branch.go1.4] runtime: fix hang in GC due to shrinkstack vs netpoll race
««« CL 179680043 / 752cd9199639
runtime: fix hang in GC due to shrinkstack vs netpoll race

During garbage collection, after scanning a stack, we think about
shrinking it to reclaim some memory. The shrinking code (called
while the world is stopped) checked that the status was Gwaiting
or Grunnable and then changed the state to Gcopystack, to essentially
lock the stack so that no other GC thread is scanning it.
The same locking happens for stack growth (and is more necessary there).

        oldstatus = runtime·readgstatus(gp);
        oldstatus &= ~Gscan;
        if(oldstatus == Gwaiting || oldstatus == Grunnable)
                runtime·casgstatus(gp, oldstatus, Gcopystack); // oldstatus is Gwaiting or Grunnable
        else
                runtime·throw("copystack: bad status, not Gwaiting or Grunnable");

Unfortunately, "stop the world" doesn't stop everything. It stops all
normal goroutine execution, but the network polling thread is still
blocked in epoll and may wake up. If it does, and it chooses a goroutine
to mark runnable, and that goroutine is the one whose stack is shrinking,
then it can happen that between readgstatus and casgstatus, the status
changes from Gwaiting to Grunnable.

casgstatus assumes that if the status is not what is expected, it is a
transient change (like from Gwaiting to Gscanwaiting and back, or like
from Gwaiting to Gcopystack and back), and it loops until the status
has been restored to the expected value. In this case, the status has
changed semi-permanently from Gwaiting to Grunnable - it won't
change again until the GC is done and the world can continue, but the
GC is waiting for the status to change back. This wedges the program.

To fix, call a special variant of casgstatus that accepts either Gwaiting
or Grunnable as valid statuses.
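
As a rough illustration of that idea, here is a minimal user-level Go sketch of a
compare-and-swap that accepts either of two starting states. It is not the
runtime's code: the status constants and the helper name
casFromWaitingOrRunnable are hypothetical, and it uses sync/atomic rather
than the runtime's internal primitives.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Hypothetical status values for illustration only; they are not the
// runtime's real constants.
const (
	statusRunnable  uint32 = 1
	statusWaiting   uint32 = 4
	statusCopystack uint32 = 8
)

// casFromWaitingOrRunnable moves *status to newStatus if the current value
// is statusWaiting or statusRunnable. It returns the value it replaced and
// true on success, or the value it observed and false if the status was
// anything else.
func casFromWaitingOrRunnable(status *uint32, newStatus uint32) (uint32, bool) {
	for {
		old := atomic.LoadUint32(status)
		if old != statusWaiting && old != statusRunnable {
			return old, false
		}
		if atomic.CompareAndSwapUint32(status, old, newStatus) {
			return old, true
		}
		// The status changed between the load and the swap (for example
		// waiting -> runnable). Reload and retry instead of spinning on
		// the stale expected value.
	}
}

func main() {
	s := statusRunnable
	old, ok := casFromWaitingOrRunnable(&s, statusCopystack)
	fmt.Println(old, ok, s)
}
```

The point of the design is that the expected value is re-read on every
iteration, so a waiting-to-runnable transition between the read and the swap
costs one retry rather than an endless wait for the old value to come back.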

Without the fix but with the extra check+throw in casgstatus, the
program below dies in a few seconds (2-10) with GOMAXPROCS=8
on a 2012 Retina MacBook Pro. With the fix, it runs for minutes
and minutes.

package main

import (
        "io"
        "log"
        "net"
        "runtime"
)

func main() {
        const N = 100
        for i := 0; i < N; i++ {
                l, err := net.Listen("tcp", "127.0.0.1:0")
                if err != nil {
                        log.Fatal(err)
                }
                ch := make(chan net.Conn, 1)
                go func() {
                        var err error
                        c1, err := net.Dial("tcp", l.Addr().String())
                        if err != nil {
                                log.Fatal(err)
                        }
                        ch <- c1
                }()
                c2, err := l.Accept()
                if err != nil {
                        log.Fatal(err)
                }
                c1 := <-ch
                l.Close()
                go netguy(c1, c2)
                go netguy(c2, c1)
                c1.Write(make([]byte, 100))
        }
        for {
                runtime.GC()
        }
}

func netguy(r, w net.Conn) {
        buf := make([]byte, 100)
        for {
                bigstack(1000)
                _, err := io.ReadFull(r, buf)
                if err != nil {
                        log.Fatal(err)
                }
                w.Write(buf)
        }
}

var g int

func bigstack(n int) {
        var buf [100]byte
        if n > 0 {
                bigstack(n - 1)
        }
        g = int(buf[0]) + int(buf[99])
}

Fixes #9186.

LGTM=rlh
R=austin, rlh
CC=dvyukov, golang-codereviews, iant, khr, r
https://golang.org/cl/179680043
»»»

TBR=rlh
CC=golang-codereviews
https://golang.org/cl/184030043
2014-12-01 16:42:41 -05:00
cgo runtime/cgo: add +build tags to files named for $GOOS 2014-11-09 20:20:45 -05:00
debug runtime: add PauseEnd array to MemStats and GCStats 2014-10-28 12:35:25 -04:00
pprof runtime/pprof: fix memory profiler test 2014-10-17 21:28:47 +04:00
race [release-branch.go1.4] runtime: fix atomic operations on non-heap addresses 2014-11-20 10:14:49 -05:00
Makefile
alg.go
append_test.go
arch_386.go runtime: remove duplicated Go constants 2014-09-16 10:22:15 -04:00
arch_386.h
arch_amd64.go runtime: remove duplicated Go constants 2014-09-16 10:22:15 -04:00
arch_amd64.h
arch_amd64p32.go runtime: remove duplicated Go constants 2014-09-16 10:22:15 -04:00
arch_amd64p32.h
arch_arm.go runtime: remove duplicated Go constants 2014-09-16 10:22:15 -04:00
arch_arm.h
asm.s cmd/gc: turn Go prototypes into ptr liveness maps for assembly functions 2014-09-12 00:18:20 -04:00
asm_386.s runtime: change top-most return PC from goexit to goexit+PCQuantum 2014-10-29 20:37:44 -04:00
asm_amd64.s runtime: change top-most return PC from goexit to goexit+PCQuantum 2014-10-29 20:37:44 -04:00
asm_amd64p32.s runtime: change top-most return PC from goexit to goexit+PCQuantum 2014-10-29 20:37:44 -04:00
asm_arm.s runtime: change top-most return PC from goexit to goexit+PCQuantum 2014-10-29 20:37:44 -04:00
atomic.go
atomic_386.c
atomic_amd64x.c
atomic_arm.go
cgocall.go runtime: keep g->syscallsp consistent after cgo->Go callbacks 2014-09-24 13:20:25 -04:00
cgocall.h
cgocallback.go runtime: fix _cgo_allocate(0) 2014-10-07 16:27:40 -04:00
chan.go runtime: fix sudog leak 2014-11-16 16:44:45 -05:00
chan.h
chan_test.go runtime: dequeue the correct SudoG 2014-10-18 21:02:49 -07:00
closure_test.go
compiler.go
complex.go
complex_test.go
cpuprof.go
crash_cgo_test.go runtime: make TestCgoExternalThreadPanic run on windows 2014-10-30 10:24:37 +11:00
crash_test.go runtime: fix unrecovered panic on external thread 2014-10-28 21:53:09 -04:00
debug.go runtime: always run semacquire on the G stack 2014-09-16 17:26:16 -07:00
defs.c
defs1_linux.go
defs2_linux.go
defs_android_arm.h
defs_arm_linux.go
defs_darwin.go
defs_darwin_386.h
defs_darwin_amd64.h
defs_dragonfly.go
defs_dragonfly_386.h
defs_dragonfly_amd64.h
defs_freebsd.go
defs_freebsd_386.h
defs_freebsd_amd64.h
defs_freebsd_arm.h
defs_linux.go
defs_linux_386.h
defs_linux_amd64.h
defs_linux_arm.h
defs_nacl_386.h
defs_nacl_amd64p32.h
defs_nacl_arm.h
defs_netbsd.go
defs_netbsd_386.go
defs_netbsd_386.h
defs_netbsd_amd64.go
defs_netbsd_amd64.h
defs_netbsd_arm.go
defs_netbsd_arm.h
defs_openbsd.go
defs_openbsd_386.h
defs_openbsd_amd64.h
defs_plan9_386.h
defs_plan9_amd64.h
defs_solaris.go
defs_solaris_amd64.go
defs_solaris_amd64.h
defs_windows.go runtime: handle all windows exception (second attempt) 2014-10-15 11:11:11 +11:00
defs_windows_386.h runtime: handle all windows exception (second attempt) 2014-10-15 11:11:11 +11:00
defs_windows_amd64.h runtime: handle all windows exception (second attempt) 2014-10-15 11:11:11 +11:00
env_plan9.go runtime: handle non-nil-terminated environment strings on Plan 9 2014-10-20 23:03:03 +02:00
env_posix.go os, syscall: add Unsetenv 2014-10-01 11:17:15 -07:00
error.go runtime: delete panicstring; move its checks into gopanic 2014-09-18 14:49:24 -04:00
export_futex_test.go
export_test.go runtime: make gostringnocopy update maxstring 2014-09-11 16:53:34 -07:00
extern.go runtime: update comment for Callers 2014-10-29 15:14:04 -04:00
float.c
funcdata.h doc/asm: explain coordination with garbage collector 2014-10-28 15:51:06 -04:00
futex_test.go
gc_test.go
gcinfo_test.go
hash_test.go
hashmap.go runtime: map iterators: always use intrabucket randomess 2014-09-09 14:22:58 -07:00
hashmap_fast.go
heapdump.c runtime: update URL for heap dump format 2014-11-16 14:25:33 -05:00
iface.go
iface_test.go
lfstack.c
lfstack_test.go
lock_futex.go
lock_sema.go
malloc.c cmd/cc, cmd/ld, runtime: disallow conservative data/bss objects 2014-09-24 16:55:26 -04:00
malloc.go [release-branch.go1.4] runtime: remove assumption that noptrdata data bss noptrbss are ordered and contiguous 2014-11-19 15:31:31 -05:00
malloc.h runtime: add PauseEnd array to MemStats and GCStats 2014-10-28 12:35:25 -04:00
malloc_test.go
map_test.go runtime: try harder to get different iteration orders. 2014-09-15 12:30:57 -07:00
mapspeed_test.go
mcache.c cmd/cc, cmd/ld, runtime: disallow conservative data/bss objects 2014-09-24 16:55:26 -04:00
mcentral.c
mem.go runtime: add PauseEnd array to MemStats and GCStats 2014-10-28 12:35:25 -04:00
mem_darwin.c
mem_dragonfly.c
mem_freebsd.c
mem_linux.c
mem_nacl.c
mem_netbsd.c
mem_openbsd.c
mem_plan9.c runtime: more NOPTR 2014-09-24 19:04:06 -04:00
mem_solaris.c
mem_windows.c runtime: fix Windows SysUsed 2014-09-18 20:41:00 -04:00
memclr_386.s runtime: fix windows/386 build 2014-09-09 17:12:05 -04:00
memclr_amd64.s runtime: fix windows/386 build 2014-09-09 17:12:05 -04:00
memclr_arm.s
memclr_plan9_386.s
memclr_plan9_amd64.s
memmove_386.s
memmove_amd64.s
memmove_arm.s
memmove_linux_amd64_test.go
memmove_nacl_amd64p32.s
memmove_plan9_386.s
memmove_plan9_amd64.s
memmove_test.go
mfinal_test.go runtime: update docs, code for SetFinalizer 2014-10-06 14:18:09 -04:00
mfixalloc.c
mgc0.c runtime: fix line number in first stack frame in printed stack trace 2014-10-29 15:14:24 -04:00
mgc0.go runtime: fix sudog leak 2014-11-16 16:44:45 -05:00
mgc0.h runtime: add comment to mgc0.h 2014-10-09 17:05:38 +04:00
mheap.c runtime: account for tiny allocs, for testing.AllocsPerRun 2014-09-17 14:49:32 -04:00
mknacl.sh
mprof.go runtime: avoid gentraceback of self on user goroutine stack 2014-11-05 23:01:48 -05:00
msize.c
netpoll.go
netpoll_epoll.go
netpoll_kqueue.go
netpoll_nacl.go
netpoll_solaris.c
netpoll_stub.c
netpoll_windows.c
noasm_arm.go
norace_test.go
os_android.c all: use golang.org/x/... import paths 2014-11-10 09:15:57 +11:00
os_android.h
os_darwin.c runtime: assume precisestack, copystack, StackCopyAlways, ScanStackByFrames 2014-09-09 13:39:57 -04:00
os_darwin.go runtime: remove duplicated Go constants 2014-09-16 10:22:15 -04:00
os_darwin.h
os_dragonfly.c runtime: assume precisestack, copystack, StackCopyAlways, ScanStackByFrames 2014-09-09 13:39:57 -04:00
os_dragonfly.go
os_dragonfly.h
os_freebsd.c runtime: fix build failures after CL 137410043 2014-09-09 14:02:37 -04:00
os_freebsd.go runtime: remove duplicated Go constants 2014-09-16 10:22:15 -04:00
os_freebsd.h
os_freebsd_arm.c
os_linux.c runtime: assume precisestack, copystack, StackCopyAlways, ScanStackByFrames 2014-09-09 13:39:57 -04:00
os_linux.go runtime: remove duplicated Go constants 2014-09-16 10:22:15 -04:00
os_linux.h
os_linux_386.c
os_linux_arm.c
os_nacl.c runtime: assume precisestack, copystack, StackCopyAlways, ScanStackByFrames 2014-09-09 13:39:57 -04:00
os_nacl.go runtime: remove duplicated Go constants 2014-09-16 10:22:15 -04:00
os_nacl.h
os_nacl_arm.c
os_netbsd.c runtime: assume precisestack, copystack, StackCopyAlways, ScanStackByFrames 2014-09-09 13:39:57 -04:00
os_netbsd.go runtime: remove duplicated Go constants 2014-09-16 10:22:15 -04:00
os_netbsd.h
os_netbsd_386.c
os_netbsd_amd64.c
os_netbsd_arm.c
os_openbsd.c runtime: cleanup openbsd semasleep implementation 2014-09-09 17:41:48 -07:00
os_openbsd.go runtime: remove duplicated Go constants 2014-09-16 10:22:15 -04:00
os_openbsd.h
os_plan9.c runtime: call rfork on scheduler stack on Plan 9 2014-09-09 17:19:01 -07:00
os_plan9.go runtime: remove duplicated Go constants 2014-09-16 10:22:15 -04:00
os_plan9.h cmd/cc, runtime: disallow structs without tags 2014-10-03 12:44:20 -04:00
os_plan9_386.c runtime: fix line number in first stack frame in printed stack trace 2014-10-29 15:14:24 -04:00
os_plan9_amd64.c runtime: fix line number in first stack frame in printed stack trace 2014-10-29 15:14:24 -04:00
os_solaris.c runtime: fix solaris build 2014-09-14 22:20:01 -04:00
os_solaris.go runtime: remove duplicated Go constants 2014-09-16 10:22:15 -04:00
os_solaris.h
os_windows.c runtime: handle all windows exception (second attempt) 2014-10-15 11:11:11 +11:00
os_windows.go runtime: remove duplicated Go constants 2014-09-16 10:22:15 -04:00
os_windows.h
os_windows_386.c runtime: fix line number in first stack frame in printed stack trace 2014-10-29 15:14:24 -04:00
os_windows_386.go
os_windows_amd64.c runtime: fix line number in first stack frame in printed stack trace 2014-10-29 15:14:24 -04:00
os_windows_amd64.go
panic.c runtime: clear Defer.panic before removing from G.defer list 2014-10-07 23:17:31 -04:00
panic.go runtime: clear Defer.fn before removing from the G.defer list 2014-10-08 00:03:50 -04:00
parfor.c runtime: remove untyped allocation of ParFor 2014-09-16 11:03:11 -04:00
parfor_test.go
print1.go cmd/gc: avoid use of goprintf 2014-10-28 21:52:53 -04:00
proc.c [release-branch.go1.4] runtime: fix hang in GC due to shrinkstack vs netpoll race 2014-12-01 16:42:41 -05:00
proc.go runtime: fix sudog leak 2014-11-16 16:44:45 -05:00
proc_test.go
race.c [release-branch.go1.4] runtime: fix atomic operations on non-heap addresses 2014-11-20 10:14:49 -05:00
race.go liblink, runtime: diagnose and fix C code running on Go stack 2014-09-08 14:05:23 -04:00
race.h
race0.go
race_amd64.s [release-branch.go1.4] runtime: fix atomic operations on non-heap addresses 2014-11-20 10:14:49 -05:00
rdebug.go
rt0_android_arm.s
rt0_darwin_386.s
rt0_darwin_amd64.s
rt0_dragonfly_386.s
rt0_dragonfly_amd64.s
rt0_freebsd_386.s
rt0_freebsd_amd64.s
rt0_freebsd_arm.s
rt0_linux_386.s runtime: more NOPTR 2014-09-24 19:04:06 -04:00
rt0_linux_amd64.s
rt0_linux_arm.s runtime: more NOPTR 2014-09-24 19:04:06 -04:00
rt0_nacl_386.s
rt0_nacl_amd64p32.s runtime: disable fake time on nacl 2014-10-27 20:47:15 -04:00
rt0_nacl_arm.s
rt0_netbsd_386.s
rt0_netbsd_amd64.s
rt0_netbsd_arm.s
rt0_openbsd_386.s
rt0_openbsd_amd64.s
rt0_plan9_386.s runtime: more NOPTR 2014-09-24 19:04:06 -04:00
rt0_plan9_amd64.s runtime: more NOPTR 2014-09-24 19:04:06 -04:00
rt0_solaris_amd64.s runtime: more NOPTR 2014-09-24 19:04:06 -04:00
rt0_windows_386.s runtime: more NOPTR 2014-09-24 17:50:44 -04:00
rt0_windows_amd64.s runtime: more NOPTR 2014-09-24 17:50:44 -04:00
rune.go
runtime-gdb.py
runtime.c runtime: add GODEBUG invalidptr setting 2014-10-28 21:53:31 -04:00
runtime.go cmd/cc, cmd/ld, runtime: disallow conservative data/bss objects 2014-09-24 16:55:26 -04:00
runtime.h [release-branch.go1.4] runtime: fix hang in GC due to shrinkstack vs netpoll race 2014-12-01 16:42:41 -05:00
runtime_linux_test.go
runtime_test.go runtime: be very careful with bad pointer tests 2014-09-20 23:31:11 -07:00
runtime_unix_test.go
select.go runtime: fix sudog leak 2014-11-16 16:44:45 -05:00
sema.go runtime: fix sudog leak 2014-11-16 16:44:45 -05:00
signal.c
signal_386.c runtime: fix line number in first stack frame in printed stack trace 2014-10-29 15:14:24 -04:00
signal_amd64x.c runtime: fix line number in first stack frame in printed stack trace 2014-10-29 15:14:24 -04:00
signal_android_386.h
signal_android_arm.h
signal_arm.c runtime: fix line number in first stack frame in printed stack trace 2014-10-29 15:14:24 -04:00
signal_darwin_386.h
signal_darwin_amd64.h
signal_dragonfly_386.h
signal_dragonfly_amd64.h
signal_freebsd_386.h
signal_freebsd_amd64.h
signal_freebsd_arm.h
signal_linux_386.h
signal_linux_amd64.h
signal_linux_arm.h
signal_nacl_386.h
signal_nacl_amd64p32.h
signal_nacl_arm.h
signal_netbsd_386.h
signal_netbsd_amd64.h
signal_netbsd_arm.h
signal_openbsd_386.h
signal_openbsd_amd64.h
signal_solaris_amd64.h
signal_unix.c
signal_unix.go
signal_unix.h
signals_android.h
signals_darwin.h cmd/cc, cmd/ld, runtime: disallow conservative data/bss objects 2014-09-24 16:55:26 -04:00
signals_dragonfly.h cmd/cc, cmd/ld, runtime: disallow conservative data/bss objects 2014-09-24 16:55:26 -04:00
signals_freebsd.h cmd/cc, cmd/ld, runtime: disallow conservative data/bss objects 2014-09-24 16:55:26 -04:00
signals_linux.h cmd/cc, cmd/ld, runtime: disallow conservative data/bss objects 2014-09-24 16:55:26 -04:00
signals_nacl.h cmd/cc, cmd/ld, runtime: disallow conservative data/bss objects 2014-09-24 16:55:26 -04:00
signals_netbsd.h cmd/cc, cmd/ld, runtime: disallow conservative data/bss objects 2014-09-24 16:55:26 -04:00
signals_openbsd.h cmd/cc, cmd/ld, runtime: disallow conservative data/bss objects 2014-09-24 16:55:26 -04:00
signals_plan9.h cmd/cc, cmd/ld, runtime: disallow conservative data/bss objects 2014-09-24 16:55:26 -04:00
signals_solaris.h cmd/cc, cmd/ld, runtime: disallow conservative data/bss objects 2014-09-24 16:55:26 -04:00
signals_windows.h
sigpanic_unix.go liblink, runtime: diagnose and fix C code running on Go stack 2014-09-08 14:05:23 -04:00
sigqueue.go
slice.go runtime: remove duplicated Go constants 2014-09-16 10:22:15 -04:00
softfloat64.go
softfloat64_test.go
softfloat_arm.c
sqrt.go
stack.c [release-branch.go1.4] runtime: fix hang in GC due to shrinkstack vs netpoll race 2014-12-01 16:42:41 -05:00
stack.go
stack.h runtime: change minimum stack size to 2K. 2014-09-17 08:32:15 -07:00
stack_test.go runtime: reenable TestStackGrowth on 32-bit systems 2014-09-16 17:46:25 -04:00
string.c runtime: make gostringnocopy update maxstring 2014-09-11 16:53:34 -07:00
string.go runtime: remove duplicated Go constants 2014-09-16 10:22:15 -04:00
string_test.go runtime: make gostringnocopy update maxstring 2014-09-11 16:53:34 -07:00
stubs.go runtime: avoid gentraceback of self on user goroutine stack 2014-11-05 23:01:48 -05:00
symtab.go runtime: fix endianness assumption when decoding ftab 2014-10-27 17:12:48 -04:00
symtab_test.go
sys_arm.c
sys_darwin_386.s
sys_darwin_amd64.s
sys_dragonfly_386.s runtime: more NOPTR 2014-09-24 19:04:06 -04:00
sys_dragonfly_amd64.s
sys_freebsd_386.s runtime: more NOPTR 2014-09-24 19:04:06 -04:00
sys_freebsd_amd64.s
sys_freebsd_arm.s
sys_linux_386.s
sys_linux_amd64.s
sys_linux_arm.s
sys_nacl_386.s
sys_nacl_amd64p32.s runtime: add fake time support back. 2014-10-27 20:35:15 -04:00
sys_nacl_arm.s
sys_netbsd_386.s runtime: more NOPTR 2014-09-24 19:04:06 -04:00
sys_netbsd_amd64.s
sys_netbsd_arm.s
sys_openbsd_386.s runtime: more NOPTR 2014-09-24 19:04:06 -04:00
sys_openbsd_amd64.s
sys_plan9_386.s runtime: call rfork on scheduler stack on Plan 9 2014-09-09 17:19:01 -07:00
sys_plan9_amd64.s runtime: save correct pid for new m's on plan9/amd64 2014-09-12 01:21:51 -07:00
sys_solaris_amd64.s runtime: fix build failures after CL 137410043 2014-09-09 14:02:37 -04:00
sys_windows_386.s runtime: handle all windows exception (second attempt) 2014-10-15 11:11:11 +11:00
sys_windows_amd64.s runtime: handle all windows exception (second attempt) 2014-10-15 11:11:11 +11:00
sys_x86.c
syscall_nacl.h
syscall_solaris.c
syscall_solaris.go runtime: fix solaris build 2014-09-14 22:20:01 -04:00
syscall_windows.go runtime: fix parameter checking in syscall.NewCallback 2014-09-15 12:58:28 +10:00
syscall_windows_test.go runtime: handle all windows exception (second attempt) 2014-10-15 11:11:11 +11:00
thunk.s os, syscall: add Unsetenv 2014-10-01 11:17:15 -07:00
thunk_solaris_amd64.s
thunk_windows.s runtime: convert syscall_windows.c to Go 2014-09-14 21:25:44 -04:00
time.go runtime: add fake time support back. 2014-10-27 20:35:15 -04:00
tls_arm.s liblink: generate MRC replacement in liblink, not tls_arm 2014-09-30 10:03:10 +10:00
traceback.go runtime: avoid gentraceback of self on user goroutine stack 2014-11-05 23:01:48 -05:00
type.h runtime: remove type-punning for Type.gc[0], gc[1] 2014-10-07 11:06:51 -04:00
typekind.go runtime: remove duplicated Go constants 2014-09-16 10:22:15 -04:00
typekind.h runtime: remove duplicated Go constants 2014-09-16 10:22:15 -04:00
vdso_linux_amd64.c cmd/cc, runtime: disallow structs without tags 2014-10-03 12:44:20 -04:00
vlop_386.s
vlop_arm.s
vlop_arm_test.go
vlrt.c
vlrt.go