Use the __vdso_clock_gettime fast path via the vDSO on linux/arm64 to
speed up nanotime and walltime. This results in the following
performance improvement for time.Now on Cavium ThunderX:
name old time/op new time/op delta
TimeNow 442ns ± 0% 163ns ± 0% -63.16% (p=0.000 n=10+10)
And benchmarks on VDSO
BenchmarkClockVDSOAndFallbackPaths/vDSO 10000000 166 ns/op
BenchmarkClockVDSOAndFallbackPaths/Fallback 3000000 456 ns/op
Change-Id: I326118c6dff865eaa0569fc45d1fc1ff95cb74f6
Reviewed-on: https://go-review.googlesource.com/99855
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
This was originally C code using names with underscores, which were
retained when the code was rewritten into Go. Change the code to use
Go-like camel case names.
The names that come from the ELF ABI are left unchanged.
Change-Id: I181bc5dd81284c07bc67b7df4635f4734b41d646
Reviewed-on: https://go-review.googlesource.com/98520
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Tobias Klauser <tobias.klauser@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Use the __vdso_clock_gettime fast path via the vDSO on linux/arm to
speed up nanotime and walltime. This results in the following
performance improvement for time.Now on a RaspberryPi 3 (running
32bit Raspbian, i.e. GOOS=linux/GOARCH=arm):
name old time/op new time/op delta
TimeNow 0.99µs ± 0% 0.39µs ± 1% -60.74% (p=0.000 n=12+20)
Change-Id: I3598278a6c88d7f6a6ce66c56b9d25f9dd2f4c9a
Reviewed-on: https://go-review.googlesource.com/98095
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This change adds support for accelerating time.Now by using
the __vdso_clock_gettime fast-path via the vDSO on linux/386
if it is available.
When the vDSO path to the clocks is available, it is typically
5x-10x faster than the syscall path (see benchmark extract
below). Two such calls are made for each time.Now() call
on most platforms as of go 1.9.
- Add vdso_linux_386.go, containing the ELF32 definitions
for use by vdso_linux.go, the maximum array size, and
the symbols to be located in the vDSO.
- Modify runtime.walltime and runtime.nanotime to check for
and use the vDSO fast-path if available, or fall back to
the existing syscall path.
- Reduce the stack reservations for runtime.walltime and
runtime.monotime from 32 to 16 bytes. It appears the syscall
path actually only needed 8 bytes, but 16 is now needed to
cover the syscall and vDSO paths.
- Remove clearing DX from the syscall paths as clock_gettime
only takes 2 args (BX, CX in syscall calling convention),
so there should be no need to clear DX.
The included BenchmarkTimeNow was run with -cpu=1 -count=20
on an "Intel(R) Celeron(R) CPU J1900 @ 1.99GHz", comparing
released go 1.9.1 vs this change. This shows a gain in
performance on linux/386 (6.89x), and that no regression
occurred on linux/amd64 due to this change.
Kernel: linux/i686, GOOS=linux GOARCH=386
name old time/op new time/op delta
TimeNow 978ns ± 0% 142ns ± 0% -85.48% (p=0.000 n=16+20)
Kernel: linux/x86_64, GOOS=linux GOARCH=amd64
name old time/op new time/op delta
TimeNow 125ns ± 0% 125ns ± 0% ~ (all equal)
Gains are more dramatic in virtualized environments,
presumably due to the overhead of virtualizing the syscall.
Fixes#22190
Change-Id: I2f83ce60cb1b8b310c9ced0706bb463c1b3aedf8
Reviewed-on: https://go-review.googlesource.com/69390
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>