If the map contains 8 or fewer entries, it is wasteful to have a
directory that points to a table that points to a group.
Add a special case that replaces the directory with a direct pointer to
a group.
We could in theory do something similar for single-table maps (no
directory, just a direct pointer to a table), but that is left for later.
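As a rough sketch of the idea (hypothetical types and field names, not
the actual internal/runtime/maps layout), the map header's directory
pointer can double as a direct group pointer for small maps:

    package sketch

    import "unsafe"

    // group stands in for a single swissmap group of 8 slots.
    type group struct {
        ctrl  [8]uint8          // one control byte per slot; 0 means empty in this sketch
        slots [8]unsafe.Pointer // simplified key/elem storage
    }

    // mapHeader stands in for the map header. When the map holds at most
    // 8 entries, dirPtr points directly at a single group rather than at
    // a directory of tables; dirLen == 0 marks that case.
    type mapHeader struct {
        used   uint64
        dirPtr unsafe.Pointer
        dirLen int
    }

    // lookupSmall shows the special case: skip the directory -> table
    // indirection and probe the lone group directly.
    func lookupSmall(m *mapHeader, match func(g *group, i int) bool) (int, bool) {
        if m.dirLen != 0 {
            return 0, false // not a small map; go through the directory instead
        }
        g := (*group)(m.dirPtr)
        for i := range g.ctrl {
            if g.ctrl[i] != 0 && match(g, i) {
                return i, true
            }
        }
        return 0, false
    }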
For #54766.
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest-swissmap
Change-Id: I6fc04dfc11c31dadfe5b5d6481b4c4abd43d48ed
Reviewed-on: https://go-review.googlesource.com/c/go/+/611188
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Placed the fipsIndicator field in some 64-bit alignment padding in the g
struct to avoid growing per-goroutine memory requirements on 64-bit
targets.
Fixes #69911
Updates #69536
Change-Id: I176419d0e3814574758cb88a47340a944f405604
Reviewed-on: https://go-review.googlesource.com/c/go/+/620795
Reviewed-by: Roland Shoemaker <roland@golang.org>
Reviewed-by: Daniel McCarney <daniel@binaryparadox.net>
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Filippo Valsorda <filippo@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Derek Parker <parkerderek86@gmail.com>
A GODEBUG is actually a security risk here: most programs will start to
ignore errors from Read because they can't happen (which is the intended
behavior), but if such a program is then run with GODEBUG=randcrash=0
and an error does occur, it will use a partially filled buffer, which
may be catastrophic.
Note that the proposal was accepted without the GODEBUG, which was only
added later.
This (partially) reverts CL 608435. I kept the tests.
Updates #66821
Change-Id: I3fd20f9cae0d34115133fe935f0cfc7a741a2662
Reviewed-on: https://go-review.googlesource.com/c/go/+/622115
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Roland Shoemaker <roland@golang.org>
Auto-Submit: Filippo Valsorda <filippo@golang.org>
Reviewed-by: Daniel McCarney <daniel@binaryparadox.net>
This change finally fully fixes mallocgc for asan after the recent
refactoring. Here is everything that changed:
Fix the accounting for the alloc header; large objects don't have one.
Mask out the extra bits set when unrolling the bitmap for slice backing
stores in writeHeapBitsSmall. The redzone in asan mode means that
dataSize is no longer an exact multiple of typ.Size_ in this case (an
assumption I only recently discovered the code relies on), but we didn't
mask out the extra bits, so we'd accidentally set bits in other
allocations. Oops.
Move the initHeapBits optimization for the 8-byte scan sizeclass on
64-bit platforms up to mallocgc, out of writeHeapBitsSmall. This
optimization actually caused a problem with asan when it first landed,
but we missed it. The issue was then masked once we started passing the
redzone down into writeHeapBitsSmall, since the optimization would no
longer erroneously fire on asan. What happened was that dataSize would
be 8 (the user-provided alloc size), so we'd skip writing heap bits, but
the redzone bumped the size class, so we'd actually *have* to write the
heap bits for that size class. This is not really a problem now, *but*
it caused problems for me while debugging, since removing the redzone
from dataSize would trigger the bug again. Ultimately, this whole
situation is confusing because the check in writeHeapBitsSmall is *not*
the same as the check in initHeapBits. By moving the check up to
mallocgc, we can make the checks align better by matching on the
sizeclass, so this should be less error-prone in the future.
Change-Id: I1e9819223be23f722f3bf21e63e812f5fb557194
Reviewed-on: https://go-review.googlesource.com/c/go/+/622041
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Sometimes the runtime needs to reserve memory with a large alignment,
which the OS usually won't directly satisfy. So it asks for size+align
bytes instead and frees the unaligned portions. On sbrk systems this
doesn't work well, because freeing the tail portion doesn't actually
return the memory to the OS. Instead, we can simply round the current
break up to the requested alignment and then reserve the given size,
without wasting the tail portion.
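As a minimal sketch of the rounding idea (illustrative only, assuming a
power-of-two alignment; this is not the actual sbrk code):

    // alignUp rounds n up to the next multiple of align,
    // which must be a power of two.
    func alignUp(n, align uintptr) uintptr {
        return (n + align - 1) &^ (align - 1)
    }

    // reserveAligned bumps the break to the aligned address and reserves
    // exactly size bytes, instead of reserving size+align bytes and
    // discarding the unaligned head and tail.
    func reserveAligned(brk, size, align uintptr) (start, newBrk uintptr) {
        start = alignUp(brk, align)
        return start, start + size
    }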
Also, don't create heap arena hints on sbrk systems. We can only
grow the break sequentially, and reserving specific addresses
would not succeed anyway.
For #69018.
Change-Id: Iadc2c54d62b00ad7befa5bbf71146523483a8c47
Reviewed-on: https://go-review.googlesource.com/c/go/+/621715
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Document that Caller and Frame.File always use forward slashes
as path separators, even on Windows.
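For example (a small illustration of the documented behavior, not code
from this CL):

    package main

    import (
        "fmt"
        "runtime"
    )

    func main() {
        _, file, line, _ := runtime.Caller(0)
        // file uses '/' separators even on Windows, e.g.
        // C:/path/to/main.go rather than C:\path\to\main.go.
        fmt.Println(file, line)
    }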
Fixes #3335
Change-Id: Ic5bbf8a1f14af64277dca4783176cd8f70726b91
Reviewed-on: https://go-review.googlesource.com/c/go/+/603275
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Goroutine profiles require checking in with the profiler before any
goroutine starts running. coroswitch is a place where a goroutine may
start running, but where we do not check in with the profiler, which
leads to crashes. Fix this by checking in with the profiler the same way
execute does.
Fixes #69998.
Change-Id: Idef6dd31b70a73dd1c967b56c307c7a46a26ba73
Reviewed-on: https://go-review.googlesource.com/c/go/+/622016
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
A previous CL broke the ASAN poisoning calculation in mallocgc by not
taking into account a possible allocation header, so the beginning of
the following allocation could have been poisoned.
This mostly isn't a problem in practice, since the following slot would
usually just contain an allocation header that programs shouldn't be
touching anyway, but if we go a word past the end at the end of a span,
we could poison a valid heap allocation.
Change-Id: I76a4f59bcef01af513a1640c4c212c0eb6be85b3
Reviewed-on: https://go-review.googlesource.com/c/go/+/622295
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
We were missing a case for calling a C function with an index
into a pointer-to-array.
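A hypothetical example of the shape of code involved (illustrative only;
the exact failing program is in the issue):

    package p

    /*
    static void f(int *x) {}
    */
    import "C"

    // g passes the address of an element of a pointer-to-array to a C
    // function, the kind of call the pointer checking was missing.
    func g(a *[10]C.int) {
        C.f(&a[3])
    }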
Fixes #70016
Change-Id: I9c74d629e58722813c1aaa0f0dc225a5a64d111b
Reviewed-on: https://go-review.googlesource.com/c/go/+/621576
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
When the kernel parameter ptrace_scope is set to 2 or 3,
certain test cases in runtime-gdb_test.go will fail.
We should skip these tests.
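A minimal sketch of such a skip (assuming the Yama sysctl path; the
helper name is made up):

    package runtime_test

    import (
        "os"
        "strings"
        "testing"
    )

    // skipIfPtraceRestricted skips the test when the Yama LSM forbids
    // attaching a debugger (ptrace_scope 2 requires CAP_SYS_PTRACE,
    // 3 disables attaching entirely).
    func skipIfPtraceRestricted(t *testing.T) {
        data, err := os.ReadFile("/proc/sys/kernel/yama/ptrace_scope")
        if err != nil {
            return // no Yama; nothing to check
        }
        switch v := strings.TrimSpace(string(data)); v {
        case "2", "3":
            t.Skipf("skipping: kernel.yama.ptrace_scope = %s", v)
        }
    }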
Fixes #69932
Change-Id: I685d1217f1521d7f8801680cf6b71d8e7a265188
GitHub-Last-Rev: 063759e04c
GitHub-Pull-Request: golang/go#69933
Reviewed-on: https://go-review.googlesource.com/c/go/+/620857
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Change some vars to consts, remove some unneeded string conversions.
Change-Id: Ib12eed11ef080c4b593c8369bb915117e7100045
Reviewed-on: https://go-review.googlesource.com/c/go/+/621838
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
For #51026
Fixes #69971
Change-Id: I47f2938d20cbe9462bf738a506baedad4a7006c3
Reviewed-on: https://go-review.googlesource.com/c/go/+/621837
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Change-Id: I27bf98e84545746d90948dd06c4a7bd70782c49d
Reviewed-on: https://go-review.googlesource.com/c/go/+/621895
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Check for LSE support on the ARM64 chip if we targeted it at compile time.
Related to #69124
Updates #60905
Change-Id: I6fe244decbb4982548982e1f88376847721a33c7
Reviewed-on: https://go-review.googlesource.com/c/go/+/610195
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Shu-Chun Weng <scw@google.com>
For #54766.
Cq-Include-Trybots: luci.golang.try:gotip-linux-ppc64_power10,gotip-linux-amd64-longtest-swissmap
Change-Id: I6695c0b143560d974b710e1d78e7a7d09278f7cc
Reviewed-on: https://go-review.googlesource.com/c/go/+/620215
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
CL 476717 adopted the memory management mechanism from Plan 9 to
manage Wasm's linear memory. But the Plan 9 code uses the global
variables bloc and blocMax to keep track of the runtime's and the
OS's sense of the break, whereas the Wasm sbrk function doesn't use
those global variables and instead grows the linear memory directly.
As a result, if there is any unused portion at the end of the linear
memory, the runtime doesn't use it. This CL fixes that by adopting
the same mechanism as the Plan 9 code.
In particular, the runtime is not aware of any unused initial memory
at startup, so (most of) the extra initial memory set by the linker
is not actually used. This CL fixes this as well.
For #69018.
Change-Id: I2ea6a138310627eda5f19a1c76b1e1327362e5f2
Reviewed-on: https://go-review.googlesource.com/c/go/+/621635
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
This change switches isSending to be an atomic.Int32 instead of an
atomic.Uint8. The Int32 version is managed as a counter, something we
couldn't do with a Uint8 without adding a new intrinsic that may not be
available on all architectures.
That is, instead of only being able to support 8 concurrent timer
firings on the same timer because we only have 8 independent bits to set
for each concurrent timer firing, we can now have 2^31-1 concurrent
timer firings before running into any issues. Just as each bit-set was
previously matched with a clear, here we match increments with
decrements to indicate that we're in the "sending on a channel" critical
section in the timer code, so we can report the correct result back from
Stop or Reset.
We choose an Int32 instead of a Uint32 because it's easier to check for
obviously bad values (negative values are always bad) and 2^31-1
concurrent timer firings should be enough for anyone.
Previously, we avoided anything bigger than a Uint8 because we could
pack it into some padding in the runtime.timer struct. But it turns out
that the type that actually matters, runtime.timeTimer, is exactly 96
bytes in size. Because of the allocation header, that puts it in the
next size class up, the 112-byte size class, so we have some free space
to work with. This change increases the size of the struct from 96
bytes to 104 bytes.
(I'm not sure whether runtime.timer is often allocated directly, but if
it is, we get lucky in the same way: it's exactly 80 bytes in size,
which puts it in the 96-byte size class, leaving us some space to work
with.)
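A minimal sketch of the counter discipline (illustrative names, not the
actual timer code):

    package sketch

    import "sync/atomic"

    // timerState holds the counter: how many timer firings are currently
    // inside the "sending on a channel" critical section for this timer.
    type timerState struct {
        isSending atomic.Int32
    }

    func (t *timerState) beginSend() {
        t.isSending.Add(1)
    }

    func (t *timerState) endSend() {
        if n := t.isSending.Add(-1); n < 0 {
            panic("timer isSending underflow") // negative values are always bad
        }
    }

    // sendInProgress is what Stop/Reset would consult: a nonzero count
    // means a send may still be in flight, so the result must account
    // for it.
    func (t *timerState) sendInProgress() bool {
        return t.isSending.Load() > 0
    }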
Fixes #69969.
Related to #69880 and #69312.
Change-Id: I9fd59cb6a69365c62971d1f225490a65c58f3e77
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest
Reviewed-on: https://go-review.googlesource.com/c/go/+/621616
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Remove linkname directives that are no longer necessary now that
parquet-go/parquet-go#142 removes the dependency on the `memhash{32,64}`
functions.
This change also removes references to segmentio/parquet-go since that
repository was archived in favor of parquet-go/parquet-go.
Updates #67401
Change-Id: Ibafb0c41b39cdb86dac5531f62787fb5cb8d3f01
GitHub-Last-Rev: e14c4e4dfe
GitHub-Pull-Request: golang/go#67784
Reviewed-on: https://go-review.googlesource.com/c/go/+/589795
Auto-Submit: Ian Lance Taylor <iant@google.com>
Commit-Queue: Ian Lance Taylor <iant@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
This is a peace-of-mind change to make sure that delayed-zeroed memory
(in the large alloc case) is globally visible from the moment the
allocation is published back to the caller.
The way it's written right now is good enough for the garbage collector
(we already have a publication barrier for a nil span.largeType, so the
GC will ignore the noscan span) but this might matter for user code on
weak memory architectures.
Change-Id: I06ac9b95863074e5f09382629083b19bfa87fdb8
Reviewed-on: https://go-review.googlesource.com/c/go/+/619036
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
In the last CL, we separated mallocgc into several specialized paths.
Let's split up heapSetType too. This will make the specialized
heapSetType functions inlineable and cut out some branches as well as a
function call.
Microbenchmark results at this point in the stack:
│ before.out │ after-5.out │
│ sec/op │ sec/op vs base │
Malloc8-4 13.52n ± 3% 12.15n ± 2% -10.13% (p=0.002 n=6)
Malloc16-4 21.49n ± 2% 18.32n ± 4% -14.75% (p=0.002 n=6)
MallocTypeInfo8-4 27.12n ± 1% 18.64n ± 2% -31.30% (p=0.002 n=6)
MallocTypeInfo16-4 28.71n ± 3% 21.63n ± 5% -24.65% (p=0.002 n=6)
geomean 21.81n 17.31n -20.64%
Change-Id: I5de9ac5089b9eb49bf563af2a74e6dc564420e05
Reviewed-on: https://go-review.googlesource.com/c/go/+/614795
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Right now mallocgc is a monster of a function. In real programs, we see
that a substantial amount of time in mallocgc is spent in mallocgc
itself. It's very branch-y, holds a lot of state, and handles quite a few
disparate cases, trying to merge them together.
This change breaks apart mallocgc into separate, leaner functions.
There's some duplication now, but there are a lot of branches that can
be pruned as a result.
There's definitely still more we can do here. heapSetType can be inlined
and broken down for each case, since its internals roughly map to each
case anyway (done in a follow-up CL). We can probably also do more with
the size class lookups, since we know more about the size of the object
in each case than before.
Below are the savings for the full stack up until now.
│ after-3.out │ after-4.out │
│ sec/op │ sec/op vs base │
Malloc8-4 13.32n ± 2% 12.17n ± 1% -8.63% (p=0.002 n=6)
Malloc16-4 21.64n ± 3% 19.38n ± 10% -10.47% (p=0.002 n=6)
MallocTypeInfo8-4 23.15n ± 2% 19.91n ± 2% -14.00% (p=0.002 n=6)
MallocTypeInfo16-4 25.86n ± 4% 22.48n ± 5% -13.11% (p=0.002 n=6)
MallocLargeStruct-4 270.0n ± ∞ ¹
geomean 20.38n 30.97n -11.58%
Change-Id: I681029c0b442f9221c4429950626f06299a5cfe4
Reviewed-on: https://go-review.googlesource.com/c/go/+/614257
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
This change breaks out the debug.malloc codepaths into dedicated
functions, both to make mallocgc easier to read and to reduce the
function's size (currently all that code is inlined and really doesn't
need to be).
This is a microoptimization that on its own changes very little, but
together with other optimizations and a breaking up of the various
malloc paths, it will matter in aggregate ("death by a thousand cuts").
Change-Id: I30b3ab4a1f349ba85b4a1b5b2c399abcdfe4844f
Reviewed-on: https://go-review.googlesource.com/c/go/+/617879
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
These debug checks are very occasionally helpful, but they do cost real
time. The biggest issue seems to be the bloat of mallocgc due to the
"throw" paths. Overall, after some follow-ups, this change cuts about
1ns off of the mallocgc fast path.
This is a microoptimization that on its own changes very little, but
together with other optimizations and a breaking up of the various
malloc paths, it will matter in aggregate ("death by a thousand cuts").
Change-Id: I07c4547ad724b9f94281320846677fb558957721
Reviewed-on: https://go-review.googlesource.com/c/go/+/617878
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
shouldhelpgc is a very unhelpful name, because it has nothing to do with
assists and everything to do with GC triggering. Rename it to
checkGCTrigger, which is much clearer.
Change-Id: Id38debd424ddb397376c0cea6e74b3fe94002f71
Reviewed-on: https://go-review.googlesource.com/c/go/+/617877
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
This change stops tracking assistG across malloc to reduce the number of
slots the compiler must keep track of in mallocgc, which adds to
register pressure. It also makes the call to deductAssistCredit happen
only if the GC is running.
This is a microoptimization that on its own changes very little, but
together with other optimizations and a breaking up of the various
malloc paths, it will matter in aggregate ("death by a thousand cuts").
Change-Id: I4cfac7f3e8e873ba66ff3b553072737a4707e2c2
Reviewed-on: https://go-review.googlesource.com/c/go/+/617876
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
This is an allocator microoptimization. There's no reason to check
gcphase in general, since it's mostly for debugging anyway.
writeBarrier.enabled is set in all the same cases here, and we force one
fewer cache line (probably) to be touched during malloc.
Conceptually, it also makes a bit more sense. The allocate-black policy
is partly informed by the write barrier design.
Change-Id: Ia5ff593d64c29cf7f4d1bced3204056566444a98
Reviewed-on: https://go-review.googlesource.com/c/go/+/617875
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Checking whether the current allocation needs to be profiled is
currently branch-y and weirdly a lot of code. The branches are
generally predictable, but it's a surprising number of instructions.
Part of the problem is that MemProfileRate is just a global that can be
set at any time, so we need to load it and check certain settings
explicitly. In an ideal world, we would just always subtract from
nextSample and have a single branch to take the slow path if we
subtract below zero.
If MemProfileRate were a function, we could intentionally trash all the
nextSample values in each mcache whenever it changed. This would be
slow, but MemProfileRate changes rarely while the malloc hot path is,
well, hot. Unfortunately, it is not a function.
Although this ideal world is, AFAICT, impossible, we can still get
close. If we cache the value of MemProfileRate in each mcache, then we
can force malloc to take the slow path whenever MemProfileRate changes.
This does require two additional loads, but crucially, these loads are
independent of everything else in mallocgc. Furthermore, the branch
dependent on those loads is incredibly predictable in practice.
This CL on its own has little-to-no impact on mallocgc. But this
codepath is going to be duplicated in several places in the next CL, so
it'll pay to simplify it. Also, we're very much trying to remedy a
death-by-a-thousand-cuts situation, and malloc is currently still kind
of a monster -- it will not help if mallocgc isn't really streamlined
itself.
Lastly, there's a nice property now that all nextSample values get
immediately re-sampled when MemProfileRate changes.
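A simplified sketch of the caching scheme (field and function names are
invented, and the real code draws a randomized sample rather than using
the rate directly):

    // profCache is a per-mcache view of the profiling state.
    type profCache struct {
        memProfRate int64 // cached copy of the global rate
        nextSample  int64 // bytes remaining until the next sampled allocation
    }

    // shouldSample is the hot-path check: two loads plus one highly
    // predictable branch. When the global rate changes, the mismatch
    // forces a re-sample, so every cache picks up the new rate promptly.
    func (c *profCache) shouldSample(size, globalRate int64) bool {
        if c.memProfRate != globalRate {
            c.memProfRate = globalRate
            c.nextSample = globalRate // real code: randomized exponential sample
        }
        c.nextSample -= size
        return c.nextSample < 0
    }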
Change-Id: I6443d0cf9bd7861595584442b675ac1be8ea3455
Reviewed-on: https://go-review.googlesource.com/c/go/+/615815
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
This change brings back a minor optimization lost in the Go 1.22 cycle,
wherein spans in the 8-byte pointer-ful size class would have the
pointer bitmap written ahead of time in bulk, because there's only one
possible pattern.
│ before │ after │
│ sec/op │ sec/op vs base │
MallocTypeInfo8-4 25.13n ± 1% 23.59n ± 2% -6.15% (p=0.002 n=6)
Change-Id: I135b84bb1d5b7e678b841b56430930bc73c0a038
Reviewed-on: https://go-review.googlesource.com/c/go/+/614256
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
For whatever reason, span.heapBits is kind of slow. It accounts for
about a quarter of the cost of writeHeapBitsSmall, which is absurd. We
get a nice speed improvement for small allocations by eliminating this
call.
│ before │ after │
│ sec/op │ sec/op vs base │
MallocTypeInfo16-4 29.47n ± 1% 27.02n ± 1% -8.31% (p=0.002 n=6)
Change-Id: I6270e26902e5a9254cf1503fac81c3c799c59d6a
Reviewed-on: https://go-review.googlesource.com/c/go/+/614255
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Extendible hashing splits a swisstable map into many swisstables. This
keeps grow operations small.
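A rough sketch of the directory lookup (illustrative, not the actual map
code): the top globalDepth bits of the hash select which table owns a
key, so growing one table only doubles that table (and, at worst, the
directory), never the whole map.

    // directoryIndex picks the table for a hash by its top globalDepth bits.
    func directoryIndex(hash uint64, globalDepth uint) uint64 {
        if globalDepth == 0 {
            return 0 // a single table, no directory needed
        }
        return hash >> (64 - globalDepth)
    }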
For #54766.
Cq-Include-Trybots: luci.golang.try:gotip-linux-ppc64_power10,gotip-linux-amd64-longtest-swissmap
Change-Id: Id91f34af9e686bf35eb8882ee479956ece89e821
Reviewed-on: https://go-review.googlesource.com/c/go/+/604936
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Based on the benchmarks in github.com/cockroachdb/swiss.
For #54766.
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest-swissmap
Change-Id: I9ad925d3272c671e21ec04eb2da5ebd8f0fc6a28
Reviewed-on: https://go-review.googlesource.com/c/go/+/596295
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
src/runtime/testdata/testprogcgo/threadprof.go contains C code with a
variable called nullptr. This conflicts with the nullptr keyword in
the C23 revision of the C standard (showing up as gccgo test build
failures once GCC is updated to build C code as C23 by default).
Rename the variable to nullpointer to avoid the clash with the
keyword (any other name that's not a keyword would work just as well).
Change-Id: Ida5ef371a3f856c611409884e185c3d5ded8e86c
GitHub-Last-Rev: 2ec464703b
GitHub-Pull-Request: golang/go#69927
Reviewed-on: https://go-review.googlesource.com/c/go/+/620955
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Use the new SwissTable-based map in internal/runtime/maps as the basis
for the runtime map when GOEXPERIMENT=swissmap.
Integration is complete enough to pass all.bash. Notable missing
features:
* Race integration / concurrent write detection
* Stack-allocated maps
* Specialized "fast" map variants
* Indirect key / elem
For #54766.
Cq-Include-Trybots: luci.golang.try:gotip-linux-ppc64_power10,gotip-linux-amd64-longtest-swissmap
Change-Id: Ie97b656b6d8e05c0403311ae08fef9f51756a639
Reviewed-on: https://go-review.googlesource.com/c/go/+/594596
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
The Ticker Stop and Reset methods don't report a value,
so we don't need to track whether they are interrupting a send.
This includes a test that used to fail about 2% of the time on
my laptop when run under x/tools/cmd/stress.
Change-Id: Ic6d14b344594149dd3c24b37bbe4e42e83f9a9ad
Reviewed-on: https://go-review.googlesource.com/c/go/+/620136
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
The syscall.SyscallX functions consume a lot of stack space, which is a
problem because they are nosplit functions. They used to use less stack
space, but CL 563315, which landed in Go 1.23, increased their stack
usage by a lot.
This CL reduces the stack usage back to the previous level.
Fixes #69813.
Change-Id: Iddedd28b693c66a258da687389768055c493fc2e
Reviewed-on: https://go-review.googlesource.com/c/go/+/618497
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Add a new package that will contain a new "Swiss Table"
(https://abseil.io/about/design/swisstables) map implementation, which
is intended to eventually replace the existing runtime map
implementation.
This implementation is based on the fabulous
github.com/cockroachdb/swiss package contributed by Peter Mattis.
This CL adds a hash map implementation. It supports all the core
operations, but does not have incremental growth.
For #54766.
Change-Id: I52cf371448c3817d471ddb1f5a78f3513565db41
Reviewed-on: https://go-review.googlesource.com/c/go/+/582415
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
AT_RANDOM is unfortunately used by libc before we run (so make sure it's
not cleared), but it is also available to cgo programs after we have
run. It would be unfortunate if a cgo program assumed it could use
AT_RANDOM but instead found all zeroes there.
Change-Id: I82eff34d8cf5a499b439052b7827b8ef7cabc21d
Reviewed-on: https://go-review.googlesource.com/c/go/+/608437
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Daniel McCarney <daniel@binaryparadox.net>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Roland Shoemaker <roland@golang.org>
readRandom doesn't matter on Linux because of startupRand, but it does
on Windows and macOS. Windows already uses the same API as crypto/rand.
Switch macOS away from the /dev/urandom read.
Updates #68278
Cq-Include-Trybots: luci.golang.try:gotip-darwin-amd64_14
Change-Id: Ie8f105e35658a6f10ff68798d14883e3b212eb3e
Reviewed-on: https://go-review.googlesource.com/c/go/+/608436
Reviewed-by: Roland Shoemaker <roland@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
For #66821
Change-Id: I525c308d6d6243a2bc805e819dcf40b67e52ade5
Reviewed-on: https://go-review.googlesource.com/c/go/+/608435
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Daniel McCarney <daniel@binaryparadox.net>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Roland Shoemaker <roland@golang.org>
This reverts commit 5b0f8596b7.
Reason for revert: This CL breaks the gotip-linux-amd64-noopt builder.
Change-Id: I3950211f05c90e4955c0785409b796987741a9f4
Reviewed-on: https://go-review.googlesource.com/c/go/+/617715
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
I've done some more testing of the new isSending field.
I'm not able to get more than 2 bits set. That said,
with this change it's significantly less likely to have even
2 bits set. The idea here is to clear the bit before possibly
locking the channel we are sending the value on, thus avoiding
some delay and some serialization.
For #69312
Change-Id: I8b5f167f162bbcbcbf7ea47305967f349b62b0f4
Reviewed-on: https://go-review.googlesource.com/c/go/+/617497
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Commit-Queue: Ian Lance Taylor <iant@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
While working on CL 611241 and CL 616375, I introduced a bug that wasn't
caught by any test. CL 611241 added more inline expansion at sample time
for block/mutex profile stacks collected via frame pointer unwinding.
CL 616375 then changed how inline expansion for those stacks is done at
reporting time. So some frames passed through multiple rounds of inline
expansion, and this led to duplicate stack frames in some cases. The
stacks from TestBlockMutexProfileInlineExpansion looked like
sync.(*Mutex).Unlock
runtime/pprof.inlineF
runtime/pprof.inlineE
runtime/pprof.inlineD
runtime/pprof.inlineD
runtime.goexit
after those two CLs, and in particular after CL 616375. Note the extra
inlineD frame. The test didn't catch that since it was only looking for
a few frames in the stacks rather than checking the entire stacks.
This CL makes that test stricter by checking the entire expected stacks
rather than just a portion of the stacks.
Change-Id: I0acc739d826586e9a63a081bb98ef512d72cdc9a
Reviewed-on: https://go-review.googlesource.com/c/go/+/617235
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
I noticed in pprof that acquirem() was a bit of a hotspot. It turns out
that we can use the same trick that runtime.rand() does and only call
acquirem if we're doing something non-nosplit -- in this case, getting a
new state -- and otherwise just use getg().m, which is safe because
we're inside the runtime and don't call split functions.
cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz
│ sec/op │ sec/op vs base │
ParallelGetRandom-16 2.651n ± 4% 2.416n ± 7% -8.87% (p=0.001 n=10)
│ B/s │ B/s vs base │
ParallelGetRandom-16 1.406Gi ± 4% 1.542Gi ± 6% +9.72% (p=0.001 n=10)
Change-Id: Iae075f4e298b923e499cd01adfabacab725a8684
Reviewed-on: https://go-review.googlesource.com/c/go/+/616738
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Over the years we've had various bugs in pprof stack handling that
resulted in appendLocsForStack crashing because stk is too short for a
cached location. That is, the cached location claims several inlined
frames, which should always appear together in stk; if some frames are
missing from stk, appendLocsForStack panics with a slice out of bounds.
If we find this case, replace the slice-out-of-bounds panic with an
explicit panic that contains more context.
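A hedged sketch of the kind of check added (helper and message are
illustrative, not the actual code):

    package pprofsketch

    import "fmt"

    // checkStack fails with context instead of an opaque index-out-of-range
    // when the cached location claims more inlined frames than stk contains.
    func checkStack(stk []uint64, cachedFrames int) {
        if len(stk) < cachedFrames {
            panic(fmt.Sprintf("pprof: short stack: have %d frames, cached location expects %d",
                len(stk), cachedFrames))
        }
    }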
Change-Id: I52725a689baf42b8db627ce3e1bc6c654ef245d4
Reviewed-on: https://go-review.googlesource.com/c/go/+/617135
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Updates #66779
Updates #69577
Change-Id: I0dea5a30aab87aaa443e7e6646c1d07aa865ac1c
GitHub-Last-Rev: 1cea46deb3
GitHub-Pull-Request: golang/go#69719
Reviewed-on: https://go-review.googlesource.com/c/go/+/616696
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Commit-Queue: Ian Lance Taylor <iant@google.com>
Reviewed-by: Filippo Valsorda <filippo@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>