The newly added functions create a copy of their input with all bytes in
invalid UTF-8 byte sequences mapped to the UTF-8 byte sequence
given as replacement parameter.
Fixes#25805
Change-Id: Iaf65f65b40c0581c6bb000f1590408d6628321d0
Reviewed-on: https://go-review.googlesource.com/c/go/+/142003
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
When string letters are all in lower/upper cases, both functions respectively
return original string.
Fixes#30987
Change-Id: Ie8d664f7af5e087f82c1bc156933e9a995645bf4
Reviewed-on: https://go-review.googlesource.com/c/go/+/171735
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
CL 56470 unindented bytes.Fields, but not strings.Fields. Do so now to
make it easier to diff the two functions for potential differences.
Change-Id: Ifef81f50cee64e8277e91efa5ec5521d8d21d3bd
Reviewed-on: https://go-review.googlesource.com/c/go/+/170951
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
When I was working on the fix for #31038 (make TrimSpace return nil on
all-space input) I noticed that there were no tests for TrimLeftFunc
and TrimRightFunc, including the funky nil behavior. So add some!
I've just reused the existing TrimFunc test cases for TrimLeftFunc and
TrimRightFunc, as well as adding new tests for the empty string and
all-trimmed cases (which test the nil-returning behavior of TrimFunc and
TrimLeftFunc).
Change-Id: Ib580d4364e9b3c91350305f9d9873080d7862904
Reviewed-on: https://go-review.googlesource.com/c/go/+/170061
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
This change adds a fast path for ASCII strings to both
strings.TrimSpace and bytes.TrimSpace. It doesn't slow down the
non-ASCII path much, if at all.
I added benchmarks for strings.TrimSpace as it didn't have any, and
I fleshed out the benchmarks for bytes.TrimSpace as it just had one
case (for ASCII). The benchmarks (and the code!) are now the same
between the two versions. Below are the benchmark results:
strings.TrimSpace:
name old time/op new time/op delta
TrimSpace/NoTrim-8 18.6ns ± 0% 3.8ns ± 0% -79.53% (p=0.000 n=5+4)
TrimSpace/ASCII-8 33.5ns ± 2% 6.0ns ± 3% -82.05% (p=0.008 n=5+5)
TrimSpace/SomeNonASCII-8 97.1ns ± 1% 88.6ns ± 1% -8.68% (p=0.008 n=5+5)
TrimSpace/JustNonASCII-8 144ns ± 0% 143ns ± 0% ~ (p=0.079 n=4+5)
bytes.TrimSpace:
name old time/op new time/op delta
TrimSpace/NoTrim-8 18.9ns ± 1% 4.1ns ± 1% -78.34% (p=0.008 n=5+5)
TrimSpace/ASCII-8 29.9ns ± 0% 6.3ns ± 1% -79.06% (p=0.008 n=5+5)
TrimSpace/SomeNonASCII-8 91.5ns ± 0% 82.3ns ± 0% -10.03% (p=0.008 n=5+5)
TrimSpace/JustNonASCII-8 150ns ± 0% 150ns ± 0% ~ (all equal)
Fixes#29122
Change-Id: Ica45cd86a219cadf60173ec9db260133cd1d7951
Reviewed-on: https://go-review.googlesource.com/c/go/+/152917
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
There are no empty function declarations in package strings anymore, so
strings.s is no longer needed.
Change-Id: I16fe161a9c06804811e98af0ca074f8f46e2f49d
Reviewed-on: https://go-review.googlesource.com/c/go/+/166458
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Change-Id: I2ff29aa9909be3062fcd5f65af261f5d8c46fbc1
Reviewed-on: https://go-review.googlesource.com/c/153843
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Cleans things up quite a bit.
There's still a few more, like runtime.cmpstring, which might also
be worth fixing.
Change-Id: Ide18dd621efc129cc686db223f47fa0b044b5580
Reviewed-on: https://go-review.googlesource.com/c/148578
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
At each comparison, we're making a copy of the whole string.
Instead, use unsafe to share the string backing store with a []byte.
It reduces the test time from ~4sec to ~1sec on my machine
(darwin/amd64). Some builders were having much more trouble with this
test (>3min), it may help more there.
Fixes#26174Fixes#28573Fixes#26155
Update #26473
Change-Id: Id5856fd26faf6ff46e763a088f039230556a4116
Reviewed-on: https://go-review.googlesource.com/c/147358
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
This lets []byte->string conversions which are used as arguments to
strings.IndexByte and friends have their backing store allocated on
the stack.
It only prevents allocation when the string is small enough (32
bytes), so it isn't perfect. But reusing the []byte backing store
directly requires a bunch more compiler analysis (see #2205 and
related issues).
Fixes#25864.
Change-Id: Ie52430422196e3c91e5529d6e56a8435ced1fc4c
Reviewed-on: https://go-review.googlesource.com/c/146018
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
When an invalid UTF-8 byte sequence is decoded in a range loop over a string
a utf8.RuneError rune is returned. This is not distinguishable from decoding
the valid '\uFFFD' sequence representing utf8.RuneError from a string without
further checks within the range loop.
The previous Map code did not do any extra checks and would thereby not map
invalid UTF-8 byte sequences correctly when those were mapping to utf8.RuneError.
Fix this by adding the extra checks necessary to distinguish the decoding
of invalid utf8 byte sequences from decoding the sequence for utf8.RuneError
when the mapping of a rune is utf8.RuneError.
This fix does not result in a measureable performance regression:
name old time/op new time/op delta
ByteByteMap 1.05µs ± 3% 1.03µs ± 3% ~ (p=0.118 n=10+10)
Map/identity/ASCII 169ns ± 2% 170ns ± 1% ~ (p=0.501 n=9+10)
Map/identity/Greek 298ns ± 1% 303ns ± 4% ~ (p=0.338 n=10+10)
Map/change/ASCII 323ns ± 3% 325ns ± 4% ~ (p=0.679 n=8+10)
Map/change/Greek 628ns ± 5% 635ns ± 1% ~ (p=0.460 n=10+9)
MapNoChanges 120ns ± 4% 119ns ± 1% ~ (p=0.496 n=10+9)
Fixes#26305
Change-Id: I70e99fa244983c5040756fa4549ac1e8cb6022c3
Reviewed-on: https://go-review.googlesource.com/c/131495
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
And start using it elsewhere in the standard library, removing the
copies in the process.
While at it, rewrite the io.WriteString godoc to be more clear, since it
can now make reference to the defined interface.
Fixes#27946.
Change-Id: Id5ba223c09c19e5fb49815bd3b1bd3254fc786f3
Reviewed-on: https://go-review.googlesource.com/c/139457
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
I omitted vendor directories and anything necessary for bootstrapping.
(Tested by bootstrapping with Go 1.4)
Updates #27864
Change-Id: I7d9b68d0372d3a34dee22966cca323513ece7e8a
Reviewed-on: https://go-review.googlesource.com/137856
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
A number of explicit function literals found through the
unlambda linter are removed.
Fixes#26802
Change-Id: I0b122bdd95e9cb804c77efe20483fdf681c8154e
Reviewed-on: https://go-review.googlesource.com/127756
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
To report the capacity of the underlying buffer. The method mirrors
bytes.Buffer.Cap.
The method can be useful to know whether or not calling write or grow
methods will result in an allocation, or to know how much memory has
been allocated so far.
Fixes#26269.
Change-Id: I391db45ae825011566b594836991e28135369a78
Reviewed-on: https://go-review.googlesource.com/122835
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
On the OpenBSD builder this reduces the test time from 213 seconds to
60 seconds, without loss of testing.
Not sure why the test is so much slower on OpenBSD, so not closing the
issues.
Updates #26155
Updates #26174
Change-Id: I13b58bbe3b209e591c308765077d2342943a3d2a
Reviewed-on: https://go-review.googlesource.com/121820
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ralph Corderoy <ralph@inputplus.co.uk>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
The existing implementation of bytes.Compare on s390x doesn't work properly for slices longer
than 256 elements. This change fixes that. Added tests for long strings and slices of bytes.
Fixes#26114
Change-Id: If6d8b68ee6dbcf99a24f867a1d3438b1f208954f
Reviewed-on: https://go-review.googlesource.com/121495
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
If one quickly looks at the strings package godoc, reading the name
TrimLeft, one might think it removes a prefix from the string.
The function's godoc does explain its purpose, but it's apparent that it
is not clear enough, as there have been numerous raised issues about
this confusion: #12771#14657#18160#19371#20085#25328#26119. These
questions are also frequent elsewhere on the internet.
Add a very short paragraph to the godoc, to hopefully point new Go
developers in the right direction faster. Do the same thing for
TrimRight and TrimSuffix.
Change-Id: I4dee5ed8dd9fba565b4755bad12ae1ee6d277959
Reviewed-on: https://go-review.googlesource.com/121637
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Each URL was manually verified to ensure it did not serve up incorrect
content.
Change-Id: I4dc846227af95a73ee9a3074d0c379ff0fa955df
Reviewed-on: https://go-review.googlesource.com/115798
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Fix encoding of PAD (U+0080) which has the same value as utf8.RuneSelf
being incorrectly encoded as \x80 in strings.Map due to using <= instead
of a < comparison operator to check one byte encodings for utf8.
Fixes#25242
Change-Id: Ib6c7d1f425a7ba81e431b6d64009e713d94ea3bc
Reviewed-on: https://go-review.googlesource.com/111286
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
The existing implementation only considers the special ASCII
case when the lower character is an upper case letter. This
means that most ASCII comparisons use unicode.SimpleFold even
when it is not necessary.
benchmark old ns/op new ns/op delta
BenchmarkEqualFold-8 450 390 -13.33%
Change-Id: I735ca3c30fc0145c186d2a54f31fd39caab2c3fa
Reviewed-on: https://go-review.googlesource.com/110018
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
NewReplacer's documentation says that "replacements are performed in
order", meaning that substrings are replaced in the order they appear
in the target string, and not that the old->new replacements are
applied in the order they're passed to NewReplacer.
Rephrase the doc to make this clearer.
Fixes#25071
Change-Id: Icf3aa6a9d459b94764c9d577e4a76ad8c04d158d
Reviewed-on: https://go-review.googlesource.com/109375
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Move the IndexByte function from the runtime to a new bytealg package.
The new package will eventually hold all the optimized assembly for
groveling through byte slices and strings. It seems a better home for
this code than randomly keeping it in runtime.
Once this is in, the next step is to move the other functions
(Compare, Equal, ...).
Update #19792
This change seems complicated enough that we might just declare
"not worth it" and abandon. Opinions welcome.
The core assembly is all unchanged, except minor modifications where
the code reads cpu feature bits.
The wrapper functions have been cleaned up as they are now actually
checked by vet.
Change-Id: I9fa75bee5d85db3a65b3fd3b7997e60367523796
Reviewed-on: https://go-review.googlesource.com/98016
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Despite the existing test that locks in the allocation behavior, people
really want a benchmark. So:
BenchmarkBuildString_Builder/1Write_NoGrow-4 20000000 60.4 ns/op 48 B/op 1 allocs/op
BenchmarkBuildString_Builder/3Write_NoGrow-4 10000000 230 ns/op 336 B/op 3 allocs/op
BenchmarkBuildString_Builder/3Write_Grow-4 20000000 102 ns/op 112 B/op 1 allocs/op
BenchmarkBuildString_ByteBuffer/1Write_NoGrow-4 10000000 125 ns/op 160 B/op 2 allocs/op
BenchmarkBuildString_ByteBuffer/3Write_NoGrow-4 5000000 339 ns/op 400 B/op 3 allocs/op
BenchmarkBuildString_ByteBuffer/3Write_Grow-4 5000000 316 ns/op 336 B/op 3 allocs/op
I don't think these allocate-as-fast-as-you-can benchmarks are very
interesting because they're effectively just GC benchmarks, but sure.
If one wants to see that there's 1 fewer allocation, there it is. The
ns/op and B/op numbers will change as the built string size changes.
Updates #18990
Change-Id: Ifccf535bd396217434a0e6989e195105f90132ae
Reviewed-on: https://go-review.googlesource.com/96980
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Alan Donovan <adonovan@google.com>
All credit and blame goes to Ian for this suggestion, copied from the
runtime.
Fixes#23382
Updates #7921
Change-Id: I3d5a9ee4ab730c87e0f3feff3e7fceff9bcf9e18
Reviewed-on: https://go-review.googlesource.com/86976
Reviewed-by: Ian Lance Taylor <iant@golang.org>
The Builder's ReadFrom method allows the underlying unsafe slice to
escape, and for callers to subsequently modify memory that had been
unsafely converted into an immutable string.
In the original proposal for Builder (#18990), I'd noted there should
be no Read methods:
> There would be no Reset or Bytes or Truncate or Read methods.
> Nothing that could mutate the []byte once it was unsafely converted
> to a string.
And in my prototype (https://golang.org/cl/37767), I handled ReadFrom
properly, but when https://golang.org/cl/74931 arrived, I missed that
it had a ReadFrom method and approved it.
Because we're so close to the Go 1.10 release, just remove the
ReadFrom method rather than think about possible fixes. It has
marginal utility in a Builder anyway.
Also, fix a separate bug that also allowed mutation of a slice's
backing array after it had been converted into a slice by disallowing
copies of the Builder by value.
Updates #18990Fixes#23083Fixes#23084
Change-Id: Id1f860f8a4f5f88b32213cf85108ebc609acb95f
Reviewed-on: https://go-review.googlesource.com/83255
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
CL 65851 (bytes) and CL 65910 (strings) “improve[d] readability”
by removing the special case that bypassed the whole function body
when chars == "". In doing so, yes, the function was unindented a
level, which is nice, but the runtime of that case went from O(1) to O(n)
where n = len(s).
I don't know if anyone's code depends on the O(1) behavior in this case,
but quite possibly someone's does.
This CL adds the special case back, with a comment to prevent future
deletions, and without reindenting each function body in full.
Change-Id: I5aba33922b304dd1b8657e6d51d6c937a7f95c81
Reviewed-on: https://go-review.googlesource.com/78112
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
This is like a write-only subset of bytes.Buffer with an
allocation-free String method.
Fixes#18990.
Change-Id: Icdf7240f4309a52924dc3af04a39ecd737a210f4
Reviewed-on: https://go-review.googlesource.com/74931
Run-TryBot: Caleb Spare <cespare@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Change-Id: Ifa0384722dd879af7f5edb7b7aaac5ede3cff46d
Reviewed-on: https://go-review.googlesource.com/74690
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
This change removes the check of len(chars) > 0 inside the Index and
IndexAny functions which was redundant.
Change-Id: Iffbc0f2b3332c6e31c7514b5f644b6fe7bdcfe0d
Reviewed-on: https://go-review.googlesource.com/65910
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
We initialize fieldStart to 0, then set it to i without ever reading
0, so we might as well just initialize it to i.
Change-Id: I17905b25d54a62b6bc76f915353756ed5eb6972b
Reviewed-on: https://go-review.googlesource.com/52933
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
Reviewed-by: Avelino <t@avelino.xxx>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Change-Id: I6bfe5b914cf11be1cd1f8e61d557cc718725f0be
Reviewed-on: https://go-review.googlesource.com/49013
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Add a example for string.Compare that return the three possible results.
Change-Id: I103cf39327c1868fb249538d9e22b11865ba4b70
Reviewed-on: https://go-review.googlesource.com/49011
Reviewed-by: Heschi Kreinick <heschi@google.com>
Change-Id: Ib6a59735381ce744553f1ac96eeb65a194c8da10
Reviewed-on: https://go-review.googlesource.com/48860
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Change-Id: I69d1359d8868d4c5b173e4d831e38cea7dfeb713
Reviewed-on: https://go-review.googlesource.com/48859
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Apparently people get confused by the fact that
Split("", ",")
returns []{""} instead of []{}.
This is actually just a consequence of the fact that if the separator
sep (2nd argument) is not found the string s (1st argument), then the
Split* functions return a length 1 slice with the string s in it.
Document the general case: if sep is not in s, what you get is a len 1
slice with s in it; unless both s and sep are "", in that case you get
an empty slice of length 0.
Fixes#19726
Change-Id: I64c8220b91acd1e5aa1cc1829199e0cd8c47c404
Reviewed-on: https://go-review.googlesource.com/44950
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Implements detection of x86 cpu features that
are used in the go standard library.
Changes all standard library packages to use the new cpu package
instead of using runtime internal variables to check x86 cpu features.
Updates: #15403
Change-Id: I2999a10cb4d9ec4863ffbed72f4e021a1dbc4bb9
Reviewed-on: https://go-review.googlesource.com/41476
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The new Map implementation introduced in golang.org/cl/33201
did not differentiate if an invalid UTF-8 sequence was decoded
or the RuneError rune. It would therefore always advance by
3 bytes (which is the length of the RuneError rune) instead
of 1 for an invalid sequences. This cl adds a check to correctly
determine the length of bytes needed to advance to the next rune.
Fixes#19330.
Change-Id: I1e7f9333f3ef6068ffc64015bb0a9f32b0b7111d
Reviewed-on: https://go-review.googlesource.com/37597
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Using 'sep' as parameter name for strings functions that take a
separator argument is fine, but for functions like Index or Count that
look for a substring it's better to use 'substr' (like Contains
already does).
Fixes#19039
Change-Id: Idd557409c8fea64ce830ab0e3fec37d3d56a79f0
Reviewed-on: https://go-review.googlesource.com/36874
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Updates the s390x-specific files in these packages with the changes
to the amd64-specific files made during the review of CL 31690. I'd
like to keep these files in sync unless there is a reason to
diverge.
Change-Id: Id83e5ce11a45f877bdcc991d02b14416d1a2d8d2
Reviewed-on: https://go-review.googlesource.com/32574
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
In a large codebase within Google, there are thousands of uses of:
ContainsAny|IndexAny|LastIndexAny|Trim|TrimLeft|TrimRight
An analysis of their usage shows that over 97% of them only use character
sets consisting of only ASCII symbols.
Uses of ContainsAny|IndexAny|LastIndexAny:
6% are 1 character (e.g., "\n" or " ")
58% are 2-4 characters (e.g., "<>" or "\r\n\t ")
24% are 5-9 characters (e.g., "()[]*^$")
10% are 10+ characters (e.g., "+-=&|><!(){}[]^\"~*?:\\/ ")
We optimize for ASCII sets, which are commonly used to search for
"control" characters in some string. We don't optimize for the
single character scenario since IndexRune or IndexByte could be used.
Uses of Trim|TrimLeft|TrimRight:
71% are 1 character (e.g., "\n" or " ")
14% are 2 characters (e.g., "\r\n")
10% are 3-4 characters (e.g., " \t\r\n")
5% are 10+ characters (e.g., "0123456789abcdefABCDEF")
We optimize for the single character case with a simple closured function
that only checks for that character's value. We optimize for the medium
and larger sets using a 16-byte bit-map representing a set of ASCII characters.
The benchmarks below have the following suffix name "%d:%d" where the first
number is the length of the input and the second number is the length
of the charset.
== bytes package ==
benchmark old ns/op new ns/op delta
BenchmarkIndexAnyASCII/1:1-4 5.09 5.23 +2.75%
BenchmarkIndexAnyASCII/1:2-4 5.81 5.85 +0.69%
BenchmarkIndexAnyASCII/1:4-4 7.22 7.50 +3.88%
BenchmarkIndexAnyASCII/1:8-4 11.0 11.1 +0.91%
BenchmarkIndexAnyASCII/1:16-4 17.5 17.8 +1.71%
BenchmarkIndexAnyASCII/16:1-4 36.0 34.0 -5.56%
BenchmarkIndexAnyASCII/16:2-4 46.6 36.5 -21.67%
BenchmarkIndexAnyASCII/16:4-4 78.0 40.4 -48.21%
BenchmarkIndexAnyASCII/16:8-4 136 47.4 -65.15%
BenchmarkIndexAnyASCII/16:16-4 254 61.5 -75.79%
BenchmarkIndexAnyASCII/256:1-4 542 388 -28.41%
BenchmarkIndexAnyASCII/256:2-4 705 382 -45.82%
BenchmarkIndexAnyASCII/256:4-4 1089 386 -64.55%
BenchmarkIndexAnyASCII/256:8-4 1994 394 -80.24%
BenchmarkIndexAnyASCII/256:16-4 3843 411 -89.31%
BenchmarkIndexAnyASCII/4096:1-4 8522 5873 -31.08%
BenchmarkIndexAnyASCII/4096:2-4 11253 5861 -47.92%
BenchmarkIndexAnyASCII/4096:4-4 17824 5883 -66.99%
BenchmarkIndexAnyASCII/4096:8-4 32053 5871 -81.68%
BenchmarkIndexAnyASCII/4096:16-4 60512 5888 -90.27%
BenchmarkTrimASCII/1:1-4 79.5 70.8 -10.94%
BenchmarkTrimASCII/1:2-4 79.0 105 +32.91%
BenchmarkTrimASCII/1:4-4 79.6 109 +36.93%
BenchmarkTrimASCII/1:8-4 78.8 118 +49.75%
BenchmarkTrimASCII/1:16-4 80.2 132 +64.59%
BenchmarkTrimASCII/16:1-4 243 116 -52.26%
BenchmarkTrimASCII/16:2-4 243 171 -29.63%
BenchmarkTrimASCII/16:4-4 243 176 -27.57%
BenchmarkTrimASCII/16:8-4 241 184 -23.65%
BenchmarkTrimASCII/16:16-4 238 199 -16.39%
BenchmarkTrimASCII/256:1-4 2580 840 -67.44%
BenchmarkTrimASCII/256:2-4 2603 1175 -54.86%
BenchmarkTrimASCII/256:4-4 2572 1188 -53.81%
BenchmarkTrimASCII/256:8-4 2550 1191 -53.29%
BenchmarkTrimASCII/256:16-4 2585 1208 -53.27%
BenchmarkTrimASCII/4096:1-4 39773 12181 -69.37%
BenchmarkTrimASCII/4096:2-4 39946 17231 -56.86%
BenchmarkTrimASCII/4096:4-4 39641 17179 -56.66%
BenchmarkTrimASCII/4096:8-4 39835 17175 -56.88%
BenchmarkTrimASCII/4096:16-4 40229 17215 -57.21%
== strings package ==
benchmark old ns/op new ns/op delta
BenchmarkIndexAnyASCII/1:1-4 5.94 4.97 -16.33%
BenchmarkIndexAnyASCII/1:2-4 5.94 5.55 -6.57%
BenchmarkIndexAnyASCII/1:4-4 7.45 7.21 -3.22%
BenchmarkIndexAnyASCII/1:8-4 10.8 10.6 -1.85%
BenchmarkIndexAnyASCII/1:16-4 17.4 17.2 -1.15%
BenchmarkIndexAnyASCII/16:1-4 36.4 32.2 -11.54%
BenchmarkIndexAnyASCII/16:2-4 49.6 34.6 -30.24%
BenchmarkIndexAnyASCII/16:4-4 77.5 37.9 -51.10%
BenchmarkIndexAnyASCII/16:8-4 138 45.5 -67.03%
BenchmarkIndexAnyASCII/16:16-4 241 59.1 -75.48%
BenchmarkIndexAnyASCII/256:1-4 509 378 -25.74%
BenchmarkIndexAnyASCII/256:2-4 720 381 -47.08%
BenchmarkIndexAnyASCII/256:4-4 1142 384 -66.37%
BenchmarkIndexAnyASCII/256:8-4 1999 391 -80.44%
BenchmarkIndexAnyASCII/256:16-4 3735 403 -89.21%
BenchmarkIndexAnyASCII/4096:1-4 7973 5824 -26.95%
BenchmarkIndexAnyASCII/4096:2-4 11432 5809 -49.19%
BenchmarkIndexAnyASCII/4096:4-4 18327 5819 -68.25%
BenchmarkIndexAnyASCII/4096:8-4 33059 5828 -82.37%
BenchmarkIndexAnyASCII/4096:16-4 59703 5817 -90.26%
BenchmarkTrimASCII/1:1-4 71.9 71.8 -0.14%
BenchmarkTrimASCII/1:2-4 73.3 103 +40.52%
BenchmarkTrimASCII/1:4-4 71.8 106 +47.63%
BenchmarkTrimASCII/1:8-4 71.2 113 +58.71%
BenchmarkTrimASCII/1:16-4 71.6 128 +78.77%
BenchmarkTrimASCII/16:1-4 152 116 -23.68%
BenchmarkTrimASCII/16:2-4 160 168 +5.00%
BenchmarkTrimASCII/16:4-4 172 170 -1.16%
BenchmarkTrimASCII/16:8-4 200 177 -11.50%
BenchmarkTrimASCII/16:16-4 254 193 -24.02%
BenchmarkTrimASCII/256:1-4 1438 864 -39.92%
BenchmarkTrimASCII/256:2-4 1551 1195 -22.95%
BenchmarkTrimASCII/256:4-4 1770 1200 -32.20%
BenchmarkTrimASCII/256:8-4 2195 1216 -44.60%
BenchmarkTrimASCII/256:16-4 3054 1224 -59.92%
BenchmarkTrimASCII/4096:1-4 21726 12557 -42.20%
BenchmarkTrimASCII/4096:2-4 23586 17508 -25.77%
BenchmarkTrimASCII/4096:4-4 26898 17510 -34.90%
BenchmarkTrimASCII/4096:8-4 33714 17595 -47.81%
BenchmarkTrimASCII/4096:16-4 47429 17700 -62.68%
The benchmarks added test the worst case. For IndexAny, that is when the
charset matches none of the input. For Trim, it is when the charset matches
all of the input.
Change-Id: I970874d101a96b33528fc99b165379abe58cf6ea
Reviewed-on: https://go-review.googlesource.com/31593
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Martin Möhrmann <martisch@uos.de>
In all previous versions of Go, the behavior of IndexRune(s, r)
where r was utf.RuneError was that it would effectively return the
index of any invalid UTF-8 byte sequence (include RuneError).
Optimizations made in http://golang.org/cl/28537 and
http://golang.org/cl/28546 altered this undocumented behavior such
that RuneError would only match on the RuneError rune itself.
Although, the new behavior is arguably reasonable, it did break code
that depended on the previous behavior. Thus, we add special checks
to ensure that we preserve the old behavior.
There is a slight performance hit for correctness:
benchmark old ns/op new ns/op delta
BenchmarkIndexRune/10-4 19.3 21.6 +11.92%
BenchmarkIndexRune/32-4 33.6 35.2 +4.76%
This only occurs on small strings. The performance hit for larger strings
is neglible and not shown.
Fixes#17611
Change-Id: I1d863a741213d46c40b2e1724c41245df52502a5
Reviewed-on: https://go-review.googlesource.com/32123
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Panic if Repeat is given a negative count or
if the value of (len(*) * count) is detected
to overflow.
We panic because we cannot change the
signature of Repeat to return an error.
Fixes#16237
Change-Id: I9f5ba031a5b8533db0582d7a672ffb715143f3fb
Reviewed-on: https://go-review.googlesource.com/29954
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
IndexHard4-4 1.50ms ± 2% 0.71ms ± 0% -52.36% (p=0.000 n=20+19)
This also fixes a bug, that caused a string of length 16 to use
two 8-byte comparisons instead of one 16-byte. And adds a test for
cases when partial_match fails.
Change-Id: I1ee8fc4e068bb36c95c45de78f067c822c0d9df0
Reviewed-on: https://go-review.googlesource.com/22551
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
We already had special cases for 0 and 1. Add 2 and 3 for now too.
To be removed if the compiler is improved later (#6714).
This halves the number of allocations and total bytes allocated via
common filepath.Join calls, improving filepath.Walk performance.
Noticed as part of investigating filepath.Walk in #16399.
Change-Id: If7b1bb85606d4720f3ebdf8de7b1e12ad165079d
Reviewed-on: https://go-review.googlesource.com/25005
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
The 17-31 byte code is broken. Disabled it.
Added a bunch of tests to at least cover the cases
in indexShortStr. I'll channel Brad and wonder why
this CL ever got in without any tests.
Fixes#15679
Change-Id: I84a7b283a74107db865b9586c955dcf5f2d60161
Reviewed-on: https://go-review.googlesource.com/23106
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
CL/19862 (f79b50b8d5) recently introduced the constants
SeekStart, SeekCurrent, and SeekEnd to the io package. We should use these constants
consistently throughout the code base.
Updates #15269
Change-Id: If7fcaca7676e4a51f588528f5ced28220d9639a2
Reviewed-on: https://go-review.googlesource.com/22097
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Joe Tsai <joetsai@digital-static.net>
TryBot-Result: Gobot Gobot <gobot@golang.org>
cmd and runtime were handled separately, and I'm intentionally skipped
syscall. This is the rest of the standard library.
CL generated mechanically with github.com/mdempsky/unconvert.
Change-Id: I9e0eff886974dedc37adb93f602064b83e469122
Reviewed-on: https://go-review.googlesource.com/22104
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Merges explodetests into splittests which already contain
some of the tests that cover explode.
Adds a test to cover the utf8.RuneError branch in explode.
name old time/op new time/op delta
Split1-2 14.9ms ± 0% 14.2ms ± 0% -4.06% (p=0.000 n=47+49)
Change-Id: I00f796bd2edab70e926ea9e65439d820c6a28254
Reviewed-on: https://go-review.googlesource.com/21609
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Currently, there is no easy allocation-free way to turn a
[]byte or string into an io.Reader. Thus, we add a Reset method
to bytes.Reader and strings.Reader to allow the reuse of these
Readers with another []byte or string.
This is consistent with the fact that many standard library io.Readers
already support a Reset method of some type:
bufio.Reader
flate.Reader
gzip.Reader
zlib.Reader
debug/dwarf.LineReader
bytes.Buffer
crypto/rc4.Cipher
Fixes#15033
Change-Id: I456fd1af77af6ef0b4ac6228b058ac1458ff3d19
Reviewed-on: https://go-review.googlesource.com/21386
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The tree's pretty inconsistent about single space vs double space
after a period in documentation. Make it consistently a single space,
per earlier decisions. This means contributors won't be confused by
misleading precedence.
This CL doesn't use go/doc to parse. It only addresses // comments.
It was generated with:
$ perl -i -npe 's,^(\s*// .+[a-z]\.) +([A-Z]),$1 $2,' $(git grep -l -E '^\s*//(.+\.) +([A-Z])')
$ go test go/doc -update
Change-Id: Iccdb99c37c797ef1f804a94b22ba5ee4b500c4f7
Reviewed-on: https://go-review.googlesource.com/20022
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Dave Day <djd@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This is a subset of https://golang.org/cl/20022 with only the copyright
header lines, so the next CL will be smaller and more reviewable.
Go policy has been single space after periods in comments for some time.
The copyright header template at:
https://golang.org/doc/contribute.html#copyright
also uses a single space.
Make them all consistent.
Change-Id: Icc26c6b8495c3820da6b171ca96a74701b4a01b0
Reviewed-on: https://go-review.googlesource.com/20111
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Named returned values should only be used on public funcs and methods
when it contributes to the documentation.
Named return values should not be used if they're only saving the
programmer a few lines of code inside the body of the function,
especially if that means there's stutter in the documentation or it
was only there so the programmer could use a naked return
statement. (Naked returns should not be used except in very small
functions)
This change is a manual audit & cleanup of public func signatures.
Signatures were not changed if:
* the func was private (wouldn't be in public godoc)
* the documentation referenced it
* the named return value was an interesting name. (i.e. it wasn't
simply stutter, repeating the name of the type)
There should be no changes in behavior. (At least: none intended)
Change-Id: I3472ef49619678fe786e5e0994bdf2d9de76d109
Reviewed-on: https://go-review.googlesource.com/20024
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Andrew Gerrand <adg@golang.org>
The previous commit (git 2ae77376) just did golang.org. This one
includes golang.org subdomains like blog, play, and build.
Change-Id: I4469f7b307ae2a12ea89323422044e604c5133ae
Reviewed-on: https://go-review.googlesource.com/12071
Reviewed-by: Rob Pike <r@golang.org>
Also add a reference to the strings blog post.
Fixes#11045.
Change-Id: Ic0a8908cbd7b51a36d104849fa0e8abfd54de2b9
Reviewed-on: https://go-review.googlesource.com/10662
Reviewed-by: Andrew Gerrand <adg@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Currently the packages have the following index functions:
func Index(s, sep []byte) int
func IndexAny(s []byte, chars string) int
func IndexByte(s []byte, c byte) int
func IndexFunc(s []byte, f func(r rune) bool) int
func IndexRune(s []byte, r rune) int
func LastIndex(s, sep []byte) int
func LastIndexAny(s []byte, chars string) int
func LastIndexFunc(s []byte, f func(r rune) bool) int
Searching for the last occurrence of a byte is quite common
for string parsing algorithms (e.g. find the last paren on a line).
Also addition of LastIndexByte makes the set more orthogonal.
Change-Id: Ida168849acacf8e78dd70c1354bef9eac5effafe
Reviewed-on: https://go-review.googlesource.com/9500
Reviewed-by: Rob Pike <r@golang.org>
As noted on recently on golang-nuts, there's currently no way to know
the total size of a strings.Reader or bytes.Reader when using ReadAt
on them. Most callers resort to wrapping it in an io.SectionReader to
retain that information.
The SizeReaderAt abstraction (an io.ReaderAt with a Size() int64
method) has proven useful as a way of expressing a concurrency-safe
read-only number of bytes.
As one example, see http://talks.golang.org/2013/oscon-dl.slide#49 and
the rest of that presentation for its use in dl.google.com.
SizeReaderAt is also used in the open source google-api-go-client, and
within Google's internal codebase, where it exists in a public package
created in 2013 with the package comment: "These may migrate to the
standard library after we have enough experience with their feel."
I'm still as happy with the SizeReaderAt abstraction and its
composabilty as I was in 2013, so I'd like to make these two Readers
also be SizeReaderAts.
Fixes#9667
Change-Id: Ie6f145ada419dd116280472d8c029f046d5edf70
Reviewed-on: https://go-review.googlesource.com/3199
Reviewed-by: Andrew Gerrand <adg@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
The strings.Trim function and variants allocate memory on the heap when creating a function to pass into TrimFunc.
Add a benchmark to document the behavior; an issue will be submitted to address this behavior in the compiler if possible.
Change-Id: I8b66721f077951f7e7b8cf3cf346fac27a9b68c0
Reviewed-on: https://go-review.googlesource.com/8200
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Regular expression noteMarker requires the definition of a (who) section
when reading note from a sequence of comments.
Change-Id: I9635de9b86f00d20ec108097fee4d4a8f76237b2
Reviewed-on: https://go-review.googlesource.com/1952
Reviewed-by: Russ Cox <rsc@golang.org>
The function is here ONLY for symmetry with package bytes.
This function should be used ONLY if it makes code clearer.
It is not here for performance. Remove any performance benefit.
If performance becomes an issue, the compiler should be fixed to
recognize the three-way compare (for all comparable types)
rather than encourage people to micro-optimize by using this function.
Change-Id: I71f4130bce853f7aef724c6044d15def7987b457
Reviewed-on: https://go-review.googlesource.com/3012
Reviewed-by: Rob Pike <r@golang.org>
The implementation is the same assembly (or Go) routine.
Change-Id: Ib937c461c24ad2d5be9b692b4eed40d9eb031412
Reviewed-on: https://go-review.googlesource.com/2828
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Noticed while reviewing https://golang.org/cl/147690043/
I'd never seen anybody use IndexRune before, and
unsurprisingly it doesn't use the other fast paths in the
strings/bytes packages. IndexByte uses assembly.
Also, less code this way.
LGTM=r
R=r
CC=golang-codereviews
https://golang.org/cl/147700043