Commit Graph

126 Commits

Author SHA1 Message Date
Russ Cox c3c4aea55b regexp: limit size of parsed regexps
Set a 128 MB limit on the amount of space used by []syntax.Inst
in the compiled form corresponding to a given regexp.

Also set a 128 MB limit on the rune storage in the *syntax.Regexp
tree itself.

Thanks to Adam Korczynski (ADA Logics) and OSS-Fuzz for reporting this issue.

Fixes CVE-2022-41715.
Fixes #55949.

Change-Id: Ia656baed81564436368cf950e1c5409752f28e1b
Reviewed-on: https://go-review.googlesource.com/c/go/+/439356
Auto-Submit: Roland Shoemaker <roland@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Roland Shoemaker <roland@golang.org>
Reviewed-by: Damien Neil <dneil@google.com>
2022-10-05 20:39:49 +00:00
cui fliter 1888875182 regexp: fix a few function names on comments
Change-Id: I192dd34c677e52e16f0ef78e1dae58a78f6d1aac
GitHub-Last-Rev: 1638a74689
GitHub-Pull-Request: golang/go#55967
Reviewed-on: https://go-review.googlesource.com/c/go/+/436885
Run-TryBot: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
2022-10-02 02:31:23 +00:00
Bryan Boreham 0293c51bc5 regexp: avoid copying each instruction executed
Inst is a 40-byte struct, so avoiding the copy gives a decent speedup:

name                            old time/op    new time/op     delta
Find-8                             160ns ± 4%      109ns ± 4%  -32.22%  (p=0.008 n=5+5)
FindAllNoMatches-8                70.4ns ± 4%     53.8ns ± 0%  -23.58%  (p=0.016 n=5+4)
FindString-8                       154ns ± 6%      107ns ± 0%  -30.37%  (p=0.016 n=5+4)
FindSubmatch-8                     194ns ± 1%      135ns ± 1%  -30.56%  (p=0.008 n=5+5)
FindStringSubmatch-8               193ns ± 8%      131ns ± 0%  -31.82%  (p=0.008 n=5+5)
Literal-8                         42.8ns ± 2%     34.8ns ± 0%  -18.67%  (p=0.008 n=5+5)
NotLiteral-8                       917ns ± 2%      636ns ± 0%  -30.68%  (p=0.008 n=5+5)
MatchClass-8                      1.18µs ± 3%     0.91µs ± 1%  -22.27%  (p=0.016 n=5+4)
MatchClass_InRange-8              1.11µs ± 1%     0.87µs ± 2%  -21.38%  (p=0.008 n=5+5)
ReplaceAll-8                       659ns ± 6%      596ns ± 3%   -9.60%  (p=0.008 n=5+5)
AnchoredLiteralShortNonMatch-8    34.2ns ± 0%     30.4ns ± 1%  -11.20%  (p=0.016 n=4+5)
AnchoredLiteralLongNonMatch-8     38.7ns ± 0%     38.7ns ± 0%     ~     (p=0.579 n=5+5)
AnchoredShortMatch-8              67.0ns ± 1%     52.7ns ± 0%  -21.31%  (p=0.016 n=5+4)
AnchoredLongMatch-8                121ns ± 0%      124ns ±10%     ~     (p=0.730 n=5+5)
OnePassShortA-8                    392ns ± 0%      231ns ± 3%  -41.10%  (p=0.008 n=5+5)
NotOnePassShortA-8                 370ns ± 0%      282ns ± 1%  -23.81%  (p=0.008 n=5+5)
OnePassShortB-8                    280ns ± 0%      179ns ± 1%  -36.05%  (p=0.008 n=5+5)
NotOnePassShortB-8                 226ns ± 0%      185ns ± 3%  -18.26%  (p=0.008 n=5+5)
OnePassLongPrefix-8               51.7ns ± 0%     39.1ns ± 1%  -24.28%  (p=0.016 n=4+5)
OnePassLongNotPrefix-8             213ns ± 2%      132ns ± 1%  -37.86%  (p=0.008 n=5+5)
MatchParallelShared-8             25.3ns ± 3%     23.4ns ± 7%   -7.50%  (p=0.016 n=5+5)
MatchParallelCopied-8             26.5ns ± 7%     22.3ns ± 7%  -16.06%  (p=0.008 n=5+5)
QuoteMetaAll-8                    45.8ns ± 1%     45.8ns ± 1%     ~     (p=1.000 n=5+5)
QuoteMetaNone-8                   24.3ns ± 0%     24.3ns ± 0%     ~     (p=0.325 n=5+5)
Compile/Onepass-8                 1.98µs ± 0%     1.97µs ± 0%   -0.22%  (p=0.016 n=5+4)
Compile/Medium-8                  4.56µs ± 0%     4.55µs ± 1%     ~     (p=0.595 n=5+5)
Compile/Hard-8                    35.7µs ± 0%     35.3µs ± 3%     ~     (p=0.151 n=5+5)
Match/Easy0/16-8                  2.18ns ± 2%     2.19ns ± 5%     ~     (p=0.690 n=5+5)
Match/Easy0/32-8                  27.4ns ± 2%     27.6ns ± 4%     ~     (p=1.000 n=5+5)
Match/Easy0/1K-8                   246ns ± 0%      252ns ± 7%     ~     (p=0.238 n=5+5)
Match/Easy0/32K-8                 4.58µs ± 7%     4.64µs ± 5%     ~     (p=1.000 n=5+5)
Match/Easy0/1M-8                   235µs ± 0%      235µs ± 0%     ~     (p=0.886 n=4+4)
Match/Easy0/32M-8                 7.86ms ± 0%     7.86ms ± 1%     ~     (p=0.730 n=4+5)
Match/Easy0i/16-8                 2.15ns ± 0%     2.15ns ± 0%     ~     (p=0.246 n=5+5)
Match/Easy0i/32-8                  507ns ± 2%      466ns ± 4%   -8.03%  (p=0.008 n=5+5)
Match/Easy0i/1K-8                 14.7µs ± 0%     13.6µs ± 2%   -7.63%  (p=0.008 n=5+5)
Match/Easy0i/32K-8                 571µs ± 1%      570µs ± 1%     ~     (p=0.556 n=4+5)
Match/Easy0i/1M-8                 18.2ms ± 0%     18.8ms ±11%     ~     (p=0.548 n=5+5)
Match/Easy0i/32M-8                 581ms ± 0%      590ms ± 1%   +1.52%  (p=0.016 n=4+5)
Match/Easy1/16-8                  2.17ns ± 0%     2.15ns ± 0%   -0.90%  (p=0.000 n=5+4)
Match/Easy1/32-8                  25.1ns ± 0%     25.4ns ± 4%     ~     (p=0.651 n=5+5)
Match/Easy1/1K-8                   462ns ± 1%      431ns ± 4%   -6.56%  (p=0.008 n=5+5)
Match/Easy1/32K-8                 18.8µs ± 0%     18.8µs ± 1%     ~     (p=1.000 n=5+5)
Match/Easy1/1M-8                   658µs ± 0%      658µs ± 1%     ~     (p=0.841 n=5+5)
Match/Easy1/32M-8                 21.0ms ± 1%     21.0ms ± 2%     ~     (p=0.841 n=5+5)
Match/Medium/16-8                 2.15ns ± 0%     2.16ns ± 0%     ~     (p=0.714 n=4+5)
Match/Medium/32-8                  561ns ± 1%      512ns ± 5%   -8.69%  (p=0.008 n=5+5)
Match/Medium/1K-8                 16.9µs ± 0%     15.2µs ± 1%  -10.40%  (p=0.008 n=5+5)
Match/Medium/32K-8                 632µs ± 0%      631µs ± 1%     ~     (p=0.421 n=5+5)
Match/Medium/1M-8                 20.3ms ± 1%     20.1ms ± 0%     ~     (p=0.190 n=5+4)
Match/Medium/32M-8                 650ms ± 1%      646ms ± 0%   -0.58%  (p=0.032 n=5+4)
Match/Hard/16-8                   2.15ns ± 0%     2.15ns ± 1%     ~     (p=0.111 n=5+5)
Match/Hard/32-8                    870ns ± 2%      667ns ± 1%  -23.28%  (p=0.008 n=5+5)
Match/Hard/1K-8                   26.9µs ± 0%     21.0µs ± 2%  -21.83%  (p=0.008 n=5+5)
Match/Hard/32K-8                   833µs ± 0%      833µs ± 1%     ~     (p=0.548 n=5+5)
Match/Hard/1M-8                   26.6ms ± 0%     26.8ms ± 1%     ~     (p=0.905 n=4+5)
Match/Hard/32M-8                   856ms ± 0%      851ms ± 0%   -0.65%  (p=0.016 n=5+4)
Match/Hard1/16-8                  2.96µs ±12%     1.81µs ± 3%  -38.68%  (p=0.008 n=5+5)
Match/Hard1/32-8                  5.62µs ± 3%     3.48µs ± 0%  -38.07%  (p=0.016 n=5+4)
Match/Hard1/1K-8                   175µs ± 5%      108µs ± 0%  -37.85%  (p=0.016 n=5+4)
Match/Hard1/32K-8                 4.09ms ± 2%     4.05ms ± 0%   -0.85%  (p=0.016 n=5+4)
Match/Hard1/1M-8                   131ms ± 0%      131ms ± 3%     ~     (p=0.151 n=5+5)
Match/Hard1/32M-8                  4.19s ± 0%      4.20s ± 1%     ~     (p=1.000 n=5+5)
Match_onepass_regex/16-8           262ns ± 2%      170ns ± 2%  -35.13%  (p=0.008 n=5+5)
Match_onepass_regex/32-8           463ns ± 0%      306ns ± 0%  -33.90%  (p=0.008 n=5+5)
Match_onepass_regex/1K-8          13.3µs ± 2%      8.8µs ± 0%  -33.84%  (p=0.008 n=5+5)
Match_onepass_regex/32K-8          424µs ± 3%      280µs ± 1%  -33.93%  (p=0.008 n=5+5)
Match_onepass_regex/1M-8          13.4ms ± 0%      9.0ms ± 1%  -32.80%  (p=0.016 n=4+5)
Match_onepass_regex/32M-8          427ms ± 0%      288ms ± 1%  -32.60%  (p=0.008 n=5+5)

Change-Id: I02c54176ed5c9f5b5fc99524a2d5eb1c490f0ebf
Reviewed-on: https://go-review.googlesource.com/c/go/+/355789
Reviewed-by: Peter Weinberger <pjw@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Auto-Submit: Russ Cox <rsc@golang.org>
2022-06-04 20:10:54 +00:00
Ludi Rehak e7b0559448 regexp/syntax: fix typo in comment
Fix typo in comment describing IsWordChar.

Change-Id: Ia283813cf5662e218ee6d0411fb0c1b1ad1021f3
Reviewed-on: https://go-review.googlesource.com/c/go/+/393435
Auto-Submit: Dmitri Shuralyov <dmitshur@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Dmitri Shuralyov <dmitshur@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-04-29 01:00:55 +00:00
Ian Lance Taylor 0bf545e51f regexp/syntax: rename ErrInvalidDepth to ErrNestingDepth
The proposal accepted the name ErrNestingDepth.

For #51684

Change-Id: I702365f19e5e1889dbcc3c971eecff68e0b08727
Reviewed-on: https://go-review.googlesource.com/c/go/+/401854
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2022-04-22 22:35:03 +00:00
Ian Lance Taylor 575fd8817a regexp: change ErrInvalidDepth message to match proposal
Also update the file in $GOROOT/api/next to use proposal number.

For #51684

Change-Id: I28bfa6bc1cee98a17b13da196d41cda34d968bb0
Reviewed-on: https://go-review.googlesource.com/c/go/+/401076
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2022-04-22 00:10:17 +00:00
Russ Cox 19309779ac all: gofmt main repo
[This CL is part of a sequence implementing the proposal #51082.
The design doc is at https://go.dev/s/godocfmt-design.]

Run the updated gofmt, which reformats doc comments,
on the main repository. Vendored files are excluded.

For #51082.

Change-Id: I7332f099b60f716295fb34719c98c04eb1a85407
Reviewed-on: https://go-review.googlesource.com/c/go/+/384268
Reviewed-by: Jonathan Amsterdam <jba@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2022-04-11 16:34:30 +00:00
Russ Cox 81431c7aa7 all: replace `` and '' with “ (U+201C) and ” (U+201D) in doc comments
go/doc in all its forms applies this replacement when rendering
the comments. We are considering formatting doc comments,
including doing this replacement as part of the formatting.
Apply it to our source files ahead of time.

For #51082.

Change-Id: Ifcc1f5861abb57c5d14e7d8c2102dfb31b7a3a19
Reviewed-on: https://go-review.googlesource.com/c/go/+/384262
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2022-04-05 17:52:29 +00:00
Russ Cox 1af60b2f49 regexp/syntax: add and use ErrInvalidDepth
The fix for #51112 introduced a depth check but used
ErrInternalError to avoid introduce new API in a CL that
would be backported to earlier releases.

New API accepted in proposal #51684.

This CL adds a distinct error for this case.

For #51112.
Fixes #51684.

Change-Id: I068fc70aafe4218386a06103d9b7c847fb7ffa65
Reviewed-on: https://go-review.googlesource.com/c/go/+/384617
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-04-04 10:59:27 +00:00
Russ Cox 690ac4071f all: remove trailing blank doc comment lines
A future change to gofmt will rewrite

	// Doc comment.
	//
	func f()

to

	// Doc comment.
	func f()

Apply that change preemptively to all doc comments.

For #51082.

Change-Id: I4023e16cfb0729b64a8590f071cd92f17343081d
Reviewed-on: https://go-review.googlesource.com/c/go/+/384259
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-04-01 18:18:07 +00:00
Andy Pan 737837c9d4 regexp: use input.step() to advance one rune in Regexp.allMatches()
Change-Id: I32944f4ed519419e168e62f9ed6df63961839259
Reviewed-on: https://go-review.googlesource.com/c/go/+/359197
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Trust: Emmanuel Odeke <emmanuel@orijtech.com>
Run-TryBot: Emmanuel Odeke <emmanuel@orijtech.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-03-27 20:22:06 +00:00
Jinwook Jeong 40e24a942b regexp: fix typo in the overview
Correct the slice expression in the description of Index functions.

Change-Id: I97a1b670c4c7e600d858f6550b647f677ef90b41
Reviewed-on: https://go-review.googlesource.com/c/go/+/360058
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Trust: Daniel Martí <mvdan@mvdan.cc>
2022-03-02 11:49:04 +00:00
Russ Cox 452f24ae94 regexp/syntax: reject very deeply nested regexps in Parse
The regexp code assumes it can recurse over the structure of
a regexp safely. Go's growable stacks make that reasonable
for all plausible regexps, but implausible ones can reach the
“infinite recursion?” stack limit.

This CL limits the depth of any parsed regexp to 1000.
That is, the depth of the parse tree is required to be ≤ 1000.
Regexps that require deeper parse trees will return ErrInternalError.
A future CL will change the error to ErrInvalidDepth,
but using ErrInternalError for now avoids introducing new API
in point releases when this is backported.

Fixes #51112.

Change-Id: I97d2cd82195946eb43a4ea8561f5b95f91fb14c5
Reviewed-on: https://go-review.googlesource.com/c/go/+/384616
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2022-02-10 15:23:05 +00:00
luochuanhang e4ab8b0fe6 regexp: add the missing is
Change-Id: I23264972329aa3414067cd0e0986b69bb39bbeb5
GitHub-Last-Rev: d1d668a3cb
GitHub-Pull-Request: golang/go#50650
Reviewed-on: https://go-review.googlesource.com/c/go/+/378935
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Trust: Daniel Martí <mvdan@mvdan.cc>
2022-01-19 22:56:09 +00:00
Russ Cox f229e7031a all: go fix -fix=buildtag std cmd (except for bootstrap deps, vendor)
When these packages are released as part of Go 1.18,
Go 1.16 will no longer be supported, so we can remove
the +build tags in these files.

Ran go fix -fix=buildtag std cmd and then reverted the bootstrapDirs
as defined in src/cmd/dist/buildtool.go, which need to continue
to build with Go 1.4 for now.

Also reverted src/vendor and src/cmd/vendor, which will need
to be updated in their own repos first.

Manual changes in runtime/pprof/mprof_test.go to adjust line numbers.

For #41184.

Change-Id: Ic0f93f7091295b6abc76ed5cd6e6746e1280861e
Reviewed-on: https://go-review.googlesource.com/c/go/+/344955
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
2021-10-28 18:17:57 +00:00
Russ Cox 702e337174 regexp: document and implement that invalid UTF-8 bytes are the same as U+FFFD
What should it mean to run a regexp match on invalid UTF-8 bytes?
The coherent behavior options are:

1. Invalid UTF-8 does not match any character classes,
   nor a U+FFFD literal (nor \x{fffd}).
2. Each byte of invalid UTF-8 is treated identically to a U+FFFD in the input,
   as a utf8.DecodeRune loop might.

RE2 uses Rule 1.
Because it works byte at a time, it can also provide \C to match any
single byte of input, which matches invalid UTF-8 as well.
This provides the nice property that a match for a regexp without \C
is guaranteed to be valid UTF-8.

Unfortunately, today Go has an incoherent mix of these two, although
mostly Rule 2. This is a deviation from RE2, and it gives up the nice
property, but we probably can't correct that at this point.
In particular .* already matches entire inputs today, valid UTF-8 or
not, and I doubt we can break that.

This CL adopts Rule 2 officially, fixing the few places that deviate from it.

Fixes #48749.

Change-Id: I96402527c5dfb1146212f568ffa09dde91d71244
Reviewed-on: https://go-review.googlesource.com/c/go/+/354569
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
2021-10-11 15:28:50 +00:00
Russ Cox 4d8db00641 all: use bytes.Cut, strings.Cut
Many uses of Index/IndexByte/IndexRune/Split/SplitN
can be written more clearly using the new Cut functions.
Do that. Also rewrite to other functions if that's clearer.

For #46336.

Change-Id: I68d024716ace41a57a8bf74455c62279bde0f448
Reviewed-on: https://go-review.googlesource.com/c/go/+/351711
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2021-10-06 15:53:04 +00:00
Russ Cox 2a61b3c590 regexp: fix repeat of preferred empty match
In Perl mode, (|a)* should match an empty string at the start of the
input. Instead it matches as many a's as possible.
Because (|a)+ is handled correctly, matching only an empty string,
this leads to the paradox that e* can match more text than e+
(for e = (|a)) and that e+ is sometimes different from ee*.

This is a very old bug that ultimately derives from the picture I drew
for e* in https://swtch.com/~rsc/regexp/regexp1.html. The picture is
correct for longest-match (POSIX) regexps but subtly wrong for
preferred-match (Perl) regexps in the case where e has a preferred
empty match. Pointed out by Andrew Gallant in private mail.

The current code treats e* and e+ as the same structure, with
different entry points. In the case of e* the preference list ends up
not quite in the right order, in part because the “before e” and
“after e” states are the same state. Splitting them apart fixes the
preference list, and that can be done by compiling e* as if it were
(e+)?.

Like with any bug fix, there is a very low chance of breaking a
program that accidentally depends on the buggy behavior.

RE2, Go, and Rust all have this bug, and we've all agreed to fix it,
to keep the implementations in sync.

Fixes #46123.

Change-Id: I70e742e71e0a23b626593b16ddef3c1e73b413b0
Reviewed-on: https://go-review.googlesource.com/c/go/+/318750
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
2021-05-13 14:52:20 +00:00
Russ Cox d4b2638234 all: go fmt std cmd (but revert vendor)
Make all our package sources use Go 1.17 gofmt format
(adding //go:build lines).

Part of //go:build change (#41184).
See https://golang.org/design/draft-gobuild

Change-Id: Ia0534360e4957e58cd9a18429c39d0e32a6addb4
Reviewed-on: https://go-review.googlesource.com/c/go/+/294430
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2021-02-20 03:54:50 +00:00
Tom Payne 1d3baf20dc regexp/syntax: add note about Unicode character classes
As proposed on golang-nuts:
https://groups.google.com/g/golang-nuts/c/M3lmSUptExQ/m/hRySV9GsCAAJ

Includes the latest updates from re2's mksyntaxgo:
https://code.googlesource.com/re2/+/refs/heads/master/doc/mksyntaxgo

Change-Id: Ib7b79aa6531f473feabd0a7f1d263cd65c4388e4
Reviewed-on: https://go-review.googlesource.com/c/go/+/264678
Reviewed-by: Russ Cox <rsc@golang.org>
Trust: Emmanuel Odeke <emmanuel@orijtech.com>
2020-11-25 15:10:09 +00:00
Andy Balholm 0ce1ffc49d regexp/syntax: append patchLists in constant time
By keeping a tail pointer, we can append to a patchList in constant
time, rather than in time proportional to the length of the list. This
gets rid of the quadratic compile times we were seeing for long series
of alternations.

This is basically the same change as
e9d517989f.

Fixes #39542.

Change-Id: Ib4ca0ca9c55abd1594df1984653c7d311ccf7572
Reviewed-on: https://go-review.googlesource.com/c/go/+/238079
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2020-06-18 00:40:11 +00:00
Russ Cox 4d9ecde30a regexp/syntax: fix comment on p.literal and simplify
p.literal's doc comment said it returned a value but it doesn't.
While we're here, p.newLiteral is only called from p.literal,
so simplify the code by merging the two.

Change-Id: Ia357937a99f4e7473f0f1ec837113a39eaeb83d4
Reviewed-on: https://go-review.googlesource.com/c/go/+/222659
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2020-04-17 22:12:02 +00:00
Sylvain Zimmer 782fcb44b9 regexp: add (*Regexp).SubexpIndex
SubexpIndex returns the index of the first subexpression with the given name,
or -1 if there is no subexpression with that name.

Fixes #32420

Change-Id: Ie1f9d22d50fb84e18added80a9d9a9f6dca8ffc4
Reviewed-on: https://go-review.googlesource.com/c/go/+/187919
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
2020-04-10 09:38:07 +00:00
Keith Randall 9b80791584 regexp: skip long-running benchmarks if -short is specified
This CL helps race.bash finish in a reasonable amount of
time. Otherwise the Match/Hard1/32M benchmark takes over 1200 seconds
to finish on arm64, triggering a timeout.  With this change the regexp
benchmarks as a whole take only about a minute.

Change-Id: Ie2260ef9f5709e32a74bd76f135bc384b2d9853f
Reviewed-on: https://go-review.googlesource.com/c/go/+/201742
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-10-17 19:53:52 +00:00
Pantelis Sampaziotis 9748e64fe5 regexp: add examples for FindSubmatchIndex and Longest
updates #21450

Change-Id: Ibffe0dadc1e1523c55cd5f5b8a69bc1c399a255d
GitHub-Last-Rev: 507f555081
GitHub-Pull-Request: golang/go#33497
Reviewed-on: https://go-review.googlesource.com/c/go/+/189177
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2019-09-28 00:37:35 +00:00
Liz Rice 5253a6dc72 regexp: add more examples for Regexp methods
Since I first started on this CL, most of the methods have had examples
added by other folks, so this is now one new example, and additions to
two existing examples for extra clarity.

The issue has a comment about not necessarily having examples for all
methods, but I recall finding this package pretty confusing when I first
used it, and having concrete examples would have really helped me
navigate all the different options. There are more
String methods with examples now, but I think seeing how the byte-slice
methods work could also be helpful to explain the differences.

Updates #21450

Change-Id: I27b4eeb634fb8ab59f791c0961cce79a67889826
Reviewed-on: https://go-review.googlesource.com/c/go/+/120145
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-09-18 20:26:36 +00:00
Pantelis Sampaziotis 68a6536848 regexp: add example for NumSubexp
Updates #21450

Change-Id: Idf276e97f816933cc0f752cdcd5e713b5c975833
GitHub-Last-Rev: 198e585f92
GitHub-Pull-Request: golang/go#33490
Reviewed-on: https://go-review.googlesource.com/c/go/+/189138
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2019-09-10 20:33:25 +00:00
psampaz 2b6b474f64 regexp: add example for ReplaceAll
Updates #21450

Change-Id: Ia31c20b52bae5daeb33d918234c2f0944a8aeb07
GitHub-Last-Rev: cc85544770
GitHub-Pull-Request: golang/go#33489
Reviewed-on: https://go-review.googlesource.com/c/go/+/189137
Run-TryBot: Sylvain Zimmer <sylvinus@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2019-09-05 23:52:39 +00:00
Daniel Martí 03ac39ce5e std: remove unused bits of code all over the place
Some were never used, and some haven't been used for years.

One exception is net/http's readerAndCloser, which was only used in a
test. Move it to a test file.

While at it, remove a check in regexp that could never fire; the field
is an uint32, so it can never be negative.

Change-Id: Ia2200f6afa106bae4034045ea8233b452f38747b
Reviewed-on: https://go-review.googlesource.com/c/go/+/192621
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-09-02 12:57:37 +00:00
Keegan Carruthers-Smith 648c7b592a regexp/syntax: exclude full range from String negation case
If the char class is 0x0-0x10ffff we mistakenly would String that to `[^]`,
which is not a valid regex.

Fixes #31807

Change-Id: I9ceeaddc28b67b8e1de12b6703bcb124cc784556
Reviewed-on: https://go-review.googlesource.com/c/go/+/175679
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-05-22 04:43:25 +00:00
Sylvain Zimmer 8116599f01 regexp: optimize for provably too short inputs
For many patterns we can compute the minimum input length at compile time.
If the input is shorter, we can return early and get a huge speedup.

As pointed out by Damian Gryski, Perl's regex engine contains a number of
these kinds of fail-fast optimizations:
https://perldoc.perl.org/perlreguts.html#Peep-hole-Optimisation-and-Analysis

Benchmarks: (including new ones for compile time)

name               old time/op    new time/op    delta
Compile/Onepass-8    4.39µs ± 1%    4.40µs ± 0%  +0.34%  (p=0.029 n=9+8)
Compile/Medium-8     9.80µs ± 0%    9.91µs ± 0%  +1.17%  (p=0.000 n=10+10)
Compile/Hard-8       72.7µs ± 0%    73.5µs ± 0%  +1.10%  (p=0.000 n=9+10)

name                       old time/op    new time/op      delta
Match/Easy0/16-8             52.6ns ± 5%       4.9ns ± 0%     -90.68%  (p=0.000 n=10+9)
Match/Easy0/32-8             64.1ns ±10%      61.4ns ± 1%        ~     (p=0.188 n=10+9)
Match/Easy0/1K-8              280ns ± 1%       277ns ± 2%      -0.97%  (p=0.004 n=10+10)
Match/Easy0/32K-8            4.61µs ± 1%      4.55µs ± 1%      -1.49%  (p=0.000 n=9+10)
Match/Easy0/1M-8              229µs ± 0%       226µs ± 1%      -1.29%  (p=0.000 n=8+10)
Match/Easy0/32M-8            7.50ms ± 1%      7.47ms ± 1%        ~     (p=0.165 n=10+10)
Match/Easy0i/16-8             533ns ± 1%         5ns ± 2%     -99.07%  (p=0.000 n=10+10)
Match/Easy0i/32-8             950ns ± 0%       950ns ± 1%        ~     (p=0.920 n=10+9)
Match/Easy0i/1K-8            27.5µs ± 1%      27.5µs ± 0%        ~     (p=0.739 n=10+10)
Match/Easy0i/32K-8           1.13ms ± 0%      1.13ms ± 1%        ~     (p=0.079 n=9+10)
Match/Easy0i/1M-8            36.7ms ± 2%      36.1ms ± 0%      -1.64%  (p=0.000 n=10+9)
Match/Easy0i/32M-8            1.17s ± 0%       1.16s ± 1%      -0.80%  (p=0.004 n=8+9)
Match/Easy1/16-8             55.5ns ± 6%       4.9ns ± 1%     -91.19%  (p=0.000 n=10+9)
Match/Easy1/32-8             58.3ns ± 8%      56.6ns ± 1%        ~     (p=0.449 n=10+8)
Match/Easy1/1K-8              750ns ± 0%       748ns ± 1%        ~     (p=0.072 n=8+10)
Match/Easy1/32K-8            31.8µs ± 0%      31.6µs ± 1%      -0.50%  (p=0.035 n=10+9)
Match/Easy1/1M-8             1.10ms ± 1%      1.09ms ± 0%      -0.95%  (p=0.000 n=10+9)
Match/Easy1/32M-8            35.5ms ± 0%      35.2ms ± 1%      -1.05%  (p=0.000 n=9+10)
Match/Medium/16-8             442ns ± 2%         5ns ± 1%     -98.89%  (p=0.000 n=10+10)
Match/Medium/32-8             875ns ± 0%       878ns ± 1%        ~     (p=0.071 n=9+10)
Match/Medium/1K-8            26.1µs ± 0%      25.9µs ± 0%      -0.64%  (p=0.000 n=10+10)
Match/Medium/32K-8           1.09ms ± 1%      1.08ms ± 0%      -0.84%  (p=0.000 n=10+9)
Match/Medium/1M-8            34.9ms ± 0%      34.6ms ± 1%      -0.98%  (p=0.000 n=9+10)
Match/Medium/32M-8            1.12s ± 1%       1.11s ± 1%      -0.98%  (p=0.000 n=10+9)
Match/Hard/16-8               721ns ± 1%         5ns ± 0%     -99.32%  (p=0.000 n=10+9)
Match/Hard/32-8              1.32µs ± 1%      1.31µs ± 0%      -0.71%  (p=0.000 n=9+9)
Match/Hard/1K-8              39.8µs ± 1%      39.7µs ± 1%        ~     (p=0.165 n=10+10)
Match/Hard/32K-8             1.57ms ± 0%      1.56ms ± 0%      -0.70%  (p=0.000 n=10+9)
Match/Hard/1M-8              50.4ms ± 1%      50.1ms ± 1%      -0.57%  (p=0.007 n=10+10)
Match/Hard/32M-8              1.62s ± 1%       1.60s ± 0%      -0.98%  (p=0.000 n=10+10)
Match/Hard1/16-8             3.88µs ± 1%      3.86µs ± 0%        ~     (p=0.118 n=10+10)
Match/Hard1/32-8             7.44µs ± 1%      7.46µs ± 1%        ~     (p=0.109 n=10+10)
Match/Hard1/1K-8              232µs ± 1%       229µs ± 1%      -1.31%  (p=0.000 n=10+9)
Match/Hard1/32K-8            7.41ms ± 2%      7.41ms ± 0%        ~     (p=0.237 n=10+8)
Match/Hard1/1M-8              238ms ± 1%       238ms ± 0%        ~     (p=0.481 n=10+10)
Match/Hard1/32M-8             7.69s ± 1%       7.61s ± 0%      -1.00%  (p=0.000 n=10+10)

Fixes #31329

Change-Id: I04640e8c59178ec8b3106e13ace9b109b6bdbc25
Reviewed-on: https://go-review.googlesource.com/c/go/+/171023
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Rob Pike <r@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-05-15 15:28:22 +00:00
Dmitry Vyukov 7f5434c44c regexp: clarify docs re Submatch result
Currently we say that a negative index means no match,
but we don't say how "no match" is expressed when 'Index'
is not present. Say how it is expressed.

Change-Id: I82b6c9038557ac49852ac03642afc0bc545bb4a2
Reviewed-on: https://go-review.googlesource.com/c/go/+/175677
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2019-05-08 17:03:20 +00:00
Valentin Vidic 56f0c046cf regexp: add ReplaceAllStringFunc example
Change-Id: I016312f3ecf3dfcbf0eaf24e31b6842d80abb029
GitHub-Last-Rev: 360047c900
GitHub-Pull-Request: golang/go#30445
Reviewed-on: https://go-review.googlesource.com/c/164257
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2019-02-27 21:48:41 +00:00
Francesc Campoy 20930c7623 regexp: limit the capacity of slices of bytes returned by FindX
This change limits the capacity of the slices of bytes returned by:

- Find
- FindAll
- FindAllSubmatch

to be the same as their length.

Fixes #30169

Change-Id: I07b632757d2bfeab42fce0d42364e2a16c597360
Reviewed-on: https://go-review.googlesource.com/c/161877
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-02-26 23:51:17 +00:00
Vladimir Kovpak 55c55dda1f regexp: use backquotes for all regular expression examples
This commit performs replace double quote to backquote,
so now all examples looks consistent.

Change-Id: I8cf760ce1bdeff9619a88e531161b9516385241b
GitHub-Last-Rev: e3e636cebb
GitHub-Pull-Request: golang/go#28879
Reviewed-on: https://go-review.googlesource.com/c/150397
Reviewed-by: Rob Pike <r@golang.org>
Run-TryBot: Rob Pike <r@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-11-20 12:05:15 +00:00
Vladimir Kovpak 108414946e regexp: add matching and finding examples
This commit adds examples for Match, Find,
FindAllSubmatch, FindSubmatch and Match functions.

Change-Id: I2bdf8c3cee6e89d618109397378c1fc91aaf1dfb
GitHub-Last-Rev: 33f34b7adc
GitHub-Pull-Request: golang/go#28837
Reviewed-on: https://go-review.googlesource.com/c/150020
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-11-19 23:00:33 +00:00
Brad Fitzpatrick 3813edf26e all: use "reports whether" consistently in the few places that didn't
Go documentation style for boolean funcs is to say:

    // Foo reports whether ...
    func Foo() bool

(rather than "returns true if")

This CL also replaces 4 uses of "iff" with the same "reports whether"
wording, which doesn't lose any meaning, and will prevent people from
sending typo fixes when they don't realize it's "if and only if". In
the past I think we've had the typo CLs updated to just say "reports
whether". So do them all at once.

(Inspired by the addition of another "returns true if" in CL 146938
in fd_plan9.go)

Created with:

$ perl -i -npe 's/returns true if/reports whether/' $(git grep -l "returns true iff" | grep -v vendor)
$ perl -i -npe 's/returns true if/reports whether/' $(git grep -l "returns true if" | grep -v vendor)

Change-Id: Ided502237f5ab0d25cb625dbab12529c361a8b9f
Reviewed-on: https://go-review.googlesource.com/c/147037
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-11-02 22:47:58 +00:00
Russ Cox bf68744a12 regexp: add partial Deprecation comment to Copy
Change-Id: I21b7817e604a48330f1ee250f7b1b2adc1f16067
Reviewed-on: https://go-review.googlesource.com/c/139784
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-10-12 17:48:44 +00:00
Russ Cox 5160e0d18c regexp: add DeepEqual test
This locks in behavior we accidentally broke
and then restored during the Go 1.11 cycle.
See #26219.

It also locks in new behavior that DeepEqual
always works, instead of only usually working.

This CL is the final piece of a series of CLs to make
DeepEqual always work, by eliminating the machine
cache and making other related optimizations.
Overall, this whole sequence of CLs achieves:

name                             old time/op    new time/op    delta
Find-12                             264ns ± 3%     260ns ± 0%   -1.59%  (p=0.000 n=10+9)
FindAllNoMatches-12                 140ns ± 2%     133ns ± 0%   -5.34%  (p=0.000 n=10+7)
FindString-12                       256ns ± 0%     249ns ± 0%   -2.73%  (p=0.000 n=8+8)
FindSubmatch-12                     339ns ± 1%     333ns ± 1%   -1.73%  (p=0.000 n=9+10)
FindStringSubmatch-12               322ns ± 0%     322ns ± 1%     ~     (p=0.450 n=8+10)
Literal-12                          100ns ± 2%      92ns ± 0%   -8.13%  (p=0.000 n=10+10)
NotLiteral-12                      1.50µs ± 0%    1.47µs ± 0%   -1.65%  (p=0.000 n=8+8)
MatchClass-12                      2.18µs ± 0%    2.15µs ± 0%   -1.05%  (p=0.000 n=10+9)
MatchClass_InRange-12              2.12µs ± 0%    2.11µs ± 0%   -0.65%  (p=0.000 n=10+9)
ReplaceAll-12                      1.41µs ± 0%    1.41µs ± 0%     ~     (p=0.254 n=7+10)
AnchoredLiteralShortNonMatch-12    89.8ns ± 0%    81.5ns ± 0%   -9.22%  (p=0.000 n=8+9)
AnchoredLiteralLongNonMatch-12      105ns ± 3%      97ns ± 0%   -7.21%  (p=0.000 n=10+10)
AnchoredShortMatch-12               141ns ± 0%     128ns ± 0%   -9.22%  (p=0.000 n=9+9)
AnchoredLongMatch-12                276ns ± 4%     253ns ± 2%   -8.23%  (p=0.000 n=10+10)
OnePassShortA-12                    620ns ± 0%     587ns ± 0%   -5.26%  (p=0.000 n=10+6)
NotOnePassShortA-12                 575ns ± 3%     547ns ± 1%   -4.77%  (p=0.000 n=10+10)
OnePassShortB-12                    493ns ± 0%     455ns ± 0%   -7.62%  (p=0.000 n=8+9)
NotOnePassShortB-12                 423ns ± 0%     406ns ± 1%   -3.95%  (p=0.000 n=8+10)
OnePassLongPrefix-12                112ns ± 0%     109ns ± 1%   -2.77%  (p=0.000 n=9+10)
OnePassLongNotPrefix-12             405ns ± 0%     349ns ± 0%  -13.74%  (p=0.000 n=8+9)
MatchParallelShared-12              501ns ± 1%      38ns ± 2%  -92.42%  (p=0.000 n=10+10)
MatchParallelCopied-12             39.1ns ± 0%    38.6ns ± 1%   -1.38%  (p=0.002 n=6+10)
QuoteMetaAll-12                    94.6ns ± 0%    94.8ns ± 0%   +0.26%  (p=0.001 n=10+9)
QuoteMetaNone-12                   52.7ns ± 0%    52.7ns ± 0%     ~     (all equal)
Match/Easy0/32-12                  79.1ns ± 0%    72.0ns ± 0%   -8.95%  (p=0.000 n=9+9)
Match/Easy0/1K-12                   307ns ± 1%     297ns ± 0%   -3.32%  (p=0.000 n=10+7)
Match/Easy0/32K-12                 4.65µs ± 2%    4.67µs ± 1%     ~     (p=0.633 n=10+8)
Match/Easy0/1M-12                   234µs ± 0%     234µs ± 0%     ~     (p=0.684 n=10+10)
Match/Easy0/32M-12                 7.98ms ± 1%    7.96ms ± 0%   -0.31%  (p=0.014 n=9+9)
Match/Easy0i/32-12                 1.13µs ± 1%    1.10µs ± 0%   -3.18%  (p=0.000 n=9+10)
Match/Easy0i/1K-12                 32.5µs ± 0%    31.7µs ± 0%   -2.61%  (p=0.000 n=9+9)
Match/Easy0i/32K-12                1.59ms ± 0%    1.26ms ± 0%  -20.71%  (p=0.000 n=9+7)
Match/Easy0i/1M-12                 51.0ms ± 0%    40.4ms ± 0%  -20.68%  (p=0.000 n=10+7)
Match/Easy0i/32M-12                 1.63s ± 0%     1.30s ± 0%  -20.62%  (p=0.001 n=7+7)
Match/Easy1/32-12                  75.1ns ± 1%    67.4ns ± 0%  -10.24%  (p=0.000 n=8+10)
Match/Easy1/1K-12                   861ns ± 0%     879ns ± 0%   +2.18%  (p=0.000 n=8+8)
Match/Easy1/32K-12                 39.2µs ± 1%    34.1µs ± 0%  -13.01%  (p=0.000 n=10+8)
Match/Easy1/1M-12                  1.38ms ± 0%    1.17ms ± 0%  -15.06%  (p=0.000 n=10+8)
Match/Easy1/32M-12                 44.2ms ± 1%    37.5ms ± 0%  -15.15%  (p=0.000 n=10+9)
Match/Medium/32-12                 1.04µs ± 1%    1.03µs ± 0%   -0.64%  (p=0.002 n=9+8)
Match/Medium/1K-12                 31.3µs ± 0%    31.2µs ± 0%   -0.36%  (p=0.000 n=9+9)
Match/Medium/32K-12                1.44ms ± 0%    1.20ms ± 0%  -17.02%  (p=0.000 n=8+7)
Match/Medium/1M-12                 46.1ms ± 0%    38.2ms ± 0%  -17.14%  (p=0.001 n=6+8)
Match/Medium/32M-12                 1.48s ± 0%     1.23s ± 0%  -17.10%  (p=0.000 n=9+7)
Match/Hard/32-12                   1.54µs ± 1%    1.47µs ± 0%   -4.64%  (p=0.000 n=9+10)
Match/Hard/1K-12                   46.4µs ± 1%    44.4µs ± 0%   -4.35%  (p=0.000 n=9+8)
Match/Hard/32K-12                  2.19ms ± 0%    1.78ms ± 7%  -18.74%  (p=0.000 n=8+10)
Match/Hard/1M-12                   70.1ms ± 0%    57.7ms ± 7%  -17.62%  (p=0.000 n=8+10)
Match/Hard/32M-12                   2.24s ± 0%     1.84s ± 8%  -17.92%  (p=0.000 n=8+10)
Match/Hard1/32-12                  8.17µs ± 1%    7.95µs ± 0%   -2.72%  (p=0.000 n=8+10)
Match/Hard1/1K-12                   254µs ± 2%     245µs ± 0%   -3.62%  (p=0.000 n=9+10)
Match/Hard1/32K-12                 9.58ms ± 1%    8.54ms ± 7%  -10.87%  (p=0.000 n=10+10)
Match/Hard1/1M-12                   306ms ± 1%     271ms ± 8%  -11.42%  (p=0.000 n=9+10)
Match/Hard1/32M-12                  9.79s ± 1%     8.58s ± 9%  -12.37%  (p=0.000 n=9+10)
Match_onepass_regex/32-12           808ns ± 0%     716ns ± 1%  -11.39%  (p=0.000 n=8+9)
Match_onepass_regex/1K-12          27.8µs ± 0%    19.9µs ± 2%  -28.51%  (p=0.000 n=8+9)
Match_onepass_regex/32K-12          925µs ± 0%     631µs ± 2%  -31.71%  (p=0.000 n=9+9)
Match_onepass_regex/1M-12          29.5ms ± 0%    20.2ms ± 2%  -31.53%  (p=0.000 n=10+9)
Match_onepass_regex/32M-12          945ms ± 0%     648ms ± 2%  -31.39%  (p=0.000 n=9+9)
CompileOnepass-12                  4.67µs ± 0%    4.60µs ± 0%   -1.48%  (p=0.000 n=10+10)
[Geo mean]                         24.5µs         21.4µs       -12.94%

https://perf.golang.org/search?q=upload:20181004.5

Change-Id: Icb17b306830dc5489efbb55900937b94ce0eb047
Reviewed-on: https://go-review.googlesource.com/c/139783
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-10-12 17:48:42 +00:00
Russ Cox 3ca1f28e54 regexp: evaluate context flags lazily
There's no point in computing whether we're at the
beginning of the line if the NFA isn't going to ask.
Wait to compute that until asked.

Whatever minor slowdowns were introduced by
the conversion to pools that were not repaid by
other optimizations are taken care of by this one.

name                             old time/op    new time/op    delta
Find-12                             252ns ± 0%     260ns ± 0%   +3.34%  (p=0.000 n=10+8)
FindAllNoMatches-12                 136ns ± 4%     134ns ± 4%   -0.96%  (p=0.033 n=10+10)
FindString-12                       246ns ± 0%     250ns ± 0%   +1.46%  (p=0.000 n=8+10)
FindSubmatch-12                     332ns ± 1%     332ns ± 0%     ~     (p=0.101 n=9+10)
FindStringSubmatch-12               321ns ± 1%     322ns ± 1%     ~     (p=0.717 n=9+10)
Literal-12                         91.6ns ± 0%    92.3ns ± 0%   +0.74%  (p=0.000 n=9+9)
NotLiteral-12                      1.47µs ± 0%    1.47µs ± 0%   +0.38%  (p=0.000 n=9+8)
MatchClass-12                      2.15µs ± 0%    2.15µs ± 0%   +0.39%  (p=0.000 n=10+10)
MatchClass_InRange-12              2.09µs ± 0%    2.11µs ± 0%   +0.75%  (p=0.000 n=9+9)
ReplaceAll-12                      1.40µs ± 0%    1.40µs ± 0%     ~     (p=0.525 n=10+10)
AnchoredLiteralShortNonMatch-12    83.5ns ± 0%    81.6ns ± 0%   -2.28%  (p=0.000 n=9+10)
AnchoredLiteralLongNonMatch-12      101ns ± 0%      97ns ± 1%   -3.54%  (p=0.000 n=10+10)
AnchoredShortMatch-12               131ns ± 0%     128ns ± 0%   -2.29%  (p=0.000 n=10+9)
AnchoredLongMatch-12                268ns ± 1%     252ns ± 1%   -6.04%  (p=0.000 n=10+10)
OnePassShortA-12                    614ns ± 0%     587ns ± 1%   -4.33%  (p=0.000 n=6+10)
NotOnePassShortA-12                 552ns ± 0%     547ns ± 1%   -0.89%  (p=0.000 n=10+10)
OnePassShortB-12                    494ns ± 0%     455ns ± 0%   -7.96%  (p=0.000 n=9+9)
NotOnePassShortB-12                 411ns ± 0%     406ns ± 0%   -1.30%  (p=0.000 n=9+9)
OnePassLongPrefix-12                109ns ± 0%     108ns ± 1%     ~     (p=0.064 n=8+9)
OnePassLongNotPrefix-12             403ns ± 0%     349ns ± 0%  -13.30%  (p=0.000 n=9+8)
MatchParallelShared-12             38.9ns ± 1%    37.9ns ± 1%   -2.65%  (p=0.000 n=10+8)
MatchParallelCopied-12             39.2ns ± 1%    38.3ns ± 2%   -2.20%  (p=0.001 n=10+10)
QuoteMetaAll-12                    94.5ns ± 0%    94.7ns ± 0%   +0.18%  (p=0.043 n=10+9)
QuoteMetaNone-12                   52.7ns ± 0%    52.7ns ± 0%     ~     (all equal)
Match/Easy0/32-12                  72.2ns ± 0%    71.9ns ± 0%   -0.38%  (p=0.009 n=8+10)
Match/Easy0/1K-12                   296ns ± 1%     297ns ± 0%   +0.51%  (p=0.001 n=10+9)
Match/Easy0/32K-12                 4.57µs ± 3%    4.61µs ± 2%     ~     (p=0.280 n=10+10)
Match/Easy0/1M-12                   234µs ± 0%     234µs ± 0%     ~     (p=0.986 n=10+10)
Match/Easy0/32M-12                 7.96ms ± 0%    7.98ms ± 0%   +0.22%  (p=0.010 n=10+9)
Match/Easy0i/32-12                 1.09µs ± 0%    1.10µs ± 0%   +0.23%  (p=0.000 n=8+9)
Match/Easy0i/1K-12                 31.7µs ± 0%    31.7µs ± 0%   +0.09%  (p=0.003 n=9+8)
Match/Easy0i/32K-12                1.61ms ± 0%    1.27ms ± 1%  -21.03%  (p=0.000 n=8+10)
Match/Easy0i/1M-12                 51.4ms ± 0%    40.4ms ± 0%  -21.29%  (p=0.000 n=8+8)
Match/Easy0i/32M-12                 1.65s ± 0%     1.30s ± 1%  -21.22%  (p=0.000 n=9+9)
Match/Easy1/32-12                  67.6ns ± 1%    67.2ns ± 0%     ~     (p=0.085 n=10+9)
Match/Easy1/1K-12                   873ns ± 2%     880ns ± 0%   +0.78%  (p=0.006 n=9+7)
Match/Easy1/32K-12                 39.7µs ± 1%    34.3µs ± 3%  -13.53%  (p=0.000 n=10+10)
Match/Easy1/1M-12                  1.41ms ± 1%    1.19ms ± 3%  -15.48%  (p=0.000 n=10+10)
Match/Easy1/32M-12                 44.9ms ± 1%    38.0ms ± 2%  -15.21%  (p=0.000 n=10+10)
Match/Medium/32-12                 1.04µs ± 0%    1.03µs ± 0%   -0.57%  (p=0.000 n=9+9)
Match/Medium/1K-12                 31.2µs ± 0%    31.4µs ± 1%   +0.61%  (p=0.000 n=8+10)
Match/Medium/32K-12                1.45ms ± 1%    1.20ms ± 0%  -17.70%  (p=0.000 n=10+8)
Match/Medium/1M-12                 46.4ms ± 0%    38.4ms ± 2%  -17.32%  (p=0.000 n=6+9)
Match/Medium/32M-12                 1.49s ± 1%     1.24s ± 1%  -16.81%  (p=0.000 n=10+10)
Match/Hard/32-12                   1.47µs ± 0%    1.47µs ± 0%   -0.31%  (p=0.000 n=9+10)
Match/Hard/1K-12                   44.5µs ± 1%    44.4µs ± 0%     ~     (p=0.075 n=10+10)
Match/Hard/32K-12                  2.09ms ± 0%    1.78ms ± 7%  -14.88%  (p=0.000 n=8+10)
Match/Hard/1M-12                   67.8ms ± 5%    56.9ms ± 7%  -16.05%  (p=0.000 n=10+10)
Match/Hard/32M-12                   2.17s ± 5%     1.84s ± 6%  -15.21%  (p=0.000 n=10+10)
Match/Hard1/32-12                  7.89µs ± 0%    7.94µs ± 0%   +0.61%  (p=0.000 n=9+9)
Match/Hard1/1K-12                   246µs ± 0%     245µs ± 0%   -0.30%  (p=0.010 n=9+10)
Match/Hard1/32K-12                 8.93ms ± 0%    8.17ms ± 0%   -8.44%  (p=0.000 n=9+8)
Match/Hard1/1M-12                   286ms ± 0%     269ms ± 9%   -5.66%  (p=0.028 n=9+10)
Match/Hard1/32M-12                  9.16s ± 0%     8.61s ± 8%   -5.98%  (p=0.028 n=9+10)
Match_onepass_regex/32-12           825ns ± 0%     712ns ± 0%  -13.75%  (p=0.000 n=8+8)
Match_onepass_regex/1K-12          28.7µs ± 1%    19.8µs ± 0%  -30.99%  (p=0.000 n=9+8)
Match_onepass_regex/32K-12          950µs ± 1%     628µs ± 0%  -33.83%  (p=0.000 n=9+8)
Match_onepass_regex/1M-12          30.4ms ± 0%    20.1ms ± 0%  -33.74%  (p=0.000 n=9+8)
Match_onepass_regex/32M-12          974ms ± 1%     646ms ± 0%  -33.73%  (p=0.000 n=9+8)
CompileOnepass-12                  4.60µs ± 0%    4.59µs ± 0%     ~     (p=0.063 n=8+9)
[Geo mean]                         23.1µs         21.3µs        -7.44%

https://perf.golang.org/search?q=upload:20181004.4

Change-Id: I47cdd09f6dcde1d7c317080e9b4df42c7d0a8d24
Reviewed-on: https://go-review.googlesource.com/c/139782
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-10-12 17:48:41 +00:00
Russ Cox a376435ae5 regexp: use pools for NFA machines
Now the machine struct is only used for NFA execution.
Use global pools to cache machines instead of per-Regexp lists.

Also eliminate some tail calls in NFA execution, to pay for
the added overhead of sync.Pool.

name                             old time/op    new time/op    delta
Find-12                             252ns ± 0%     252ns ± 0%     ~     (p=1.000 n=10+10)
FindAllNoMatches-12                 134ns ± 1%     136ns ± 4%     ~     (p=0.443 n=9+10)
FindString-12                       246ns ± 0%     246ns ± 0%   -0.16%  (p=0.046 n=10+8)
FindSubmatch-12                     333ns ± 2%     332ns ± 1%     ~     (p=0.489 n=10+9)
FindStringSubmatch-12               320ns ± 0%     321ns ± 1%   +0.55%  (p=0.005 n=10+9)
Literal-12                         91.1ns ± 0%    91.6ns ± 0%   +0.55%  (p=0.000 n=10+9)
NotLiteral-12                      1.45µs ± 0%    1.47µs ± 0%   +0.82%  (p=0.000 n=10+9)
MatchClass-12                      2.19µs ± 0%    2.15µs ± 0%   -2.01%  (p=0.000 n=9+10)
MatchClass_InRange-12              2.09µs ± 0%    2.09µs ± 0%     ~     (p=0.082 n=10+9)
ReplaceAll-12                      1.39µs ± 0%    1.40µs ± 0%   +0.50%  (p=0.000 n=10+10)
AnchoredLiteralShortNonMatch-12    82.4ns ± 0%    83.5ns ± 0%   +1.36%  (p=0.000 n=8+9)
AnchoredLiteralLongNonMatch-12      106ns ± 1%     101ns ± 0%   -4.36%  (p=0.000 n=10+10)
AnchoredShortMatch-12               130ns ± 0%     131ns ± 0%   +0.77%  (p=0.000 n=9+10)
AnchoredLongMatch-12                272ns ± 0%     268ns ± 1%   -1.46%  (p=0.000 n=8+10)
OnePassShortA-12                    615ns ± 0%     614ns ± 0%     ~     (p=0.094 n=10+6)
NotOnePassShortA-12                 549ns ± 0%     552ns ± 0%   +0.52%  (p=0.000 n=9+10)
OnePassShortB-12                    494ns ± 0%     494ns ± 0%     ~     (p=0.247 n=8+9)
NotOnePassShortB-12                 412ns ± 1%     411ns ± 0%     ~     (p=0.625 n=10+9)
OnePassLongPrefix-12                108ns ± 0%     109ns ± 0%   +0.93%  (p=0.000 n=10+8)
OnePassLongNotPrefix-12             402ns ± 0%     403ns ± 0%   +0.14%  (p=0.041 n=8+9)
MatchParallelShared-12             38.6ns ± 2%    38.9ns ± 1%     ~     (p=0.172 n=9+10)
MatchParallelCopied-12             39.4ns ± 7%    39.2ns ± 1%     ~     (p=0.423 n=10+10)
QuoteMetaAll-12                    94.9ns ± 0%    94.5ns ± 0%   -0.42%  (p=0.000 n=9+10)
QuoteMetaNone-12                   52.7ns ± 0%    52.7ns ± 0%     ~     (all equal)
Match/Easy0/32-12                  72.1ns ± 0%    72.2ns ± 0%     ~     (p=0.435 n=9+8)
Match/Easy0/1K-12                   298ns ± 0%     296ns ± 1%   -1.01%  (p=0.000 n=8+10)
Match/Easy0/32K-12                 4.64µs ± 1%    4.57µs ± 3%   -1.39%  (p=0.030 n=10+10)
Match/Easy0/1M-12                   234µs ± 0%     234µs ± 0%     ~     (p=0.971 n=10+10)
Match/Easy0/32M-12                 7.95ms ± 0%    7.96ms ± 0%     ~     (p=0.278 n=9+10)
Match/Easy0i/32-12                 1.10µs ± 0%    1.09µs ± 0%   -0.29%  (p=0.000 n=9+8)
Match/Easy0i/1K-12                 31.8µs ± 1%    31.7µs ± 0%     ~     (p=0.704 n=10+9)
Match/Easy0i/32K-12                1.62ms ± 1%    1.61ms ± 0%   -1.12%  (p=0.000 n=10+8)
Match/Easy0i/1M-12                 51.8ms ± 0%    51.4ms ± 0%   -0.84%  (p=0.000 n=8+8)
Match/Easy0i/32M-12                 1.65s ± 0%     1.65s ± 0%   -0.46%  (p=0.000 n=9+9)
Match/Easy1/32-12                  67.7ns ± 1%    67.6ns ± 1%     ~     (p=0.723 n=10+10)
Match/Easy1/1K-12                   873ns ± 0%     873ns ± 2%     ~     (p=0.345 n=10+9)
Match/Easy1/32K-12                 39.4µs ± 0%    39.7µs ± 1%   +0.66%  (p=0.000 n=10+10)
Match/Easy1/1M-12                  1.39ms ± 0%    1.41ms ± 1%   +1.10%  (p=0.000 n=10+10)
Match/Easy1/32M-12                 44.3ms ± 0%    44.9ms ± 1%   +1.18%  (p=0.000 n=10+10)
Match/Medium/32-12                 1.04µs ± 0%    1.04µs ± 0%   -0.58%  (p=0.000 n=9+9)
Match/Medium/1K-12                 31.4µs ± 0%    31.2µs ± 0%   -0.62%  (p=0.000 n=8+8)
Match/Medium/32K-12                1.45ms ± 0%    1.45ms ± 1%     ~     (p=0.356 n=9+10)
Match/Medium/1M-12                 46.4ms ± 0%    46.4ms ± 0%     ~     (p=0.142 n=8+6)
Match/Medium/32M-12                 1.49s ± 1%     1.49s ± 1%     ~     (p=0.739 n=10+10)
Match/Hard/32-12                   1.48µs ± 0%    1.47µs ± 0%   -0.53%  (p=0.000 n=9+9)
Match/Hard/1K-12                   45.0µs ± 1%    44.5µs ± 1%   -1.06%  (p=0.000 n=10+10)
Match/Hard/32K-12                  2.24ms ± 0%    2.09ms ± 0%   -6.56%  (p=0.000 n=8+8)
Match/Hard/1M-12                   71.6ms ± 0%    67.8ms ± 5%   -5.36%  (p=0.000 n=7+10)
Match/Hard/32M-12                   2.29s ± 0%     2.17s ± 5%   -5.40%  (p=0.000 n=9+10)
Match/Hard1/32-12                  7.89µs ± 0%    7.89µs ± 0%     ~     (p=0.053 n=9+9)
Match/Hard1/1K-12                   244µs ± 0%     246µs ± 0%   +0.71%  (p=0.000 n=10+9)
Match/Hard1/32K-12                 10.3ms ± 0%     8.9ms ± 0%  -13.76%  (p=0.000 n=10+9)
Match/Hard1/1M-12                   331ms ± 0%     286ms ± 0%  -13.72%  (p=0.000 n=9+9)
Match/Hard1/32M-12                  10.6s ± 0%      9.2s ± 0%  -13.72%  (p=0.000 n=10+9)
Match_onepass_regex/32-12           830ns ± 0%     825ns ± 0%   -0.57%  (p=0.000 n=9+8)
Match_onepass_regex/1K-12          28.7µs ± 1%    28.7µs ± 1%   -0.22%  (p=0.040 n=9+9)
Match_onepass_regex/32K-12          949µs ± 0%     950µs ± 1%     ~     (p=0.236 n=8+9)
Match_onepass_regex/1M-12          30.4ms ± 0%    30.4ms ± 0%     ~     (p=0.059 n=8+9)
Match_onepass_regex/32M-12          973ms ± 0%     974ms ± 1%     ~     (p=0.258 n=9+9)
CompileOnepass-12                  4.64µs ± 0%    4.60µs ± 0%   -0.90%  (p=0.000 n=10+8)
[Geo mean]                         23.3µs         23.1µs        -1.16%

https://perf.golang.org/search?q=upload:20181004.3

Change-Id: I46f3d52ce89c8cd992cf554473c27af81fd81bfd
Reviewed-on: https://go-review.googlesource.com/c/139781
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-10-12 17:48:36 +00:00
Russ Cox 60b2971180 regexp: split one-pass execution out of machine struct
This allows the one-pass executions to have their
own pool of (much smaller) allocated structures.
A step toward eliminating the per-Regexp machine cache.

Not much effect on benchmarks, since there are no
optimizations here, and pools are a tiny bit slower than a
locked data structure for single-threaded code.

name                             old time/op    new time/op    delta
Find-12                             254ns ± 0%     252ns ± 0%  -0.94%  (p=0.000 n=9+10)
FindAllNoMatches-12                 135ns ± 0%     134ns ± 1%  -0.49%  (p=0.002 n=9+9)
FindString-12                       247ns ± 0%     246ns ± 0%  -0.24%  (p=0.003 n=8+10)
FindSubmatch-12                     334ns ± 0%     333ns ± 2%    ~     (p=0.283 n=10+10)
FindStringSubmatch-12               321ns ± 0%     320ns ± 0%  -0.51%  (p=0.000 n=9+10)
Literal-12                         92.2ns ± 0%    91.1ns ± 0%  -1.25%  (p=0.000 n=9+10)
NotLiteral-12                      1.47µs ± 0%    1.45µs ± 0%  -0.99%  (p=0.000 n=9+10)
MatchClass-12                      2.17µs ± 0%    2.19µs ± 0%  +0.84%  (p=0.000 n=7+9)
MatchClass_InRange-12              2.13µs ± 0%    2.09µs ± 0%  -1.70%  (p=0.000 n=10+10)
ReplaceAll-12                      1.39µs ± 0%    1.39µs ± 0%  +0.51%  (p=0.000 n=10+10)
AnchoredLiteralShortNonMatch-12    83.2ns ± 0%    82.4ns ± 0%  -0.96%  (p=0.000 n=8+8)
AnchoredLiteralLongNonMatch-12      105ns ± 0%     106ns ± 1%    ~     (p=0.087 n=10+10)
AnchoredShortMatch-12               131ns ± 0%     130ns ± 0%  -0.76%  (p=0.000 n=10+9)
AnchoredLongMatch-12                267ns ± 0%     272ns ± 0%  +2.01%  (p=0.000 n=10+8)
OnePassShortA-12                    611ns ± 0%     615ns ± 0%  +0.61%  (p=0.000 n=9+10)
NotOnePassShortA-12                 552ns ± 0%     549ns ± 0%  -0.46%  (p=0.000 n=8+9)
OnePassShortB-12                    491ns ± 0%     494ns ± 0%  +0.61%  (p=0.000 n=8+8)
NotOnePassShortB-12                 412ns ± 0%     412ns ± 1%    ~     (p=0.151 n=9+10)
OnePassLongPrefix-12                112ns ± 0%     108ns ± 0%  -3.57%  (p=0.000 n=10+10)
OnePassLongNotPrefix-12             410ns ± 0%     402ns ± 0%  -1.95%  (p=0.000 n=9+8)
MatchParallelShared-12             38.8ns ± 1%    38.6ns ± 2%    ~     (p=0.536 n=10+9)
MatchParallelCopied-12             39.2ns ± 3%    39.4ns ± 7%    ~     (p=0.986 n=10+10)
QuoteMetaAll-12                    94.6ns ± 0%    94.9ns ± 0%  +0.29%  (p=0.001 n=8+9)
QuoteMetaNone-12                   52.7ns ± 0%    52.7ns ± 0%    ~     (all equal)
Match/Easy0/32-12                  72.9ns ± 0%    72.1ns ± 0%  -1.07%  (p=0.000 n=9+9)
Match/Easy0/1K-12                   298ns ± 0%     298ns ± 0%    ~     (p=0.140 n=6+8)
Match/Easy0/32K-12                 4.60µs ± 2%    4.64µs ± 1%    ~     (p=0.171 n=10+10)
Match/Easy0/1M-12                   235µs ± 0%     234µs ± 0%  -0.14%  (p=0.004 n=10+10)
Match/Easy0/32M-12                 7.96ms ± 0%    7.95ms ± 0%  -0.12%  (p=0.043 n=10+9)
Match/Easy0i/32-12                 1.09µs ± 0%    1.10µs ± 0%  +0.15%  (p=0.000 n=8+9)
Match/Easy0i/1K-12                 31.7µs ± 0%    31.8µs ± 1%    ~     (p=0.905 n=9+10)
Match/Easy0i/32K-12                1.61ms ± 0%    1.62ms ± 1%  +1.12%  (p=0.000 n=9+10)
Match/Easy0i/1M-12                 51.4ms ± 0%    51.8ms ± 0%  +0.85%  (p=0.000 n=8+8)
Match/Easy0i/32M-12                 1.65s ± 1%     1.65s ± 0%    ~     (p=0.113 n=9+9)
Match/Easy1/32-12                  67.9ns ± 0%    67.7ns ± 1%    ~     (p=0.232 n=8+10)
Match/Easy1/1K-12                   884ns ± 0%     873ns ± 0%  -1.29%  (p=0.000 n=9+10)
Match/Easy1/32K-12                 39.2µs ± 0%    39.4µs ± 0%  +0.50%  (p=0.000 n=9+10)
Match/Easy1/1M-12                  1.39ms ± 0%    1.39ms ± 0%  +0.29%  (p=0.000 n=9+10)
Match/Easy1/32M-12                 44.2ms ± 1%    44.3ms ± 0%  +0.21%  (p=0.029 n=10+10)
Match/Medium/32-12                 1.05µs ± 0%    1.04µs ± 0%  -0.27%  (p=0.001 n=8+9)
Match/Medium/1K-12                 31.3µs ± 0%    31.4µs ± 0%  +0.39%  (p=0.000 n=9+8)
Match/Medium/32K-12                1.45ms ± 0%    1.45ms ± 0%  +0.33%  (p=0.000 n=8+9)
Match/Medium/1M-12                 46.2ms ± 0%    46.4ms ± 0%  +0.35%  (p=0.000 n=9+8)
Match/Medium/32M-12                 1.48s ± 0%     1.49s ± 1%  +0.70%  (p=0.000 n=8+10)
Match/Hard/32-12                   1.49µs ± 0%    1.48µs ± 0%  -0.43%  (p=0.000 n=10+9)
Match/Hard/1K-12                   45.1µs ± 1%    45.0µs ± 1%    ~     (p=0.393 n=10+10)
Match/Hard/32K-12                  2.18ms ± 1%    2.24ms ± 0%  +2.71%  (p=0.000 n=9+8)
Match/Hard/1M-12                   69.7ms ± 1%    71.6ms ± 0%  +2.76%  (p=0.000 n=9+7)
Match/Hard/32M-12                   2.23s ± 1%     2.29s ± 0%  +2.65%  (p=0.000 n=9+9)
Match/Hard1/32-12                  7.89µs ± 0%    7.89µs ± 0%    ~     (p=0.286 n=9+9)
Match/Hard1/1K-12                   244µs ± 0%     244µs ± 0%    ~     (p=0.905 n=9+10)
Match/Hard1/32K-12                 10.3ms ± 0%    10.3ms ± 0%    ~     (p=0.796 n=10+10)
Match/Hard1/1M-12                   331ms ± 0%     331ms ± 0%    ~     (p=0.167 n=8+9)
Match/Hard1/32M-12                  10.6s ± 0%     10.6s ± 0%    ~     (p=0.315 n=8+10)
Match_onepass_regex/32-12           812ns ± 0%     830ns ± 0%  +2.19%  (p=0.000 n=10+9)
Match_onepass_regex/1K-12          28.5µs ± 0%    28.7µs ± 1%  +0.97%  (p=0.000 n=10+9)
Match_onepass_regex/32K-12          936µs ± 0%     949µs ± 0%  +1.43%  (p=0.000 n=10+8)
Match_onepass_regex/1M-12          30.2ms ± 0%    30.4ms ± 0%  +0.62%  (p=0.000 n=10+8)
Match_onepass_regex/32M-12          970ms ± 0%     973ms ± 0%  +0.35%  (p=0.000 n=10+9)
CompileOnepass-12                  4.63µs ± 1%    4.64µs ± 0%    ~     (p=0.060 n=10+10)
[Geo mean]                         23.3µs         23.3µs       +0.12%

https://perf.golang.org/search?q=upload:20181004.2

Change-Id: Iff9e9f9d4a4698162126a2f300e8ed1b1a39361e
Reviewed-on: https://go-review.googlesource.com/c/139780
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-10-12 17:48:35 +00:00
Russ Cox 2d4346b319 regexp: split bit-state execution out of machine struct
This allows the bit-state executions to have their
own pool of allocated structures. A step toward
eliminating the per-Regexp machine cache.

Note especially the -92% on MatchParallelShared.
This is real but not a complete story: the other
execution engines still need to be de-shared,
but the benchmark was only using bit-state.

The tiny slowdowns in unrelated code are noise.

name                             old time/op    new time/op    delta
Find-12                             264ns ± 3%     254ns ± 0%   -3.86%  (p=0.000 n=10+9)
FindAllNoMatches-12                 140ns ± 2%     135ns ± 0%   -3.91%  (p=0.000 n=10+9)
FindString-12                       256ns ± 0%     247ns ± 0%   -3.52%  (p=0.000 n=8+8)
FindSubmatch-12                     339ns ± 1%     334ns ± 0%   -1.41%  (p=0.000 n=9+10)
FindStringSubmatch-12               322ns ± 0%     321ns ± 0%   -0.21%  (p=0.005 n=8+9)
Literal-12                          100ns ± 2%      92ns ± 0%   -8.10%  (p=0.000 n=10+9)
NotLiteral-12                      1.50µs ± 0%    1.47µs ± 0%   -1.91%  (p=0.000 n=8+9)
MatchClass-12                      2.18µs ± 0%    2.17µs ± 0%   -0.20%  (p=0.001 n=10+7)
MatchClass_InRange-12              2.12µs ± 0%    2.13µs ± 0%   +0.23%  (p=0.000 n=10+10)
ReplaceAll-12                      1.41µs ± 0%    1.39µs ± 0%   -1.30%  (p=0.000 n=7+10)
AnchoredLiteralShortNonMatch-12    89.8ns ± 0%    83.2ns ± 0%   -7.35%  (p=0.000 n=8+8)
AnchoredLiteralLongNonMatch-12      105ns ± 3%     105ns ± 0%     ~     (p=0.186 n=10+10)
AnchoredShortMatch-12               141ns ± 0%     131ns ± 0%   -7.09%  (p=0.000 n=9+10)
AnchoredLongMatch-12                276ns ± 4%     267ns ± 0%   -3.23%  (p=0.000 n=10+10)
OnePassShortA-12                    620ns ± 0%     611ns ± 0%   -1.39%  (p=0.000 n=10+9)
NotOnePassShortA-12                 575ns ± 3%     552ns ± 0%   -3.97%  (p=0.000 n=10+8)
OnePassShortB-12                    493ns ± 0%     491ns ± 0%   -0.33%  (p=0.000 n=8+8)
NotOnePassShortB-12                 423ns ± 0%     412ns ± 0%   -2.60%  (p=0.000 n=8+9)
OnePassLongPrefix-12                112ns ± 0%     112ns ± 0%     ~     (all equal)
OnePassLongNotPrefix-12             405ns ± 0%     410ns ± 0%   +1.23%  (p=0.000 n=8+9)
MatchParallelShared-12              501ns ± 1%      39ns ± 1%  -92.27%  (p=0.000 n=10+10)
MatchParallelCopied-12             39.1ns ± 0%    39.2ns ± 3%     ~     (p=0.785 n=6+10)
QuoteMetaAll-12                    94.6ns ± 0%    94.6ns ± 0%     ~     (p=0.439 n=10+8)
QuoteMetaNone-12                   52.7ns ± 0%    52.7ns ± 0%     ~     (all equal)
Match/Easy0/32-12                  79.1ns ± 0%    72.9ns ± 0%   -7.85%  (p=0.000 n=9+9)
Match/Easy0/1K-12                   307ns ± 1%     298ns ± 0%   -2.99%  (p=0.000 n=10+6)
Match/Easy0/32K-12                 4.65µs ± 2%    4.60µs ± 2%     ~     (p=0.159 n=10+10)
Match/Easy0/1M-12                   234µs ± 0%     235µs ± 0%   +0.17%  (p=0.003 n=10+10)
Match/Easy0/32M-12                 7.98ms ± 1%    7.96ms ± 0%     ~     (p=0.278 n=9+10)
Match/Easy0i/32-12                 1.13µs ± 1%    1.09µs ± 0%   -3.24%  (p=0.000 n=9+8)
Match/Easy0i/1K-12                 32.5µs ± 0%    31.7µs ± 0%   -2.66%  (p=0.000 n=9+9)
Match/Easy0i/32K-12                1.59ms ± 0%    1.61ms ± 0%   +0.75%  (p=0.000 n=9+9)
Match/Easy0i/1M-12                 51.0ms ± 0%    51.4ms ± 0%   +0.77%  (p=0.000 n=10+8)
Match/Easy0i/32M-12                 1.63s ± 0%     1.65s ± 1%   +1.24%  (p=0.000 n=7+9)
Match/Easy1/32-12                  75.1ns ± 1%    67.9ns ± 0%   -9.54%  (p=0.000 n=8+8)
Match/Easy1/1K-12                   861ns ± 0%     884ns ± 0%   +2.71%  (p=0.000 n=8+9)
Match/Easy1/32K-12                 39.2µs ± 1%    39.2µs ± 0%     ~     (p=0.090 n=10+9)
Match/Easy1/1M-12                  1.38ms ± 0%    1.39ms ± 0%     ~     (p=0.095 n=10+9)
Match/Easy1/32M-12                 44.2ms ± 1%    44.2ms ± 1%     ~     (p=0.218 n=10+10)
Match/Medium/32-12                 1.04µs ± 1%    1.05µs ± 0%   +1.05%  (p=0.000 n=9+8)
Match/Medium/1K-12                 31.3µs ± 0%    31.3µs ± 0%   -0.14%  (p=0.004 n=9+9)
Match/Medium/32K-12                1.44ms ± 0%    1.45ms ± 0%   +0.18%  (p=0.001 n=8+8)
Match/Medium/1M-12                 46.1ms ± 0%    46.2ms ± 0%   +0.13%  (p=0.003 n=6+9)
Match/Medium/32M-12                 1.48s ± 0%     1.48s ± 0%   +0.20%  (p=0.002 n=9+8)
Match/Hard/32-12                   1.54µs ± 1%    1.49µs ± 0%   -3.60%  (p=0.000 n=9+10)
Match/Hard/1K-12                   46.4µs ± 1%    45.1µs ± 1%   -2.78%  (p=0.000 n=9+10)
Match/Hard/32K-12                  2.19ms ± 0%    2.18ms ± 1%   -0.51%  (p=0.006 n=8+9)
Match/Hard/1M-12                   70.1ms ± 0%    69.7ms ± 1%   -0.52%  (p=0.006 n=8+9)
Match/Hard/32M-12                   2.24s ± 0%     2.23s ± 1%   -0.42%  (p=0.046 n=8+9)
Match/Hard1/32-12                  8.17µs ± 1%    7.89µs ± 0%   -3.42%  (p=0.000 n=8+9)
Match/Hard1/1K-12                   254µs ± 2%     244µs ± 0%   -3.91%  (p=0.000 n=9+9)
Match/Hard1/32K-12                 9.58ms ± 1%   10.35ms ± 0%   +8.00%  (p=0.000 n=10+10)
Match/Hard1/1M-12                   306ms ± 1%     331ms ± 0%   +8.27%  (p=0.000 n=9+8)
Match/Hard1/32M-12                  9.79s ± 1%    10.60s ± 0%   +8.29%  (p=0.000 n=9+8)
Match_onepass_regex/32-12           808ns ± 0%     812ns ± 0%   +0.47%  (p=0.000 n=8+10)
Match_onepass_regex/1K-12          27.8µs ± 0%    28.5µs ± 0%   +2.32%  (p=0.000 n=8+10)
Match_onepass_regex/32K-12          925µs ± 0%     936µs ± 0%   +1.24%  (p=0.000 n=9+10)
Match_onepass_regex/1M-12          29.5ms ± 0%    30.2ms ± 0%   +2.38%  (p=0.000 n=10+10)
Match_onepass_regex/32M-12          945ms ± 0%     970ms ± 0%   +2.60%  (p=0.000 n=9+10)
CompileOnepass-12                  4.67µs ± 0%    4.63µs ± 1%   -0.84%  (p=0.000 n=10+10)
[Geo mean]                         24.5µs         23.3µs        -5.04%

https://perf.golang.org/search?q=upload:20181004.1

Change-Id: Idbc2b76223718265657819ff38be2d9aba1c54b4
Reviewed-on: https://go-review.googlesource.com/c/139779
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-10-12 17:48:34 +00:00
Russ Cox 872a547957 regexp: fix BenchmarkMatch_onepass_regex
This benchmark - in contrast to all other benchmarks - was
running the regexp match on 1-byte substrings of the input
instead of the entire input. Worse, it was doing so by preallocating
a slice of slices of every 1-byte substring. Needless to say,
this does not accurately reflect what happens when the regexp
matcher is given a large input.

Change-Id: Icd5b95f0e43f554a6b93164916745941366e03d6
Reviewed-on: https://go-review.googlesource.com/c/139778
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-10-11 18:01:17 +00:00
Russ Cox e2d70b8b4b regexp: simplify BenchmarkCompileOnepass
One benchmark is fine.
Having one per test case is overkill.

Change-Id: Id4ce789484dab1e79026bdd23cbcd63b2eaceb3f
Reviewed-on: https://go-review.googlesource.com/c/139777
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-10-11 18:01:06 +00:00
Alan Donovan 8c610aa633 regexp: fix incorrect name in Match doc comment
Change-Id: I628aad9a3abe9cc0c3233f476960e53bd291eca9
Reviewed-on: https://go-review.googlesource.com/135235
Reviewed-by: Ralph Corderoy <ralph@inputplus.co.uk>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-09-13 16:29:06 +00:00
Ian Lance Taylor 3396034155 regexp/syntax: don't do both linear and binary sesarch in MatchRunePos
MatchRunePos is a significant element of regexp performance, so some
attention to optimization is appropriate. Before this CL, a
non-matching rune would do both a linear search in the first four
entries, and a binary search over all the entries. Change the code to
optimize for the common case of two runes, to only do a linear search
when there are up to four entries, and to only do a binary search when
there are more than four entries.

Updates #26623

name                             old time/op    new time/op    delta
Find-12                             260ns ± 1%     275ns ± 7%   +5.84%  (p=0.000 n=8+10)
FindAllNoMatches-12                 144ns ± 9%     143ns ±12%     ~     (p=0.187 n=10+10)
FindString-12                       256ns ± 4%     254ns ± 1%     ~     (p=0.357 n=9+8)
FindSubmatch-12                     587ns ±12%     593ns ±11%     ~     (p=0.516 n=10+10)
FindStringSubmatch-12               534ns ±12%     525ns ±14%     ~     (p=0.565 n=10+10)
Literal-12                          104ns ±14%     106ns ±11%     ~     (p=0.145 n=10+10)
NotLiteral-12                      1.51µs ± 8%    1.47µs ± 2%     ~     (p=0.508 n=10+9)
MatchClass-12                      2.47µs ± 1%    2.26µs ± 6%   -8.55%  (p=0.000 n=8+10)
MatchClass_InRange-12              2.18µs ± 5%    2.25µs ±11%   +2.85%  (p=0.009 n=9+10)
ReplaceAll-12                      2.35µs ± 6%    2.08µs ±23%  -11.59%  (p=0.010 n=9+10)
AnchoredLiteralShortNonMatch-12    93.2ns ± 9%    93.2ns ±11%     ~     (p=0.716 n=10+10)
AnchoredLiteralLongNonMatch-12      118ns ±10%     117ns ± 9%     ~     (p=0.802 n=10+10)
AnchoredShortMatch-12               142ns ± 1%     141ns ± 1%   -0.53%  (p=0.007 n=8+8)
AnchoredLongMatch-12                303ns ± 9%     304ns ± 6%     ~     (p=0.724 n=10+10)
OnePassShortA-12                    620ns ± 1%     618ns ± 9%     ~     (p=0.162 n=8+10)
NotOnePassShortA-12                 599ns ± 8%     568ns ± 1%   -5.21%  (p=0.000 n=10+8)
OnePassShortB-12                    525ns ± 7%     489ns ± 1%   -6.93%  (p=0.000 n=10+8)
NotOnePassShortB-12                 449ns ± 9%     431ns ±11%   -4.05%  (p=0.033 n=10+10)
OnePassLongPrefix-12                119ns ± 6%     114ns ± 0%   -3.88%  (p=0.006 n=10+9)
OnePassLongNotPrefix-12             420ns ± 9%     410ns ± 7%     ~     (p=0.645 n=10+9)
MatchParallelShared-12              376ns ± 0%     375ns ± 0%   -0.45%  (p=0.003 n=8+10)
MatchParallelCopied-12             39.4ns ± 1%    39.1ns ± 0%   -0.55%  (p=0.004 n=10+9)
QuoteMetaAll-12                     139ns ± 7%     142ns ± 7%     ~     (p=0.445 n=10+10)
QuoteMetaNone-12                   56.7ns ± 0%    61.3ns ± 7%   +8.03%  (p=0.001 n=8+10)
Match/Easy0/32-12                  83.4ns ± 7%    83.1ns ± 8%     ~     (p=0.541 n=10+10)
Match/Easy0/1K-12                   417ns ± 8%     394ns ± 6%     ~     (p=0.059 n=10+9)
Match/Easy0/32K-12                 7.05µs ± 8%    7.30µs ± 9%     ~     (p=0.190 n=10+10)
Match/Easy0/1M-12                   291µs ±17%     284µs ±10%     ~     (p=0.481 n=10+10)
Match/Easy0/32M-12                 9.89ms ± 4%   10.27ms ± 8%     ~     (p=0.315 n=10+10)
Match/Easy0i/32-12                 1.13µs ± 1%    1.14µs ± 1%   +1.51%  (p=0.000 n=8+8)
Match/Easy0i/1K-12                 35.7µs ±11%    36.8µs ±10%     ~     (p=0.143 n=10+10)
Match/Easy0i/32K-12                1.70ms ± 7%    1.72ms ± 7%     ~     (p=0.776 n=9+6)

name                             old alloc/op   new alloc/op   delta
Find-12                             0.00B          0.00B          ~     (all equal)
FindAllNoMatches-12                 0.00B          0.00B          ~     (all equal)
FindString-12                       0.00B          0.00B          ~     (all equal)
FindSubmatch-12                     48.0B ± 0%     48.0B ± 0%     ~     (all equal)
FindStringSubmatch-12               32.0B ± 0%     32.0B ± 0%     ~     (all equal)

name                             old allocs/op  new allocs/op  delta
Find-12                              0.00           0.00          ~     (all equal)
FindAllNoMatches-12                  0.00           0.00          ~     (all equal)
FindString-12                        0.00           0.00          ~     (all equal)
FindSubmatch-12                      1.00 ± 0%      1.00 ± 0%     ~     (all equal)
FindStringSubmatch-12                1.00 ± 0%      1.00 ± 0%     ~     (all equal)

name                             old speed      new speed      delta
QuoteMetaAll-12                   101MB/s ± 8%    99MB/s ± 7%     ~     (p=0.529 n=10+10)
QuoteMetaNone-12                  458MB/s ± 0%   425MB/s ± 8%   -7.22%  (p=0.003 n=8+10)
Match/Easy0/32-12                 385MB/s ± 7%   386MB/s ± 7%     ~     (p=0.579 n=10+10)
Match/Easy0/1K-12                2.46GB/s ± 8%  2.60GB/s ± 6%     ~     (p=0.065 n=10+9)
Match/Easy0/32K-12               4.66GB/s ± 7%  4.50GB/s ±10%     ~     (p=0.190 n=10+10)
Match/Easy0/1M-12                3.63GB/s ±15%  3.70GB/s ± 9%     ~     (p=0.481 n=10+10)
Match/Easy0/32M-12               3.40GB/s ± 4%  3.28GB/s ± 8%     ~     (p=0.315 n=10+10)
Match/Easy0i/32-12               28.4MB/s ± 1%  28.0MB/s ± 1%   -1.50%  (p=0.000 n=8+8)
Match/Easy0i/1K-12               28.8MB/s ±10%  27.9MB/s ±11%     ~     (p=0.143 n=10+10)
Match/Easy0i/32K-12              19.0MB/s ±14%  19.1MB/s ± 8%     ~     (p=1.000 n=10+6)

Change-Id: I238a451b36ad84b0f5534ff0af5c077a0d52d73a
Reviewed-on: https://go-review.googlesource.com/130417
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-08-22 17:11:57 +00:00
Russ Cox f50448f531 regexp: reword Match documentation to be more like Find
Before:

  // Find returns a slice holding the text of the leftmost match in b of the regular expression.

  // Match checks whether a textual regular expression matches a byte slice.

After:

  // Match reports whether the byte slice b contains any match of the regular expression re.

The use of different wording for Find and Match always makes me think
that Match required the entire string to match while Find clearly allows
a substring to match.

This CL makes the Match wording correspond more closely to Find,
to try to avoid that confusion.

Change-Id: I97fb82d5080d3246ee5cf52abf28d2a2296a5039
Reviewed-on: https://go-review.googlesource.com/123736
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-07-13 18:52:46 +00:00
Russ Cox a41d21695c regexp: revert "use sync.Pool to cache regexp.machine objects"
Revert CL 101715.

The size of a sync.Pool scales linearly with GOMAXPROCS,
making it inappropriate to put a sync.Pool in any individually
allocated object, as the sync.Pool documentation explains.
The change also broke DeepEqual on regexps.

I have a cleaner way to do this with global sync.Pools but it's
too late in the cycle. Will revisit in Go 1.12. For now, revert.

Fixes #26219.

Change-Id: Ie632e709eb3caf489d85efceac0e4b130ec2019f
Reviewed-on: https://go-review.googlesource.com/122596
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-07-09 15:12:53 +00:00
Alex Myasoedov 0dc814cd7f regexp: examples for Regexp.FindIndex and Regexp.FindAllSubmatchIndex methods
This commit adds examples that demonstrate usage in a practical way.

Change-Id: I105baf610764c14a2c247cfc0b0c06f27888d377
Reviewed-on: https://go-review.googlesource.com/78635
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-06-30 01:04:30 +00:00