Commit Graph

59 Commits

Author SHA1 Message Date
Russ Cox 930cf59ba8 regexp/syntax: recognize category aliases like \p{Letter}
The Unicode specification defines aliases for some of the general
category names. For example the category "L" has alias "Letter".

The regexp package supports \p{L} but not \p{Letter}, because there
was nothing in the Unicode tables that lets regexp know about Letter.
Now that package unicode provides CategoryAliases (see #70780),
we can use it to provide \p{Letter} as well.

This is the only feature missing from making package regexp suitable
for use in a JSON-API Schema implementation. (The official test suite
includes usage of aliases like \p{Letter} instead of \p{L}.)

For better conformity with Unicode TR18, also accept case-insensitive
matches for names and ignore underscores, hyphens, and spaces;
and add Any, ASCII, and Assigned.

Fixes #70781.

Change-Id: I50ff024d99255338fa8d92663881acb47f1e92a5
Reviewed-on: https://go-review.googlesource.com/c/go/+/641377
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Alan Donovan <adonovan@google.com>
2025-04-18 14:13:38 -07:00
nlwkobe30 7cd0a4be5c all: omit unnecessary 0 in slice expression
All changes are related to the code, except for the comments in src/regexp/syntax/parse.go and src/slices/slices.go.

Change-Id: I73c5d3c54099749b62210aa7f3182c5eb84bb6a6
GitHub-Last-Rev: 794aa9b053
GitHub-Pull-Request: golang/go#69170
Reviewed-on: https://go-review.googlesource.com/c/go/+/609678
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
2024-09-03 20:55:15 +00:00
Olivier Mengué 0611816216 regexp/syntax: cleanup code generation in perl_groups.go
Cleanup code generation of perl_groups.go:
* Fix the generated code header to follow the standard https://go.dev/s/generatedcode
* Apply gofmt as last step of code generation
* Add //go:generate lines in parse.go to trigger code generation
* Adapt make_perl_groups.pl to handle writing directly to the output
  file (as we can't use shell redirection in go:generate lines)
* use strict; use warnings;

Change-Id: I675241da03dd3f6facc9eb9de00999bd9203ad70
Reviewed-on: https://go-review.googlesource.com/c/go/+/555995
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-04-02 13:59:01 +00:00
Daniel Martí 754f870381 crypto/tls,regexp: remove always-nil error results
These were harmless, but added unnecessary verbosity to the code.
This can happen as a result of refactors: for example,
the method sessionState used to return errors in some cases.

Change-Id: I4e6dacc01ae6a49b528c672979f95cbb86795a85
Reviewed-on: https://go-review.googlesource.com/c/go/+/528995
Reviewed-by: Leo Isla <islaleo93@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: Olivier Mengué <olivier.mengue@gmail.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: qiulaidongfeng <2645477756@qq.com>
Reviewed-by: Quim Muntal <quimmuntal@gmail.com>
2024-03-29 22:22:45 +00:00
Daniel Martí 27c7a3dcc3 regexp/syntax: use the Regexp.Equal static method directly
A follow-up for the recent https://go.dev/cl/573978.

Change-Id: I0e75ca0b37d9ef063bbdfb88d4d2e34647e0ee50
Reviewed-on: https://go-review.googlesource.com/c/go/+/574677
Reviewed-by: Emmanuel Odeke <emmanuel@orijtech.com>
Reviewed-by: Than McIntosh <thanm@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Emmanuel Odeke <emmanuel@orijtech.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
2024-03-29 16:41:51 +00:00
apocelipes 51a96f86f7 regexp/syntax: simplify the code
Use the slices package and the built-in max to simplify the code.
There's no noticeable performance change in this modification.

Change-Id: I96e46ba8ab1323f1ba0b8c9b827836e217772cf2
GitHub-Last-Rev: f0111ac7e2
GitHub-Pull-Request: golang/go#66511
Reviewed-on: https://go-review.googlesource.com/c/go/+/573978
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: qiulaidongfeng <2645477756@qq.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
2024-03-26 19:50:03 +00:00
Oleksandr Redko 3e82b5ee0a regexp/syntax: use standard generated code header
Updates doc by running these commands:
  - mksyntaxgo from the google/re2 repo, which changes comment according
    to https://golang.org/s/generatedcode
  - gofmt -w regexp/syntax/doc.go

Change-Id: I66a9dd9fa841cbce899ab3aa32d7face798d2920
Reviewed-on: https://go-review.googlesource.com/c/go/+/572275
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
2024-03-19 11:21:02 +00:00
cui fliter 65172c2036 regexp: add available godoc link
Signed-off-by: cui fliter <imcusg@gmail.com>
Change-Id: I85339293d4cfb691125f991ec7162e9be186efdc
Reviewed-on: https://go-review.googlesource.com/c/go/+/539599
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Ian Lance Taylor <iant@google.com>
2024-02-26 20:50:01 +00:00
Mauri de Souza Meneguzzo c4d55ab912 regexp/syntax: regenerate docs with mksyntaxgo
This makes the docs up-to-date by running doc/mksyntaxgo from the google/re2 repo.

Change-Id: I80358eed071e7566c85edaeb1cc5514a6d8c37a7
GitHub-Last-Rev: 0f8c8df4f2
GitHub-Pull-Request: golang/go#65249
Reviewed-on: https://go-review.googlesource.com/c/go/+/558136
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-02-20 14:57:37 +00:00
qiulaidongfeng 3bd30298dd regexp/syntax: use min func
Change-Id: I679c906057577d4a795c07a2f572b969c3ee14d5
GitHub-Last-Rev: fba371d2d6
GitHub-Pull-Request: golang/go#63350
Reviewed-on: https://go-review.googlesource.com/c/go/+/532218
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
2023-10-03 19:49:07 +00:00
Jes Cok 267323ef2d all: calculate the median uniformly
Like sort.Search, use "h := int(uint(i+j) >> 1)" style code to calculate
the median.

Change-Id: Ifb1a19dde1c6ed6c1654bc642fc9565a8b6c5fc4
GitHub-Last-Rev: e2213b7388
GitHub-Pull-Request: golang/go#62503
Reviewed-on: https://go-review.googlesource.com/c/go/+/526496
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2023-09-09 01:46:03 +00:00
Russ Cox 98c9f271d6 regexp/syntax: use more compact Regexp.String output
Compact the Regexp.String output. It was only ever intended for debugging,
but there are at least some uses in the wild where regexps are built up
using regexp/syntax and then formatted using the String method.
Compact the output to help that use case. Specifically:

 - Compact 2-element character class ranges: [a-b] -> [ab].
 - Aggregate flags: (?i:A)(?i:B)*(?i:C)|(?i:D)?(?i:E) -> (?i:AB*C|D?E).

Fixes #57950.

Change-Id: I1161d0e3aa6c3ae5a302677032bb7cd55caae5fb
Reviewed-on: https://go-review.googlesource.com/c/go/+/507015
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
Auto-Submit: Russ Cox <rsc@golang.org>
2023-08-16 16:02:30 +00:00
Mauri de Souza Meneguzzo ee61186b33 regexp/syntax: accept (?<name>...) syntax as valid capture
Currently the only named capture supported by regexp is (?P<name>a).

The syntax (?<name>a) is also widely used and there is currently an effort from
 the Rust regex and RE2 teams to also accept this syntax.

Fixes #58458

Change-Id: If22d44d3a5c4e8133ec68238ab130c151ca7c5c5
GitHub-Last-Rev: 31b50e6ab4
GitHub-Pull-Request: golang/go#61624
Reviewed-on: https://go-review.googlesource.com/c/go/+/513838
Auto-Submit: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
2023-07-31 19:52:57 +00:00
Ian Lance Taylor 9a0c506a4e all: re-run stringer
Re-run all go:generate stringer commands. This mostly adds checks
that the constant values did not change, but does add new strings
for the debug/dwarf and internal/pkgbits packages.

Change-Id: I5fc41f20da47338152c183d45d5ae65074e2fccf
Reviewed-on: https://go-review.googlesource.com/c/go/+/483717
Reviewed-by: Bryan Mills <bcmills@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
2023-04-11 20:24:07 +00:00
Ludi Rehak 82bf12902f regexp/syntax: test for lowercase letters first in IsWordChar
Lowercase letters occur more frequently than uppercase letters
in English text.  In IsWordChar, evaluate the most common case
(lowercase letters) first to minimize the expected value of its
execution time. Code clarity does not suffer by rearranging the
order of the checks.

Add a benchmark on a sentence demonstrating the performance
improvement.

name           old time/op  new time/op  delta
IsWordChar-10   122ns ± 0%   114ns ± 1%  -6.68%  (p=0.000 n=8+10)

Change-Id: Ieee8126a4bd8ee8703905b4f75724623029f6fa2
Reviewed-on: https://go-review.googlesource.com/c/go/+/404100
Reviewed-by: Emmanuel Odeke <emmanuel@orijtech.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: thepudds <thepudds1460@gmail.com>
2023-03-14 04:39:42 +00:00
cuiweixie 581a822a9e regexp: add ErrLarge error
For #56041

Change-Id: I6c98458b5c0d3b3636a53ee04fc97221f3fd8bbc
Reviewed-on: https://go-review.googlesource.com/c/go/+/444817
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Run-TryBot: xie cui <523516579@qq.com>
2022-11-02 18:15:21 +00:00
Russ Cox c3c4aea55b regexp: limit size of parsed regexps
Set a 128 MB limit on the amount of space used by []syntax.Inst
in the compiled form corresponding to a given regexp.

Also set a 128 MB limit on the rune storage in the *syntax.Regexp
tree itself.

Thanks to Adam Korczynski (ADA Logics) and OSS-Fuzz for reporting this issue.

Fixes CVE-2022-41715.
Fixes #55949.

Change-Id: Ia656baed81564436368cf950e1c5409752f28e1b
Reviewed-on: https://go-review.googlesource.com/c/go/+/439356
Auto-Submit: Roland Shoemaker <roland@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Roland Shoemaker <roland@golang.org>
Reviewed-by: Damien Neil <dneil@google.com>
2022-10-05 20:39:49 +00:00
cui fliter 1888875182 regexp: fix a few function names on comments
Change-Id: I192dd34c677e52e16f0ef78e1dae58a78f6d1aac
GitHub-Last-Rev: 1638a74689
GitHub-Pull-Request: golang/go#55967
Reviewed-on: https://go-review.googlesource.com/c/go/+/436885
Run-TryBot: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
2022-10-02 02:31:23 +00:00
Ludi Rehak e7b0559448 regexp/syntax: fix typo in comment
Fix typo in comment describing IsWordChar.

Change-Id: Ia283813cf5662e218ee6d0411fb0c1b1ad1021f3
Reviewed-on: https://go-review.googlesource.com/c/go/+/393435
Auto-Submit: Dmitri Shuralyov <dmitshur@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Dmitri Shuralyov <dmitshur@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-04-29 01:00:55 +00:00
Ian Lance Taylor 0bf545e51f regexp/syntax: rename ErrInvalidDepth to ErrNestingDepth
The proposal accepted the name ErrNestingDepth.

For #51684

Change-Id: I702365f19e5e1889dbcc3c971eecff68e0b08727
Reviewed-on: https://go-review.googlesource.com/c/go/+/401854
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2022-04-22 22:35:03 +00:00
Ian Lance Taylor 575fd8817a regexp: change ErrInvalidDepth message to match proposal
Also update the file in $GOROOT/api/next to use proposal number.

For #51684

Change-Id: I28bfa6bc1cee98a17b13da196d41cda34d968bb0
Reviewed-on: https://go-review.googlesource.com/c/go/+/401076
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2022-04-22 00:10:17 +00:00
Russ Cox 19309779ac all: gofmt main repo
[This CL is part of a sequence implementing the proposal #51082.
The design doc is at https://go.dev/s/godocfmt-design.]

Run the updated gofmt, which reformats doc comments,
on the main repository. Vendored files are excluded.

For #51082.

Change-Id: I7332f099b60f716295fb34719c98c04eb1a85407
Reviewed-on: https://go-review.googlesource.com/c/go/+/384268
Reviewed-by: Jonathan Amsterdam <jba@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2022-04-11 16:34:30 +00:00
Russ Cox 81431c7aa7 all: replace `` and '' with “ (U+201C) and ” (U+201D) in doc comments
go/doc in all its forms applies this replacement when rendering
the comments. We are considering formatting doc comments,
including doing this replacement as part of the formatting.
Apply it to our source files ahead of time.

For #51082.

Change-Id: Ifcc1f5861abb57c5d14e7d8c2102dfb31b7a3a19
Reviewed-on: https://go-review.googlesource.com/c/go/+/384262
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2022-04-05 17:52:29 +00:00
Russ Cox 1af60b2f49 regexp/syntax: add and use ErrInvalidDepth
The fix for #51112 introduced a depth check but used
ErrInternalError to avoid introduce new API in a CL that
would be backported to earlier releases.

New API accepted in proposal #51684.

This CL adds a distinct error for this case.

For #51112.
Fixes #51684.

Change-Id: I068fc70aafe4218386a06103d9b7c847fb7ffa65
Reviewed-on: https://go-review.googlesource.com/c/go/+/384617
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-04-04 10:59:27 +00:00
Russ Cox 690ac4071f all: remove trailing blank doc comment lines
A future change to gofmt will rewrite

	// Doc comment.
	//
	func f()

to

	// Doc comment.
	func f()

Apply that change preemptively to all doc comments.

For #51082.

Change-Id: I4023e16cfb0729b64a8590f071cd92f17343081d
Reviewed-on: https://go-review.googlesource.com/c/go/+/384259
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-04-01 18:18:07 +00:00
Russ Cox 452f24ae94 regexp/syntax: reject very deeply nested regexps in Parse
The regexp code assumes it can recurse over the structure of
a regexp safely. Go's growable stacks make that reasonable
for all plausible regexps, but implausible ones can reach the
“infinite recursion?” stack limit.

This CL limits the depth of any parsed regexp to 1000.
That is, the depth of the parse tree is required to be ≤ 1000.
Regexps that require deeper parse trees will return ErrInternalError.
A future CL will change the error to ErrInvalidDepth,
but using ErrInternalError for now avoids introducing new API
in point releases when this is backported.

Fixes #51112.

Change-Id: I97d2cd82195946eb43a4ea8561f5b95f91fb14c5
Reviewed-on: https://go-review.googlesource.com/c/go/+/384616
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2022-02-10 15:23:05 +00:00
Russ Cox 702e337174 regexp: document and implement that invalid UTF-8 bytes are the same as U+FFFD
What should it mean to run a regexp match on invalid UTF-8 bytes?
The coherent behavior options are:

1. Invalid UTF-8 does not match any character classes,
   nor a U+FFFD literal (nor \x{fffd}).
2. Each byte of invalid UTF-8 is treated identically to a U+FFFD in the input,
   as a utf8.DecodeRune loop might.

RE2 uses Rule 1.
Because it works byte at a time, it can also provide \C to match any
single byte of input, which matches invalid UTF-8 as well.
This provides the nice property that a match for a regexp without \C
is guaranteed to be valid UTF-8.

Unfortunately, today Go has an incoherent mix of these two, although
mostly Rule 2. This is a deviation from RE2, and it gives up the nice
property, but we probably can't correct that at this point.
In particular .* already matches entire inputs today, valid UTF-8 or
not, and I doubt we can break that.

This CL adopts Rule 2 officially, fixing the few places that deviate from it.

Fixes #48749.

Change-Id: I96402527c5dfb1146212f568ffa09dde91d71244
Reviewed-on: https://go-review.googlesource.com/c/go/+/354569
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
2021-10-11 15:28:50 +00:00
Russ Cox 4d8db00641 all: use bytes.Cut, strings.Cut
Many uses of Index/IndexByte/IndexRune/Split/SplitN
can be written more clearly using the new Cut functions.
Do that. Also rewrite to other functions if that's clearer.

For #46336.

Change-Id: I68d024716ace41a57a8bf74455c62279bde0f448
Reviewed-on: https://go-review.googlesource.com/c/go/+/351711
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2021-10-06 15:53:04 +00:00
Russ Cox 2a61b3c590 regexp: fix repeat of preferred empty match
In Perl mode, (|a)* should match an empty string at the start of the
input. Instead it matches as many a's as possible.
Because (|a)+ is handled correctly, matching only an empty string,
this leads to the paradox that e* can match more text than e+
(for e = (|a)) and that e+ is sometimes different from ee*.

This is a very old bug that ultimately derives from the picture I drew
for e* in https://swtch.com/~rsc/regexp/regexp1.html. The picture is
correct for longest-match (POSIX) regexps but subtly wrong for
preferred-match (Perl) regexps in the case where e has a preferred
empty match. Pointed out by Andrew Gallant in private mail.

The current code treats e* and e+ as the same structure, with
different entry points. In the case of e* the preference list ends up
not quite in the right order, in part because the “before e” and
“after e” states are the same state. Splitting them apart fixes the
preference list, and that can be done by compiling e* as if it were
(e+)?.

Like with any bug fix, there is a very low chance of breaking a
program that accidentally depends on the buggy behavior.

RE2, Go, and Rust all have this bug, and we've all agreed to fix it,
to keep the implementations in sync.

Fixes #46123.

Change-Id: I70e742e71e0a23b626593b16ddef3c1e73b413b0
Reviewed-on: https://go-review.googlesource.com/c/go/+/318750
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
2021-05-13 14:52:20 +00:00
Tom Payne 1d3baf20dc regexp/syntax: add note about Unicode character classes
As proposed on golang-nuts:
https://groups.google.com/g/golang-nuts/c/M3lmSUptExQ/m/hRySV9GsCAAJ

Includes the latest updates from re2's mksyntaxgo:
https://code.googlesource.com/re2/+/refs/heads/master/doc/mksyntaxgo

Change-Id: Ib7b79aa6531f473feabd0a7f1d263cd65c4388e4
Reviewed-on: https://go-review.googlesource.com/c/go/+/264678
Reviewed-by: Russ Cox <rsc@golang.org>
Trust: Emmanuel Odeke <emmanuel@orijtech.com>
2020-11-25 15:10:09 +00:00
Andy Balholm 0ce1ffc49d regexp/syntax: append patchLists in constant time
By keeping a tail pointer, we can append to a patchList in constant
time, rather than in time proportional to the length of the list. This
gets rid of the quadratic compile times we were seeing for long series
of alternations.

This is basically the same change as
e9d517989f.

Fixes #39542.

Change-Id: Ib4ca0ca9c55abd1594df1984653c7d311ccf7572
Reviewed-on: https://go-review.googlesource.com/c/go/+/238079
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2020-06-18 00:40:11 +00:00
Russ Cox 4d9ecde30a regexp/syntax: fix comment on p.literal and simplify
p.literal's doc comment said it returned a value but it doesn't.
While we're here, p.newLiteral is only called from p.literal,
so simplify the code by merging the two.

Change-Id: Ia357937a99f4e7473f0f1ec837113a39eaeb83d4
Reviewed-on: https://go-review.googlesource.com/c/go/+/222659
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2020-04-17 22:12:02 +00:00
Keegan Carruthers-Smith 648c7b592a regexp/syntax: exclude full range from String negation case
If the char class is 0x0-0x10ffff we mistakenly would String that to `[^]`,
which is not a valid regex.

Fixes #31807

Change-Id: I9ceeaddc28b67b8e1de12b6703bcb124cc784556
Reviewed-on: https://go-review.googlesource.com/c/go/+/175679
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-05-22 04:43:25 +00:00
Brad Fitzpatrick 3813edf26e all: use "reports whether" consistently in the few places that didn't
Go documentation style for boolean funcs is to say:

    // Foo reports whether ...
    func Foo() bool

(rather than "returns true if")

This CL also replaces 4 uses of "iff" with the same "reports whether"
wording, which doesn't lose any meaning, and will prevent people from
sending typo fixes when they don't realize it's "if and only if". In
the past I think we've had the typo CLs updated to just say "reports
whether". So do them all at once.

(Inspired by the addition of another "returns true if" in CL 146938
in fd_plan9.go)

Created with:

$ perl -i -npe 's/returns true if/reports whether/' $(git grep -l "returns true iff" | grep -v vendor)
$ perl -i -npe 's/returns true if/reports whether/' $(git grep -l "returns true if" | grep -v vendor)

Change-Id: Ided502237f5ab0d25cb625dbab12529c361a8b9f
Reviewed-on: https://go-review.googlesource.com/c/147037
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-11-02 22:47:58 +00:00
Ian Lance Taylor 3396034155 regexp/syntax: don't do both linear and binary sesarch in MatchRunePos
MatchRunePos is a significant element of regexp performance, so some
attention to optimization is appropriate. Before this CL, a
non-matching rune would do both a linear search in the first four
entries, and a binary search over all the entries. Change the code to
optimize for the common case of two runes, to only do a linear search
when there are up to four entries, and to only do a binary search when
there are more than four entries.

Updates #26623

name                             old time/op    new time/op    delta
Find-12                             260ns ± 1%     275ns ± 7%   +5.84%  (p=0.000 n=8+10)
FindAllNoMatches-12                 144ns ± 9%     143ns ±12%     ~     (p=0.187 n=10+10)
FindString-12                       256ns ± 4%     254ns ± 1%     ~     (p=0.357 n=9+8)
FindSubmatch-12                     587ns ±12%     593ns ±11%     ~     (p=0.516 n=10+10)
FindStringSubmatch-12               534ns ±12%     525ns ±14%     ~     (p=0.565 n=10+10)
Literal-12                          104ns ±14%     106ns ±11%     ~     (p=0.145 n=10+10)
NotLiteral-12                      1.51µs ± 8%    1.47µs ± 2%     ~     (p=0.508 n=10+9)
MatchClass-12                      2.47µs ± 1%    2.26µs ± 6%   -8.55%  (p=0.000 n=8+10)
MatchClass_InRange-12              2.18µs ± 5%    2.25µs ±11%   +2.85%  (p=0.009 n=9+10)
ReplaceAll-12                      2.35µs ± 6%    2.08µs ±23%  -11.59%  (p=0.010 n=9+10)
AnchoredLiteralShortNonMatch-12    93.2ns ± 9%    93.2ns ±11%     ~     (p=0.716 n=10+10)
AnchoredLiteralLongNonMatch-12      118ns ±10%     117ns ± 9%     ~     (p=0.802 n=10+10)
AnchoredShortMatch-12               142ns ± 1%     141ns ± 1%   -0.53%  (p=0.007 n=8+8)
AnchoredLongMatch-12                303ns ± 9%     304ns ± 6%     ~     (p=0.724 n=10+10)
OnePassShortA-12                    620ns ± 1%     618ns ± 9%     ~     (p=0.162 n=8+10)
NotOnePassShortA-12                 599ns ± 8%     568ns ± 1%   -5.21%  (p=0.000 n=10+8)
OnePassShortB-12                    525ns ± 7%     489ns ± 1%   -6.93%  (p=0.000 n=10+8)
NotOnePassShortB-12                 449ns ± 9%     431ns ±11%   -4.05%  (p=0.033 n=10+10)
OnePassLongPrefix-12                119ns ± 6%     114ns ± 0%   -3.88%  (p=0.006 n=10+9)
OnePassLongNotPrefix-12             420ns ± 9%     410ns ± 7%     ~     (p=0.645 n=10+9)
MatchParallelShared-12              376ns ± 0%     375ns ± 0%   -0.45%  (p=0.003 n=8+10)
MatchParallelCopied-12             39.4ns ± 1%    39.1ns ± 0%   -0.55%  (p=0.004 n=10+9)
QuoteMetaAll-12                     139ns ± 7%     142ns ± 7%     ~     (p=0.445 n=10+10)
QuoteMetaNone-12                   56.7ns ± 0%    61.3ns ± 7%   +8.03%  (p=0.001 n=8+10)
Match/Easy0/32-12                  83.4ns ± 7%    83.1ns ± 8%     ~     (p=0.541 n=10+10)
Match/Easy0/1K-12                   417ns ± 8%     394ns ± 6%     ~     (p=0.059 n=10+9)
Match/Easy0/32K-12                 7.05µs ± 8%    7.30µs ± 9%     ~     (p=0.190 n=10+10)
Match/Easy0/1M-12                   291µs ±17%     284µs ±10%     ~     (p=0.481 n=10+10)
Match/Easy0/32M-12                 9.89ms ± 4%   10.27ms ± 8%     ~     (p=0.315 n=10+10)
Match/Easy0i/32-12                 1.13µs ± 1%    1.14µs ± 1%   +1.51%  (p=0.000 n=8+8)
Match/Easy0i/1K-12                 35.7µs ±11%    36.8µs ±10%     ~     (p=0.143 n=10+10)
Match/Easy0i/32K-12                1.70ms ± 7%    1.72ms ± 7%     ~     (p=0.776 n=9+6)

name                             old alloc/op   new alloc/op   delta
Find-12                             0.00B          0.00B          ~     (all equal)
FindAllNoMatches-12                 0.00B          0.00B          ~     (all equal)
FindString-12                       0.00B          0.00B          ~     (all equal)
FindSubmatch-12                     48.0B ± 0%     48.0B ± 0%     ~     (all equal)
FindStringSubmatch-12               32.0B ± 0%     32.0B ± 0%     ~     (all equal)

name                             old allocs/op  new allocs/op  delta
Find-12                              0.00           0.00          ~     (all equal)
FindAllNoMatches-12                  0.00           0.00          ~     (all equal)
FindString-12                        0.00           0.00          ~     (all equal)
FindSubmatch-12                      1.00 ± 0%      1.00 ± 0%     ~     (all equal)
FindStringSubmatch-12                1.00 ± 0%      1.00 ± 0%     ~     (all equal)

name                             old speed      new speed      delta
QuoteMetaAll-12                   101MB/s ± 8%    99MB/s ± 7%     ~     (p=0.529 n=10+10)
QuoteMetaNone-12                  458MB/s ± 0%   425MB/s ± 8%   -7.22%  (p=0.003 n=8+10)
Match/Easy0/32-12                 385MB/s ± 7%   386MB/s ± 7%     ~     (p=0.579 n=10+10)
Match/Easy0/1K-12                2.46GB/s ± 8%  2.60GB/s ± 6%     ~     (p=0.065 n=10+9)
Match/Easy0/32K-12               4.66GB/s ± 7%  4.50GB/s ±10%     ~     (p=0.190 n=10+10)
Match/Easy0/1M-12                3.63GB/s ±15%  3.70GB/s ± 9%     ~     (p=0.481 n=10+10)
Match/Easy0/32M-12               3.40GB/s ± 4%  3.28GB/s ± 8%     ~     (p=0.315 n=10+10)
Match/Easy0i/32-12               28.4MB/s ± 1%  28.0MB/s ± 1%   -1.50%  (p=0.000 n=8+8)
Match/Easy0i/1K-12               28.8MB/s ±10%  27.9MB/s ±11%     ~     (p=0.143 n=10+10)
Match/Easy0i/32K-12              19.0MB/s ±14%  19.1MB/s ± 8%     ~     (p=1.000 n=10+6)

Change-Id: I238a451b36ad84b0f5534ff0af5c077a0d52d73a
Reviewed-on: https://go-review.googlesource.com/130417
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-08-22 17:11:57 +00:00
Tim Cooper 161874da2a all: update comment URLs from HTTP to HTTPS, where possible
Each URL was manually verified to ensure it did not serve up incorrect
content.

Change-Id: I4dc846227af95a73ee9a3074d0c379ff0fa955df
Reviewed-on: https://go-review.googlesource.com/115798
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
2018-06-01 21:52:00 +00:00
Ian Lance Taylor 9967582f77 regexp/syntax: update perl script to preserve \s behavior
Incorporate https://code-review.googlesource.com/#/c/re2/+/3050/ from
the re2 repository. Description of that change:

    Preserve the original behaviour of \s.

    Prior to Perl 5.18, \s did not match vertical tab. Bake that into
    make_perl_groups.pl as an override so that perl_groups.cc retains
    its current definitions when rebuilt with newer versions of Perl.

This fixes make_perl_groups.pl to generate an unchanged perl_groups.go
with perl versions 5.18 and later.

Fixes #22057

Change-Id: I9a56e9660092ed6c1ff1045b4a3847de355441a7
Reviewed-on: https://go-review.googlesource.com/103517
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-29 22:31:57 +00:00
Brad Fitzpatrick 48db2c01b4 all: use strings.Builder instead of bytes.Buffer where appropriate
I grepped for "bytes.Buffer" and "buf.String" and mostly ignored test
files. I skipped a few on purpose and probably missed a few others,
but otherwise I think this should be most of them.

Updates #18990

Change-Id: I5a6ae4296b87b416d8da02d7bfaf981d8cc14774
Reviewed-on: https://go-review.googlesource.com/102479
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-03-26 23:05:53 +00:00
Daniel Martí 8da180f6ca all: remove some unused return parameters
As found by unparam. Picked the low-hanging fruit, consisting only of
errors that were always nil and results that were never used. Left out
those that were useful for consistency with other func signatures.

Change-Id: I06b52bbd3541f8a5d66659c909bd93cb3e172018
Reviewed-on: https://go-review.googlesource.com/102418
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-24 19:44:47 +00:00
Daniel Martí d4b2168b23 regexp/syntax: make Op an fmt.Stringer
Using stringer.

Fixes #22684.

Change-Id: I62fbde5dcb337cf269686615616bd39a27491ac1
Reviewed-on: https://go-review.googlesource.com/95355
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-02-20 15:33:50 +00:00
Joe Tsai b53088a634 Revert "go/printer: forbid empty line before first comment in block"
This reverts commit 08f19bbde1.

Reason for revert:
The changed transformation takes effect on a larger set
of code snippets than expected.

For example, this:
    func foo() {

        // Comment
        bar()

    }
becomes:
    func foo() {
        // Comment
        bar()

    }

This is an unintended consequence.

Change-Id: Ifca88d6267dab8a8170791f7205124712bf8ace8
Reviewed-on: https://go-review.googlesource.com/81335
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Joe Tsai <joetsai@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-12-01 01:12:26 +00:00
Joe Tsai 08f19bbde1 go/printer: forbid empty line before first comment in block
To improve readability when exported fields are removed,
forbid the printer from emitting an empty line before the first comment
in a const, var, or type block.
Also, when printing the "Has filtered or unexported fields." message,
add an empty line before it to separate the message from the struct
or interfact contents.

Before the change:
<<<
type NamedArg struct {

        // Name is the name of the parameter placeholder.
        //
        // If empty, the ordinal position in the argument list will be
        // used.
        //
        // Name must omit any symbol prefix.
        Name string

        // Value is the value of the parameter.
        // It may be assigned the same value types as the query
        // arguments.
        Value interface{}
        // contains filtered or unexported fields
}
>>>

After the change:
<<<
type NamedArg struct {
        // Name is the name of the parameter placeholder.
        //
        // If empty, the ordinal position in the argument list will be
        // used.
        //
        // Name must omit any symbol prefix.
        Name string

        // Value is the value of the parameter.
        // It may be assigned the same value types as the query
        // arguments.
        Value interface{}

        // contains filtered or unexported fields
}
>>>

Fixes #18264

Change-Id: I9fe17ca39cf92fcdfea55064bd2eaa784ce48c88
Reviewed-on: https://go-review.googlesource.com/71990
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2017-11-02 18:17:22 +00:00
Sylvain Zimmer 67da597312 regexp: Remove duplicated function wordRune()
Fixes #21742

Change-Id: Ib56b092c490c27a4ba7ebdb6391f1511794710b8
Reviewed-on: https://go-review.googlesource.com/61034
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
2017-09-08 23:39:37 +00:00
Daniel Martí ad74e450ca regexp/syntax: remove unused flags parameter
Found by github.com/mvdan/unparam.

Change-Id: I186d2afd067e97eb05d65c4599119b347f82867d
Reviewed-on: https://go-review.googlesource.com/37840
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-06 19:11:09 +00:00
Marcel van Lohuizen a2a4db7375 unicode: upgrade to version 9.0.0
Changes beyond generated tables:
- Now supports aliases to handle deprecated
  property classes.
- Some Mongolian letters are now modifiers.

Other changes:
- strconv: newly generated table to be in sync
- regexp/syntax: updated maxFold

Fixes #16191

Change-Id: I56bdf21ee2f775f2a82d0465b3772faf5c24cb61
Reviewed-on: https://go-review.googlesource.com/24496
Run-TryBot: Marcel van Lohuizen <mpvl@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-06-28 15:08:11 +00:00
Brad Fitzpatrick cdcb8271a4 regexp/syntax: clarify that \Z means Perl's \Z
Fixes #14793

Change-Id: I408056d096cd6a999fa5e349704b5ea8e26d2e4e
Reviewed-on: https://go-review.googlesource.com/23201
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-05-18 04:43:32 +00:00
Emmanuel Odeke 53fd522c0d all: make copyright headers consistent with one space after period
Follows suit with https://go-review.googlesource.com/#/c/20111.

Generated by running
$ grep -R 'Go Authors.  All' * | cut -d":" -f1 | while read F;do perl -pi -e 's/Go
Authors.  All/Go Authors. All/g' $F;done

The code in cmd/internal/unvendor wasn't changed.

Fixes #15213

Change-Id: I4f235cee0a62ec435f9e8540a1ec08ae03b1a75f
Reviewed-on: https://go-review.googlesource.com/21819
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-05-02 13:43:18 +00:00
Brad Fitzpatrick 5fea2ccc77 all: single space after period.
The tree's pretty inconsistent about single space vs double space
after a period in documentation. Make it consistently a single space,
per earlier decisions. This means contributors won't be confused by
misleading precedence.

This CL doesn't use go/doc to parse. It only addresses // comments.
It was generated with:

$ perl -i -npe 's,^(\s*// .+[a-z]\.)  +([A-Z]),$1 $2,' $(git grep -l -E '^\s*//(.+\.)  +([A-Z])')
$ go test go/doc -update

Change-Id: Iccdb99c37c797ef1f804a94b22ba5ee4b500c4f7
Reviewed-on: https://go-review.googlesource.com/20022
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Dave Day <djd@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-02 00:13:47 +00:00
Brad Fitzpatrick 519474451a all: make copyright headers consistent with one space after period
This is a subset of https://golang.org/cl/20022 with only the copyright
header lines, so the next CL will be smaller and more reviewable.

Go policy has been single space after periods in comments for some time.

The copyright header template at:

    https://golang.org/doc/contribute.html#copyright

also uses a single space.

Make them all consistent.

Change-Id: Icc26c6b8495c3820da6b171ca96a74701b4a01b0
Reviewed-on: https://go-review.googlesource.com/20111
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-01 23:34:33 +00:00
Nathan VanBenschoten b04f3b06ec all: replace strings.Index with strings.Contains where possible
Change-Id: Ia613f1c37bfce800ece0533a5326fca91d99a66a
Reviewed-on: https://go-review.googlesource.com/18120
Reviewed-by: Robert Griesemer <gri@golang.org>
Run-TryBot: Robert Griesemer <gri@golang.org>
2016-02-19 01:06:05 +00:00