Previously, Next would call either nextText or nextTag, but nextTag
could also call nextText. Both nextText and nextTag were responsible
for detecting "</a" end tags and "<!" comments. This change simplifies
the call chain and puts that responsibility in a single place.
R=andybalholm
CC=golang-dev
https://golang.org/cl/5263050
Previously, the tokenizer made two passes per token. The first pass
established the token boundary. The second pass picked out the tag name
and attributes inside that boundary. This was problematic when the two
passes disagreed. For example, "<p id=can't><p id=won't>" caused an
infinite loop because the first pass skipped everything inside the
single quotes, and recognized only one token, but the second pass never
got past the first '>'.
This change rewrites the tokenizer to use one pass, accumulating the
boundary points of token text, tag names, attribute keys and attribute
values as it looks for the token endpoint.
It should still be reasonably efficient: text, names, keys and values
are not lower-cased or unescaped (and converted from []byte to string)
until asked for.
One of the token_test test cases was fixed to be consistent with
html5lib. Three more test cases were temporarily disabled, and will be
re-enabled in a follow-up CL. All the parse_test test cases pass.
R=andybalholm, gri
CC=golang-dev
https://golang.org/cl/5244061
This continues the work in revision 914a659b44ff, now passing more test
cases. As before, the new tokenization tests match html5lib's behavior.
Fixes#2124.
R=dsymonds, r
CC=golang-dev
https://golang.org/cl/4867042
Change the signature of Split to have no count,
assuming a full split, and rename the existing
Split with a count to SplitN.
Do the same to package bytes.
Add a gofix module.
R=adg, dsymonds, alex.brainman, rsc
CC=golang-dev
https://golang.org/cl/4661051
I'm not sure if it's 100% correct wrt the HTML5 specification,
but the test suite has plenty of HTML comment test cases, and
we'll shake out any tokenization bugs as the parser improves its
coverage.
R=gri
CC=golang-dev
https://golang.org/cl/4186055