The content of an HTML <title> element is RCDATA, but the content of an SVG
<title> element is parsed as tags. Now the parser doesn't go into RCDATA
mode in foreign content.
Pass 4 additional tests.
R=nigeltao
CC=golang-dev
https://golang.org/cl/6448111
If the body of an HTML document contains text, the <frameset> tag is
ignored. But not if the text is only whitespace.
Pass 4 additional tests.
R=nigeltao
CC=golang-dev
https://golang.org/cl/6442043
This is part 1 of a 2 part changelist. Part 2 contains the mechanical
change to parse.go to compare atoms (ints) instead of strings.
The overall effect of the two changes are:
benchmark old ns/op new ns/op delta
BenchmarkParser 4462274 4058254 -9.05%
BenchmarkRawLevelTokenizer 913202 912917 -0.03%
BenchmarkLowLevelTokenizer 1268626 1267836 -0.06%
BenchmarkHighLevelTokenizer 1947305 1968944 +1.11%
R=rsc
CC=andybalholm, golang-dev, r
https://golang.org/cl/6305053
Implement the (3-per-family) Noah's Ark clause (i.e. don't put
more than three identical elements on the list of active formatting
elements.
Also, when running tests, sort attributes by name before dumping
them.
Pass 4 additional tests with Noah's Ark clause (including one
that needs attributes to be sorted).
Pass 5 additional, unrelated tests because of sorting attributes.
R=nigeltao, rsc
CC=golang-dev
https://golang.org/cl/6247056
Remove redundant checks for integration points.
Ignore null bytes in text.
Don't break out of foreign content for a <font> tag unless it
has a color, face, or size attribute.
Check for MathML text integration points when breaking out of
foreign content.
Pass two new tests.
R=nigeltao
CC=golang-dev
https://golang.org/cl/6256045
Detect HTML integration points and MathML text integration points.
At these points, process tokens as HTML, not as foreign content.
Pass 33 more tests.
R=nigeltao
CC=golang-dev
https://golang.org/cl/6249044
Handle text, comment, and doctype tokens in afterBodyIM, afterAfterBodyIM,
and afterAfterFramesetIM.
Pass three more tests.
R=nigeltao
CC=golang-dev
https://golang.org/cl/6231043
Clean up flow of control.
Ignore </table>, </tbody>, </tfoot>, </thead>, </tr> if there is not
an appropriate element in table scope.
Pass 3 more tests.
R=golang-dev, nigeltao
CC=golang-dev
https://golang.org/cl/6206093
Delete cases that just fall down to "anything else" action.
Handle </tbody>, </tfoot>, and </thead>.
R=golang-dev, nigeltao
CC=golang-dev
https://golang.org/cl/6203061
Don't foster-parent text nodes that consist only of whitespace.
(I implemented this entirely in inTableIM instead of creating an
inTableTextIM, because the sole purpose of inTableTextIM seems to be
to combine character tokens into a string, which our tokenizer does
already.)
Use parseImpliedToken to clarify a couple of cases.
Handle <style>, <script>, <input>, and <form>.
Ignore doctype tokens.
Pass 20 additional tests.
R=nigeltao
CC=golang-dev
https://golang.org/cl/6117048
This CL corrects the remaining differences that I could find between the
implementation of inBodyIM and the spec:
Handle <rp> and <rt>.
Adjust SVG and MathML attributes.
Reconstruct active formatting elements in the "any other start tag" case.
Pass 7 additional tests.
R=nigeltao
CC=golang-dev
https://golang.org/cl/6101055
Clean up the flow of control.
Fix the TODO for handling <html> tags.
Add a case to ignore doctype declarations.
Pass one additional test.
R=nigeltao
CC=golang-dev
https://golang.org/cl/6072047
This method will allow us to be explicit about what we're doing when
we insert an implied token, and avoid repeating the logic involved in
multiple places.
R=nigeltao
CC=golang-dev
https://golang.org/cl/6060048
Add a case to ignore doctype tokens.
Clean up the flow of control to more clearly match the spec.
Pass one more test.
R=nigeltao
CC=golang-dev
https://golang.org/cl/6062047
Add case for doctype tokens (which are ignored).
This CL does not change the status of any tests.
R=golang-dev, nigeltao
CC=golang-dev
https://golang.org/cl/6061047
work in progress, and we are not ready to freeze its API for Go 1.
Package html still exists, containing just two functions: EscapeString
and UnescapeString.
Both the packages at exp/html and html are "package html". The former
is a superset of the latter.
At some point in the future, the exp/html code will move back into
html, once we have finalized the parser API.
R=rsc, dsymonds
CC=golang-dev
https://golang.org/cl/5571059