24 Commits

Author SHA1 Message Date
Andrew Balholm 82e2272566 exp/html: detect "integration points" in SVG and MathML content
Detect HTML integration points and MathML text integration points.
At these points, process tokens as HTML, not as foreign content.

Pass 33 more tests.

R=nigeltao
CC=golang-dev
https://golang.org/cl/6249044
2012-05-24 13:46:41 +10:00
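The integration-point check described in 82e2272566 can be sketched as follows. This is a minimal illustration based on the element lists in the HTML5 spec, not the code from the CL: the `element` type, its fields, and both function names are hypothetical, and attribute keys are assumed to be lowercase.

```go
package main

import (
	"fmt"
	"strings"
)

// element is a hypothetical, simplified stand-in for a parsed node.
// Namespace is "" for HTML, "svg" for SVG, and "math" for MathML.
type element struct {
	Namespace string
	Tag       string
	Attr      map[string]string // assumed to use lowercase keys
}

// isHTMLIntegrationPoint reports whether tokens inside e should be
// processed as HTML rather than as foreign content.
func isHTMLIntegrationPoint(e element) bool {
	switch e.Namespace {
	case "math":
		if e.Tag == "annotation-xml" {
			// The encoding attribute is compared case-insensitively.
			enc := strings.ToLower(e.Attr["encoding"])
			return enc == "text/html" || enc == "application/xhtml+xml"
		}
	case "svg":
		switch e.Tag {
		case "desc", "foreignObject", "title":
			return true
		}
	}
	return false
}

// isMathMLTextIntegrationPoint reports whether e is one of the MathML
// token elements listed in the spec.
func isMathMLTextIntegrationPoint(e element) bool {
	if e.Namespace != "math" {
		return false
	}
	switch e.Tag {
	case "mi", "mo", "mn", "ms", "mtext":
		return true
	}
	return false
}

func main() {
	fo := element{Namespace: "svg", Tag: "foreignObject"}
	ax := element{Namespace: "math", Tag: "annotation-xml",
		Attr: map[string]string{"encoding": "Text/HTML"}}
	mi := element{Namespace: "math", Tag: "mi"}
	fmt.Println(isHTMLIntegrationPoint(fo), isHTMLIntegrationPoint(ax),
		isMathMLTextIntegrationPoint(mi))
}
```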
Andrew Balholm e947eba291 exp/html: update test data
Import updated test data from the WebKit Subversion repository (SVN revision 118111).

Some of the old tests were failing because we were HTML5 compliant, but the tests weren't.

R=nigeltao
CC=golang-dev
https://golang.org/cl/6228049
2012-05-24 10:35:31 +10:00
Andrew Balholm 33a89b5fda exp/html: adjust the last few insertion modes to match the spec
Handle text, comment, and doctype tokens in afterBodyIM, afterAfterBodyIM,
and afterAfterFramesetIM.

Pass three more tests.

R=nigeltao
CC=golang-dev
https://golang.org/cl/6231043
2012-05-23 11:11:34 +10:00
Andrew Balholm 8f66d7dc32 exp/html: adjust inSelectIM to match spec
Simplify the flow of control.

Handle EOF, null bytes, <html>, <input>, <keygen>, <textarea>, <script>.

Pass 5 more tests.

R=golang-dev, rsc, nigeltao
CC=golang-dev
https://golang.org/cl/6220062
2012-05-22 15:30:13 +10:00
Andrew Balholm 7648f61c7d exp/html: adjust inCellIM to match spec
Clean up flow of control.

Ignore </table>, </tbody>, </tfoot>, </thead>, </tr> if there is not
an appropriate element in table scope.

Pass 3 more tests.

R=golang-dev, nigeltao
CC=golang-dev
https://golang.org/cl/6206093
2012-05-22 10:31:08 +10:00
Andrew Balholm 4973c1fc7e exp/html: adjust inRowIM to match spec
Delete cases that just fall down to "anything else" action.

Handle </tbody>, </tfoot>, and </thead>.

R=golang-dev, nigeltao
CC=golang-dev
https://golang.org/cl/6203061
2012-05-20 14:26:20 +10:00
Andrew Balholm a09e9811dc exp/html: adjust inTableBodyIM to match spec
Clean up flow of control.

Handle </tbody>, </tfoot>, and </thead>.

Pass 5 additional tests.

R=nigeltao
CC=golang-dev
https://golang.org/cl/6117057
2012-04-26 11:48:35 +10:00
Andrew Balholm dde8358a1c exp/html: adjust inTableIM to match spec
Don't foster-parent text nodes that consist only of whitespace.
(I implemented this entirely in inTableIM instead of creating an
inTableTextIM, because the sole purpose of inTableTextIM seems to be
to combine character tokens into a string, which our tokenizer does
already.)

Use parseImpliedToken to clarify a couple of cases.

Handle <style>, <script>, <input>, and <form>.

Ignore doctype tokens.

Pass 20 additional tests.

R=nigeltao
CC=golang-dev
https://golang.org/cl/6117048
2012-04-25 10:49:27 +10:00
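The whitespace rule from dde8358a1c can be illustrated with a small sketch. It assumes, as the commit message says, that the tokenizer already delivers a run of characters as one string; `isWhitespaceOnly` and `placement` are hypothetical names, and the returned strings only label the two outcomes.

```go
package main

import (
	"fmt"
	"strings"
)

// htmlWhitespace lists the characters the HTML spec treats as whitespace:
// space, tab, line feed, form feed, and carriage return.
const htmlWhitespace = " \t\n\f\r"

// isWhitespaceOnly reports whether s contains nothing but HTML whitespace.
func isWhitespaceOnly(s string) bool {
	return strings.Trim(s, htmlWhitespace) == ""
}

// placement names what happens to a text token seen in table context:
// whitespace-only text is inserted where it is, anything else is
// foster-parented out of the table.
func placement(text string) string {
	if isWhitespaceOnly(text) {
		return "insert in place"
	}
	return "foster-parent"
}

func main() {
	fmt.Println(placement("\n  \t"))
	fmt.Println(placement(" x "))
}
```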
Andrew Balholm b885633d62 exp/html: make inBodyIM match spec
This CL corrects the remaining differences that I could find between the
implementation of inBodyIM and the spec:

Handle <rp> and <rt>.

Adjust SVG and MathML attributes.

Reconstruct active formatting elements in the "any other start tag" case.

Pass 7 additional tests.

R=nigeltao
CC=golang-dev
https://golang.org/cl/6101055
2012-04-24 15:27:48 +10:00
Andrew Balholm 0cc8ee9808 exp/html: add more cases to inBodyIM
Don't set framesetOK to false for hidden input elements.

Handle <param>, <source>, <track>, <textarea>, <iframe>, <noembed>,
and <noscript>


Pass 7 additional tests.

R=nigeltao
CC=golang-dev
https://golang.org/cl/6094045
2012-04-22 16:19:21 +10:00
Andrew Balholm 904c7c8e99 exp/html: more work on inBodyIM
Reorder some cases.
Handle <pre>, <listing>, </form>, </li>, </dd>, </dt>, </h1>, </h2>,
</h3>, </h4>, </h5>, and </h6> tags.

Pass 6 additional tests.

R=golang-dev, nigeltao
CC=golang-dev
https://golang.org/cl/6089043
2012-04-21 09:20:38 +10:00
Andrew Balholm eea5a432cb exp/html: start making inBodyIM match the spec
Reorder some start tags.

Improve handling of </body>.
Handle </html>.

Pass 2 additional tests (by handling </html>).

R=golang-dev, nigeltao
CC=golang-dev
https://golang.org/cl/6082043
2012-04-20 15:48:13 +10:00
Andrew Balholm 6791057296 exp/html: ignore null bytes in text
Pass one additional test.

R=golang-dev, nigeltao
CC=golang-dev
https://golang.org/cl/6048051
2012-04-20 14:25:42 +10:00
Andrew Balholm 7d63ff09a5 exp/html: improve afterHeadIM
Clean up the flow of control.
Fix the TODO for handling <html> tags.
Add a case to ignore doctype declarations.

Pass one additional test.

R=nigeltao
CC=golang-dev
https://golang.org/cl/6072047
2012-04-20 10:48:10 +10:00
Andrew Balholm fca32f02e9 exp/html: improve InHeadIM
Clean up the flow of control, and add a case for doctype tokens (to
ignore them).

R=nigeltao
CC=golang-dev
https://golang.org/cl/6069045
2012-04-20 09:08:58 +10:00
Andrew Balholm c88ca5906c exp/html: add parseImpliedToken method to parser
This method will allow us to be explicit about what we're doing when
we insert an implied token, and avoid repeating the logic involved in
multiple places.

R=nigeltao
CC=golang-dev
https://golang.org/cl/6060048
2012-04-19 11:48:17 +10:00
Andrew Balholm b65c9a633e exp/html: improve beforeHeadIM
Add a case to ignore doctype tokens.

Clean up the flow of control to more clearly match the spec.

Pass one more test.

R=nigeltao
CC=golang-dev
https://golang.org/cl/6062047
2012-04-18 22:45:36 +10:00
Andrew Balholm b39bbf1e5b exp/html: adjust beforeHTMLIM to match spec
Add case for doctype tokens (which are ignored).

This CL does not change the status of any tests.

R=golang-dev, nigeltao
CC=golang-dev
https://golang.org/cl/6061047
2012-04-18 13:26:35 +10:00
Andrew Balholm 9a6cef8bbf exp/html: more comprehensive tests
Currently, the html package only runs a limited subset of the tests
in the testdata directory. This tends to limit development of the
parser to fixing the bug that causes the first failing test.

This CL gives it the ability to run all the tests and produce a
log showing the status of each test. (It does so when tests are run with
'go test --update-logs'.) The status is listed as PASS, FAIL, or PARSE
(PARSE means that parsing produced the correct tree, but rendering and
re-parsing does not produce the same tree).

When 'go test' is run without --update-logs, it runs the tests marked
'PASS' in the logs (and the parsing portion of the tests marked 'PARSE').
Thus it will fail if there has been a regression since the last
time the logs were updated.

My goal for this CL is to allow development of the html package to
be less test-driven, while still having the advantages of regression
tests. In other words, one can work on any portion of the parser
and quickly see whether he is breaking things or improving them.

Current statistics of the tests:
$ grep ^PASS *.log|wc -l
        1017
$ grep ^PARSE *.log|wc -l
          46
$ grep ^FAIL *.log|wc -l
         181

R=nigeltao
CC=golang-dev
https://golang.org/cl/6031049
2012-04-17 17:17:22 +10:00
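The way 9a6cef8bbf uses the logs for regression testing can be sketched as below. The log line format (a status word followed by a space and the test text) is an assumption, and `runMode` and `modeFor` are hypothetical names rather than identifiers from the CL.

```go
package main

import (
	"fmt"
	"strings"
)

// runMode says how much of a test to run when --update-logs is not given.
type runMode int

const (
	skip           runMode = iota // FAIL or unknown status: do not run
	parseOnly                     // PARSE: check only the parsed tree
	parseAndRender                // PASS: check parsing and the render/re-parse round trip
)

// modeFor maps one log line to a run mode. The line is assumed to start
// with its status word, followed by a space and the test text.
func modeFor(logLine string) runMode {
	status := strings.SplitN(logLine, " ", 2)[0]
	switch status {
	case "PASS":
		return parseAndRender
	case "PARSE":
		return parseOnly
	}
	return skip
}

func main() {
	for _, line := range []string{`PASS "<p>a"`, `PARSE "<table>"`, `FAIL "<svg>"`} {
		fmt.Println(line, "->", modeFor(line))
	}
}
```

Because tests logged as FAIL are skipped, a plain 'go test' run only fails when a test that previously passed stops passing.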
Nigel Tao 6277656d69 html, exp/html: escape ' and " as &#39; and &#34;, since IE8 and
below do not support &apos;.

This makes package html consistent with package text/template's
HTMLEscape function.

Fixes #3489.

R=rsc, mikesamuel, dsymonds
CC=golang-dev
https://golang.org/cl/5992071
2012-04-12 09:35:43 +10:00
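The escaping behavior from 6277656d69 can be seen with the standard library's html.EscapeString. The `escape` wrapper is only there for illustration.

```go
package main

import (
	"fmt"
	"html"
)

// escape is a thin illustrative wrapper around html.EscapeString.
func escape(s string) string {
	return html.EscapeString(s)
}

func main() {
	// Quotes come out as numeric character references, not &apos; or &quot;.
	fmt.Println(escape(`<a title='x' href="y">`))
	// prints: &lt;a title=&#39;x&#39; href=&#34;y&#34;&gt;
}
```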
Robert Griesemer 7c6654aa70 all: fixed various typos
(Semi-automatically detected.)

R=golang-dev, remyoudompheng, r
CC=golang-dev
https://golang.org/cl/5715052
2012-03-01 14:56:05 -08:00
Rob Pike 5be24046c7 all: avoid bytes.NewBuffer(nil)
The practice encourages people to think this is the way to
create a bytes.Buffer when new(bytes.Buffer) or
just var buf bytes.Buffer work fine.
(html/token.go was missing the point altogether.)

R=golang-dev, bradfitz, r
CC=golang-dev
https://golang.org/cl/5637043
2012-02-06 14:09:00 +11:00
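The point of 5be24046c7 is that the zero value of bytes.Buffer is ready to use. A short sketch of the two forms the message recommends, with hypothetical function names:

```go
package main

import (
	"bytes"
	"fmt"
)

// joinWithVar uses a zero-value buffer declared with var.
func joinWithVar(words ...string) string {
	var buf bytes.Buffer // no constructor call needed
	for i, w := range words {
		if i > 0 {
			buf.WriteByte(' ')
		}
		buf.WriteString(w)
	}
	return buf.String()
}

// joinWithNew uses new(bytes.Buffer), handy when a *bytes.Buffer is wanted.
func joinWithNew(words ...string) string {
	buf := new(bytes.Buffer)
	for i, w := range words {
		if i > 0 {
			buf.WriteByte(' ')
		}
		buf.WriteString(w)
	}
	return buf.String()
}

func main() {
	fmt.Println(joinWithVar("exp", "html"))
	fmt.Println(joinWithNew("exp", "html"))
}
```

bytes.NewBuffer remains useful when the buffer should start with existing contents; passing nil adds nothing over the zero value.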
Russ Cox 2050a9e478 build: remove Make.pkg, Make.tool
Consequently, remove many package Makefiles,
and shorten the few that remain.

gomake becomes 'go tool make'.

Turn off test phases of run.bash that do not work,
flagged with $BROKEN.  Future CLs will restore these,
but this seemed like a big enough CL already.

R=golang-dev, r
CC=golang-dev
https://golang.org/cl/5601057
2012-01-30 23:43:46 -05:00
Nigel Tao 324513bc5f html: move the HTML parser to an exp/html package. The parser is a
work in progress, and we are not ready to freeze its API for Go 1.

Package html still exists, containing just two functions: EscapeString
and UnescapeString.

Both the packages at exp/html and html are "package html". The former
is a superset of the latter.

At some point in the future, the exp/html code will move back into
html, once we have finalized the parser API.

R=rsc, dsymonds
CC=golang-dev
https://golang.org/cl/5571059
2012-01-25 10:54:59 +11:00