mirror/go - go - Git Fam. Sieh

Commit Graph

Author	SHA1	Message	Date
Andrew Balholm	74db9d298b	exp/html: don't treat SVG <title> like HTML <title> The content of an HTML <title> element is RCDATA, but the content of an SVG <title> element is parsed as tags. Now the parser doesn't go into RCDATA mode in foreign content. Pass 4 additional tests. R=nigeltao CC=golang-dev https://golang.org/cl/6448111	2012-08-05 22:32:35 +10:00
Andrew Balholm	eff32f573b	exp/html: replace NUL with U+FFFD in text in foreign content Pass 5 additional tests. R=nigeltao CC=golang-dev https://golang.org/cl/6452055	2012-07-29 16:29:49 +10:00
Andrew Balholm	a1f340fa1a	exp/html: parse CDATA sections in foreign content Also convert NUL to U+FFFD in comments. Pass 23 additional tests. R=nigeltao CC=golang-dev https://golang.org/cl/6446055	2012-07-27 16:05:25 +10:00
Andrew Balholm	899be50991	exp/html: don't insert empty text nodes Pass 1 additional test. R=nigeltao CC=golang-dev https://golang.org/cl/6443048	2012-07-26 10:32:24 +10:00
Andrew Balholm	4d22519678	exp/html: allow frameset if body contains whitespace If the body of an HTML document contains text, the <frameset> tag is ignored. But not if the text is only whitespace. Pass 4 additional tests. R=nigeltao CC=golang-dev https://golang.org/cl/6442043	2012-07-25 12:09:58 +10:00
Nigel Tao	66429dcf75	exp/html: simplify some of the parser's internal methods. benchmark old ns/op new ns/op delta BenchmarkParser 4006888 3950604 -1.40% R=r, andybalholm CC=golang-dev https://golang.org/cl/6301070	2012-06-13 10:13:05 +10:00
Nigel Tao	6c204982e0	exp/html: check the context node for consistency when parsing fragments. R=rsc CC=golang-dev https://golang.org/cl/6303053	2012-06-08 13:55:15 +10:00
Nigel Tao	c8fac7b967	exp/html: when parsing, compare atoms (ints) instead of strings. This is the mechanical part of the 2-part change that started with https://golang.org/cl/6305053/ R=rsc CC=andybalholm, golang-dev, r https://golang.org/cl/6295055	2012-06-07 13:46:57 +10:00
Nigel Tao	cd21eff705	exp/html: make the tokenizer return atoms for tag tokens. This is part 1 of a 2 part changelist. Part 2 contains the mechanical change to parse.go to compare atoms (ints) instead of strings. The overall effect of the two changes are: benchmark old ns/op new ns/op delta BenchmarkParser 4462274 4058254 -9.05% BenchmarkRawLevelTokenizer 913202 912917 -0.03% BenchmarkLowLevelTokenizer 1268626 1267836 -0.06% BenchmarkHighLevelTokenizer 1947305 1968944 +1.11% R=rsc CC=andybalholm, golang-dev, r https://golang.org/cl/6305053	2012-06-07 13:05:35 +10:00
Andrew Balholm	9c14184e25	exp/html: implement Noah's Ark clause Implement the (3-per-family) Noah's Ark clause (i.e. don't put more than three identical elements on the list of active formatting elements). Also, when running tests, sort attributes by name before dumping them. Pass 4 additional tests with Noah's Ark clause (including one that needs attributes to be sorted). Pass 5 additional, unrelated tests because of sorting attributes. R=nigeltao, rsc CC=golang-dev https://golang.org/cl/6247056	2012-05-29 13:39:54 +10:00
Andrew Balholm	c23041efd9	exp/html: adjust parseForeignContent to match spec Remove redundant checks for integration points. Ignore null bytes in text. Don't break out of foreign content for a <font> tag unless it has a color, face, or size attribute. Check for MathML text integration points when breaking out of foreign content. Pass two new tests. R=nigeltao CC=golang-dev https://golang.org/cl/6256045	2012-05-25 10:03:59 +10:00
Andrew Balholm	82e2272566	exp/html: detect "integration points" in SVG and MathML content Detect HTML integration points and MathML text integration points. At these points, process tokens as HTML, not as foreign content. Pass 33 more tests. R=nigeltao CC=golang-dev https://golang.org/cl/6249044	2012-05-24 13:46:41 +10:00
Andrew Balholm	33a89b5fda	exp/html: adjust the last few insertion modes to match the spec Handle text, comment, and doctype tokens in afterBodyIM, afterAfterBodyIM, and afterAfterFramesetIM. Pass three more tests. R=nigeltao CC=golang-dev https://golang.org/cl/6231043	2012-05-23 11:11:34 +10:00
Andrew Balholm	8f66d7dc32	exp/html: adjust inSelectIM to match spec Simplify the flow of control. Handle EOF, null bytes, <html>, <input>, <keygen>, <textarea>, <script>. Pass 5 more tests. R=golang-dev, rsc, nigeltao CC=golang-dev https://golang.org/cl/6220062	2012-05-22 15:30:13 +10:00
Andrew Balholm	7648f61c7d	exp/html: adjust inCellIM to match spec Clean up flow of control. Ignore </table>, </tbody>, </tfoot>, </thead>, </tr> if there is not an appropriate element in table scope. Pass 3 more tests. R=golang-dev, nigeltao CC=golang-dev https://golang.org/cl/6206093	2012-05-22 10:31:08 +10:00
Andrew Balholm	4973c1fc7e	exp/html: adjust inRowIM to match spec Delete cases that just fall down to "anything else" action. Handle </tbody>, </tfoot>, and </thead>. R=golang-dev, nigeltao CC=golang-dev https://golang.org/cl/6203061	2012-05-20 14:26:20 +10:00
Andrew Balholm	a09e9811dc	exp/html: adjust inTableBodyIM to match spec Clean up flow of control. Handle </tbody>, </tfoot>, and </thead>. Pass 5 additional tests. R=nigeltao CC=golang-dev https://golang.org/cl/6117057	2012-04-26 11:48:35 +10:00
Andrew Balholm	dde8358a1c	exp/html: adjust inTableIM to match spec Don't foster-parent text nodes that consist only of whitespace. (I implemented this entirely in inTableIM instead of creating an inTableTextIM, because the sole purpose of inTableTextIM seems to be to combine character tokens into a string, which our tokenizer does already.) Use parseImpliedToken to clarify a couple of cases. Handle <style>, <script>, <input>, and <form>. Ignore doctype tokens. Pass 20 additional tests. R=nigeltao CC=golang-dev https://golang.org/cl/6117048	2012-04-25 10:49:27 +10:00
Andrew Balholm	b885633d62	exp/html: make inBodyIM match spec This CL corrects the remaining differences that I could find between the implementation of inBodyIM and the spec: Handle <rp> and <rt>. Adjust SVG and MathML attributes. Reconstruct active formatting elements in the "any other start tag" case. Pass 7 additional tests. R=nigeltao CC=golang-dev https://golang.org/cl/6101055	2012-04-24 15:27:48 +10:00
Andrew Balholm	0cc8ee9808	exp/html: add more cases to inBodyIM Don't set framesetOK to false for hidden input elements. Handle <param>, <source>, <track>, <textarea>, <iframe>, <noembed>, and <noscript> Pass 7 additional tests. R=nigeltao CC=golang-dev https://golang.org/cl/6094045	2012-04-22 16:19:21 +10:00
Andrew Balholm	904c7c8e99	exp/html: more work on inBodyIM Reorder some cases. Handle <pre>, <listing>, </form>, </li>, </dd>, </dt>, </h1>, </h2>, </h3>, </h4>, </h5>, and </h6> tags. Pass 6 additional tests. R=golang-dev, nigeltao CC=golang-dev https://golang.org/cl/6089043	2012-04-21 09:20:38 +10:00
Andrew Balholm	eea5a432cb	exp/html: start making inBodyIM match the spec Reorder some start tags. Improve handling of </body>. Handle </html>. Pass 2 additional tests (by handling </html>). R=golang-dev, nigeltao CC=golang-dev https://golang.org/cl/6082043	2012-04-20 15:48:13 +10:00
Andrew Balholm	6791057296	exp/html: ignore null bytes in text pass one additional test R=golang-dev, nigeltao CC=golang-dev https://golang.org/cl/6048051	2012-04-20 14:25:42 +10:00
Andrew Balholm	7d63ff09a5	exp/html: improve afterHeadIM Clean up the flow of control. Fix the TODO for handling <html> tags. Add a case to ignore doctype declarations. Pass one additional test. R=nigeltao CC=golang-dev https://golang.org/cl/6072047	2012-04-20 10:48:10 +10:00
Andrew Balholm	fca32f02e9	exp/html: improve InHeadIM Clean up the flow of control, and add a case for doctype tokens (to ignore them). R=nigeltao CC=golang-dev https://golang.org/cl/6069045	2012-04-20 09:08:58 +10:00
Andrew Balholm	c88ca5906c	exp/html: add parseImpliedToken method to parser This method will allow us to be explicit about what we're doing when we insert an implied token, and avoid repeating the logic involved in multiple places. R=nigeltao CC=golang-dev https://golang.org/cl/6060048	2012-04-19 11:48:17 +10:00
Andrew Balholm	b65c9a633e	exp/html: improve beforeHeadIM Add a case to ignore doctype tokens. Clean up the flow of control to more clearly match the spec. Pass one more test. R=nigeltao CC=golang-dev https://golang.org/cl/6062047	2012-04-18 22:45:36 +10:00
Andrew Balholm	b39bbf1e5b	exp/html: adjust beforeHTMLIM to match spec Add case for doctype tokens (which are ignored). This CL does not change the status of any tests. R=golang-dev, nigeltao CC=golang-dev https://golang.org/cl/6061047	2012-04-18 13:26:35 +10:00
Nigel Tao	324513bc5f	html: move the HTML parser to an exp/html package. The parser is a work in progress, and we are not ready to freeze its API for Go 1. Package html still exists, containing just two functions: EscapeString and UnescapeString. Both the packages at exp/html and html are "package html". The former is a superset of the latter. At some point in the future, the exp/html code will move back into html, once we have finalized the parser API. R=rsc, dsymonds CC=golang-dev https://golang.org/cl/5571059	2012-01-25 10:54:59 +11:00

29 Commits
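
Commit 9c14184e25 above describes the Noah's Ark clause: no more than three identical elements may sit on the list of active formatting elements. The following is a minimal sketch of that rule, not the code from exp/html. The `element` type, the `sameAttrs` helper, and the `pushFormatting` function are hypothetical names introduced for illustration, and "identical" is assumed to mean the same tag name and the same attributes, counted back to the most recent marker.

```go
package main

import "fmt"

// element is a hypothetical stand-in for an entry on the list of
// active formatting elements; marker entries act as scope boundaries.
type element struct {
	tag    string
	attrs  map[string]string
	marker bool
}

// sameAttrs reports whether two attribute sets hold the same
// name/value pairs, regardless of order.
func sameAttrs(a, b map[string]string) bool {
	if len(a) != len(b) {
		return false
	}
	for k, v := range a {
		if w, ok := b[k]; !ok || w != v {
			return false
		}
	}
	return true
}

// pushFormatting appends e to the list. If three identical elements
// already follow the last marker, the earliest of them is removed first.
func pushFormatting(list []element, e element) []element {
	count, earliest := 0, -1
	for i := len(list) - 1; i >= 0; i-- {
		if list[i].marker {
			break // do not look past the most recent marker
		}
		if list[i].tag == e.tag && sameAttrs(list[i].attrs, e.attrs) {
			count++
			earliest = i // scanning backwards, so this ends at the earliest match
		}
	}
	if count >= 3 {
		list = append(list[:earliest], list[earliest+1:]...)
	}
	return append(list, e)
}

func main() {
	var list []element
	for i := 0; i < 4; i++ {
		list = pushFormatting(list, element{tag: "b"})
	}
	fmt.Println(len(list)) // the fourth identical <b> evicts the earliest one
}
```

A marker entry resets the count in this sketch, so three `<b>` elements before a marker do not stop three more from being pushed after it.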