33 Commits

Author SHA1 Message Date
Nigel Tao bca65e395e html: parse more malformed tags.
This continues the work in revision 914a659b44ff, now passing more test
cases. As before, the new tokenization tests match html5lib's behavior.

Fixes #2124.

R=dsymonds, r
CC=golang-dev
https://golang.org/cl/4867042
2011-08-11 18:49:09 +10:00
Nigel Tao 37afff2978 html: parse malformed tags missing a '>', such as `<p id=0</p>`.
The additional token_test.go cases match html5lib behavior.

Fixes #2124.

R=gri
CC=golang-dev
https://golang.org/cl/4844055
2011-08-10 13:39:07 +10:00
Nigel Tao 1d0c141d7d html: parse doctype tokens; merge adjacent text nodes.
The test case input is "<!DOCTYPE html><span><button>foo</span>bar".
The correct parse is:
| <!DOCTYPE html>
| <html>
|   <head>
|   <body>
|     <span>
|       <button>
|         "foobar"

R=gri
CC=golang-dev
https://golang.org/cl/4794063
2011-08-01 10:26:46 +10:00
Nigel Tao 5f134f9b5b html: sync html/testdata/webkit with upstream WebKit.
As $GOROOT/src/pkg/html/testdata/webkit/README says, we're pulling from
$WEBKITROOT/LayoutTests/html5lib/resources.

R=r
CC=golang-dev
https://golang.org/cl/4810043
2011-07-21 12:50:45 +10:00
Nigel Tao 5a141064ed html: parse misnested formatting tags according to the HTML5 spec.
This is the "adoption agency" algorithm.

The test case input is "<a><p>X<a>Y</a>Z</p></a>". The correct parse is:
| <html>
|   <head>
|   <body>
|     <a>
|     <p>
|       <a>
|         "X"
|       <a>
|         "Y"
|       "Z"

R=gri
CC=golang-dev
https://golang.org/cl/4771042
2011-07-21 11:20:54 +10:00
Andrew Balholm 816c972ff0 html: handle character entities without semicolons
Fix the TODO: unescape("&notit;") should be "¬it;"

Also accept digits in entity names.

R=nigeltao
CC=golang-dev, rsc
https://golang.org/cl/4781042
2011-07-21 09:10:49 +10:00
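
The entity handling described in 816c972ff0 can be illustrated with a minimal sketch. It assumes the present-day standard library package html, whose UnescapeString is taken here to behave the way the commit message describes; the helper name unescape is hypothetical and not part of the commit.

```go
package main

import (
	"fmt"
	"html"
)

// unescape is a hypothetical wrapper around html.UnescapeString,
// used only to keep the example calls short.
func unescape(s string) string {
	return html.UnescapeString(s)
}

func main() {
	// "&not" is a known entity that may appear without a semicolon,
	// so the longest matching prefix is replaced and "it;" is kept.
	fmt.Println(unescape("&notit;"))

	// Entity names may contain digits.
	fmt.Println(unescape("&frac12;"))

	// A fully terminated entity is replaced as a whole.
	fmt.Println(unescape("a &amp; b"))
}
```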
Nigel Tao d360e0213d html: update section references in comments to the latest HTML5 spec.
R=r
CC=golang-dev
https://golang.org/cl/4699048
2011-07-13 16:53:02 +10:00
Yasuhiro Matsumoto 1e6d946594 html: parse start tags that aren't explicitly otherwise dealt with.
R=golang-dev, nigeltao
CC=golang-dev
https://golang.org/cl/4626080
2011-07-06 13:08:52 +10:00
Yasuhiro Matsumoto 054cf72b56 html: fix nesting when parsing a close tag.
R=nigeltao
CC=golang-dev
https://golang.org/cl/4636067
2011-06-30 23:16:33 +10:00
Rob Pike ebb1566a46 strings.Split: make the default to split all.
Change the signature of Split to have no count,
assuming a full split, and rename the existing
Split with a count to SplitN.
Do the same to package bytes.
Add a gofix module.

R=adg, dsymonds, alex.brainman, rsc
CC=golang-dev
https://golang.org/cl/4661051
2011-06-28 09:43:14 +10:00
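
The Split/SplitN change in ebb1566a46 can be sketched as follows, assuming the present-day strings package. The helper names splitAll and splitAtMost are hypothetical and exist only for this illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// splitAll returns every comma-separated field of s.
// Split takes no count and always performs a full split.
func splitAll(s string) []string {
	return strings.Split(s, ",")
}

// splitAtMost returns at most n fields; the last field holds the
// unsplit remainder. This is the behavior the old Split with a count had.
func splitAtMost(s string, n int) []string {
	return strings.SplitN(s, ",", n)
}

func main() {
	fmt.Printf("%q\n", splitAll("a,b,c,d"))       // four fields
	fmt.Printf("%q\n", splitAtMost("a,b,c,d", 2)) // "a" and "b,c,d"
}
```

The bytes package offers the same pair, bytes.Split and bytes.SplitN, operating on []byte instead of string.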
Brad Fitzpatrick 5e03143c1a html: improve attribute parsing, note package status
Fixes #1890

R=nigeltao
CC=golang-dev
https://golang.org/cl/4528102
2011-06-06 15:56:15 -07:00
Robert Hencke c8727c81bb pkg: spelling tweaks, A-H
R=ality, bradfitz, rsc, dsymonds, adg, qyzhai, dchest
CC=golang-dev
https://golang.org/cl/4536063
2011-05-18 13:14:56 -04:00
Brad Fitzpatrick f4e5f364c7 html: parse empty, unquoted, and single-quoted attribute values
Fixes #1391

R=nigeltao
CC=golang-dev
https://golang.org/cl/4453054
2011-05-12 16:11:35 -07:00
Brad Fitzpatrick 9d12307a12 ioutil: add Discard, update tree.
This also removes an unnecessary allocation in
http/transfer.go

R=r, rsc1, r2, adg
CC=golang-dev
https://golang.org/cl/4426066
2011-04-27 15:47:04 -07:00
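
A minimal sketch of what Discard from 9d12307a12 is for, assuming the present-day io and io/ioutil packages (recent Go releases also expose the same writer as io.Discard). The helper name drain is hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"io/ioutil"
	"strings"
)

// drain reads s to the end and throws the bytes away, returning how
// many bytes were consumed. No destination buffer needs to be set up.
func drain(s string) (int64, error) {
	return io.Copy(ioutil.Discard, strings.NewReader(s))
}

func main() {
	n, err := drain("some body we do not care about")
	fmt.Println(n, err)
}
```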
Nigel Tao 6a186d38d1 src/pkg: make package doc comments consistently start with "Package foo".
R=rsc
CC=golang-dev
https://golang.org/cl/4442064
2011-04-20 09:57:05 +10:00
Rob Pike 8a90fd3c72 os: New Open API.
We replace the current Open with:
OpenFile(name, flag, perm) // same as old Open
Open(name) // same as old Open(name, O_RDONLY, 0)
Create(name) // same as old Open(name, O_RDWR|O_TRUNC|O_CREAT, 0666)

This CL includes a gofix module and full code updates: all.bash passes.
(There may be a few comments I missed.)

The interesting packages are:
        gofix
        os
Everything else is automatically generated except for hand tweaks to:
        src/pkg/io/ioutil/ioutil.go
        src/pkg/io/ioutil/tempfile.go
        src/pkg/crypto/tls/generate_cert.go
        src/cmd/goyacc/goyacc.go
        src/cmd/goyacc/units.y

R=golang-dev, bradfitzwork, rsc, r2
CC=golang-dev
https://golang.org/cl/4357052
2011-04-04 23:42:14 -07:00
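
The three entry points introduced by 8a90fd3c72 can be sketched as follows. This assumes the present-day os package, where the creation flag is spelled os.O_CREATE rather than the O_CREAT shown in the commit message; the function name demo and the file name example.txt are hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// demo exercises Create, OpenFile, and Open on a scratch file and
// returns the file's final contents.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "open-api-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	name := filepath.Join(dir, "example.txt") // hypothetical file name

	// Create: shorthand for creating or truncating a file for writing.
	w, err := os.Create(name)
	if err != nil {
		return "", err
	}
	if _, err := w.WriteString("hello"); err != nil {
		w.Close()
		return "", err
	}
	if err := w.Close(); err != nil {
		return "", err
	}

	// OpenFile: the general form with explicit flags and permissions.
	a, err := os.OpenFile(name, os.O_WRONLY|os.O_APPEND, 0644)
	if err != nil {
		return "", err
	}
	if _, err := a.WriteString(" world"); err != nil {
		a.Close()
		return "", err
	}
	if err := a.Close(); err != nil {
		return "", err
	}

	// Open: shorthand for read-only access.
	r, err := os.Open(name)
	if err != nil {
		return "", err
	}
	defer r.Close()
	data, err := io.ReadAll(r)
	if err != nil {
		return "", err
	}
	return string(data), nil
}

func main() {
	s, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s)
}
```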
Nigel Tao 42ed1ad4a6 html: small documentation fix.
R=rsc
CC=golang-dev
https://golang.org/cl/4169058
2011-02-18 10:35:49 +11:00
Nigel Tao a5ff8ad9db html: tokenize HTML comments.
I'm not sure if it's 100% correct wrt the HTML5 specification,
but the test suite has plenty of HTML comment test cases, and
we'll shake out any tokenization bugs as the parser improves its
coverage.

R=gri
CC=golang-dev
https://golang.org/cl/4186055
2011-02-17 10:45:30 +11:00
Nigel Tao fec6ab9726 html: parse "<h1>foo<h2>bar".
R=gri
CC=golang-dev
https://golang.org/cl/3571043
2010-12-15 11:39:56 +11:00
Nigel Tao 71bd053ada html: parse <table><tr><td> tags.
Also, shorten fooInsertionMode to fooIM.

R=gri
CC=golang-dev
https://golang.org/cl/3504042
2010-12-10 12:20:14 +11:00
Nigel Tao 49014c5b12 html: handle unexpected EOF during parsing.
This lets us parse HTML like "<html>foo".

R=gri
CC=golang-dev
https://golang.org/cl/3460043
2010-12-08 08:59:20 +11:00
Nigel Tao 688a83128d html: move the sanity checking of the entity map from runtime
(during init) to test-time (via gotest).

R=gri
CC=golang-dev
https://golang.org/cl/3466044
2010-12-08 07:55:03 +11:00
Ryan Hitchman f503e26379 html: unescape numeric entities, and complete the named entities table, including two-character entities.
Fixes #1233.

R=nigeltao
CC=golang-dev
https://golang.org/cl/3445041
2010-12-07 12:13:47 +11:00
Nigel Tao 08a47d6f60 html: first cut at a parser.
R=gri
CC=golang-dev
https://golang.org/cl/3355041
2010-12-07 12:02:36 +11:00
Adam Langley 3cb4bdb9ce utf8: make EncodeRune's destination the first argument.
R=r
CC=golang-dev
https://golang.org/cl/3364041
2010-11-30 16:59:43 -05:00
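
The argument order adopted in 3cb4bdb9ce can be sketched as follows, assuming the present-day unicode/utf8 package, where EncodeRune takes the destination slice first and the rune second. The helper name encode is hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encode returns the UTF-8 encoding of r as a fresh byte slice.
func encode(r rune) []byte {
	buf := make([]byte, utf8.UTFMax) // large enough for any rune
	n := utf8.EncodeRune(buf, r)     // destination first, then the rune
	return buf[:n]
}

func main() {
	fmt.Printf("% x\n", encode('A'))
	fmt.Printf("% x\n", encode('世'))
}
```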
Russ Cox 69c4e9380b use append
R=gri, r, r2
CC=golang-dev
https://golang.org/cl/2743042
2010-10-27 19:47:23 -07:00
Robert Griesemer 3478891d12 gofmt -s -w src misc
R=r, rsc
CC=golang-dev
https://golang.org/cl/2662041
2010-10-22 10:06:33 -07:00
Russ Cox 7c9f0f0109 html: disable print
Everything is incomplete.
Let's not make noise like this a habit.

R=nigeltao_gnome
CC=golang-dev
https://golang.org/cl/2272041
2010-09-23 22:05:42 -04:00
Russ Cox da392d9136 build: no required environment variables
R=adg, r, PeterGo
CC=golang-dev
https://golang.org/cl/1942044
2010-08-18 10:08:49 -04:00
Kyle Consalus 8fcdc6a1e2 Small performance improvements to the HTML tokenizer based on your 'TODO's.
R=nigeltao_golang
CC=golang-dev
https://golang.org/cl/1941042
2010-08-12 09:45:34 +10:00
Nigel Tao 56b989f1b9 First cut of an HTML tokenizer (and eventually a parser).
R=r, rsc, gri, rsc1
CC=golang-dev
https://golang.org/cl/1814044
2010-08-10 16:08:21 +10:00
Nigel Tao 43b3a247d3 html: sync testdata/webkit to match WebKit tip.
R=rsc
CC=golang-dev
https://golang.org/cl/1701041
2010-06-15 09:07:47 +10:00
Nigel Tao 64784801cd HTML5 parser test data from WebKit.
R=rsc
CC=golang-dev
https://golang.org/cl/1559041
2010-06-04 17:47:22 -07:00