Commit Graph

46 Commits

Author SHA1 Message Date
Andrew Balholm 750de28d6c html: ignore whitespace before <head> element
Pass tests2.dat, test 47:
" \n "
(That is, two spaces separated by a newline)

| <html>
|   <head>
|   <body>

Also pass tests through test 49:
<!DOCTYPE html><script>
</script>  <title>x</title>  </head>

R=nigeltao
CC=golang-dev
https://golang.org/cl/5422043
2011-11-22 09:27:27 +11:00
Andrew Balholm a1dbfa6f09 html: parse <isindex>
Pass tests2.dat, test 42:
<isindex test=x name=x>

| <html>
|   <head>
|   <body>
|     <form>
|       <hr>
|       <label>
|         "This is a searchable index. Enter search keywords: "
|         <input>
|           name="isindex"
|           test="x"
|       <hr>

R=nigeltao
CC=golang-dev
https://golang.org/cl/5399049
2011-11-17 13:12:13 +11:00
Andrew Balholm 3276afd4d4 html: parse </optgroup> and </option>
Pass tests2.dat, test 35:
<!DOCTYPE html><select><optgroup><option></optgroup><option><select><option>

| <!DOCTYPE html>
| <html>
|   <head>
|   <body>
|     <select>
|       <optgroup>
|         <option>
|       <option>
|     <option>

Also pass tests through test 41:
<!DOCTYPE html><!-- XXX - XXX - XXX -->

R=nigeltao, rsc
CC=golang-dev
https://golang.org/cl/5395045
2011-11-17 10:25:33 +11:00
Andrew Balholm 3307597069 html: parse <optgroup> tags
Pass tests2.dat, test 34:
<!DOCTYPE html><select><option><optgroup>

| <!DOCTYPE html>
| <html>
|   <head>
|   <body>
|     <select>
|       <option>
|       <optgroup>

R=nigeltao
CC=golang-dev
https://golang.org/cl/5393045
2011-11-16 19:25:55 +11:00
Andrew Balholm 28546ed56a html: parse <caption> elements
Pass tests2.dat, test 33:
<!DOCTYPE html><table><caption>test TEST</caption><td>test

| <!DOCTYPE html>
| <html>
|   <head>
|   <body>
|     <table>
|       <caption>
|         "test TEST"
|       <tbody>
|         <tr>
|           <td>
|             "test"

R=nigeltao
CC=golang-dev
https://golang.org/cl/5371099
2011-11-16 12:18:11 +11:00
Andrew Balholm b91d82258f html: auto-close <p> elements when starting <form> element.
Pass tests2.dat, test 26:
<!doctypehtml><p><form>

| <!DOCTYPE html>
| <html>
|   <head>
|   <body>
|     <p>
|     <form>

Also pass tests through test 32:
<!DOCTYPE html><!-- X

R=nigeltao
CC=golang-dev
https://golang.org/cl/5369114
2011-11-15 15:31:22 +11:00
Andrew Balholm 3bd5082f57 html: parse and render <plaintext> elements
Pass tests2.dat, test 10:
<table><plaintext><td>

| <html>
|   <head>
|   <body>
|     <plaintext>
|       "<td>"
|     <table>

Also pass tests through test 25:
<!doctypehtml><p><dd>

R=nigeltao
CC=golang-dev
https://golang.org/cl/5369109
2011-11-15 11:39:18 +11:00
Andrew Balholm 06ef97e15d html: auto-close <dd> and <dt> elements
Pass tests2.dat, test 8:
<!DOCTYPE html><dt><div><dd>

| <!DOCTYPE html>
| <html>
|   <head>
|   <body>
|     <dt>
|       <div>
|     <dd>

Also pass tests through test 9:
<script></x

R=nigeltao
CC=golang-dev
https://golang.org/cl/5373083
2011-11-13 23:27:20 +11:00
Andrew Balholm 631a575fd9 html: store the current insertion mode in the parser
Currently, the state transition functions in the HTML parser
return the next insertion mode and whether the token is consumed.
This works well except for when one insertion mode needs to use
the rules for another insertion mode. Then the useTheRulesFor
function needs to patch things up. This requires comparing functions
for equality, which is going to stop working.

Adding a field to the parser structure to store the current
insertion mode eliminates the need for useTheRulesFor;
one insertion mode function can now just call the other
directly. The insertion mode will be changed only if it needs to be.

This CL is an alternative to CL 5372078.

R=nigeltao, rsc
CC=golang-dev
https://golang.org/cl/5372079
2011-11-13 12:39:41 +11:00
Andrew Balholm 3df0512469 html: handle end tags in strange places
Pass tests1.dat, test 111:
</strong></b></em></i></u></strike></s></blink></tt></pre></big></small></font></select></h1></h2></h3></h4></h5></h6></body></br></a></img></title></span></style></script></table></th></td></tr></frame></area></link></param></hr></input></col></base></meta></basefont></bgsound></embed></spacer></p></dd></dt></caption></colgroup></tbody></tfoot></thead></address></blockquote></center></dir></div></dl></fieldset></listing></menu></ol></ul></li></nobr></wbr></form></button></marquee></object></html></frameset></head></iframe></image></isindex></noembed></noframes></noscript></optgroup></option></plaintext></textarea>

| <html>
|   <head>
|   <body>
|     <br>
|     <p>

Also pass all the remaining tests in tests1.dat.

R=nigeltao
CC=golang-dev
https://golang.org/cl/5372066
2011-11-12 12:23:30 +11:00
Andrew Balholm 0a61c846ef html: ignore <col> tag outside tables
Pass tests1.dat, test 109:
<table><col><tbody><col><tr><col><td><col></table><col>

| <html>
|   <head>
|   <body>
|     <table>
|       <colgroup>
|         <col>
|       <tbody>
|       <colgroup>
|         <col>
|       <tbody>
|         <tr>
|       <colgroup>
|         <col>
|       <tbody>
|         <tr>
|           <td>
|       <colgroup>
|         <col>

Also pass test 110:
<table><colgroup><tbody><colgroup><tr><colgroup><td><colgroup></table><colgroup>

R=nigeltao
CC=golang-dev
https://golang.org/cl/5369069
2011-11-11 21:44:01 +11:00
Andrew Balholm 83f61a27d6 html: parse column groups
Pass tests1.dat, test 108:
<table><colgroup><col><colgroup><col><col><col><colgroup><col><col><thead><tr><td></table>

| <html>
|   <head>
|   <body>
|     <table>
|       <colgroup>
|         <col>
|       <colgroup>
|         <col>
|         <col>
|         <col>
|       <colgroup>
|         <col>
|         <col>
|       <thead>
|         <tr>
|           <td>

R=nigeltao
CC=golang-dev
https://golang.org/cl/5369061
2011-11-11 11:41:46 +11:00
Andrew Balholm e9e874b7fc html: parse framesets
Pass tests1.dat, test 106:
<frameset><frame><frameset><frame></frameset><noframes></noframes></frameset>

| <html>
|   <head>
|   <frameset>
|     <frame>
|     <frameset>
|       <frame>
|     <noframes>

Also pass test 107:
<h1><table><td><h3></table><h3></h1>

R=nigeltao
CC=golang-dev
https://golang.org/cl/5373050
2011-11-10 23:56:13 +11:00
Andrew Balholm 820523d091 html: correctly parse </html> in <head> element.
Pass tests1.dat, test 92:
<head></html><meta><p>

| <html>
|   <head>
|   <body>
|     <meta>
|     <p>

Also pass tests through test 98:
<p><b><div><marquee></p></b></div>

R=nigeltao
CC=golang-dev
https://golang.org/cl/5359054
2011-11-09 19:18:26 +11:00
Andrew Balholm ce4eec2e0a html: treat <image> as <img>
Pass tests1.dat, test 90:
<p><image></p>

| <html>
|   <head>
|   <body>
|     <p>
|       <img>

Also pass test 91:
<a><table><a></table><p><a><div><a>

R=nigeltao
CC=golang-dev
https://golang.org/cl/5339052
2011-11-09 09:43:55 +11:00
Andrew Balholm f2b602ed42 html: parse <body>, <base>, <link>, <meta>, and <title> tags inside page body
Pass tests1.dat, test 87:
<body><body><base><link><meta><title><p></title><body><p></body>

| <html>
|   <head>
|   <body>
|     <base>
|     <link>
|     <meta>
|     <title>
|       "<p>"
|     <p>

Handling the last <body> tag requires correcting the original insertion mode in useTheRulesFor.

Also pass test 88:
<textarea><p></textarea>

R=nigeltao
CC=golang-dev
https://golang.org/cl/5364047
2011-11-08 17:55:17 +11:00
Gustavo Niemeyer f2dc50b48d html,bzip2,sql: rename Error methods that return error to Err
There are three classes of methods/functions called Error:

a) The Error method in the just introduced error interface
b) Error methods that create or report errors (http.Error, etc)
c) Error methods that return errors previously associated with
   the receiver (Tokenizer.Error, rows.Error, etc).

This CL introduces the convention that methods in case (c)
should be named Err.

The reasoning for the change is:

- The change differentiates the two kinds of APIs based on
  names rather than just on signature, unloading Error a bit
- Err is closer to the err variable name that is so commonly
  used with the intent of verifying an error
- Err is shorter and thus more convenient to be used often
  on error verifications, such as in iterators following the
  convention of the sql package.

R=bradfitz, rsc
CC=golang-dev
https://golang.org/cl/5327064
2011-11-04 09:50:20 -04:00
Andrew Balholm 632a2c59b1 html: properly close <tr> element when an new <tr> starts.
Pass tests1.dat, test 87:
<table><tr><tr><td><td><span><th><span>X</table>

| <html>
|   <head>
|   <body>
|     <table>
|       <tbody>
|         <tr>
|         <tr>
|           <td>
|           <td>
|             <span>
|           <th>
|             <span>
|               "X"

R=nigeltao
CC=golang-dev
https://golang.org/cl/5343041
2011-11-04 15:48:11 +11:00
Andrew Balholm 46308d7d11 html: move <link> element from after <head> into <head>
Pass tests1.dat, test 85:
<head><meta></head><link>

| <html>
|   <head>
|     <meta>
|     <link>
|   <body>

R=nigeltao
CC=golang-dev
https://golang.org/cl/5297079
2011-11-04 09:29:06 +11:00
Andrew Balholm 77aabbf217 html: parse <link> elements in <head>
Pass tests1.dat, test 83:
<title><meta></title><link><title><meta></title>

| <html>
|   <head>
|     <title>
|       "<meta>"
|     <link>
|     <title>
|       "<meta>"
|   <body>

Also pass test 84:
<style><!--</style><meta><script>--><link></script>

R=nigeltao
CC=golang-dev
https://golang.org/cl/5331061
2011-11-03 17:12:13 +11:00
Andrew Balholm cf6a712162 html: properly close <marquee> elements.
Pass tests1.dat, test 80:
<a href=a>aa<marquee>aa<a href=b>bb</marquee>aa

| <html>
|   <head>
|   <body>
|     <a>
|       href="a"
|       "aa"
|       <marquee>
|         "aa"
|         <a>
|           href="b"
|           "bb"
|       "aa"

Also pass tests through test 82:
<!DOCTYPE html><spacer>foo

R=nigeltao
CC=golang-dev
https://golang.org/cl/5319071
2011-11-03 10:11:06 +11:00
Russ Cox c2049d2dfe src/pkg/[a-m]*: gofix -r error -force=error
R=golang-dev, iant
CC=golang-dev
https://golang.org/cl/5322051
2011-11-01 22:04:37 -04:00
Andrew Balholm 22ee5ae25a html: stop at scope marker node when generating implied </a> tags
A <a> tag generates implied end tags for any open <a> elements.
But it shouldn't do that when it is inside a table cell the the open <a>
is outside the table.
So stop the search for an open <a> when we reach a scope marker node.

Pass tests1.dat, test 78:
<a href="blah">aba<table><tr><td><a href="foo">br</td></tr>x</table>aoe

| <html>
|   <head>
|   <body>
|     <a>
|       href="blah"
|       "abax"
|       <table>
|         <tbody>
|           <tr>
|             <td>
|               <a>
|                 href="foo"
|                 "br"
|       "aoe"

Also pass test 79:
<table><a href="blah">aba<tr><td><a href="foo">br</td></tr>x</table>aoe

R=nigeltao
CC=golang-dev
https://golang.org/cl/5320063
2011-11-02 11:47:05 +11:00
Andrew Balholm 9db3f78c39 html: process </td> tags; foster parent at most one node per token
Correctly close table cell when </td> is read.

Because of reconstructing the active formatting elements, more than one
node may be created when reading a single token.
If both nodes are foster parented, they will be siblings, but the first
node should be the parent of the second.

Pass tests1.dat, test 77:
<a href="blah">aba<table><a href="foo">br<tr><td></td></tr>x</table>aoe

| <html>
|   <head>
|   <body>
|     <a>
|       href="blah"
|       "aba"
|       <a>
|         href="foo"
|         "br"
|       <a>
|         href="foo"
|         "x"
|       <table>
|         <tbody>
|           <tr>
|             <td>
|     <a>
|       href="foo"
|       "aoe"

R=nigeltao
CC=golang-dev
https://golang.org/cl/5305074
2011-11-01 11:42:54 +11:00
Andrew Balholm 604e10c34d html: adjust bookmark in "adoption agency" algorithm
In the adoption agency algorithm, the formatting element is sometimes
removed from the list of active formatting elements and reinserted at a later index.
In that case, the bookmark showing where it is to be reinserted needs to be moved,
so that its position relative to its neighbors remains the same
(and also so that it doesn't become out of bounds).

Pass tests1.dat, test 70:
<DIV> abc <B> def <I> ghi <P> jkl </B>

| <html>
|   <head>
|   <body>
|     <div>
|       " abc "
|       <b>
|         " def "
|         <i>
|           " ghi "
|       <i>
|         <p>
|           <b>
|             " jkl "

Also pass tests through test 76:
<test attribute---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------->

R=nigeltao
CC=golang-dev
https://golang.org/cl/5322052
2011-10-29 10:51:59 +11:00
Andrew Balholm 03f163c7f2 html: don't run "adoption agency" on elements that aren't in scope.
Pass tests1.dat, test 55:
<!DOCTYPE html><font><table></font></table></font>

| <!DOCTYPE html>
| <html>
|   <head>
|   <body>
|     <font>
|       <table>

Also pass tests through test 69:
<DIV> abc <B> def <I> ghi <P> jkl

R=nigeltao
CC=golang-dev
https://golang.org/cl/5309074
2011-10-28 16:04:58 +11:00
Andrew Balholm 053549ca1b html: allow whitespace text nodes in <head>
Pass tests1.dat, test 50:
<!DOCTYPE html><script> <!-- </script> --> </script> EOF

| <!DOCTYPE html>
| <html>
|   <head>
|     <script>
|       " <!-- "
|     " "
|   <body>
|     "-->  EOF"

Also pass tests through test 54:
<!DOCTYPE html><title>U-test</title><body><div><p>Test<u></p></div></body>

R=nigeltao
CC=golang-dev
https://golang.org/cl/5311066
2011-10-28 09:06:30 +11:00
Andrew Balholm 833fb4198d html: parse <style> elements inside <head> element.
Also correctly handle EOF inside a <style> element.

Pass tests1.dat, test 49:
<!DOCTYPE html><style> EOF

| <!DOCTYPE html>
| <html>
|   <head>
|     <style>
|       " EOF"
|   <body>

R=nigeltao
CC=golang-dev
https://golang.org/cl/5321057
2011-10-27 10:26:11 +11:00
Andrew Balholm bd07e4f259 html: close <option> element when opening <optgroup>
Pass tests1.dat, test 34:
<!DOCTYPE html>A<option>B<optgroup>C<select>D</option>E

| <!DOCTYPE html>
| <html>
|   <head>
|   <body>
|     "A"
|     <option>
|       "B"
|     <optgroup>
|       "C"
|       <select>
|         "DE"

Also passes tests 35-48. Test 48 is:
</ COM--MENT >

R=nigeltao
CC=golang-dev
https://golang.org/cl/5311063
2011-10-27 09:45:53 +11:00
Andrew Balholm 05ed18f4f6 html: improve parsing of lists
Make a <li> tag close the previous <li> element.
Make a </ul> tag close <li> elements.

Pass tests1.dat, test 33:
<!DOCTYPE html><li>hello<li>world<ul>how<li>do</ul>you</body><!--do-->

| <!DOCTYPE html>
| <html>
|   <head>
|   <body>
|     <li>
|       "hello"
|     <li>
|       "world"
|       <ul>
|         "how"
|         <li>
|           "do"
|       "you"
|   <!-- do -->

R=nigeltao
CC=golang-dev
https://golang.org/cl/5321051
2011-10-26 14:02:30 +11:00
Andrew Balholm 6e318bda6c html: improve parsing of tables
When foster parenting, merge adjacent text nodes.
Properly close table row at </tr> tag.

Pass tests1.dat, test 32:
<!-----><font><div>hello<table>excite!<b>me!<th><i>please!</tr><!--X-->

| <!-- - -->
| <html>
|   <head>
|   <body>
|     <font>
|       <div>
|         "helloexcite!"
|         <b>
|           "me!"
|         <table>
|           <tbody>
|             <tr>
|               <th>
|                 <i>
|                   "please!"
|             <!-- X -->

R=nigeltao
CC=golang-dev
https://golang.org/cl/5323048
2011-10-26 11:36:46 +11:00
Nigel Tao 18b025d530 html: remove the Tokenizer.ReturnComments option.
The original intention was to simplify the parser, in making it skip
all comment tokens. However, checking that the Go html package is
100% compatible with the WebKit HTML test suite requires parsing the
comments. There is no longer any real benefit for the option.

R=gri, andybalholm
CC=golang-dev
https://golang.org/cl/5321043
2011-10-25 11:28:07 +11:00
Andrew Balholm 2aa589c843 html: implement foster parenting
Implement the foster-parenting algorithm for content that is inside a table
but not in a cell.

Also fix a bug in reconstructing the active formatting elements.

Pass test 30 in tests1.dat:
<a><table><td><a><table></table><a></tr><a></table><b>X</b>C<a>Y

R=nigeltao
CC=golang-dev
https://golang.org/cl/5309052
2011-10-23 18:36:01 +11:00
Nigel Tao 2f352ae48a html: parse <select> tags.
The additional test case in parse_test.go is:
<select><b><option><select><option></b></select>X

R=andybalholm
CC=golang-dev
https://golang.org/cl/5293051
2011-10-22 20:18:12 +11:00
Nigel Tao 64306c9fd0 html: parse and render comment nodes.
The first additional test case in parse_test.go is:
<!--><div>--<!-->

The second one is unrelated to the comment change, but also passes:
<p><hr></p>

R=andybalholm
CC=golang-dev
https://golang.org/cl/5299047
2011-10-20 11:45:30 +11:00
Nigel Tao b1fd528db5 html: parse raw text and RCDATA elements, such as <script> and <title>.
Pass tests1.dat, test 26:
#data
<script><div></script></div><title><p></title><p><p>
#document
| <html>
|   <head>
|     <script>
|       "<div>"
|     <title>
|       "<p>"
|   <body>
|     <p>
|     <p>

Thanks to Andy Balholm for driving this change.

R=andybalholm
CC=golang-dev
https://golang.org/cl/5301042
2011-10-19 08:03:30 +11:00
Andrew Balholm c64e8e327e html: insert implied <p> and </p> tags
(test # 25 in tests1.dat)
#data
<p><b><div></p></b></div>X
#document
| <html>
|   <head>
|   <body>
|     <p>
|       <b>
|     <div>
|       <b>
|
|           <p>
|           "X"

R=nigeltao
CC=golang-dev
https://golang.org/cl/5254060
2011-10-13 12:40:48 +11:00
Nigel Tao 1d0c141d7d html: parse doctype tokens; merge adjacent text nodes.
The test case input is "<!DOCTYPE html><span><button>foo</span>bar".
The correct parse is:
| <!DOCTYPE html>
| <html>
|   <head>
|   <body>
|     <span>
|       <button>
|         "foobar"

R=gri
CC=golang-dev
https://golang.org/cl/4794063
2011-08-01 10:26:46 +10:00
Nigel Tao 5a141064ed html: parse misnested formatting tags according to the HTML5 spec.
This is the "adoption agency" algorithm.

The test case input is "<a><p>X<a>Y</a>Z</p></a>". The correct parse is:
| <html>
|   <head>
|   <body>
|     <a>
|     <p>
|       <a>
|         "X"
|       <a>
|         "Y"
|       "Z"

R=gri
CC=golang-dev
https://golang.org/cl/4771042
2011-07-21 11:20:54 +10:00
Nigel Tao d360e0213d html: update section references in comments to the latest HTML5 spec.
R=r
CC=golang-dev
https://golang.org/cl/4699048
2011-07-13 16:53:02 +10:00
Yasuhiro Matsumoto 1e6d946594 html: parse start tags that aren't explicitly otherwise dealt with.
R=golang-dev, nigeltao
CC=golang-dev
https://golang.org/cl/4626080
2011-07-06 13:08:52 +10:00
Yasuhiro Matsumoto 054cf72b56 html: fix nesting when parsing a close tag.
R=nigeltao
CC=golang-dev
https://golang.org/cl/4636067
2011-06-30 23:16:33 +10:00
Nigel Tao fec6ab9726 html: parse "<h1>foo<h2>bar".
R=gri
CC=golang-dev
https://golang.org/cl/3571043
2010-12-15 11:39:56 +11:00
Nigel Tao 71bd053ada html: parse <table><tr><td> tags.
Also, shorten fooInsertionMode to fooIM.

R=gri
CC=golang-dev
https://golang.org/cl/3504042
2010-12-10 12:20:14 +11:00
Nigel Tao 49014c5b12 html: handle unexpected EOF during parsing.
This lets us parse HTML like "<html>foo".

R=gri
CC=golang-dev
https://golang.org/cl/3460043
2010-12-08 08:59:20 +11:00
Nigel Tao 08a47d6f60 html: first cut at a parser.
R=gri
CC=golang-dev
https://golang.org/cl/3355041
2010-12-07 12:02:36 +11:00