Commit Graph

32 Commits

Author SHA1 Message Date
Mike Samuel 1f577d26d7 exp/template/html: simplify transition functions
This simplifies transition functions to make it easier to reliably
elide comments in a later CL.

Before:
- transition functions are responsible for detecting special end tags.
After:
- the code to detect special end tags is done in one place.

We were relying on end tags being skipped which meant we were
not noticing comments inside script/style elements that contain no
substitutions.
This change means we will notice all such comments where necessary,
but stripTags will notice none since it does not need to.  This speeds
up stripTags.

R=nigeltao
CC=golang-dev
https://golang.org/cl/5074041
2011-09-21 19:04:41 -07:00
Mike Samuel 1262f6bde7 exp/template/html: fix bug, '<' normalization for text nodes that change context
R=nigeltao
CC=golang-dev
https://golang.org/cl/5080042
2011-09-20 22:55:14 -07:00
Mike Samuel 3a013f1175 exp/template/html: change transition functions to return indices
Formulaic changes to transition functions in preparation for CL 5074041.
This should be completely semantics preserving.

R=nigeltao
CC=golang-dev
https://golang.org/cl/5091041
2011-09-19 20:52:14 -07:00
Mike Samuel 8bc5ef6cd7 exp/template/html: allow commenting out of actions
Instead of erroring on actions inside comments, use existing escaping
pipeline to quash the output of actions inside comments.

If a template maintainer uses a comment to disable template code:

  {{if .}}Hello, {{.}}!{{end}}

->

  <!--{{if true}}Hello, {{.}}!{{end}}-->

will result in

  <!--Hello, !-->

regardless of the value of {{.}}.

In a later CL, comment elision will result in the entire commented-out
section being dropped from the template output.

Any side-effects in pipelines, such as panics, will still be realized.

R=nigeltao
CC=golang-dev
https://golang.org/cl/5078041
2011-09-19 19:52:31 -07:00
Mike Samuel 533b372280 exp/template/html: define isComment helper
Non semantics-changing refactoring in preparation for comment elision.

R=nigeltao
CC=golang-dev
https://golang.org/cl/5071043
2011-09-19 17:27:49 -07:00
Mike Samuel b4e1ca25b1 exp/template/html: allow quotes on either side of conditionals and dynamic HTML names
This addresses several use cases:

(1) <h{{.HeaderLevel}}> used to build hierarchical documents.
(2) <input on{{.EventType}}=...> used in widgets.
(3) <div {{" dir=ltr"}}> used to embed bidi-hints.

It also makes sure that we treat the two templates below the same:

<img src={{if .Avatar}}"{{.Avatar}}"{{else}}"anonymous.png"{{end}}>
<img src="{{if .Avatar}}{{.Avatar}}{{else}}anonymous.png{{end}}">

This splits up tTag into a number of sub-states and adds testcases.

R=nigeltao
CC=golang-dev
https://golang.org/cl/5043042
2011-09-18 19:10:15 -07:00
Mike Samuel 52a46bb773 exp/template/html: normalize '<' in text and RCDATA nodes.
The template

  <{{.}}

would violate the structure preservation property if allowed and not
normalized, because when {{.}} emitted "", the "<" would be part of
a text node, but if {{.}} emitted "a", the "<" would not be part of
a text node.

This change rewrites '<' in text nodes and RCDATA text nodes to
'&lt;' allowing template authors to write the common, and arguably more
readable:

    Your price: {{.P1}} < list price {{.P2}}

while preserving the structure preservation property.

It also lays the groundwork for comment elision, rewriting

    Foo <!-- comment with secret project details --> Bar

to

    Foo  Bar

R=nigeltao
CC=golang-dev
https://golang.org/cl/5043043
2011-09-18 12:04:40 -07:00
Mike Samuel e213a0c0fc exp/template/html: recognize whitespace at start of URLs.
HTML5 uses "Valid URL potentially surrounded by spaces" for
attrs: http://www.w3.org/TR/html5/index.html#attributes-1

    <a href=" {{.}}">

should be escaped to filter out "javascript:..." as data.

R=nigeltao
CC=golang-dev
https://golang.org/cl/5027045
2011-09-18 11:55:14 -07:00
Mike Samuel a399040226 exp/template/html: type fixed point computation in template
I found a simple test case that does require doing the fixed point TODO
in computeOutCtx.

I found a way though to do this and simplify away the escapeRange
hackiness that was added in https://golang.org/cl/5012044/

R=nigeltao
CC=golang-dev
https://golang.org/cl/5015052
2011-09-16 00:34:26 -07:00
Mike Samuel 96f9e8837e exp/template/html: moved error docs out of package docs onto error codes
This replaces the errStr & errLine members of context with a single err
*Error, and introduces a number of const error codes, one per
escape-time failure mode, that can be separately documented.

The changes to the error documentation moved from doc.go to error.go
are cosmetic.

R=r, nigeltao
CC=golang-dev
https://golang.org/cl/5026041
2011-09-15 19:05:33 -07:00
Mike Samuel ce008f8c37 exp/template/html: pre-sanitized content
Not all content is plain text.  Sometimes content comes from a trusted
source, such as another template invocation, an HTML tag whitelister,
etc.

Template authors can deal with over-escaping in two ways.

1) They can encapsulate known-safe content via
   type HTML, type CSS, type URL, and friends in content.go.
2) If they know that the for a particular action never needs escaping
   then they can add |noescape to the pipeline.
   {{.KnownSafeContent | noescape}}
   which will prevent any escaping directives from being added.

This CL defines string type aliases: HTML, CSS, JS, URI, ...
It then modifies stringify to unpack the content type.
Finally it modifies the escaping functions to use the content type and
decline to escape content that does not require it.

There are minor changes to escapeAction and helpers to treat as
equivalent explicit escaping directives such as "html" and "urlquery"
and the escaping directives defined in the contextual autoescape module
and to recognize the special "noescape" directive.

The html escaping functions are rearranged.  Instead of having one
escaping function used in each {{.}} in

    {{.}} : <textarea title="{{.}}">{{.}}</textarea>

a slightly different escaping function is used for each.
When {{.}} binds to a pre-sanitized string of HTML

    `one < <i>two</i> &amp; two < "3"`

we produces something like

     one < <i>two</i> &amp; two < "3" :
     <textarea title="one &lt; two &amp; two &lt; &#34;3&#34;">
       one &lt; &lt;i&gt;two&lt;/i&gt; &amp; two &lt; "3"
     </textarea>

Although escaping is not required in <textarea> normally, if the
substring </textarea> is injected, then it breaks, so we normalize
special characters in RCDATA and do the same to preserve attribute
boundaries.  We also strip tags since developers never intend
typed HTML injected in an attribute to contain tags escaped, but
do occasionally confuse pre-escaped HTML with HTML from a
tag-whitelister.

R=golang-dev, nigeltao
CC=golang-dev
https://golang.org/cl/4962067
2011-09-15 08:51:55 -07:00
Mike Samuel 3eb41fbeb6 exp/template/html: render templates unusable when escaping fails
This moots a caveat in the proposed package documentation by
rendering useless any template that could not be escaped.

From https://golang.org/cl/4969078/
> If EscapeSet returns an error, do not Execute the set; it is not
> safe against injection.
r: [but isn't the returned set nil? i guess you don't overwrite the
r: original if there's a problem, but i think you're in your rights to
r: do so]

R=r
CC=golang-dev
https://golang.org/cl/5020043
2011-09-14 20:40:50 -07:00
Mike Samuel 23fab11c47 exp/template/html: flesh out package documentation.
R=nigeltao, r
CC=golang-dev
https://golang.org/cl/4969078
2011-09-14 14:21:20 -07:00
Mike Samuel 514c9243f2 exp/template/html: check that modified nodes are not shared by templates
R=nigeltao
CC=golang-dev
https://golang.org/cl/5012044
2011-09-14 11:52:03 -07:00
Mike Samuel 15d47ce219 exp/template/html: move transition functions to a separate file
This CL moves code but makes no changes otherwise.

R=nigeltao, r
CC=golang-dev
https://golang.org/cl/5012045
2011-09-13 17:53:55 -07:00
Mike Samuel 4c6454aecf exp/template/html: escape {{template}} calls and sets of templates
This adds support for {{template "callee"}} calls.
It recognizes that calls can appear in many contexts.

{{if .ImageURL}}
    <img src="{{.ImageURL}}" alt="{{template "description"}}">
{{else}}
    <p>{{template "description"}}</p>
{{end}}

calls a template in two different contexts, first in an HTML attribute
context, and second in an HTML text context.

Those two contexts aren't very different, but when linking text
to search terms, the escaping context can be materially different:

<a href="/search?q={{template "tags"}}">{{template "tags"}}</a>

This adds API:
EscapeSet(*template.Set, names ...string) os.Error

takes a set of templates and the names of those which might be called
in the default context as starting points.

It changes the escape* functions to be methods of an object which
maintains a conceptual mapping of
(template names*input context) -> output context.

The actual mapping uses as key a mangled name which combines the
template name with the input context.

The mangled name when the input context is the default context is the
same as the unmangled name.

When a template is called in multiple contexts, we clone the template.

{{define "tagLink"}}
  <a href="/search?q={{template "tags"}}">{{template "tags"}}</a>
{{end}}
{{define "tags"}}
  {{range .Tags}}{{.}},{{end}}
{{end}}

given []string{ "foo", "O'Reilly", "bar" } produces

  <a href="/search?q=foo,O%27Reilly,bar">foo,O&#39;Reilly,bar</a>

This involves rewriting the above to something like

{{define "tagLink"}}
  <a href="/search?q={{template "tags$1"}}">{{template "tags"}}</a>
{{end}}
{{define "tags"}}
  {{range .Tags}}{{. | html}},{{end}}
{{end}}
{{define "tags$1"}}
  {{range .Tags}}{{. | urlquery}},{{end}}
{{end}}

clone.go provides a mechanism for cloning template "tags" to produce
"tags$1".

changes to escape.go implement the new API and context propagation
around the call graph.

context.go includes minor changes to support name mangling and
context_test.go tests those.

js.go contains a bug-fix.

R=nigeltao, r
CC=golang-dev
https://golang.org/cl/4969072
2011-09-13 16:57:39 -07:00
Mike Samuel 0432a23c68 exp/template/html: tolerate '/' ambiguity in JS when it doesn't matter.
Often, division/regexp ambiguity doesn't matter in JS because the next
token is not a slash.

For example, in

  <script>var global{{if .InitVal}} = {{.InitVal}}{{end}}</script>

When there is an initial value, the {{if}} ends with jsCtxDivOp
since a '/' following {{.InitVal}} would be a division operator.
When there is none, the empty {{else}} branch ends with jsCtxRegexp
since a '/' would start a regular expression.  A '/' could result
in a valid program if it were on a new line to allow semicolon
insertion to terminate the VarDeclaration.

There is no '/' though, so we can ignore the ambiguity.

There are cases where a missing semi can result in ambiguity that
we should report.

  <script>
  {{if .X}}var x = {{.X}}{{end}}
  /...{{.Y}}
  </script>

where ... could be /foo/.test(bar) or /divisor.  Disambiguating in
this case is hard and is required to sanitize {{.Y}}.

Note, that in the case where there is a '/' in the script tail but it
is not followed by any interpolation, we already don't care.  So we
are already tolerant of

<script>{{if .X}}var x = {{.X}}{{end}}/a-bunch-of-text</script>

because tJS checks for </script> before looking in /a-bunch-of-text.

This CL
- Adds a jsCtx value: jsCtxUnknown
- Changes joinContext to join contexts that only differ by jsCtx.
- Changes tJS to return an error when a '/' is seen in jsCtxUnknown.
- Adds tests for both the happy and sad cases.

R=nigeltao
CC=golang-dev
https://golang.org/cl/4956077
2011-09-12 16:37:03 -07:00
Mike Samuel 80a5ddbdb1 exp/template/html: fix bug /*/ is not a full JS block comment.
Similar tests for CSS already catch this problem in tCSS.

R=nigeltao
CC=golang-dev
https://golang.org/cl/4967065
2011-09-12 16:01:30 -07:00
Nigel Tao b2b3187f5e exp/template/html: fix JS regexp escape of an empty string.
R=dsymonds
CC=golang-dev, mikesamuel
https://golang.org/cl/4972063
2011-09-12 11:57:34 +10:00
Mike Samuel 1f13423d3e exp/template/html: Grammar rules for HTML comments and special tags.
Augments type context and adds grammatical rules to handle special HTML constructs:
    <!-- comments -->
    <script>raw text</script>
    <textarea>no tags here</textarea>

This CL does not elide comment content.  I recommend we do that but
have not done it in this CL.

I used a codesearch tool over a codebase in another template language.

Based on the below I think we should definitely recognize
  <script>, <style>, <textarea>, and <title>
as each of these appears frequently enough that there are few
template using apps that do not use most of them.

Of the other special tags,
  <xmp>, <noscript>
are used but infrequently, and
  <noframe> and friend, <listing>
do not appear at all.

We could support <xmp> even though it is obsolete in HTML5
because we already have the machinery, but I suggest we do not
support noscript since it is a normal tag in some browser
configurations.

I suggest recognizing and eliding <!-- comments -->
(but not escaping text spans) as they are widely used to
embed comments in template source.  Not eliding them increases
the size of content sent over the network, and risks leaking
code and project internal details.
The template language I tested elides them so there are
no instance of IE conditional compilation directives in the
codebase but that could be a source of confusion.

The codesearch does the equivalent of
$ find . -name \*.file-extension \
  | perl -ne 'print "\L$1\n" while s@<([a-z][a-z0-9])@@i' \
  | sort | uniq -c | sort

The 5 uses of <plaintext> seem to be in tricky code and can be ignored.
The 2 uses of <xmp> appear in the same tricky code and can be ignored.
I also ignored end tags to avoid biasing against unary
elements and threw out some nonsense names since since the
long tail is dominated by uses of < as a comparison operator
in the template languages expression language.

I have added asterisks next to abnormal elements.

  26765 div
   7432 span
   7414 td
   4233 a
   3730 tr
   3238 input
   2102 br
   1756 li
   1755 img
   1674 table
   1388 p
   1311 th
   1064 option
    992 b
    891 label
    714 script *
    519 ul
    446 tbody
    412 button
    381 form
    377 h2
    358 select
    353 strong
    318 h3
    314 body
    303 html
    266 link
    262 textarea *
    261 head
    258 meta
    225 title *
    189 h1
    176 col
    156 style *
    151 hr
    119 iframe
    103 h4
    101 pre
    100 dt
     98 thead
     90 dd
     83 map
     80 i
     69 object
     66 ol
     65 em
     60 param
     60 font
     57 fieldset
     51 string
     51 field
     51 center
     44 bidi
     37 kbd
     35 legend
     30 nobr
     29 dl
     28 var
     26 small
     21 cite
     21 base
     20 embed
     19 colgroup
     12 u
     12 canvas
     10 sup
     10 rect
     10 optgroup
     10 noscript *
      9 wbr
      9 blockquote
      8 tfoot
      8 code
      8 caption
      8 abbr
      7 msg
      6 tt
      6 text
      6 h5
      5 svg
      5 plaintext *
      5 article
      4 shortquote
      4 number
      4 menu
      4 ins
      3 progress
      3 header
      3 content
      3 bool
      3 audio
      3 attribute
      3 acronym
      2 xmp *
      2 overwrite
      2 objects
      2 nobreak
      2 metadata
      2 description
      2 datasource
      2 category
      2 action

R=nigeltao
CC=golang-dev
https://golang.org/cl/4964045
2011-09-09 00:07:40 -07:00
Mike Samuel 4670d9e634 exp/template/html: autoescape actions in HTML style attributes.
This does not wire up <style> elements as that is pending support
for raw text content in CL https://golang.org/cl/4964045/

This CL allows actions to appear in contexts like

selectors:        {{.Tag}}{{.Class}}{{.Id}}
property names:   border-{{.BidiLeadingEdge}}
property values:  color: {{.Color}}
strings:          font-family: "{{font-name}}"
URL strings:      background: "/foo?image={{.ImgQuery}}"
URL literals:     background: url("{{.Image}}")

but disallows actions inside CSS comments and disallows
embedding of JS in CSS entirely.

It is based on the CSS3 lexical grammar with affordances for
common browser extensions including line comments.

R=nigeltao
CC=golang-dev
https://golang.org/cl/4968058
2011-09-09 07:18:20 +10:00
Nigel Tao 2b6d3b498c exp/template/html: string replacement refactoring.
R=mikesamuel
CC=golang-dev
https://golang.org/cl/4968063
2011-09-03 10:30:05 +10:00
Mike Samuel 5edeef214d exp/template/html: non-semantics changing tweaks to js{,_test}.go
R=nigeltao
CC=golang-dev
https://golang.org/cl/4962049
2011-09-02 10:28:00 +10:00
Mike Samuel 0253c688d0 exp/template/html: Implement grammar for JS.
This transitions into a JS state when entering any attribute whose
name starts with "on".

It does not yet enter a JS on entry into a <script> element as script
element handling is introduced in another CL.

R=nigeltao
CC=golang-dev
https://golang.org/cl/4968052
2011-09-01 12:03:40 +10:00
Mike Samuel 22d5f9aae3 exp/template/html: Added handling for URL attributes.
1. adds a urlPart field to context
2. implements tURL to figure out the URL part
3. modifies joinContext to allow common context mismatches
   around branches to be ignored when not material as in
   <a href="/foo{{if .HasQuery}}?q={{.Query}}{{/if}}">
4. adds a pipeline function that filters dynamically inserted
   protocols to prevent code injection via URLs.

R=nigeltao
CC=golang-dev
https://golang.org/cl/4957041
2011-08-30 11:42:30 +10:00
Nigel Tao 1f0d277cc1 exp/template/html: add some tests for ">" attributes.
R=mikesamuel
CC=golang-dev
https://golang.org/cl/4956042
2011-08-25 13:48:21 +10:00
Mike Samuel 42a56d3e81 exp/template/html: Reworked escapeText to recognize attr boundaries.
The following testcases now pass:

`<a href=x` tests that we do not error on partial unquoted attrs.
`<a href=x ` tests that spaces do end unquoted attrs on spaces.
`<a href=''` tests that we recognize the end of single quoted attrs.
`<a href=""` tests that we recognize the end of double quoted attrs.

R=golang-dev, nigeltao
CC=golang-dev
https://golang.org/cl/4932051
2011-08-25 11:24:43 +10:00
Nigel Tao 9969803f6c exp/template/html: differentiate URL-valued attributes (such as href)
from others (such as title) during escaping.

R=r, mikesamuel, dsymonds
CC=golang-dev
https://golang.org/cl/4919042
2011-08-23 13:22:26 +10:00
Mike Samuel e4a89d7cca exp/template/html: defines a parse context for a subset of HTML.
This defines just enough context to distinguish HTML URI attributes
from parsed character data.

It does not affect any public module API as I thought I would get
early comment on style for defining enumerations and tables.

R=rsc, r, nigeltao, r
CC=golang-dev
https://golang.org/cl/4906043
2011-08-18 10:40:29 +10:00
Mike Samuel 7dce257ac8 exp/template/html: rework Reverse(*Template) to do naive autoescaping
Replaces the toy func Reverse(*Template) with one that implements
naive autoescaping.

Now Escape(*Template) walks a template parse tree to find all
template actions and adds the |html command to them if it is not
already present.

R=golang-dev, r
CC=golang-dev
https://golang.org/cl/4867049
2011-08-17 16:00:02 +10:00
Rob Pike a22e77e6ae template: move exp/template into template.
(Leave exp/template/html where it is for now.)

R=golang-dev, dsymonds
CC=golang-dev
https://golang.org/cl/4899048
2011-08-17 14:55:57 +10:00
Mike Samuel 595e9d5034 exp/template/html: New package with a toy template transformation.
func Reverse(*Template) *Template
returns a template that produces the reverse of the original
for any input.

Changes outside exp/template/html include:
- Adding a getter for a template's FuncMap so that derived templates
  can inherit function definitions.
- Exported one node factory function, newIdentifier.
  Deriving tempaltes requires constructing new nodes, but I didn't
  export all of them because I think shallow copy functions might
  be more useful for this kind of work.
- Bugfix: Template's Name() method ignores the name field so
  template.New("foo") is a nil dereference instead of "foo".

Caveats: Reverse is a toy.  It is not UTF-8 safe, and does not
preserve order of calls to funcs in FuncMap.

For context, see http://groups.google.com/group/golang-nuts/browse_thread/thread/e8bc7c771aae3f20/b1ac41dc6f609b6e?lnk=gst

R=rsc, r, nigeltao, r
CC=golang-dev
https://golang.org/cl/4808089
2011-08-12 14:34:29 +10:00