HTML5lib Bug

Fixed a bug in the HTML5lib tokenizer (affects S5 slideshows).
Also some miscellaneous code cleanup. In particular, don't bother zapping control characters;
instead, rely on the is_utf8? method to raise an exception (which we do anyway).
Jacques Distler 2007-09-06 10:40:48 -05:00
parent f482036683
commit 5b182bd228
6 changed files with 33 additions and 8 deletions


@@ -217,7 +217,7 @@ module HTML5
# This method replaces the need for "entityInAttributeValueState".
def process_entity_in_attribute
-entity = consume_entity(true)
+entity = consume_entity()
if entity
@current_token[:data][-1][1] += entity
else


@@ -405,5 +405,25 @@
"name": "xul",
"input": "<p style=\"-moz-binding:url('http://ha.ckers.org/xssmoz.xml#xss')\">fubar</p>",
"output": "<p style=''>fubar</p>"
},
{
"name": "quotes_in_attributes",
"input": "<img src='foo' title='\"foo\" bar' />",
"rexml": "<img src='foo' title='\"foo\" bar' />",
"output": "<img title='&quot;foo&quot; bar' src='foo'/>"
},
{
"name": "named_entities_in_attributes",
"input": "<img src='foo' title='&quot;foo&quot; bar' />",
"rexml": "<img src='foo' title='\"foo\" bar' />",
"output": "<img title='&quot;foo&quot; bar' src='foo'/>"
},
{
"name": "NCRs_in_attributes",
"input": "<img src='foo' title='&#x22;foo&#x22; bar' />",
"rexml": "<img src='foo' title='\"foo\" bar' />",
"output": "<img title='&quot;foo&quot; bar' src='foo'/>"
}
]
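
The three test cases added above expect a literal double quote, a named entity (&quot;), and a numeric character reference (&#x22;) in an attribute value to serialize identically. A minimal sketch of that round trip follows. It uses Ruby's standard CGI helpers rather than the HTML5lib tokenizer or sanitizer, and normalize_attribute_value is a hypothetical name introduced only for illustration.

```ruby
require 'cgi'

# Hypothetical helper (not part of HTML5lib or Instiki): decode any character
# references in a raw attribute value, then re-escape the result so it can be
# written back out inside a quoted attribute.
def normalize_attribute_value(raw)
  CGI.escapeHTML(CGI.unescapeHTML(raw))
end

# The title values from the three test cases: literal quotes, a named entity,
# and a hexadecimal numeric character reference.
inputs = [
  '"foo" bar',
  '&quot;foo&quot; bar',
  '&#x22;foo&#x22; bar'
]

inputs.each do |raw|
  puts normalize_attribute_value(raw)
end
```

Each input prints as &quot;foo&quot; bar, which matches the title value in the "output" field of the test cases. The real code path goes through the tokenizer's consume_entity, so this only illustrates the expected equivalence, not how the fix itself works.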