New Sanitizer Goes Live

The new sanitizer seems to work well: it halves the time
required to produce the Instiki Atom feed. The strategy is to
use HTML5lib for <nowiki> content, and the new sanitizer for
content that has already been processed by Maruku (and is
hence well-formed).

The one unit test that now fails won't affect us, since it
deals with very malformed HTML.
Jacques Distler 2008-05-21 02:06:31 -05:00
parent 800880f382
commit 45405fc97e
8 changed files with 24 additions and 16 deletions

@@ -120,7 +120,7 @@ module Sanitizer
# => &lt;script> do_nasty_stuff() &lt;/script>
# sanitize_html('<a href="javascript: sucker();">Click here for $100</a>')
# => <a>Click here for $100</a>
-def sanitize_xhtml(html)
+def xhtml_sanitize(html)
if html.index("<")
tokenizer = HTML::Tokenizer.new(html.to_utf8)
new_text = ""
@@ -149,7 +149,7 @@ module Sanitizer
end
node.attributes.each do |attr,val|
if String === val
-node.attributes[attr] = CGI.escapeHTML(val.unescapeHTML)
+node.attributes[attr] = CGI.escapeHTML(CGI.unescapeHTML(val))
else
node.attributes.delete attr
end
@@ -160,7 +160,7 @@ module Sanitizer
node.to_s.gsub(/</, "&lt;").gsub(/>/, "&gt;")
end
else
-CGI.escapeHTML(node.to_s.unescapeHTML)
+node.to_s.unescapeHTML.escapeHTML
end
end
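
The attribute handling in the hunks above unescapes a value and then escapes it again. A minimal sketch of why that round trip is useful, using only Ruby's standard CGI module: raw text and already-escaped text end up in the same escaped form, so nothing gets double-escaped. The helper name normalize_entities is hypothetical (the diff inlines the expression), and the sample strings are made up for illustration.

```ruby
require 'cgi'

# Hypothetical helper: the diff writes this expression inline
# rather than defining a method for it.
def normalize_entities(value)
  # Decode any entities that are already present, then encode
  # everything once, so "&amp;" never turns into "&amp;amp;".
  CGI.escapeHTML(CGI.unescapeHTML(value))
end

raw     = 'Tom & "Jerry" <b>'
escaped = 'Tom &amp; &quot;Jerry&quot; &lt;b&gt;'

puts normalize_entities(raw)      # raw input is escaped once
puts normalize_entities(escaped)  # escaped input is left unchanged
```

Both calls print the same string, which suggests the round trip is idempotent for inputs like these. The String#unescapeHTML and String#escapeHTML calls that appear in the diff are extensions defined elsewhere in the codebase and are not used in this sketch.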