New Sanitizer Goes Live

The new sanitizer seems to work well: it cuts the time required to produce the Instiki Atom feed in half. Our strategy is to use HTML5lib for <nowiki> content, but to use the new sanitizer for content that has been processed by Maruku (and is hence well-formed). The one broken unit test won't affect us, since it dealt with very malformed HTML.
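The routing described in this message can be sketched as a dispatch on where the content came from. This is a minimal, hypothetical sketch: `SanitizerRouting`, `sanitize_for`, and the `:nowiki`/`:maruku` source tags are invented for illustration, and the two sanitizer methods are stand-ins that only label their input. They are not Instiki's real HTML5lib wrapper or `xhtml_sanitize`.

```ruby
# Hypothetical sketch of provenance-based sanitizer routing.
# Both sanitizers below are STAND-INS that only tag their input so the
# routing is visible; they are not Instiki's actual implementations.
module SanitizerRouting
  module_function

  # Stand-in for the HTML5lib-based sanitizer: tolerant of malformed markup.
  def html5lib_sanitize(html)
    "[html5lib] #{html}"
  end

  # Stand-in for the new tokenizer-based sanitizer: assumes well-formed XHTML.
  def xhtml_sanitize(html)
    "[xhtml] #{html}"
  end

  # source is :nowiki for raw user markup, :maruku for Maruku-rendered output.
  def sanitize_for(source, html)
    case source
    when :nowiki then html5lib_sanitize(html)  # raw input may be badly malformed
    when :maruku then xhtml_sanitize(html)     # Maruku output is well-formed
    else raise ArgumentError, "unknown content source: #{source.inspect}"
    end
  end
end

puts SanitizerRouting.sanitize_for(:maruku, "<p>hi</p>")
```

The design trade-off is that the slower, error-tolerant parser is only paid for on input whose well-formedness cannot be assumed.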
2008-05-21 02:06:31 -05:00
commit 45405fc97e
parent 800880f382
8 changed files with 24 additions and 16 deletions
--- a/lib/sanitizer.rb
+++ b/lib/sanitizer.rb
@@ -120,7 +120,7 @@ module Sanitizer
      #    => &lt;script> do_nasty_stuff() &lt;/script>
      #   sanitize_html('<a href="javascript: sucker();">Click here for $100</a>')
      #    => <a>Click here for $100</a>
-      def sanitize_xhtml(html)
+      def xhtml_sanitize(html)
        if html.index("<")
          tokenizer = HTML::Tokenizer.new(html.to_utf8)
          new_text = ""
@@ -149,7 +149,7 @@ module Sanitizer
                    end
                    node.attributes.each do |attr,val|
                      if String === val
-                         node.attributes[attr] = CGI.escapeHTML(val.unescapeHTML)
+                         node.attributes[attr] = CGI.escapeHTML(CGI.unescapeHTML(val))
                      else
                        node.attributes.delete attr
                      end
@@ -160,7 +160,7 @@ module Sanitizer
                  node.to_s.gsub(/</, "&lt;").gsub(/>/, "&gt;")
                end
              else
-                CGI.escapeHTML(node.to_s.unescapeHTML)
+                node.to_s.unescapeHTML.escapeHTML
            end
          end
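The attribute change in the second hunk uses a decode-then-encode idiom, `CGI.escapeHTML(CGI.unescapeHTML(val))`, rather than escaping the raw value. A small standalone illustration of why that matters follows; `normalize_attr` is a hypothetical helper name, and only the standard-library `CGI.escapeHTML`/`CGI.unescapeHTML` calls are taken from the diff.

```ruby
require 'cgi'

# Hypothetical helper mirroring the patched line: decode any existing
# entities first, then re-encode, so already-escaped input is not
# escaped a second time.
def normalize_attr(val)
  CGI.escapeHTML(CGI.unescapeHTML(val))
end

# Escaping alone double-escapes an entity that is already present.
naive = CGI.escapeHTML("Tom &amp; Jerry")  # => "Tom &amp;amp; Jerry"

# Decoding first yields a single, correct level of escaping.
fixed = normalize_attr("Tom &amp; Jerry")  # => "Tom &amp; Jerry"
```

A useful property of this form is that it is idempotent: running an attribute value through it twice gives the same result as running it once.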