800880f382
Start work (which may not pan out) on a new sanitizer. Right now, it passes all but 1 of the HTML5lib Sanitizer's unit tests. But it doesn't do much of anything to ensure well-formedness. This is not an issue for Maruku-processed content, but it is a concern for <nowiki> blocks. (One solution would be to use the HTML5lib parser on <nowiki> blocks.) In any case, this baby is 3 times as fast as the HTML5lib sanitizer.
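The well-formedness concern above can be illustrated with a minimal sketch. It assumes that wrapping a fragment in an XHTML `div` and parsing it with REXML is an acceptable check; `well_formed?` is a hypothetical helper, not something this commit adds.

```ruby
require 'rexml/document'

# Hypothetical helper, not part of the sanitizer: reports whether a
# fragment parses as well-formed XML once wrapped in an XHTML div.
def well_formed?(fragment)
  REXML::Document.new(
    "<div xmlns='http://www.w3.org/1999/xhtml'>#{fragment}</div>")
  true
rescue REXML::ParseException
  # REXML raises on mismatched or unclosed tags.
  false
end

puts well_formed?('<p>closed</p>')
puts well_formed?('<p>unclosed')
```

A check like this only detects ill-formed input; it does not repair it, which is why running a real HTML parser over <nowiki> blocks is floated as one possible solution.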
34 lines
971 B
Ruby
#!/usr/bin/env ruby

require File.expand_path(File.dirname(__FILE__) + '/../test_helper')
require 'sanitize'

class SanitizeTest < Test::Unit::TestCase

  include Sanitize

  def setup
  end

  # Wrap a fragment in an XHTML div so REXML can parse it as a document.
  def rexml_doc(string)
    REXML::Document.new(
      "<div xmlns='http://www.w3.org/1999/xhtml'>#{string}</div>")
  end

  # Run the REXML-based sanitizer, then strip the wrapper div again.
  def my_rex(string)
    sanitize_rexml(rexml_doc(string)).gsub(/\A<div xmlns="http:\/\/www.w3.org\/1999\/xhtml">(.*)<\/div>\Z/m, '\1')
  end

  # All three sanitizers should produce the same UTF-8 bytes (written here
  # as octal escapes); to_utf8 is checked separately against output2.
  def test_sanitize_named_entities
    input = '<p>Greek φ, double-struck 𝔸, numeric 𝔸 ⁗</p>'
    output = "<p>Greek \317\225, double-struck \360\235\224\270, numeric \360\235\224\270 \342\201\227</p>"
    output2 = "<p>Greek \317\225, double-struck \360\235\224\270, numeric 𝔸 ⁗</p>"
    assert_equal(output, sanitize_xhtml(input))
    assert_equal(output, sanitize_html(input))
    assert_equal(output, my_rex(input))
    assert_equal(output2, input.to_utf8)
  end

end