Rough In New Sanitizer

Start work (which may not pan out) on a new sanitizer. Right now, it passes
all but one of the HTML5lib Sanitizer's unit tests, but it does little to
ensure well-formedness. This is not an issue for Maruku-processed content,
but it is a concern for <nowiki> blocks.

(One solution would be to use the HTML5lib parser on <nowiki> blocks.)

In any case, this baby is 3 times as fast as the HTML5lib sanitizer.
Jacques Distler 2008-05-20 17:02:10 -05:00
parent f8e74e53bd
commit 800880f382
15 changed files with 3657 additions and 12 deletions

@@ -158,7 +158,7 @@ class String
#++
#:stopdoc:
-MATHML_ENTITIES = {
+MATHML_ENTITIES = {
'Alpha' => '&#x0391;',
'Beta' => '&#x0392;',
'Epsilon' => '&#x0395;',
@@ -2279,7 +2279,7 @@ class String
'wp' => '&#x02118;',
'wr' => '&#x02240;',
'zeetrf' => '&#x02128;'
-}
+} unless const_defined? "MATHML_ENTITIES"
#:startdoc:
#:startdoc:
# Converts XHTML+MathML named entities in string to Numeric Character References
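
The second hunk above adds an `unless const_defined?` guard to the constant definition. As a rough sketch of what that idiom does (not the project's code: `EntityTable` is a hypothetical stand-in for the real class, and the table is cut down to two entries), the guarded assignment is skipped when the constant already exists, so evaluating the definition a second time neither replaces the table nor triggers Ruby's "already initialized constant" warning:

```ruby
# Hypothetical stand-in class; the real table has several thousand entries.
class EntityTable
  # Assign the constant only if it is not already defined.
  MATHML_ENTITIES = {
    'Alpha' => '&#x0391;',
    'Beta'  => '&#x0392;'
  } unless const_defined? "MATHML_ENTITIES"
end

# Reopen the class, as happens when the same file is loaded twice.
# The guard is now true, so this assignment is skipped entirely.
class EntityTable
  MATHML_ENTITIES = { 'Alpha' => 'changed' } unless const_defined? "MATHML_ENTITIES"
end

puts EntityTable::MATHML_ENTITIES['Alpha']  # prints &#x0391;
```

Whether the guard was added because the file can be loaded more than once is an assumption here; the commit message does not say.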