Previously, used a regexp to find and convert named entities in the content.
Now use a more efficient algorithm.
Similar tweak for converting NCRs before checking whether text is valid utf-8.
Sam Ruby has been doing a bang-up job fixing the bugs in REXML.
Who knows when these improvements will trickle down to vendor distributions of Ruby.
In the meantime, let's bundle the latest version of REXML with Instiki.
We check the version number of the bundled REXML against that of the System REXML, and use whichever is later.
Upgraded to Rails 2.0.2, except that we maintain
vendor/rails/actionpack/lib/action_controller/routing.rb
from Rail 1.2.6 (at least for now), so that Routes don't change. We still
get to enjoy Rails's many new features.
Also fixed a bug in Chunk-handling: disable WikiWord processing in tags (for real this time).
Create a test case for utf-8 bug reported by Diego Restrepo. Seems to be related to WikiWord chunk handling.
Add some other tests, and fix a minor bug in vendor/plugins/maruku/lib/maruku/ext/math/latex_fix.rb.
OK. This is a better way: define a custom TreeWalker which converts named entities to utf-8 as it goes. This avoids having to do an extra tree traversal in sanitize_rexml, AND avoids the trainwreck that is html5/inputstream.rb.
My REXML::Element.to_ncr (and REXML::Element.to_utf8) is horribly slow. For long documents, it proves more efficient to serialize to a string, apply String.to_ncr (or String.to_utf8) and then Sanitize the string.
Apparently, the form_spam_protect plugin only works with HTTP POST, not GET.
Unsafe operations (save and file-upload) should be POSTs anyway.
Fixed.
Also, two broken tests fixed. Only two Unit Tests now fail: both are minor bugs in XHTMLDiff.
In preparation for adding new tests, let's fix the existing ones.
3 Unit tests and one Functional test still fail.
* Two unit tests are bugs in xhtmldiff
* One is a bug in Maruku
* A file upload functional test fails, for reasons that escape me.