The new sanitizer seems to work well (cuts the time required
to produce the Instiki Atom feed in half). Our strategy is to
use HTML5lib for <nowiki> content, but to use the new sanitizer
for content that has been processed by Maruku (and hence is
well-formed).
The one broken unit test won't affect us (since it dealt with
very malformed HTML).
Start work (which may not pan out) on a new sanitizer. Right now, it passes
all but 1 of the HTML5lib Sanitizer's unit tests. But it doesn't do much
of anything to ensure well-formedness. This is not an issue for Maruku-processed
content, but it is a concern for <nowiki> blocks.
(One solution would be to use the HTML5lib parser on <nowiki> blocks.)
In any case, this baby is 3 times as fast as the HTML5lib sanitizer.
The "optimization" of using arrays instead of regexps to
implement to_utf8 and is_utf8? (and their brethren) is
actually no faster. Go back to the logically-clearer implementation.
Previously, used a regexp to find and convert named entities in the content.
Now use a more efficient algorithm.
Similar tweak for converting NCRs before checking whether text is valid utf-8.
Make remove_orphaned_pages work in a proxied situation.
Also, "fix" a busted functional test. I'm not happy with
this one. We're enforcing plain-text titles (which, I think,
is the correct thing to do), but sending them as type="html",
which then requires double-encoding.
The action_cache plugin is now rather superfluous (Rails has native support for ETags, for instance).
And it wasn't working right with Rails 2.0.x (pages were being cached, and 304s were being returned
as appropriate, but cached pages were not being served).
Sessions are now stored in a cookie (signed and Base-64 encoded).
Form_spam_protection stores form_keys in the session.
Make sure spambots implement both cookies and javascript, by storing hashed (with salt) keys in the session.
In each session, keep only the 30 most recent :form_keys generated by form_spam_protection.
This should be more than enough for ordinary usage, but prevents the session data from
becoming inordinately large.
Also, burnt-orange rulz!
Sam Ruby has been doing a bang-up job fixing the bugs in REXML.
Who knows when these improvements will trickle down to vendor distributions of Ruby.
In the meantime, let's bundle the latest version of REXML with Instiki.
We check the version number of the bundled REXML against that of the System REXML, and use whichever is later.