Just strip out the URI ref, leaving alternates.
Add some tests. Sync with latest HTML5lib (includes the sanitization improvements above).
Fix that Tokenizer bug for real this time.
Fixed a bug in the HTML5lib tokenizer (affects S5 slideshows). Some miscellaneous code cleanup. In particular, don't bother zapping control characters; instead, rely on the is_utf8? method to raise an exception (which we do anyway).