Fixed a bug in the HTML5lib tokenizer (affects S5 slideshows). Some miscellaneous code cleanup. In particular, don't bother with zapping control characters; instead, rely on is_utf8? method to raise an exception (which we do anyway).
Do a better job with the wrapper <div>s added by xhtmldiff and Maruku's to_html_tree method. More tests fixed.