Commit graph

29 commits

Author SHA1 Message Date
Jacques Distler
ebc409e1a0 Ensure the_content REALLY is utf-8
Our check that the the_content was valid utf-8 was rather busted.
This one works right. In particular, we needed to expand NCRs before checking.
2008-01-03 15:27:03 -06:00
Jacques Distler
c8196cbe41 More Unicode Fun
From Philip Taylor (via Henri Sivonen): disallow U+fffe and U+ffff.
2008-01-01 22:00:07 -06:00
Jacques Distler
de125367b0 Update RDOC documentation.
Update the documentation for sanitize.rb, to match current behaviour.
2007-10-14 22:22:18 -05:00
Jacques Distler
1911d18f65 Performance
OK. This is a better way: define a custom TreeWalker which converts named entities to utf-8 as it goes. This avoids having to do an extra tree traversal in sanitize_rexml, AND avoids the trainwreck that is html5/inputstream.rb.
2007-10-14 21:07:46 -05:00
Jacques Distler
198d7847bd Performance
My REXML::Element.to_ncr (and REXML::Element.to_utf8) is horribly slow. For long documents, it proves more efficient to serialize to a string, apply String.to_ncr (or String.to_utf8) and then Sanitize the string.
2007-10-13 16:32:04 -05:00
Jacques Distler
6fd6be8fea Sanitizer Fix
Whoops! Looks like Ryan changed the API for the HTML5 sanitizer. Bad, bad, bad.
Fixed now.
2007-08-30 16:06:20 -05:00
Jacques Distler
1bc5da0053 Use XHTMLSerializer, where appropriate. 2007-07-04 18:53:03 -05:00
Jacques Distler
8ccaad85a5 Sync with latest HTML5lib and latest Maruku 2007-07-04 17:36:59 -05:00
Jacques Distler
2da672ec5b Many Minor Fixes
Fixed a whole bunch of minor stuff.
Had a go at getting some of the plethora of broken tests to pass.
2007-06-12 17:37:55 -05:00
Jacques Distler
a68d1aa8f3 Sanitizer API documentation now online
See:
   http://golem.ph.utexas.edu/~distler/code/rdoc/sanitize/
2007-06-08 23:51:30 -05:00
Jacques Distler
f818238dd3 Consolidation
Shuffled around a couple of files.
2007-06-08 22:39:37 -05:00
Jacques Distler
3bf560c3b3 Updated to Latest HTML5lib
Synced with latest HTML5lib.
Added some RDoc-compatible documentation to the sanitizer.
2007-06-08 17:26:00 -05:00
Jacques Distler
8badd0766a Enhancements to sanitize.rb
Options, options, ... options.
2007-06-08 01:23:09 -05:00
Jacques Distler
0298868573 Fix S5 Unicode
Make sure sanitize_xhtml and sanitize_html are set to utf-8 encoding.
Also, a stylesheet tweak.
2007-06-07 17:30:42 -05:00
Jacques Distler
e1acebe6e4 Bugfix
Me stoopid.
2007-06-05 18:06:26 -05:00
Jacques Distler
bd8ba1f4b1 REXML Trees
Synced with latest HTML5lib.
Added preliminary support (currently disabled) for sanitizing REXML trees.
2007-06-05 16:34:49 -05:00
Jacques Distler
4dd70af5ae HTML5lib is Back.
Synced with latest version of HTML5lib, which fixes problem with Astral plane characters.
I should really do some tests, but the HTML5lib Sanitizer seems to be 2-5 times slower than the old sanitizer.
2007-05-30 10:45:52 -05:00
Jacques Distler
e1a6827f1f Rollback Switch to HTML5lib
Apparently, HTML5lib does not handle astral plane unicode characters correctly.
Which makes it useless.
Return to the previous sanitizer.
2007-05-29 23:57:39 -05:00
Jacques Distler
6b21ac484f HTML5lib Sanitizer
Replaced native Sanitizer with HTML5lib version.
Synced with latest Maruku.
2007-05-25 20:52:27 -05:00
Jacques Distler
b0e063451f Sanitize Tweak
Add 'cite' to the list of attributes whose values are URI's.
2007-04-28 02:09:21 -05:00
Jacques Distler
9b55a75570 More SVG Elements and Attributes
Added <tspan> and <marker>, as well as a slew of related SVG attributes.
Also an SVG-related stylesheet tweak
2007-04-27 21:52:29 -05:00
Jacques Distler
6ca6525ff7 Add another SVG attribute to Sanitize.
Add 'stroke-opacity' to list of allowed SVG attributes.
2007-04-20 16:09:55 -05:00
Jacques Distler
f208d50032 Bah! 2007-02-24 23:07:25 -06:00
Jacques Distler
507a17aade More lenient URI scheme matching in sanitize. 2007-02-24 22:47:31 -06:00
Jacques Distler
f9dcfa5af0 Make list of attributes whose values are scanned for acceptable URI schemes customizable. 2007-02-24 11:55:40 -06:00
Jacques Distler
d8e06f6db9 Sanitize URI schemes. 2007-02-23 13:34:58 -06:00
Jacques Distler
e179508377 Sanitization now preserves case-sensitive element and attribute names (necessary to support SVG).
Unit tests, galore.
2007-02-23 11:32:06 -06:00
Jacques Distler
2fa1e08c96 Tweak dependencies of sanitize.rb 2007-02-22 01:16:18 -06:00
Jacques Distler
bacae2c468 Finally! XSS-protection, done right.
If you want something done right, ...
2007-02-22 01:06:53 -06:00