Jacques Distler
ebc409e1a0
Ensure the_content REALLY is utf-8
...
Our check that the the_content was valid utf-8 was rather busted.
This one works right. In particular, we needed to expand NCRs before checking.
2008-01-03 15:27:03 -06:00
Jacques Distler
c8196cbe41
More Unicode Fun
...
From Philip Taylor (via Henri Sivonen): disallow U+fffe and U+ffff.
2008-01-01 22:00:07 -06:00
Jacques Distler
de125367b0
Update RDOC documentation.
...
Update the documentation for sanitize.rb, to match current behaviour.
2007-10-14 22:22:18 -05:00
Jacques Distler
1911d18f65
Performance
...
OK. This is a better way: define a custom TreeWalker which converts named entities to utf-8 as it goes. This avoids having to do an extra tree traversal in sanitize_rexml, AND avoids the trainwreck that is html5/inputstream.rb.
2007-10-14 21:07:46 -05:00
Jacques Distler
198d7847bd
Performance
...
My REXML::Element.to_ncr (and REXML::Element.to_utf8) is horribly slow. For long documents, it proves more efficient to serialize to a string, apply String.to_ncr (or String.to_utf8) and then Sanitize the string.
2007-10-13 16:32:04 -05:00
Jacques Distler
6fd6be8fea
Sanitizer Fix
...
Whoops! Looks like Ryan changed the API for the HTML5 sanitizer. Bad, bad, bad.
Fixed now.
2007-08-30 16:06:20 -05:00
Jacques Distler
1bc5da0053
Use XHTMLSerializer, where appropriate.
2007-07-04 18:53:03 -05:00
Jacques Distler
8ccaad85a5
Sync with latest HTML5lib and latest Maruku
2007-07-04 17:36:59 -05:00
Jacques Distler
2da672ec5b
Many Minor Fixes
...
Fixed a whole bunch of minor stuff.
Had a go at getting some of the plethora of broken tests to pass.
2007-06-12 17:37:55 -05:00
Jacques Distler
a68d1aa8f3
Sanitizer API documentation now online
...
See:
http://golem.ph.utexas.edu/~distler/code/rdoc/sanitize/
2007-06-08 23:51:30 -05:00
Jacques Distler
f818238dd3
Consolidation
...
Shuffled around a couple of files.
2007-06-08 22:39:37 -05:00
Jacques Distler
3bf560c3b3
Updated to Latest HTML5lib
...
Synced with latest HTML5lib.
Added some RDoc-compatible documentation to the sanitizer.
2007-06-08 17:26:00 -05:00
Jacques Distler
8badd0766a
Enhancements to sanitize.rb
...
Options, options, ... options.
2007-06-08 01:23:09 -05:00
Jacques Distler
0298868573
Fix S5 Unicode
...
Make sure sanitize_xhtml and sanitize_html are set to utf-8 encoding.
Also, a stylesheet tweak.
2007-06-07 17:30:42 -05:00
Jacques Distler
e1acebe6e4
Bugfix
...
Me stoopid.
2007-06-05 18:06:26 -05:00
Jacques Distler
bd8ba1f4b1
REXML Trees
...
Synced with latest HTML5lib.
Added preliminary support (currently disabled) for sanitizing REXML trees.
2007-06-05 16:34:49 -05:00
Jacques Distler
4dd70af5ae
HTML5lib is Back.
...
Synced with latest version of HTML5lib, which fixes problem with Astral plane characters.
I should really do some tests, but the HTML5lib Sanitizer seems to be 2-5 times slower than the old sanitizer.
2007-05-30 10:45:52 -05:00
Jacques Distler
e1a6827f1f
Rollback Switch to HTML5lib
...
Apparently, HTML5lib does not handle astral plane unicode characters correctly.
Which makes it useless.
Return to the previous sanitizer.
2007-05-29 23:57:39 -05:00
Jacques Distler
6b21ac484f
HTML5lib Sanitizer
...
Replaced native Sanitizer with HTML5lib version.
Synced with latest Maruku.
2007-05-25 20:52:27 -05:00
Jacques Distler
b0e063451f
Sanitize Tweak
...
Add 'cite' to the list of attributes whose values are URI's.
2007-04-28 02:09:21 -05:00
Jacques Distler
9b55a75570
More SVG Elements and Attributes
...
Added <tspan> and <marker>, as well as a slew of related SVG attributes.
Also an SVG-related stylesheet tweak
2007-04-27 21:52:29 -05:00
Jacques Distler
6ca6525ff7
Add another SVG attribute to Sanitize.
...
Add 'stroke-opacity' to list of allowed SVG attributes.
2007-04-20 16:09:55 -05:00
Jacques Distler
f208d50032
Bah!
2007-02-24 23:07:25 -06:00
Jacques Distler
507a17aade
More lenient URI scheme matching in sanitize.
2007-02-24 22:47:31 -06:00
Jacques Distler
f9dcfa5af0
Make list of attributes whose values are scanned for acceptable URI schemes customizable.
2007-02-24 11:55:40 -06:00
Jacques Distler
d8e06f6db9
Sanitize URI schemes.
2007-02-23 13:34:58 -06:00
Jacques Distler
e179508377
Sanitization now preserves case-sensitive element and attribute names (necessary to support SVG).
...
Unit tests, galore.
2007-02-23 11:32:06 -06:00
Jacques Distler
2fa1e08c96
Tweak dependencies of sanitize.rb
2007-02-22 01:16:18 -06:00
Jacques Distler
bacae2c468
Finally! XSS-protection, done right.
...
If you want something done right, ...
2007-02-22 01:06:53 -06:00