instiki

Author	SHA1	Message	Date
Jacques Distler	e0df6c8a6a	Updated Tests and Sanitizer Fixes for Revision 439	2009-09-25 15:59:43 -05:00
Jacques Distler	513b2b16c1	Better. Put the "safe" XHTML sanitization in lib/sanitize.rb, rather than in lib/chunks/nowiki.rb. D'oh!	2008-12-01 10:29:46 -06:00
Jacques Distler	2e81ca2d30	Rails 2.2.2. Updated to Rails 2.2.2. Added a couple more Ruby 1.9 fixes, but that's pretty much at a standstill, until one gets Maruku and HTML5lib working right under Ruby 1.9.	2008-11-24 15:53:39 -06:00
Jacques Distler	800880f382	Rough In New Sanitizer. Start work (which may not pan out) on a new sanitizer. Right now, it passes all but 1 of the HTML5lib Sanitizer's unit tests. But it doesn't do much of anything to ensure well-formedness. This is not an issue for Maruku-processed content, but it is a concern for <nowiki> blocks. (One solution would be to use the HTML5lib parser on <nowiki> blocks.) In any case, this baby is 3 times as fast as the HTML5lib sanitizer.	2008-05-20 17:02:10 -05:00
Jacques Distler	f8e74e53bd	Rollback. The "optimization" of using arrays instead of regexps to implement to_utf8 and is_utf8? (and their brethren) is actually no faster. Go back to the logically-clearer implementation.	2008-05-18 13:22:38 -05:00
Jacques Distler	dfe22be5ff	Minor tweak. This is slightly better.	2008-05-17 02:32:20 -05:00
Jacques Distler	41346bf8bd	Efficiency: Entity handling. Previously, used a regexp to find and convert named entities in the content. Now use a more efficient algorithm. Similar tweak for converting NCRs before checking whether text is valid utf-8.	2008-05-17 01:43:11 -05:00
Jacques Distler	ebc409e1a0	Ensure the_content REALLY is utf-8. Our check that the the_content was valid utf-8 was rather busted. This one works right. In particular, we needed to expand NCRs before checking.	2008-01-03 15:27:03 -06:00
Jacques Distler	c8196cbe41	More Unicode Fun. From Philip Taylor (via Henri Sivonen): disallow U+fffe and U+ffff.	2008-01-01 22:00:07 -06:00
Jacques Distler	de125367b0	Update RDOC documentation. Update the documentation for sanitize.rb, to match current behaviour.	2007-10-14 22:22:18 -05:00
Jacques Distler	1911d18f65	Performance. OK. This is a better way: define a custom TreeWalker which converts named entities to utf-8 as it goes. This avoids having to do an extra tree traversal in sanitize_rexml, AND avoids the trainwreck that is html5/inputstream.rb.	2007-10-14 21:07:46 -05:00
Jacques Distler	198d7847bd	Performance. My REXML::Element.to_ncr (and REXML::Element.to_utf8) is horribly slow. For long documents, it proves more efficient to serialize to a string, apply String.to_ncr (or String.to_utf8) and then Sanitize the string.	2007-10-13 16:32:04 -05:00
Jacques Distler	6fd6be8fea	Sanitizer Fix. Whoops! Looks like Ryan changed the API for the HTML5 sanitizer. Bad, bad, bad. Fixed now.	2007-08-30 16:06:20 -05:00
Jacques Distler	1bc5da0053	Use XHTMLSerializer, where appropriate.	2007-07-04 18:53:03 -05:00
Jacques Distler	8ccaad85a5	Sync with latest HTML5lib and latest Maruku	2007-07-04 17:36:59 -05:00
Jacques Distler	2da672ec5b	Many Minor Fixes. Fixed a whole bunch of minor stuff. Had a go at getting some of the plethora of broken tests to pass.	2007-06-12 17:37:55 -05:00
Jacques Distler	a68d1aa8f3	Sanitizer API documentation now online. See: http://golem.ph.utexas.edu/~distler/code/rdoc/sanitize/	2007-06-08 23:51:30 -05:00
Jacques Distler	f818238dd3	Consolidation. Shuffled around a couple of files.	2007-06-08 22:39:37 -05:00
Jacques Distler	3bf560c3b3	Updated to Latest HTML5lib. Synced with latest HTML5lib. Added some RDoc-compatible documentation to the sanitizer.	2007-06-08 17:26:00 -05:00
Jacques Distler	8badd0766a	Enhancements to sanitize.rb. Options, options, ... options.	2007-06-08 01:23:09 -05:00
Jacques Distler	0298868573	Fix S5 Unicode. Make sure sanitize_xhtml and sanitize_html are set to utf-8 encoding. Also, a stylesheet tweak.	2007-06-07 17:30:42 -05:00
Jacques Distler	e1acebe6e4	Bugfix. Me stoopid.	2007-06-05 18:06:26 -05:00
Jacques Distler	bd8ba1f4b1	REXML Trees. Synced with latest HTML5lib. Added preliminary support (currently disabled) for sanitizing REXML trees.	2007-06-05 16:34:49 -05:00
Jacques Distler	4dd70af5ae	HTML5lib is Back. Synced with latest version of HTML5lib, which fixes problem with Astral plane characters. I should really do some tests, but the HTML5lib Sanitizer seems to be 2-5 times slower than the old sanitizer.	2007-05-30 10:45:52 -05:00
Jacques Distler	e1a6827f1f	Rollback Switch to HTML5lib. Apparently, HTML5lib does not handle astral plane unicode characters correctly. Which makes it useless. Return to the previous sanitizer.	2007-05-29 23:57:39 -05:00
Jacques Distler	6b21ac484f	HTML5lib Sanitizer. Replaced native Sanitizer with HTML5lib version. Synced with latest Maruku.	2007-05-25 20:52:27 -05:00
Jacques Distler	b0e063451f	Sanitize Tweak. Add 'cite' to the list of attributes whose values are URI's.	2007-04-28 02:09:21 -05:00
Jacques Distler	9b55a75570	More SVG Elements and Attributes. Added <tspan> and <marker>, as well as a slew of related SVG attributes. Also an SVG-related stylesheet tweak.	2007-04-27 21:52:29 -05:00
Jacques Distler	6ca6525ff7	Add another SVG attribute to Sanitize. Add 'stroke-opacity' to list of allowed SVG attributes.	2007-04-20 16:09:55 -05:00
Jacques Distler	f208d50032	Bah!	2007-02-24 23:07:25 -06:00
Jacques Distler	507a17aade	More lenient URI scheme matching in sanitize.	2007-02-24 22:47:31 -06:00
Jacques Distler	f9dcfa5af0	Make list of attributes whose values are scanned for acceptable URI schemes customizable.	2007-02-24 11:55:40 -06:00
Jacques Distler	d8e06f6db9	Sanitize URI schemes.	2007-02-23 13:34:58 -06:00
Jacques Distler	e179508377	Sanitization now preserves case-sensitive element and attribute names (necessary to support SVG). Unit tests, galore.	2007-02-23 11:32:06 -06:00
Jacques Distler	2fa1e08c96	Tweak dependencies of sanitize.rb	2007-02-22 01:16:18 -06:00
Jacques Distler	bacae2c468	Finally! XSS-protection, done right. If you want something done right, ...	2007-02-22 01:06:53 -06:00

36 commits
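
Two of the commits above (ebc409e1a0 and c8196cbe41) describe expanding numeric character references (NCRs) before checking that text is valid utf-8, and disallowing U+fffe and U+ffff. The following is only a rough sketch of that idea, not the code from lib/sanitize.rb: the names NCR_PATTERN, expand_ncrs and acceptable_utf8? are hypothetical, and the sketch assumes a modern Ruby with encoding-aware strings rather than the Ruby 1.8 the commits were written for.

```ruby
# Hypothetical helpers for illustration; not taken from lib/sanitize.rb.

# Matches decimal (&#233;) and hexadecimal (&#xE9;) character references.
NCR_PATTERN = /&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/

# Replace each NCR with the character it denotes.
# Integer#chr raises RangeError for code points that UTF-8 cannot encode.
def expand_ncrs(text)
  text.gsub(NCR_PATTERN) do
    hex, dec = Regexp.last_match.captures
    code = hex ? hex.hex : dec.to_i
    code.chr(Encoding::UTF_8)
  end
end

# True only if the raw bytes are valid UTF-8 and, once NCRs are expanded,
# the text contains neither U+FFFE nor U+FFFF.
def acceptable_utf8?(text)
  utf8 = text.dup.force_encoding(Encoding::UTF_8)
  return false unless utf8.valid_encoding?

  expanded = expand_ncrs(utf8)
  !(expanded.include?("\uFFFE") || expanded.include?("\uFFFF"))
rescue RangeError
  # An NCR named a code point that cannot be encoded; treat as unacceptable.
  false
end
```

Expanding first matters because a string such as "&#xFFFE;" is plain ASCII and would pass a byte-level validity check, even though it denotes a disallowed character.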