Commit graph

172 commits

Author SHA1 Message Date
Jacques Distler 800880f382 Rough In New Sanitizer
Start work (which may not pan out) on a new sanitizer. Right now, it passes
all but 1 of the HTML5lib Sanitizer's unit tests. But it doesn't do much
of anything to ensure well-formedness. This is not an issue for Maruku-processed
content, but it is a concern for <nowiki> blocks.

(One solution would be to use the HTML5lib parser on <nowiki> blocks.)

In any case, this baby is 3 times as fast as the HTML5lib sanitizer.
2008-05-20 17:02:10 -05:00
Jacques Distler f8e74e53bd Rollback
The "optimization" of using arrays instead of regexps to
implement to_utf8 and is_utf8? (and their brethren) is 
actually no faster. Go back to the logically-clearer implementation.
2008-05-18 13:22:38 -05:00
Jacques Distler dfe22be5ff Minor tweak
This is slightly better.
2008-05-17 02:32:20 -05:00
Jacques Distler 41346bf8bd Efficiency: Entity handling
Previously, used a regexp to find and convert named entities in the content.
Now use a more efficient algorithm.
Similar tweak for converting NCRs before checking whether text is valid utf-8.
2008-05-17 01:43:11 -05:00
Jacques Distler 5ca0760f7c Efficiency: Sanitize Once
Envoke the HTML5lib Sanitizer just once (when the content is finally rendered),
rather than each time it passes through the chunk-handler.
2008-05-15 01:22:13 -05:00
Jacques Distler 6359d06ed1 Bug in Include Chunk-handler
Fix the chunk-handler for [[!include ...]] so that it behaves as expected.
2008-01-16 11:28:43 -06:00
Jacques Distler 4586614914 Misc Cleanup
Cleaned up some dependencies, and added a mime_types.yml file for Mongrel-compatibility.
2008-01-14 14:46:38 -06:00
Jacques Distler ebc409e1a0 Ensure the_content REALLY is utf-8
Our check that the the_content was valid utf-8 was rather busted.
This one works right. In particular, we needed to expand NCRs before checking.
2008-01-03 15:27:03 -06:00
Jacques Distler c8196cbe41 More Unicode Fun
From Philip Taylor (via Henri Sivonen): disallow U+fffe and U+ffff.
2008-01-01 22:00:07 -06:00
Jacques Distler 6873fc8026 Upgrade to Rails 2.0.2
Upgraded to Rails 2.0.2, except that we maintain

   vendor/rails/actionpack/lib/action_controller/routing.rb

from Rail 1.2.6 (at least for now), so that Routes don't change. We still
get to enjoy Rails's many new features.

Also fixed a bug in Chunk-handling: disable WikiWord processing in tags (for real this time).
2007-12-21 01:48:59 -06:00
Jacques Distler 0f6889e09f Fix Unicode bug
Fix Diego Restrepo's bug (see Rev 184).
Update to latest HTML5lib.
2007-12-17 03:17:43 -06:00
Jacques Distler 207fb1f7f2 New Version
Sync with Latest Instiki Trunk.
Migrate to Rails 1.2.5.
Bump version number.
2007-10-15 12:16:54 -05:00
Jacques Distler de125367b0 Update RDOC documentation.
Update the documentation for sanitize.rb, to match current behaviour.
2007-10-14 22:22:18 -05:00
Jacques Distler 1911d18f65 Performance
OK. This is a better way: define a custom TreeWalker which converts named entities to utf-8 as it goes. This avoids having to do an extra tree traversal in sanitize_rexml, AND avoids the trainwreck that is html5/inputstream.rb.
2007-10-14 21:07:46 -05:00
Jacques Distler 198d7847bd Performance
My REXML::Element.to_ncr (and REXML::Element.to_utf8) is horribly slow. For long documents, it proves more efficient to serialize to a string, apply String.to_ncr (or String.to_utf8) and then Sanitize the string.
2007-10-13 16:32:04 -05:00
Jacques Distler 5dd75d4cb0 File Upload Links
I like this a little better.
2007-10-09 23:56:55 -05:00
Jacques Distler fbdf4c5dfe Fix Broken Test
Was not picking up user-supplied alt text in [[filename|Alt text:pic]].
Fixed.
2007-10-09 11:02:44 -05:00
Jacques Distler 0eb723e125 Accessibility: Use Uploaded File Descriptions
The file upload dialog asks for a description of the image or file to be uploaded. Use this as the default alt-text for the image and as a title attribute for a file link.
2007-10-09 02:51:38 -05:00
Jacques Distler be8bb3d06d InterWeb Links
From Jason Blevins:  [[Web Name:Page Name]] or [[Web Name:Page Name|alternate label]] produce inter-Web links on the same Instiki installation.
2007-10-06 16:04:11 -05:00
Jacques Distler 3a3cfeaa9b Drop URI Chunk-handling
The URIChunk and LocalURICunk handlers were

1) Slow
2) Buggy (prone to produce ill-formed pages in edge cases)
3) Of dubious utility

So I ditched them. No auto-linked URLs, but who cares?
2007-10-05 16:25:41 -05:00
Jacques Distler 08857ebe8e Fix Markdown (non-math) Engine, Tweak Themes
More tweaks to the supplied S5 themes.
Fixed a minor regression in the non-Math Markdown engine.
2007-09-14 18:09:24 -05:00
Jacques Distler 54aada824c Use Standard PageRenderer for S5 Content
From Jason Blevins: use the standard PageRenderer class to render S5 content. This way, WikiWords (etc) are processed in S5 slideshows.
2007-09-14 10:43:03 -05:00
Jacques Distler 119ab342dc Security: Sanitize <nowiki>
Another XSS hole: the contents of <nowiki>...</nowiki> was not being sanitized.
2007-09-10 22:35:50 -05:00
Jacques Distler 9035c98dc5 Bugfix: Category listings
Fixed bug where clicking on a category link would stomp on the "All Pages" listing.
2007-09-09 23:20:06 -05:00
Jacques Distler 5b182bd228 HTML5lib Bug
Fixed a bug in the HTML5lib tokenizer (affects S5 slideshows).
Some miscellaneous code cleanup. In particular, don't bother with zapping control characters;
instead, rely on is_utf8? method to raise an exception (which we do anyway).
2007-09-06 10:40:48 -05:00
Jacques Distler 5ff1b7f6da XSS Security Fix
There  was a XSS vulnerability in the handling of categories. Now they are escaped.
2007-09-02 00:33:28 -05:00
Jacques Distler 6fd6be8fea Sanitizer Fix
Whoops! Looks like Ryan changed the API for the HTML5 sanitizer. Bad, bad, bad.
Fixed now.
2007-08-30 16:06:20 -05:00
Jacques Distler 1bc5da0053 Use XHTMLSerializer, where appropriate. 2007-07-04 18:53:03 -05:00
Jacques Distler 8ccaad85a5 Sync with latest HTML5lib and latest Maruku 2007-07-04 17:36:59 -05:00
Jacques Distler 3de374d6c1 More fixes, sync with HTML5lib
Do a better job with the wrapper <div>s added by xhtmldiff and Maruku's to_html_tree method.
More tests fixed.
2007-06-13 23:05:15 -05:00
Jacques Distler 3ca33e52b5 Cleanup
Got rid of redcloth_for_tex.
Fixed almost all the busted tests.
2007-06-13 01:56:44 -05:00
Jacques Distler 2da672ec5b Many Minor Fixes
Fixed a whole bunch of minor stuff.
Had a go at getting some of the plethora of broken tests to pass.
2007-06-12 17:37:55 -05:00
Jacques Distler a68d1aa8f3 Sanitizer API documentation now online
See:
   http://golem.ph.utexas.edu/~distler/code/rdoc/sanitize/
2007-06-08 23:51:30 -05:00
Jacques Distler f818238dd3 Consolidation
Shuffled around a couple of files.
2007-06-08 22:39:37 -05:00
Jacques Distler 3bf560c3b3 Updated to Latest HTML5lib
Synced with latest HTML5lib.
Added some RDoc-compatible documentation to the sanitizer.
2007-06-08 17:26:00 -05:00
Jacques Distler 8badd0766a Enhancements to sanitize.rb
Options, options, ... options.
2007-06-08 01:23:09 -05:00
Jacques Distler 0298868573 Fix S5 Unicode
Make sure sanitize_xhtml and sanitize_html are set to utf-8 encoding.
Also, a stylesheet tweak.
2007-06-07 17:30:42 -05:00
Jacques Distler e1acebe6e4 Bugfix
Me stoopid.
2007-06-05 18:06:26 -05:00
Jacques Distler f0cf0ec625 Sanitize REML trees
OK. Enabled sanitization of rexml trees instead of strings.
My timing tests seem to be erratic. Can't tell whether this is really faster.
2007-06-05 17:13:44 -05:00
Jacques Distler bd8ba1f4b1 REXML Trees
Synced with latest HTML5lib.
Added preliminary support (currently disabled) for sanitizing REXML trees.
2007-06-05 16:34:49 -05:00
Jacques Distler 4dd70af5ae HTML5lib is Back.
Synced with latest version of HTML5lib, which fixes problem with Astral plane characters.
I should really do some tests, but the HTML5lib Sanitizer seems to be 2-5 times slower than the old sanitizer.
2007-05-30 10:45:52 -05:00
Jacques Distler e1a6827f1f Rollback Switch to HTML5lib
Apparently, HTML5lib does not handle astral plane unicode characters correctly.
Which makes it useless.
Return to the previous sanitizer.
2007-05-29 23:57:39 -05:00
Jacques Distler 6b21ac484f HTML5lib Sanitizer
Replaced native Sanitizer with HTML5lib version.
Synced with latest Maruku.
2007-05-25 20:52:27 -05:00
Jacques Distler b0e063451f Sanitize Tweak
Add 'cite' to the list of attributes whose values are URI's.
2007-04-28 02:09:21 -05:00
Jacques Distler 9b55a75570 More SVG Elements and Attributes
Added <tspan> and <marker>, as well as a slew of related SVG attributes.
Also an SVG-related stylesheet tweak
2007-04-27 21:52:29 -05:00
Jacques Distler 6ca6525ff7 Add another SVG attribute to Sanitize.
Add 'stroke-opacity' to list of allowed SVG attributes.
2007-04-20 16:09:55 -05:00
Jacques Distler 0db06a9fa3 To be really XML-safe, don't emit XHTML+MathML named entities. (Ported MathML::Entities to Ruby.) 2007-03-29 03:30:10 -05:00
Jacques Distler 7adac51d6d Sync with latest Instiki trunk. Changes:
1) Upgrade Rails to 1.2.3
2) Revert RedCloth to previous version (who %#$@ cares?)
3) Preserve the Rails Security fix  to vendor/rails/actionpack/lib/action_controller/caching.rb from Revision 80.
2007-03-18 11:56:12 -05:00
Jacques Distler d74116dc67 Ensure that input is bona fide utf-8. 2007-03-07 21:06:39 -06:00
Jacques Distler f208d50032 Bah! 2007-02-24 23:07:25 -06:00
Jacques Distler 507a17aade More lenient URI scheme matching in sanitize. 2007-02-24 22:47:31 -06:00
Jacques Distler f9dcfa5af0 Make list of attributes whose values are scanned for acceptable URI schemes customizable. 2007-02-24 11:55:40 -06:00
Jacques Distler d8e06f6db9 Sanitize URI schemes. 2007-02-23 13:34:58 -06:00
Jacques Distler e179508377 Sanitization now preserves case-sensitive element and attribute names (necessary to support SVG).
Unit tests, galore.
2007-02-23 11:32:06 -06:00
Jacques Distler 2fa1e08c96 Tweak dependencies of sanitize.rb 2007-02-22 01:16:18 -06:00
Jacques Distler bacae2c468 Finally! XSS-protection, done right.
If you want something done right, ...
2007-02-22 01:06:53 -06:00
Jacques Distler 0aafedb2df More XSS fixes.
Started fixing file uploads.
2007-02-21 12:10:47 -06:00
Jacques Distler 88c6f27e14 Bah! *Someone* will care about those other Text-filters. 2007-02-20 08:18:48 -06:00
Jacques Distler e727507ac8 Zap gremlins.
Close cross-site scripting hole.
2007-02-19 23:15:39 -06:00
Jacques Distler fc15848517 Configure equation-numbering as we like it. 2007-02-14 22:19:37 -06:00
Jacques Distler ff63e894b2 Sync with latest Maruku.
Finally able to ditch BlueCloth completely.
2007-02-14 20:32:24 -06:00
Jacques Distler d4b947462b Whoops! Missed one. 2007-02-10 23:17:16 -06:00
Jacques Distler 63e217bcfd Moved Maruku (and its dependencies) and XHTMLDiff (and its dependencies) to vendor/plugins/ .
Synced with Instiki SVN.
2007-02-10 23:03:15 -06:00
Jacques Distler 0ac586ee25 Sync with latest Maruku. 2007-02-04 19:36:33 -06:00
Jacques Distler 8c52f28864 Replaced diff.rb with xhtmldiff.rb, which (unlike its predecessor) produces well-formed redline documents. 2007-02-03 22:52:48 -06:00
Jacques Distler 86e9c70a26 Fix regression in Maruku. 2007-02-02 01:00:02 -06:00
Jacques Distler f406318168 Sync with Maruku. 2007-01-24 17:14:50 -06:00
Jacques Distler 488dd334f7 Support for IE+MathPlayer.
Sync with latest Maruku.
2007-01-24 10:53:10 -06:00
Jacques Distler 1c05a94d1b Updated to latest Maruku. 2007-01-23 09:26:45 -06:00
Jacques Distler ceb0931bb3 Sync to lastest Maruku. Tweak to CSS stylesheet. 2007-01-22 11:34:51 -06:00
Jacques Distler b19e1e4f47 Bring up to current. 2007-01-22 08:36:51 -06:00
Jacques Distler 69b62b6f33 Checkout of Instiki Trunk 1/21/2007. 2007-01-22 07:43:50 -06:00