Commit graph

175 commits

Author SHA1 Message Date
Jacques Distler ca1e8de89c Minor Cleanups
Remove a no-longer-needed function.
' -> &39;
Fix regexp for tag chunk.
2008-05-22 02:46:45 -05:00
Jacques Distler f6508de6dd Whoops!
In some circumstances, the new Sanitizer was double-escaping text nodes.
Fixed (with unit test).
2008-05-21 14:14:43 -05:00
Jacques Distler 45405fc97e New Sanitizer Goes Live
The new sanitizer seems to work well (cuts the time required
to produce the Instiki Atom feed in half). Our strategy is to
use HTML5lib for <nowiki> content, but to use the new sanitizer
for content that has been processed by Maruku (and hence is
well-formed).

The one broken unit test won't affect us (since it dealt with
very malformed HTML).
2008-05-21 02:06:31 -05:00
Jacques Distler 800880f382 Rough In New Sanitizer
Start work (which may not pan out) on a new sanitizer. Right now, it passes
all but 1 of the HTML5lib Sanitizer's unit tests. But it doesn't do much
of anything to ensure well-formedness. This is not an issue for Maruku-processed
content, but it is a concern for <nowiki> blocks.

(One solution would be to use the HTML5lib parser on <nowiki> blocks.)

In any case, this baby is 3 times as fast as the HTML5lib sanitizer.
2008-05-20 17:02:10 -05:00
Jacques Distler f8e74e53bd Rollback
The "optimization" of using arrays instead of regexps to
implement to_utf8 and is_utf8? (and their brethren) is 
actually no faster. Go back to the logically-clearer implementation.
2008-05-18 13:22:38 -05:00
Jacques Distler dfe22be5ff Minor tweak
This is slightly better.
2008-05-17 02:32:20 -05:00
Jacques Distler 41346bf8bd Efficiency: Entity handling
Previously, used a regexp to find and convert named entities in the content.
Now use a more efficient algorithm.
Similar tweak for converting NCRs before checking whether text is valid utf-8.
2008-05-17 01:43:11 -05:00
Jacques Distler 5ca0760f7c Efficiency: Sanitize Once
Envoke the HTML5lib Sanitizer just once (when the content is finally rendered),
rather than each time it passes through the chunk-handler.
2008-05-15 01:22:13 -05:00
Jacques Distler 6359d06ed1 Bug in Include Chunk-handler
Fix the chunk-handler for [[!include ...]] so that it behaves as expected.
2008-01-16 11:28:43 -06:00
Jacques Distler 4586614914 Misc Cleanup
Cleaned up some dependencies, and added a mime_types.yml file for Mongrel-compatibility.
2008-01-14 14:46:38 -06:00
Jacques Distler ebc409e1a0 Ensure the_content REALLY is utf-8
Our check that the the_content was valid utf-8 was rather busted.
This one works right. In particular, we needed to expand NCRs before checking.
2008-01-03 15:27:03 -06:00
Jacques Distler c8196cbe41 More Unicode Fun
From Philip Taylor (via Henri Sivonen): disallow U+fffe and U+ffff.
2008-01-01 22:00:07 -06:00
Jacques Distler 6873fc8026 Upgrade to Rails 2.0.2
Upgraded to Rails 2.0.2, except that we maintain

   vendor/rails/actionpack/lib/action_controller/routing.rb

from Rail 1.2.6 (at least for now), so that Routes don't change. We still
get to enjoy Rails's many new features.

Also fixed a bug in Chunk-handling: disable WikiWord processing in tags (for real this time).
2007-12-21 01:48:59 -06:00
Jacques Distler 0f6889e09f Fix Unicode bug
Fix Diego Restrepo's bug (see Rev 184).
Update to latest HTML5lib.
2007-12-17 03:17:43 -06:00
Jacques Distler 207fb1f7f2 New Version
Sync with Latest Instiki Trunk.
Migrate to Rails 1.2.5.
Bump version number.
2007-10-15 12:16:54 -05:00
Jacques Distler de125367b0 Update RDOC documentation.
Update the documentation for sanitize.rb, to match current behaviour.
2007-10-14 22:22:18 -05:00
Jacques Distler 1911d18f65 Performance
OK. This is a better way: define a custom TreeWalker which converts named entities to utf-8 as it goes. This avoids having to do an extra tree traversal in sanitize_rexml, AND avoids the trainwreck that is html5/inputstream.rb.
2007-10-14 21:07:46 -05:00
Jacques Distler 198d7847bd Performance
My REXML::Element.to_ncr (and REXML::Element.to_utf8) is horribly slow. For long documents, it proves more efficient to serialize to a string, apply String.to_ncr (or String.to_utf8) and then Sanitize the string.
2007-10-13 16:32:04 -05:00
Jacques Distler 5dd75d4cb0 File Upload Links
I like this a little better.
2007-10-09 23:56:55 -05:00
Jacques Distler fbdf4c5dfe Fix Broken Test
Was not picking up user-supplied alt text in [[filename|Alt text:pic]].
Fixed.
2007-10-09 11:02:44 -05:00
Jacques Distler 0eb723e125 Accessibility: Use Uploaded File Descriptions
The file upload dialog asks for a description of the image or file to be uploaded. Use this as the default alt-text for the image and as a title attribute for a file link.
2007-10-09 02:51:38 -05:00
Jacques Distler be8bb3d06d InterWeb Links
From Jason Blevins:  [[Web Name:Page Name]] or [[Web Name:Page Name|alternate label]] produce inter-Web links on the same Instiki installation.
2007-10-06 16:04:11 -05:00
Jacques Distler 3a3cfeaa9b Drop URI Chunk-handling
The URIChunk and LocalURICunk handlers were

1) Slow
2) Buggy (prone to produce ill-formed pages in edge cases)
3) Of dubious utility

So I ditched them. No auto-linked URLs, but who cares?
2007-10-05 16:25:41 -05:00
Jacques Distler 08857ebe8e Fix Markdown (non-math) Engine, Tweak Themes
More tweaks to the supplied S5 themes.
Fixed a minor regression in the non-Math Markdown engine.
2007-09-14 18:09:24 -05:00
Jacques Distler 54aada824c Use Standard PageRenderer for S5 Content
From Jason Blevins: use the standard PageRenderer class to render S5 content. This way, WikiWords (etc) are processed in S5 slideshows.
2007-09-14 10:43:03 -05:00
Jacques Distler 119ab342dc Security: Sanitize <nowiki>
Another XSS hole: the contents of <nowiki>...</nowiki> was not being sanitized.
2007-09-10 22:35:50 -05:00
Jacques Distler 9035c98dc5 Bugfix: Category listings
Fixed bug where clicking on a category link would stomp on the "All Pages" listing.
2007-09-09 23:20:06 -05:00
Jacques Distler 5b182bd228 HTML5lib Bug
Fixed a bug in the HTML5lib tokenizer (affects S5 slideshows).
Some miscellaneous code cleanup. In particular, don't bother with zapping control characters;
instead, rely on is_utf8? method to raise an exception (which we do anyway).
2007-09-06 10:40:48 -05:00
Jacques Distler 5ff1b7f6da XSS Security Fix
There  was a XSS vulnerability in the handling of categories. Now they are escaped.
2007-09-02 00:33:28 -05:00
Jacques Distler 6fd6be8fea Sanitizer Fix
Whoops! Looks like Ryan changed the API for the HTML5 sanitizer. Bad, bad, bad.
Fixed now.
2007-08-30 16:06:20 -05:00
Jacques Distler 1bc5da0053 Use XHTMLSerializer, where appropriate. 2007-07-04 18:53:03 -05:00
Jacques Distler 8ccaad85a5 Sync with latest HTML5lib and latest Maruku 2007-07-04 17:36:59 -05:00
Jacques Distler 3de374d6c1 More fixes, sync with HTML5lib
Do a better job with the wrapper <div>s added by xhtmldiff and Maruku's to_html_tree method.
More tests fixed.
2007-06-13 23:05:15 -05:00
Jacques Distler 3ca33e52b5 Cleanup
Got rid of redcloth_for_tex.
Fixed almost all the busted tests.
2007-06-13 01:56:44 -05:00
Jacques Distler 2da672ec5b Many Minor Fixes
Fixed a whole bunch of minor stuff.
Had a go at getting some of the plethora of broken tests to pass.
2007-06-12 17:37:55 -05:00
Jacques Distler a68d1aa8f3 Sanitizer API documentation now online
See:
   http://golem.ph.utexas.edu/~distler/code/rdoc/sanitize/
2007-06-08 23:51:30 -05:00
Jacques Distler f818238dd3 Consolidation
Shuffled around a couple of files.
2007-06-08 22:39:37 -05:00
Jacques Distler 3bf560c3b3 Updated to Latest HTML5lib
Synced with latest HTML5lib.
Added some RDoc-compatible documentation to the sanitizer.
2007-06-08 17:26:00 -05:00
Jacques Distler 8badd0766a Enhancements to sanitize.rb
Options, options, ... options.
2007-06-08 01:23:09 -05:00
Jacques Distler 0298868573 Fix S5 Unicode
Make sure sanitize_xhtml and sanitize_html are set to utf-8 encoding.
Also, a stylesheet tweak.
2007-06-07 17:30:42 -05:00
Jacques Distler e1acebe6e4 Bugfix
Me stoopid.
2007-06-05 18:06:26 -05:00
Jacques Distler f0cf0ec625 Sanitize REML trees
OK. Enabled sanitization of rexml trees instead of strings.
My timing tests seem to be erratic. Can't tell whether this is really faster.
2007-06-05 17:13:44 -05:00
Jacques Distler bd8ba1f4b1 REXML Trees
Synced with latest HTML5lib.
Added preliminary support (currently disabled) for sanitizing REXML trees.
2007-06-05 16:34:49 -05:00
Jacques Distler 4dd70af5ae HTML5lib is Back.
Synced with latest version of HTML5lib, which fixes problem with Astral plane characters.
I should really do some tests, but the HTML5lib Sanitizer seems to be 2-5 times slower than the old sanitizer.
2007-05-30 10:45:52 -05:00
Jacques Distler e1a6827f1f Rollback Switch to HTML5lib
Apparently, HTML5lib does not handle astral plane unicode characters correctly.
Which makes it useless.
Return to the previous sanitizer.
2007-05-29 23:57:39 -05:00
Jacques Distler 6b21ac484f HTML5lib Sanitizer
Replaced native Sanitizer with HTML5lib version.
Synced with latest Maruku.
2007-05-25 20:52:27 -05:00
Jacques Distler b0e063451f Sanitize Tweak
Add 'cite' to the list of attributes whose values are URI's.
2007-04-28 02:09:21 -05:00
Jacques Distler 9b55a75570 More SVG Elements and Attributes
Added <tspan> and <marker>, as well as a slew of related SVG attributes.
Also an SVG-related stylesheet tweak
2007-04-27 21:52:29 -05:00
Jacques Distler 6ca6525ff7 Add another SVG attribute to Sanitize.
Add 'stroke-opacity' to list of allowed SVG attributes.
2007-04-20 16:09:55 -05:00
Jacques Distler 0db06a9fa3 To be really XML-safe, don't emit XHTML+MathML named entities. (Ported MathML::Entities to Ruby.) 2007-03-29 03:30:10 -05:00
Jacques Distler 7adac51d6d Sync with latest Instiki trunk. Changes:
1) Upgrade Rails to 1.2.3
2) Revert RedCloth to previous version (who %#$@ cares?)
3) Preserve the Rails Security fix  to vendor/rails/actionpack/lib/action_controller/caching.rb from Revision 80.
2007-03-18 11:56:12 -05:00
Jacques Distler d74116dc67 Ensure that input is bona fide utf-8. 2007-03-07 21:06:39 -06:00
Jacques Distler f208d50032 Bah! 2007-02-24 23:07:25 -06:00
Jacques Distler 507a17aade More lenient URI scheme matching in sanitize. 2007-02-24 22:47:31 -06:00
Jacques Distler f9dcfa5af0 Make list of attributes whose values are scanned for acceptable URI schemes customizable. 2007-02-24 11:55:40 -06:00
Jacques Distler d8e06f6db9 Sanitize URI schemes. 2007-02-23 13:34:58 -06:00
Jacques Distler e179508377 Sanitization now preserves case-sensitive element and attribute names (necessary to support SVG).
Unit tests, galore.
2007-02-23 11:32:06 -06:00
Jacques Distler 2fa1e08c96 Tweak dependencies of sanitize.rb 2007-02-22 01:16:18 -06:00
Jacques Distler bacae2c468 Finally! XSS-protection, done right.
If you want something done right, ...
2007-02-22 01:06:53 -06:00
Jacques Distler 0aafedb2df More XSS fixes.
Started fixing file uploads.
2007-02-21 12:10:47 -06:00
Jacques Distler 88c6f27e14 Bah! *Someone* will care about those other Text-filters. 2007-02-20 08:18:48 -06:00
Jacques Distler e727507ac8 Zap gremlins.
Close cross-site scripting hole.
2007-02-19 23:15:39 -06:00
Jacques Distler fc15848517 Configure equation-numbering as we like it. 2007-02-14 22:19:37 -06:00
Jacques Distler ff63e894b2 Sync with latest Maruku.
Finally able to ditch BlueCloth completely.
2007-02-14 20:32:24 -06:00
Jacques Distler d4b947462b Whoops! Missed one. 2007-02-10 23:17:16 -06:00
Jacques Distler 63e217bcfd Moved Maruku (and its dependencies) and XHTMLDiff (and its dependencies) to vendor/plugins/ .
Synced with Instiki SVN.
2007-02-10 23:03:15 -06:00
Jacques Distler 0ac586ee25 Sync with latest Maruku. 2007-02-04 19:36:33 -06:00
Jacques Distler 8c52f28864 Replaced diff.rb with xhtmldiff.rb, which (unlike its predecessor) produces well-formed redline documents. 2007-02-03 22:52:48 -06:00
Jacques Distler 86e9c70a26 Fix regression in Maruku. 2007-02-02 01:00:02 -06:00
Jacques Distler f406318168 Sync with Maruku. 2007-01-24 17:14:50 -06:00
Jacques Distler 488dd334f7 Support for IE+MathPlayer.
Sync with latest Maruku.
2007-01-24 10:53:10 -06:00
Jacques Distler 1c05a94d1b Updated to latest Maruku. 2007-01-23 09:26:45 -06:00
Jacques Distler ceb0931bb3 Sync to lastest Maruku. Tweak to CSS stylesheet. 2007-01-22 11:34:51 -06:00
Jacques Distler b19e1e4f47 Bring up to current. 2007-01-22 08:36:51 -06:00
Jacques Distler 69b62b6f33 Checkout of Instiki Trunk 1/21/2007. 2007-01-22 07:43:50 -06:00