instiki

Author	SHA1	Message	Date
Jacques Distler	d3e79ea84a	Make truncate() Unicode-aware	2009-12-14 17:41:28 -06:00
Jacques Distler	faac8951a3	More Ruby 1.9 String Encoding Fun	2009-12-08 08:50:01 -06:00
Jacques Distler	171c12d2c1	Efficiency This version of String#purify is 12% faster, under Ruby 1.9, than before.	2009-12-05 10:50:58 -06:00
Jacques Distler	34b63a8375	Fix a Ruby 1.9 Character Encoding Bug Wow, this stuff is complicated! Some things really want to be UTF-8; others really want to be byte strings.	2009-12-01 12:03:15 -06:00
Jacques Distler	a6429f8c22	Ruby 1.9 Compatibility Completely removed the html5lib sanitizer. Fixed the string-handling to work in both Ruby 1.8.x and 1.9.2. There are still, inexplicably, two functional tests that fail. But the rest seems to work quite well.	2009-11-30 16:28:18 -06:00
Jacques Distler	371aab6f96	Sync with Latest itex2MML and MathML::Entities Support the latest changes in http://www.w3.org/TR/2009/WD-xml-entity-names-20091117/	2009-11-18 12:04:07 -06:00
Jacques Distler	e0df6c8a6a	Updated Tests and Sanitizer Fixes for Revision 439	2009-09-25 15:59:43 -05:00
Jacques Distler	b438bc64f6	Update More MathML Entity Mappings Bring up-to-date with Editor's copy of XML Entity definitions for Characters (W3C Working Draft 13 September 2009) http://www.w3.org/2003/entities/2007doc/overview.html	2009-09-25 14:34:22 -05:00
Jacques Distler	31ed55f055	Update MathML Entity Mappings Update list of XHTML+MathML named entities to match http://www.w3.org/TR/2008/WD-xml-entity-names-20080721/	2009-09-24 16:21:22 -05:00
Jacques Distler	7185af32fc	Fix an Eyesore That just looked sloppy. I blame copy/paste.	2009-09-09 15:01:25 -05:00
Jacques Distler	3ff68ef42f	Don't Expand NCRs That operation is not idempotent (among other defects). Instead, just check that the NCRs correspond to valid utf-8. (Reported by Andrew Stacey)	2009-09-09 09:16:00 -05:00
Jacques Distler	116255dc0d	Purify Categories Apply the same methodology, as in Revision 432, to the category chunk-handler. This completes the replacement of all the code that looks like "if string.is_utf8? do something else complain end" with code that looks like "string.purify do something"	2009-09-07 20:38:09 -05:00
Jacques Distler	c79fef9c01	Clean, rather than Complain Previously, if the user tried to submit content which was malformed utf-8, Instiki would complain loudly to him. A slightly more user-friendly approach was suggested by the latest Rails 2.3.4, and a conversation with Sam Ruby (who suggested some improvements). Now, instead of complaining, we remove the offending bytes, leaving a well-formed utf-8 string, which we pretend is what the user meant to submit.	2009-09-07 16:02:36 -05:00
Jacques Distler	52c1f74ecc	Add a couple of XSS tests. Some more tests from Clint Ruoho. The main branch of Instiki (and, I guess, the old sanitizer) are vulnerable. Also: under Ruby 1.8.x, CGI.unescapeHTML screws up horribly decoding NCRs which represent high-bit ASCII characters. UTF-8 agrees with 7-bit ASCII, but CGI.unescapeHTML doesn't seem to know that they disagree for i>127.	2009-01-05 16:25:27 -06:00
Jacques Distler	a503e2b8ac	Gentler Be a little gentler in recovering from Instiki::ValidationErrors, when saving a page. Previously, we threw away all the user's changes upon the redirect. Now we attempt to salvage what he wrote.	2008-12-17 00:07:21 -06:00
Jacques Distler	2e81ca2d30	Rails 2.2.2 Updated to Rails 2.2.2. Added a couple more Ruby 1.9 fixes, but that's pretty much at a standstill, until one gets Maruku and HTML5lib working right under Ruby 1.9.	2008-11-24 15:53:39 -06:00
Jacques Distler	ca1e8de89c	Minor Cleanups Remove a no-longer-needed function. ' -> &39; Fix regexp for tag chunk.	2008-05-22 02:46:45 -05:00
Jacques Distler	f6508de6dd	Whoops! In some circumstances, the new Sanitizer was double-escaping text nodes. Fixed (with unit test).	2008-05-21 14:14:43 -05:00
Jacques Distler	45405fc97e	New Sanitizer Goes Live The new sanitizer seems to work well (cuts the time required to produce the Instiki Atom feed in half). Our strategy is to use HTML5lib for <nowiki> content, but to use the new sanitizer for content that has been processed by Maruku (and hence is well-formed). The one broken unit test won't affect us (since it dealt with very malformed HTML).	2008-05-21 02:06:31 -05:00
Jacques Distler	800880f382	Rough In New Sanitizer Start work (which may not pan out) on a new sanitizer. Right now, it passes all but 1 of the HTML5lib Sanitizer's unit tests. But it doesn't do much of anything to ensure well-formedness. This is not an issue for Maruku-processed content, but it is a concern for <nowiki> blocks. (One solution would be to use the HTML5lib parser on <nowiki> blocks.) In any case, this baby is 3 times as fast as the HTML5lib sanitizer.	2008-05-20 17:02:10 -05:00

20 commits