Better algorithm to deal with encodings. Moved fallback rescue message from view to encode library.

This helps fix cases where UTF-8 is wrongly identified as ISO-8859-1. We only try to convert a string when the detector reports 100% confidence in the charset; otherwise we fall back to treating it as UTF-8.
Gabriel Mazetto 2012-05-26 20:15:06 -03:00
parent 48a36851e6
commit 50c2c16a4d
2 changed files with 7 additions and 4 deletions

@@ -8,7 +8,7 @@
 %strong.cgray= commit.author_name
 –
 = image_tag gravatar_icon(commit.author_email), :class => "avatar", :width => 16
-%span.row_title= truncate(commit.safe_message, :length => 50) rescue "--broken encoding"
+%span.row_title= truncate(commit.safe_message, :length => 50)
 %span.right.cgray
 = time_ago_in_words(commit.committed_date)

@@ -8,16 +8,19 @@ module Gitlabhq
 def utf8 message
   return nil unless message
-  encoding = detect_encoding(message)
-  if encoding
+  detect = CharlockHolmes::EncodingDetector.detect(message) rescue {}
+  # It's better to default to UTF-8 as sometimes it's wrongly detected as another charset
+  if detect[:encoding] && detect[:confidence] == 100
     CharlockHolmes::Converter.convert(message, encoding, 'UTF-8')
   else
     message
   end.force_encoding("utf-8")
 # Prevent app from crash cause of
 # encoding errors
 rescue
-  ""
+  "--broken encoding: #{encoding}"
 end
 def detect_encoding message
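
Note on the hunk above: as displayed, the new `utf8` no longer assigns `encoding`, yet both the `convert` call and the rescue message still reference it. Unless `encoding` is defined elsewhere in the module, that raises a NameError. The sketch below shows one way the "convert only at 100% confidence, otherwise assume UTF-8" rule could be written so that it reads the charset from the detection result. It is not the code from this commit: `utf8_sketch` and its `detector` parameter are hypothetical names, the detector is passed in as a stand-in for `CharlockHolmes::EncodingDetector.detect`, and Ruby's built-in `String#encode` replaces `CharlockHolmes::Converter.convert` so the example runs without the gem.

```ruby
# Hypothetical helper, NOT the implementation from this commit.
# `detector` stands in for CharlockHolmes::EncodingDetector.detect and is
# assumed to return a hash like { :encoding => "ISO-8859-1", :confidence => 100 }.
def utf8_sketch(message, detector)
  return nil unless message

  # A failing detector is treated as "nothing detected".
  detect = begin
    detector.call(message) || {}
  rescue StandardError
    {}
  end

  if detect[:encoding] && detect[:confidence] == 100
    # Transcode only when the detector is fully confident.
    # String#encode is used here in place of CharlockHolmes::Converter.convert.
    message.dup.force_encoding(detect[:encoding]).encode("UTF-8")
  else
    # Otherwise assume the bytes are already UTF-8.
    message.dup.force_encoding("UTF-8")
  end
rescue StandardError
  # Report the detected charset rather than an unassigned local variable.
  "--broken encoding: #{detect && detect[:encoding]}"
end

# Usage with a stubbed detector (assumption: fixed result for illustration).
confident = ->(_msg) { { :encoding => "ISO-8859-1", :confidence => 100 } }
puts utf8_sketch("caf\xE9".b, confident)  # prints "café"
```

Passing the detector in keeps the confidence rule testable in isolation; in the real library the call to CharlockHolmes would presumably stay inline as in the diff.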