mojomojo-1.10-2.fc21 fails to build because 'Unicode wikilinks' t/unicode.t test fails with current Encode:

$ CATALYST_CONFIG=t/var/mojomojo.yml prove -l -v t/unicode.t
[...]
ok 6 - POST /.jsrpc/render
ok 7 - basic Unicode: page content
[error] Caught exception in MojoMojo::Controller::Jsrpc->render "Cannot decode string with wide characters at /usr/lib64/perl5/vendor_perl/Encode.pm line 215."
not ok 8 - Unicode wikilinks
#   Failed test 'Unicode wikilinks'
#   at t/unicode.t line 52.
#          got: "<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Trans"...
#       length: 3850
#     expected: "<p><span class="newWikiWord"><a title="Not found. "...
#       length: 133
#     strings begin to differ at char 2 (line 1 column 2)
ok 9 - restore original formatter
# Looks like you failed 1 test of 9.

$ rpm -q perl-Encode
perl-Encode-2.59-1.fc21.x86_64
The test is:

$test = 'Unicode wikilinks';
my $unicode_string = 'განეკუთვნება';
$content = "[[$unicode_string]]";
$mech->post('/.jsrpc/render', { content => $content });
$mech->content_is(<<"HTML", $test);
<p><span class="newWikiWord"><a title="Not found. Click to create this page." href="/$unicode_string.edit">$unicode_string?</a></span></p>
HTML

Encode::decode_utf8() complains about this string:

"\x{10d2}\x{10d0}\x{10dc}\x{10d4}\x{10d9}\x{10e3}\x{10d7}\x{10d5}\x{10dc}\x{10d4}\x{10d1}\x{10d0}"

$ perl -MEncode -e 'decode_utf8(qq{\x{10d2}\x{10d0}\x{10dc}\x{10d4}\x{10d9}\x{10e3}\x{10d7}\x{10d5}\x{10dc}\x{10d4}\x{10d1}\x{10d0}}, 1)'
Cannot decode string with wide characters at /usr/lib64/perl5/vendor_perl/Encode.pm line 215.

The \x{} notation corresponds to the 'განეკუთვნება' string. It looks like the Unicode string is decoded twice: the second decoding is performed on a character string that already has the UTF-8 flag set, instead of on a byte string.
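To make the suspected double decoding concrete, here is a minimal standalone sketch. It is not taken from MojoMojo; the variable names are made up for illustration, and it assumes Encode 2.53 or later (older versions return the flagged string unchanged instead of dying).

```perl
use strict;
use warnings;
use utf8;    # the string literal below is written in UTF-8
use Encode qw(decode_utf8 encode_utf8);

# Illustrative names only; none of this comes from the MojoMojo sources.
my $chars  = 'განეკუთვნება';        # character string, as in t/unicode.t
my $octets = encode_utf8($chars);   # the same text as a UTF-8 byte string

# First decode: bytes -> characters. This is the valid direction.
my $decoded = decode_utf8($octets);
printf "first decode gave %d characters\n", length $decoded;

# Second decode: the input is already a character string with code points
# above 0xFF, so it cannot be treated as a sequence of bytes.
my $ok    = eval { decode_utf8( $decoded, Encode::FB_CROAK ); 1 };
my $error = $ok ? '' : $@;
print "second decode failed: $error" if $error;
```

With an affected Encode, the second call should die with the same "Cannot decode string with wide characters" message as in the test log.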
This bug is triggered by upgrading Encode from 2.52 to 2.53, more precisely by this commit:

commit ff65c71aa64c0efd285e6905ac68ba4e2cb25541
Author: Tatsuhiko Miyagawa <miyagawa>
Date:   Sun Aug 25 19:02:16 2013 -0700

    Do not short-circuit decode_utf8 with utf8 flags

diff --git a/Encode.pm b/Encode.pm
index aea404a..5cee760 100644
--- a/Encode.pm
+++ b/Encode.pm
@@ -209,7 +209,6 @@ my $utf8enc;
 
 sub decode_utf8($;$) {
     my ( $octets, $check ) = @_;
-    return $octets if is_utf8($octets);
     return undef unless defined $octets;
     $octets .= '' if ref $octets;
     $check ||= 0;

Formerly, decode_utf8() returned a UTF-8-flagged string immediately and unchanged. Now it always treats its argument as a UTF-8 byte string and tries to decode it, so an already decoded string containing wide characters raises an exception.
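For illustration only, a caller that cannot rule out already-decoded input could reinstate the removed check on its own side. The helper name decode_utf8_unless_flagged is hypothetical; it is not part of Encode or MojoMojo, and this sketch is not necessarily what any attached patch does.

```perl
use strict;
use warnings;
use utf8;    # the string literal below is written in UTF-8
use Encode qw(decode_utf8 encode_utf8 is_utf8);

# Hypothetical helper: mimics the pre-2.53 short-circuit in the caller.
sub decode_utf8_unless_flagged {
    my ($value) = @_;
    # Pass undef and already-flagged strings through untouched.
    return $value if !defined $value || is_utf8($value);
    return decode_utf8($value);
}

my $chars  = 'განეკუთვნება';
my $octets = encode_utf8($chars);

my $from_octets = decode_utf8_unless_flagged($octets);   # gets decoded
my $from_chars  = decode_utf8_unless_flagged($chars);    # passed through

print "results match\n" if $from_octets eq $from_chars;
```

Note that is_utf8() only inspects Perl's internal flag, so a guard like this hides the double decoding rather than removing it. Decoding exactly once, at the point where bytes enter the application, is the cleaner approach.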
Reported to mojomojo upstream as <https://github.com/mojomojo/mojomojo/issues/121>.
Fedora 20 is affected too.
Created attachment 890741 [details] Proposed fix
mojomojo-1.10-3.fc20 has been submitted as an update for Fedora 20. https://admin.fedoraproject.org/updates/mojomojo-1.10-3.fc20
mojomojo-1.10-3.fc20 has been pushed to the Fedora 20 stable repository. If problems still persist, please make note of it in this bug report.