Description of problem:
Publican is using Locale::PO to load UTF8 PO files and compares strings against those from UTF8 XML files. Because Locale::PO is not setting the encoding to UTF8 when reading the PO file, wide characters cause the strings to not match.

Version-Release number of selected component (if applicable):
perl-Locale-PO-0.21-2.fc12.noarch

How reproducible:
When using wide characters in PO files.

Steps to Reproduce:
1. Create a publican book containing a wide character, e.g. โ
2. Translate the book to another language
3. Build the translated XML

Actual results:
Strings with wide characters do not match, leading to translated content being excluded from the translated output.

Expected results:
Strings match, translators happy.

Additional info:
The patch at https://rt.cpan.org/Public/Bug/Display.html?id=54064 resolves this issue.
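To make the mismatch described above concrete, here is a minimal sketch. It is written in Python rather than Perl, and it models an undecoded Perl string (one character per byte) by decoding the bytes as latin-1. That modelling choice and the sample msgid are assumptions for illustration, not taken from Publican or Locale::PO.

```python
# The msgid as an XML parser would hand it over: a decoded Unicode string.
xml_msgid = "na\u00efve caf\u00e9"

# The same msgid as it sits on disk in a UTF-8 PO file.
po_bytes = xml_msgid.encode("utf-8")

# Read without an encoding layer: one character per byte
# (modelled with latin-1; this is an assumption of the sketch).
po_undecoded = po_bytes.decode("latin-1")

# Read with the encoding declared: the original characters come back.
po_decoded = po_bytes.decode("utf-8")

print(po_undecoded == xml_msgid)  # False
print(po_decoded == xml_msgid)    # True
```

Pure ASCII msgids compare equal either way, which is presumably why the problem only shows up once a wide character is involved.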
I don't think Locale::PO is at fault here. It makes no claim to support any form of automatic encoding detection or conversion. It would appear to be the responsibility of the calling code to interpret the PO header and react accordingly.

It's also important to note that according to the gettext manual, §11.2.4 [1], "the msgid argument to gettext is not subject to character set conversion. Also, when gettext does not find a translation for msgid, it returns msgid unchanged -- independently of the current output character set. It is therefore recommended that all msgids be US-ASCII strings."

Maybe you can work around this limitation using the -C flag or PERL_UNICODE environment variable to persuade Locale::PO (and everything else) to read/write everything using :utf8 by default.

[1] http://www.gnu.org/software/gettext/manual/gettext.html#Charset-conversion
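As a rough illustration of what "interpret the PO header and react accordingly" could look like in calling code, here is a minimal Python sketch. The helper names `po_charset` and `decode_po` are hypothetical and do not correspond to anything in Locale::PO or Publican. The sketch assumes the header carries the usual `Content-Type: text/plain; charset=...` line.

```python
import re


def po_charset(raw: bytes, default: str = "ascii") -> str:
    """Return the charset named in the PO header, or a default (hypothetical helper)."""
    match = re.search(
        rb"Content-Type:[^\n]*charset=([A-Za-z0-9_.:-]+)", raw, re.IGNORECASE
    )
    return match.group(1).decode("ascii") if match else default


def decode_po(raw: bytes) -> str:
    """Decode the whole file using the charset its own header declares."""
    return raw.decode(po_charset(raw))


# A tiny PO file, encoded the way it would be stored on disk.
sample = (
    'msgid ""\n'
    'msgstr ""\n'
    '"Content-Type: text/plain; charset=UTF-8\\n"\n'
    "\n"
    'msgid "caf\u00e9"\n'
    'msgstr "Kaffeehaus"\n'
).encode("utf-8")

print(po_charset(sample))                      # UTF-8
print('msgid "caf\u00e9"' in decode_po(sample))  # True
```

A real implementation would also need to cope with headers that still contain the template placeholder `charset=CHARSET`, which this sketch does not handle.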
Hi Iain, I understand your point about encoding conversion, but is it not the case that, by not checking for UTF-8 on import, the module is in fact doing a conversion to a Perl string? Jeff has provided a patch from upstream.
Sorry, all, I'm not trying to be difficult, but I know that upstream is hesitant when it comes to changing existing behaviour (see his comments in pod regarding quoted vs. non-quoted strings), so I'm also very reluctant to introduce a patch that could easily break things for existing code that expects to get unencoded strings. If there's no movement on this upstream, I'll happily consider a patch that extends existing behaviour. Maybe a new set of load_file/save_file methods that handle automatic detection of encoding; or an optional parameter to existing methods; or allow file handles to be passed instead of file names; or whatever.
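One of the options mentioned above, an optional parameter on an existing method, could look roughly like the following Python sketch. The `load_file` function here is hypothetical and only illustrates the backward-compatible shape: callers that pass nothing keep getting undecoded data, and callers that name an encoding get decoded text. It is not a description of what upstream does.

```python
import os
import tempfile
from typing import Optional, Union


def load_file(path: str, encoding: Optional[str] = None) -> Union[bytes, str]:
    """Hypothetical loader: undecoded bytes by default, decoded text on request."""
    with open(path, "rb") as handle:
        raw = handle.read()
    return raw if encoding is None else raw.decode(encoding)


# Write a small UTF-8 sample so the sketch runs on its own.
with tempfile.NamedTemporaryFile(suffix=".po", delete=False) as tmp:
    tmp.write('msgid "caf\u00e9"\n'.encode("utf-8"))
    sample_path = tmp.name

legacy = load_file(sample_path)            # existing callers: behaviour unchanged
decoded = load_file(sample_path, "utf-8")  # new callers opt in to decoding
os.unlink(sample_path)

print(type(legacy).__name__, type(decoded).__name__)  # bytes str
```

The point of the default value is that no existing call site has to change, which addresses the concern about breaking code that expects unencoded strings.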
Upstream has now implemented support for loading PO files in any encoding. Builds coming soon....
perl-Locale-PO-0.23-1.fc18 has been submitted as an update for Fedora 18. https://admin.fedoraproject.org/updates/perl-Locale-PO-0.23-1.fc18
Package perl-Locale-PO-0.23-1.fc18:
* should fix your issue,
* was pushed to the Fedora 18 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing perl-Locale-PO-0.23-1.fc18'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-2594/perl-Locale-PO-0.23-1.fc18
then log in and leave karma (feedback).
perl-Locale-PO-0.23-1.fc18 has been pushed to the Fedora 18 stable repository. If problems still persist, please make note of it in this bug report.