Bug 570979 - UTF8 PO files not being read as UTF8
Summary: UTF8 PO files not being read as UTF8
Alias: None
Product: Fedora
Classification: Fedora
Component: perl-Locale-PO
Version: 12
Hardware: All
OS: Linux
Target Milestone: ---
Assignee: Iain Arnell
QA Contact: Fedora Extras Quality Assurance
Depends On:
Reported: 2010-03-06 03:41 UTC by Jeff Fearn 🐞
Modified: 2013-02-26 02:53 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2013-02-26 02:53:04 UTC
Type: ---

System ID Private Priority Status Summary Last Updated
CPAN 54064 0 None None None Never

Description Jeff Fearn 🐞 2010-03-06 03:41:06 UTC
Description of problem:
Publican uses Locale::PO to load UTF-8 PO files and compares the strings against those from UTF-8 XML files. Because Locale::PO does not set the encoding to UTF-8 when reading the PO file, strings containing wide characters do not match.
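
To illustrate the mismatch described above, here is a minimal sketch using only the core Encode module. The sample string and the variable names are assumptions made for illustration; they are not taken from Publican or Locale::PO.

```perl
use strict;
use warnings;
use Encode qw(decode);

# UTF-8 octets for "a <U+2014> b", as they would arrive from a file that
# was opened without an encoding layer.
my $raw_bytes = "a \xE2\x80\x94 b";

# The same text as a decoded Perl character string, which is what an XML
# parser would typically hand back.
my $characters = "a \x{2014} b";

# Comparing undecoded octets with decoded characters fails ...
my $matches_raw = ($raw_bytes eq $characters) ? 1 : 0;

# ... while decoding the octets first makes the two strings equal.
my $matches_decoded = (decode('UTF-8', $raw_bytes) eq $characters) ? 1 : 0;

print "raw bytes match:     $matches_raw\n";
print "decoded bytes match: $matches_decoded\n";
```

The two strings differ in length as well as content: the undecoded form holds three octets where the decoded form holds one character.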

Version-Release number of selected component (if applicable):

How reproducible:
When using wide characters in PO files.

Steps to Reproduce:
1. Create a publican book containing a wide character, e.g. —
2. Translate the book to another language
3. Build the translated XML

Actual results:
Strings with wide characters do not match, so translated content is excluded from the translated output.

Expected results:
Strings match, translators happy.

Additional info:
The patch at https://rt.cpan.org/Public/Bug/Display.html?id=54064 resolves this issue.

Comment 1 Iain Arnell 2010-03-07 09:55:35 UTC
I don't think Locale::PO is at fault here. It makes no claim to support any form of automatic encoding detection or conversion. It would appear to be the responsibility of the calling code to interpret the PO header and react accordingly.
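
As a rough sketch of what "interpret the PO header and react accordingly" could look like in calling code: the helper name po_charset and the sample header are hypothetical, and only the core Encode module is used. It assumes the header msgstr carries a Content-Type line with a charset parameter, which is the usual gettext convention.

```perl
use strict;
use warnings;
use Encode qw(decode find_encoding);

# Hypothetical helper: extract the charset named in a PO header msgstr.
# Returns undef when no charset parameter is present.
sub po_charset {
    my ($header) = @_;
    return $header =~ /Content-Type:[^\n]*?\bcharset=([A-Za-z0-9._-]+)/i
        ? $1
        : undef;
}

# Sample header text (an assumption for this sketch).
my $header = "Project-Id-Version: demo\n"
           . "Content-Type: text/plain; charset=UTF-8\n"
           . "Content-Transfer-Encoding: 8bit\n";

my $charset = po_charset($header);

# Undecoded octets of one wide character (U+2014), as read from the file.
my $raw = "\xE2\x80\x94";

# Decode only when the header names an encoding that Encode recognises;
# otherwise leave the octets untouched.
my $text = (defined $charset && find_encoding($charset))
    ? decode($charset, $raw)
    : $raw;

printf "charset: %s, decoded length: %d\n", $charset // 'unknown', length $text;
```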

It's also important to note that according to the gettext manual, §11.2.4 [1], "the msgid argument to gettext is not subject to character set conversion. Also, when gettext does not find a translation for msgid, it returns msgid unchanged – independently of the current output character set. It is therefore recommended that all msgids be US-ASCII strings."

Maybe you can work around this limitation using the -C flag or PERL_UNICODE environment variable to persuade Locale::PO (and everything else) to read/write everything using :utf8 by default.
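
For reference, the -C switch and PERL_UNICODE change the default I/O layers for the whole process (for example, -CSD covers the standard streams and newly opened handles). The per-handle equivalent is an explicit :encoding(UTF-8) layer. The sketch below shows only the difference between the two ways of reading; it is not Locale::PO's code, and the helper name first_line and the sample line are assumptions.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write one msgstr line containing U+2014 to a temporary file as UTF-8.
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out, ':encoding(UTF-8)');
print {$out} qq{msgstr "a \x{2014} b"\n};
close($out) or die "close failed: $!";

# Hypothetical helper: read the first line of a file through a given layer.
sub first_line {
    my ($file, $layer) = @_;
    open(my $in, "<$layer", $file) or die "cannot open $file: $!";
    my $line = <$in>;
    close($in);
    chomp $line;
    return $line;
}

my $as_bytes = first_line($path, ':raw');              # undecoded octets
my $as_chars = first_line($path, ':encoding(UTF-8)');  # decoded characters

printf "raw length:     %d\n", length $as_bytes;
printf "decoded length: %d\n", length $as_chars;
```

Only the string read through the :encoding(UTF-8) layer compares equal to decoded text from another source.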

[1] http://www.gnu.org/software/gettext/manual/gettext.html#Charset-conversion

Comment 2 Paul Gampe 2010-03-07 22:52:19 UTC
Hi Iain, I understand your point about encoding conversion, but by not checking for UTF-8 on import, isn't the module in fact doing a conversion to a Perl string?

Jeff has provided a patch from upstream.

Comment 3 Iain Arnell 2010-03-08 16:07:52 UTC
Sorry, all, I'm not trying to be difficult, but I know that upstream is hesitant when it comes to changing existing behaviour (see his comments in pod regarding quoted vs. non-quoted strings), so I'm also very reluctant to introduce a patch that could easily break things for existing code that expects to get unencoded strings.

If there's no movement on this upstream, I'll happily consider a patch that extends existing behaviour. Maybe a new set of load_file/save_file methods that handle automatic detection of encoding; or an optional parameter to existing methods; or allow file handles to be passed instead of file names; or whatever.

Comment 4 Iain Arnell 2013-02-15 16:38:27 UTC
Upstream has now implemented support for loading PO files in any encoding. Builds coming soon....

Comment 5 Fedora Update System 2013-02-15 16:53:02 UTC
perl-Locale-PO-0.23-1.fc18 has been submitted as an update for Fedora 18.

Comment 6 Fedora Update System 2013-02-17 03:26:44 UTC
Package perl-Locale-PO-0.23-1.fc18:
* should fix your issue,
* was pushed to the Fedora 18 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing perl-Locale-PO-0.23-1.fc18'
as soon as you are able to.
Please go to the following url:
then log in and leave karma (feedback).

Comment 7 Fedora Update System 2013-02-26 02:53:06 UTC
perl-Locale-PO-0.23-1.fc18 has been pushed to the Fedora 18 stable repository.  If problems still persist, please make note of it in this bug report.
