Bug 982891 - Default encoding of PO files should be UTF-8 in Windows
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Zanata
Classification: Retired
Component: Component-Maven
Version: 3.0
Hardware: All
OS: Windows
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Sean Flanigan
QA Contact: Zanata-QA Mailing List
URL:
Whiteboard:
Duplicates: 915886
Depends On:
Blocks:
Reported: 2013-07-10 05:33 UTC by Matthew Riek
Modified: 2015-07-31 01:12 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-07-31 01:12:51 UTC
Embargoed:


Attachments
Example hack to work around the problem on windows. (4.12 KB, text/x-csrc)
2013-07-10 05:33 UTC, Matthew Riek


Links
System ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla 748727 | 0 | unspecified | CLOSED | US31 As a translator I want the appropriate character encoding for my language to be used so that the content is saved ... | 2021-02-22 00:41:40 UTC
Red Hat Bugzilla 795597 | 0 | urgent | CLOSED | mvn zanata:pull writes ? for Unicode characters (project-type: properties) | 2021-02-22 00:41:40 UTC

Internal Links: 748727 795597

Description Matthew Riek 2013-07-10 05:33:03 UTC
Created attachment 771391
Example hack to work around the problem on windows.

Description of problem:

The pull command for the gettext file format writes files in the host OS default encoding, which on Windows platforms is not UTF-8. This corrupts the translated strings on pull.

Version-Release number of selected component (if applicable):

3.0.0

How reproducible:

100%

Steps to Reproduce:
1. On Windows, push a gettext project containing some non-ASCII characters (some Japanese, for example).
2. On Windows, pull the gettext project.

Actual results:

The pulled .po file has '?' where the Japanese characters should be.

Expected results:

same characters that were pushed.

Additional info:

See the attachment for a temporary workaround. I am not suggesting it be used, of course (it's horrid); I have attached it purely for further context. Ideally, the desired encoding for pulled files could be configured on the Zanata web page or in the arguments to the pull command.

Comment 1 Ding-Yi Chen 2013-07-10 06:32:12 UTC
Hi Matthew,

The database locale needs to be configured as UTF-8.

If your database is already UTF-8, please tell us:

1. The name and version of the database you are using.

2. The client type (Maven, Python, or Java) and its version.

3. The Zanata server version.

4. Your Windows region setting (locale) and Windows version.


With this information, we have a better chance of reproducing the bug.
Regards,

Comment 2 Matthew Riek 2013-07-10 06:45:45 UTC
Thank you.

I have found out that I was being misled by running the Zanata client in Eclipse versus the command line. From Eclipse, things worked. From the command line on Windows, it was failing. Eclipse was setting the Java encoding to UTF-8. I added:

-Dfile.encoding=UTF-8 

to my command line and things now work fine there too.  So, ignore my bug report I think.

Best regards,

Matt.
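
Editorial sketch (not part of the original comment): the '?' symptom is what Java produces when text is encoded with a charset that cannot represent it. The snippet below is a minimal illustration, assuming ISO-8859-1 as a stand-in for a non-UTF-8 platform default (the actual Windows default in the report is not stated). The class name DefaultCharsetDemo and the method roundTrip are hypothetical. Which charset Charset.defaultCharset() reports depends on the JVM version and on settings such as -Dfile.encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DefaultCharsetDemo {
    // Encodes the text with the given charset, then decodes the bytes with the
    // same charset. Characters the charset cannot represent are replaced on
    // encoding, so they do not survive the round trip.
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Three Japanese characters, written as escapes so the source file
        // itself does not depend on any particular encoding.
        String japanese = "\u65e5\u672c\u8a9e";

        // The charset that FileWriter-style APIs fall back to.
        System.out.println("Default charset: " + Charset.defaultCharset());

        // ISO-8859-1 is an assumed stand-in for a non-UTF-8 default.
        System.out.println(roundTrip(japanese, StandardCharsets.ISO_8859_1)); // prints "???"

        // UTF-8 can represent the text, so it comes back unchanged
        // (how it is displayed depends on the console encoding).
        System.out.println(roundTrip(japanese, StandardCharsets.UTF_8));
    }
}
```

Passing -Dfile.encoding=UTF-8, as in the comment above, changes what the JVM treats as its default charset, which would explain why the same client behaved differently in Eclipse and on the command line.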

Comment 3 Sean Flanigan 2013-07-10 07:05:40 UTC
It looks like we still have a few places in the client which use the class FileWriter, which uses the platform default encoding.  We should change these to specify the encoding explicitly.  We actually write out "UTF-8" in the PO header, so the fact that we don't write UTF-8 is a bug.
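
Editorial sketch (not the actual Zanata client code): one way to specify the encoding explicitly is to replace FileWriter with a writer created for a named charset. The example below assumes Java 7 or later and uses Files.newBufferedWriter with StandardCharsets.UTF_8. The class Utf8PoWriter and its methods are hypothetical, and the PO output is deliberately simplified (no escaping of quotes in msgid or msgstr).

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class Utf8PoWriter {
    // Writes a minimal PO file. The charset is passed explicitly, so the bytes
    // on disk match the charset declared in the header on every platform.
    // A plain "new FileWriter(file)" would use the platform default instead.
    static void writePo(Path target, String msgid, String msgstr) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            out.write("msgid \"\"\n");
            out.write("msgstr \"\"\n");
            out.write("\"Content-Type: text/plain; charset=UTF-8\\n\"\n");
            out.write("\n");
            // Simplification: real PO output must escape quotes and newlines.
            out.write("msgid \"" + msgid + "\"\n");
            out.write("msgstr \"" + msgstr + "\"\n");
        }
    }

    // Writes to a temporary file and reads the bytes back as UTF-8.
    static String writeAndReadBack(String msgid, String msgstr) {
        try {
            Path tmp = Files.createTempFile("demo", ".po");
            try {
                writePo(tmp, msgid, msgstr);
                return new String(Files.readAllBytes(tmp), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Japanese text as escapes, independent of the source file encoding.
        System.out.println(writeAndReadBack("Hello", "\u3053\u3093\u306b\u3061\u306f"));
    }
}
```

An OutputStreamWriter wrapped around a FileOutputStream with an explicit charset would be an equivalent choice on older code paths.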

Comment 4 Ding-Yi Chen 2013-07-10 07:15:37 UTC
Test items:

1. By default, the pulled po file should be in UTF-8 encoding.

2. By specifying the locale and character encoding, the output file should be in the specified encoding.
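
Editorial sketch (not an existing Zanata test): the first test item could be checked by decoding the pulled file's bytes strictly as UTF-8 and comparing the result with the text that was pushed. The comparison matters because a file made of '?' characters is still valid UTF-8. The class Utf8Check and both methods are hypothetical helpers.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8Check {
    // Returns a decoder that fails on invalid input instead of silently
    // substituting replacement characters.
    private static CharsetDecoder strictUtf8() {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
    }

    // True if the bytes form well-formed UTF-8.
    static boolean isValidUtf8(byte[] data) {
        try {
            strictUtf8().decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // True if the bytes are well-formed UTF-8 and decode to exactly the
    // expected text, which also catches content already reduced to '?'.
    static boolean decodesTo(byte[] data, String expected) {
        try {
            return strictUtf8().decode(ByteBuffer.wrap(data)).toString().equals(expected);
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String pushed = "\u65e5\u672c\u8a9e";
        byte[] good = pushed.getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = "\u00e9".getBytes(StandardCharsets.ISO_8859_1); // single byte 0xE9
        byte[] corrupted = "???".getBytes(StandardCharsets.US_ASCII);

        System.out.println(isValidUtf8(good));            // true
        System.out.println(isValidUtf8(latin1));          // false
        System.out.println(decodesTo(corrupted, pushed)); // false
    }
}
```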

Comment 5 Ding-Yi Chen 2013-09-26 08:47:38 UTC
*** Bug 915886 has been marked as a duplicate of this bug. ***

Comment 6 Zanata Migrator 2015-07-31 01:12:51 UTC
Migrated; check JIRA for bug status: http://zanata.atlassian.net/browse/ZNTA-356

