Bug 982891

Summary: Default encoding of PO files should be UTF-8 in Windows
Product: [Retired] Zanata
Reporter: Matthew Riek <mriek>
Component: Component-Maven
Assignee: Sean Flanigan <sflaniga>
Status: CLOSED UPSTREAM
QA Contact: Zanata-QA Mailling List <zanata-qa>
Severity: high
Priority: unspecified
Version: 3.0
CC: ghynxmail, mriek, sflaniga
Hardware: All   
OS: Windows   
Doc Type: Bug Fix
Last Closed: 2015-07-31 01:12:51 UTC
Type: Bug
Attachments:
Example hack to work around the problem on Windows (flags: none)

Description Matthew Riek 2013-07-10 05:33:03 UTC
Created attachment 771391
Example hack to work around the problem on Windows.

Description of problem:

The pull command for the gettext file format writes files in the host OS default encoding, which on Windows platforms is not UTF-8. This corrupts the translation strings on pull.
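To illustrate the mechanism, a minimal Java sketch (class name hypothetical): it encodes a Japanese string with windows-1252, assumed here as a typical Western Windows default (the real default depends on the machine's region settings), then decodes it again. `String.getBytes(Charset)` replaces characters the charset cannot represent with that charset's replacement byte, which is '?' for windows-1252, matching the symptom under "Actual results".

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: shows how writing text in a non-Unicode default encoding loses
// Japanese characters. windows-1252 is an ASSUMED example default.
final class EncodingLossDemo {

    // Encode then decode with the same charset, mimicking writing a file
    // and reading it back.
    static String roundTrip(String text, Charset charset) {
        // Unmappable characters are replaced (with '?' for windows-1252).
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file independent of its own encoding.
        String japanese = "\u65e5\u672c\u8a9e"; // 日本語
        Charset cp1252 = Charset.forName("windows-1252");
        System.out.println(roundTrip(japanese, cp1252));                                   // prints ???
        System.out.println(roundTrip(japanese, StandardCharsets.UTF_8).equals(japanese)); // prints true
    }
}
```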

Version-Release number of selected component (if applicable):

3.0.0

How reproducible:

100%

Steps to Reproduce:
1. On Windows, push a gettext project containing some UTF-8 characters (Japanese, for example).
2. On Windows, pull the gettext project.

Actual results:

The pulled .po file has '?' where the Japanese characters should be.

Expected results:

The same characters that were pushed.

Additional info:

See the attachment for a temporary workaround. I'm not suggesting it be used, of course (it's horrid); I have attached it purely for further context. Ideally, the encoding of pulled files could be configured on the Zanata web page or in the arguments to the pull command.
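A hypothetical sketch of how a pull option like the one suggested here might resolve the requested encoding, falling back to UTF-8 instead of the platform default. The option and class name are assumptions for illustration, not part of the Zanata client. `Charset.forName` throws an `IllegalArgumentException` subclass for unknown or malformed names, so a real option would want to report that clearly.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// HYPOTHETICAL: resolves an optional user-supplied encoding name for pulled
// files. Not an actual Zanata client option.
final class OutputEncodingOption {

    static Charset resolve(Optional<String> requested) {
        // Default to UTF-8, never to Charset.defaultCharset().
        return requested.map(Charset::forName).orElse(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Optional<String> requested = args.length > 0 ? Optional.of(args[0]) : Optional.empty();
        System.out.println(resolve(requested).name());
    }
}
```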

Comment 1 Ding-Yi Chen 2013-07-10 06:32:12 UTC
Hi Matthew,

The database locale must be configured as UTF-8.

If your database is already UTF-8, please tell us:

1. The name and version of the database you use.

2. The client type (Maven, Python, or Java) and its version.

3. Zanata server version.

4. Your Windows region setting (locale) and Windows version.


With the data you provide, we have a better chance of reproducing the bug.
Regards,

Comment 2 Matthew Riek 2013-07-10 06:45:45 UTC
Thank you.

I found that I had been misled by running the Zanata client in Eclipse rather than from the command line. From Eclipse, things worked; from the Windows command line, they failed. Eclipse was setting the Java encoding to UTF-8. I added:

-Dfile.encoding=UTF-8 

to my command line, and things now work fine there too. So I think you can ignore my bug report.
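A small diagnostic sketch (class name hypothetical) that prints the encoding the JVM falls back to whenever code does not specify one. Compile it, then run it once as `java ShowDefaultEncoding` and once as `java -Dfile.encoding=UTF-8 ShowDefaultEncoding` to compare. As a hedged aside: since JDK 18 (JEP 400) the default charset is UTF-8 on all platforms, so this mismatch mainly affects older JVMs such as those in use when this report was filed.

```java
import java.nio.charset.Charset;

// Sketch: shows the file.encoding property and the effective default charset,
// which is what e.g. new FileWriter(file) uses when no charset is given.
final class ShowDefaultEncoding {

    static String describe() {
        return "file.encoding   = " + System.getProperty("file.encoding") + System.lineSeparator()
             + "default charset = " + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```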

Best regards,

Matt.

Comment 3 Sean Flanigan 2013-07-10 07:05:40 UTC
It looks like a few places in the client still use the FileWriter class, which encodes with the platform default encoding. We should change these to specify the encoding explicitly. The PO header we write declares "UTF-8", so writing the file in any other encoding is a bug.
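A minimal sketch of the fix described in this comment, assuming Java 7 or later: open the output with an explicit charset instead of `new FileWriter(file)`, so the bytes match the `charset=UTF-8` declared in the PO header. `PoWriterSketch.writePo` and its minimal header are hypothetical and not taken from the Zanata client; real PO output also needs escaping of quotes and backslashes, which is omitted here.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// HYPOTHETICAL writer: the encoding is fixed to UTF-8 so it always agrees
// with the Content-Type header it writes.
final class PoWriterSketch {

    // Assumes msgid/msgstr need no PO escaping (no quotes, backslashes, newlines).
    static void writePo(Path target, String msgid, String msgstr) throws IOException {
        try (Writer out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            out.write("msgid \"\"\n");
            out.write("msgstr \"\"\n");
            out.write("\"Content-Type: text/plain; charset=UTF-8\\n\"\n\n");
            out.write("msgid \"" + msgid + "\"\n");
            out.write("msgstr \"" + msgstr + "\"\n");
        }
    }

    public static void main(String[] args) throws IOException {
        Path po = Files.createTempFile("ja", ".po");
        String japanese = "\u65e5\u672c\u8a9e"; // 日本語
        writePo(po, "Japanese", japanese);
        String readBack = new String(Files.readAllBytes(po), StandardCharsets.UTF_8);
        System.out.println(readBack.contains(japanese)); // prints true on any platform
        Files.deleteIfExists(po);
    }
}
```

On Java 11 and later, `new FileWriter(file, StandardCharsets.UTF_8)` is an equivalent drop-in replacement; on older versions, `new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8)` achieves the same.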

Comment 4 Ding-Yi Chen 2013-07-10 07:15:37 UTC
Test items:

1. By default, the pulled PO file should be UTF-8 encoded.

2. When a locale and character encoding are specified, the output file should use the specified encoding.
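For test item 1, a possible check (a sketch; class and method names hypothetical) decodes the pulled file strictly as UTF-8 and then looks for a translation that was pushed. Both parts matter: output in a legacy multibyte encoding such as Shift_JIS will usually fail the strict decode, while the '?' substitution seen in this bug is plain ASCII and therefore valid UTF-8, so only the comparison with the pushed text catches it.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Optional;

final class PulledPoCheck {

    // Decodes bytes as UTF-8, failing on any malformed or unmappable sequence
    // instead of silently substituting replacement characters.
    static Optional<String> decodeStrictUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return Optional.of(decoder.decode(ByteBuffer.wrap(bytes)).toString());
        } catch (CharacterCodingException e) {
            return Optional.empty();
        }
    }

    // Passes only if the file is valid UTF-8 and still contains the pushed text.
    static boolean pulledFileOk(byte[] bytes, String expectedTranslation) {
        return decodeStrictUtf8(bytes).map(text -> text.contains(expectedTranslation)).orElse(false);
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the pulled .po file; args[1]: a translation that was pushed
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(pulledFileOk(bytes, args[1]) ? "PASS" : "FAIL");
    }
}
```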

Comment 5 Ding-Yi Chen 2013-09-26 08:47:38 UTC
*** Bug 915886 has been marked as a duplicate of this bug. ***

Comment 6 Zanata Migrator 2015-07-31 01:12:51 UTC
Migrated; check JIRA for bug status: http://zanata.atlassian.net/browse/ZNTA-356