Bug 1107936
| Summary: | iconv and uconv gives different results when converting GB18030 encoded files | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Peng Wu <pwu> |
| Component: | icu | Assignee: | Eike Rathke <erack> |
| Status: | CLOSED EOL | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 29 | CC: | codonell, denis.arnaud_fedora, erack, fweimer, jakub, law, mfabian, mnewsome, petersen, pfrankli |
| Target Milestone: | --- | Keywords: | Reopened |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-11-27 20:05:22 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |

Description
Peng Wu
2014-06-11 04:52:01 UTC
Created attachment 907479: The original file
Created attachment 907480: The iconv converted file
Created attachment 907481: The uconv converted file

This message is a reminder that Fedora 20 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 20. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '20'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 20 reaches end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

The problem still exists unchanged in Fedora 22.

Which one is correct? Emacs agrees with uconv.
The glibc mapping has these mappings in localedata/charmaps/GB18030:

% <UE78D> /xa6/xd9 <Private Use>
% <UE78E> /xa6/xda <Private Use>
% <UE78F> /xa6/xdb <Private Use>
% <UE790> /xa6/xdc <Private Use>
% <UE791> /xa6/xdd <Private Use>
% <UE792> /xa6/xde <Private Use>
% <UE793> /xa6/xdf <Private Use>
% <UE794> /xa6/xec <Private Use>
% <UE795> /xa6/xed <Private Use>
% <UE796> /xa6/xf3 <Private Use>
…
<UFE10> /xa6/xd9 PRESENTATION FORM FOR VERTICAL COMMA
<UFE11> /xa6/xdb PRESENTATION FORM FOR VERTICAL IDEOGRAPHIC COMMA
<UFE12> /xa6/xda PRESENTATION FORM FOR VERTICAL IDEOGRAPHIC FULL STOP
<UFE13> /xa6/xdc PRESENTATION FORM FOR VERTICAL COLON
<UFE14> /xa6/xdd PRESENTATION FORM FOR VERTICAL SEMICOLON
<UFE15> /xa6/xde PRESENTATION FORM FOR VERTICAL EXCLAMATION MARK
<UFE16> /xa6/xdf PRESENTATION FORM FOR VERTICAL QUESTION MARK
<UFE17> /xa6/xec PRESENTATION FORM FOR VERTICAL LEFT WHITE LENTICULAR BRACKET
<UFE18> /xa6/xed PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRAKCET
<UFE19> /xa6/xf3 PRESENTATION FORM FOR VERTICAL HORIZONTAL ELLIPSIS

Wikipedia links to this XML file, which obviously agrees with the uconv output:
http://source.icu-project.org/repos/icu/data/trunk/charset/data/xml/gb-18030-2000.xml

(In reply to Florian Weimer from comment #6)
> Which one is correct? Emacs agrees with uconv.
As with linguistics, they are both correct :-)

> The glibc mapping has these mappings in localedata/charmaps/GB18030:
>
> % <UE78D> /xa6/xd9 <Private Use>
> % <UE78E> /xa6/xda <Private Use>
> % <UE78F> /xa6/xdb <Private Use>
> % <UE790> /xa6/xdc <Private Use>
> % <UE791> /xa6/xdd <Private Use>
> % <UE792> /xa6/xde <Private Use>
> % <UE793> /xa6/xdf <Private Use>
> % <UE794> /xa6/xec <Private Use>
> % <UE795> /xa6/xed <Private Use>
> % <UE796> /xa6/xf3 <Private Use>
> …
> <UFE10> /xa6/xd9 PRESENTATION FORM FOR VERTICAL COMMA
> <UFE11> /xa6/xdb PRESENTATION FORM FOR VERTICAL IDEOGRAPHIC COMMA
> <UFE12> /xa6/xda PRESENTATION FORM FOR VERTICAL IDEOGRAPHIC FULL STOP
> <UFE13> /xa6/xdc PRESENTATION FORM FOR VERTICAL COLON
> <UFE14> /xa6/xdd PRESENTATION FORM FOR VERTICAL SEMICOLON
> <UFE15> /xa6/xde PRESENTATION FORM FOR VERTICAL EXCLAMATION MARK
> <UFE16> /xa6/xdf PRESENTATION FORM FOR VERTICAL QUESTION MARK
> <UFE17> /xa6/xec PRESENTATION FORM FOR VERTICAL LEFT WHITE LENTICULAR BRACKET
> <UFE18> /xa6/xed PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRAKCET
> <UFE19> /xa6/xf3 PRESENTATION FORM FOR VERTICAL HORIZONTAL ELLIPSIS

The GB 18030-2005 standard still uses some private-use-area (PUA) code points for some ideograms. The above non-PUA code points (which differ from the published standard) are correct for GB 18030-2005 compliance. In Unicode 4.1 or newer, these PUA code points have non-PUA equivalents that can be used instead. It is highly recommended, and best practice, that anyone mapping GB 18030-2005 to UTF-8 use the Unicode 4.1 code points (see note below).

> Wikipedia links to this XML file, which obviously agrees with the uconv output:
> http://source.icu-project.org/repos/icu/data/trunk/charset/data/xml/gb-18030-2000.xml

This is the old standard (confirmed by verifying that /xa8/xbc still maps to the old PUA <UE7C7>, fixed in GB 18030-2005), but even in the old standard the above PUA code points are defined for the ideograms.

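To make the difference concrete, here is a minimal Python sketch built only from the ten charmap pairs quoted above. It does not call iconv, uconv, or any GB 18030 codec; it only shows how text that contains the PUA code points (the uconv-style result) could be rewritten to the Unicode 4.1 code points (the glibc-style result). The names `MAPPINGS`, `normalize_pua`, and `find_pua` are hypothetical and exist only for this illustration.

```python
# Triples taken from the glibc charmap lines quoted above:
# (GB 18030 byte sequence, PUA code point, Unicode 4.1 code point).
MAPPINGS = [
    (b"\xa6\xd9", 0xE78D, 0xFE10),
    (b"\xa6\xda", 0xE78E, 0xFE12),
    (b"\xa6\xdb", 0xE78F, 0xFE11),
    (b"\xa6\xdc", 0xE790, 0xFE13),
    (b"\xa6\xdd", 0xE791, 0xFE14),
    (b"\xa6\xde", 0xE792, 0xFE15),
    (b"\xa6\xdf", 0xE793, 0xFE16),
    (b"\xa6\xec", 0xE794, 0xFE17),
    (b"\xa6\xed", 0xE795, 0xFE18),
    (b"\xa6\xf3", 0xE796, 0xFE19),
]

# Translation table for str.translate: PUA ordinal -> non-PUA ordinal.
PUA_TO_UNICODE41 = {pua: uni for _, pua, uni in MAPPINGS}


def normalize_pua(text: str) -> str:
    """Replace the ten PUA code points with their Unicode 4.1 equivalents."""
    return text.translate(PUA_TO_UNICODE41)


def find_pua(text: str) -> list[tuple[int, str]]:
    """Return (index, 'U+XXXX') for each of the ten PUA code points in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if ord(ch) in PUA_TO_UNICODE41
    ]
```

Running `find_pua` on already-decoded text is one way to check which convention a converter followed: an empty result means none of these ten PUA code points are present.
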
In summary:
- glibc supports GB 18030-2005 and contains corrections for the most recent version.
- Following best practice, glibc converts those GB 18030-2005 ideograms that would have used PUA code points into their equivalent non-PUA Unicode 4.1 code points.
- uconv uses the exact PUA code points as the standard suggests; this causes the difference and is not recommended.

I recommend a bug be filed against uconv to follow best practice and use Unicode 4.1 code points to avoid the problematic PUA code points defined in the original standard.

Note this is the recommended practice in "CJKV Information Processing" by Dr. Ken Lunde, who is probably the world-leading expert on the topic.

Moving to icu.

Fedora 22 changed to end-of-life (EOL) status on 2016-07-19. Fedora 22 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

I believe this is still not fixed in icu.

This bug appears to have been reported against 'rawhide' during the Fedora 27 development cycle. Changing version to '27'.

This message is a reminder that Fedora 27 is nearing its end of life. On 2018-Nov-30 Fedora will stop maintaining and issuing updates for Fedora 27. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '27'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.
Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 27 reaches end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

This bug still exists in Fedora 29. Changed version to Fedora 29.

This message is a reminder that Fedora 29 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora 29 on 2019-11-26. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '29'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 29 reaches end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

Fedora 29 changed to end-of-life (EOL) status on 2019-11-26. Fedora 29 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug.
If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug.

Thank you for reporting this bug and we are sorry it could not be fixed.