Bug 1961109
Summary: | glibc: iconv: missing macron (unicode 0xAF) in EBCDIC-CP-ES (IBM284) | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Linux 8 | Reporter: | Denis Volkov <dvolkov> | |
Component: | glibc | Assignee: | Arjun Shankar <ashankar> | |
Status: | CLOSED ERRATA | QA Contact: | Martin Coufal <mcoufal> | |
Severity: | high | Docs Contact: | Jacob Taylor Valdez <jvaldez> | |
Priority: | high | |||
Version: | 8.4 | CC: | alanm, ashankar, codonell, dbodnarc, dj, fweimer, jvaldez, mcoufal, mnewsome, pfrankli, sipoyare, skolosov, tulioqm | |
Target Milestone: | beta | Keywords: | Bugfix, Patch, Triaged, ZStream | |
Target Release: | --- | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | glibc-2.28-201.el8 | Doc Type: | Bug Fix | |
Doc Text: |
.The mapping for the `0xBC` code point for some IBM character sets is now `U+00AF MACRON`
Previously, the `IBM256`, `IBM277`, `IBM278`, `IBM280`, `IBM284`, `IBM297`, and `IBM424` character sets mapped the `EBCDIC` code point `0xBC` to the Unicode character `U+203E OVERLINE`. As a result, when using the `iconv` program provided by `glibc`, converting text in those character sets that contained the `0xBC` code point failed when the target was a non-Unicode character set such as `ISO-8859-1`, because such character sets cannot encode the `U+203E OVERLINE` character.
With this update, the `0xBC` code point in these character sets maps to `U+00AF MACRON`. As a result, input in the `IBM277`, `IBM278`, `IBM280`, `IBM284`, and `IBM297` character sets can be converted to `ISO-8859-1` in all cases. For the `IBM256` and `IBM424` character sets, conversion no longer fails on the `0xBC` code point, which is now converted to `U+00AF MACRON`.
|
Story Points: | --- | |
Clone Of: | ||||
: | 2084564 (view as bug list) | Environment: | ||
Last Closed: | 2022-11-08 10:43:11 UTC | Type: | Bug | |
Bug Blocks: | 2084564 |
Description
Denis Volkov
2021-05-17 09:34:17 UTC
As far as I can tell, the official mapping for the 0xBC codepoint in IBM284 is U+203E:

    $ printf '\xbc' | iconv -f IBM284 -t UTF-16BE | xxd
    00000000: 203e                                      >

This is based on the ICU table published here:
https://github.com/unicode-org/icu-data/blob/main/charset/data/ucm/glibc-IBM284-2.1.2.ucm

And I think IBM treats these ICU tables as the canonical reference nowadays. However, I note that OpenJDK in fact maps 0xbc to U+00AF, and we should at least be internally consistent in our product.

IBM provides a Host Code Page Reference [1]. According to this document, in IBM284, 0xBC is mapped to IBM GCGID SM150000, which isn't documented. But according to the following IBMi 7.3 document [2], GCGID SM150000 is mapped to U+00AF.

[1] https://www.ibm.com/docs/en/SSEQ5Y_12.0.0/com.ibm.pcomm.doc/reference/pdf/hcp_referenceV58.pdf
[2] https://www.ibm.com/docs/en/i/7.3?topic=information-mapping-locale-symbolic-names

Interestingly, 0xBC is mapped to GCGID SM150000 in other code pages too, e.g. IBM500 and IBM871. The glibc ICU tables for these code pages do map 0xBC to U+00AF:
https://github.com/unicode-org/icu-data/blob/main/charset/data/ucm/glibc-IBM500-2.1.2.ucm#L209
https://github.com/unicode-org/icu-data/blob/main/charset/data/ucm/glibc-IBM871-2.1.2.ucm#L209

It seems the issue is just seen in the IBM2XX "family", e.g.:
https://github.com/unicode-org/icu-data/blob/main/charset/data/ucm/glibc-IBM280-2.1.2.ucm#L289
https://github.com/unicode-org/icu-data/blob/main/charset/data/ucm/glibc-IBM297-2.1.2.ucm#L289

Uhm, and I actually fixed this for IBM273 a while back:

    commit 14beef7575099f6373f9a45b4656f1e3675f7372
    Author: Florian Weimer <fweimer>
    Date:   Thu Jun 14 22:34:09 2018 +0200

        localedata: Make IBM273 compatible with ISO-8859-1 [BZ #23290]

        Reviewed-by: Carlos O'Donell <carlos>

    diff --git a/localedata/charmaps/IBM273 b/localedata/charmaps/IBM273
    index c3f70e2a6f..4401101b50 100644
    --- a/localedata/charmaps/IBM273
    +++ b/localedata/charmaps/IBM273
    @@ -194,7 +194,7 @@ CHARMAP
     <U00BE> /xb9 VULGAR FRACTION THREE QUARTERS
     <U00AC> /xba NOT SIGN
     <U007C> /xbb VERTICAL LINE
    -<U203E> /xbc OVERLINE
    +<U00AF> /xbc MACRON
     <U00A8> /xbd DIAERESIS
     <U00B4> /xbe ACUTE ACCENT
     <U00D7> /xbf MULTIPLICATION SIGN

Thanks for jogging my memory! I think we should fix the remaining codepages this time.

Upstream patch posted:
https://sourceware.org/pipermail/libc-alpha/2021-May/126441.html

Upstream commit:

    commit f17164bd51db31f47fbbdae826c63b6d78184c45
    Author: Florian Weimer <fweimer>
    Date:   Tue May 18 07:21:33 2021 +0200

        localedata: Use U+00AF MACRON in more EBCDIC charsets [BZ #27882]

        This updates IBM256, IBM277, IBM278, IBM280, IBM284, IBM297, IBM424
        in the same way that IBM273 was updated for bug 23290.

        IBM256 and IBM424 still have holes after this change, so HAS_HOLES
        is not updated.

        Reviewed-by: Siddhesh Poyarekar <siddhesh>

I have put an unsupported, untested build for testing purposes here:
https://people.redhat.com/~fweimer/hSaS4M2B3iMN/glibc-2.28-158.el8.0.bz1961109.0/

It backports the upstream commit.

Update from the customer: they tested the build, and it fixed the problem for them.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (glibc bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2022:7684
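
For anyone who wants to see why the old mapping broke conversion to ISO-8859-1, here is a minimal Python sketch. It does not call glibc's iconv at all. It only parses the two charmap lines quoted in the IBM273 diff above and asks Python's own ISO-8859-1 codec whether each target character is encodable. The helper names `parse_charmap_line` and `fits_latin1` are made up for this illustration.

```python
# Charmap lines copied from the IBM273 diff quoted in this report
# (the removed line and the added line). Format: <Uxxxx> /xNN NAME
BEFORE = "<U203E> /xbc OVERLINE"
AFTER = "<U00AF> /xbc MACRON"


def parse_charmap_line(line):
    """Return (byte_value, character) for one charmap line (hypothetical helper)."""
    ucs, byte, _name = line.split(None, 2)
    code_point = int(ucs[2:-1], 16)  # "<U203E>" -> 0x203E
    byte_value = int(byte[2:], 16)   # "/xbc"    -> 0xBC
    return byte_value, chr(code_point)


def fits_latin1(char):
    """True if Python's ISO-8859-1 codec can encode the character."""
    try:
        char.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


for label, line in (("before", BEFORE), ("after", AFTER)):
    byte_value, char = parse_charmap_line(line)
    print(f"{label}: 0x{byte_value:02X} -> U+{ord(char):04X}, "
          f"ISO-8859-1 encodable: {fits_latin1(char)}")
```

U+203E lies outside the 0x00-0xFF range that ISO-8859-1 covers, so the old mapping has no representation in that target, while U+00AF does. On a system with the fixed glibc, the equivalent check would be to rerun the `printf '\xbc' | iconv -f IBM284 ...` command from this report with `-t ISO-8859-1` as the target.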