Description of problem:
`iconv` throws an error when trying to convert the macron (overscore) character to EBCDIC-CP-ES (IBM284), even though the character exists in the code page's character table (checked at https://www.compart.com/en/unicode/charsets/IBM284, hex 0xBC).

Version-Release number of selected component (if applicable):
glibc-common-2.28-127.el8_3.2.x86_64 (8.3)
glibc-common-2.28-151.el8.x86_64 (nightly)

How reproducible:
$ echo "AF" | xxd -r -p | iconv -f iso8859-1 -t EBCDIC-CP-ES
iconv: illegal input sequence at position 0
$ echo ¯ | iconv -f utf8 -t EBCDIC-CP-ES
iconv: illegal input sequence at position 0
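Both reproducers hand iconv the same code point. The Python check below uses only the standard library. It confirms that the ISO-8859-1 byte 0xAF and the UTF-8 bytes C2 AF sent for "¯" both decode to U+00AF MACRON, which is a different character from U+203E OVERLINE.

```python
import unicodedata

# Input of the first reproducer: the single ISO-8859-1 byte 0xAF.
latin1_char = bytes.fromhex("AF").decode("iso8859-1")

# Input of the second reproducer, assuming a UTF-8 terminal: "¯" arrives as C2 AF.
utf8_char = bytes.fromhex("C2AF").decode("utf-8")

for ch in (latin1_char, utf8_char):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# The character that sits at 0xBC in the IBM284 mapping discussed below.
overline = "\u203e"
print(f"U+{ord(overline):04X} {unicodedata.name(overline)}")
```

The only assumption is that the second command runs in a UTF-8 terminal. With that, both commands feed U+00AF to the converter. The failure therefore lies in the target mapping rather than in the input.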
As far as I can tell, the official mapping for the 0xBC code point in IBM284 is U+203E:

$ printf '\xbc' | iconv -f IBM284 -t UTF-16BE | xxd
00000000: 203e >

This is based on the ICU table published here:
https://github.com/unicode-org/icu-data/blob/main/charset/data/ucm/glibc-IBM284-2.1.2.ucm

And I think IBM treats these ICU tables as the canonical reference nowadays. However, I note that OpenJDK in fact maps 0xBC to U+00AF, and we should at least be internally consistent in our product.
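The ICU tables cited above list one mapping per line in the .ucm format. The sketch below parses such a line. It is a minimal example: the regular expression covers only simple single-code-point entries, and the sample line is written in UCM syntax for illustration, not copied from the glibc-IBM284 file. It also checks that U+203E serialized as UTF-16BE matches the xxd output.

```python
import re

# One mapping line in ICU's .ucm format: <Uxxxx> \xNN[\xNN...] |precision
# Precision |0 marks a round-trip mapping; other values mark one-way fallbacks.
UCM_LINE = re.compile(r"<U([0-9A-Fa-f]{4,6})>\s+((?:\\x[0-9A-Fa-f]{2})+)\s*\|([0-3])")


def parse_ucm_line(line):
    """Return (code point, code page bytes, precision), or None for other lines."""
    m = UCM_LINE.match(line.strip())
    if m is None:
        return None
    code_point = int(m.group(1), 16)
    # Split "\xBC" (literal backslash-x) into its hex byte values.
    raw = bytes(int(h, 16) for h in m.group(2).split("\\x")[1:])
    return code_point, raw, int(m.group(3))


# Illustrative line in UCM syntax (not copied from glibc-IBM284-2.1.2.ucm).
cp, raw, precision = parse_ucm_line(r"<U203E> \xBC |0")
print(f"U+{cp:04X} <-> 0x{raw.hex().upper()} (precision {precision})")

# U+203E as UTF-16BE is the byte pair 20 3e, matching the xxd output.
print(chr(cp).encode("utf-16-be").hex())
```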
IBM provides a Host Code Page Reference [1]. According to this document, IBM284 maps 0xBC to IBM GCGID SM150000, which is not documented there. However, according to the following IBM i 7.3 document [2], GCGID SM150000 is mapped to U+00AF.

[1] https://www.ibm.com/docs/en/SSEQ5Y_12.0.0/com.ibm.pcomm.doc/reference/pdf/hcp_referenceV58.pdf
[2] https://www.ibm.com/docs/en/i/7.3?topic=information-mapping-locale-symbolic-names
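The two-step lookup described above can be sketched as composing two tables. The dictionaries are hypothetical excerpts that hold only the entries cited in this comment: [1] supplies the byte-to-GCGID step, and [2] supplies the GCGID-to-Unicode step. `ibm_byte_to_unicode` is an illustrative helper, not an IBM API.

```python
# Hypothetical excerpts: only the entries cited above; the real tables are far larger.
IBM284_BYTE_TO_GCGID = {0xBC: "SM150000"}   # from the Host Code Page Reference [1]
GCGID_TO_UNICODE = {"SM150000": 0x00AF}     # from the IBM i 7.3 mapping table [2]


def ibm_byte_to_unicode(byte, byte_to_gcgid, gcgid_to_unicode):
    """Resolve a code page byte to a Unicode code point via its GCGID, or None."""
    gcgid = byte_to_gcgid.get(byte)
    if gcgid is None:
        return None
    return gcgid_to_unicode.get(gcgid)


cp = ibm_byte_to_unicode(0xBC, IBM284_BYTE_TO_GCGID, GCGID_TO_UNICODE)
print(f"IBM284 0xBC -> U+{cp:04X}")
```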
Interestingly, 0xBC is mapped to GCGID SM150000 in other code pages too, e.g. IBM500 and IBM871. The glibc ICU tables for these code pages do map 0xBC to U+00AF:

https://github.com/unicode-org/icu-data/blob/main/charset/data/ucm/glibc-IBM500-2.1.2.ucm#L209
https://github.com/unicode-org/icu-data/blob/main/charset/data/ucm/glibc-IBM871-2.1.2.ucm#L209

It seems the issue is only seen in the IBM2XX "family", e.g.:

https://github.com/unicode-org/icu-data/blob/main/charset/data/ucm/glibc-IBM280-2.1.2.ucm#L289
https://github.com/unicode-org/icu-data/blob/main/charset/data/ucm/glibc-IBM297-2.1.2.ucm#L289
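Python's standard library has no IBM284 codec. It does ship a cp500 codec (IBM500), which is generated from a vendor mapping table rather than from glibc's charmaps. That makes it a quick, independent cross-check of the 0xBC to U+00AF reading on any machine with Python 3:

```python
import unicodedata

# Decode direction: byte 0xBC in Python's cp500 (IBM500) codec.
ch = b"\xbc".decode("cp500")
print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Encode direction: MACRON maps back to the same byte, so the pair round-trips.
print("\u00af".encode("cp500").hex())
```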
Uhm, and I actually fixed this for IBM273 a while back:

commit 14beef7575099f6373f9a45b4656f1e3675f7372
Author: Florian Weimer <fweimer>
Date:   Thu Jun 14 22:34:09 2018 +0200

    localedata: Make IBM273 compatible with ISO-8859-1 [BZ #23290]

    Reviewed-by: Carlos O'Donell <carlos>

diff --git a/localedata/charmaps/IBM273 b/localedata/charmaps/IBM273
index c3f70e2a6f..4401101b50 100644
--- a/localedata/charmaps/IBM273
+++ b/localedata/charmaps/IBM273
@@ -194,7 +194,7 @@ CHARMAP
 <U00BE> /xb9 VULGAR FRACTION THREE QUARTERS
 <U00AC> /xba NOT SIGN
 <U007C> /xbb VERTICAL LINE
-<U203E> /xbc OVERLINE
+<U00AF> /xbc MACRON
 <U00A8> /xbd DIAERESIS
 <U00B4> /xbe ACUTE ACCENT
 <U00D7> /xbf MULTIPLICATION SIGN

Thanks for jogging my memory! I think we should fix the remaining codepages this time.
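The old IBM273 line produced an error rather than a wrong byte because a charmap-driven encoder only knows the code points listed in the file. The sketch below assumes a strict converter that rejects any code point without an entry. `build_encode_table` and `encode` are hypothetical helpers, not glibc's implementation. The sample lines come from the diff above.

```python
import re

# glibc charmap body lines look like: <U00AF> /xbc MACRON
# Range lines such as <U0000>..<U001F> are ignored in this sketch.
CHARMAP_LINE = re.compile(r"<U([0-9A-Fa-f]{4,8})>\s+((?:/x[0-9a-fA-F]{2})+)")


def build_encode_table(charmap_text):
    """Map Unicode code points to code page bytes from charmap lines."""
    table = {}
    for line in charmap_text.splitlines():
        m = CHARMAP_LINE.match(line.strip())
        if m:
            raw = bytes(int(h, 16) for h in m.group(2).split("/x")[1:])
            table[int(m.group(1), 16)] = raw
    return table


def encode(text, table):
    """Strict encoding: a code point without an entry is an illegal input sequence.

    Positions here count characters, whereas iconv reports byte offsets.
    """
    out = bytearray()
    for pos, ch in enumerate(text):
        if ord(ch) not in table:
            raise ValueError(f"illegal input sequence at position {pos}")
        out += table[ord(ch)]
    return bytes(out)


# Three lines around the change in the IBM273 diff, before and after the patch.
before = "<U007C> /xbb VERTICAL LINE\n<U203E> /xbc OVERLINE\n<U00A8> /xbd DIAERESIS"
after = before.replace("<U203E> /xbc OVERLINE", "<U00AF> /xbc MACRON")

try:
    encode("\u00af", build_encode_table(before))
except ValueError as err:
    print("before:", err)
print("after:", encode("\u00af", build_encode_table(after)).hex())
```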
Upstream patch posted: https://sourceware.org/pipermail/libc-alpha/2021-May/126441.html
Upstream commit:

commit f17164bd51db31f47fbbdae826c63b6d78184c45
Author: Florian Weimer <fweimer>
Date:   Tue May 18 07:21:33 2021 +0200

    localedata: Use U+00AF MACRON in more EBCDIC charsets [BZ #27882]

    This updates IBM256, IBM277, IBM278, IBM280, IBM284, IBM297, IBM424
    in the same way that IBM273 was updated for bug 23290.

    IBM256 and IBM424 still have holes after this change, so HAS_HOLES
    is not updated.

    Reviewed-by: Siddhesh Poyarekar <siddhesh>

I have put an unsupported, untested build for testing purposes here:

https://people.redhat.com/~fweimer/hSaS4M2B3iMN/glibc-2.28-158.el8.0.bz1961109.0/

It backports the upstream commit.
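The upstream commit applies the IBM273 substitution to seven more charmaps. The helper below is a hypothetical sketch of that edit. It assumes each affected file spells the entry as `<U203E> /xNN OVERLINE`, the way IBM273 did. `use_macron` is not part of glibc, and the upstream commit remains the authoritative record of which lines actually changed.

```python
import re

# An OVERLINE entry on its own charmap line; the whitespace and byte sequence are
# captured so the rewritten line keeps its original layout. [ \t] (not \s) keeps
# the match from spanning line breaks.
OVERLINE_ENTRY = re.compile(
    r"^<U203E>([ \t]+)((?:/x[0-9a-fA-F]{2})+)([ \t]+)OVERLINE[ \t]*$", re.M
)


def use_macron(charmap_text):
    """Return (new_text, count) with each U+203E OVERLINE entry remapped to U+00AF MACRON."""
    return OVERLINE_ENTRY.subn(r"<U00AF>\1\2\3MACRON", charmap_text)


# Hypothetical excerpt; the real charmap files are much longer.
sample = "<U007C> /xbb VERTICAL LINE\n<U203E> /xbc OVERLINE\n<U00A8> /xbd DIAERESIS\n"
new_text, changed = use_macron(sample)
print(changed)
print(new_text.splitlines()[1])
```

Running the helper a second time changes nothing. That makes it safe to re-run over a tree where some files were already fixed, as IBM273 was.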
Update from the customer: they tested the build, and it fixed the problem for them.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (glibc bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:7684