Bug 1961109 - glibc: iconv: missing macron (unicode 0xAF) in EBCDIC-CP-ES (IBM284)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: glibc
Version: 8.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: beta
Target Release: ---
Assignee: Arjun Shankar
QA Contact: Martin Coufal
Docs Contact: Jacob Taylor Valdez
URL:
Whiteboard:
Depends On:
Blocks: 2084564
Reported: 2021-05-17 09:34 UTC by Denis Volkov
Modified: 2022-11-09 10:17 UTC (History)

Fixed In Version: glibc-2.28-201.el8
Doc Type: Bug Fix
Doc Text:
.The mapping for the `0xBC` code point for some IBM character sets is now `U+00AF MACRON`

Previously, the `IBM256`, `IBM277`, `IBM278`, `IBM280`, `IBM284`, `IBM297`, and `IBM424` character sets encoded the `EBCDIC` code point `0xBC` as the Unicode character `U+203E OVERLINE`. As a result, when using the `iconv` program provided by `glibc`, converting text in those character sets containing the `0xBC` code point failed for non-Unicode character sets such as `ISO-8859-1` because they could not encode the `U+203E OVERLINE` character. With this update, these character sets map `0xBC` to `U+00AF MACRON`. As a result, input in the `IBM277`, `IBM278`, `IBM280`, `IBM284`, and `IBM297` character sets can be converted to `ISO-8859-1` in all cases. For the `IBM256` and `IBM424` character sets, conversion no longer fails on the `0xBC` code point, which is now output as `U+00AF MACRON`.
Clone Of:
Clones: 2084564
Environment:
Last Closed: 2022-11-08 10:43:11 UTC
Type: Bug
Target Upstream Version:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla | 1591268 | 1 | None | None | None | 2021-05-18 04:24:20 UTC
Red Hat Product Errata | RHBA-2022:7684 | 0 | None | None | None | 2022-11-08 10:43:27 UTC
Sourceware | 27882 | 0 | P2 | ASSIGNED | Use U+00AF MACRON in more EBCDIC charsets | 2021-08-17 21:25:09 UTC

Description Denis Volkov 2021-05-17 09:34:17 UTC
Description of problem:
`iconv` reports an error when converting the macron (overscore) character to EBCDIC-CP-ES (IBM284), even though the character exists in the character table (checked at https://www.compart.com/en/unicode/charsets/IBM284, hex 0xBC).

Version-Release number of selected component (if applicable):
glibc-common-2.28-127.el8_3.2.x86_64 (8.3)
glibc-common-2.28-151.el8.x86_64 (nightly)


How reproducible:

    $ echo "AF" | xxd -r -p |iconv -f iso8859-1 -t EBCDIC-CP-ES 
    iconv: illegal input sequence at position 0

    $ echo ¯|iconv -f utf8 -t EBCDIC-CP-ES 
    iconv: illegal input sequence at position 0

Comment 1 Florian Weimer 2021-05-17 09:44:31 UTC
As far as I can tell, the official mapping for the 0xBC codepoint in IBM284 is U+203E:

$ printf '\xbc' | iconv -f IBM284 -t UTF-16BE | xxd
00000000: 203e                                      >

This is based on the ICU table published here:

https://github.com/unicode-org/icu-data/blob/main/charset/data/ucm/glibc-IBM284-2.1.2.ucm

And I think IBM treats these ICU tables as the canonical reference nowadays. However, I note that OpenJDK in fact maps 0xbc to U+00AF, and we should at least be internally consistent in our product.
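
To illustrate why the choice between the two code points matters for a single-byte target, here is a minimal Python sketch. It does not call glibc's `iconv`; it uses Python's built-in `latin-1` codec as a stand-in for ISO-8859-1, and the helper name `fits_latin1` is made up for this example.

```python
def fits_latin1(ch: str) -> bool:
    """Return True if the character can be encoded in ISO-8859-1."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


OVERLINE = "\u203e"  # what the glibc IBM284 table yields for 0xBC (see the xxd output above)
MACRON = "\u00af"    # what OpenJDK yields for 0xBC, per this comment

print(fits_latin1(OVERLINE))           # U+203E is outside the Latin-1 range
print(fits_latin1(MACRON))             # U+00AF is inside the Latin-1 range
print(MACRON.encode("latin-1").hex())  # the single byte used for MACRON
```

Any mapping that lands on U+203E cannot round-trip through ISO-8859-1, which is consistent with the conversion failures in the description.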

Comment 2 Tulio Magno Quites Machado Filho 2021-05-17 20:06:34 UTC
IBM provides a Host Code Page Reference [1]. According to this document, in IBM284, 0xBC
is mapped to IBM GCGID SM150000, which isn't documented.

But according to the following IBM i 7.3 document [2], GCGID SM150000 is mapped to U+00AF.

[1] https://www.ibm.com/docs/en/SSEQ5Y_12.0.0/com.ibm.pcomm.doc/reference/pdf/hcp_referenceV58.pdf
[2] https://www.ibm.com/docs/en/i/7.3?topic=information-mapping-locale-symbolic-names

Comment 3 Tulio Magno Quites Machado Filho 2021-05-17 20:37:13 UTC
Interestingly, 0xBC is mapped to GCGID SM150000 in other code pages too, e.g. IBM500 and IBM871.

The glibc ICU tables for these code pages do map 0xBC to U+00AF:

https://github.com/unicode-org/icu-data/blob/main/charset/data/ucm/glibc-IBM500-2.1.2.ucm#L209
https://github.com/unicode-org/icu-data/blob/main/charset/data/ucm/glibc-IBM871-2.1.2.ucm#L209

It seems the issue is just seen in IBM2XX "family", e.g.:

https://github.com/unicode-org/icu-data/blob/main/charset/data/ucm/glibc-IBM280-2.1.2.ucm#L289
https://github.com/unicode-org/icu-data/blob/main/charset/data/ucm/glibc-IBM297-2.1.2.ucm#L289

Comment 4 Florian Weimer 2021-05-18 04:23:38 UTC
Uhm, and I actually fixed this for IBM273 a while back:

commit 14beef7575099f6373f9a45b4656f1e3675f7372
Author: Florian Weimer <fweimer>
Date:   Thu Jun 14 22:34:09 2018 +0200

    localedata: Make IBM273 compatible with ISO-8859-1 [BZ #23290]
    
    Reviewed-by: Carlos O'Donell <carlos>

diff --git a/localedata/charmaps/IBM273 b/localedata/charmaps/IBM273
index c3f70e2a6f..4401101b50 100644
--- a/localedata/charmaps/IBM273
+++ b/localedata/charmaps/IBM273
@@ -194,7 +194,7 @@ CHARMAP
 <U00BE>     /xb9         VULGAR FRACTION THREE QUARTERS
 <U00AC>     /xba         NOT SIGN
 <U007C>     /xbb         VERTICAL LINE
-<U203E>     /xbc         OVERLINE
+<U00AF>     /xbc         MACRON
 <U00A8>     /xbd         DIAERESIS
 <U00B4>     /xbe         ACUTE ACCENT
 <U00D7>     /xbf         MULTIPLICATION SIGN

Thanks for jogging my memory!

I think we should fix the remaining codepages this time.
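
The effect of the one-line charmap change can be sketched in Python as follows. This is only an illustration, under the assumption that every entry has the single-byte form shown in the diff (`<UXXXX> /xNN NAME`); real charmap files contain headers and other syntax that this does not handle, and `parse_charmap` is a hypothetical helper, not a glibc tool.

```python
import re

# Matches entries such as "<U00AF>     /xbc         MACRON".
CHARMAP_LINE = re.compile(r"^<U([0-9A-Fa-f]{4})>\s+/x([0-9A-Fa-f]{2})\s+(.*)$")


def parse_charmap(lines):
    """Map each byte value to (Unicode code point, character name)."""
    table = {}
    for line in lines:
        m = CHARMAP_LINE.match(line.strip())
        if m:
            table[int(m.group(2), 16)] = (int(m.group(1), 16), m.group(3))
    return table


# A few entries taken from the diff above, before and after the change.
before = parse_charmap([
    "<U007C>     /xbb         VERTICAL LINE",
    "<U203E>     /xbc         OVERLINE",
    "<U00A8>     /xbd         DIAERESIS",
])
after = parse_charmap([
    "<U007C>     /xbb         VERTICAL LINE",
    "<U00AF>     /xbc         MACRON",
    "<U00A8>     /xbd         DIAERESIS",
])

# Report every byte whose mapping changed.
for byte in sorted(before):
    if before[byte] != after[byte]:
        print(f"0x{byte:02x}: U+{before[byte][0]:04X} -> U+{after[byte][0]:04X}")
```

Only the entry for byte 0xBC differs between the two tables; the neighbouring entries are unchanged.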

Comment 5 Florian Weimer 2021-05-18 05:00:38 UTC
Upstream patch posted: https://sourceware.org/pipermail/libc-alpha/2021-May/126441.html

Comment 7 Florian Weimer 2021-05-27 12:13:14 UTC
Upstream commit:

commit f17164bd51db31f47fbbdae826c63b6d78184c45
Author: Florian Weimer <fweimer>
Date:   Tue May 18 07:21:33 2021 +0200

    localedata: Use U+00AF MACRON in more EBCDIC charsets [BZ #27882]
    
    This updates IBM256, IBM277, IBM278, IBM280, IBM284, IBM297, IBM424
    in the same way that IBM273 was updated for bug 23290.
    
    IBM256 and IBM424 still have holes after this change, so HAS_HOLES
    is not updated.
    
    Reviewed-by: Siddhesh Poyarekar <siddhesh>

I have put an unsupported, untested build for testing purposes here:

https://people.redhat.com/~fweimer/hSaS4M2B3iMN/glibc-2.28-158.el8.0.bz1961109.0/

It backports the upstream commit.

Comment 8 Denis Volkov 2021-06-01 08:50:44 UTC
Update from the customer: they tested the build, and it fixed the problem for them.

Comment 35 errata-xmlrpc 2022-11-08 10:43:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (glibc bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:7684

