1961109 – glibc: iconv: missing macron (unicode 0xAF) in EBCDIC-CP-ES (IBM284)

Summary: glibc: iconv: missing macron (unicode 0xAF) in EBCDIC-CP-ES (IBM284)

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 8
Classification:	Red Hat
Component:	glibc
Sub Component:
Version:	8.4
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	beta
Target Release:	---
Assignee:	Arjun Shankar
QA Contact:	Martin Coufal
Docs Contact:	Jacob Taylor Valdez
URL:
Whiteboard:
Depends On:
Blocks:	2084564

Reported:	2021-05-17 09:34 UTC by Denis Volkov
Modified:	2024-10-01 18:13 UTC
CC List:	13 users
Fixed In Version:	glibc-2.28-201.el8
Doc Type:	Bug Fix
Doc Text:	The mapping for the `0xBC` code point in some IBM character sets is now `U+00AF MACRON`. Previously, the `IBM256`, `IBM277`, `IBM278`, `IBM280`, `IBM284`, `IBM297`, and `IBM424` character sets mapped the `EBCDIC` code point `0xBC` to the Unicode character `U+203E OVERLINE`. As a result, when using the `iconv` program provided by `glibc`, converting text containing the `0xBC` code point from those character sets to a non-Unicode character set such as `ISO-8859-1` failed, because `ISO-8859-1` cannot encode `U+203E OVERLINE`. With this update, `0xBC` maps to `U+00AF MACRON` in these character sets. As a result, input in the `IBM277`, `IBM278`, `IBM280`, `IBM284`, and `IBM297` character sets can be converted to `ISO-8859-1` in all cases. For the `IBM256` and `IBM424` character sets, converting the `0xBC` code point no longer fails and produces `U+00AF MACRON`.
Clone Of:
Clones:	2084564
Environment:
Last Closed:	2022-11-08 10:43:11 UTC
Type:	Bug
Target Upstream Version:
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Bugzilla	1591268	1	None	None	None	2021-05-18 04:24:20 UTC
Red Hat Product Errata	RHBA-2022:7684	0	None	None	None	2022-11-08 10:43:27 UTC
Sourceware	27882	0	P2	ASSIGNED	Use U+00AF MACRON in more EBCDIC charsets	2021-08-17 21:25:09 UTC

Description Denis Volkov 2021-05-17 09:34:17 UTC

Description of problem:
`iconv` reports an error when converting the macron (overscore, U+00AF) to EBCDIC-CP-ES (IBM284), even though the character exists in that code page at hex 0xBC (checked at https://www.compart.com/en/unicode/charsets/IBM284).

Version-Release number of selected component (if applicable):
glibc-common-2.28-127.el8_3.2.x86_64 (8.3)
glibc-common-2.28-151.el8.x86_64 (nightly)


How reproducible:

    $ echo "AF" | xxd -r -p |iconv -f iso8859-1 -t EBCDIC-CP-ES 
    iconv: illegal input sequence at position 0

    $ echo ¯|iconv -f utf8 -t EBCDIC-CP-ES 
    iconv: illegal input sequence at position 0
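Both commands fail for the same reason: `iconv` converts through Unicode, and at this point glibc's IBM284 table had an entry for U+203E OVERLINE at 0xBC but none for U+00AF MACRON (0xAF in ISO-8859-1, `¯` in UTF-8). The Python sketch below models that table lookup. `OLD_TABLE`, `NEW_TABLE`, `encode_with_table`, and `IllegalInputSequence` are hypothetical names for illustration, not glibc code, and the tables hold only the single 0xBC entry.

```python
from typing import Dict

# Hypothetical one-entry excerpts of an IBM284-style charmap:
# Unicode code point -> EBCDIC byte. Real charmaps cover every byte value.
OLD_TABLE: Dict[int, int] = {0x203E: 0xBC}  # before the fix: 0xBC <-> U+203E OVERLINE
NEW_TABLE: Dict[int, int] = {0x00AF: 0xBC}  # after the fix: 0xBC <-> U+00AF MACRON


class IllegalInputSequence(ValueError):
    """Raised when a code point has no entry in the target table."""

    def __init__(self, position: int) -> None:
        super().__init__(f"illegal input sequence at position {position}")
        self.position = position


def encode_with_table(text: str, table: Dict[int, int]) -> bytes:
    """Encode text one code point at a time, failing on the first unmapped one.

    The reported position is a character index, whereas iconv reports a byte
    offset into its input; both are 0 for the single-character input used here.
    """
    out = bytearray()
    for position, ch in enumerate(text):
        byte = table.get(ord(ch))
        if byte is None:
            raise IllegalInputSequence(position)
        out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    macron = "\u00af"  # the byte 0xAF, read as ISO-8859-1
    for label, table in (("old", OLD_TABLE), ("new", NEW_TABLE)):
        try:
            print(label, encode_with_table(macron, table).hex())
        except IllegalInputSequence as err:
            print(label, "error:", err)
```

Running it prints `old error: illegal input sequence at position 0` and then `new bc`, which mirrors the behavior of the real conversion before and after the table change.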

Comment 1 Florian Weimer 2021-05-17 09:44:31 UTC

As far as I can tell, the official mapping for the 0xBC code point in IBM284 is U+203E:

$ printf '\xbc' | iconv -f IBM284 -t UTF-16BE | xxd
00000000: 203e                                      >

This is based on the ICU table published here:

https://github.com/unicode-org/icu-data/blob/main/charset/data/ucm/glibc-IBM284-2.1.2.ucm

And I think IBM treats these ICU tables as the canonical reference nowadays. However, I note that OpenJDK in fact maps 0xbc to U+00AF, and we should at least be internally consistent in our product.
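The practical difference between the two candidate mappings is that U+203E OVERLINE has no ISO-8859-1 representation, while U+00AF MACRON is byte 0xAF in ISO-8859-1. The short Python check below uses only standard codecs to show this, and also confirms that the `203e` printed by `xxd` above is just the UTF-16BE encoding of U+203E. `latin1_bytes` is an illustrative helper name.

```python
from typing import Optional

OVERLINE = "\u203e"  # U+203E OVERLINE: glibc's IBM284 mapping for 0xBC at the time
MACRON = "\u00af"    # U+00AF MACRON: the mapping OpenJDK uses for 0xBC


def latin1_bytes(ch: str) -> Optional[bytes]:
    """Return the ISO-8859-1 encoding of ch, or None if ISO-8859-1 cannot represent it."""
    try:
        return ch.encode("latin-1")
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    # Same bytes as the `xxd` output above: 20 3e.
    print("U+203E as UTF-16BE:", OVERLINE.encode("utf-16-be").hex())
    print("U+203E in ISO-8859-1:", latin1_bytes(OVERLINE))  # None
    print("U+00AF in ISO-8859-1:", latin1_bytes(MACRON))    # b'\xaf'
```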

Comment 2 Tulio Magno Quites Machado Filho 2021-05-17 20:06:34 UTC

IBM provides a Host Code Page Reference [1]. According to that document, IBM284 maps 0xBC to IBM GCGID SM150000, a GCGID the reference does not document.

However, according to the IBM i 7.3 documentation [2], GCGID SM150000 maps to U+00AF.

[1] https://www.ibm.com/docs/en/SSEQ5Y_12.0.0/com.ibm.pcomm.doc/reference/pdf/hcp_referenceV58.pdf
[2] https://www.ibm.com/docs/en/i/7.3?topic=information-mapping-locale-symbolic-names

Comment 3 Tulio Magno Quites Machado Filho 2021-05-17 20:37:13 UTC

Interestingly, 0xBC is mapped to GCGID SM150000 in other code pages too, e.g. IBM500 and IBM871.

The glibc ICU tables for these code pages do map 0xBC to U+00AF:

https://github.com/unicode-org/icu-data/blob/main/charset/data/ucm/glibc-IBM500-2.1.2.ucm#L209
https://github.com/unicode-org/icu-data/blob/main/charset/data/ucm/glibc-IBM871-2.1.2.ucm#L209

The issue appears to be limited to the IBM2XX "family", for example:

https://github.com/unicode-org/icu-data/blob/main/charset/data/ucm/glibc-IBM280-2.1.2.ucm#L289
https://github.com/unicode-org/icu-data/blob/main/charset/data/ucm/glibc-IBM297-2.1.2.ucm#L289
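As an independent cross-check outside glibc, Python's standard library ships a `cp500` codec for IBM EBCDIC code page 500, and it decodes 0xBC to U+00AF, consistent with the ICU tables for IBM500 linked above. Python has no codec for IBM284 itself, so this only covers a code page that was already mapped to MACRON. `decode_byte` is an illustrative helper name.

```python
import unicodedata


def decode_byte(codec: str, value: int) -> str:
    """Decode one byte with a Python codec and describe the resulting character."""
    ch = bytes([value]).decode(codec)
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


if __name__ == "__main__":
    print("cp500 0xBC ->", decode_byte("cp500", 0xBC))  # U+00AF MACRON
```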

Comment 4 Florian Weimer 2021-05-18 04:23:38 UTC

Uhm, and I actually fixed this for IBM273 a while back:

commit 14beef7575099f6373f9a45b4656f1e3675f7372
Author: Florian Weimer <fweimer>
Date:   Thu Jun 14 22:34:09 2018 +0200

    localedata: Make IBM273 compatible with ISO-8859-1 [BZ #23290]
    
    Reviewed-by: Carlos O'Donell <carlos>

diff --git a/localedata/charmaps/IBM273 b/localedata/charmaps/IBM273
index c3f70e2a6f..4401101b50 100644
--- a/localedata/charmaps/IBM273
+++ b/localedata/charmaps/IBM273
@@ -194,7 +194,7 @@ CHARMAP
 <U00BE>     /xb9         VULGAR FRACTION THREE QUARTERS
 <U00AC>     /xba         NOT SIGN
 <U007C>     /xbb         VERTICAL LINE
-<U203E>     /xbc         OVERLINE
+<U00AF>     /xbc         MACRON
 <U00A8>     /xbd         DIAERESIS
 <U00B4>     /xbe         ACUTE ACCENT
 <U00D7>     /xbf         MULTIPLICATION SIGN
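Each entry in a glibc charmap file such as `localedata/charmaps/IBM273` pairs a Unicode symbolic name with a byte value and a description, so the fix amounts to a one-line table change. The sketch below parses entries in the shape shown in the diff and applies its `-`/`+` pair. The regular expression is an assumption inferred from this excerpt: it handles single-byte entries only and is not glibc's full charmap grammar, which also allows ranges and multi-byte sequences. `parse_charmap`, `BEFORE`, and `AFTER` are illustrative names.

```python
import re

# Assumed entry shape, e.g. "<U00AF>     /xbc         MACRON" (single-byte entries only).
CHARMAP_LINE = re.compile(r"<U([0-9A-Fa-f]{4,8})>\s+/x([0-9A-Fa-f]{2})\s+(.+)")


def parse_charmap(lines):
    """Build a byte -> (code point, name) table from charmap entry lines."""
    table = {}
    for line in lines:
        m = CHARMAP_LINE.match(line.strip())
        if m:
            code_point, byte, name = m.groups()
            table[int(byte, 16)] = (int(code_point, 16), name.strip())
    return table


# Context lines taken from the diff, with the pre-fix 0xBC entry.
BEFORE = """\
<U00AC>     /xba         NOT SIGN
<U007C>     /xbb         VERTICAL LINE
<U203E>     /xbc         OVERLINE
<U00A8>     /xbd         DIAERESIS
""".splitlines()

# The diff replaces exactly one line: the 0xBC entry.
AFTER = [
    "<U00AF>     /xbc         MACRON" if "/xbc" in line else line for line in BEFORE
]

if __name__ == "__main__":
    for label, lines in (("before", BEFORE), ("after", AFTER)):
        cp, name = parse_charmap(lines)[0xBC]
        print(f"{label}: 0xBC -> U+{cp:04X} {name}")
```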

Thanks for jogging my memory!

I think we should fix the remaining codepages this time.

Comment 5 Florian Weimer 2021-05-18 05:00:38 UTC

Upstream patch posted: https://sourceware.org/pipermail/libc-alpha/2021-May/126441.html

Comment 7 Florian Weimer 2021-05-27 12:13:14 UTC

Upstream commit:

commit f17164bd51db31f47fbbdae826c63b6d78184c45
Author: Florian Weimer <fweimer>
Date:   Tue May 18 07:21:33 2021 +0200

    localedata: Use U+00AF MACRON in more EBCDIC charsets [BZ #27882]
    
    This updates IBM256, IBM277, IBM278, IBM280, IBM284, IBM297, IBM424
    in the same way that IBM273 was updated for bug 23290.
    
    IBM256 and IBM424 still have holes after this change, so HAS_HOLES
    is not updated.
    
    Reviewed-by: Siddhesh Poyarekar <siddhesh>

I have put an unsupported, untested build for testing purposes here:

https://people.redhat.com/~fweimer/hSaS4M2B3iMN/glibc-2.28-158.el8.0.bz1961109.0/

It backports the upstream commit.
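To check whether an installed glibc includes the change, one could decode the byte 0xBC from each charset named in the commit message with the `iconv` command-line tool and inspect the UTF-8 result. The sketch below assumes that an `iconv` binary is on `PATH` and accepts the glibc charset names used in this report; `decode_0xbc` and `classify` are hypothetical helper names.

```python
import shutil
import subprocess
from typing import Optional

# Charsets named in the upstream commit message, plus IBM273 (fixed earlier).
CHARSETS = ["IBM256", "IBM273", "IBM277", "IBM278", "IBM280", "IBM284", "IBM297", "IBM424"]


def decode_0xbc(charset: str) -> Optional[bytes]:
    """Run `iconv -f CHARSET -t UTF-8` on the single byte 0xBC; None on failure."""
    result = subprocess.run(
        ["iconv", "-f", charset, "-t", "UTF-8"],
        input=b"\xbc",
        capture_output=True,
        check=False,
    )
    return result.stdout if result.returncode == 0 else None


def classify(utf8: Optional[bytes]) -> str:
    """Name the character iconv produced for 0xBC."""
    if utf8 == "\u00af".encode("utf-8"):
        return "MACRON (fixed)"
    if utf8 == "\u203e".encode("utf-8"):
        return "OVERLINE (unfixed)"
    return "error or unexpected output"


if __name__ == "__main__":
    if shutil.which("iconv") is None:
        print("iconv not found on PATH")
    else:
        for charset in CHARSETS:
            print(f"{charset}: {classify(decode_0xbc(charset))}")
```

With a glibc that contains the backport (this report lists glibc-2.28-201.el8 as the fixed version), every charset would be expected to report the MACRON mapping.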

Comment 8 Denis Volkov 2021-06-01 08:50:44 UTC

Update from the customer: they tested the build, and it fixed the problem for them.

Comment 35 errata-xmlrpc 2022-11-08 10:43:11 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (glibc bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:7684
