Bug 1961109

Summary:	glibc: iconv: missing macron (unicode 0xAF) in EBCDIC-CP-ES (IBM284)
Product:	Red Hat Enterprise Linux 8	Reporter:	Denis Volkov <dvolkov>
Component:	glibc	Assignee:	Arjun Shankar <ashankar>
Status:	CLOSED ERRATA	QA Contact:	Martin Coufal <mcoufal>
Severity:	high	Docs Contact:	Jacob Taylor Valdez <jvaldez>
Priority:	high
Version:	8.4	CC:	alanm, ashankar, codonell, dbodnarc, dj, fweimer, jvaldez, mcoufal, mnewsome, pfrankli, sipoyare, skolosov, tulioqm
Target Milestone:	beta	Keywords:	Bugfix, Patch, Triaged, ZStream
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	glibc-2.28-201.el8	Doc Type:	Bug Fix
Doc Text:	.The mapping for the `0xBC` code point for some IBM character sets is now `U+00AF MACRON`. Previously, the `IBM256`, `IBM277`, `IBM278`, `IBM280`, `IBM284`, `IBM297`, and `IBM424` character sets encoded the `EBCDIC` code point `0xBC` as the Unicode character `U+203E OVERLINE`. As a result, when using the `iconv` program provided by `glibc`, converting text in those character sets containing the `0xBC` code point failed for non-Unicode character sets such as `ISO-8859-1` because they could not encode the `U+203E OVERLINE` character. With this update, these character sets map `0xBC` to `U+00AF MACRON`. As a result, input in the `IBM277`, `IBM278`, `IBM280`, `IBM284`, and `IBM297` character sets can be converted to `ISO-8859-1` in all cases. For the `IBM256` and `IBM424` character sets, conversion no longer fails if the input text contains the `0xBC` code point and the respective output is `U+00AF MACRON`.	Story Points:	---
Clone Of:
Clones:	2084564		Environment:
Last Closed:	2022-11-08 10:43:11 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	2084564

Description Denis Volkov 2021-05-17 09:34:17 UTC

Description of problem:
`iconv` throws an error when converting a macron (overscore, Unicode 0xAF) to EBCDIC-CP-ES (IBM284), even though the character exists in the character table (checked at https://www.compart.com/en/unicode/charsets/IBM284, hex 0xBC).

Version-Release number of selected component (if applicable):
glibc-common-2.28-127.el8_3.2.x86_64 (8.3)
glibc-common-2.28-151.el8.x86_64 (nightly)


How reproducible:

    $ echo "AF" | xxd -r -p |iconv -f iso8859-1 -t EBCDIC-CP-ES 
    iconv: illegal input sequence at position 0

    $ echo ¯|iconv -f utf8 -t EBCDIC-CP-ES 
    iconv: illegal input sequence at position 0

Comment 1 Florian Weimer 2021-05-17 09:44:31 UTC

As far as I can tell, the official mapping for the 0xBC codepoint in IBM284 is U+203E:

$ printf '\xbc' | iconv -f IBM284 -t UTF-16BE | xxd
00000000: 203e                                      >

This is based on the ICU table published here:

https://github.com/unicode-org/icu-data/blob/main/charset/data/ucm/glibc-IBM284-2.1.2.ucm

And I think IBM treats these ICU tables as the canonical reference nowadays. However, I note that OpenJDK in fact maps 0xbc to U+00AF, and we should at least be internally consistent in our product.

Comment 2 Tulio Magno Quites Machado Filho 2021-05-17 20:06:34 UTC

IBM provides a Host Code Page Reference [1]. According to this document, in IBM284, 0xBC
is mapped to IBM GCGID SM150000, which isn't documented.

But according to the following IBM i 7.3 document [2], GCGID SM150000 is mapped to U+00AF.

[1] https://www.ibm.com/docs/en/SSEQ5Y_12.0.0/com.ibm.pcomm.doc/reference/pdf/hcp_referenceV58.pdf
[2] https://www.ibm.com/docs/en/i/7.3?topic=information-mapping-locale-symbolic-names

Comment 3 Tulio Magno Quites Machado Filho 2021-05-17 20:37:13 UTC

Interestingly, 0xBC is mapped to GCGID SM150000 in other code pages too, e.g. IBM500 and IBM871.

The glibc ICU tables for these code pages do map 0xBC to U+00AF:

https://github.com/unicode-org/icu-data/blob/main/charset/data/ucm/glibc-IBM500-2.1.2.ucm#L209
https://github.com/unicode-org/icu-data/blob/main/charset/data/ucm/glibc-IBM871-2.1.2.ucm#L209

The issue seems to be limited to the IBM2XX "family", e.g.:

https://github.com/unicode-org/icu-data/blob/main/charset/data/ucm/glibc-IBM280-2.1.2.ucm#L289
https://github.com/unicode-org/icu-data/blob/main/charset/data/ucm/glibc-IBM297-2.1.2.ucm#L289
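
The comparison above can also be made against the installed glibc rather than the ICU tables. The following is a minimal sketch, not part of the original comment: the helper names `bc_codepoint` and `describe_codepoint` are hypothetical, and the output depends on which glibc build and gconv modules are installed.

```shell
#!/bin/sh
# Print the Unicode code point that iconv assigns to EBCDIC byte 0xBC
# in the given charset, as hex digits (for example 00af or 203e).
# Prints nothing if iconv cannot perform the conversion.
bc_codepoint() {
    # \274 is the octal escape for 0xBC; octal escapes are portable in printf(1).
    printf '\274' | iconv -f "$1" -t UTF-16BE 2>/dev/null | od -An -tx1 | tr -d ' \n'
}

# Name the two code points discussed in this bug; report anything else as-is.
describe_codepoint() {
    case "$1" in
        00af) echo "U+00AF MACRON" ;;
        203e) echo "U+203E OVERLINE" ;;
        "")   echo "conversion failed" ;;
        *)    echo "U+$1 (unexpected)" ;;
    esac
}

# Charsets mentioned in this thread so far (an illustrative selection).
for cs in IBM273 IBM280 IBM284 IBM297 IBM500 IBM871; do
    printf '%s: %s\n' "$cs" "$(describe_codepoint "$(bc_codepoint "$cs")")"
done
```

Charsets that still report U+203E OVERLINE are the ones whose charmap has not been updated.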

Comment 4 Florian Weimer 2021-05-18 04:23:38 UTC

Uhm, and I actually fixed this for IBM273 a while back:

commit 14beef7575099f6373f9a45b4656f1e3675f7372
Author: Florian Weimer <fweimer>
Date:   Thu Jun 14 22:34:09 2018 +0200

    localedata: Make IBM273 compatible with ISO-8859-1 [BZ #23290]
    
    Reviewed-by: Carlos O'Donell <carlos>

diff --git a/localedata/charmaps/IBM273 b/localedata/charmaps/IBM273
index c3f70e2a6f..4401101b50 100644
--- a/localedata/charmaps/IBM273
+++ b/localedata/charmaps/IBM273
@@ -194,7 +194,7 @@ CHARMAP
 <U00BE>     /xb9         VULGAR FRACTION THREE QUARTERS
 <U00AC>     /xba         NOT SIGN
 <U007C>     /xbb         VERTICAL LINE
-<U203E>     /xbc         OVERLINE
+<U00AF>     /xbc         MACRON
 <U00A8>     /xbd         DIAERESIS
 <U00B4>     /xbe         ACUTE ACCENT
 <U00D7>     /xbf         MULTIPLICATION SIGN

Thanks for jogging my memory!

I think we should fix the remaining codepages this time.

Comment 5 Florian Weimer 2021-05-18 05:00:38 UTC

Upstream patch posted: https://sourceware.org/pipermail/libc-alpha/2021-May/126441.html

Comment 7 Florian Weimer 2021-05-27 12:13:14 UTC

Upstream commit:

commit f17164bd51db31f47fbbdae826c63b6d78184c45
Author: Florian Weimer <fweimer>
Date:   Tue May 18 07:21:33 2021 +0200

    localedata: Use U+00AF MACRON in more EBCDIC charsets [BZ #27882]
    
    This updates IBM256, IBM277, IBM278, IBM280, IBM284, IBM297, IBM424
    in the same way that IBM273 was updated for bug 23290.
    
    IBM256 and IBM424 still have holes after this change, so HAS_HOLES
    is not updated.
    
    Reviewed-by: Siddhesh Poyarekar <siddhesh>

I have put an unsupported, untested build for testing purposes here:

https://people.redhat.com/~fweimer/hSaS4M2B3iMN/glibc-2.28-158.el8.0.bz1961109.0/

It backports the upstream commit.
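
To check a build such as this one, the reproducer from the description can be run against every charset named in the commit message. This is a rough sketch under assumptions: the helper names `encode_macron` and `verdict` are hypothetical, and the expectation that a fixed table yields byte 0xBC follows from the charmap change rather than from a test run recorded in this bug.

```shell
#!/bin/sh
# Charsets touched by the upstream commit, as listed in its message.
CHARSETS="IBM256 IBM277 IBM278 IBM280 IBM284 IBM297 IBM424"

# Encode ISO-8859-1 byte 0xAF (MACRON) into the target charset and
# print the resulting byte in hex; prints nothing if iconv rejects the input.
encode_macron() {
    # \257 is the octal escape for 0xAF.
    printf '\257' | iconv -f ISO-8859-1 -t "$1" 2>/dev/null | od -An -tx1 | tr -d ' \n'
}

# Turn the hex result into a verdict: a fixed table should yield bc.
verdict() {
    case "$1" in
        bc) echo "fixed" ;;
        "") echo "not fixed (conversion failed)" ;;
        *)  echo "unexpected byte 0x$1" ;;
    esac
}

for cs in $CHARSETS; do
    printf '%s: %s\n' "$cs" "$(verdict "$(encode_macron "$cs")")"
done
```

On a glibc without the backport, each line is expected to read "not fixed", matching the "illegal input sequence" error in the description.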

Comment 8 Denis Volkov 2021-06-01 08:50:44 UTC

Update from the customer: they tested the build, and it fixed the problem for them.

Comment 35 errata-xmlrpc 2022-11-08 10:43:11 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (glibc bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:7684