Red Hat Bugzilla – Bug 397021
Problems converting to iso-2022-jp//translit
Last modified: 2008-05-21 12:52:50 EDT
Description of problem:
While I'm logging this problem against RHEL5, it's actually present in all the
versions of glibc that I've been able to track down: everything from 2.3.2 (on
RHEL3) to 2.7 (on Fedora 8).
Steps to Reproduce:
1. With a UTF-8 locale, e.g. en_GB.UTF-8:
   echo £€ | iconv -t iso2022jp//translit | iconv -f iso2022jp
2. iconv -t iso2022jp//translit < attachment
3. echo -e '\xe3\x88\xb1' | iconv -t iso2022jp//translit
The attachment is a series of UTF-8 characters, some of which can be translated
to iso-2022-jp and some of which (the numbered bullets, for example) cannot.
Actual results:
1. £鍍iconv: illegal input sequence at position 7
2. "(^[$B3t^[(B)" is repeated forever -- iconv never completes.
3. No output; iconv just consumes 100% CPU until you get bored :-)
Expected results:
The first command should produce "£EUR" because, while there is a sterling
symbol in iso-2022-jp, there is no Euro symbol. The illegal input sequence
results from not shifting back to ASCII after emitting the sequence that
represents the sterling symbol. You can see what happens if you look at the
output from converting just a £ to iso-2022-jp and then at the combined output.
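One way to make that comparison is to dump the raw bytes rather than let the
terminal interpret the escape sequences. This is only a sketch: dump_jp is a
made-up helper name, the input is written with octal escapes so it does not
depend on the terminal's encoding, and od -c shows ESC as 033. Going by the
analysis above, the thing to look for in the second dump is whether a shift
back to ASCII appears between the sterling symbol and the transliterated Euro.

```shell
# dump_jp (hypothetical helper): convert UTF-8 input to ISO-2022-JP with
# transliteration and show the resulting bytes one by one.
# $1 is used as a printf format, so octal escapes such as \302\243 are expanded.
dump_jp() {
    printf "$1" | iconv -f UTF-8 -t ISO-2022-JP//TRANSLIT | od -An -c
}

dump_jp '\302\243'              # sterling symbol (U+00A3) on its own
dump_jp '\302\243\342\202\254'  # sterling symbol followed by Euro (U+20AC)
```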
The second command is seriously problematic. In a program that is converting
a fairly short string in a buffer to another in a buffer that grows as needed,
the target buffer will grow arbitrarily large, or it would if the OOM killer
didn't step in.
The third command extracts just one character from the UTF-8 sequence
(represented as three bytes), and iconv spins on it.
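For anyone re-running this on an affected glibc, it may be worth bounding the
command so it cannot spin indefinitely. The following is a minimal sketch and
not part of the original reproducer: bounded is a made-up helper name, it
assumes timeout(1) from GNU coreutils is installed, and the five-second limit
is arbitrary. timeout exits with status 124 when it has to kill the command.

```shell
# bounded (hypothetical helper): run a command with a wall-clock limit.
# Usage: bounded SECONDS COMMAND [ARGS...]
bounded() {
    limit=$1
    shift
    status=0
    timeout "$limit" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
        # 124 is the exit status timeout(1) uses when the limit is reached.
        echo "timed out after ${limit}s: $*" >&2
    fi
    return "$status"
}

# The same three bytes as the third command (U+3231), written as octal escapes.
if printf '\343\210\261\n' | bounded 5 iconv -f UTF-8 -t ISO-2022-JP//TRANSLIT; then
    echo "conversion finished"
else
    echo "conversion failed or timed out (status $?)"
fi
```

The same wrapper can be put around the second command by redirecting the
attachment into it instead of using printf.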
I strongly suspect all three problems are different aspects of the same bug.
This bug has been around for quite a while; it only came to light when a
colleague in Japan was testing support for some of the more unusual characters
used in Japanese text (that aren't actually in ISO-2022-JP but are in a common
extension, CP50221 aka ISO-2022-JP-MS). This rather unfortunate behaviour
has been causing chaos!
Created attachment 267661
UTF8 character sequence used in describing the problem.
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release. Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products. This request is not yet committed for inclusion in an Update release.
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.