Bug 397021

Summary: Problems converting to iso-2022-jp//translit
Product: Red Hat Enterprise Linux 5 Reporter: John Haxby <jch>
Component: glibc Assignee: Jakub Jelinek <jakub>
Status: CLOSED ERRATA QA Contact: Brian Brock <bbrock>
Severity: medium Docs Contact:
Priority: low    
Version: 5.1   
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: RHBA-2008-0083 Doc Type: Bug Fix
Last Closed: 2008-05-21 16:52:50 UTC Type: ---
Attachments:
Description Flags
UTF8 character sequence used in describing the problem. none

Description John Haxby 2007-11-23 16:03:46 UTC
Description of problem:

While I'm logging this problem against RHEL5, it's actually present in every
version of glibc that I've been able to track down: everything from 2.3.2 (on
RHEL3) to 2.7 (on Fedora 8).


How reproducible:

Always.

Steps to Reproduce:
1. With a UTF-8 locale, e.g. en_GB.UTF-8:

    echo £€ | iconv -t iso2022jp//translit | iconv -f iso2022jp
    iconv -t iso2022jp//translit < attachment
    echo -e '\xe3\x88\xb1' | iconv -t iso2022jp//translit

The attachment is a series of UTF-8 characters, some of which can be translated
to iso-2022-jp and some of which (the numbered bullets, for example) cannot.
  
Actual results:

   First command:
     £鍍iconv: illegal input sequence at position 7

   Second command:
     ^[$BF|K\8l^[(B ^[$B5!<o0MB8J8;z(1)(2)(3)iiiiiiIIIIII(^[$B3t^[(B)

   and "(^[$B3t^[(B)" is repeated forever -- iconv never completes.

   Third command:
     no output, iconv just consumes 100% CPU until you get bored :-)


Expected results:

   The first command should produce "£EUR": iso-2022-jp has a sterling symbol
but no Euro symbol, so the Euro has to be transliterated.  The illegal input
sequence results from not shifting back to ASCII after emitting the sequence
that represents the sterling symbol.  You can see what happens if you compare
the output from converting just a £ to iso-2022-jp with the combined output.
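For readers unfamiliar with the shift sequences involved, here is a minimal
sketch of what "shifting back to ASCII" looks like.  It uses Python's own
iso2022_jp codec, not glibc's iconv, so it only illustrates the encoding, not
the bug.  The helper name show_escapes is made up for this example; the sample
text is the Japanese prefix that also appears in the second command's output.

```python
def show_escapes(data: bytes) -> str:
    """Render ESC as ^[ the way the terminal output in this report does."""
    return data.decode("ascii").replace("\x1b", "^[")

# Japanese text followed by plain ASCII: the encoder must switch to
# JIS X 0208 (ESC $ B) and then back to ASCII (ESC ( B) before " abc".
encoded = "日本語 abc".encode("iso2022_jp")
print(show_escapes(encoded))   # ^[$BF|K\8l^[(B abc

# Even with no trailing ASCII, a well-formed stream ends in the ASCII state.
assert "日本語".encode("iso2022_jp").endswith(b"\x1b(B")
```

If the ESC ( B is missing, any ASCII bytes that follow are read as pairs of
JIS X 0208 bytes, which is consistent with the garbage and the "illegal input
sequence" error seen in the first command.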

   The second command is seriously problematic.  In a program that is converting
a fairly short string in a buffer to another in a buffer that grows as needed,
the target buffer will grow arbitrarily large, or it would if the OOM killer
didn't step in.
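A caller can defend against this kind of runaway growth by capping the output
size relative to the input.  The following is only a sketch of that idea,
again using Python's incremental iso2022_jp encoder rather than glibc's
iconv(); the function name encode_bounded and the limits max_ratio and slack
are assumptions chosen for illustration, not values taken from any library.

```python
import codecs

def encode_bounded(text, encoding="iso2022_jp", chunk_chars=64,
                   max_ratio=8, slack=16):
    """Encode chunk by chunk, refusing to let the output grow without bound."""
    # Hypothetical cap: a few bytes per input character plus room for
    # shift sequences.  A converter that exceeds this is misbehaving.
    limit = max_ratio * len(text) + slack
    encoder = codecs.getincrementalencoder(encoding)()
    out = bytearray()
    for start in range(0, len(text), chunk_chars):
        out += encoder.encode(text[start:start + chunk_chars])
        if len(out) > limit:
            raise OverflowError("converted output exceeds %d bytes" % limit)
    # final=True flushes the encoder and emits any shift back to ASCII.
    out += encoder.encode("", final=True)
    if len(out) > limit:
        raise OverflowError("converted output exceeds %d bytes" % limit)
    return bytes(out)

print(encode_bounded("日本語 abc", chunk_chars=2))
```

The same check applies to a C loop around iconv() that enlarges its buffer on
E2BIG: stop growing once the buffer passes a sane multiple of the input size.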

   The third command extracts just one character from the UTF8 sequence
(encoded as three bytes), and iconv spins on it.
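To see which character those three bytes are, the sketch below decodes them
and prints a compatibility decomposition.  Note the assumption: glibc's
//translit uses its own locale transliteration tables, and Unicode NFKC
normalization is used here only as a stand-in, so this shows a plausible
replacement rather than what glibc actually does internally.

```python
import unicodedata

raw = b"\xe3\x88\xb1"                 # bytes from the third command
char = raw.decode("utf-8")
print("U+%04X %s" % (ord(char), unicodedata.name(char)))
# U+3231 PARENTHESIZED IDEOGRAPH STOCK

# Stand-in for transliteration (assumption: NFKC, not glibc's tables).
fallback = unicodedata.normalize("NFKC", char)
print(fallback)                       # (株)

# The fallback is representable in ISO-2022-JP; show it with ESC as ^[.
encoded = fallback.encode("iso2022_jp")
print(encoded.decode("ascii").replace("\x1b", "^["))   # (^[$B3t^[(B)
```

The last line printed matches the "(^[$B3t^[(B)" fragment that the second
command repeats forever, which suggests the loop is triggered by this
character.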

Additional info:

I strongly suspect all three problems are different aspects of the same bug.
This bug has been around for quite a while, but it wasn't noticed until a
colleague in Japan was testing support for some of the more unusual characters
used in Japanese text (which aren't actually in ISO-2022-JP but are in a common
extension, CP50221 aka ISO-2022-JP-MS).  This rather unfortunate behaviour
has been causing chaos!

Comment 1 John Haxby 2007-11-23 16:03:46 UTC
Created attachment 267661 [details]
UTF8 character sequence used in describing the problem.

Comment 3 RHEL Program Management 2008-01-08 14:04:49 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 7 errata-xmlrpc 2008-05-21 16:52:50 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0083.html