65251 – Bash and readline don't support multibyte locales

Keywords:
Status:	CLOSED RAWHIDE
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	distribution
Sub Component:
Version:	7.3
Hardware:	i386
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	wdovlrrw
QA Contact:	Ben Levenson
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	65252 79579

Reported:	2002-05-20 22:27 UTC by Owen Taylor
Modified:	2007-04-18 16:42 UTC
CC List:	2 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2003-01-15 13:48:01 UTC
Embargoed:

Description Owen Taylor 2002-05-20 22:27:54 UTC

Readline doesn't have any idea of anything but single-byte encodings.
This means that if you are running in a locale such as en_US.UTF-8
and have multibyte characters on the command line, then editing
will edit partial bits of characters, resulting in incorrect
behavior and invalid text.
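
To illustrate the failure mode, here is a minimal sketch in C. It is not
taken from bash or readline; the function names backspace_bytewise and
backspace_utf8 are hypothetical, and the second one assumes the buffer holds
valid UTF-8. It contrasts a backspace that removes one byte with one that
removes one whole character.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical byte-wise backspace: drops exactly one byte, which is
   what a single-byte-only line editor effectively does.
   Returns the new length in bytes. */
size_t backspace_bytewise(char *buf)
{
    assert(buf != NULL);
    size_t len = strlen(buf);
    if (len > 0)
        buf[--len] = '\0';
    return len;
}

/* Hypothetical UTF-8-aware backspace: drops the last byte, then keeps
   stepping back over continuation bytes (bit pattern 10xxxxxx) until it
   reaches the lead byte of the character. Assumes valid UTF-8 input.
   Returns the new length in bytes. */
size_t backspace_utf8(char *buf)
{
    assert(buf != NULL);
    size_t len = strlen(buf);
    if (len == 0)
        return 0;
    len--;
    while (len > 0 && ((unsigned char)buf[len] & 0xC0) == 0x80)
        len--;
    buf[len] = '\0';
    return len;
}
```

With the five-byte string "caf\xc3\xa9" (a UTF-8 encoded e-acute after
"caf"), the byte-wise version leaves a dangling 0xC3 lead byte, which is
invalid text, while the UTF-8-aware version leaves "caf".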

A partial patch adding UTF-8 support to bash can be found at:

 http://www.tldp.org/HOWTO/Unicode-HOWTO-4.html#ss4.1

I haven't tried it out or studied it in detail. 

Handling UTF-8 explicitly has:

 - The advantage of knowing the encoding so it's easier to deal
   with things like invalid sequences in a robust manner.

 - The disadvantage, compared to using generic functions
   like mblen, mbtowc, of working only in UTF-8 locales, 
   and not in other multibyte locales like ja_JP.eucJP.
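
For comparison, the generic approach might look like the following sketch.
It uses mbrlen, the restartable sibling of mblen, so the decoding is driven
by whatever encoding the current locale specifies rather than by hard-coded
UTF-8 rules. The function name count_chars_locale is hypothetical, and the
caller is assumed to have already called setlocale.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: count the characters in s according to the
   current locale's multibyte encoding (LC_CTYPE).
   Returns -1 if an invalid or truncated sequence is found. */
long count_chars_locale(const char *s)
{
    assert(s != NULL);
    mbstate_t state;
    memset(&state, 0, sizeof state);   /* start in the initial shift state */
    size_t remaining = strlen(s);
    long count = 0;

    while (remaining > 0) {
        size_t n = mbrlen(s, remaining, &state);
        if (n == (size_t)-1 || n == (size_t)-2)
            return -1;                 /* invalid or incomplete sequence */
        if (n == 0)
            break;                     /* embedded NUL; not expected here */
        s += n;
        remaining -= n;
        count++;
    }
    return count;
}
```

After a successful setlocale(LC_ALL, "en_US.UTF-8"), the string
"caf\xc3\xa9" counts as 4 characters; the same code would follow EUC-JP
rules under ja_JP.eucJP, provided those locales are installed. The cost is
the one noted above: what counts as an invalid sequence, and how to recover
from it, is left to the C library.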

(I'm filing this against bash because it has its own copy of readline
and bash is the most important place this will be noticed. The
same thing seems to apply to readline as well. I don't think
any support for multibyte encodings is really needed in bash
other than in readline.)

Comment 1 Owen Taylor 2002-05-23 19:14:46 UTC

There is a big, ugly, but plausible-looking multibyte-support-for-bash patch at:

http://oss.software.ibm.com/developer/opensource/linux/patches/?patch_id=34

Comment 2 Karsten Hopp 2002-07-13 00:02:56 UTC

Phil has looked into this and hopefully fixed it. CC:'d him for confirmation.

Comment 3 Jay Turner 2003-01-10 13:41:33 UTC

What's the status of this?  Can we close it out?

Comment 4 Miloslav Trmac 2003-01-10 13:44:58 UTC

I'm running RHL 8.0 with bash-2.05b-7 (IIRC) from rawhide without problems
with UTF-8.

Comment 5 Jay Turner 2003-01-15 13:48:01 UTC

Closing out based on feedback from reporter.
