Readline has no notion of anything but single-byte encodings. If you are running in a locale such as en_US.UTF-8 and have multibyte characters on the command line, editing operates on individual bytes of those characters, resulting in incorrect behavior and invalid text.

A partial patch adding UTF-8 support to bash can be found at: http://www.tldp.org/HOWTO/Unicode-HOWTO-4.html#ss4.1
I haven't tried it out or studied it in detail. Handling UTF-8 explicitly has:

- The advantage of knowing the encoding, so it's easier to deal with things like invalid sequences in a robust manner.
- The disadvantage, compared to using generic functions like mblen and mbtowc, of working only in UTF-8 locales and not in other multibyte locales such as ja_JP.eucJP.

(I'm filing this against bash because it has its own copy of readline, and bash is the most important place this will be noticed. The same problem seems to apply to standalone readline as well. I don't think bash needs any multibyte support outside of readline.)
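To illustrate the problem and the two approaches, here is a minimal Python sketch. It is not readline code, and the function names are hypothetical. `naive_backspace` deletes one byte, as a single-byte-only editor would, and leaves a truncated UTF-8 sequence behind. `utf8_backspace` is the explicit-UTF-8 approach: it steps back over continuation bytes (`10xxxxxx`). `char_lengths` stands in for the generic approach. It feeds bytes one at a time to a Python incremental decoder, which plays roughly the role a loop over mblen() would play in C, so it also works for encodings like EUC-JP.

```python
import codecs
from typing import List


def naive_backspace(buf: bytes) -> bytes:
    """Delete one byte from the end, as a single-byte-only editor would."""
    return buf[:-1]


def utf8_backspace(buf: bytes) -> bytes:
    """Delete one whole UTF-8 character from the end of buf.

    UTF-8 continuation bytes always look like 0b10xxxxxx, so we step
    back over them until we reach a lead byte (or the buffer start).
    """
    if not buf:
        return buf
    i = len(buf) - 1
    while i > 0 and (buf[i] & 0xC0) == 0x80:
        i -= 1
    return buf[:i]


def is_valid_utf8(buf: bytes) -> bool:
    """True if buf decodes cleanly as UTF-8."""
    try:
        buf.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def char_lengths(buf: bytes, encoding: str) -> List[int]:
    """Byte length of each character in buf, for any supported encoding.

    Bytes are fed one at a time to an incremental decoder, which emits a
    character only once its final byte has arrived.
    """
    dec = codecs.getincrementaldecoder(encoding)()
    lengths: List[int] = []
    pending = 0
    for b in buf:
        pending += 1
        if dec.decode(bytes([b])):
            lengths.append(pending)
            pending = 0
    return lengths


line = "café".encode("utf-8")  # b'caf\xc3\xa9': 'é' is two bytes
print(naive_backspace(line), is_valid_utf8(naive_backspace(line)))  # b'caf\xc3' False
print(utf8_backspace(line), is_valid_utf8(utf8_backspace(line)))    # b'caf' True
print(char_lengths("日本".encode("euc_jp"), "euc_jp"))              # [2, 2]
```

The explicit version needs no decoder state and can easily spot malformed sequences. The generic version handles any encoding the runtime knows about, but it has to walk forward from a known character boundary. That is the same tradeoff described above for mblen/mbtowc.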
There is a big, ugly, but plausible-looking multibyte-support-for-bash patch at: http://oss.software.ibm.com/developer/opensource/linux/patches/?patch_id=34
Phil has looked into this and hopefully fixed it. CC'd him for confirmation.
What's the status of this? Can we close it out?
I'm running RHL 8.0 with bash-2.05b-7 (IIRC) from rawhide and have no problems with UTF-8.
Closing out based on feedback from reporter.