Readline has no notion of anything but single-byte encodings.
This means that if you are running in a locale such as en_US.UTF-8
and have multibyte characters on the command line, then editing
operates on individual bytes rather than whole characters, resulting
in incorrect behavior and invalid text.
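To illustrate the failure mode, here is a minimal sketch (not readline's actual code) contrasting a byte-oriented backspace with one that knows about UTF-8. The function names backspace_byte and backspace_utf8 are hypothetical and exist only for this example; the second one assumes the buffer holds UTF-8, where continuation bytes have the bit pattern 10xxxxxx.

```c
#include <assert.h>
#include <stddef.h>

/* Byte-oriented backspace: removes exactly one byte, the way an editor
   that only understands single-byte encodings would.
   buf must be a NUL-terminated string of length len. Returns the new length. */
size_t backspace_byte(char *buf, size_t len)
{
    assert(buf != NULL);
    if (len > 0)
        buf[--len] = '\0';
    return len;
}

/* UTF-8-aware backspace (assumes buf is UTF-8): skips back over trailing
   continuation bytes (10xxxxxx), then removes the lead byte as well, so a
   whole character disappears at once. Returns the new length. */
size_t backspace_utf8(char *buf, size_t len)
{
    assert(buf != NULL);
    while (len > 0 && ((unsigned char)buf[len - 1] & 0xC0) == 0x80)
        len--;                  /* drop continuation bytes */
    if (len > 0)
        len--;                  /* drop the lead (or ASCII) byte */
    buf[len] = '\0';
    return len;
}
```

With the 5-byte string "caf\xC3\xA9" (the last two bytes encode one character), backspace_byte leaves a dangling 0xC3 lead byte, which is invalid text, while backspace_utf8 leaves "caf".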
A partial patch for bash to support UTF-8 is found at:
I haven't tried it out or studied it in detail.
Explicit UTF-8 has:
- The advantage of knowing the encoding, so it's easier to deal
with things like invalid sequences in a robust manner.
- The disadvantage, compared to using generic functions
like mblen, mbtowc, of working only in UTF-8 locales,
and not in other multibyte locales like ja_JP.eucJP.
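For comparison, the generic approach from the second point might look roughly like the sketch below. It uses mbrlen, the restartable relative of mblen, and the helper name last_char_start is hypothetical. It assumes the caller has already selected the user's locale with setlocale(LC_CTYPE, ""). It scans from the start of the line because in encodings such as EUC-JP it is not generally safe to find a character boundary by looking backwards at bytes alone. How to treat invalid sequences is a guess here (each bad byte counts as one character), which is exactly the robustness question raised in the first point.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Return the byte offset at which the last character of buf[0..len)
   starts, according to the current LC_CTYPE locale. Returns 0 for an
   empty buffer. Invalid or incomplete sequences are stepped over one
   byte at a time (an assumption made for this sketch). */
size_t last_char_start(const char *buf, size_t len)
{
    mbstate_t st;
    size_t pos = 0, prev = 0;

    assert(buf != NULL);
    memset(&st, 0, sizeof st);          /* initial conversion state */
    while (pos < len) {
        size_t n = mbrlen(buf + pos, len - pos, &st);
        if (n == (size_t)-1 || n == (size_t)-2 || n == 0) {
            /* invalid, incomplete, or NUL: advance one byte, reset state */
            n = 1;
            memset(&st, 0, sizeof st);
        }
        prev = pos;
        pos += n;
    }
    return prev;
}
```

A backspace would then truncate the line at the returned offset. The same code works unchanged in UTF-8 and eucJP locales, at the cost of rescanning the line and of knowing less about what a malformed sequence looks like.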
(I'm filing this against bash because it has its own copy of readline
and bash is the most important place this will be noticed. The
same thing seems to apply to readline as well. I don't think
any support for multibyte encodings is really needed in bash
other than in readline.)
There is a big, ugly, but plausible-looking multibyte-support-for-bash patch at:
Phil has looked into this and hopefully fixed it. CC:'d him for confirmation.
What's the status of this? Can we close it out?
I'm running RHL 8.0 with bash-2.05b-7 (IIRC) from rawhide without problems.
Closing out based on feedback from reporter.