Red Hat Bugzilla – Bug 142265
readline can not edit utf-8 multi-byte characters (backspace)
Last modified: 2007-11-30 17:10:56 EST
Description of problem:
With readline-4.3-13 (FC3's current version), erasing UTF-8 multi-byte
characters with backspace fails.
Version-Release number of selected component (if applicable):
How reproducible: Always
Steps to Reproduce:
1. install FC3 and update
2. start python, or use bash's 'read' to read variables
3. type some utf-8 multi-byte characters and use backspace
Actual Results: backspace erases byte by byte
Expected Results: backspace erasing character by character
The following example reproduces it (tested with konsole and
gnome-terminal; xterm shows nothing. The word 'Проба' is Bulgarian
and means 'Test'):
--- cut ---
$ echo -n "Test: "; read a; echo "[$a]"
<You can paste this, then hit backspace 4 times to get the next line>
<Press Enter here and you get>
--- cut ---
If I erase one character I get '[ÐÑÐ¾Ð±ï¿½'. PÃ¡draig Brady noted that
readline 5 probably fixes this -
(the message has charset="utf-8", but the cyrilic letters are
scrambled), http://cnswww.cns.cwru.edu/php/chet/readline/CHANGES for
readline 5 info. This affects bash (read function only, command
editing works), python and probably all other console applications
that use readline.
Two points to make to clarify this issue:
1. You are not using readline at all when you use bash's read builtin like that.
You must pass the -e option for that: "read -e a"
2. The bash package (only) in FC3 is already using readline 5.
So, given that "read -e" in bash *does* handle UTF-8 correctly (try it) while it
fails in Python as you describe, it looks like readline 5 does fix this.
Actually, hang on -- does python use readline? It doesn't link to it.
Oh, never mind, it dlopens it or something similar.
Yes, I confirm comment #1: 'read -e -p "Test: " a; echo "[$a]"' works fine.
What do bash's read (without -e), pdksh and zsh (which even prints every UTF-8
multi-byte character as two) use, for example? Their own code? Does this mean
every single console application has to be checked? vi works, jed fails...
Can't speak for other applications, but "read" without -e on bash just invokes
the read() system call, so you just get tty line discipline.
Note "read -e" works for bash-2.05b-20.1 (redhat 9) also.
I don't think that... Was that using readline 5?
What exactly did you do to get Python to fail?
Python on RH9 seems to work for me, at least.
Reply to comment #6:
1. Start python on the command line in, say, konsole.
2. Copy and paste this at the '>>> ' prompt (it looks like "Tect",
but it consists of Cyrillic letters): "Тест"
3. Start erasing with backspace: you will be able to erase the '>>> ' prompt
too. If you paste 2 letters, you will be able to erase it down to '>>'.
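The arithmetic behind step 3 can be sketched under a simplifying assumption: if the line editor counts every byte as one screen column, while each Cyrillic letter actually occupies a single column, every letter leaves one extra backspace that eats into the prompt. This is an illustrative Python 3 model only, not readline's actual code.

```python
prompt = ">>> "
pasted = "Те"  # two Cyrillic letters, as in step 3

believed_columns = len(pasted.encode("utf-8"))  # 4: one "column" per byte
actual_columns = len(pasted)                    # 2: one column per letter
overshoot = believed_columns - actual_columns   # backspaces that reach the prompt

# Backspacing believed_columns times also erases `overshoot` prompt columns.
visible_prompt = prompt[: len(prompt) - overshoot]
print(repr(visible_prompt))  # '>>'
```

With all four letters of "Тест" the overshoot is 4 columns, which matches being able to erase the whole '>>> ' prompt.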
I found this problem playing with trac-admin.
Can't reproduce this on RH9:
Hang on a minute, I think this is an optical illusion.
Python doesn't (of its own accord) call setlocale(LC_CTYPE,""), so of _course_
it won't handle multibyte issues when you just invoke it and start typing.
However, if you try this:
>>> import locale
>>> locale.setlocale (locale.LC_CTYPE, "")
>>> Тест
then backspace until the prompt, you'll find that you cannot delete it.
Multibyte is now being handled correctly.
So python is a bad choice for demonstrating readline bugs.
Similarly, /usr/bin/ftp seems to neglect to call setlocale(LC_CTYPE,""), so
that's an ftp bug.
However, lftp *does* do the right thing, and shows that readline is not the culprit.
The only bug here is in ftp.
Created attachment 108633
..and here's the fix.
Fixed in CVS.
(In reply to comment #9)
> Hang on a minute, I think this is an optical illusion.
> Python doesn't (of its own accord) call setlocale(LC_CTYPE,""), so of _course_
> it won't handle multibyte issues when you just invoke it and start typing.
> However, if you try this:
> >>> import locale
> >>> locale.setlocale (locale.LC_CTYPE, "")
> >>> Тест
> then backspace until the prompt, you'll find that you cannot delete it.
> Multibyte is now being handled correctly.
A complicated story. I was told on the -devel list, AFAIR, that it could be
readline; that's why I filed it against readline...
I don't agree that it's only an ftp bug. I think that if Fedora's default
encoding is UTF-8, then all programs should use (or default to) it. If I'm
right, isn't this a 'distribution' bug, or do I/we have to test every single
program (this part cannot be avoided) and file separate bugs for each? Why does
Python "work" on RH9? I'm a bit confused... what now?
Python cannot call setlocale() on its own -- it isn't allowed to. That's up to
the Python *script* that it runs. Python is just the interpreter.
Feel free to file bugs against any other readline-using applications that
exhibit this bad behaviour, but bear in mind that program interpreters are
sometimes a special case.
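Since the script, not the interpreter, is responsible for this, a minimal Python 3 sketch of a script that adopts the user's locale before reading input might look like the following. It assumes the standard `readline` module is available (on some platforms it is not, hence the guarded import); the function names `setup_locale` and `main` are hypothetical.

```python
import locale
import sys


def setup_locale() -> str:
    """Take LC_CTYPE from the environment (LANG / LC_ALL / LC_CTYPE).

    Returns the resulting LC_CTYPE setting; if the environment names a
    locale that is not installed, keeps and returns the current setting.
    """
    try:
        return locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        return locale.setlocale(locale.LC_CTYPE)  # query without changing


def main() -> None:
    setup_locale()  # must happen before line editing starts
    try:
        import readline  # noqa: F401  # importing it gives input() line editing
    except ImportError:
        pass
    text = input("Test: ")
    print(f"[{text}]")


if __name__ == "__main__" and sys.stdin.isatty():  # only prompt interactively
    main()
```

The point of the design is ordering: the locale is set before any input is read, which is the step the thread says plain interpreter startup (and /usr/bin/ftp) left out.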
About comment #9: it's not /usr/bin/ftp that fails (it works fine for me,
ftp-0.17-22); /usr/kerberos/bin/ftp does. This means the component is
krb5-workstation, not ftp, right (or maybe both)?
What about bash, does the direct read() system call make it buggy?
Doncho: ftp-0.17-22 fails for me, so I think you are testing in a different way.
A fix has been committed to CVS.
I think that the krb5 ftp intentionally avoids linking to readline (but may be
No, bash is not buggy: see the documentation and understand the -e parameter to
the read builtin.