Bug 75430

Summary: Accented characters and the like in utf8 console
Product: [Fedora] Fedora
Reporter: Carlos Rodrigues <cefrodrigues>
Component: kbd
Assignee: Miloslav Trmač <mitr>
Status: CLOSED DUPLICATE
Severity: medium
Priority: medium
Version: 2
CC: leonid
Hardware: i686
OS: Linux
Doc Type: Bug Fix
Last Closed: 2005-02-18 14:59:46 UTC

Description Carlos Rodrigues 2002-10-08 14:31:41 UTC
Description of Problem:
When using the console with a utf8 locale I cannot use accented characters.
Additionally, when using characters like ç I have to press backspace
twice to delete them. The first backspace press removes the character
from the screen, but if I then try to run something, e.g. ls, it fails. That
is not the whole story: backspacing over those characters also erases part
of the bash prompt.

Version-Release number of selected component (if applicable):


How Reproducible:
Always

Steps to Reproduce:
1. try to use an accented character (e.g. á)
2. press ç twice
3. backspace until the console beeps

Actual Results:
no character appears after step 1
after 3 the bash prompt is partially erased

Expected Results:
It should work as it did in 7.3

Additional Information:
I tested this using en_US.UTF-8 and pt_PT.UTF-8 locales

Comment 1 Leonid Mamtchenkov 2002-11-08 15:47:47 UTC
I have a very similar problem with ru_RU.utf8.
I have also noticed something strange while trying to fix the problem.  In
/etc/profile.d/lang.sh matching for utf is done this way:
case $LANG in
  *.utf8*|*.UTF-8*)
...

while with my poor understanding of this I would expect something like:
case $LANG in
  *.utf8*|*.UTF8*)
...

or 

case $LANG in
  *.utf-8*|*.UTF-8*)
...

or even

case $LANG in
  *.utf8*|*.UTF8*|*.utf-8*|*.UTF-8*)
...

I am not insisting though :)
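
For comparison, here is a small Python sketch of which LANG values each pattern set would accept. It assumes that shell case patterns behave like case-sensitive globs for these simple patterns, and it uses Python's fnmatchcase as a stand-in for the shell. The names ORIGINAL, EXTENDED and matches are made up for this illustration and do not come from lang.sh.

```python
from fnmatch import fnmatchcase

# Patterns as quoted from /etc/profile.d/lang.sh in this comment.
ORIGINAL = ["*.utf8*", "*.UTF-8*"]
# The widest alternative suggested in this comment.
EXTENDED = ["*.utf8*", "*.UTF8*", "*.utf-8*", "*.UTF-8*"]


def matches(lang, patterns):
    """Return True if lang matches any pattern, case-sensitively,
    roughly like one arm of a shell case statement."""
    return any(fnmatchcase(lang, pattern) for pattern in patterns)


if __name__ == "__main__":
    # Print, for each spelling, whether the original and the extended
    # pattern sets accept it.
    for lang in ["ru_RU.utf8", "en_US.UTF-8", "pt_PT.UTF8", "pt_PT.utf-8"]:
        print(lang, matches(lang, ORIGINAL), matches(lang, EXTENDED))
```

Under that assumption, ru_RU.utf8 and en_US.UTF-8 are already covered by the existing patterns. Only spellings such as pt_PT.UTF8 or pt_PT.utf-8 would need the extra alternatives.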

Comment 2 Eido Inoue 2003-01-06 02:59:47 UTC
Accented characters (and Cyrillic) in UTF-8 are two bytes long, compared to one
byte in Latin-* and KOI8. The "needing to backspace more than once to erase a
char" is not a kbd issue, but rather an I18N issue with whatever shell you are
using.
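
The byte counts can be checked with a short Python sketch. It also shows what is left behind if a program erases a single byte rather than a whole character. That is an assumption about how a byte-oriented line editor might behave, not something confirmed in this report. The helper names are illustrative only.

```python
def encoded_length(char, encoding):
    """Number of bytes used to store one character in the given encoding."""
    return len(char.encode(encoding))


def is_valid_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


c_cedilla = "\u00e7"  # ç
zhe = "\u0436"        # ж (Cyrillic)

print(encoded_length(c_cedilla, "latin-1"))  # 1
print(encoded_length(c_cedilla, "utf-8"))    # 2
print(encoded_length(zhe, "koi8_r"))         # 1
print(encoded_length(zhe, "utf-8"))          # 2

# Assumed behaviour of a byte-oriented editor: one backspace drops one byte.
leftover = c_cedilla.encode("utf-8")[:-1]
print(is_valid_utf8(leftover))               # False: a dangling lead byte
```

A dangling lead byte left in the input buffer would be consistent with the reported symptom that a command typed after a single backspace fails.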


Comment 3 Carlos Rodrigues 2003-01-06 03:26:49 UTC
The need to backspace more than once was a side note and it is now fixed in
bash-2.05b-7. The main issue was that I couldn't use accented characters with a
utf8 locale, and I still can't.

Comment 4 Eido Inoue 2003-01-06 18:52:49 UTC
While in the console, please tell me the locale you're running in (type
"locale"), the keymap you have loaded, whether you are in Unicode mode or not,
and the scancodes that are generated when you press the keys (type "showkey").

Comment 5 Carlos Rodrigues 2003-01-06 20:18:01 UTC
This is the output of "locale" (note that the problem reported also happens with
a full en_US.UTF-8 or pt_PT.UTF-8 locale):

LANG=en_US.UTF-8
LC_CTYPE=pt_PT.UTF-8
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE=pt_PT.UTF-8
LC_MONETARY=pt_PT.UTF-8
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=pt_PT.UTF-8
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT=pt_PT.UTF-8
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

My /etc/sysconfig/keyboard file contains this:

KEYBOARDTYPE="pc"
KEYTABLE="pt-latin1"

I am using unicode.

The scancodes generated for some accented characters are for example:

ã -> 43,30 ("tilde", "a")
á -> 27,30 ("acute", "a")
à -> 42,27,30 ("shift", "grave", "a")
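
The sequences above are dead-key sequences: an accent key followed by a base letter. As a rough Python sketch of the characters they are expected to produce, Unicode NFC normalization can stand in for a keymap's compose table. The DEAD_KEYS mapping and the compose function are hypothetical names for this illustration and are not part of kbd or the kernel.

```python
import unicodedata

# Hypothetical mapping from dead-key names to Unicode combining marks.
DEAD_KEYS = {
    "tilde": "\u0303",  # combining tilde
    "acute": "\u0301",  # combining acute accent
    "grave": "\u0300",  # combining grave accent
}


def compose(dead_key, base):
    """Combine a dead key with a base letter into one precomposed character."""
    return unicodedata.normalize("NFC", base + DEAD_KEYS[dead_key])


for name in ("tilde", "acute", "grave"):
    char = compose(name, "a")
    # Show the composed character and the bytes a UTF-8 console would expect.
    print(name, char, char.encode("utf-8"))
```

Each composed character takes two bytes in UTF-8, so a console in Unicode mode needs both bytes to arrive for the character to be displayed.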

Additionally my /etc/sysconfig/i18n file contains:

SUPPORTED="en_US.UTF-8:en_US:en:pt_PT.UTF-8:pt_PT:pt"
LANG="en_US.UTF-8"
SYSFONT="latarcyrheb-sun16"

LC_COLLATE="pt_PT.UTF-8"
LC_CTYPE="pt_PT.UTF-8"
LC_MONETARY="pt_PT.UTF-8"
LC_PAPER="pt_PT.UTF-8"
LC_MEASUREMENT="pt_PT.UTF-8"


Comment 6 Carlos Rodrigues 2003-04-14 00:33:45 UTC
In Red Hat 9 I still can't use accented characters on the console.

Comment 7 vek 2003-05-21 12:30:18 UTC
In the kernel there is no translation from the 8-bit character value
found in the compose key translation table to UTF-8 characters.
It is easy to add that, but that would work only for Latin-1
characters and fail miserably for other characters, especially
non-Latin sets.

If you trap the output that is actually produced by the compose sequences
you will see that you actually get valid 8-bit non-UTF-8 characters, which the
console will refuse to display if it is in Unicode mode.



Villy
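
The point made in this comment can be illustrated with a minimal Python sketch. It assumes that the compose table yields a single 8-bit value and that the "easy" translation treats that value as Latin-1. The function names are invented for this example and do not correspond to kernel code.

```python
def is_valid_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def latin1_byte_to_utf8(value):
    """Translate one 8-bit value to UTF-8, assuming it is a Latin-1 code."""
    return bytes([value]).decode("latin-1").encode("utf-8")


# 0xE3 is "a with tilde" in Latin-1. Sent as a raw byte it is not valid
# UTF-8, so a console in Unicode mode has nothing it can display.
a_tilde = 0xE3
print(is_valid_utf8(bytes([a_tilde])))   # False
print(latin1_byte_to_utf8(a_tilde))      # b'\xc3\xa3'

# The same translation applied to an 8-bit value from a non-Latin table
# (here KOI8-R) yields valid UTF-8, but for the wrong character.
zhe = "\u0436"  # Cyrillic zhe
koi8_value = zhe.encode("koi8_r")[0]
wrong = latin1_byte_to_utf8(koi8_value).decode("utf-8")
print(wrong == zhe)                      # False
```

This is why a fixed 8-bit-to-UTF-8 mapping only helps Latin-1 users: the translation has to know which character set the table entry came from.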

Comment 8 Eido Inoue 2003-07-14 16:32:26 UTC
Non-ASCII input into the console won't be supported on UTF-8 based systems (RHL
8+) until the kernel supports it. See the release notes in the next beta for
more info. Of course, input through X will always be supported.

Comment 9 Carlos Rodrigues 2004-07-10 17:27:46 UTC
Unfortunately FC2 still has this problem... which is bad news for
those who want to write in their native language using the console...
Some of us still like using the console; it is unfortunate that it is
currently broken.

Comment 10 Christopher Beland 2005-01-20 04:24:05 UTC
I think I am experiencing the same problem...

Fedora Core 3
gnome-terminal-2.7.3-1

I have "echo São_Paulo" written at the terminal prompt.  (São as I see
it right now is S, a with a tilde above it, and then o.)  If I scroll
all the way to the left using the left arrow key, and then scroll back
to the right, the command will be re-written, shifted one character to
the left, one character at a time as I hit the right arrow key.  As I
am scrolling left, the cursor actually appears to skip the ã
(a-tilde).  The situation gets much worse if I try to delete
characters to the left of the ã (a-tilde).  I'm not sure what encoding
the original source of the text used; LANG and all LC_* vars are set to en_US.UTF-8,
Terminal -> Set Character Encoding is set to Current Locale (UTF-8),
and the letter does display correctly as an a-tilde in my terminal.


Comment 11 Miloslav Trmač 2005-02-18 14:59:46 UTC
Christopher, that is a bug in your shell, unrelated to the rest of this report.

The original bug is caused by missing compose support for UTF-8 in the kernel,
tracked in bug 143014.



*** This bug has been marked as a duplicate of 143014 ***