Bug 81325

Summary:	8-bit characters do not work on the console or xterm
Product:	[Retired] Red Hat Public Beta	Reporter:	Keith Briscoe <cheeth>
Component:	kernel	Assignee:	Arjan van de Ven <arjanv>
Status:	CLOSED NOTABUG	QA Contact:	Brian Brock <bbrock>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	phoebe	CC:	notting
Target Milestone:	---
Target Release:	---
Hardware:	i386
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2003-01-09 03:59:55 UTC	Type:	---

Description Keith Briscoe 2003-01-08 03:27:10 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20021218

Description of problem:
I don't know if this is a bug or feature, but I do know this is the wrong
component.  Please reassign to the right component.

Any character with the high bit set shows up as the standard "unprintable
character" symbol.  I'm sure this is related to the switch to UTF-8, but I don't
know what the correct behavior is--I simply know that what used to work no longer works.

According to Unicode.org, the first 256 glyphs in Unicode exactly match the
glyphs in ISO-8859-1.  So, it seems that codes 128-255 should be usable.

If UTF-8 somehow reserves this bit, it would make sense that those characters
would go away.  But even if this is the case, I do not see any Unicode
characters when I try to look at 8-bit characters.  If the code is not correct
to generate a Unicode character on the console, would it not help compatibility
to fall back to ISO-8859-1 in these cases?
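
The fallback suggested above can be sketched roughly as follows. This is only an illustration of the idea in Python, not how the console or xterm behaves; the function name decode_with_fallback is hypothetical.

```python
def decode_with_fallback(data: bytes) -> str:
    """Decode as UTF-8; if the bytes are not valid UTF-8,
    treat them as ISO-8859-1 instead (hypothetical helper)."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Every byte value is a valid ISO-8859-1 character,
        # so this second attempt cannot fail.
        return data.decode("iso-8859-1")


# Made-up sample input: "café" stored as ISO-8859-1 and as UTF-8.
print(decode_with_fallback(b"caf\xe9"))
print(decode_with_fallback(b"caf\xc3\xa9"))
```

Such a fallback is a heuristic: some ISO-8859-1 byte sequences are also valid UTF-8 (the bytes 0xC3 0xA9 are "Ã©" in ISO-8859-1 but "é" in UTF-8), so it can guess wrong.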

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
cat a file containing 8-bit text
    

Actual Results:  Nothing but unprintable character symbols wherever the high bit
is set.

Expected Results:  "Latin Extended" characters (primarily accented characters in
Unicode/ISO-8859-1) should show up.

Additional info:

Comment 1 Bill Nottingham 2003-01-08 05:17:00 UTC

What does your /etc/sysconfig/i18n look like?

Comment 2 Keith Briscoe 2003-01-08 06:49:14 UTC

Contents below:

LANG="en_US.UTF-8"
SUPPORTED="en_US.UTF-8:en_US:en"
SYSFONT="latarcyrheb-sun16"

Bog-standard config, out of the box.  I'll come up with a good demo file to
demonstrate the problem (but I'm sleepy now and it'll have to wait).

Comment 3 Miloslav Trmac 2003-01-08 18:29:48 UTC

No. The fact that Unicode and ISO-8859-1 assign these characters
the same code points doesn't mean that the encoding is the same.
In a UTF-8 locale, Unicode text is encoded as UTF-8, which represents
characters with codes > 127 as multi-byte sequences
(if it were byte-compatible with ISO-8859-1, there would be no way to
add other characters!).
NOTABUG IMHO; you can use iconv to convert the file, or revert to
a non-UTF-8 locale.
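
To make the difference concrete, here is a minimal Python sketch. The sample character (é, code point 233) is an assumption chosen for illustration; it does not come from the reporter's file.

```python
# Illustrative sample character, not taken from the report.
ch = "\u00e9"  # é: code point 233 in both Unicode and ISO-8859-1

latin1_bytes = ch.encode("iso-8859-1")  # a single byte, 0xE9
utf8_bytes = ch.encode("utf-8")         # two bytes, 0xC3 0xA9

print(latin1_bytes.hex(), utf8_bytes.hex())

# Interpreting the ISO-8859-1 byte as UTF-8 fails: 0xE9 is the lead byte
# of a three-byte sequence, but no continuation bytes follow it.
# errors="replace" substitutes U+FFFD REPLACEMENT CHARACTER.
as_utf8 = latin1_bytes.decode("utf-8", errors="replace")
print(repr(as_utf8))
```

The code point is the same in both cases, but the byte sequences differ, which is why a UTF-8 terminal shows a replacement symbol for each high-bit byte of an ISO-8859-1 file.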

Comment 4 Keith Briscoe 2003-01-09 02:31:24 UTC

Thanks very much.  I know just enough about this to know what you're talking
about, but not much more ;)

I'll check out the iconv program.  Looks very useful.  I agree that this is
probably NOTABUG.
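
For anyone following up on the iconv suggestion: the conversion amounts to decoding the bytes as ISO-8859-1 and re-encoding them as UTF-8. The sketch below does the same thing in Python; the function name latin1_to_utf8 and the sample bytes are illustrative assumptions.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8 (hypothetical helper name)."""
    return data.decode("iso-8859-1").encode("utf-8")


# Made-up sample: "café" followed by a newline, stored as ISO-8859-1.
sample = b"caf\xe9\n"
converted = latin1_to_utf8(sample)
print(converted)
```

On the command line, something like `iconv -f ISO-8859-1 -t UTF-8 input.txt > output.txt` (placeholder file names) should perform the same conversion, assuming the file really is ISO-8859-1.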