Bug 23479

Summary: question marks instead of accented characters - OUTPUT_CHARSET
Product: [Retired] Red Hat Linux
Reporter: stano
Component: installer
Assignee: Matt Wilson <msw>
Status: CLOSED WONTFIX
QA Contact: Brock Organ <borgan>
Severity: medium
Priority: medium    
Version: 7.0   
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Last Closed: 2001-05-09 16:18:43 UTC

Description stano 2001-01-06 10:45:30 UTC
The glibc developers decided that, if an application does not
initialize its locale, accented (national) characters in
localised messages are displayed as question marks.

While this is basically correct and I can understand it
from the glibc point of view, there are a lot of applications
that do not bother with a setlocale() call, and because the glibc
messages are localised, they are subsequently mangled.
Fixing all of the apps will take a looong time.

This change has already generated a bunch of questions
on local mailing lists.

The workaround is simple - set OUTPUT_CHARSET to the desired
character set. Maybe the installer should set this in
/etc/sysconfig/i18n along with other variables (at least
optionally after confirmation, maybe there are cases where
this is not the best idea).
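
As a rough sketch of the optional setting suggested above, the following Python fragment renders the lines an installer might write to /etc/sysconfig/i18n. The function name render_i18n and its parameters are hypothetical and are not taken from the actual installer; the file is assumed to hold plain shell-style VAR="value" assignments, and ISO-8859-2 is used only as an example charset for a Slovak locale.

```python
def render_i18n(lang, output_charset=None):
    """Return the text of a sysconfig-style i18n file.

    Hypothetical helper: OUTPUT_CHARSET is written only when the caller
    passes one, i.e. after the user has confirmed the workaround.
    """
    lines = ['LANG="%s"' % lang]
    if output_charset:
        lines.append('OUTPUT_CHARSET="%s"' % output_charset)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Default: only LANG is written.
    print(render_i18n("sk_SK"), end="")
    # Opt-in: the user confirmed the workaround for a Latin-2 locale.
    print(render_i18n("sk_SK", "ISO-8859-2"), end="")
```

Keeping the charset an explicit, optional argument means nothing changes for users who never confirm the workaround.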

Reproducing:
Unset all LC_*, LANG, and OUTPUT_CHARSET environment variables,
if present, then run:

% gcc /nonexistent
gcc: /nonexistent: No such file or directory
gcc: No input files
% setenv LANG sk_SK
% gcc /nonexistent
gcc: /nonexistent: Adres?r alebo s?bor neexistuje
gcc: No input files
% setenv OUTPUT_CHARSET sk_SK
% gcc /nonexistent
gcc: /nonexistent: Adresar alebo szbor neexistuje
gcc: No input files

Comment 1 stano 2001-01-06 10:49:14 UTC
eh - the bugzilla form messes up the accented characters too -
the 'a' and 'u' have an acute accent in the next-to-last line:
.. Adresa'r alebo su'bor ...

Comment 2 Michael Fulbright 2001-01-09 21:32:01 UTC
Assigning to a developer.

Comment 3 Matt Wilson 2001-01-09 21:49:53 UTC
Setting OUTPUT_CHARSET globally will break things.  If you run something like
"LANG=ja_JP.eucJP gnome-terminal" while OUTPUT_CHARSET=ISO-8859-2 is set
globally, you'll get '?' everywhere in the menus.

That is, OUTPUT_CHARSET overrides the default charset even when programs properly
call setlocale().
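
The conflict described here can be sketched as a small check, written in Python for brevity. It assumes that, when OUTPUT_CHARSET is not set, the output charset comes from the LC_CTYPE locale (precedence LC_ALL, then LC_CTYPE, then LANG), and it only looks at a codeset spelled out in the locale name. All function names are hypothetical.

```python
import re


def _normalize(charset):
    """Lowercase a charset name and drop punctuation: 'ISO-8859-2' -> 'iso88592'."""
    return re.sub(r"[^a-z0-9]", "", charset.lower())


def locale_codeset(env):
    """Return the codeset named in the LC_CTYPE locale, or None if absent.

    Uses the precedence LC_ALL > LC_CTYPE > LANG and looks only at an
    explicit '.codeset' suffix such as the 'eucJP' in 'ja_JP.eucJP'.
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            # Strip an optional '@modifier', then take the part after '.'.
            base = value.split("@", 1)[0]
            if "." in base:
                return base.split(".", 1)[1]
            return None
    return None


def output_charset_conflicts(env):
    """True when OUTPUT_CHARSET disagrees with an explicit locale codeset."""
    forced = env.get("OUTPUT_CHARSET")
    codeset = locale_codeset(env)
    if not forced or not codeset:
        # Nothing forced, or no explicit codeset to compare against.
        return False
    return _normalize(forced) != _normalize(codeset)
```

Calling output_charset_conflicts(os.environ) with LANG=ja_JP.eucJP and a global OUTPUT_CHARSET=ISO-8859-2 reports a conflict, which is the situation that produces '?' in the menus.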


Comment 4 stano 2001-01-10 08:56:21 UTC
Hmm, this is what I was afraid of and I don't see any easy way to fix this in a 
way that pleases everyone :-(

Is the _optional_ OUTPUT_CHARSET setting in the installer a possible 
alternative? Most users of iso8859-1,2 (dunno about cyrillic environments) 
won't be using the scenario you have described and if they set the LANG 
manually before calling some program, they can also unset the OUTPUT_CHARSET.

Anyway, it would be good to mention this behaviour somewhere a user who
selects a locale other than the US one will see it.

Comment 5 Matt Wilson 2001-01-10 20:04:23 UTC
The only real fix is to have any program that uses i18n (even strerror() et al.)
call setlocale().
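
For illustration, this is what the fix looks like in a program, sketched in Python, whose locale module wraps the same call; a C program would call setlocale(LC_ALL, "") from <locale.h> at startup. The helper name init_locale and the fallback to the "C" locale are assumptions made for this sketch, not something prescribed by glibc.

```python
import errno
import locale
import os


def init_locale():
    """Adopt the user's locale from the environment.

    Returns True on success; falls back to the "C" locale and returns
    False if the requested locale is not installed.
    """
    try:
        locale.setlocale(locale.LC_ALL, "")
        return True
    except locale.Error:
        locale.setlocale(locale.LC_ALL, "C")
        return False


if __name__ == "__main__":
    init_locale()  # must run before any localized message is produced
    print(os.strerror(errno.ENOENT))
```

The call has to happen once, early, before the first message is looked up; otherwise the program keeps running in the default "C" locale.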


Comment 6 Brent Fox 2001-04-25 22:12:45 UTC
Is this issue considered one that we won't fix?

Comment 7 Brent Fox 2001-05-09 16:18:36 UTC
I guess so.