Bug 19495

Summary: Translated messages sometimes presented with wrong character set
Product: [Retired] Red Hat Linux
Reporter: Göran Uddeborg <goeran>
Component: glibc
Assignee: Jakub Jelinek <jakub>
Status: CLOSED ERRATA
QA Contact: Aaron Brown <abrown>
Severity: medium
Priority: medium
Version: 7.0
CC: fweimer
Hardware: i386   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2000-11-21 21:46:00 UTC

Description Göran Uddeborg 2000-10-20 22:17:26 UTC
I'm using the Swedish locale; LANG=sv_SE.  With the recent glibc, I've
started getting error messages like

    zic: Can't open #: AAtkomst nekas

That is not correct Swedish; "Åtkomst nekas" (Permission denied) should be
spelled with Å (A-ring) as the first character.  This bug doesn't afflict
all commands.  For example:

    iconv: kan inte öppna utfil: Åtkomst nekas

is quite correct.

Warning: what follows is my own analysis; it might be completely wrong.

The message catalog appears correct.  The problem seems to lie with
commands that do

    setlocale(LC_MESSAGES, "");

rather than

    setlocale(LC_ALL, "");

One example is zic from glibc-2.1.94-3; another is less from less-346-2.
The problem seems to be quite common.

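I understand why this happens.  Since the command doesn't set the LC_CTYPE
category, LC_CTYPE stays in the "C" locale, and gettext() converts the
string from the character set of the message catalog (ISO-8859-1) to the
character set of the "C" locale, which is plain ASCII.  Å cannot be
represented in ASCII, so it comes out as "AA".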

But I'm not quite sure which part is in error here.  Are all programs
that set LC_MESSAGES without setting LC_CTYPE incorrect?  Or should
setlocale(LC_MESSAGES, "") imply setlocale(LC_CTYPE, "")?  The former seems
more regular and has some support in the specification ("If different
character sets are used by the locale categories, the results achieved by
an application utilising these categories are undefined." in
http://www.opengroup.org/onlinepubs/007908799/xbd/locale.html, for
example).  On the other hand, that would mean declaring a lot of programs
incorrect, and one could argue that using LC_MESSAGES, or most other
categories, means using characters, so LC_CTYPE should be set as well.

(In either case there is a bug in the glibc PACKAGE: it is either in the
libc LIBRARY or in the zic PROGRAM, and both belong to glibc-2.1.94-3.)

Comment 1 Jakub Jelinek 2000-10-24 08:35:13 UTC
I've fixed this in zic and zdump; the fix is in current CVS glibc and
will appear in the next glibc errata.
As for less, it needs to be fixed as well.