19495 – Translated messages sometimes presented with wrong character set

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	glibc
Sub Component:
Version:	7.0
Hardware:	i386
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Jakub Jelinek
QA Contact:	Aaron Brown
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2000-10-20 22:17 UTC by Göran Uddeborg
Modified:	2016-11-24 15:25 UTC
CC List:	1 user
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2000-11-21 21:46:00 UTC
Embargoed:

Description Göran Uddeborg 2000-10-20 22:17:26 UTC

I'm using the Swedish locale; LANG=sv_SE.  With the recent glibc, I've
started getting error messages like

    zic: Can't open #: AAtkomst nekas

That is not correct Swedish; it should be spelled with Å (A-ring) as the
first character.  This bug doesn't afflict all commands.  For example:

    iconv: kan inte öppna utfil: Åtkomst nekas

is quite correct.

Warning: what follows is my analysis, it might be completely wrong.

The message catalog appears correct.  The problem seems to be commands
which do

    setlocale(LC_MESSAGES, "");

rather than

    setlocale(LC_ALL, "");

Zic from glibc-2.1.94-3 is one example, less from less-346-2 is another. 
The problem seems to be quite common.
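
To make the difference between the two calls concrete, here is a minimal
C sketch.  The helper names are made up for illustration and are not taken
from zic or less.  It relies only on the standard behaviour that every
category starts out as "C" and that setlocale() with a NULL locale
argument queries the current setting without changing it.

```c
#include <locale.h>
#include <string.h>

/* Hypothetical helper: returns 1 if LC_CTYPE is still the default
   "C" (or "POSIX") locale, 0 otherwise. */
int ctype_is_default(void)
{
    const char *name = setlocale(LC_CTYPE, NULL);   /* query only */
    return name != NULL
        && (strcmp(name, "C") == 0 || strcmp(name, "POSIX") == 0);
}

/* The pattern from the report: only LC_MESSAGES is taken from the
   environment.  LC_CTYPE is never touched, so it stays at "C". */
int ctype_default_after_messages_only(void)
{
    setlocale(LC_ALL, "C");        /* same state as at program start */
    setlocale(LC_MESSAGES, "");
    return ctype_is_default();
}

/* The alternative: every category, LC_CTYPE included, follows the
   environment.  Returns the resulting LC_CTYPE locale name. */
const char *ctype_after_lc_all(void)
{
    setlocale(LC_ALL, "C");
    setlocale(LC_ALL, "");
    return setlocale(LC_CTYPE, NULL);
}
```

Called from main() with LANG=sv_SE and that locale installed, the first
function should still report the default LC_CTYPE, while the second
should return the Swedish locale name.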

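I understand why this happens.  Since the command doesn't set the LC_CTYPE
category, that category stays in the "C" locale, and gettext() will convert
the string from the character set in the message catalog (ISO-8859-1) to
the character set of the "C" locale, which presumably cannot represent Å.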
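
One way to check this is to look at the codeset the C library reports for
the current LC_CTYPE locale, since that is, as far as I know, what glibc's
gettext() uses as its output character set unless
bind_textdomain_codeset() has been called.  The sketch below uses the
standard nl_langinfo(CODESET) query; the function name is hypothetical.

```c
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Hypothetical check: returns 1 if the codeset reported for LC_CTYPE is
   the same before and after setlocale(LC_MESSAGES, ""), 0 otherwise. */
int codeset_ignores_lc_messages(void)
{
    char before[64];

    setlocale(LC_ALL, "C");        /* same state as at program start */

    /* Copy the name: the string returned by nl_langinfo() may be
       overwritten by later calls. */
    strncpy(before, nl_langinfo(CODESET), sizeof before - 1);
    before[sizeof before - 1] = '\0';

    setlocale(LC_MESSAGES, "");    /* what the affected commands do */

    return strcmp(before, nl_langinfo(CODESET)) == 0;
}
```

On glibc the codeset of the "C" locale is an ASCII name (typically
ANSI_X3.4-1968), which would explain why Å has to be replaced by
something else in the output.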

But I'm not quite sure which part is in error here.  Are all programs
setting LC_MESSAGES without setting LC_CTYPE incorrect?  Or should
setlocale(LC_MESSAGES, "") imply setlocale(LC_CTYPE, "")?  The former seems
more regular, and has some support in the specification.  ("If different
character sets are used by the locale categories, the results achieved by
an application utilising these categories are undefined." in
http://www.opengroup.org/onlinepubs/007908799/xbd/locale.html for
example.)  On the other hand the latter would mean defining a lot of
programs as incorrect; and one could argue that using LC_MESSAGES or most
other categories means one will use characters, so LC_CTYPE should also be
set.
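
Under the first reading, where the program is at fault, a per-program
change could be as small as the sketch below.  This is only an
illustration with a made-up function name, not the change that was
actually applied to any package.  Setting the two categories individually
rather than using LC_ALL leaves the others, such as LC_NUMERIC, in the
"C" locale, which may be why some programs avoid LC_ALL in the first
place.

```c
#include <locale.h>

/* Hypothetical start-up routine: take both the message language and the
   character set from the environment, and nothing else.
   Returns 0 if both categories could be set, -1 otherwise. */
int init_message_locale(void)
{
    const char *ctype = setlocale(LC_CTYPE, "");
    const char *msgs  = setlocale(LC_MESSAGES, "");

    return (ctype != NULL && msgs != NULL) ? 0 : -1;
}
```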

(In either case there is a bug in the glibc PACKAGE.  Either in the libc
LIBRARY or in the zic PROGRAM, but they both belong in glibc-2.1.94-3.)

Comment 1 Jakub Jelinek 2000-10-24 08:35:13 UTC

I've fixed this in zic and zdump, the fix is in current CVS glibc and
will appear in the next glibc errata.
As for less, less needs to be fixed as well.
