130118 – CUPS lpr incorrect charset detection

Summary: CUPS lpr incorrect charset detection

Keywords:
Status:	CLOSED RAWHIDE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	cups
Sub Component:
Version:	rawhide
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Tim Waugh
QA Contact:
Docs Contact:
URL:	http://www.cups.org/str.php?L856+P0+S...
Whiteboard:
Depends On:
Blocks:	FC6Target

Reported:	2004-08-17 10:10 UTC by Jan "Yenya" Kasprzak
Modified:	2007-11-30 22:10 UTC
CC List:	1 user
Fixed In Version:	1.2.2-10
Clone Of:
Environment:
Last Closed:	2006-08-17 12:00:33 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
text file containing all 4 CJK locale characters (58 bytes, text/plain) 2005-04-13 01:45 UTC, Lawrence Lim

Description Jan "Yenya" Kasprzak 2004-08-17 10:10:04 UTC

Description of problem:
When printing plain text files, CUPS lpr tries to determine
the charset of the file from the environment variables
LC_MESSAGES (the LC_ALL locale should be used instead) and LANG,
and even its parsing of LANG and LC_MESSAGES is incorrect.
The charset of the system locale's LC_CTYPE category should be used
instead, i.e. lpr should call

setlocale(LC_ALL, "");
char *charset = nl_langinfo(CODESET);
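
A slightly fuller sketch of the same idea follows. It is only an illustration, not code from CUPS: the function name locale_codeset() is hypothetical, and it sets only LC_CTYPE rather than LC_ALL, on the assumption that the codeset reported by nl_langinfo(CODESET) depends on that category alone.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: copy the codeset name of the environment's
 * LC_CTYPE locale (for example "ISO-8859-2") into buf.
 * Returns 0 on success, -1 if the name does not fit into buf. */
int locale_codeset(char *buf, size_t size)
{
    assert(buf != NULL && size > 0);

    /* Adopt LC_CTYPE from the environment (LC_ALL, LC_CTYPE, LANG).
     * If the requested locale is not available, setlocale() returns
     * NULL and the current locale stays in effect. */
    (void)setlocale(LC_CTYPE, "");

    /* nl_langinfo() returns a pointer to storage owned by the C
     * library, so copy the name out before anything else is called. */
    int n = snprintf(buf, size, "%s", nl_langinfo(CODESET));
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

A caller would pass a small buffer, e.g. char cs[64], and use the copied name as the document charset when locale_codeset(cs, sizeof cs) returns 0.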

Another problem is that lpr uses a fixed list of charsets
and searches it with a case-insensitive comparison, whereas the
comparison should be both case-insensitive and alphanumeric-only, i.e. both
"iso-8859-2" and "ISO8859-2" (note the missing hyphen and different
case) should refer to the same charset.

Version-Release number of selected component (if applicable):
cups-1.1.20-11.1

How reproducible:
100%

Steps to Reproduce:
1. create a text file in ISO-8859-2 encoding
2. export LC_CTYPE=cs_CZ
3. run "locale charmap" to verify that the charmap is ISO-8859-2
4. lpr text_file.iso8859-2
5. LANG=cs_CZ lpr text_file.iso8859-2
6. LANG=cs_CZ.ISO8859-2 lpr text_file.iso8859-2
7. LANG=cs_CZ.ISO-8859-2 lpr text_file.iso8859-2

Actual results:
only the last copy of the file (step 7) is printed with correct characters

Expected results:
all four copies should be printed with correct characters

Additional info:
I have reported this upstream, so hopefully they will fix the
problem:
http://www.cups.org/str.php?L856+P0+S-2+C0+I0+E0+Q

Comment 2 Lawrence Lim 2005-04-13 01:45:21 UTC

Created attachment 113073
text file containing all 4 CJK locale characters

Comment 3 Leon Ho 2006-07-12 02:05:34 UTC

Can you try the latest rawhide? The handling of i18n printing has been improved there.

Comment 4 Jan "Yenya" Kasprzak 2006-07-12 15:24:51 UTC

The problem is still here (cups-1.2.1-18.i386). It is a bit different than
before (worse, I have to say):

- the "cs_CZ" locale now defaults to the UTF-8 encoding, so this test is not
valid anymore.
- with the other values of LC_CTYPE (cs_CZ.ISO8859-2 and cs_CZ.ISO-8859-2) it
does not work either (LANG=C, so everything except LC_CTYPE is set to C).
- when I set LANG=cs_CZ.ISO8859-2 or LANG=cs_CZ.ISO-8859-2, it works.

So it seems that cups looks at some other category than it should. According to
my two-year-old bug report in the CUPS bug tracking system, it used to use the
LC_MESSAGES category instead of LC_CTYPE, which it should use. But I have
verified (by setting LANG=C, LC_MESSAGES=cs_CZ.ISO-8859-2) that this is no
longer the case.

Feel free to request more info.

Comment 5 Akira TAGOH 2006-07-13 03:02:45 UTC

This seems somewhat related to Bug#197577. However, CUPS should look at
LC_CTYPE, as applications that rely on the locale usually do, and then at
LANG if LC_CTYPE is not set.
paps, which now works as the plain-text filter for CUPS, needs to support the
document-charset attribute. Once it does, that may help you too.
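
As an illustration of the lookup order discussed here, the following sketch picks the locale name that governs LC_CTYPE using the usual POSIX precedence (LC_ALL, then LC_CTYPE, then LANG) and extracts the codeset part of a name such as "cs_CZ.ISO-8859-2". Both function names are hypothetical and this is not how CUPS or paps is implemented. Note that a name without a codeset part (plain "cs_CZ") cannot be resolved this way, which is one reason to prefer asking the C library via nl_langinfo(CODESET).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the first non-empty value in POSIX
 * precedence order for the LC_CTYPE category, or "C" if none is set.
 * Callers would pass getenv("LC_ALL"), getenv("LC_CTYPE"), getenv("LANG"). */
const char *pick_ctype_locale(const char *lc_all, const char *lc_ctype,
                              const char *lang)
{
    const char *candidates[] = { lc_all, lc_ctype, lang };

    for (size_t i = 0; i < sizeof candidates / sizeof candidates[0]; i++) {
        /* An empty value counts as unset. */
        if (candidates[i] != NULL && candidates[i][0] != '\0')
            return candidates[i];
    }
    return "C";
}

/* Hypothetical helper: copy the codeset part of a locale name of the
 * form language_TERRITORY.codeset@modifier into buf.
 * Returns 0 on success, -1 if there is no codeset part or it does not fit. */
int codeset_from_locale_name(const char *name, char *buf, size_t size)
{
    assert(name != NULL && buf != NULL && size > 0);

    const char *dot = strchr(name, '.');
    if (dot == NULL)
        return -1;

    /* The codeset runs up to an optional "@modifier" suffix. */
    size_t len = strcspn(dot + 1, "@");
    if (len == 0 || len >= size)
        return -1;

    memcpy(buf, dot + 1, len);
    buf[len] = '\0';
    return 0;
}
```

For the case from comment 4 (LANG=C, LC_CTYPE=cs_CZ.ISO8859-2), pick_ctype_locale(NULL, "cs_CZ.ISO8859-2", "C") yields the LC_CTYPE value, so the charset would be taken from it rather than from LANG.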

Comment 6 Tim Waugh 2006-07-13 08:57:49 UTC

I think that document-charset is something that CUPS is meant to support itself,
rather than the filters.  I'll double-check upstream:

  http://www.cups.org/str.php?L1819

Comment 7 Tim Waugh 2006-07-19 11:06:27 UTC

SJIS-type encodings have now been added (1.2.1-21).

Comment 8 Akira TAGOH 2006-07-19 12:36:57 UTC

It looks like you closed this bug by mistake. This bug has nothing to do with
SJIS at all. But if the way to specify the document encoding is documented
somewhere, or a definite way to do that is provided, this may be a duplicate of
Bug#197577, I suppose.

Or, if the complaint is that LC_CTYPE rather than LC_MESSAGES should be used to
detect the charset, this bug should still be kept as a separate bug, IMHO. In
that case, retitling the bug might be better.

Comment 9 Tim Waugh 2006-08-16 14:26:56 UTC

Reported upstream:
  http://cups.org/str.php?L1915

Comment 10 Tim Waugh 2006-08-17 12:00:33 UTC

This seems to work fine here with 1.2.2-10.
