Bug 197577

Summary:	'LC_ALL=ja_JP.sjis lpr testpage.sjis' fails to print
Product:	[Fedora] Fedora	Reporter:	Akira TAGOH <tagoh>
Component:	paps	Assignee:	Tim Waugh <twaugh>
Status:	CLOSED CANTFIX
Severity:	medium
Priority:	medium
Version:	rawhide	CC:	eng-i18n-bugs
Target Milestone:	---	Keywords:	Reopened
Target Release:	---
Hardware:	All
OS:	Linux
Doc Type:	Bug Fix
Last Closed:	2006-08-18 13:00:08 UTC
Bug Blocks:	150223

Description Akira TAGOH 2006-07-04 09:50:41 UTC

Description of problem:
Right now CUPS only lets you specify the character set through lpr options,
such as lpr -o "document-format=text/plain;charset=sjis" sjis.txt. That is
rather long, and having to spell out a MIME type as well is a bit complex and,
I guess, beyond a newbie's knowledge. Environment variables may be a little
hard too, but they would be better than that.
However, the locale-based guess doesn't work in some cases. CUPS tries to guess a
charset from the charset field of the locale, but, for example, it doesn't detect
sjis from LANG=ja_JP.sjis. Aside from that, there is no common convention for
specifying iso-2022-jp in a locale. So I think using CHARSET directly may be
better, but CUPS overwrites CHARSET, so there is currently no good way to set it.
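To illustrate the locale-based guess described above, here is a minimal Python sketch. It assumes a small, hypothetical lookup table (not the real locale_encodings[] from cups/language.c) and shows why a codeset that is missing from the table, such as sjis in LANG=ja_JP.sjis, yields no charset at all.

```python
# Hypothetical table mapping locale codesets to charset names. This is an
# illustrative assumption, not the actual locale_encodings[] array in CUPS.
LOCALE_CODESETS = {
    "utf-8": "utf-8",
    "utf8": "utf-8",
    "eucjp": "euc-jp",
    "euc-jp": "euc-jp",
}


def charset_from_locale(locale_name: str):
    """Guess a charset from the codeset part of a locale name.

    "ja_JP.UTF-8@modifier" -> codeset "utf-8". Returns None when the
    locale has no codeset or the codeset is not in the table.
    """
    _, _, codeset = locale_name.partition(".")
    codeset = codeset.split("@", 1)[0].lower()
    return LOCALE_CODESETS.get(codeset)


if __name__ == "__main__":
    for name in ("ja_JP.UTF-8", "ja_JP.eucJP", "ja_JP.sjis"):
        print(name, "->", charset_from_locale(name))
```

Any table of this kind only covers the spellings someone thought to add, which is the limitation the report is pointing at.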

Version-Release number of selected component (if applicable):
cups-1.2.1-16

How reproducible:
always

Steps to Reproduce:
1. LANG=ja_JP.sjis lpr testpage.sjis
2. CHARSET=sjis LANG=ja_JP.UTF-8 lpr testpage.sjis
  
Actual results:
Both test cases print an empty page.

Expected results:
I don't mind whether case 1 works or not, but if CHARSET is already set, CUPS
shouldn't override it with anything derived from LANG etc.

Additional info:

Comment 1 Tim Waugh 2006-07-04 13:03:18 UTC

I think this is because CUPS doesn't support SJIS.  In particular, it does not
appear in the locale_encodings[] array in cups/language.c.

Reported upstream:

  http://cups.org/str.php?L1819

Comment 2 Akira TAGOH 2006-07-05 02:44:23 UTC

This isn't an issue only for sjis. As I mentioned in the initial report, think
about the iso-2022-jp case too. IMHO the suggestion above is more reasonable
than cutting the charset out of LANG with unusual locale-like strings.
Adding more encodings to locale_encodings[] isn't the point. Actually, 'lpr -o
"document-format=text/plain;charset=iso-2022-jp" testpage.jis' works, and the
same approach works fine for sjis as well. But for the reasons above, I'd suggest
a much better way to specify the charset.

Comment 3 Akira TAGOH 2006-07-05 02:51:12 UTC

In addition, when printing, IMHO the encoding of what people want to print
isn't necessarily the same as the current locale's. I guess CUPS may want to
keep backward compatibility with LPRng, but relying on the locale anyway
doesn't make sense to me.

Comment 4 Tim Waugh 2006-07-12 16:10:09 UTC

The easier way to specify the document encoding is:

lp -o document-charset=windows-932 ...

The problem with SJIS in particular is that it is not an ISO-registered name. 
Apparently windows-932 (which is registered) corresponds to Shift-JIS.  The
upstream CUPS maintainer will add some special-casing for 'SJIS' etc.

See http://www.cups.org/str.php?L1819, second comment.

Comment 5 Akira TAGOH 2006-07-13 01:01:42 UTC

Hmm, does this mean that referring to the document-charset attribute would be
better than the CHARSET environment variable? And would document-charset
override CHARSET with the expected encoding?

Comment 6 Tim Waugh 2006-07-13 08:49:17 UTC

What CHARSET environment variable?  I've never heard of it before this bug
report, and assumed it was something that you were proposing should be introduced.

Comment 7 Akira TAGOH 2006-07-13 09:10:18 UTC

CUPS sets CHARSET to a charset derived from the current locale and then invokes
the filters. It may be an internal environment variable, but apparently texttops
refers to CHARSET to determine the document charset instead of looking up the
"document-charset" attribute. That's why I'm proposing this. After some testing,
the "document-charset" attribute doesn't affect CHARSET, so I wanted to know which
one should be used if both are specified. Presumably filters should prioritize
"document-charset" if it's there.
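The priority rule proposed above ("document-charset" first, then CHARSET) can be sketched from a filter's point of view. This is a minimal Python sketch under stated assumptions: the function names and the "us-ascii" fallback are hypothetical, and shlex.split only approximates how CUPS quotes values in the options string that filters receive as argv[5].

```python
import os
import shlex


def parse_job_options(options: str) -> dict:
    """Split a filter options string like 'copies=1 document-charset=utf-8'.

    shlex.split is only an approximation of how CUPS quotes option values.
    Bare names (no '=') are treated as boolean "true".
    """
    parsed = {}
    for token in shlex.split(options):
        name, sep, value = token.partition("=")
        parsed[name] = value if sep else "true"
    return parsed


def pick_charset(options: str, environ=None, default: str = "us-ascii") -> str:
    """Prefer the document-charset job option, then $CHARSET, then a default.

    The "us-ascii" default is an assumption made for this sketch.
    """
    if environ is None:
        environ = os.environ
    return (
        parse_job_options(options).get("document-charset")
        or environ.get("CHARSET")
        or default
    )


if __name__ == "__main__":
    import sys

    # CUPS filters receive the job options string as argv[5].
    opts = sys.argv[5] if len(sys.argv) > 5 else ""
    print(pick_charset(opts))
```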

Comment 8 Tim Waugh 2006-07-14 12:47:15 UTC

Oh, that CHARSET; yes, that's a CUPS-internal one only.

'document-charset' certainly *ought* to work because it is in the IPP standard.
It looks like CUPS isn't honouring that.

I've filed a bug upstream about this:

  http://cups.org/str.php?L1841

Comment 9 Akira TAGOH 2006-07-17 03:54:58 UTC

Ok, I see. Thanks for the clarification.

Comment 10 Tim Waugh 2006-07-19 11:08:17 UTC

Looks like I was told wrong about document-charset after all. :-(

Comment 11 Tim Waugh 2006-08-16 13:44:14 UTC

The thing stopping this command:

  LC_ALL=ja_JP.sjis lpr testpage.sjis

from printing anything seems to be that CUPS is translating it to 'windows-932'
instead of 'windows-31j'.  I've asked about this upstream:

  http://cups.org/newsgroups.php?s9784+gcups.general+v9793+T0

Comment 12 Akira TAGOH 2006-08-17 03:37:09 UTC

Well, windows-932 would be more familiar as CP932, which iconv also supports.
Just FYI.

Comment 13 Akira TAGOH 2006-08-17 03:41:18 UTC

One more thing: strictly speaking, CP932 and Shift_JIS are different encodings.
There are some code points that are incompatible between the two.
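Python's standard codecs make the CP932 vs. Shift_JIS difference concrete. This sketch uses Python's shift_jis and cp932 codecs as stand-ins for the encodings discussed here; iconv's conversion tables may differ in detail.

```python
# The same two bytes, 0x81 0x60, decode to different characters depending on
# which table is used (the well-known "wave dash" case).
raw = b"\x81\x60"

as_sjis = raw.decode("shift_jis")  # JIS X 0208-based mapping
as_cp932 = raw.decode("cp932")     # Microsoft code page 932 mapping

print(f"shift_jis: U+{ord(as_sjis):04X}")
print(f"cp932:     U+{ord(as_cp932):04X}")

# CP932 also has vendor extensions with no Shift_JIS equivalent, such as
# CIRCLED DIGIT ONE (U+2460).
circled_one = "\u2460"
print(circled_one.encode("cp932"))
try:
    circled_one.encode("shift_jis")
except UnicodeEncodeError:
    print("shift_jis: no code point for U+2460")
```

The first pair decodes to U+301C WAVE DASH and U+FF5E FULLWIDTH TILDE respectively, which is why picking the wrong name can silently change characters rather than fail.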

Comment 14 Tim Waugh 2006-08-17 11:31:25 UTC

I've modified paps to map windows-932 to windows-31j, so hopefully that will
take care of it.
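The change described above amounts to an alias lookup before the charset name is handed to the converter. A minimal Python sketch; the table and function name are hypothetical and are not paps's actual code. Only the windows-932 entry comes from this report.

```python
# Hypothetical alias table: charset names a client may send, mapped to the
# name the converter is expected to accept. Only windows-932 -> windows-31j
# comes from this bug report; other entries would be assumptions.
CHARSET_ALIASES = {
    "windows-932": "windows-31j",
}


def normalize_charset(name: str) -> str:
    """Return the converter-facing name for a requested charset.

    Matching is case-insensitive because charset names are; unknown names
    are passed through unchanged.
    """
    return CHARSET_ALIASES.get(name.lower(), name)
```

Keeping the special cases in one table means further aliases can be added without touching the conversion code itself.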

Comment 15 Tim Waugh 2006-08-17 11:58:47 UTC

Okay, fixed in paps-0.6.6-12.  Works fine here.

Comment 16 Akira TAGOH 2006-08-18 09:46:26 UTC

Well, again, what about comment #2, i.e. the case of iso-2022-jp aka JIS?
There is no locale charset for iso-2022-jp, so how can I give CUPS a hint about
which charset the document uses?
I still think that making CHARSET freely available would be much better than
adding charset tables to CUPS one by one. It sounded like the 'document-charset'
attribute was the way to go, but was that not the right solution?

Comment 17 Tim Waugh 2006-08-18 10:39:06 UTC

I just tried this:

iconv -f sjis -t iso-2022-jp testpage.sjis > testpage.iso-2022-jp
lpr -o'document-format=text/plain;charset=iso-2022-jp' testpage.iso-2022-jp

and it worked fine.
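The iconv step above can also be done in Python. A minimal sketch, assuming the input is plain Shift_JIS rather than CP932 with vendor extensions (which the shift_jis codec is expected to reject); the function name is hypothetical.

```python
def sjis_to_iso2022jp(data: bytes) -> bytes:
    """Convert Shift_JIS bytes to ISO-2022-JP, like
    `iconv -f sjis -t iso-2022-jp`.

    Raises a UnicodeError subclass if the text cannot be decoded or
    re-encoded, instead of silently dropping characters.
    """
    return data.decode("shift_jis").encode("iso2022_jp")


if __name__ == "__main__":
    sample = "日本語".encode("shift_jis")
    converted = sjis_to_iso2022jp(sample)
    # ISO-2022-JP is a 7-bit encoding that switches character sets with
    # escape sequences, so every output byte is below 0x80.
    print(converted)
```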

Comment 18 Tim Waugh 2006-08-18 11:04:21 UTC

Incidentally, 'LC_CTYPE=ja_JP.sjis locale charmap' doesn't say 'SJIS' but
instead gives an error that the locale doesn't exist -- and the same for
ja_JP.iso-2022-jp.

What locale are you setting for ISO-2022-JP as the charset encoding?

Comment 19 Akira TAGOH 2006-08-18 12:47:48 UTC

(In reply to comment #17)
> I just tried this:
> 
> iconv -f sjis -t iso-2022-jp testpage.sjis > testpage.iso-2022-jp
> lpr -o'document-format=text/plain;charset=iso-2022-jp' testpage.iso-2022-jp
> 
> and it worked fine.

Yes, it should work, as I described in the initial report. What I was saying is
that it's a little complex to specify the encoding for the document ;)

(In reply to comment #18)
> Incidentally, 'LC_CTYPE=ja_JP.sjis locale charmap' doesn't say 'SJIS' but
> instead gives an error that the locale doesn't exist -- and the same for
> ja_JP.iso-2022-jp.

Yes, because the sjis locale isn't in the locale archive by default. You need
to run localedef, and apparently the SJIS charmap table is broken ;) But anyway.

> What locale are you setting for ISO-2022-JP as the charset encoding?

That's why I reopened this again. There is no valid locale for ISO-2022-JP IIRC,
so guessing from the locale doesn't help in this case. That's why I really need
something like a CHARSET environment variable to specify the document encoding
directly, in an easier way.

Comment 20 Tim Waugh 2006-08-18 13:00:08 UTC

No, the only way to do this is 'document-format=text/plain;charset=...'. 
Otherwise the locale is used.

I don't expect there would be any buy-in at all from upstream for an extra
environment variable for the systemv/berkeley utilities to check -- they mostly
only *exist* for compatibility with older systems.
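Since the document-format option is the supported route, one way to avoid retyping it is a small wrapper. A hypothetical Python sketch (the function names are illustrative); it builds the same kind of lpr invocation used in comment 17 and runs it with subprocess.

```python
import subprocess
from typing import List, Optional


def lpr_text_command(path: str, charset: str, printer: Optional[str] = None) -> List[str]:
    """Build an lpr argv that declares the text file's charset explicitly.

    This only spells out the existing document-format option; it does not
    add any new CUPS feature.
    """
    argv = ["lpr"]
    if printer:
        argv += ["-P", printer]
    argv += ["-o", f"document-format=text/plain;charset={charset}", path]
    return argv


def print_text(path: str, charset: str, printer: Optional[str] = None) -> None:
    """Run lpr; raises CalledProcessError if lpr exits non-zero."""
    subprocess.run(lpr_text_command(path, charset, printer), check=True)
```

For example, print_text("testpage.iso-2022-jp", "iso-2022-jp") would submit the file converted in comment 17. Passing an argv list rather than a shell string avoids having to quote the ';' in the option value.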