Red Hat Bugzilla – Bug 197577
'LC_ALL=ja_JP.sjis lpr testpage.sjis' fails to print
Last modified: 2007-11-30 17:11:36 EST
Description of problem:
Right now the only way to specify a document's character set in CUPS is with
lpr options, such as lpr -o "document-format=text/plain;charset=sjis" sjis.txt.
That is long, and having to spell out a MIME type as well is a bit complex and
hard for a newcomer, I guess. Environment variables may be a little hard as
well, but they would be better than that.
However, that doesn't work in some cases. CUPS tries to guess the charset from
the charset field of the locale, but, for example, it doesn't detect sjis from
LANG=ja_JP.sjis. Aside from that, there is no conventional way to specify
iso-2022-jp in a locale. So I think using CHARSET directly may be better, but
CUPS overwrites CHARSET, so there is no better way to set it now.
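To illustrate the kind of locale-based guess described above, here is a minimal Python sketch. It is not the CUPS implementation: the KNOWN_CODESETS table, the function names and the "us-ascii" fallback are all assumptions made for illustration. It only shows why a codeset that is missing from a fixed table, such as sjis, ends up being ignored.

```python
# Hypothetical table of codesets a locale-based guess might recognize.
# The real list lives in CUPS and is not reproduced here.
KNOWN_CODESETS = {"utf-8", "iso-8859-1", "euc-jp"}


def codeset_from_locale(value):
    """Return the lowercase codeset part of a locale string, or None."""
    # Locale strings look like language[_territory][.codeset][@modifier].
    base = value.split("@", 1)[0]
    if "." not in base:
        return None
    return base.split(".", 1)[1].lower()


def guess_charset(value, default="us-ascii"):
    """Guess a charset from a locale string; the fallback is an assumption."""
    codeset = codeset_from_locale(value)
    return codeset if codeset in KNOWN_CODESETS else default
```

With this table, guess_charset("ja_JP.UTF-8") yields "utf-8", while guess_charset("ja_JP.sjis") falls back to the default because "sjis" is not listed.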
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. LANG=ja_JP.sjis lpr testpage.sjis
2. CHARSET=sjis LANG=ja_JP.UTF-8 lpr testpage.sjis
Either test case prints an empty page.
I don't mind whether 1. works or not, but if CHARSET is already set,
CUPS shouldn't set anything from LANG etc.
I think this is because CUPS doesn't support SJIS. In particular, it does not
appear in the locale_encodings array in cups/language.c.
This isn't an issue only for sjis. As I mentioned in the initial report, think
about the iso-2022-jp case too. IMHO the above suggestion is more reasonable
than extracting the charset from LANG with unusual locale-like strings.
Adding encodings to locale_encodings doesn't really matter. Actually 'lpr -o
"document-format=text/plain;charset=iso-2022-jp" testpage.jis' works, and works
fine for sjis as well. But I'd suggest a much better way to specify the charset behind the
In addition, when printing, IMHO the encoding of what people want to print
need not be the same as that of the current locale. I guess CUPS may want to
keep backward compatibility with LPRng, but relying on the locale anyway
doesn't make sense to me.
The easier way to specify the document encoding is:
lp -o document-charset=windows-932 ...
The problem with SJIS in particular is that it is not an ISO-registered name.
Apparently windows-932 (which is registered) corresponds to Shift-JIS. The
upstream CUPS maintainer will add some special-casing for 'SJIS' etc.
See http://www.cups.org/str.php?L1819, second comment.
Hmm, does this mean that using the document-charset attribute would be better
than the CHARSET environment variable? And would document-charset override
CHARSET for the expected encoding?
What CHARSET environment variable? I've never heard of it before this bug
report, and assumed it was something that you were proposing should be introduced.
CUPS sets CHARSET to a charset that actually comes from the current locale and
then invokes the filters. It may be an internal environment variable, but
apparently texttops refers to CHARSET to determine the document charset
instead of looking up the "document-charset" attribute.
That's why I'm proposing this. After some testing, the "document-charset"
attribute doesn't affect CHARSET. So I wanted to know which one should be used
if both are specified - presumably filters should prioritize "document-charset" if it's
Oh, that CHARSET; yes, that's a CUPS-internal one only.
'document-charset' certainly *ought* to work because it is in the IPP standard.
It looks like CUPS isn't honouring that.
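The precedence being asked for in this exchange can be sketched as follows. This is only an illustration of the proposal (an explicit document-charset job option wins over the internal CHARSET variable), not what CUPS or texttops actually does; the function name, the dict-based option lookup and the "utf-8" default are hypothetical.

```python
def pick_charset(job_options, environ, default="utf-8"):
    """Choose a document charset under the proposed precedence.

    job_options: dict of job options (hypothetical representation).
    environ: dict of environment variables seen by the filter.
    """
    # 1. An explicit per-job "document-charset" option takes priority.
    explicit = job_options.get("document-charset")
    if explicit:
        return explicit
    # 2. Otherwise fall back to the CHARSET variable set by the scheduler.
    # 3. If neither is present, use an assumed default.
    return environ.get("CHARSET") or default
```

For example, pick_charset({"document-charset": "windows-932"}, {"CHARSET": "utf-8"}) would return "windows-932" under this ordering.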
I've filed a bug upstream about this:
ok, I see. thanks for the clarification.
Looks like I was told wrong about document-charset after all. :-(
The thing stopping this command:
LC_ALL=ja_JP.sjis lpr testpage.sjis
from printing anything seems to be that CUPS is translating it to 'windows-932'
instead of 'windows-31j'. I've asked about this upstream:
Well, windows-932 is better known as CP932, which iconv also supports.
One more thing... strictly speaking, CP932 and Shift_JIS are different
encodings. I mean there are some incompatible code points between the two.
I've modified paps to map windows-932 to windows-31j, so hopefully that will
take care of it.
Okay, fixed in paps-0.6.6-12. Works fine here.
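The kind of name mapping described for paps could look roughly like the sketch below. This is not the actual paps patch; the table and function name are hypothetical, and only the windows-932 to windows-31j pair comes from this report.

```python
# Hypothetical alias table; only this one pair is taken from the report.
CHARSET_ALIASES = {
    "windows-932": "windows-31j",
}


def normalize_charset(name):
    """Map a charset name to the spelling the converter understands."""
    key = name.strip().lower()
    # Names without a known alias pass through (lowercased) unchanged.
    return CHARSET_ALIASES.get(key, key)
```

Any further aliases would have to be added to the table one by one, which is the maintenance cost raised elsewhere in this thread.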
Well, again, how about comment #2, I mean the iso-2022-jp (aka jis) case?
There is no locale charset for iso-2022-jp, so how can I give CUPS a hint
about which charset the document uses?
I'm still thinking that making CHARSET freely available would be much better
than adding charset tables to CUPS individually - it sounded like the
'document-charset' attribute was the way to go, but was it not the right solution?
I just tried this:
iconv -f sjis -t iso-2022-jp testpage.sjis > testpage.iso-2022-jp
lpr -o'document-format=text/plain;charset=iso-2022-jp' testpage.iso-2022-jp
and it worked fine.
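For reference, the same two steps can be sketched in Python. The helper names are hypothetical, and Python's "shift_jis" codec is assumed to be close enough to iconv's "sjis" for ordinary text; the two may differ on some code points. The second helper only builds the argument list and does not run lpr.

```python
def sjis_to_iso2022jp(data: bytes) -> bytes:
    """Re-encode Shift_JIS bytes as ISO-2022-JP (roughly the iconv step)."""
    return data.decode("shift_jis").encode("iso2022_jp")


def lpr_args(path: str, charset: str) -> list:
    """Build an lpr argument list carrying the charset in document-format."""
    option = f"document-format=text/plain;charset={charset}"
    return ["lpr", "-o", option, path]
```

The resulting list could be passed to subprocess.run() on a system where lpr is installed.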
Incidentally, 'LC_CTYPE=ja_JP.sjis locale charmap' doesn't say 'SJIS' but
instead gives an error that the locale doesn't exist -- and the same for
What locale are you setting for ISO-2022-JP as the charset encoding?
(In reply to comment #17)
> I just tried this:
> iconv -f sjis -t iso-2022-jp testpage.sjis > testpage.iso-2022-jp
> lpr -o'document-format=text/plain;charset=iso-2022-jp' testpage.iso-2022-jp
> and it worked fine.
Yes, it should work, as I described in the initial report. What I was saying is
that it's a little complex to specify the encoding for the document ;)
(In reply to comment #18)
> Incidentally, 'LC_CTYPE=ja_JP.sjis locale charmap' doesn't say 'SJIS' but
> instead gives an error that the locale doesn't exist -- and the same for
Yes, because the sjis locale isn't in the locale archive by default. You will
need to run localedef, and apparently the sjis charmap table is broken ;) but anyway.
> What locale are you setting for ISO-2022-JP as the charset encoding?
That's why I reopened this again... there is no valid locale for ISO-2022-JP
IIRC. So guessing from the locale doesn't help in this case, and that's why I
really need some kind of CHARSET environment variable to specify the document
encoding directly, in an easier way.
No, the only way to do this is 'document-format=text/plain;charset=...'.
Otherwise the locale is used.
I don't expect there would be any buy-in at all from upstream for an extra
environment variable for the systemv/berkeley utilities to check -- they mostly
only *exist* for compatibility with older systems.