If I print a text file which has a copyright symbol in it (decimal character
169) but is otherwise real ascii, to a printer configured to use the hpijs
"Generic PCL5e Printer Foomatic" driver (the recommended driver), what comes out
of the printer is a blank page instead of the contents of the text file.
If I remove the copyright symbol, the file prints fine.
There is nothing in the cups error_log to explain what goes wrong. However,
when I invoke texttopaps by hand, I see this message on stderr: "(null): Invalid
character in input". Note that despite displaying this message, texttopaps still
produces a PostScript file containing the contents of the text file, although
the line containing the copyright character is omitted.
I believe this wasn't a problem in earlier cups versions.
As a point of information, a2ps has no trouble with this file.
I have cups-1.2.2-13 and foomatic-3.0.2-37.
"copyright symbol in it (decimal character 169)" tells me that your file is in
some sort of ISO-8859-x encoding or other.
Did you print the file using
'lp -o document-format=text/plain;charset=iso-8859-whatever', or did you
instead set your locale correctly in the environment you submitted the job from?
It's not in any encoding, as far as I know. It's a simple plain-text file, not
exactly ASCII but rather the bastardized ASCII that Microsoft inflicted upon us
all when they started putting weird quotation marks, (R), (C), TM, etc. in files
that they were calling text. If there's an iso-8859-x encoding corresponding to
this, I don't know what it is.
Cups used to print files containing this character without any trouble. I
haven't changed anything; it's cups that has changed.
Furthermore, a2ps prints this file without any problem, and I'm calling a2ps
from the same locale that I'm calling lpr from.
In short, I don't really care whether it's "correct" that this file has a
high-bit character in it which isn't, strictly speaking, an ASCII character.
The reality is that text files have such characters in them all the time
nowadays, cups should be able to print them rather than choking on them, and it
used to do so and doesn't any longer.
A quick test shows that the (C) symbol is indeed the single byte 0xa9 in
iso-8859-1, so that's probably the encoding your document is in. (Yes, it's
certainly in an encoding! All plain text is, even if that encoding is ASCII.)
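That mapping is easy to check from a shell; a one-liner like the following
(assuming iconv and od are available) feeds the UTF-8 copyright sign through
iconv and dumps the resulting byte:

```shell
# The copyright sign is the two bytes 0xc2 0xa9 in UTF-8; converting it to
# iso-8859-1 should yield the single byte 0xa9.
printf '\302\251' | iconv -f utf-8 -t iso-8859-1 | od -An -tx1
# expected byte: a9
```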
The texttops filter shipped with upstream CUPS cannot handle UTF-8 at all, but
by chance it happens to handle your copyright character. Put this in context:
the default encoding for the entire distribution is UTF-8, yet the print
spooler wouldn't accept the vast majority of UTF-8 characters.
We now ship a text->PS filter based on paps and this allows us to generate
correct output for UTF-8 text files. http://paps.sourceforge.net/ has an
example of this.
So the solution is for you to either
a) convert your document to UTF-8 using
iconv -f iso-8859-1 -t utf-8 < in > out
or b) print your document with a 'document-format=text/plain;charset=iso-8859-1'
option so that the text filter has a sporting chance of knowing what encoding
you expect it to be using.
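A concrete version of the two fixes might look like this; the file names are
placeholders, and the lp invocations are shown as comments since they depend on
your queue:

```shell
# a) Convert the document to UTF-8 before submitting it.
printf '\251 1996 Example Corp.\n' > in.txt       # 0xa9 = (C) in iso-8859-1
iconv -f iso-8859-1 -t utf-8 < in.txt > out.txt   # out.txt now starts 0xc2 0xa9
# then print the converted copy, e.g.:
#   lp out.txt

# b) Or leave the file alone and declare its charset to CUPS so the text
#    filter can do the conversion itself (quote the option so the shell does
#    not split it at the semicolon):
#   lp -o 'document-format=text/plain;charset=iso-8859-1' in.txt
```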
I'm going to reassign this bug to the paps component, because I think that, if
possible, it would be much better if the individual bytes not understood could
be omitted from the line (perhaps with '?' or the box character substituted),
rather than the entire line being missed out.
I think you might have misunderstood part of my bug. When I print the file
through cups with lpr, it doesn't print *at all*. The printer spits out a blank
page. When I call texttopaps directly, the resulting postscript file is missing
the line with the copyright symbol, as I previously mentioned. As you point
out, throwing away the entire line is worse than just throwing away the
misunderstood symbol, but even that would be better than what cups is doing
now, i.e., throwing away the entire file.
I don't know why cups generates a blank page even though texttopaps is
generating a postscript file with content in it.
Incidentally, I still think this is at its root a cups bug and that's where it
should be addressed. In my mind, sort of by definition, something that worked
before and doesn't anymore is a bug.
Perhaps the paps encoder needs to be smarter about guessing encodings, such that
it could reasonably look at an input file, guess that it's iso-8859-1, and then
print it appropriately. There's *lots* of software that guesses encodings
rather successfully when no encoding is specified.
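For what it's worth, the stock file(1) utility already makes this kind of
guess. A sketch (option spelling is GNU/BSD file's; older versions may differ):

```shell
# Write a line that is valid iso-8859-1 but NOT valid UTF-8 (the lone 0xa9
# byte has no lead byte), then ask file(1) to guess the encoding.
printf 'Copyright \251 1996\n' > sample.txt
file -b --mime-encoding sample.txt   # typically reports iso-8859-1
```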
(In reply to comment #3)
> I'm going to reassign this bug to the paps component, because I think that, if
> possible, it would be much better if the individual bytes not understood could
> be omitted from the line (perhaps with '?' or the box character substituted),
> rather than the entire line being missed out.
I don't think that's possible, since Pango itself needs UTF-8 strings, and paps
of course expects UTF-8 strings as well. So I'd suggest solution b). Otherwise
this issue becomes a feature request to guess the encodings. I'm sure it's
quite hard to support all the encodings that way, but anyway.
BTW Tim, does CUPS assume that an error happened when something is output to
stderr?
(In reply to comment #4)
> I think you might have misunderstood part of my bug. When I print the file
> through cups with lpr, it doesn't print *at all*. The printer spits out a blank
> page. When I call texttopaps directly, the resulting postscript file is missing
> the line with the copyright symbol, as I previously mentioned.
I'm still not sure why CUPS doesn't print out the incomplete PS file to the
printer.
paps gives up parsing a file as soon as any invalid character appears. That may
be useful when paps is used as a standalone tool, since it may well be handed
binary files, and printing those raw could generate a huge PS file that prints
out nothing but garbage. When working as a CUPS filter, however, we can assume
that CUPS invokes texttopaps only for text/plain files, so I can get rid of
this limitation for that case alone.
> Perhaps the paps encoder needs to be smarter about guessing encodings, such that
> it could reasonably look at an input file, guess that it's iso-8859-1, and then
> print it appropriately. there's *lots* of software that guesses encodings
> rather successfully when no encoding is specified.
There is no software that guesses encodings perfectly for every encoding. It
would be ideal if that were possible, but I've often seen even emacs misguess
an encoding.
The problem is that paps gives up on the whole file if there is an encoding
error detected by iconv. The iconv code gets executed even when the input file
is (expected to be) UTF-8 because of the CHARSET code in the paps-cups patch.
I've modified the paps-cups patch to avoid ever calling iconv_open("UTF-8",
"UTF-8"), and I get much better results now.
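The shape of that fix, sketched here as shell rather than the actual C patch
(the function name is illustrative, not from the patch): only convert when the
job charset actually differs from UTF-8, so iconv never sees the
UTF-8-to-UTF-8 case that made it reject the file.

```shell
to_utf8() {
    # CUPS passes the job charset in $CHARSET; default to utf-8 if unset.
    cs=$(printf '%s' "${CHARSET:-utf-8}" | tr 'A-Z' 'a-z')
    if [ "$cs" = "utf-8" ]; then
        cat "$1"                     # already UTF-8: never round-trip via iconv
    else
        iconv -f "$cs" -t utf-8 "$1" # convert everything else to UTF-8
    fi
}
```

With CHARSET=iso-8859-1 the 0xa9 byte comes out as the UTF-8 sequence 0xc2
0xa9; with CHARSET=utf-8 the file passes through untouched.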
paps-0.6.6-15 is the fixed package.
Ah, thanks for tracking this down, Tim. I forgot to set CHARSET at all during
testing; that's why I didn't see it. ;)