Red Hat Bugzilla – Bug 154242
CUPS daemon stops accepting jobs when network printer unreachable
Last modified: 2007-11-30 17:07:06 EST
Description of problem:
The problem is nearly identical to the one I reported in May 2004 (Bugzilla
124345). Since that bug was fixed I hadn't seen this issue in months;
however, approximately one month ago we upgraded from 13.3.16 to 13.3.27. Since
then we have been seeing the same basic issue again: if a backend lpd
process is attempting to spool to an unreachable network printer, the cupsd
process hangs and completely stops accepting jobs from other clients.
CUPS clients will make a connection to the server; however, the connection will
simply hang forever. Running commands on the server like 'lpq -a' and 'lpstat'
will also hang forever.
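As a generic health check (not something from this report), that wedged state can be detected by bounding the wait on one of those status commands with timeout(1) from modern coreutils; the 10-second limit here is an arbitrary choice:

```shell
# Hedged sketch: probe whether the scheduler still answers, with a bounded
# wait. The 10-second limit is an arbitrary assumption; 'lpstat -r' just
# asks whether the scheduler is running.
if command -v lpstat >/dev/null; then
    if timeout 10 lpstat -r >/dev/null 2>&1; then
        probe=alive
    else
        probe=hung-or-down
    fi
else
    probe=no-cups-tools   # CUPS client tools not installed on this host
fi
echo "scheduler probe: $probe"
```

Run periodically from cron; anything other than "alive" is worth alerting on.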
One thing that is different about this case: if I kill the backend process,
cupsd will continue without issues. With the previous bugzilla the backend
process would end up in an unkillable "zombie" state and you could only recover
by restarting the cupsd daemon.
It seems hard to reproduce, but it has hit me 3 times in four weeks, and it
kills printing company-wide.
Is it possible that recent security patches have somehow reintroduced a subtle
problem? Please let me know what other information I can provide.
Version-Release number of selected component (if applicable):
How reproducible:
Sometime, but not always
Steps to Reproduce:
1. Print jobs to an unreachable LPD printer
2. Let backend hang there, trying to print
Actual results:
CUPSD eventually hangs all connecting clients to all printers.
Expected results:
Printing to all online printers should continue without issues.
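The steps above can be sketched as shell commands; the queue name "deadlpd" and the address 192.0.2.1 (a reserved TEST-NET-1 address that should never answer) are placeholders I picked, not details from this report:

```shell
# Reproduction sketch, assuming the lpadmin/lp client tools are present.
queue=deadlpd
if command -v lpadmin >/dev/null; then
    # 1. Point a queue at an LPD host that resolves/routes nowhere.
    lpadmin -p "$queue" -E -v "lpd://192.0.2.1/raw"
    # 2. Spool a job and leave the lpd backend trying to connect.
    lp -d "$queue" /etc/hosts
    # Then watch whether connections from other clients start hanging.
else
    echo "CUPS admin tools not installed; commands shown for reference"
fi
```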
Are you able to reproduce this problem on demand with any reliability at all?
You say "sometime" as how reproducible it is -- do you mean the three times in
four weeks, or is it more frequent on a test machine?
When this occurs, in what way is the remote printer unreachable? Is there no
DNS entry, or are connections refused, or is there no response at all from the
remote IP address (or something else)?
The only patch between 1.1.17-13.3.16 and 1.1.17-13.3.27 that touches the
scheduler is cups-attrs.patch, to fix bug #107789. That bug certainly is fixed,
and the fix is certainly correct. The symptom had been a scheduler crash.
Could you please start by setting the LogLevel in /etc/cups/cupsd.conf to
"debug2"? That way we stand more of a chance of diagnosing the problem when it
happens again.
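For reference, that is a one-line setting in /etc/cups/cupsd.conf, followed by a restart so it takes effect:

```
# /etc/cups/cupsd.conf
LogLevel debug2
```

On RHEL 3 the restart would be 'service cups restart' (or '/etc/init.d/cups restart'); the extra output goes to /var/log/cups/error_log.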
I have not been able to reproduce this issue reliably. I succeeded in hanging
the print server one time on a test box. We have recently upgraded our backup/
standby print server to RHEL4 and are testing it, but have not been able to
reproduce the issue on the much newer version of CUPS included in that release.
If testing continues to go well we may just upgrade our primary box.
That being said, I'm pretty sure this is a real problem, possibly not related to
the previous bug at all. After the fix for 124345 I actually received a few
mails from others reporting similar hangs even with versions that contain that
fix. Most had environments similar to ours and saw random hangs. Here's an
excerpt from one of those mails:
> We have 120 printers and 250 users on the system. The application is
> character based (written in Business BASIC) and all it does is
> printing using the lp -d command to JetDirect printers.
> Your problem seemed to be about the same, because cups hangs when the
> system gets busy at the end of the day, when all users start printing
> at about the same time, and I found a job that could not be printed
> because the network printer was powered off.
At the time this was reported to me we were not seeing the issue, but it has
returned with a vengeance over the last few weeks.
When we see the failure the printer is unreachable, usually powered off, or
perhaps behind a network/WAN outage. DNS still resolves; there is just no
response from that host.
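That "DNS resolves but the host never answers" case is exactly where a plain TCP connect can block for a long time. A CUPS-independent way to probe it, as a sketch (this relies on bash's /dev/tcp and coreutils timeout(1); 192.0.2.1 is a placeholder for the printer's address):

```shell
# Probe the LPD port (515) with a bounded wait instead of a raw connect().
# 192.0.2.1 is a reserved TEST-NET-1 placeholder; substitute the printer's IP.
if timeout 5 bash -c 'exec 3<>/dev/tcp/192.0.2.1/515' 2>/dev/null; then
    printer_state=up
else
    printer_state=down   # no response, refused, or unroutable
fi
echo "printer 192.0.2.1 port 515: $printer_state"
```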
It hangs the entire print system; anything attempting to talk to cups just hangs
indefinitely, including local commands such as lpstat. This is not as bad as the
hang with bug 124345, because if I manually kill the lpd backend process the
cupsd process will recover. The previous bug required killing the cupsd
process, so this may be unrelated.
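For anyone hitting the same thing, that interim recovery can be scripted; the backend path matched here (/usr/lib/cups/backend/lpd) is my assumption for this CUPS version, so verify it against your own 'ps' output first:

```shell
# Find stuck lpd backend processes (children of cupsd). The bracketed grep
# pattern keeps grep from matching its own command line.
pids=$(ps ax | grep '[b]ackend/lpd' | awk '{print $1}')
if [ -n "$pids" ]; then
    kill $pids   # per the report, cupsd recovers once the backend dies
else
    echo "no lpd backend processes found"
fi
```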
Also, about a third of our printers use the lpd backend, while the rest use the
socket backend. We've only seen this issue with the lpd backend.
I will increase the log level and hope it happens again when I can capture it.
Is there anything else I can do while it's hung to capture information? Because
it is our enterprise printing environment I usually have only a short window of
time, but anything I can do, I will try.
> Is there anything else I can do while it's hung to capture information?
Yes, there is. You haven't set the 'hardware' field of this report, but
presuming it is i386 please fetch and install this debuginfo package:
This package does not interfere with the running CUPS program; instead it
provides extra files that allow the debugger to make more sense of the running
process.
Then, if/when the cupsd process hangs, it would be extremely useful to see the
debugger output. Become root using 'su -', then use the 'script' command to
start recording output, then do the following:
ps axf | grep [c]upsd
This will show you the process ID of the cupsd process. Let's say it's 637.
Then, attach the debugger:
gdb /usr/sbin/cupsd 637
(obviously with the correct PID instead)
At the (gdb) prompt, you can then find out where the program is stopped:
(gdb) bt
Then, it would be handy to see what the local variables are at each step on the
stack, like this:
(gdb) info locals
(gdb) up
(gdb) info locals
(gdb) up
..until it says you can't go up the stack any more.
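The interactive session above can also be captured in one shot using gdb's batch mode ('bt full' prints the locals for every frame), which helps when the outage window is short; the output file name here is my choice, and 637 is the example PID from above:

```shell
# One-shot capture from a hung cupsd; replace 637 with the real PID.
# Guarded so it degrades gracefully where gdb is not installed.
if command -v gdb >/dev/null; then
    gdb -batch -ex 'bt full' /usr/sbin/cupsd 637 > cupsd-hang.txt 2>&1 || true
    captured=yes
else
    captured=no   # gdb not installed on this host
fi
echo "capture attempted: $captured"
```

Note that gdb stops the process while attached; batch mode detaches automatically when it finishes, so cupsd resumes on its own.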
Unfortunately, now that I'm ready to do something when it fails, we haven't seen
the failure in weeks. I'm going to try harder to reproduce this in the lab next
week as I did look back at our error cases and found a couple of things in common.
The hang always occurred when a job was already partially spooled to the printer
and the printer failed in some way (for example a printer jam) and the printer
was then powered off and left that way until a technician could arrive to repair
it. I'm hoping I can reproduce that environment and perhaps get lucky.
Okay. Fingers crossed we can catch this with debuginfo..
Well, after a couple of months of running clean, we have suddenly hit this issue
with a vengeance over the last two weeks or so. We've had about 5 total hangs in
the last 10 days. Can I get a debuginfo package for the latest cups release
(1.1.17-13.3.31)? Hopefully with that we can catch it.
Here it is:
This bug is filed against RHEL 3, which is in maintenance phase.
During the maintenance phase, only security errata and select mission
critical bug fixes will be released for enterprise products. Since
this bug does not meet those criteria, it is now being closed.
For more information on the RHEL errata support policy, please visit:
If you feel this bug is indeed mission critical, please contact your
support representative. You may be asked to provide detailed
information on how this bug is affecting you.