Bug 189454 - cups serial backend often causes cups to hang on start/restart
Summary: cups serial backend often causes cups to hang on start/restart
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: cups
Version: 3.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Red Hat Kernel Manager
QA Contact: Brian Brock
URL: http://www.redhat.com/security/update...
Whiteboard: NdRvw
Depends On:
Blocks: 190430
 
Reported: 2006-04-20 03:33 UTC by Steven Roberts
Modified: 2018-10-19 20:41 UTC
CC List: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-05-08 15:42:03 UTC
Target Upstream Version:
Embargoed:


Attachments
sysrq-t & w (46.47 KB, application/octet-stream)
2007-03-28 03:52 UTC, Norm Murray

Description Steven Roberts 2006-04-20 03:33:36 UTC
Description of problem: 
cups scans all available backends on start/restart (and therefore at boot 
and after log rotation). 
 
The problem is that a few conditions cause the serial backend to hang, and cups 
has no protection against a hung backend, so all of cups hangs. 
 
The way it happens to us most often is when a serial console is running 
(I think it occurs more frequently if mgetty is idle because nothing is 
actually hooked up to /dev/ttyS0). 
 
 
Version-Release number of selected component (if applicable): 
all known versions 
 
How reproducible: 
very. 
 
Steps to Reproduce: 
1. install cups 
2. add a serial console by having an entry like this in /etc/inittab: 
'S0:2345:respawn:/sbin/mgetty ttyS0 -r -s 38400' 
3. restart cups and watch it hang 
   
Actual results: 
cups hangs 
 
Expected results: 
cups works 
 
Additional info: 
a bug talking about it in cups: 
http://www.cups.org/str.php?L633+P0+S0+C0+I0+E0+Qserial 
 
They pretty much blow the issue off and say it is an OS driver bug. 
Whether or not the OS should return control to them, I think cups should 
provide a config option (which they say they may add in a future version) 
to disable a given backend and/or a sanity check that times out the 
auto-configuration probing. 
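A timeout on the probing, as suggested above, could look roughly like the following C sketch. It is not CUPS code: `run_probe_with_timeout` is a hypothetical helper, and it assumes the backend is run for discovery with no extra arguments. The probe gets its own process group, is polled every 100 ms, and the whole group is killed with SIGKILL once the deadline passes. A process stuck in uninterruptible sleep (state D) cannot be reaped until the kernel releases it, so this only helps when the probe sleeps interruptibly.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper, not part of CUPS: run argv[0] with argv and wait at
 * most timeout_sec seconds. Returns the exit status, or -1 on timeout,
 * abnormal termination, or fork/wait failure. */
int run_probe_with_timeout(char *const argv[], unsigned timeout_sec)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        setpgid(0, 0);            /* own process group, so children die too */
        execv(argv[0], argv);
        _exit(127);               /* only reached if exec failed */
    }
    setpgid(pid, pid);            /* repeat in parent to avoid a race; errors ignored */

    const struct timespec tick = { 0, 100L * 1000L * 1000L };  /* 100 ms */
    unsigned long waited_ms = 0;
    for (;;) {
        int status;
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
        if (r < 0)
            return -1;
        if (waited_ms >= timeout_sec * 1000UL) {
            if (kill(-pid, SIGKILL) < 0)   /* kill the whole group first */
                kill(pid, SIGKILL);        /* fall back to the child alone */
            waitpid(pid, &status, 0);      /* reap the killed probe */
            return -1;
        }
        nanosleep(&tick, NULL);
        waited_ms += 100;
    }
}
```

For example, `char *argv[] = { "/usr/lib/cups/backend/serial", NULL };` followed by `run_probe_with_timeout(argv, 10)` would give the serial probe ten seconds (an arbitrary value) before the scheduler moves on.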
 
A note on the Debian mailing list about the issue: 
http://lists.debian.org/debian-printing/2005/07/msg00125.html 
 
a note on a redhat mailing list about the issue: 
https://www.redhat.com/archives/shrike-list/2003-July/msg01567.html 
 
We currently work around this by chmod'ing the serial backend to 0644 (the 
backends are installed as 0755). That causes cups to ignore the serial 
backend, but it makes things like rpm -V fail and means we have to redo it 
after every rpm install/update. 
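The permission workaround relies on cups skipping backends that are not executable. A minimal C model of that selection rule, assuming the check is simply "regular file with at least one execute bit set" (the exact test cups 1.1.17 performs is not confirmed here), shows why 0644 hides the serial backend while 0755 does not. `count_runnable_backends` is a hypothetical name. Because it uses stat(), a symlink to an executable file counts as well.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical model of backend selection: count the regular files in
 * `dir` that have at least one execute bit set. stat() follows symlinks,
 * so a symlink to an executable backend counts as well.
 * Returns -1 if the directory cannot be opened. */
int count_runnable_backends(const char *dir)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    int count = 0;
    struct dirent *ent;
    while ((ent = readdir(d)) != NULL) {
        if (ent->d_name[0] == '.')
            continue;                          /* skip ".", ".." and dotfiles */

        char path[4096];
        snprintf(path, sizeof path, "%s/%s", dir, ent->d_name);

        struct stat st;
        if (stat(path, &st) != 0 || !S_ISREG(st.st_mode))
            continue;
        if (st.st_mode & (S_IXUSR | S_IXGRP | S_IXOTH))
            count++;                           /* 0755 counts, 0644 does not */
    }
    closedir(d);
    return count;
}
```

Checking mode bits rather than calling access(X_OK) keeps the result the same whether or not the caller is root.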
 
Debian installs the backends in another directory and then symlinks in the 
active backends as needed. Mandrake (I think that is who it was) splits the 
serial backend out into a sub-package.

Comment 1 Tim Waugh 2006-04-24 16:39:07 UTC
FWIW, I haven't managed to reproduce this problem using CUPS 1.2 on Fedora Core 5.

I don't think we need to enable/disable backends.  I think we need to fix the
serial backend (or the kernel) not to hang.  I have to say, looking at the
serial backend code it seems more like a kernel problem.

Each /dev/ttyS* device is opened with O_NONBLOCK, and then closed.  Then the
/dev/ttyUSB* devices are tried in the same manner.

Seems like one or other of those kernel devices is not honouring O_NONBLOCK.
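For reference, the probe pattern described above amounts to something like the following C sketch. It is a reconstruction from this description, not the actual cups 1.1.17 serial backend source: the O_RDWR and O_NOCTTY flags, the helper names `probe_device` and `probe_family`, and the device count are assumptions. If a driver honours O_NONBLOCK, each open() returns at once whether or not anything is attached; if it does not, the loop stalls on the first port whose open() blocks.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Returns 1 if `path` opens, 0 otherwise. O_NONBLOCK asks the driver not to
 * wait (for a serial port, typically for carrier); O_NOCTTY keeps the port
 * from becoming our controlling terminal. */
int probe_device(const char *path)
{
    int fd = open(path, O_RDWR | O_NOCTTY | O_NONBLOCK);
    if (fd < 0)
        return 0;
    close(fd);
    return 1;
}

/* Probe <prefix>0 .. <prefix>(count-1), e.g. "/dev/ttyS" gives /dev/ttyS0,
 * /dev/ttyS1, ... Returns how many of them opened. */
int probe_family(const char *prefix, int count)
{
    char path[256];
    int found = 0;
    for (int i = 0; i < count; i++) {
        snprintf(path, sizeof path, "%s%d", prefix, i);
        found += probe_device(path);
    }
    return found;
}
```

The scan described above would then be `probe_family("/dev/ttyS", n)` followed by `probe_family("/dev/ttyUSB", n)`, with `n` chosen to match the number of ports the kernel provides.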

Comment 2 Ernie Petrides 2006-04-24 19:53:12 UTC
RHEL3 is now closed.

Comment 3 Steven Roberts 2006-04-24 22:39:39 UTC
I don't understand comment #2.  RHEL 3 isn't end-of-lifed yet.  I just filed RH
support tool request 872634 on this (as previously instructed by RH support
staff: for bugs in RHEL, file a bugzilla bug and, if assistance is needed, file
an RH support request and cross-reference the two).

Comment 4 Steven Roberts 2006-04-24 22:46:13 UTC
on comment #1, two things:

1) it looks like he tested under cups 1.2, whereas RHEL3 uses cups 1.1.17.  From
a quick glance through the release notes for cups 1.2 (which btw appears not to
be final yet, only an RC), they made some changes wrt scanning for printers.  It
sounds like some of these changes would keep cups from locking up if a single
backend auto-scanner locked up.

2) I don't have a copy of the POSIX spec handy, and I don't know if the man page
is current, but the open(2) man page says this:
       O_NONBLOCK or O_NDELAY
              When possible, the file is opened in non-blocking mode.  Neither
              the  open  nor  any subsequent operations on the file descriptor
              which is returned will cause the calling process to  wait.   For
              the  handling  of  FIFOs  (named pipes), see also fifo(4).  This
              mode need not have any effect on files other than FIFOs.
That sure sounds like O_NONBLOCK is not guaranteed to be honored for anything
besides named pipes.
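The one case the man page does pin down is the FIFO, and it is easy to demonstrate. This C sketch (the function name `fifo_nonblock_demo` and the test path are arbitrary) shows that with O_NONBLOCK a write-only open of a FIFO that has no reader fails immediately with ENXIO, while a read-only open returns at once. For character devices such as /dev/ttyS*, POSIX only promises a non-blocking open for devices that support one, so the behaviour there depends on the driver.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Creates a FIFO at fifo_path, tries both non-blocking opens, then removes
 * it. *write_errno gets errno from the write-only open (0 if it succeeded);
 * *read_ok is 1 if the read-only open succeeded. Returns -1 if mkfifo fails. */
int fifo_nonblock_demo(const char *fifo_path, int *write_errno, int *read_ok)
{
    if (mkfifo(fifo_path, 0600) != 0)
        return -1;

    int wfd = open(fifo_path, O_WRONLY | O_NONBLOCK);   /* no reader yet */
    *write_errno = (wfd < 0) ? errno : 0;
    if (wfd >= 0)
        close(wfd);

    int rfd = open(fifo_path, O_RDONLY | O_NONBLOCK);   /* never waits */
    *read_ok = (rfd >= 0);
    if (rfd >= 0)
        close(rfd);

    unlink(fifo_path);
    return 0;
}
```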

Comment 5 Ernie Petrides 2006-04-24 23:12:54 UTC
In response to comment #3: Update 8 is the final full-scale update
to RHEL3 (general bug fixing, driver updates, new ISOs, etc.), and
the deadline for U8 fixes was last week.  At this point, RHEL3 is
entering "Maintenance Mode", which means only critical security
issues will get fixed (via individual package errata on demand).

Comment 6 Steven Roberts 2006-04-24 23:38:57 UTC
from this page: 
http://www.redhat.com/security/updates/errata/ 
 
It sounds like bug fixes continue through phase II (deployment) which for 
RHEL3 is given as: Oct 31, 2006. 
 
FYI, phase I (full support) is listed on that page as not ending until the 
30th. 

Comment 7 Ernie Petrides 2006-04-25 00:03:18 UTC
Thanks for the link, Steven.  I was under the impression that the
"Deployment Phase" went from 2 years out to 2-1/2 years out.  It
looks like that period has been moved out by 6 months.

I'll let my management clarify the discrepancy (maybe it's a RHEL3
versus RHEL4 difference).


Comment 8 Steven Roberts 2006-04-25 00:35:49 UTC
no problem.  it actually took me a bit to find the link myself.  And it was a 
good reminder to check out the dates for RHEL 2.1.  We tend to upgrade slowly 
and so we still have a few rhel 2.1 systems that we are migrating to RHEL 3. 
 
we have started poking at RHEL4 to see what changes we will need to make to 
support it (vs rhel3), but probably won't have any deployments moving to it 
until the end of this year at the earliest.  RHEL 3 has been working well for us, 
and none of our 3rd party vendors have asked for anything beyond RHEL 3 yet. 

Comment 11 RHEL Program Management 2007-01-18 15:41:41 UTC
Product Management has reviewed and declined this request.  You may appeal this
decision by reopening this request. 

Comment 12 Steven Roberts 2007-01-21 19:53:30 UTC
I would like an explanation of why the product life cycle dates listed and
discussed in this bug are going to be ignored for this particular issue.

And the dates on the listed page haven't been altered either:
http://www.redhat.com/security/updates/errata/

For Red Hat Enterprise Linux (version 3):
Full Support (including hardware updates): Oct 23, 2003 -- April 30, 2006
Deployment Support: May 1, 2006 -- Jun 30, 2007
Maintenance Support: Jul 1, 2007 -- Oct 31, 2010

In my mind this bug should qualify even for 'Deployment Support' fixing due to
its severity, and it was even filed before the listed 'Full Support' cutoff.

This is a pretty severe issue.  You can work around it by manually removing the
serial backend (or disabling it permission-wise), but either of these breaks
verification of the cups rpm.  Unless you do that, the server won't boot if you
have cups installed and a serial console on the machine.  And cups will not
return from the regular weekly log rotation either.

Comment 13 Steven Roberts 2007-01-21 20:08:36 UTC
Added the errata URL to the URL field.  In addition, our business has
asked our group to look into our future plans for Linux servers.

We currently have about 75 RHEL 3 servers and about 150 RH 7.3 servers.  The
RH 7.3 servers we have been keeping up to date via Fedora Legacy and by building
RPMs with patches ourselves.  We have dealt with the cups issue on our RH 7.3
servers, but for RHEL 3 we were awaiting an official fix from upstream, since
that is why we pay the support dollars.

All of the approx. 225 servers listed above need to be updated, and we are
currently planning where we are heading.  For the 75 RHEL 3 servers, we have been
targeting RHEL 5 (still in beta, but it should be out well before we start live
deployments).  With the demise of Fedora Legacy (which I see as the biggest
shameful black eye to RedHat/Fedora Linux in recent history), about 90 of the 150
RH 7.3 boxes we were pushing to move to RHEL 5 as well.

But management wants us to look into possibilities for cutting costs, and if the
documented support timeframes get ignored, that weighs heavily toward heading
in different directions, like Fedora Core 6 or CentOS, or even looking at SUSE.

Comment 14 Daniel Riek 2007-01-24 22:29:45 UTC
I think one of the core issues here is that you apparently reported the
problem directly to Bugzilla. 

Bugzilla at Red Hat is a development tool, not a support tool. So
while every bug Red Hat development handles goes through Bugzilla, it
still matters a great deal where the individual request originated.
We are trying to ensure transparency for our customers and to
leverage the Linux community by making as many entries as possible
publicly visible, but we cannot guarantee proper handling of
business-critical issues via that interface. 

So if you have an issue that impacts your business and that you as a
paying customer wish to see getting fixed, I have to ask you to report
it via our support organization. Only Support is able to triage and
prioritize customer issues correctly.

In this specific case I also agree that our communication was less than
optimal: 

We extended the full support phase for RHEL3 by releasing Update 8. This
problem did not get handled in Update 8 because other issues were rated
at a higher priority, and both the available resources and the
amount of change we can introduce are limited. 

Now, with the extension of Full Support, we also moved out the
Deployment phase, which will be concluded by a final bug-fix-only update
release (3.9). On the other hand, we established stricter inclusion
criteria for it than for a normal update release: it is limited to
high-impact problems.

In this context - with the limited 3.9 update, the admittedly
non-perfect permission change workaround, and no prioritization from
support - we considered this problem to not meet the criteria.

So if you wish to pursue this issue, I'd like to ask you to open a case
with our support organization and point them to this BZ entry.

At this point I can only promise a review and not an actual fix though.

Comment 15 Steven Roberts 2007-01-26 09:18:34 UTC
As mentioned in comment #3, I have already filed a support request on this
(#872634), and it is still open.  I filed a bugzilla report because support
personnel told me long ago that for issues I should file a bugzilla bug and
then a support case that references it.

Comment 17 Jason Baron 2007-03-27 20:40:57 UTC
OK, if this is really a kernel bug, we need to know where the kernel is hanging
to get a sense of where the problem is. When the problem is reproduced, can you
please do an alt-sysrq-t so we can see where the kernel is hung? Thanks.

Comment 18 Steven Roberts 2007-03-27 21:26:48 UTC
To me it doesn't seem like a kernel bug, which is why I filed it against the cups 
component.  See my comment #4 for why it seems like a cups code issue.

It sounds like cups is expecting O_NONBLOCK not to block on a device special 
file, but according to the man page it is only guaranteed for named pipes.
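Whatever the driver does, a caller that cannot trust O_NONBLOCK can still bound how long open() may block. One common technique, sketched here in C with the hypothetical helper `open_with_alarm`, is to arm alarm() with a SIGALRM handler installed without SA_RESTART, so that an open() sleeping interruptibly fails with EINTR when the timer fires. This is not what cups 1.1.17 does, and it only helps if the hung backend is in an interruptible sleep, as the S state shown for /usr/lib/cups/backend/serial in the ps output below suggests.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* The handler does nothing; its only job is to interrupt open(). */
static void on_alarm(int sig) { (void)sig; }

/* Open `path` with `flags`, giving up after `seconds` if the open blocks.
 * Returns the fd, or -1 with errno set (EINTR means the timer fired). */
int open_with_alarm(const char *path, int flags, unsigned seconds)
{
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                /* no SA_RESTART: let open() fail with EINTR */
    sigaction(SIGALRM, &sa, &old);

    alarm(seconds);
    int fd = open(path, flags);
    int saved = errno;
    alarm(0);                       /* cancel the timer if open() returned first */
    sigaction(SIGALRM, &old, NULL); /* restore the previous disposition */

    errno = saved;
    return fd;
}
```

For example, `open_with_alarm("/dev/ttyS0", O_RDWR | O_NOCTTY | O_NONBLOCK, 5)` would give up after about five seconds if the driver blocks despite O_NONBLOCK.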

Comment 19 Norm Murray 2007-03-28 03:44:05 UTC
1. install cups 
2. add a serial console by having an entry like this in /etc/inittab: 
'S0:2345:respawn:/sbin/mgetty ttyS0 -r -s 38400' 
3. restart cups and watch it hang 

Interestingly, starting cups with this line in /etc/inittab does not hang, but
restarting cups causes the hang during the start phase.

[root@amazon-6000 root]# service cups restart
Stopping cups:                                             [  OK  ]
Starting cups:

[root@amazon-6000 root]# ps aux | grep cups
root      1802  0.0  0.0  4204 1080 pts/1    S    13:26   0:00 /bin/sh /sbin/service cups restart
root      1805  0.0  0.0  4244 1288 pts/1    S    13:26   0:00 /bin/sh /etc/init.d/cups restart
root      1816  0.0  0.0  3612  628 pts/1    S    13:26   0:00 initlog -q -c cupsd
root      1817  0.0  0.0  3888  860 pts/1    S    13:26   0:00 cupsd
root      1818  0.0  0.0  7712 1800 ?        S    13:26   0:00 cupsd
root      1826  0.0  0.0  3612  844 ?        S    13:26   0:00 /usr/lib/cups/backend/serial
root      1828  0.0  0.0  3688  668 pts/0    S    13:28   0:00 grep cups
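The STAT column above shows the serial backend in state S, an interruptible sleep, rather than D. A wrapper that wants to check the same thing without parsing ps output could read the state field of /proc/<pid>/stat, as in this small C sketch; `proc_state` is a hypothetical helper, and the 512-byte buffer is an arbitrary size that comfortably covers the fields up to the state.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Return the one-letter state of process `pid` from /proc/<pid>/stat
 * (e.g. 'R' running, 'S' interruptible sleep, 'D' uninterruptible
 * sleep), or '?' if it cannot be read. */
char proc_state(pid_t pid)
{
    char path[64], buf[512];
    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return '?';
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* The command name is in parentheses and may itself contain ')',
     * so the state is the field after the last ')'. */
    char *p = strrchr(buf, ')');
    if (p == NULL || p[1] != ' ' || p[2] == '\0')
        return '?';
    return p[2];
}
```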



Comment 20 Norm Murray 2007-03-28 03:52:02 UTC
Created attachment 151092
sysrq-t & w 

captured sysrq-t & w from the hang upon restart of cups.

Comment 21 Suzanne Logcher 2007-05-08 15:37:15 UTC
Unfortunately this issue was not approved for inclusion in RHEL 3.9 and it is
now too late as we are past Beta Freeze.

Since RHEL 3.9 is the last release for RHEL 3, if you still want this issue
fixed, please work with Red Hat Support and request an async errata.

Comment 22 RHEL Program Management 2007-05-08 15:42:03 UTC
Product Management has reviewed and declined this request.  You may appeal this
decision by reopening this request. 

