Description of problem:
CUPS scans all the backends it supports on every start/restart (so on boot and after log rotation). The problem is that a few conditions cause the serial backend to hang, and CUPS has no protection against a hung backend, so all of CUPS hangs. The way it happens to us most often around here is with a serial console running (I think it occurs more frequently when mgetty is idle because nothing is actually hooked up to /dev/ttyS0).

Version-Release number of selected component (if applicable):
All known versions.

How reproducible:
Very.

Steps to Reproduce:
1. Install cups.
2. Add a serial console by having an entry like this in /etc/inittab: 'S0:2345:respawn:/sbin/mgetty ttyS0 -r -s 38400'
3. Restart cups and watch it hang.

Actual results:
cups hangs.

Expected results:
cups works.

Additional info:
A bug about it in the CUPS tracker: http://www.cups.org/str.php?L633+P0+S0+C0+I0+E0+Qserial
They pretty much blow the issue off and say it is an OS driver bug. Whether or not the OS should return control to them, I think CUPS should provide a config method (which they say they may add in a future version) to disable a given backend, and/or a sanity check that times out the auto-configuration probing.

A note on the Debian mailing list about the issue: http://lists.debian.org/debian-printing/2005/07/msg00125.html
A note on a Red Hat mailing list about the issue: https://www.redhat.com/archives/shrike-list/2003-July/msg01567.html

We currently work around this by chmod'ing the serial backend to 0644 (the backends are installed as 0755). That causes CUPS to ignore the serial backend, but it makes things like rpm -V fail and means we have to redo it after every rpm install/update. Debian installs the backends in another directory and then symlinks the active backends in as needed. Mandrake (I think that is who it was) splits the serial backend out into a sub-package.
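The timeout sanity check suggested above could look roughly like the following. This is a minimal sketch, not CUPS code: `run_with_timeout()` and the `PROBE_*` codes are hypothetical names, and it assumes the scheduler runs each backend as a child process to enumerate devices, as the report describes. It also skips backends without execute permission, mirroring the observation that a backend chmod'ed to 0644 is ignored.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define PROBE_TIMEOUT (-1) /* probe did not finish in time and was killed    */
#define PROBE_SKIPPED (-2) /* backend is not executable (e.g. chmod 0644)    */
#define PROBE_FAILED  (-3) /* fork/wait failed or the probe died on a signal */

/* Milliseconds elapsed between two CLOCK_MONOTONIC readings. */
static long long elapsed_ms(const struct timespec *a, const struct timespec *b)
{
    return (long long)(b->tv_sec - a->tv_sec) * 1000LL
         + (b->tv_nsec - a->tv_nsec) / 1000000L;
}

/* Hypothetical helper: run argv[0] with argv and return its exit status,
 * or a PROBE_* code if it is skipped, fails, or exceeds timeout_sec. */
static int run_with_timeout(char *const argv[], unsigned timeout_sec)
{
    /* Skip non-executable backends, which is what the 0644 workaround relies on. */
    if (access(argv[0], X_OK) != 0)
        return PROBE_SKIPPED;

    pid_t pid = fork();
    if (pid < 0)
        return PROBE_FAILED;
    if (pid == 0) {
        execv(argv[0], argv);
        _exit(127); /* only reached if exec failed */
    }

    const struct timespec tick = { 0, 50L * 1000L * 1000L }; /* poll every 50 ms */
    struct timespec start, now;
    clock_gettime(CLOCK_MONOTONIC, &start);

    for (;;) {
        int status;
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return WIFEXITED(status) ? WEXITSTATUS(status) : PROBE_FAILED;
        if (r < 0 && errno != EINTR)
            return PROBE_FAILED;

        clock_gettime(CLOCK_MONOTONIC, &now);
        if (elapsed_ms(&start, &now) >= (long long)timeout_sec * 1000LL) {
            /* Give up on the hung probe. SIGKILL works on a process in
             * interruptible sleep ("S" in ps); one stuck in "D" state is
             * only reaped once the kernel releases it. */
            kill(pid, SIGKILL);
            while (waitpid(pid, &status, 0) < 0 && errno == EINTR)
                ;
            return PROBE_TIMEOUT;
        }
        nanosleep(&tick, NULL);
    }
}
```

With a guard like this, a hung serial probe would only cost the scheduler its serial devices for that scan instead of blocking the whole daemon.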
FWIW, I haven't managed to reproduce this problem using CUPS 1.2 on Fedora Core 5. I don't think we need to enable/disable backends; I think we need to fix the serial backend (or the kernel) so that it doesn't hang. Looking at the serial backend code, though, I have to say it seems more like a kernel problem. Each /dev/ttyS* device is opened with O_NONBLOCK and then closed. Then the /dev/ttyUSB* devices are tried in the same manner. It seems like one or the other of those kernel devices is not honouring O_NONBLOCK.
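The probe loop described above amounts to roughly the following. It is an illustration based only on this comment's description, not the actual serial backend source; `probe_ports()` and its parameters are hypothetical. It shows that each step is a plain open()/close() pair, so if a probe run hangs, the hang is inside one of those two system calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical sketch: try prefix0 .. prefix(count-1), e.g. "/dev/ttyS0".
 * O_NONBLOCK asks open() not to wait for the device to become ready;
 * each port that opens is reported and closed again.
 * Returns the number of ports that could be opened. */
static int probe_ports(const char *prefix, int count)
{
    int found = 0;
    for (int i = 0; i < count; i++) {
        char path[64];
        snprintf(path, sizeof path, "%s%d", prefix, i);

        int fd = open(path, O_RDWR | O_NOCTTY | O_NONBLOCK);
        if (fd < 0)
            continue;          /* missing or busy port: skip it */

        printf("found %s\n", path);
        found++;
        close(fd);             /* a hang here or in open() stalls the whole probe */
    }
    return found;
}
```

Used as described in the comment, this would be called as `probe_ports("/dev/ttyS", n)` followed by `probe_ports("/dev/ttyUSB", n)`.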
RHEL3 is now closed.
I don't understand comment #2. RHEL 3 hasn't reached end of life yet. I just filed RH support tool request 872634 on this (as previously instructed by RH support staff: for bugs in RHEL, file a Bugzilla bug and, if assistance is needed, file a RH support app request and cross-reference the two).
Regarding comment #1, two things:
1) It looks like he tested under cups 1.2, whereas RHEL3 uses cups 1.1.17. From a quick glance through the release notes for cups 1.2 (which appears not to be final yet, only an RC, btw), they made some changes to how printers are scanned. It sounds like some of these changes would keep cups from locking up when a single backend's auto-scan locks up.
2) I don't have a copy of the POSIX spec handy, and I don't know whether the man page is current, but the open(2) man page says this:

O_NONBLOCK or O_NDELAY
When possible, the file is opened in non-blocking mode. Neither the open nor any subsequent operations on the file descriptor which is returned will cause the calling process to wait. For the handling of FIFOs (named pipes), see also fifo(4). This mode need not have any effect on files other than FIFOs.

That sure sounds like O_NONBLOCK is not guaranteed to be honored for anything besides named pipes.
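Since O_NONBLOCK is only documented above as mattering for FIFOs, a defensive caller can bound the open() call itself. The sketch below installs a SIGALRM handler without SA_RESTART and arms alarm(), so an open() blocked in an interruptible wait fails with EINTR. `open_with_timeout()` is a hypothetical name, not something cups 1.1.17 does. It assumes a single-threaded caller, has a known race (if the alarm fires before open() starts, the guard is lost), and cannot interrupt uninterruptible kernel sleep, which is why a per-probe child process with a timeout is the more robust design.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Empty handler: its only job is to interrupt a blocked system call. */
static void on_alarm(int sig)
{
    (void)sig;
}

/* Hypothetical helper: open(path, flags), but give up after `seconds`.
 * Returns the fd, or -1 with errno set (EINTR if the timeout fired).
 * Assumes a single-threaded caller that does not use alarm() elsewhere. */
static int open_with_timeout(const char *path, int flags, unsigned seconds)
{
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;            /* no SA_RESTART: let open() fail with EINTR */
    if (sigaction(SIGALRM, &sa, &old) != 0)
        return -1;

    alarm(seconds);             /* race: if this fires before open() starts,
                                   the open is no longer bounded */
    int fd = open(path, flags);
    int saved = errno;

    alarm(0);                   /* cancel a still-pending alarm */
    sigaction(SIGALRM, &old, NULL);
    errno = saved;
    return fd;
}
```

A FIFO makes the difference easy to see: a read-only open with O_NONBLOCK returns at once even with no writer, while the same open without O_NONBLOCK waits for a writer until the alarm interrupts it.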
In response to comment #3: Update 8 is the final full-scale update to RHEL3 (general bug fixing, driver updates, new ISOs, etc.), and the deadline for U8 fixes was last week. At this point, RHEL3 is entering "Maintenance Mode", which means only critical security issues will get fixed (via individual package errata on demand).
From this page: http://www.redhat.com/security/updates/errata/ it sounds like bug fixes continue through Phase II (Deployment), which for RHEL3 is given as Oct 31, 2006. FYI, Phase I (Full Support) is listed on that page as not ending until the 30th.
Thanks for the link, Steven. I was under the impression that the "Deployment Phase" ran from 2 years out to 2-1/2 years out. It looks like that period has been moved out by 6 months. I'll let my management clarify the discrepancy (maybe it's a RHEL3 versus RHEL4 difference).
No problem. It actually took me a bit to find the link myself, and it was a good reminder to check the dates for RHEL 2.1. We tend to upgrade slowly, so we still have a few RHEL 2.1 systems that we are migrating to RHEL 3. We have started poking at RHEL 4 to see what changes we will need to make to support it (vs. RHEL 3), but probably won't have any deployments moving to it until the end of this year at the earliest. RHEL 3 has been working well for us, and none of our 3rd-party vendors have asked for anything beyond RHEL 3 yet.
Product Management has reviewed and declined this request. You may appeal this decision by reopening this request.
I would like an explanation of why, given the product life cycle dates listed and discussed in this bug, they are going to be ignored for this particular issue. The dates on the listed page haven't been altered either: http://www.redhat.com/security/updates/errata/

For Red Hat Enterprise Linux (version 3):
Full Support (including hardware updates): Oct 23, 2003 -- April 30, 2006
Deployment Support: May 1, 2006 -- Jun 30, 2007
Maintenance Support: Jul 1, 2007 -- Oct 31, 2010

In my mind this bug should qualify even for 'Deployment Support' fixing due to its severity, and it was even filed before the listed 'Full Support' cutoff. This is a pretty severe issue. You can work around it by manually removing the serial backend (or disabling it via permissions), but either way breaks verification of the cups rpm. Unless you do that, the server won't boot if you have cups installed and a serial console on the machine, and cups will not return from a regular weekly log rotation either.
Added the errata URL to the URL field. In addition, our business has asked our group to look into our future plans for Linux servers. We currently have about 75 RHEL 3 servers and about 150 RH 7.3 servers. We have been keeping the RH 7.3 servers up to date via Fedora Legacy and by building RPMs with patches ourselves. We have dealt with the cups issue on our RH 7.3 servers, but for RHEL 3 we were awaiting an official fix from upstream, since that is why we pay the support dollars. All of the approx. 225 servers listed above need to be updated, and we are currently planning where we are heading. For the 75 RHEL 3 servers, we have been targeting RHEL 5 (still in beta, but it should be out well before we start live deployments). With the demise of Fedora Legacy (which I see as the biggest shameful black eye to Red Hat/Fedora Linux in recent history), we were pushing about 90 of the 150 RH 7.3 boxes to RHEL 5 as well. But management wants us to look into possibilities for cutting costs, and if the documented support timeframes get ignored, that really weighs toward heading other routes like Fedora Core 6 or CentOS, or even looking at SUSE.
I think one of the core issues here is that you apparently reported the problem directly to Bugzilla. Bugzilla at Red Hat is a development tool, not a support tool. So while every bug Red Hat development handles goes through Bugzilla, where an individual request originated still plays an important role. We are trying to ensure transparency for our customers and leverage the Linux community by making as many of the entries as possible publicly visible, but we cannot guarantee proper handling of business-critical issues via that interface. So if you have an issue that impacts your business and that you, as a paying customer, wish to see fixed, I have to ask you to report it via our support organization. Only Support is able to triage and prioritize customer issues correctly.

In this specific case I also agree that our communication was less than optimal: we extended the Full Support phase for RHEL3 by releasing Update 8. This problem was not handled in Update 8 because other issues were rated at a higher priority, and both the available resources and the amount of change we can introduce are limited. With the extension of Full Support we also moved out the Deployment phase, which will conclude with a final bug-fix-only update release (3.9). On the other hand, we established stricter inclusion criteria for it than for a normal update release; it is limited to high-impact problems. In this context (the limited 3.9 update, the admittedly imperfect permission-change workaround, and no prioritization from support) we considered this problem not to meet the criteria.

So if you wish to pursue this issue, I'd like to ask you to open a case with our support organization and point them to this BZ entry. At this point I can only promise a review, not an actual fix, though.
As mentioned in comment #3, I have already filed a support request on this (#872634); it is still open. I filed a Bugzilla request because support personnel told me long ago that for issues I should file a Bugzilla bug and then a support case that references it.
OK, if this really is a kernel bug, we need to know where the kernel is hanging to get a sense of where the problem is. When the problem is reproduced, can you please do an alt-sysrq-t so we can see where the kernel is hung? Thanks.
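On a headless machine, the same task dump can be requested without a keyboard by writing the command key to /proc/sysrq-trigger as root, on kernels that provide that file; the dump goes to the kernel log and console. `send_sysrq()` is a hypothetical helper, and the path is a parameter only so the sketch can be exercised against an ordinary file.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: write one SysRq command key (e.g. 't' = dump task
 * states) to trigger_path, normally "/proc/sysrq-trigger".
 * Returns 0 on success, -1 with errno set on failure. */
static int send_sysrq(const char *trigger_path, char key)
{
    int fd = open(trigger_path, O_WRONLY);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, &key, 1);
    int saved = errno;           /* keep write()'s errno across close() */
    close(fd);
    errno = saved;
    return n == 1 ? 0 : -1;
}
```

Typical use while the restart is hung would be `send_sysrq("/proc/sysrq-trigger", 't')` from a second session, then reading the result with dmesg; kernels that support the 'w' key can additionally dump only blocked tasks.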
To me it doesn't seem like a kernel bug, which is why I filed it against the cups component. See my comment #4 for why it seems like a cups code issue. It sounds like cups expects O_NONBLOCK not to block on a device special file, while according to the man page it is only guaranteed to have an effect on named pipes.
1. install cups
2. add a serial console by having an entry like this in /etc/inittab: 'S0:2345:respawn:/sbin/mgetty ttyS0 -r -s 38400'
3. restart cups and watch it hang

Interestingly, starting cups with this line in /etc/inittab will not hang, but restarting cups will cause the hang on the start:

[root@amazon-6000 root]# service cups restart
Stopping cups: [ OK ]
Starting cups:

[root@amazon-6000 root]# ps aux | grep cups
root 1802 0.0 0.0 4204 1080 pts/1 S 13:26 0:00 /bin/sh /sbin/service cups restart
root 1805 0.0 0.0 4244 1288 pts/1 S 13:26 0:00 /bin/sh /etc/init.d/cups restart
root 1816 0.0 0.0 3612 628 pts/1 S 13:26 0:00 initlog -q -c cupsd
root 1817 0.0 0.0 3888 860 pts/1 S 13:26 0:00 cupsd
root 1818 0.0 0.0 7712 1800 ? S 13:26 0:00 cupsd
root 1826 0.0 0.0 3612 844 ? S 13:26 0:00 /usr/lib/cups/backend/serial
root 1828 0.0 0.0 3688 668 pts/0 S 13:28 0:00 grep cups
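The ps output shows the hung /usr/lib/cups/backend/serial (PID 1826) in state S, i.e. interruptible sleep, not D (uninterruptible). A quick way to check that for one PID is to read the state field of /proc/<pid>/stat. `proc_state()` is a hypothetical helper; it parses after the last ')' because the command name in field 2 may itself contain spaces or parentheses.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: return the one-letter state of a process from
 * /proc/<pid>/stat ('R', 'S', 'D', ...), or '?' if it cannot be read. */
static char proc_state(pid_t pid)
{
    char path[64], buf[512];
    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return '?';
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* Format is "pid (comm) state ...". comm may contain ')', so use the
     * last ')'; the fields after it are numeric and contain none. */
    char *p = strrchr(buf, ')');
    if (p == NULL || p[1] != ' ' || p[2] == '\0')
        return '?';
    return p[2];
}
```

A process in S can still be killed, so a watchdog in the scheduler could have recovered from this particular hang; a D state would point more firmly at the driver.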
Created attachment 151092: sysrq-t & w
Captured sysrq-t & w from the hang upon restart of cups.
Unfortunately this issue was not approved for inclusion in RHEL 3.9 and it is now too late as we are past Beta Freeze. Since RHEL 3.9 is the last release for RHEL 3, if you still want this issue fixed, please work with Red Hat Support and request an async errata.