Description of problem: Occasionally I have an IPP process running at 100% processor. This seems to come from occasional print jobs but I have been unable to find out what sort of jobs cause it. It doesn't do any harm other than slow down the print server. However, I have done strace -p on the process, and here's what I get: time(NULL) = 1056471483 recv(3, "", 2048, 0) = 0 time(NULL) = 1056471483 recv(3, "", 2048, 0) = 0 time(NULL) = 1056471483 recv(3, "", 2048, 0) = 0 time(NULL) = 1056471483 recv(3, "", 2048, 0) = 0 time(NULL) = 1056471483 recv(3, "", 2048, 0) = 0 time(NULL) = 1056471483 recv(3, "", 2048, 0) = 0 time(NULL) = 1056471483 recv(3, "", 2048, 0) = 0 time(NULL) = 1056471483 recv(3, "", 2048, 0) = 0 time(NULL) = 1056471483 recv(3, "", 2048, 0) = 0 time(NULL) = 1056471483 which goes on forever. Does this help? I can try to get more info if you give me some pointers on how to gathrer it. Sorry to be so ague, I'm hoping the strace is useful. Version-Release number of selected component (if applicable): cups-1.1.17-13.3 How reproducible: Well, at least on my print server, let people print for a week or so and another process will appear. Steps to Reproduce: 1. print many files 2. eventually one causes IPP at 100% 3. Actual results: Expected results: Additional info: Not actually sure if this means some files fail to print. I will ask my users if you think this gives useful insight into the problem. Thanks a lot. Paul Mitcheson
This will be due to rpc.mountd or something listening on port 631. The cups initscript has been changed to make it start before nfs. *** This bug has been marked as a duplicate of 97767 ***
Thanks for the quick response. However, I'm not running nfsd and cups still works - it doesn't just exit on starting, just occasionally I see this 100%, but cups is still running and does still print, it's just that the load on the machine is higher. Doesn't seem that rpc.mountd is running on that port. [root@capprint root]# rpcinfo -p program vers proto port 100000 2 tcp 111 portmapper 100000 2 udp 111 portmapper 100024 1 udp 32768 status 100024 1 tcp 32768 status 100007 2 udp 941 ypbind 100007 1 udp 941 ypbind 100007 2 tcp 944 ypbind 100007 1 tcp 944 ypbind 391002 2 tcp 32770 sgi_fam Probably you are correct and I jsut don't understand something... which has so often been the case recently for me :-( Thanks, Paul
Please run 'netstat -tnlp | grep :631' as root when you see the errant behaviour. Perhaps cups needs to start before portmapper, not just nfs.
*** Bug 98203 has been marked as a duplicate of this bug. ***
*** Bug 92341 has been marked as a duplicate of this bug. ***
Here's the o/p of netstat: [root@capprint root]# netstat -tnlp | grep :631 tcp 0 0 0.0.0.0:631 0.0.0.0:* LISTEN 729/cupsd This time it happened when a print job was cancelled. Nothing shows in the error log. Regards, Paul
Please fetch and install this package: ftp://people.redhat.com/twaugh/tmp/cups-debuginfo-1.1.17-13.3.i386.rpm This is the source code and debug information for the cups package, which allows the debugger (gdb) to show more information about what's going on. Next time lpstat busy loops please attach gdb to the process like this: gdb /usr/bin/lpstat 1702 (for a PID of 1702) Then at the '(gdb)' prompt run the command 'bt' to get a backtrace, and paste that in here.
Hi Tim, OK, I got the error again (actually there are now 3 ipp process running so the load is 3.0) here's the back trace of them: Attaching to program: /usr/bin/lpstat, process 10843 0xffffe002 in ?? () (gdb) bt #0 0xffffe002 in ?? () #1 0x40021b91 in ?? () #2 0x40020b83 in ?? () #3 0x400291f0 in ?? () #4 0x08049904 in main (argc=6, argv=0xbffff8f4) at lpstat.c:501 #5 0x420156a4 in ?? () They all give exactly the same back trace. Does this help, Tim? I have killed 2 processes but left one running (and am coping with slow printing in case you ned more info) Thanks a lot, Paul
Had you installed the cups-debuginfo package? That backtrace has no useful symbols.
yeap, it's installed: [root@capprint root]# rpm -query cups-debuginfo cups-debuginfo-1.1.17-13.3 OK, what should I do next? Paul
Perhaps the version of gdb in Red Hat Linux 9 is having trouble with that debuginfo package. Would it be possible for you to use the current rawhide version of gdb at all?
[root@capprint root]# gdb /usr/bin/lpstat 9717 GNU gdb Red Hat Linux (5.3.90-0.20030710.21rh) Copyright 2003 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db library "/lib/tls/libthread_db.so.1". This looks even less useful: [root@capprint root]# rpm -query gdb gdb-5.3.90-0.20030710.21 [root@capprint root]# gdb /usr/bin/lpstat 9717 Attaching to program: /usr/bin/lpstat, process 9717 0x40021b11 in ?? () (gdb) bt #0 0x40021b11 in ?? () #1 0x080551a8 in ?? () #2 0xbffea140 in ?? () #3 0x00000008 in ?? () #4 0x0805a9a8 in ?? () #5 0x00000003 in ?? () #6 0x00000015 in ?? () #7 0x42132320 in ?? () Paul
Grr. :-( Okay, another plan: please use gcore to create a core dump from the running lpstat process, and mail it to me at twaugh. (gcore is in the gdb package.) I'll try to get some sense out of it from this end. Does strace still show time() and recv() alternating by the way?
strace -p stil gives: time(NULL) = 1062415043 recv(3, "", 2048, 0) = 0 time(NULL) = 1062415043 recv(3, "", 2048, 0) = 0 time(NULL) = 1062415043 recv(3, "", 2048, 0) = 0 time(NULL) = 1062415043 will mail you the core dump now...
Figured it out: it needs to be /usr/lib/cups/backend/ipp that you tell gdb to look at. My fault, I brought up lpstat first. I got confused about which process was looping. Here's what I get from your (*very* useful) core file: #0 0xffffe002 in ?? () #1 0x4001df58 in httpRead (http=0x8056de8, buffer=0xbffea140 "\003", length=134573544) at http.c:921 #2 0x40021b91 in ipp_read (http=0x8056de8, buffer=0xbffea140 "\003", length=8) at ipp.c:2057 #3 0x40020b83 in ippRead (http=0x80551a8, ipp=0x805ac30) at ipp.c:866 #4 0x400291f0 in cupsDoFileRequest (http=0x80551a8, request=0x8057ee0, resource=0xbfffd590 "/ipp/", filename=0xbfffd190 "/var/spool/cups/tmp/3f4cbfc6127c1") at util.c:456 #5 0x08049904 in main (argc=6, argv=0xbfffdff4) at ipp.c:589 #6 0x420156a4 in __libc_start_main () from /lib/tls/libc.so.6
OK, Tim - gdb rawhide attached to the ipp process: GNU gdb Red Hat Linux (5.3.90-0.20030710.21rh) Copyright 2003 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db library "/lib/tls/libthread_db.so.1". Attaching to program: /usr/lib/cups/backend/ipp, process 9717 Reading symbols from /usr/lib/libcups.so.2...Reading symbols from /usr/lib/debug//usr/lib/libcups.so.2.debug...done. done. Loaded symbols for /usr/lib/libcups.so.2 Reading symbols from /lib/libnsl.so.1...done. Loaded symbols for /lib/libnsl.so.1 Reading symbols from /lib/libcrypt.so.1...done. Loaded symbols for /lib/libcrypt.so.1 Reading symbols from /lib/tls/libc.so.6...done. Loaded symbols for /lib/tls/libc.so.6 Reading symbols from /lib/libssl.so.4...done. Loaded symbols for /lib/libssl.so.4 Reading symbols from /lib/libcrypto.so.4...done. Loaded symbols for /lib/libcrypto.so.4 Reading symbols from /lib/ld-linux.so.2...done. Loaded symbols for /lib/ld-linux.so.2 Reading symbols from /lib/libresolv.so.2...done. Loaded symbols for /lib/libresolv.so.2 Reading symbols from /usr/kerberos/lib/libgssapi_krb5.so.2...done. Loaded symbols for /usr/kerberos/lib/libgssapi_krb5.so.2 Reading symbols from /usr/kerberos/lib/libkrb5.so.3...done. Loaded symbols for /usr/kerberos/lib/libkrb5.so.3 Reading symbols from /usr/kerberos/lib/libk5crypto.so.3...done. Loaded symbols for /usr/kerberos/lib/libk5crypto.so.3 Reading symbols from /usr/kerberos/lib/libcom_err.so.3...done. Loaded symbols for /usr/kerberos/lib/libcom_err.so.3 Reading symbols from /lib/libdl.so.2...done. Loaded symbols for /lib/libdl.so.2 Reading symbols from /usr/lib/libz.so.1...done. Loaded symbols for /usr/lib/libz.so.1 Reading symbols from /lib/libnss_files.so.2...done. Loaded symbols for /lib/libnss_files.so.2 0xffffe002 in ?? () (gdb) bt #0 0xffffe002 in ?? () #1 0x420df321 in recv () from /lib/tls/libc.so.6 #2 0x4001e33a in httpGets (line=0xbffea080 "\211Y\001Blò\002@2j\tBoâ\001@", length=32, http=0x80551a8) at http.c:1307 #3 0x4001df58 in httpRead (http=0x80551a8, buffer=0xbffea140 "\003", length=8) at http.c:921 #4 0x40021b91 in ipp_read (http=0x80551a8, buffer=0xbffea140 "\003", length=8) at ipp.c:2057 #5 0x40020b83 in ippRead (http=0x80551a8, ipp=0x805ac30) at ipp.c:866 #6 0x400291f0 in cupsDoFileRequest (http=0x80551a8, request=0x8057ee0, resource=0xbfffd590 "/ipp/", filename=0xbfffd190 "/var/spool/cups/tmp/3f4cbfc6127c1") at util.c:456 #7 0x08049904 in main (argc=6, argv=0xbfffdff4) at ipp.c:589 Does that help? Cheers, Paul
Yes, very much. It would be great if you could, at that point, type 'finish' (so that you're back in the recv function), 'finish' again so you're in httpGets, and then keep typing 'next' until it stops. Let me know what output it gives up to that point.
OK, here's the full gdb session. The for and if else statements just go on looping forever after that. (gdb) bt #0 0xffffe002 in ?? () #1 0x420df321 in recv () from /lib/tls/libc.so.6 #2 0x4001e33a in httpGets (line=0xbffea080 "\211Y\001Blò\002@2j\tBoâ\001@", length=32, http=0x80551a8) at http.c:1307 #3 0x4001df58 in httpRead (http=0x80551a8, buffer=0xbffea140 "\003", length=8) at http.c:921 #4 0x40021b91 in ipp_read (http=0x80551a8, buffer=0xbffea140 "\003", length=8) at ipp.c:2057 #5 0x40020b83 in ippRead (http=0x80551a8, ipp=0x805ac30) at ipp.c:866 #6 0x400291f0 in cupsDoFileRequest (http=0x80551a8, request=0x8057ee0, resource=0xbfffd590 "/ipp/", filename=0xbfffd190 "/var/spool/cups/tmp/3f4cbfc6127c1") at util.c:456 #7 0x08049904 in main (argc=6, argv=0xbfffdff4) at ipp.c:589 (gdb) finish Run till exit from #0 0xffffe002 in ?? () 0x420df321 in recv () from /lib/tls/libc.so.6 (gdb) finish Run till exit from #0 0x420df321 in recv () from /lib/tls/libc.so.6 0x4001e33a in httpGets (line=0xbffea080 "\211Y\001Blò\002@2j\tBoâ\001@", length=32, http=0x80551a8) at http.c:1307 1307 bytes = recv(http->fd, bufend, HTTP_MAX_BUFFER - http->used, 0); (gdb) next 1309 if (bytes < 0) (gdb) next 1337 else if (bytes == 0) (gdb) next 1339 http->error = EPIPE; (gdb) next 1395 return (NULL); (gdb) next 1396 } (gdb) next httpRead (http=0x80551a8, buffer=0xbffea140 "\003", length=8) at http.c:1108 1108 } (gdb) next 0x40021b91 in ipp_read (http=0x80551a8, buffer=0xbffea140 "\003", length=8) at ipp.c:2057 2057 else if ((bytes = httpRead(http, (char *)buffer, length - tbytes)) < 0) (gdb) next 2012 for (tbytes = 0, bytes = 0; tbytes < length; tbytes += bytes, buffer += bytes) (gdb) next 2014 if (http->used > 0) (gdb) next 2057 else if ((bytes = httpRead(http, (char *)buffer, length - tbytes)) < 0) (gdb) next 2012 for (tbytes = 0, bytes = 0; tbytes < length; tbytes += bytes, buffer += bytes) (gdb) next 2014 if (http->used > 0) (gdb) next 2057 else if ((bytes = httpRead(http, (char *)buffer, length - tbytes)) < 0) (gdb) next 2012 for (tbytes = 0, bytes = 0; tbytes < length; tbytes += bytes, buffer += bytes) (gdb) next 2014 if (http->used > 0) Cheers, Paul
Can you now do 'p http->state'? Bet it's zero.
2057 else if ((bytes = httpRead(http, (char *)buffer, length - tbytes)) < 0) (gdb) next 2012 for (tbytes = 0, bytes = 0; tbytes < length; tbytes += bytes, buffer += bytes) (gdb) next 2014 if (http->used > 0) (gdb) p http->state $1 = HTTP_POST_SEND Is that right?
Oh, that's not what I expected it to be. :-( Could you please run 'next' until you get to line 2057 (the call to httpRead) again, and then type 'step'. After that, keep typing 'next' until you get back to the same call to httpRead (line 2057). It's really helpful you doing this by the way!
(gdb) next 2057 else if ((bytes = httpRead(http, (char *)buffer, length - tbytes)) < 0) (gdb) step httpRead (http=0x80551a8, buffer=0xbffea140 "\003", length=0) at http.c:908 908 if (http == NULL || buffer == NULL) (gdb) next 901 { (gdb) next 908 if (http == NULL || buffer == NULL) (gdb) next 911 http->activity = time(NULL); (gdb) next 913 if (length <= 0) (gdb) next 911 http->activity = time(NULL); (gdb) next 913 if (length <= 0) (gdb) next 916 if (http->data_encoding == HTTP_ENCODE_CHUNKED && (gdb) next 921 if (httpGets(len, sizeof(len), http) == NULL) (gdb) next 1108 } (gdb) next 0x40021b91 in ipp_read (http=0x80551a8, buffer=0xbffea140 "\003", length=8) at ipp.c:2057 2057 else if ((bytes = httpRead(http, (char *)buffer, length - tbytes)) < 0) (gdb) next 2012 for (tbytes = 0, bytes = 0; tbytes < length; tbytes += bytes, buffer += bytes) (gdb) next 2014 if (http->used > 0) (gdb) next 2057 else if ((bytes = httpRead(http, (char *)buffer, length - tbytes)) < 0) (gdb) next 2012 for (tbytes = 0, bytes = 0; tbytes < length; tbytes += bytes, buffer += bytes) Tim - no probs about doing it - I want my print server to start behaving. Thanks for looking at it! Paul
Created attachment 94115 [details] cups-1.1.17-loop.patch This is the patch that I hope will fix the problem. I have put a source package containing this patch at ftp://people.redhat.com/twaugh/tmp/, and I would be grateful if you would try it out. You can build it using 'rpmbuild --rebuild cups-1.1.17-13.3.0.1.src.rpm' -- let me know if you'd like prebuilt binary packages instead.
OK, Tim - no need for you to build an rpm binary - I just built it and installed it. (I think it is ok, but I've just done this from home so fingers crossed I can print when I get to work in the morning...!) I think if my print server is still ok in 2 weeks, you have nailed it. Let's say I will try to remember to post on Sept 15th if all is still well, or earlier if the bug shows again. Regards and thanks a lot, Paul
Tim, I'm happy that the problem has been resolved. We have had pleny of heavy printing this week, and the problem has not occured. I have another problem related to CUPS which I will post another bug report for later on - it's a different issue and has always been present (ie your pach has not caused it) Thanks a lot, Tim. Paul
Would you be willing to test out another RPM? There is a chance that the patch you tested might have side effects in some situations, and apparently 1.1.19 has a different way of dealing with the original problem according to the upstream maintainer. It would be good to know whether this particular issue is still present in 1.1.19.
ftp://people.redhat.com/twaugh/tmp/cups*
Hi Tim, yeah - i'll test it. do you have a binary? Perhaps it might even fix my other bug where jobs just hang the printers? Paul
Binaries for 1.1.19-13.0.0.0.1test are at: ftp://people.redhat.com/twaugh/tmp/cups* (Yes, it might fix other problems too.) Thanks.
Hi Tim, Everything is going very well with the new version of cups. The ipp loop has not occured and the printers have not been jamming randomly with jobs. So far it is the most reliable printing system I have had using cups. However, what does happen is that if a job is cancelled on the printer using the "cancel job" button, the queue jams until the first one in the queue cancelled (say from the web pages). Perhaps cups has some flag set which doesn't get cleared until a job is cancelled. (ie it thinks the printer is off line?) Please let me know if I can send you any more info about that one. Paul
Tim, I forgot to say that this type of jam (from pressing the button on the printer) always happened with previous releases come to think of it) Paul
Okay, let's give it until (say) tomorrow to check that the IPP busy loop is gone. I have another package, this time based on 1.1.17 again, which I hope will fix the original problem in the same way that 1.1.19 does. It's possible that this bug is related to bug #104360, but if not we'll have to look at that once we know this one is nailed.
Tim, IPP bug has not occured today. Seems fixed in 1.1.19 Cheers, Paul
Okay, that's good news. So now we need a 1.1.17-based version, i.e. to distill the bug fixes from the 1.1.19 features. At ftp://people.redhat.com/twaugh/tmp/ there are some 1.1.17-13.3.0.2 packages (cups and cups-libs). These contain the fix that seems like it will cure the IPP busy loop problem. Could you please try them out so that (a) we can verify that the IPP busy loop problem is really gone, and (b) to see if it has any bearing on bug #104360? Thanks.
<grumble>humph, a downgrade!</grumble> Yes, of course I'll do it. What rpm command do I run, just --force to downgrade? Paul
Yeah, a downgrade. I'm just a bit nervous about doing a wholesale upgrade. :-( You can use rpm -Uvh --oldpackages ... to install them.
OK, downgrade complete. Will let you know what happens. It's working at the moment. Paul
Tim, In new cups packages would it be possible for the rpm to save a mime.convs.rpmnew and a mime.types.rpmnew instead of making the .old files. I need the raw printing fpr my non-PS printers which is not enabled by default in these files. So, if upo2date runs in cron and happens to update cups it can break the coupld of non-PS printers we have. Everything still working fine btw. Paul
If you have a raw print queue defined using redhat-config-printer the right thing will happen (i.e. the octet-stream mime-type will get defined).
ahhhh I use cups web pages to do my print admin. Is there some way I can set it from there? If I now start the redhat tool it doesn't find some of the cups defined printers. The redhat tool doesn't query the cups definitions? I guess because it can work with other printing systems too. Paul
No. :-( Basically redhat-config-printer uses its own, separate, configuration file. You ought to be able to import CUPS-defined queues using /usr/share/printconf/util/cups_import.py -- but then they will be managed by redhat-config-printer and not the CUPS web GUI.
OK, I'd rather keep the web management. Is there any way of making sure this thing always gets defined from there? Paul
Not short of removing redhat-config-printer altogether. Basically redhat-config-printer needs to be redone or replaced IMHO.
with cups as the default printing system now, and the cups web pages, doesn;t this make config-printer redundant anyway? Perhaps redhat-config-printer can become: #!/bin/bash konqueror http://localhost:631 There. I just wrote a new tool for you! OK, will let you know say friday about the status of the ipp loop or earlier if it appears. Paul
At the risk of heresy, the CUPS GUI is absolutely awful.
Well, at least you didn't object to konqueror at the same time!
Hi Tim, The new package has behaved itself all week. And, it hasn't jammed print queues randomly so far. Perhaps that is also now fixed.... so perhaps its now only pressing te cancel button on the printer which causes a problem. Can we start investigating that one now, or do you want to make a release of this fix first? Cheers, Paul
Any jams since your last comment?
no jams - it's all running well. Paul
Excellent. And no reoccurrance of bug #104360 then?
Tim, No, there has been no reoccurance of 104360 Everything has been working great for the last week and a half. Are you going to release this? Paul
An errata has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2003-275.html
I´m confused. If this bug was fixed upstream in CUPS 1.1.19 and the fix for RHL 8 and 9 is a backport of the upstream fix, then what´s the old patch still doing in cups-1.1.19-13 (Fedora and Rawhide)?
Nothing; in fact it's been removed in rawhide.