Bug 660604
Summary: | Cupsd segfault without warnings | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Pierre-YvesChibon <pingou> | ||||||||||||
Component: | cups | Assignee: | Tim Waugh <twaugh> | ||||||||||||
Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||||||||||
Severity: | medium | Docs Contact: | |||||||||||||
Priority: | low | ||||||||||||||
Version: | 14 | CC: | jmoskovc, jpopelka, twaugh | ||||||||||||
Target Milestone: | --- | ||||||||||||||
Target Release: | --- | ||||||||||||||
Hardware: | Unspecified | ||||||||||||||
OS: | Unspecified | ||||||||||||||
Whiteboard: | |||||||||||||||
Fixed In Version: | cups-1.4.6-1.fc13 | Doc Type: | Bug Fix | ||||||||||||
Doc Text: | Story Points: | --- | |||||||||||||
Clone Of: | |||||||||||||||
: | 855431 (view as bug list) | Environment: | |||||||||||||
Last Closed: | 2011-01-18 21:33:18 UTC | Type: | --- | ||||||||||||
Regression: | --- | Mount Type: | --- | ||||||||||||
Documentation: | --- | CRM: | |||||||||||||
Verified Versions: | Category: | --- | |||||||||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||||||
Embargoed: | |||||||||||||||
Attachments: |
|
Description
Pierre-YvesChibon
2010-12-07 10:01:24 UTC
Please run this command as root: debuginfo-install cups and then attach gdb to cupsd and leave it running until it crashes. Do this by finding the PID of cupsd (ps ax | grep [c]upsd) and running gdb like this: gdb /usr/sbin/cupsd 1711 (gdb) continue When it crashes again, please run these gdb commands: bt info locals up info locals up info locals up info locals and paste the results here. Thanks! Created attachment 465248 [details]
gdb output asked
Btw, I attached the asked gdb output Thank you, but the output unfortunately doesn't contain the info we need. I think (maybe I'm wrong) that you pressed Enter when you saw that cupsd had crashed which evoked the previous command, i.e 'continue' which caused that cupsd was already terminated in the time when you tried to obtain info with 'bt' and 'info locals'. Can you try it again, please ? Thanks. Attached in the output from gdb, I ran the given command after seeing that gdb announced the segfault but I still have: # service cups status cupsd (pid 32204) is running... (on the contrary to previous time) Created attachment 467928 [details]
gdb output asked (version 2)
Does it contains useful information this time ? It tells us how it crashed, yes, but I haven't had time to fully investigate it yet I'm afraid. It was trying to send a D-Bus message at the time, but it isn't clear from the stack trace what's wrong. Well, thanks for looking at this. At the moment I have no problem reproducing it ([1]) so let me know if I can help. [1] # service cups status cupsd dead but pid file exists OK, if you can repeat it that makes it a lot easier to debug. Can you get to the same point in gdb again, and then run this?: select-frame 6 info locals p *dest Thanks. Should I run bt info locals up info locals up info locals up info locals before select-frame 6 info locals p *dest ? Just: bt select-frame 6 info locals p *dest In the "bt" output, I'm hoping that it says "#6 ... cupsd_send_dbus ..." -- if not, take a look to see which frame number cupsd_send_dbus is in and use 'select-frame' for that number instead. Thanks. And there is the output (told you I can reproduce it :-)). Created attachment 468646 [details]
gdb output of the select-frame 6 & info locals
Thanks. How are you reproducing the problem? At the moment I'm thinking that it's some sort of issue with a job not getting stopped when the printer it is queued for reaches its browse timeout. We expect the printer to disappear from the CUPS list of printers, and for any jobs currently processing to be stopped, but it looks like that might not be happening for some reason. It would be a great help if you can still reproduce the problem after doing this: cupsctl LogLevel=debug2 service cups restartlog If so, please attach /var/log/cups/error_log (you might have to compress it with e.g. xz first as it might be quite large). To reproduce the problem I just let cups running. I don't even try to print or anything. The surprising bits is that: # service cups status cupsd (pid 20806) is running... While I have not touched it since I posted the log yesterday (not even restarted it). I will run the command and keep you informed of the output. Created attachment 468886 [details]
error_log from cups running debug2
Ha! It times out the printer, stops the job, and then starts the job again. No wonder... Still analysing the cause. Could you please attach your /etc/cups/cupsd.conf file? Thanks. Created attachment 468923 [details]
cups.conf
OK, I've managed to reproduce the problem locally. Here's what I did: 1. Stop CUPS. 2. Edit /var/cache/cups/remote.cache and add these lines to the end: <Printer Nemo> BrowseTime 1292500224 Info Not there Location Nowhere MakeModel Mysteriously absent DeviceURI ipp://192.168.20.10:631/printers/Nemo State Idle Type 16814094 Accepting Yes JobSheets none none Option job-sheets none,none Option lease-duration 300 </Printer> 3. Start CUPS 4. Turn on debugging: cupsctl LogLevel=debug2 5. Submit a job to Nemo e.g. "echo . | lp -d Nemo" 6. Now restart CUPS again and clear the log: service cups restartlog 7. Wait for 5 minutes Without performing step 6 I couldn't reproduce the problem, which leads me to wonder if it's a timing issue of some sort. The backend is retrying every 30 seconds, and the browse timeout is 300 seconds, so the backend is logging messages at exactly the moment we time out its printer. Here's the log at that point: d [16/Dec/2010:12:54:32 +0000] select_timeout(0): 8 seconds to browse timeout a printer d [16/Dec/2010:12:54:40 +0000] cupsdAddEvent(event=printer-deleted, dest=0x7f6d86a6f4e0(Nemo), job=(nil)(0), text="%s '%s' deleted by directory services (timeout).", ...) d [16/Dec/2010:12:54:40 +0000] cupsd_send_notification(sub=0x7f6d86aa7590(281), event=0x7f6d86ac6aa0(printer-deleted)) d [16/Dec/2010:12:54:40 +0000] cupsd_send_notification(sub=0x7f6d86aa76e0(282), event=0x7f6d86ac46f0(printer-deleted)) D [16/Dec/2010:12:54:40 +0000] cupsdMarkDirty(-----S) D [16/Dec/2010:12:54:40 +0000] cupsdSetBusyState: Printing jobs and dirty files D [16/Dec/2010:12:54:40 +0000] Remote destination "Nemo" has timed out; deleting it... d [16/Dec/2010:12:54:40 +0000] cupsdDeletePrinter(p=0x7f6d86a6f4e0(Nemo), update=1) d [16/Dec/2010:12:54:40 +0000] cupsdSetJobState(job=0x7f6d86a749b0(47), state=5, newstate=3, action=1, message="Job stopped due to printer being deleted.") d [16/Dec/2010:12:54:40 +0000] stop_job(job=0x7f6d86a749b0(47), action=1) d [16/Dec/2010:12:54:40 +0000] cupsdEndProcess(pid=1058, force=1) d [16/Dec/2010:12:54:40 +0000] cupsdAddEvent(event=job-state-changed, dest=0x7f6d86a6f4e0(Nemo), job=0x7f6d86a749b0(47), text="%s", ...) d [16/Dec/2010:12:54:40 +0000] cupsd_send_notification(sub=0x7f6d86aa7590(281), event=0x7f6d86ac4cb0(job-state-changed)) D [16/Dec/2010:12:54:40 +0000] cupsdMarkDirty(-----S) I [16/Dec/2010:12:54:40 +0000] [Job 47] Job stopped due to printer being deleted. D [16/Dec/2010:12:54:40 +0000] cupsdMarkDirty(----J-) d [16/Dec/2010:12:54:40 +0000] finalize_job(job=0x7f6d86a749b0(47)) d [16/Dec/2010:12:54:40 +0000] cupsdSetPrinterReasons(p=0x7f6d86a6f4e0(Nemo),s="-connecting-to-device" d [16/Dec/2010:12:54:40 +0000] cupsdSetPrinterReasons(p=0x7f6d86a6f4e0(Nemo),s="-com.apple.print.recoverable-warning" D [16/Dec/2010:12:54:40 +0000] cupsdMarkDirty(P-----) d [16/Dec/2010:12:54:40 +0000] cupsdSetPrinterReasons(p=0x7f6d86a6f4e0(Nemo),s="-offline-report" d [16/Dec/2010:12:54:40 +0000] cupsdDeleteProfile(profile="(null)") d [16/Dec/2010:12:54:40 +0000] cupsdRemoveSelect(fd=13) d [16/Dec/2010:12:54:40 +0000] cupsdCheckJobs: 1 active jobs, sleeping=0, reload=0 d [16/Dec/2010:12:54:40 +0000] start_job(job=0x7f6d86a749b0(47), printer=0x7f6d86a6f4e0(Nemo)) ...and it starts the job again. Reported upstream. The reason is that cupsdDeletePrinter() assumes that this is sufficient to prevent jobs being started in the queue: 787 cupsdSetPrinterState(p, IPP_PRINTER_STOPPED, update); In fact, it isn't: 2952 /* 2953 * Can't set status of remote printers... 2954 */ 2955 2956 if (p->type & CUPS_PRINTER_DISCOVERED) 2957 return; When cupsdCheckJobs() is called later (from finalize_job()), it re-starts the job. I am glad you could reproduce it. Let me know if there is anything else I can do. This build contains the fix: http://koji.fedoraproject.org/koji/buildinfo?buildID=211214 *** Bug 667920 has been marked as a duplicate of this bug. *** cups-1.4.6-1.fc14 has been submitted as an update for Fedora 14. https://admin.fedoraproject.org/updates/cups-1.4.6-1.fc14 cups-1.4.6-1.fc13 has been submitted as an update for Fedora 13. https://admin.fedoraproject.org/updates/cups-1.4.6-1.fc13 cups-1.4.6-1.fc13 has been pushed to the Fedora 13 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update cups'. You can provide feedback for this update here: https://admin.fedoraproject.org/updates/cups-1.4.6-1.fc13 cups-1.4.6-1.fc14 has been pushed to the Fedora 14 stable repository. If problems still persist, please make note of it in this bug report. cups-1.4.6-1.fc13 has been pushed to the Fedora 13 stable repository. If problems still persist, please make note of it in this bug report. |