660604 – Cupsd segfault without warnings

Bug 660604 - Cupsd segfault without warnings

Summary: Cupsd segfault without warnings

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	cups
Sub Component:
Version:	14
Hardware:	Unspecified
OS:	Unspecified
Priority:	low
Severity:	medium
Target Milestone:	---
Assignee:	Tim Waugh
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	667920 (view as bug list)
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2010-12-07 10:01 UTC by Pierre-YvesChibon
Modified:	2011-04-17 21:25 UTC (History)
CC List:	3 users (show)
Fixed In Version:	cups-1.4.6-1.fc13
Clone Of:
Clones:	855431 (view as bug list)
Environment:
Last Closed:	2011-01-18 21:33:18 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
gdb output asked (7.95 KB, text/plain) 2010-12-07 15:06 UTC, Pierre-YvesChibon	no flags	Details
gdb output asked (version 2) (4.47 KB, text/plain) 2010-12-10 09:07 UTC, Pierre-YvesChibon	no flags	Details
gdb output of the select-frame 6 & info locals (3.52 KB, text/plain) 2010-12-14 17:03 UTC, Pierre-YvesChibon	no flags	Details
error_log from cups running debug2 (220.26 KB, application/x-gzip) 2010-12-15 16:38 UTC, Pierre-YvesChibon	no flags	Details
cups.conf (3.99 KB, text/plain) 2010-12-15 18:04 UTC, Pierre-YvesChibon	no flags	Details
View All

Links
System	ID	Private	Priority	Status	Summary	Last Updated
CUPS Bugs and Features	3754	0	None	None	None	Never

Description Pierre-YvesChibon 2010-12-07 10:01:24 UTC

Description of problem:
My computer is connected to the printed of my work via samba.
Once in a while cups just dies without warnings or even notice. I see it by seing all printers being absent.

dmesg returns at the bottom:
[1185471.785553] SELinux: 2048 avtab hash slots, 210653 rules.
[1185471.841767] SELinux: 2048 avtab hash slots, 210653 rules.
[1185471.982002] SELinux:  9 users, 14 roles, 3473 types, 181 bools, 1 sens, 1024 cats
[1185471.982006] SELinux:  77 classes, 210653 rules
[1211561.415362] cupsd[1386]: segfault at 8 ip 00007fc3ba58dd81 sp 00007fffa1791b18 error 4 in libc-2.12.90.so[7fc3ba50b000+19a000]
[1273199.184587] cupsd[10509]: segfault at 8 ip 00007f5d8b46ad81 sp 00007fff58142a58 error 4 in libc-2.12.90.so[7f5d8b3e8000+19a000]

Version-Release number of selected component (if applicable):
# rpm -q cups
cups-1.4.4-11.fc14.x86_64


How reproducible:
often

Steps to Reproduce:
1. I don't know how to reproduce it since I do not know when it happens
2.
3.
  
Actual results:
# service cups status
cupsd dead but pid file exists


Expected results:
Having cups running

Additional info:

Comment 1 Tim Waugh 2010-12-07 10:19:36 UTC

Please run this command as root:

debuginfo-install cups

and then attach gdb to cupsd and leave it running until it crashes.  Do this by finding the PID of cupsd (ps ax | grep [c]upsd) and running gdb like this:

gdb /usr/sbin/cupsd 1711
(gdb) continue

When it crashes again, please run these gdb commands:

bt
info locals
up
info locals
up
info locals
up
info locals

and paste the results here.  Thanks!

Comment 2 Pierre-YvesChibon 2010-12-07 15:06:28 UTC

Created attachment 465248 [details]
gdb output asked

Comment 3 Pierre-YvesChibon 2010-12-08 12:20:10 UTC

Btw, I attached the asked gdb output

Comment 4 Jiri Popelka 2010-12-08 13:08:08 UTC

Thank you, but the output unfortunately doesn't contain the info we need.

I think (maybe I'm wrong) that you pressed Enter when you saw that cupsd had crashed which evoked the previous command, i.e 'continue' which caused that cupsd was already terminated in the time when you tried to obtain info with 'bt' and 'info locals'. Can you try it again, please ? Thanks.

Comment 5 Pierre-YvesChibon 2010-12-10 09:06:25 UTC

Attached in the output from gdb, I ran the given command after seeing that gdb announced the segfault but I still have:

# service cups status
cupsd (pid  32204) is running...

(on the contrary to previous time)

Comment 6 Pierre-YvesChibon 2010-12-10 09:07:15 UTC

Created attachment 467928 [details]
gdb output asked (version 2)

Comment 7 Pierre-YvesChibon 2010-12-14 10:32:44 UTC

Does it contains useful information this time ?

Comment 8 Tim Waugh 2010-12-14 13:25:43 UTC

It tells us how it crashed, yes, but I haven't had time to fully investigate it yet I'm afraid.  It was trying to send a D-Bus message at the time, but it isn't clear from the stack trace what's wrong.

Comment 9 Pierre-YvesChibon 2010-12-14 13:58:28 UTC

Well, thanks for looking at this. At the moment I have no problem reproducing it ([1]) so let me know if I can help.


[1] 
# service cups status
cupsd dead but pid file exists

Comment 10 Tim Waugh 2010-12-14 14:51:16 UTC

OK, if you can repeat it that makes it a lot easier to debug.

Can you get to the same point in gdb again, and then run this?:

select-frame 6
info locals
p *dest

Thanks.

Comment 11 Pierre-YvesChibon 2010-12-14 15:01:01 UTC

Should I run  

bt
info locals
up
info locals
up
info locals
up
info locals

before 
 
select-frame 6
info locals
p *dest

?

Comment 12 Tim Waugh 2010-12-14 15:57:38 UTC

Just:

bt
select-frame 6
info locals
p *dest

In the "bt" output, I'm hoping that it says "#6 ... cupsd_send_dbus ..." -- if not, take a look to see which frame number cupsd_send_dbus is in and use 'select-frame' for that number instead.

Thanks.

Comment 13 Pierre-YvesChibon 2010-12-14 17:02:33 UTC

And there is the output (told you I can reproduce it :-)).

Comment 14 Pierre-YvesChibon 2010-12-14 17:03:10 UTC

Created attachment 468646 [details]
gdb output of the select-frame 6 & info locals

Comment 15 Tim Waugh 2010-12-15 15:50:40 UTC

Thanks.

How are you reproducing the problem?

At the moment I'm thinking that it's some sort of issue with a job not getting stopped when the printer it is queued for reaches its browse timeout.  We expect the printer to disappear from the CUPS list of printers, and for any jobs currently processing to be stopped, but it looks like that might not be happening for some reason.

It would be a great help if you can still reproduce the problem after doing this:

cupsctl LogLevel=debug2
service cups restartlog

If so, please attach /var/log/cups/error_log (you might have to compress it with e.g. xz first as it might be quite large).

Comment 16 Pierre-YvesChibon 2010-12-15 16:05:27 UTC

To reproduce the problem I just let cups running. I don't even try to print or anything.

The surprising bits is that:
# service cups status
cupsd (pid  20806) is running...

While I have not touched it since I posted the log yesterday (not even restarted it).

I will run the command and keep you informed of the output.

Comment 17 Pierre-YvesChibon 2010-12-15 16:38:44 UTC

Created attachment 468886 [details]
error_log from cups running debug2

Comment 18 Tim Waugh 2010-12-15 17:16:35 UTC

Ha!  It times out the printer, stops the job, and then starts the job again.  No wonder...

Still analysing the cause.

Comment 19 Tim Waugh 2010-12-15 17:55:26 UTC

Could you please attach your /etc/cups/cupsd.conf file?  Thanks.

Comment 20 Pierre-YvesChibon 2010-12-15 18:04:56 UTC

Created attachment 468923 [details]
cups.conf

Comment 21 Tim Waugh 2010-12-16 13:01:03 UTC

OK, I've managed to reproduce the problem locally.  Here's what I did:

1. Stop CUPS.
2. Edit /var/cache/cups/remote.cache and add these lines to the end:

<Printer Nemo>
BrowseTime 1292500224
Info Not there
Location Nowhere
MakeModel Mysteriously absent
DeviceURI ipp://192.168.20.10:631/printers/Nemo
State Idle
Type 16814094
Accepting Yes
JobSheets none none
Option job-sheets none,none
Option lease-duration 300
</Printer>

3. Start CUPS
4. Turn on debugging: cupsctl LogLevel=debug2
5. Submit a job to Nemo e.g. "echo . | lp -d Nemo"
6. Now restart CUPS again and clear the log: service cups restartlog
7. Wait for 5 minutes

Without performing step 6 I couldn't reproduce the problem, which leads me to wonder if it's a timing issue of some sort.  The backend is retrying every 30 seconds, and the browse timeout is 300 seconds, so the backend is logging messages at exactly the moment we time out its printer.

Here's the log at that point:

d [16/Dec/2010:12:54:32 +0000] select_timeout(0): 8 seconds to browse timeout a printer
d [16/Dec/2010:12:54:40 +0000] cupsdAddEvent(event=printer-deleted, dest=0x7f6d86a6f4e0(Nemo), job=(nil)(0), text="%s '%s' deleted by directory services (timeout).", ...)
d [16/Dec/2010:12:54:40 +0000] cupsd_send_notification(sub=0x7f6d86aa7590(281), event=0x7f6d86ac6aa0(printer-deleted))
d [16/Dec/2010:12:54:40 +0000] cupsd_send_notification(sub=0x7f6d86aa76e0(282), event=0x7f6d86ac46f0(printer-deleted))
D [16/Dec/2010:12:54:40 +0000] cupsdMarkDirty(-----S)
D [16/Dec/2010:12:54:40 +0000] cupsdSetBusyState: Printing jobs and dirty files
D [16/Dec/2010:12:54:40 +0000] Remote destination "Nemo" has timed out; deleting it...
d [16/Dec/2010:12:54:40 +0000] cupsdDeletePrinter(p=0x7f6d86a6f4e0(Nemo), update=1)
d [16/Dec/2010:12:54:40 +0000] cupsdSetJobState(job=0x7f6d86a749b0(47), state=5, newstate=3, action=1, message="Job stopped due to printer being deleted.")
d [16/Dec/2010:12:54:40 +0000] stop_job(job=0x7f6d86a749b0(47), action=1)
d [16/Dec/2010:12:54:40 +0000] cupsdEndProcess(pid=1058, force=1)
d [16/Dec/2010:12:54:40 +0000] cupsdAddEvent(event=job-state-changed, dest=0x7f6d86a6f4e0(Nemo), job=0x7f6d86a749b0(47), text="%s", ...)
d [16/Dec/2010:12:54:40 +0000] cupsd_send_notification(sub=0x7f6d86aa7590(281), event=0x7f6d86ac4cb0(job-state-changed))
D [16/Dec/2010:12:54:40 +0000] cupsdMarkDirty(-----S)
I [16/Dec/2010:12:54:40 +0000] [Job 47] Job stopped due to printer being deleted.
D [16/Dec/2010:12:54:40 +0000] cupsdMarkDirty(----J-)
d [16/Dec/2010:12:54:40 +0000] finalize_job(job=0x7f6d86a749b0(47))
d [16/Dec/2010:12:54:40 +0000] cupsdSetPrinterReasons(p=0x7f6d86a6f4e0(Nemo),s="-connecting-to-device"
d [16/Dec/2010:12:54:40 +0000] cupsdSetPrinterReasons(p=0x7f6d86a6f4e0(Nemo),s="-com.apple.print.recoverable-warning"
D [16/Dec/2010:12:54:40 +0000] cupsdMarkDirty(P-----)
d [16/Dec/2010:12:54:40 +0000] cupsdSetPrinterReasons(p=0x7f6d86a6f4e0(Nemo),s="-offline-report"
d [16/Dec/2010:12:54:40 +0000] cupsdDeleteProfile(profile="(null)")
d [16/Dec/2010:12:54:40 +0000] cupsdRemoveSelect(fd=13)
d [16/Dec/2010:12:54:40 +0000] cupsdCheckJobs: 1 active jobs, sleeping=0, reload=0
d [16/Dec/2010:12:54:40 +0000] start_job(job=0x7f6d86a749b0(47), printer=0x7f6d86a6f4e0(Nemo))

...and it starts the job again.

Comment 22 Tim Waugh 2010-12-16 15:27:26 UTC

Reported upstream.  The reason is that cupsdDeletePrinter() assumes that this is sufficient to prevent jobs being started in the queue:

787      cupsdSetPrinterState(p, IPP_PRINTER_STOPPED, update);

In fact, it isn't:

2952    /*
2953     * Can't set status of remote printers...
2954     */
2955   
2956     if (p->type & CUPS_PRINTER_DISCOVERED)
2957       return;

When cupsdCheckJobs() is called later (from finalize_job()), it re-starts the job.

Comment 23 Pierre-YvesChibon 2010-12-17 13:40:46 UTC

I am glad you could reproduce it.

Let me know if there is anything else I can do.

Comment 24 Tim Waugh 2010-12-22 11:46:40 UTC

This build contains the fix:
  http://koji.fedoraproject.org/koji/buildinfo?buildID=211214

Comment 25 Tim Waugh 2011-01-07 10:59:01 UTC

*** Bug 667920 has been marked as a duplicate of this bug. ***

Comment 26 Fedora Update System 2011-01-07 12:31:06 UTC

cups-1.4.6-1.fc14 has been submitted as an update for Fedora 14.
https://admin.fedoraproject.org/updates/cups-1.4.6-1.fc14

Comment 27 Fedora Update System 2011-01-07 12:32:03 UTC

cups-1.4.6-1.fc13 has been submitted as an update for Fedora 13.
https://admin.fedoraproject.org/updates/cups-1.4.6-1.fc13

Comment 28 Fedora Update System 2011-01-07 19:57:51 UTC

cups-1.4.6-1.fc13 has been pushed to the Fedora 13 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update cups'.  You can provide feedback for this update here: https://admin.fedoraproject.org/updates/cups-1.4.6-1.fc13

Comment 29 Fedora Update System 2011-01-18 21:32:57 UTC

cups-1.4.6-1.fc14 has been pushed to the Fedora 14 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 30 Fedora Update System 2011-04-17 21:25:24 UTC

cups-1.4.6-1.fc13 has been pushed to the Fedora 13 stable repository.  If problems still persist, please make note of it in this bug report.

Note You need to log in before you can comment on or make changes to this bug.