Bug 164641

Summary:	[PATCH] cupsd segfault when SIGCHLD received
Product:	Red Hat Enterprise Linux 4	Reporter:	Tim Waugh <twaugh>
Component:	cups	Assignee:	Tim Waugh <twaugh>
Status:	CLOSED ERRATA	QA Contact:
Severity:	medium	Docs Contact:
Priority:	medium
Version:	4.0	CC:	tao
Target Milestone:	---
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:	RHSA-2005-772	Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2005-09-27 11:52:08 UTC	Type:	---
Bug Depends On:	146850
Bug Blocks:

Description Tim Waugh 2005-07-29 16:11:51 UTC

** This bug also affects RHEL4, so I am cloning it **

+++ This bug was initially created as a clone of Bug #146850 +++

Description of problem:
If cupsd gets a SIGHUP and is still reloading its configuration when
it receives a SIGCHLD, it can segfault in sigchld_handler.

Version-Release number of selected component (if applicable):
cups-1.1.17-13.3.6

How reproducible:
Occasionally

Steps to Reproduce:
1. Send SIGHUP to cupsd
Actual results:
cupsd exits with sig 11

Expected results:
cupsd reloads configuration and continues running

Additional info:
Here is the customer's report:
Core was generated by `cupsd'.
 Program terminated with signal 11, Segmentation fault.
 ...
 #0  sigchld_handler (sig=17) at main.c:775
 775           if (job->state != NULL &&
 (gdb) bt
 #0  sigchld_handler (sig=17) at main.c:775
 #1  <signal handler called>
 #2  0xb7376edb in _int_free () from /lib/tls/libc.so.6
 #3  0xb7375e68 in free () from /lib/tls/libc.so.6
 #4  0xb74899f5 in _ipp_free_attr () from /usr/lib/libcups.so.2
 #5  0xb748822a in ippDelete () from /usr/lib/libcups.so.2
 #6  0x08068a50 in FreeAllJobs () at job.c:375
 #7  0x08054951 in ReadConfiguration () at conf.c:177
 #8  0x0805c5a4 in main (argc=1, argv=0xbfffb134) at main.c:411
 #9  0xb731a768 in __libc_start_main () from /lib/tls/libc.so.6
 #10 0x0804c401 in _start ()

From the backtrace above,
I suspect that an invalid pointer dereference in the "sigchld_handler" function,
which runs when cupsd receives the "SIGCHLD" signal,
causes the abnormal termination with "Segmentation fault".

Part of the sigchld_handler function is:
 770        /*
 771         * Lookup the PID in the jobs list...
 772         */
 773
 774         for (job = Jobs; job != NULL; job = job->next)
 775           if (job->state != NULL &&
 776               job->state->values[0].integer == IPP_JOB_PROCESSING)
 777           {

It appears that cupsd terminated abnormally while walking the "Jobs" list.

(gdb) print job
$1 = (job_t *) 0x64

0x64 is not a plausible job_t address, so I am sure that cupsd terminated
abnormally with "Segmentation fault" because it dereferenced an invalid pointer.

Moreover, following the backtrace upward, we find the "FreeAllJobs" function at frame #6.
The relevant lines of "FreeAllJobs" at frame #6 are:
 (gdb) up 6
 #6  0x08068a50 in FreeAllJobs () at job.c:375
 375         ippDelete(job->attrs);
 (gdb) list
 370
 371       for (job = Jobs; job; job = next)
 372       {
 373         next = job->next;
 374
 375         ippDelete(job->attrs);
 376         free(job->filetypes);
 377         free(job);
 378       }
 379

Evidently, the signal arrived while this same "Jobs" list was being freed.

The values of "next" and "job" at this point are:
 (gdb) print next
 $2 = (job_t *) 0x80c3918
 (gdb) print job
 $3 = (job_t *) 0x80c28c0

It seems that these pointers are valid.

(gdb) down 6
#0  sigchld_handler (sig=17) at main.c:775
775           if (job->state != NULL &&

Returning to frame #0 and following the "Jobs" list pointers one at a time:
 (gdb) print Jobs
 $4 = (job_t *) 0x80ba600
 (gdb) print Jobs->next
 $5 = (struct job_str *) 0x80bb598
 (gdb) print Jobs->next->next
 $6 = (struct job_str *) 0x64
 
These values are the same when examined from frame #6.

Therefore, the cause suspected at the outset appears to be correct.

To summarize the cause of this problem:
 1) "cupsd" received a SIGHUP signal.
 2) "cupsd" then did the following:
 [1] Sent a SIGTERM (or SIGKILL) signal to all of its child processes
 [2] Began freeing every item linked into the "Jobs" list

 3) The parent process, "cupsd", received a SIGCHLD signal.
 4) "sigchld_handler", the signal handler for SIGCHLD, started running,
 temporarily interrupting step 2)-[2].

 5) The sigchld_handler function contains code that walks the "Jobs" list.
 Whether the problem occurs depends on the timing of the SIGCHLD interruption.
 If the signal arrives partway through step 2)-[2], the "Jobs" list still contains invalid pointers.
 "cupsd" then dereferences an invalid pointer
 while following the "Jobs" list one element at a time in "sigchld_handler",
 and terminates abnormally.
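
The sequence summarized above can be illustrated with a small stand-alone sketch.
It is not cupsd code: the node type, the list head and the function names are
hypothetical stand-ins for job_t, "Jobs" and FreeAllJobs, and raise() is used to
simulate a child exiting at the worst possible moment. The handler here only
records that it ran in the middle of the teardown loop. A handler that walked
the list at that point, as sigchld_handler does, would be following pointers
into memory that has already been freed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for job_t; only the link field matters here. */
typedef struct node_s { struct node_s *next; } node_t;

static node_t *list_head;                          /* stands in for "Jobs" */
static volatile sig_atomic_t in_teardown;          /* set while nodes are freed */
static volatile sig_atomic_t handler_ran_mid_teardown;

static void on_sigchld_unprotected(int sig)
{
  (void)sig;
  /* Deliberately does not touch the list; it only records the timing. */
  if (in_teardown)
    handler_ran_mid_teardown = 1;
}

/* Returns 1 if the handler interrupted the teardown loop, -1 on setup error. */
int demo_unprotected_teardown(void)
{
  struct sigaction sa;
  sigset_t         unblock;
  node_t           *node, *next;
  int              i;

  memset(&sa, 0, sizeof(sa));
  sa.sa_handler = on_sigchld_unprotected;
  sigemptyset(&sa.sa_mask);
  if (sigaction(SIGCHLD, &sa, NULL) != 0)
    return -1;

  /* Make sure SIGCHLD is deliverable, whatever mask was inherited. */
  sigemptyset(&unblock);
  sigaddset(&unblock, SIGCHLD);
  sigprocmask(SIG_UNBLOCK, &unblock, NULL);

  for (i = 0; i < 3; i++) {
    node = malloc(sizeof(*node));
    assert(node != NULL);
    node->next = list_head;
    list_head  = node;
  }

  in_teardown = 1;
  for (node = list_head; node != NULL; node = next) {
    next = node->next;
    free(node);
    /* list_head still points at freed memory here, as "Jobs" did. */
    if (next != NULL && !handler_ran_mid_teardown)
      raise(SIGCHLD);            /* simulated child exit mid-loop */
  }
  list_head   = NULL;
  in_teardown = 0;

  return handler_ran_mid_teardown;
}
```

Calling demo_unprotected_teardown() should report that the handler ran while the
list head still referred to freed nodes, which is the window described in
steps 2)-[2] through 5).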

Therefore, this is a race condition:
it occurs because sigchld_handler was called while the
"Jobs" list was being modified.

Proposed fix and reproduction test with a patch:
 I repeated the reproduction test with a patch that makes cupsd
block the SIGCHLD signal while it operates on the "Jobs" list,
and confirmed that the problem no longer occurs.
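
For illustration only, blocking SIGCHLD around the list teardown could look
roughly like the sketch below. This is not the patch that was tested, and none
of these names come from the cups sources: demo_job_t, demo_jobs and
free_all_jobs_blocked are hypothetical stand-ins, and raise() simulates children
exiting during the loop. The sketch assumes a single-threaded process, which is
where sigprocmask() has defined behaviour.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for job_t; only the link field matters here. */
typedef struct demo_job_s { struct demo_job_s *next; } demo_job_t;

static demo_job_t *demo_jobs;                  /* stands in for "Jobs" */
static volatile sig_atomic_t sigchld_calls;    /* handler invocations */

static void count_sigchld(int sig)
{
  (void)sig;
  sigchld_calls++;
}

/* Frees the list with SIGCHLD blocked; returns handler calls seen meanwhile. */
static int free_all_jobs_blocked(void)
{
  sigset_t   block, saved;
  demo_job_t *job, *next;
  int        calls_during;

  sigemptyset(&block);
  sigaddset(&block, SIGCHLD);
  sigprocmask(SIG_BLOCK, &block, &saved);     /* start of critical section */

  for (job = demo_jobs; job != NULL; job = next) {
    next = job->next;
    free(job);
    raise(SIGCHLD);          /* simulated child exit; stays pending */
  }
  demo_jobs    = NULL;       /* list is consistent again */
  calls_during = sigchld_calls;

  sigprocmask(SIG_SETMASK, &saved, NULL);     /* pending SIGCHLD runs here */
  return calls_during;
}

/* Returns 1 if the handler was deferred until after the teardown finished. */
int demo_blocked_teardown(void)
{
  struct sigaction sa;
  sigset_t         unblock;
  demo_job_t       *job;
  int              i, during;

  memset(&sa, 0, sizeof(sa));
  sa.sa_handler = count_sigchld;
  sigemptyset(&sa.sa_mask);
  if (sigaction(SIGCHLD, &sa, NULL) != 0)
    return -1;

  /* Start from a known state: SIGCHLD deliverable, no calls counted. */
  sigemptyset(&unblock);
  sigaddset(&unblock, SIGCHLD);
  sigprocmask(SIG_UNBLOCK, &unblock, NULL);
  sigchld_calls = 0;

  for (i = 0; i < 3; i++) {
    job = malloc(sizeof(*job));
    assert(job != NULL);
    job->next = demo_jobs;
    demo_jobs = job;
  }

  during = free_all_jobs_blocked();
  return during == 0 && sigchld_calls == 1;
}
```

Because SIGCHLD is a standard signal, several arrivals while it is blocked
collapse into a single pending instance, so a handler written this way should
reap every exited child it can find rather than assume one child per call.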

Comment 3 Red Hat Bugzilla 2005-09-27 11:52:08 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2005-772.html