Bug 137387

Summary: checkpid() does not check for the death of ALL threads belonging to a process.
Product: [Fedora] Fedora
Reporter: Ian Macdonald <ian>
Component: initscripts
Assignee: Bill Nottingham <notting>
Status: CLOSED RAWHIDE
QA Contact: Brock Organ <borgan>
Severity: medium
Priority: medium
Version: 2
CC: rvokal
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2004-10-28 02:56:32 UTC
Attachments:
  Description: This fixes the issue described in this bug.
  Flags: none

Description Ian Macdonald 2004-10-27 23:41:40 UTC
Description of problem:

checkpid() in /etc/init.d/functions is broken. It reports a program as
stopped as soon as any one of its threads is no longer running. It should
do so only once all of the program's threads are gone.

The problem shows up after upgrading to OpenLDAP 2.2, where it is no
longer possible to shut down slurpd. slurpd fails to die on the TERM sent
from killproc(), but checkpid() erroneously reports that slurpd has been
shut down, because some (but not all) of its threads are no longer
active in /proc. As a result, the fallthrough to issuing a KILL never
happens.
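
To illustrate the intended check, here is a minimal sketch. It is not the
code shipped in initscripts and not the attached patch. The function name
checkpid_any_alive and the PROC_ROOT variable are hypothetical; PROC_ROOT
exists only so the sketch can be tried against a scratch directory and
defaults to /proc. The sketch assumes the usual shell convention of
returning 0 while something is still running.

```shell
# Hypothetical helper, not the real checkpid(): return 0 while ANY of the
# given PIDs still has an entry under /proc, and return 1 only when ALL of
# them are gone.
# PROC_ROOT is an assumption made for this sketch; it defaults to /proc.
checkpid_any_alive() {
    local pid
    for pid in "$@"; do
        # One surviving entry is enough to say the program is still up.
        [ -d "${PROC_ROOT:-/proc}/$pid" ] && return 0
    done
    return 1
}
```

With a check shaped like this, a caller such as killproc() would keep
treating the daemon as alive until every listed PID has disappeared from
/proc, so a later KILL is not skipped.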


Version-Release number of selected component (if applicable):


How reproducible:

Every time.

Steps to Reproduce:
1. Install OpenLDAP 2.2 as a master server, using slurpd for
replication. You can use the RPM from FC3.
2. Attempt to shut down the LDAP service, using the init script.
  
Actual results:

Observe with ps(1) that slurpd is still running.

Expected results:

slurpd should have been KILLed when it refused to be TERMinated.

Additional info:

Although OpenLDAP 2.2 is not a part of FC2, this problem has the
potential to occur with any threaded daemon that responds to a TERM by
shutting down some, but not all, of its threads.

I haven't checked the initscripts in FC3, but I suspect the problem is
still there in checkpid().

Comment 1 Ian Macdonald 2004-10-28 00:11:23 UTC
Created attachment 105876
This fixes the issue described in this bug.

Using this version of checkpid(), daemons which stop some, but not all, threads
after receiving a TERM are later properly sent a KILL to finish the job off.
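
The TERM-then-KILL escalation this relies on can be sketched as follows.
This is an illustration only, not killproc() from /etc/init.d/functions.
The function name stop_pids and the GRACE variable (seconds to wait before
escalating) are hypothetical, and liveness is probed with kill -0 rather
than /proc to keep the sketch portable.

```shell
# Hypothetical sketch of a TERM-then-KILL escalation; not the real killproc().
# GRACE is an assumed grace period in seconds (default 3).
stop_pids() {
    local grace="${GRACE:-3}" pid alive

    # Ask politely first.
    kill -TERM "$@" 2>/dev/null

    while [ "$grace" -gt 0 ]; do
        alive=0
        for pid in "$@"; do
            # kill -0 sends no signal; it only tests whether the PID exists.
            kill -0 "$pid" 2>/dev/null && alive=1
        done
        # Stop waiting only when ALL of the PIDs are gone.
        [ "$alive" -eq 0 ] && return 0
        sleep 1
        grace=$((grace - 1))
    done

    # At least one PID outlived the grace period: force the survivors.
    for pid in "$@"; do
        kill -0 "$pid" 2>/dev/null && kill -KILL "$pid" 2>/dev/null
    done
    return 0
}
```

The important detail is that the wait loop ends early only when none of the
PIDs remain, which is the condition this bug is about.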

Comment 2 Bill Nottingham 2004-10-28 02:56:32 UTC
This is fixed in current development packages (7.85 and later).