Bug 615321

Summary: condor_master ignores SIGQUIT on RHEL4 - does not shut down properly.
Product: Red Hat Enterprise MRG Reporter: Ken Giusti <kgiusti>
Component: condor Assignee: grid-maint-list <grid-maint-list>
Status: CLOSED INSUFFICIENT_DATA QA Contact: MRG Quality Engineering <mrgqe-bugs>
Severity: medium Docs Contact:
Priority: low    
Version: 1.3 CC: matt
Target Milestone: 1.3.2   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-11-19 19:24:43 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Ken Giusti 2010-07-16 13:59:04 UTC
Description of problem:

Occasionally, "service condor stop" will not stop condor.  It appears that the SIGQUIT signal used to shut down condor is not being acted upon.



Version-Release number of selected component (if applicable):  

RHEL4

condor*-7.4.4-0.4.el4 


How reproducible:  

Rare.  Seen only on the first attempted shutdown after system boot.  Cannot reproduce if the first "service condor stop" is successful.


Steps to Reproduce:
1. boot RHEL 4 system, with condor started as a service
2. use condor configuration as specified in  https://bugzilla.redhat.com/show_bug.cgi?id=610773
3. run "service condor stop" and watch /var/log/MasterLog - a SIGQUIT log message should appear; if it does not, condor will not shut down
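
The log check in step 3 could be scripted roughly as follows. This is only a sketch: the helper name find_sigquit_lines is hypothetical, and the marker text "Got SIGQUIT" is assumed to be stable across condor versions (it is taken from the successful-shutdown log line quoted in this report).

```python
# Marker assumed from the MasterLog line seen on a successful shutdown.
SIGQUIT_MARKER = "Got SIGQUIT"


def find_sigquit_lines(log_lines):
    """Return the log lines (without trailing newline) that mention SIGQUIT."""
    return [line.rstrip("\n") for line in log_lines if SIGQUIT_MARKER in line]


# Example use against the log path named in this report:
#   with open("/var/log/MasterLog") as log:
#       hits = find_sigquit_lines(log)
#   An empty list after "service condor stop" would match the failure case.
```

Comparing the result before and after the stop command avoids matching a SIGQUIT entry left over from an earlier shutdown.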
  
Actual results:
service condor stop fails with an error message.


Expected results:
service condor stop should succeed and all condor processes should have exited cleanly.

Additional info:

See https://bugzilla.redhat.com/show_bug.cgi?id=610773 for additional information.

Comment 1 Matthew Farrellee 2010-08-04 21:05:06 UTC
What is the error message?

Does the MasterLog say if a signal was received?

Comment 2 Ken Giusti 2010-08-05 13:30:43 UTC
The only error message - actually, warning - I have seen is the failure of the condor stop command:

[kgiusti@localhost ~]$ sudo /sbin/service condor stop
Password:
Stopping Condor daemons:                                   [  OK  ]
Warning: condor_master may not have exited, start/restart may fail


No, the MasterLog shows no activity whatsoever during the failed shutdown.   During a successful shutdown, the log will contain the following log message:

07/16 08:36:57 Got SIGQUIT.  Performing fast shutdown.

When the failure occurs, there is NO new activity in the master log.  It appears as if the signal is lost/blocked.
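
One way to distinguish a blocked signal from an ignored or lost one would be to inspect the signal masks of the running condor_master in /proc/<pid>/status. The sketch below decodes those masks for SIGQUIT. It assumes the kernel in use exposes the SigPnd, ShdPnd, SigBlk, SigIgn and SigCgt fields as hexadecimal bitmasks; the function names are hypothetical and not part of condor.

```python
import signal

# Fields of /proc/<pid>/status that hold per-signal bitmasks (hex strings).
MASK_FIELDS = ("SigPnd", "ShdPnd", "SigBlk", "SigIgn", "SigCgt")


def parse_signal_masks(status_text):
    """Return {field name: integer mask} for the signal fields present."""
    masks = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key in MASK_FIELDS:
            masks[key] = int(value.strip(), 16)
    return masks


def signal_state(masks, signum=signal.SIGQUIT):
    """Report, per mask, whether the bit for signum is set.

    Signal N corresponds to bit N-1 of each mask.
    """
    bit = 1 << (int(signum) - 1)
    return {name: bool(mask & bit) for name, mask in masks.items()}


# Example use (the pid must be that of the running condor_master):
#   with open("/proc/%d/status" % pid) as f:
#       print(signal_state(parse_signal_masks(f.read())))
```

If SigBlk is set for SIGQUIT, the master has the signal blocked. If SigPnd or ShdPnd is set, the signal was sent but not yet delivered. If SigCgt is clear, no handler is installed.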

Comment 3 Matthew Farrellee 2010-08-19 13:50:10 UTC
What was the state of the broker when stopping Condor?

Comment 4 Matthew Farrellee 2010-08-19 13:52:08 UTC
Possibly related to Bug 625450

Comment 5 Matthew Farrellee 2010-08-19 15:48:19 UTC
Strike comment 4

Comment 6 Matthew Farrellee 2010-08-19 16:07:14 UTC
Ken said broker was present, but the primary issue included no mention of SIGQUIT being received by the master.