Bug 615321 - condor_master ignores SIGQUIT on RHEL4 - does not shutdown properly.
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor
Version: 1.3
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: 1.3.2
Target Release: ---
Assignee: grid-maint-list
QA Contact: MRG Quality Engineering
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2010-07-16 13:59 UTC by Ken Giusti
Modified: 2010-11-24 12:51 UTC
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-11-19 19:24:43 UTC
Target Upstream Version:
Embargoed:


Description Ken Giusti 2010-07-16 13:59:04 UTC
Description of problem:

Occasionally, "service condor stop" does not stop condor. It appears that the SIGQUIT signal used to shut down condor is not being acted upon.



Version-Release number of selected component (if applicable):  

RHEL4

condor*-7.4.4-0.4.el4 


How reproducible:  

Rare. Seen only on the first attempted shutdown after system boot. Cannot reproduce if the first "service condor stop" is successful.


Steps to Reproduce:
1. Boot a RHEL 4 system with condor started as a service.
2. Use the condor configuration specified in https://bugzilla.redhat.com/show_bug.cgi?id=610773
3. Run "service condor stop" and watch /var/log/MasterLog. A SIGQUIT log message should appear; if it does not, condor will not shut down.
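
The log check in step 3 can be scripted. The following is a minimal sketch, assuming that the successful-shutdown entry contains the text "Got SIGQUIT" and that MasterLog lines begin with an MM/DD HH:MM:SS timestamp. The function name sigquit_entries and its since parameter are hypothetical, not part of condor.

```python
import re

# Assumed MasterLog line layout: "MM/DD HH:MM:SS message".
LINE_RE = re.compile(r"^(\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (.*)$")


def sigquit_entries(log_lines, since=None):
    """Return (timestamp, message) pairs for log lines that report SIGQUIT.

    `since` is an optional "MM/DD HH:MM:SS" string; earlier entries are
    skipped. Plain string comparison is used, which works because the
    fields are zero-padded, as long as the log does not span a year boundary.
    """
    hits = []
    for line in log_lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if not match:
            continue  # not a timestamped log line
        stamp, message = match.groups()
        if since is not None and stamp < since:
            continue
        if "Got SIGQUIT" in message:
            hits.append((stamp, message))
    return hits
```

To run it against a live system, pass an open file object for the MasterLog as log_lines and the time at which "service condor stop" was issued as since. An empty result after the stop command would match the failure described here.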
  
Actual results:
service condor stop fails with an error message.


Expected results:
service condor stop should succeed and all condor processes should have exited cleanly.
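
One way to verify the expected result is to list any condor processes still present after the stop command returns. This is a minimal sketch, assuming a Linux /proc filesystem and assuming that condor daemon process names start with "condor_". The helper names process_name and remaining_condor_processes are hypothetical.

```python
import glob
import os


def process_name(status_text):
    """Extract the value of the 'Name:' field from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("Name:"):
            return line.split(":", 1)[1].strip()
    return None


def remaining_condor_processes(proc_root="/proc"):
    """Return sorted (pid, name) pairs for processes named 'condor_*'.

    The 'condor_' prefix is an assumption about daemon naming.
    """
    found = []
    for path in glob.glob(os.path.join(proc_root, "[0-9]*", "status")):
        pid_dir = os.path.basename(os.path.dirname(path))
        if not pid_dir.isdigit():
            continue
        try:
            with open(path) as fh:
                name = process_name(fh.read())
        except OSError:
            continue  # the process exited while we were scanning
        if name and name.startswith("condor_"):
            found.append((int(pid_dir), name))
    return sorted(found)
```

An empty list from remaining_condor_processes() after "service condor stop" would indicate a clean exit; a non-empty list shows which daemons are still running.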

Additional info:

See https://bugzilla.redhat.com/show_bug.cgi?id=610773 for additional information.

Comment 1 Matthew Farrellee 2010-08-04 21:05:06 UTC
What is the error message?

Does the MasterLog say if a signal was received?

Comment 2 Ken Giusti 2010-08-05 13:30:43 UTC
The only error message - actually, warning - I have seen is the failure of the condor stop command:

[kgiusti@localhost ~]$ sudo /sbin/service condor stop
Password:
Stopping Condor daemons:                                   [  OK  ]
Warning: condor_master may not have exited, start/restart may fail


No, the MasterLog shows no activity whatsoever during the failed shutdown. During a successful shutdown, the log contains the following message:

07/16 08:36:57 Got SIGQUIT.  Performing fast shutdown.

When the failure occurs, there is NO new activity in the master log.  It appears as if the signal is lost/blocked.
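
The lost/blocked hypothesis could be checked on a hung condor_master by reading the signal masks that the Linux kernel reports in /proc/<pid>/status. This is a minimal sketch, assuming the SigBlk, SigIgn, and SigCgt fields are present as hexadecimal bitmasks in which bit (n - 1) stands for signal n. The function name signal_state is hypothetical.

```python
import signal


def signal_state(status_text, signum=signal.SIGQUIT):
    """Report whether `signum` is blocked, ignored, or caught.

    `status_text` is the content of /proc/<pid>/status. Each Sig* field is
    read as a hexadecimal bitmask where bit (signum - 1) maps to `signum`.
    Fields missing from the text are simply absent from the result.
    """
    bit = 1 << (int(signum) - 1)
    fields = {"SigBlk": "blocked", "SigIgn": "ignored", "SigCgt": "caught"}
    state = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in fields:
            state[fields[key]] = bool(int(value.strip(), 16) & bit)
    return state
```

For a live process, pass the text read from /proc/<pid>/status of the condor_master. A result with blocked set to True would be consistent with the signal being sent but never handled. If the signal is caught and not blocked, the cause may lie elsewhere, for example the signal never reaching the process.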

Comment 3 Matthew Farrellee 2010-08-19 13:50:10 UTC
What was the state of the broker when stopping Condor?

Comment 4 Matthew Farrellee 2010-08-19 13:52:08 UTC
Possibly related to Bug 625450

Comment 5 Matthew Farrellee 2010-08-19 15:48:19 UTC
Strike comment 4

Comment 6 Matthew Farrellee 2010-08-19 16:07:14 UTC
Ken said broker was present, but the primary issue included no mention of SIGQUIT being received by the master.

