Bug 680518

Summary: rfe: handle SIGHUP in condor_configd
Product: Red Hat Enterprise MRG
Reporter: Jon Thomas <jthomas>
Component: condor-wallaby-client
Assignee: Robert Rati <rrati>
Status: CLOSED ERRATA
QA Contact: Tomas Rusnak <trusnak>
Severity: low
Docs Contact:
Priority: medium
Version: 1.3
CC: iboverma, jwest, ltoscano, matt, mhusnain, mkudlej, pvn, trusnak, willb
Target Milestone: 2.0
Keywords: FutureFeature
Target Release: ---
Hardware: Unspecified
OS: Linux
Whiteboard:
Fixed In Version: condor-wallaby-client-4.0-3
Doc Type: Enhancement
Doc Text:
C: Sending a reconfigure signal from MRG Grid or a SIGHUP from the command line to the condor_configd
C: The condor_configd would exit with a failure
F: The condor_configd now handles SIGHUP on *nix systems
R: The condor_configd will exit successfully

Release Note Entry:
Previously, when a reconfigure signal from MRG Grid or SIGHUP was sent to condor_configd, condor_configd would unexpectedly fail and then quit. Condor_configd is now able to handle SIGHUP on Linux, UNIX, and similar operating systems and then exit gracefully.
Last Closed: 2011-06-23 15:39:45 UTC
Bug Blocks: 693778

Description Jon Thomas 2011-02-25 18:59:18 UTC
Handle SIGHUP in condor_configd to avoid the mail spam generated when
condor_configure_pool --activate is run.

example mail:

Subject: [Condor] Problem <hostname>: condor_configd died (1)

This is an automated email from the Condor system
on machine "hostname". Do not reply.

"/usr/sbin/condor_configd" on "hostname" died due to signal 1 (Hangup).
Condor will automatically restart this process in 11 seconds.

-- 

Useful bits from MasterLog:
02/24/11 16:55:54 Reconfiguring all running daemons.
02/24/11 16:55:54 Sent SIGHUP to CONFIGD (pid 2804)
02/24/11 16:55:54 The CONFIGD (pid 2804) died due to signal 1 (Hangup)
02/24/11 16:55:54 Sending obituary for "/usr/sbin/condor_configd"
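The MasterLog shows why the mail was sent: the default action for SIGHUP is to terminate the process, so a daemon with no SIGHUP handler is killed outright and condor_master records a death by signal 1. The sketch below reproduces that failure mode with `sleep` standing in for a daemon that installs no SIGHUP handler (an illustration only, not condor_configd itself). It assumes the invoking shell does not already ignore SIGHUP (for example, it is not run under nohup).

```shell
#!/bin/sh
# 'sleep' stands in for a daemon with no SIGHUP handler (illustration only;
# this is not condor_configd).
sleep 30 &
pid=$!

# Send the same signal condor_master sends on reconfig.
kill -s HUP "$pid"

# The shell reports a signal death as 128 + signal number; SIGHUP is 1.
wait "$pid"
status=$?
echo "status=$status"
```

Run as a script, this should print `status=129`, the shell-level view of the "died due to signal 1 (Hangup)" condition that condor_master reports and mails about.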

Comment 2 Robert Rati 2011-03-11 15:08:40 UTC
This is now fixed on master. When the configd receives a SIGHUP, it now exits gracefully on non-Windows systems.
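Comment 2 describes the fix as exiting gracefully on SIGHUP rather than being killed by it. A minimal shell sketch of that idea follows; it assumes nothing about how condor_configd is actually implemented. The hypothetical `fake_configd` function installs a SIGHUP trap that exits with status 0, and the hypothetical `run_and_hup` function starts it in the background, signals it, and reports its exit status.

```shell
#!/bin/sh
# Hypothetical stand-in for a daemon main loop (NOT the real condor_configd).
# It traps SIGHUP and exits with status 0 instead of dying from the signal.
fake_configd() {
    trap 'exit 0' HUP
    while :; do
        sleep 1    # the trap runs once this foreground sleep returns
    done
}

# Start the stand-in, send it SIGHUP, and report how it exited.
run_and_hup() {
    fake_configd &
    pid=$!
    sleep 1            # give the child time to install its trap
    kill -s HUP "$pid"
    wait "$pid"
    echo "exit status: $?"
}
```

Calling `run_and_hup` should print `exit status: 0`, which condor_master logs as "exited with status 0" rather than a signal death. As the MasterLog in Comment 14 shows, the master still restarts the daemon after it exits; the clean exit only keeps the event from being treated as a crash.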

Comment 3 Robert Rati 2011-03-15 17:35:04 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
C: Sending a reconfigure signal from MRG Grid or a SIGHUP from the command line to the condor_configd
C: The condor_configd would exit with a failure
F: The condor_configd now handles SIGHUP on *nix systems
R: The condor_configd will exit successfully

Comment 13 Tomas Rusnak 2011-05-04 16:05:39 UTC
Reproduced on x86_64/RHEL5:

# condor -v
$CondorVersion: 7.4.5 Feb  4 2011 BuildID: RH-7.4.5-0.8.el5 PRE-RELEASE $
$CondorPlatform: X86_64-LINUX_RHEL5 $

To: root@hostname
Subject: [Condor] Problem hostname: condor_configd died (1)

This is an automated email from the Condor system
on machine "hostname".  Do not reply.

MasterLog:
05/04/11 19:02:16 The CONFIGD (pid 18846) died due to signal 1 (Hangup)
05/04/11 19:02:16 ProcAPI::buildFamily failed: parent 18846 not found on system.
05/04/11 19:02:16 ProcAPI::getProcInfo() pid 18846 does not exist.
05/04/11 19:02:16 ProcAPI::getProcInfo() pid 18846 does not exist.
05/04/11 19:02:16 ProcAPI::getProcInfo() pid 18846 does not exist.
05/04/11 19:02:16 ProcAPI::getProcInfo() pid 18846 does not exist.
05/04/11 19:02:16 ProcAPI::getProcInfo() pid 18846 does not exist.
05/04/11 19:02:16 Sending obituary for "/usr/sbin/condor_configd"

Comment 14 Tomas Rusnak 2011-05-04 16:23:31 UTC
Retested on all supported platforms (x86 and x86_64, RHEL5 and RHEL6):

# kill -s SIGHUP $(ps ax | grep condor_configd | grep -v grep | awk '{print $1}')

MasterLog:
05/04/11 19:07:17 The CONFIGD (pid 12428) exited with status 0
05/04/11 19:07:17 ProcAPI::buildFamily failed: parent 12428 not found on system.
05/04/11 19:07:17 ProcAPI::getProcInfo() pid 12428 does not exist.
05/04/11 19:07:17 ProcAPI::getProcInfo() pid 12428 does not exist.
05/04/11 19:07:17 ProcAPI::getProcInfo() pid 12428 does not exist.
05/04/11 19:07:17 ProcAPI::getProcInfo() pid 12428 does not exist.
05/04/11 19:07:17 ProcAPI::getProcInfo() pid 12428 does not exist.
05/04/11 19:07:17 restarting /usr/sbin/condor_configd in 10 seconds
05/04/11 19:07:17 enter Daemons::UpdateCollector
05/04/11 19:07:17 Trying to update collector <10.34.37.121:9618>
05/04/11 19:07:17 Attempting to send update via UDP to collector hostname <IP:9618>
05/04/11 19:07:17 MgmtMasterPlugin: calling update
05/04/11 19:07:17 exit Daemons::UpdateCollector
05/04/11 19:07:27 ::RealStart; CONFIGD on_hold=0
05/04/11 19:07:27 Create_Process: using fast clone() to create child process.
05/04/11 19:07:27 SharedPortEndpoint: Inside destructor.
05/04/11 19:07:27 start recover timer (28)
05/04/11 19:07:27 Started process "/usr/sbin/condor_configd -d", pid and pgroup = 12498

No mail was sent; the signal was handled by condor_configd.

>>> VERIFIED
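The verification above comes down to which of two MasterLog messages follows the SIGHUP: "died due to signal 1 (Hangup)" before the fix, "exited with status 0" after it. The hypothetical helper below classifies the most recent CONFIGD exit line using the message formats quoted in this bug. The default log path, built from `condor_config_val LOG`, is an assumption about where the MasterLog lives; pass an explicit path if your installation differs.

```shell
#!/bin/sh
# Hypothetical helper: classify the most recent CONFIGD exit recorded in a
# condor_master MasterLog. Message formats are taken from the log excerpts
# in this bug; the default log location is an assumption.
configd_last_exit() {
    log=${1:-"$(condor_config_val LOG)/MasterLog"}
    line=$(grep -E 'The CONFIGD \(pid [0-9]+\) (exited with status|died due to signal)' "$log" | tail -n 1)
    case $line in
        *"exited with status 0"*) echo "clean" ;;   # fixed behavior
        *"died due to signal"*)   echo "signal" ;;  # original failure
        "")                       echo "none" ;;    # no CONFIGD exit logged
        *)                        echo "nonzero" ;; # exited with a failure code
    esac
}
```

Output of `clean` corresponds to the verified behavior; `signal` corresponds to the failure reproduced in Comment 13.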

Comment 15 Misha H. Ali 2011-05-30 05:51:48 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,4 +1,8 @@
 C: Sending a reconfigure signal from MRG Grid or a SIGHUP from the command line to the condor_configd
 C: The condor_configd would exit with a failure
 F: The condor_configd now handles SIGHUP on *nix systems
-R: The condor_configd will exit successfully+R: The condor_configd will exit successfully
+
+Release Note Entry:
+
+Previously, when a reconfigure signal from MRG Grid or SIGHUP was sent to condor_configd, condor_configd would unexpectedly fail and then quit. Condor_configd is now able to handle SIGHUP on Linux, UNIX, and similar operating systems and then exit gracefully.

Comment 16 Misha H. Ali 2011-06-06 03:24:20 UTC
The technical note can be viewed in the MRG 2.0 release notes on the documentation staging site:

http://documentation-stage.bne.redhat.com/docs/en-US/Red_Hat_Enterprise_MRG/2.0/html-single/MRG_Release_Notes/index.html#tabl-MRG_Release_Notes-GRID_Update_Notes-RHM_Known_Issues

Comment 17 errata-xmlrpc 2011-06-23 15:39:45 UTC
An advisory has been issued which should help resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2011-0889.html