Bug 680518 - rfe: handle SIGHUP in condor_configd
Summary: rfe: handle SIGHUP in condor_configd
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor-wallaby-client
Version: 1.3
Hardware: Unspecified
OS: Linux
Priority: medium
Severity: low
Target Milestone: 2.0
Target Release: ---
Assignee: Robert Rati
QA Contact: Tomas Rusnak
URL:
Whiteboard:
Depends On:
Blocks: 693778
 
Reported: 2011-02-25 18:59 UTC by Jon Thomas
Modified: 2018-11-14 14:17 UTC (History)

Fixed In Version: condor-wallaby-client-4.0-3
Doc Type: Enhancement
Doc Text:
C: Sending a reconfigure signal from MRG Grid or a SIGHUP from the command line to the condor_configd
C: The condor_configd would exit with a failure
F: The condor_configd now handles SIGHUP on *nix systems
R: The condor_configd will exit successfully
Release Note Entry: Previously, when a reconfigure signal from MRG Grid or SIGHUP was sent to condor_configd, condor_configd would unexpectedly fail and then quit. Condor_configd is now able to handle SIGHUP on Linux, UNIX, and similar operating systems and then exit gracefully.
Clone Of:
Environment:
Last Closed: 2011-06-23 15:39:45 UTC
Target Upstream Version:
Embargoed:



Links
System: Red Hat Product Errata
ID: RHEA-2011:0889
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat Enterprise MRG Grid 2.0 Release
Last Updated: 2011-06-23 15:35:53 UTC

Description Jon Thomas 2011-02-25 18:59:18 UTC
Handle SIGHUP in condor_configd to avoid mail spam when running
condor_configure_pool --activate.

example mail:

Subject: [Condor] Problem <hostname>: condor_configd died (1)

This is an automated email from the Condor system
on machine "hostname". Do not reply.

"/usr/sbin/condor_configd" on "hostname" died due to signal 1 (Hangup).
Condor will automatically restart this process in 11 seconds.

-- 

Useful bits from MasterLog:
02/24/11 16:55:54 Reconfiguring all running daemons.
02/24/11 16:55:54 Sent SIGHUP to CONFIGD (pid 2804)
02/24/11 16:55:54 The CONFIGD (pid 2804) died due to signal 1 (Hangup)
02/24/11 16:55:54 Sending obituary for "/usr/sbin/condor_configd"

Comment 2 Robert Rati 2011-03-11 15:08:40 UTC
This is now fixed on master. When the configd receives a SIGHUP, it exits gracefully on non-Windows systems.
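
For illustration only, the behaviour described above can be sketched in Python as follows. This is not the actual condor_configd change: the function name install_sighup_handler and the optional cleanup callback are hypothetical, and the sketch only assumes that the daemon should exit with status 0 on SIGHUP where that signal exists.

```python
import os
import signal
import sys


def install_sighup_handler(cleanup=None):
    """Install a SIGHUP handler that exits with status 0.

    Hypothetical helper, not taken from condor_configd. Returns False
    on platforms without SIGHUP (for example Windows), True otherwise.
    """
    if not hasattr(signal, "SIGHUP"):
        return False

    def _on_sighup(signum, frame):
        # Run optional shutdown work, then leave through SystemExit so
        # the parent process sees exit status 0 instead of "died due
        # to signal 1 (Hangup)".
        if cleanup is not None:
            cleanup()
        sys.exit(0)

    signal.signal(signal.SIGHUP, _on_sighup)
    return True
```

A daemon would call install_sighup_handler() once from its main thread during startup, before entering its main loop.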

Comment 3 Robert Rati 2011-03-15 17:35:04 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
C: Sending a reconfigure signal from MRG Grid or a SIGHUP from the command line to the condor_configd
C: The condor_configd would exit with a failure
F: The condor_configd now handles SIGHUP on *nix systems
R: The condor_configd will exit successfully

Comment 13 Tomas Rusnak 2011-05-04 16:05:39 UTC
Reproduced on x86_64/RHEL5:

# condor -v
$CondorVersion: 7.4.5 Feb  4 2011 BuildID: RH-7.4.5-0.8.el5 PRE-RELEASE $
$CondorPlatform: X86_64-LINUX_RHEL5 $

To: root@hostname
Subject: [Condor] Problem hostname: condor_configd died (1)

This is an automated email from the Condor system
on machine "hostname".  Do not reply.

MasterLog:
05/04/11 19:02:16 The CONFIGD (pid 18846) died due to signal 1 (Hangup)
05/04/11 19:02:16 ProcAPI::buildFamily failed: parent 18846 not found on system.
05/04/11 19:02:16 ProcAPI::getProcInfo() pid 18846 does not exist.
05/04/11 19:02:16 ProcAPI::getProcInfo() pid 18846 does not exist.
05/04/11 19:02:16 ProcAPI::getProcInfo() pid 18846 does not exist.
05/04/11 19:02:16 ProcAPI::getProcInfo() pid 18846 does not exist.
05/04/11 19:02:16 ProcAPI::getProcInfo() pid 18846 does not exist.
05/04/11 19:02:16 Sending obituary for "/usr/sbin/condor_configd"

Comment 14 Tomas Rusnak 2011-05-04 16:23:31 UTC
Retested on all supported platforms x86,x86_64/RHEL5,RHEL6:

# kill -s SIGHUP $(ps ax | grep condor_configd | grep -v grep | awk '{print $1}')

MasterLog:
05/04/11 19:07:17 The CONFIGD (pid 12428) exited with status 0
05/04/11 19:07:17 ProcAPI::buildFamily failed: parent 12428 not found on system.
05/04/11 19:07:17 ProcAPI::getProcInfo() pid 12428 does not exist.
05/04/11 19:07:17 ProcAPI::getProcInfo() pid 12428 does not exist.
05/04/11 19:07:17 ProcAPI::getProcInfo() pid 12428 does not exist.
05/04/11 19:07:17 ProcAPI::getProcInfo() pid 12428 does not exist.
05/04/11 19:07:17 ProcAPI::getProcInfo() pid 12428 does not exist.
05/04/11 19:07:17 restarting /usr/sbin/condor_configd in 10 seconds
05/04/11 19:07:17 enter Daemons::UpdateCollector
05/04/11 19:07:17 Trying to update collector <10.34.37.121:9618>
05/04/11 19:07:17 Attempting to send update via UDP to collector hostname <IP:9618>
05/04/11 19:07:17 MgmtMasterPlugin: calling update
05/04/11 19:07:17 exit Daemons::UpdateCollector
05/04/11 19:07:27 ::RealStart; CONFIGD on_hold=0
05/04/11 19:07:27 Create_Process: using fast clone() to create child process.
05/04/11 19:07:27 SharedPortEndpoint: Inside destructor.
05/04/11 19:07:27 start recover timer (28)
05/04/11 19:07:27 Started process "/usr/sbin/condor_configd -d", pid and pgroup = 12498

No mail sent, signal handled by condor_configd.

>>> VERIFIED
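
The MasterLog check used in the two test runs above could be automated along these lines. This is a rough sketch: the function name classify_master_log is hypothetical, and the two patterns are taken only from the log excerpts quoted in this report, so other Condor versions may word these lines differently.

```python
import re

# Patterns based solely on the MasterLog lines quoted in this report.
DIED = re.compile(r"The (\w+) \(pid (\d+)\) died due to signal (\d+)")
EXITED = re.compile(r"The (\w+) \(pid (\d+)\) exited with status (\d+)")


def classify_master_log(lines, daemon="CONFIGD"):
    """Return (kind, pid, code) tuples for one daemon's terminations.

    kind is "signal" when the master logged a death by signal and
    "exit" when it logged a normal exit; code is the signal number or
    the exit status respectively.
    """
    results = []
    for line in lines:
        match = DIED.search(line)
        if match and match.group(1) == daemon:
            results.append(("signal", int(match.group(2)), int(match.group(3))))
            continue
        match = EXITED.search(line)
        if match and match.group(1) == daemon:
            results.append(("exit", int(match.group(2)), int(match.group(3))))
    return results
```

With the fix in place, a SIGHUP should produce an ("exit", pid, 0) entry rather than a ("signal", pid, 1) entry.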

Comment 15 Misha H. Ali 2011-05-30 05:51:48 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,4 +1,8 @@
 C: Sending a reconfigure signal from MRG Grid or a SIGHUP from the command line to the condor_configd
 C: The condor_configd would exit with a failure
 F: The condor_configd now handles SIGHUP on *nix systems
-R: The condor_configd will exit successfully
+R: The condor_configd will exit successfully
+
+Release Note Entry:
+
+Previously, when a reconfigure signal from MRG Grid or SIGHUP was sent to condor_configd, condor_configd would unexpectedly fail and then quit. Condor_configd is now able to handle SIGHUP on Linux, UNIX, and similar operating systems and then exit gracefully.

Comment 16 Misha H. Ali 2011-06-06 03:24:20 UTC
Technical note can be viewed in the release notes for 2.0 at the documentation stage here:

http://documentation-stage.bne.redhat.com/docs/en-US/Red_Hat_Enterprise_MRG/2.0/html-single/MRG_Release_Notes/index.html#tabl-MRG_Release_Notes-GRID_Update_Notes-RHM_Known_Issues

Comment 17 errata-xmlrpc 2011-06-23 15:39:45 UTC
An advisory has been issued which should address the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2011-0889.html

