Handle SIGHUP in condor_configd to avoid mail spam on condor_configure_pool --activate
Example mail:
  Subject: [Condor] Problem <hostname>: condor_configd died (1)
  This is an automated email from the Condor system on machine "hostname". Do not reply.
  "/usr/sbin/condor_configd" on "hostname" died due to signal 1 (Hangup).
  Condor will automatically restart this process in 11 seconds.
  -- Useful bits from MasterLog:
  02/24/11 16:55:54 Reconfiguring all running daemons.
  02/24/11 16:55:54 Sent SIGHUP to CONFIGD (pid 2804)
  02/24/11 16:55:54 The CONFIGD (pid 2804) died due to signal 1 (Hangup)
  02/24/11 16:55:54 Sending obituary for "/usr/sbin/condor_configd"
This is now fixed on master. On non-Windows systems, the configd now exits gracefully when it receives a SIGHUP.
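For readers unfamiliar with the pattern, the behaviour described above can be sketched roughly as follows. This is only an illustration of "catch SIGHUP and exit with status 0 on non-Windows systems"; it is not the actual condor_configd change, and the helper name install_sighup_handler is hypothetical.

```python
import signal
import sys


def install_sighup_handler():
    """Hypothetical helper: make SIGHUP end the process with exit status 0.

    Returns True if a handler was installed, False on platforms that do
    not define SIGHUP (for example Windows).
    """
    if not hasattr(signal, "SIGHUP"):
        # No SIGHUP on this platform, so there is nothing to handle.
        return False

    def _on_sighup(signum, frame):
        # A real daemon would release its resources here before exiting.
        # sys.exit(0) raises SystemExit, so the process ends with status 0
        # instead of being killed by the signal.
        sys.exit(0)

    signal.signal(signal.SIGHUP, _on_sighup)
    return True
```

With a handler like this in place, a supervising process sees a normal exit with status 0 rather than a death by signal 1.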
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.
New Contents:
  C: Sending a reconfigure signal from MRG Grid or a SIGHUP from the command line to the condor_configd
  C: The condor_configd would exit with a failure
  F: The condor_configd now handles SIGHUP on *nix systems
  R: The condor_configd will exit successfully
Reproduced on x86_64/RHEL5:
  # condor -v
  $CondorVersion: 7.4.5 Feb 4 2011 BuildID: RH-7.4.5-0.8.el5 PRE-RELEASE $
  $CondorPlatform: X86_64-LINUX_RHEL5 $
Mail received:
  To: root@hostname
  Subject: [Condor] Problem hostname: condor_configd died (1)
  This is an automated email from the Condor system on machine "hostname". Do not reply.
MasterLog:
  05/04/11 19:02:16 The CONFIGD (pid 18846) died due to signal 1 (Hangup)
  05/04/11 19:02:16 ProcAPI::buildFamily failed: parent 18846 not found on system.
  05/04/11 19:02:16 ProcAPI::getProcInfo() pid 18846 does not exist.
  05/04/11 19:02:16 ProcAPI::getProcInfo() pid 18846 does not exist.
  05/04/11 19:02:16 ProcAPI::getProcInfo() pid 18846 does not exist.
  05/04/11 19:02:16 ProcAPI::getProcInfo() pid 18846 does not exist.
  05/04/11 19:02:16 ProcAPI::getProcInfo() pid 18846 does not exist.
  05/04/11 19:02:16 Sending obituary for "/usr/sbin/condor_configd"
Retested on all supported platforms (x86, x86_64 / RHEL5, RHEL6):
  # kill -s SIGHUP $(ps ax | grep condor_configd | grep -v grep | awk '{print $1}')
MasterLog:
  05/04/11 19:07:17 The CONFIGD (pid 12428) exited with status 0
  05/04/11 19:07:17 ProcAPI::buildFamily failed: parent 12428 not found on system.
  05/04/11 19:07:17 ProcAPI::getProcInfo() pid 12428 does not exist.
  05/04/11 19:07:17 ProcAPI::getProcInfo() pid 12428 does not exist.
  05/04/11 19:07:17 ProcAPI::getProcInfo() pid 12428 does not exist.
  05/04/11 19:07:17 ProcAPI::getProcInfo() pid 12428 does not exist.
  05/04/11 19:07:17 ProcAPI::getProcInfo() pid 12428 does not exist.
  05/04/11 19:07:17 restarting /usr/sbin/condor_configd in 10 seconds
  05/04/11 19:07:17 enter Daemons::UpdateCollector
  05/04/11 19:07:17 Trying to update collector <10.34.37.121:9618>
  05/04/11 19:07:17 Attempting to send update via UDP to collector hostname <IP:9618>
  05/04/11 19:07:17 MgmtMasterPlugin: calling update
  05/04/11 19:07:17 exit Daemons::UpdateCollector
  05/04/11 19:07:27 ::RealStart; CONFIGD on_hold=0
  05/04/11 19:07:27 Create_Process: using fast clone() to create child process.
  05/04/11 19:07:27 SharedPortEndpoint: Inside destructor.
  05/04/11 19:07:27 start recover timer (28)
  05/04/11 19:07:27 Started process "/usr/sbin/condor_configd -d", pid and pgroup = 12498
No mail sent, signal handled by condor_configd.
>>> VERIFIED
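The difference between the two MasterLog excerpts ("died due to signal 1" before the fix, "exited with status 0" after) can be reproduced in isolation with a small script. The following is a sketch under stated assumptions, not part of the Condor test suite: the two child programs are stand-ins for a daemon with and without a SIGHUP handler, and the names CHILD_WITH_HANDLER, CHILD_WITHOUT_HANDLER and exit_status_after_sighup are hypothetical. It runs only on systems that define SIGHUP.

```python
import signal
import subprocess
import sys

# Stand-in for a daemon that handles SIGHUP and exits with status 0.
CHILD_WITH_HANDLER = """
import signal, sys, time
signal.signal(signal.SIGHUP, lambda signum, frame: sys.exit(0))
print("ready", flush=True)
time.sleep(30)
"""

# Stand-in for a daemon without a handler. The default disposition is set
# explicitly because an ignored SIGHUP can be inherited (e.g. under nohup).
CHILD_WITHOUT_HANDLER = """
import signal, time
signal.signal(signal.SIGHUP, signal.SIG_DFL)
print("ready", flush=True)
time.sleep(30)
"""


def exit_status_after_sighup(child_source):
    """Start a child, send it SIGHUP once it is ready, return its status.

    A negative return value -N means the child was killed by signal N.
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", child_source],
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        proc.stdout.readline()  # block until the child has printed "ready"
        proc.send_signal(signal.SIGHUP)
        return proc.wait(timeout=10)
    finally:
        if proc.poll() is None:
            proc.kill()
            proc.wait()
        proc.stdout.close()


if __name__ == "__main__":
    print(exit_status_after_sighup(CHILD_WITH_HANDLER))     # 0
    print(exit_status_after_sighup(CHILD_WITHOUT_HANDLER))  # -1
```

A status of 0 corresponds to the "exited with status 0" log line, while -1 corresponds to "died due to signal 1 (Hangup)", the case that triggers the obituary mail.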
Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.
Diffed Contents:
@@ -1,4 +1,8 @@
 C: Sending a reconfigure signal from MRG Grid or a SIGHUP from the command line to the condor_configd
 C: The condor_configd would exit with a failure
 F: The condor_configd now handles SIGHUP on *nix systems
-R: The condor_configd will exit successfully
+R: The condor_configd will exit successfully
+
+Release Note Entry:
+
+Previously, when a reconfigure signal from MRG Grid or SIGHUP was sent to condor_configd, condor_configd would unexpectedly fail and then quit. Condor_configd is now able to handle SIGHUP on Linux, UNIX, and similar operating systems and then exit gracefully.
Technical note can be viewed in the release notes for 2.0 at the documentation stage here: http://documentation-stage.bne.redhat.com/docs/en-US/Red_Hat_Enterprise_MRG/2.0/html-single/MRG_Release_Notes/index.html#tabl-MRG_Release_Notes-GRID_Update_Notes-RHM_Known_Issues
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHEA-2011-0889.html