Description of problem:
The condor_triggerd daemon ignores the SIGTERM and SIGQUIT signals.

Version-Release number of selected component (if applicable):

How reproducible:
100%

Steps to Reproduce:
1. Configure the system to use triggerd.
2. Run "service condor restart" or "service condor stop", or simply send SIGTERM or SIGQUIT to the triggerd daemon.
3. tail -f /var/log/condor/MasterLog
4. After a long time (depending on SHUTDOWN_FAST_TIMEOUT), the master kills triggerd with signal 9.

Actual results:
The daemon ignores the signals.

Expected results:
The daemon responds to the signals properly.

Additional info:

Config:
QMF_BROKER_HOST=localhost
ALL_DEBUG=D_FULLDEBUG
CONFIGD_ARGS = -d
ALLOW_WRITE = *
ALLOW_READ = *
ALLOW_NEGOTIATOR = *
ALLOW_ADMINISTRATOR_READ = *
QMF_PUBLISH_SUBMISSIONS = False
HISTORY = $(SPOOL)/history
JOB_SERVER = $(SBIN)/condor_job_server
JOB_SERVER_ARGS = -f
JOB_SERVER.JOB_SERVER_LOG = $(LOG)/JobServerLog
JOB_SERVER.JOB_SERVER_ADDRESS_FILE = $(LOG)/.job_server_address
JOB_SERVER.SCHEDD_NAME = schedd
STARTD_CRON_NAME = TRIGGER_DATA
STARTD_CRON_AUTOPUBLISH = If_Changed
TRIGGER_DATA_JOBLIST = GetData
TRIGGER_DATA_GETDATA_PREFIX = Triggerd
TRIGGER_DATA_GETDATA_EXECUTABLE = $(BIN)/get_trigger_data
TRIGGER_DATA_GETDATA_PERIOD = 5m
TRIGGER_DATA_GETDATA_RECONFIG = FALSE
DAEMON_LIST = $(DAEMON_LIST), JOB_SERVER, TRIGGERD
ENABLE_ABSENT_NODES_DETECTION=True
DC_DAEMON_LIST = $(DAEMON_LIST), JOB_SERVER, TRIGGERD
QMF_BROKER_AUTH_MECH = ANONYMOUS

MasterLog:
05/25/11 11:48:46 Timeout for fast shutdown has expired for TRIGGERD.
05/25/11 11:48:46 ProcAPI::buildFamily() Found daddypid on the system: 2436
05/25/11 11:48:46 Sent SIGKILL to TRIGGERD (pid 2436) and all its children.
05/25/11 11:48:46 DaemonCore: No more children processes to reap.
05/25/11 11:48:46 The TRIGGERD (pid 2436) died due to signal 9 (Killed)
05/25/11 11:48:46 ProcAPI::buildFamily failed: parent 2436 not found on system.
05/25/11 11:48:46 ProcAPI::getProcInfo() pid 2436 does not exist.
05/25/11 11:48:46 ProcAPI::getProcInfo() pid 2436 does not exist.
05/25/11 11:48:46 ProcAPI::getProcInfo() pid 2436 does not exist.
05/25/11 11:48:46 ProcAPI::getProcInfo() pid 2436 does not exist.
05/25/11 11:48:46 ProcAPI::getProcInfo() pid 2436 does not exist.
05/25/11 11:48:46 NumberOfChildren() returning 0
05/25/11 11:48:46 All daemons are gone. Exiting.
05/25/11 11:48:46 MgmtMasterPlugin: shutting down...
05/25/11 11:48:47 **** condor_master (condor_MASTER) pid 2424 EXITING WITH STATUS 0
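
The escalation seen in the MasterLog above (a shutdown signal, a timeout, then SIGKILL) can be reproduced outside Condor with a small Python sketch. This is only an illustration of the general pattern, not the condor_master implementation. The names stop_with_escalation, run_child_and_stop, IGNORING_CHILD and WELL_BEHAVED_CHILD are hypothetical, and the child processes are stand-ins for a daemon that ignores or honours SIGTERM. It assumes a POSIX system.

```python
import signal
import subprocess
import sys

# Hypothetical stand-in for a daemon that ignores SIGTERM.
IGNORING_CHILD = (
    "import signal, time\n"
    "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
    "print('ready', flush=True)\n"
    "time.sleep(60)\n"
)

# Hypothetical stand-in for a daemon that keeps the default SIGTERM action.
WELL_BEHAVED_CHILD = (
    "import time\n"
    "print('ready', flush=True)\n"
    "time.sleep(60)\n"
)


def stop_with_escalation(proc, sig=signal.SIGTERM, timeout=2.0):
    """Send `sig`, wait up to `timeout` seconds, then fall back to SIGKILL.

    Returns the child's return code; on POSIX a negative value means the
    child was terminated by that signal number.
    """
    proc.send_signal(sig)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # The child did not react to the polite signal, so force it.
        proc.kill()
        return proc.wait()


def run_child_and_stop(child_source, timeout=2.0):
    """Start a child from Python source text and stop it with escalation."""
    proc = subprocess.Popen(
        [sys.executable, "-c", child_source], stdout=subprocess.PIPE
    )
    # Block until the child reports that its signal setup is complete.
    proc.stdout.readline()
    try:
        return stop_with_escalation(proc, timeout=timeout)
    finally:
        proc.stdout.close()


if __name__ == "__main__":
    print("ignoring child:", run_child_and_stop(IGNORING_CHILD, timeout=1.0))
    print("well-behaved child:", run_child_and_stop(WELL_BEHAVED_CHILD))
```

A child that ignores the signal only ends after the timeout and reports termination by SIGKILL, which matches the "died due to signal 9 (Killed)" line in the log.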
Retrying with the repaired config gives the same results. The triggerd completely ignores these signals, and the TriggerLog stays untouched.

Parameter changed to:
DC_DAEMON_LIST =+ TRIGGERD, JOB_SERVER
Which packages on RHEL 6 were you using when you found the issue? Does it still occur if you run with the latest qpid packages for RHEL 6?
Retested with the current packages:

# rpm -qa | egrep '(qpid|qmf|condor)' | sort
condor-7.6.1-0.6.el6.x86_64
condor-aviary-7.6.1-0.6.el6.x86_64
condor-classads-7.6.1-0.6.el6.x86_64
condor-debuginfo-7.6.1-0.6.el6.x86_64
condor-kbdd-7.6.1-0.6.el6.x86_64
condor-qmf-7.6.1-0.6.el6.x86_64
condor-vm-gahp-7.6.1-0.6.el6.x86_64
condor-wallaby-base-db-1.12-1.el6.noarch
condor-wallaby-client-4.0-6.el6.noarch
condor-wallaby-tools-4.0-6.el6.noarch
python-condorutils-1.5-3.el6.noarch
python-qpid-0.10-1.el6.noarch
python-qpid-qmf-0.10-7.el6.x86_64
qpid-cpp-client-0.10-5.el6.x86_64
qpid-cpp-client-devel-0.10-5.el6.x86_64
qpid-cpp-client-devel-docs-0.10-5.el6.noarch
qpid-cpp-client-rdma-0.10-5.el6.x86_64
qpid-cpp-client-ssl-0.10-5.el6.x86_64
qpid-cpp-server-0.10-5.el6.x86_64
qpid-cpp-server-cluster-0.10-5.el6.x86_64
qpid-cpp-server-devel-0.10-5.el6.x86_64
qpid-cpp-server-rdma-0.10-5.el6.x86_64
qpid-cpp-server-ssl-0.10-5.el6.x86_64
qpid-cpp-server-store-0.10-5.el6.x86_64
qpid-cpp-server-xml-0.10-5.el6.x86_64
qpid-java-client-0.10-6.el6.noarch
qpid-java-common-0.10-6.el6.noarch
qpid-java-example-0.10-6.el6.noarch
qpid-java-jca-0.10-6.el6.noarch
qpid-qmf-0.10-7.el6.x86_64
qpid-tests-0.10-1.el6.noarch
qpid-tools-0.10-4.el6.noarch
rh-qpid-cpp-tests-0.10-5.el6.x86_64
ruby-qpid-0.7.946106-2.el6.x86_64
ruby-qpid-qmf-0.10-7.el6.x86_64

MasterLog:
05/26/11 12:39:02 Got SIGQUIT. Performing fast shutdown.
05/26/11 12:39:02 Trying to update collector <127.0.0.1:9618>
05/26/11 12:39:02 Attempting to send update via UDP to collector localhost.localdomain <127.0.0.1:9618>
05/26/11 12:39:02 NumberOfChildren() returning 6
05/26/11 12:39:02 Send_Signal(): Doing kill(8073,3) [SIGQUIT]
05/26/11 12:39:02 Sent SIGQUIT to COLLECTOR (pid 8073)
05/26/11 12:39:02 Send_Signal(): Doing kill(8083,3) [SIGQUIT]
05/26/11 12:39:02 Sent SIGQUIT to CONFIGD (pid 8083)
05/26/11 12:39:02 Send_Signal(): Doing kill(8079,3) [SIGQUIT]
05/26/11 12:39:02 Sent SIGQUIT to JOB_SERVER (pid 8079)
05/26/11 12:39:02 Send_Signal(): Doing kill(8080,3) [SIGQUIT]
05/26/11 12:39:02 Sent SIGQUIT to NEGOTIATOR (pid 8080)
05/26/11 12:39:02 Send_Signal(): Doing kill(8081,3) [SIGQUIT]
05/26/11 12:39:02 Sent SIGQUIT to STARTD (pid 8081)
05/26/11 12:39:02 Send_Signal(): Doing kill(8082,3) [SIGQUIT]
05/26/11 12:39:02 Sent SIGQUIT to TRIGGERD (pid 8082)
05/26/11 12:39:02 DaemonCore: No more children processes to reap.
05/26/11 12:39:02 The NEGOTIATOR (pid 8080) exited with status 0
05/26/11 12:39:02 ProcAPI::buildFamily failed: parent 8080 not found on system.
05/26/11 12:39:02 ProcAPI::getProcInfo() pid 8080 does not exist.
05/26/11 12:39:02 ProcAPI::getProcInfo() pid 8080 does not exist.
05/26/11 12:39:02 ProcAPI::getProcInfo() pid 8080 does not exist.
05/26/11 12:39:02 ProcAPI::getProcInfo() pid 8080 does not exist.
05/26/11 12:39:02 ProcAPI::getProcInfo() pid 8080 does not exist.
05/26/11 12:39:02 NumberOfChildren() returning 5
05/26/11 12:39:02 DaemonCore: No more children processes to reap.
05/26/11 12:39:02 The CONFIGD (pid 8083) exited with status 0
05/26/11 12:39:02 ProcAPI::buildFamily failed: parent 8083 not found on system.
05/26/11 12:39:02 ProcAPI::getProcInfo() pid 8083 does not exist.
05/26/11 12:39:02 ProcAPI::getProcInfo() pid 8083 does not exist.
05/26/11 12:39:02 ProcAPI::getProcInfo() pid 8083 does not exist.
05/26/11 12:39:02 ProcAPI::getProcInfo() pid 8083 does not exist.
05/26/11 12:39:02 ProcAPI::getProcInfo() pid 8083 does not exist.
05/26/11 12:39:02 NumberOfChildren() returning 4
05/26/11 12:39:02 DaemonCore: No more children processes to reap.
05/26/11 12:39:02 The TRIGGERD (pid 8082) exited with status 0
05/26/11 12:39:02 ProcAPI::buildFamily failed: parent 8082 not found on system.
05/26/11 12:39:02 ProcAPI::getProcInfo() pid 8082 does not exist.
05/26/11 12:39:02 ProcAPI::getProcInfo() pid 8082 does not exist.
05/26/11 12:39:02 ProcAPI::getProcInfo() pid 8082 does not exist.
05/26/11 12:39:02 ProcAPI::getProcInfo() pid 8082 does not exist.
05/26/11 12:39:02 ProcAPI::getProcInfo() pid 8082 does not exist.
05/26/11 12:39:02 NumberOfChildren() returning 3
05/26/11 12:39:03 DaemonCore: No more children processes to reap.

The error does not occur with the latest qpid packages.

>>> VERIFIED
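
When retesting, the difference between the two MasterLog excerpts in this report comes down to whether a daemon "exited with status 0" or "died due to signal 9". A minimal Python sketch for pulling those lines out of a log is shown here. The function name summarize_exits and the regular expression are assumptions based only on the two line formats quoted in this report; other MasterLog message variants are not covered.

```python
import re

# Matches the two exit-report formats quoted in this bug report, e.g.
#   The TRIGGERD (pid 8082) exited with status 0
#   The TRIGGERD (pid 2436) died due to signal 9 (Killed)
EXIT_RE = re.compile(
    r"The (?P<daemon>\w+) \(pid (?P<pid>\d+)\) "
    r"(?:exited with status (?P<status>\d+)"
    r"|died due to signal (?P<signal>\d+))"
)


def summarize_exits(log_text):
    """Return (daemon, pid, kind, code) tuples in the order they appear.

    kind is "exited" for a normal exit (code is the exit status) or
    "signal" when the daemon was terminated (code is the signal number).
    """
    results = []
    for match in EXIT_RE.finditer(log_text):
        if match.group("status") is not None:
            kind, code = "exited", int(match.group("status"))
        else:
            kind, code = "signal", int(match.group("signal"))
        results.append((match.group("daemon"), int(match.group("pid")), kind, code))
    return results


if __name__ == "__main__":
    sample = (
        "05/25/11 11:48:46 The TRIGGERD (pid 2436) died due to signal 9 (Killed)\n"
        "05/26/11 12:39:02 The TRIGGERD (pid 8082) exited with status 0\n"
    )
    for entry in summarize_exits(sample):
        print(entry)
```

Any entry with kind "signal" and code 9 for TRIGGERD would indicate that the master had to fall back to SIGKILL, as in the original report.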
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: N/A