Created attachment 422180 [details]
reproducer

Version-Release number of selected component (if applicable):
condor-low-latency-1.1-0.1.el5
condor-7.4.3-0.17.el5
condor-job-hooks-1.4-0.3.el5

How reproducible:
100%

Steps to Reproduce:
1. set up low latency
2. run test script

Actual results:
It dumps corefile.

Expected results:
There is no corefile.
The script cmd_args.py is an example from http://git.fedorahosted.org/git/grid/carod.git
Created attachment 422182 [details] core dump
MasterLog:
06/08 08:24:10 ******************************************************
06/08 08:24:10 ** condor_master (CONDOR_MASTER) STARTING UP
06/08 08:24:10 ** /usr/sbin/condor_master
06/08 08:24:10 ** SubsystemInfo: name=MASTER type=MASTER(2) class=DAEMON(1)
06/08 08:24:10 ** Configuration: subsystem:MASTER local:<NONE> class:DAEMON
06/08 08:24:10 ** $CondorVersion: 7.4.3 Jun 1 2010 BuildID: RH-7.4.3-0.17.el5 PRE-RELEASE $
06/08 08:24:10 ** $CondorPlatform: X86_64-LINUX_RHEL5 $
06/08 08:24:10 ** PID = 4202
06/08 08:24:10 ** Log last touched 6/8 08:24:09
06/08 08:24:10 ******************************************************
06/08 08:24:10 Using config source: /etc/condor/condor_config
06/08 08:24:10 Using local config sources:
06/08 08:24:10 /var/lib/condor/condor_config.local
06/08 08:24:10 DaemonCore: Command Socket at <10.16.45.161:43759>
06/08 08:24:10 Started DaemonCore process "/usr/sbin/condor_collector", pid and pgroup = 4204
06/08 08:24:13 Started DaemonCore process "/usr/sbin/condor_negotiator", pid and pgroup = 4206
06/08 08:24:13 Started DaemonCore process "/usr/sbin/condor_schedd", pid and pgroup = 4207
06/08 08:24:13 Started DaemonCore process "/usr/sbin/condor_startd", pid and pgroup = 4208
06/08 08:24:13 Started process "/usr/sbin/carod", pid and pgroup = 4209
06/08 08:30:51 Got SIGQUIT. Performing fast shutdown.
06/08 08:30:51 Sent SIGQUIT to COLLECTOR (pid 4204)
06/08 08:30:51 Sent SIGQUIT to LL_DAEMON (pid 4209)
06/08 08:30:51 Sent SIGQUIT to NEGOTIATOR (pid 4206)
06/08 08:30:51 Sent SIGQUIT to SCHEDD (pid 4207)
06/08 08:30:51 Sent SIGQUIT to STARTD (pid 4208)
06/08 08:30:51 The COLLECTOR (pid 4204) exited with status 0
06/08 08:30:51 The NEGOTIATOR (pid 4206) exited with status 0
06/08 08:30:51 The SCHEDD (pid 4207) exited with status 0
06/08 08:30:51 The STARTD (pid 4208) exited with status 0
06/08 08:30:51 The LL_DAEMON (pid 4209) died due to signal 3 (Quit)
06/08 08:30:51 All daemons are gone. Exiting.
06/08 08:30:51 **** condor_master (condor_MASTER) pid 4202 EXITING WITH STATUS 0
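This can be reproduced by restarting condor when it is configured to run the LL_DAEMON. The issue is that carod wasn't catching SIGQUIT, which condor_master sends to its daemons on a fast shutdown. With no handler installed, the default action for SIGQUIT applies: the process is terminated and a core file is written.

Fixed in:
condor-low-latency-1.1-0.2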
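For illustration, catching SIGQUIT in a Python daemon can look roughly like the sketch below. This is not the actual carod change; the names shutdown_requested, handle_shutdown and install_shutdown_handlers are hypothetical, and handling SIGTERM alongside SIGQUIT is an assumption made for the example.

```python
import signal
import threading

# Hypothetical flag that the daemon's main loop would poll in order to
# leave its loop and exit cleanly (not taken from the carod source).
shutdown_requested = threading.Event()


def handle_shutdown(signum, frame):
    """Record the shutdown request instead of taking the default action."""
    shutdown_requested.set()


def install_shutdown_handlers():
    """Install handlers so that SIGQUIT no longer terminates with a core dump.

    SIGTERM is included as an assumption; the bug itself is about SIGQUIT.
    Must be called from the main thread, as signal.signal() requires.
    """
    for sig in (signal.SIGQUIT, signal.SIGTERM):
        signal.signal(sig, handle_shutdown)
```

A daemon would call install_shutdown_handlers() once at startup and then check shutdown_requested.is_set() in its main loop, exiting with status 0 once the flag is set.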
I've tested this on RHEL 4.8/5.5 x i386/x86_64 with condor-low-latency-1.1-0.2 and it works as expected. --> VERIFIED