Bug 601677 - carod dumps corefile if condor is restarting and "grid" queue isn't in broker
Summary: carod dumps corefile if condor is restarting and "grid" queue isn't in broker
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: grid
Version: Development
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: 1.3
Target Release: ---
Assignee: Robert Rati
QA Contact: Martin Kudlej
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2010-06-08 12:55 UTC by Martin Kudlej
Modified: 2010-10-20 11:28 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-10-20 11:28:31 UTC
Target Upstream Version:
Embargoed:


Attachments
reproducer (173 bytes, application/x-sh)
2010-06-08 12:55 UTC, Martin Kudlej
core dump (12.23 KB, text/plain)
2010-06-08 13:04 UTC, Martin Kudlej

Description Martin Kudlej 2010-06-08 12:55:43 UTC
Created attachment 422180
reproducer

Version-Release number of selected component (if applicable):
condor-low-latency-1.1-0.1.el5
condor-7.4.3-0.17.el5
condor-job-hooks-1.4-0.3.el5

How reproducible:
100%


Steps to Reproduce:
1. Set up low latency.
2. Run the attached reproducer script.
  
Actual results:
carod dumps a core file.

Expected results:
No core file is produced.

Comment 1 Martin Kudlej 2010-06-08 13:01:07 UTC
The script cmd_args.py is an example from http://git.fedorahosted.org/git/grid/carod.git

Comment 2 Martin Kudlej 2010-06-08 13:04:44 UTC
Created attachment 422182
core dump

Comment 3 Martin Kudlej 2010-06-08 13:05:43 UTC
MasterLog:
06/08 08:24:10 ******************************************************
06/08 08:24:10 ** condor_master (CONDOR_MASTER) STARTING UP
06/08 08:24:10 ** /usr/sbin/condor_master
06/08 08:24:10 ** SubsystemInfo: name=MASTER type=MASTER(2) class=DAEMON(1)
06/08 08:24:10 ** Configuration: subsystem:MASTER local:<NONE> class:DAEMON
06/08 08:24:10 ** $CondorVersion: 7.4.3 Jun  1 2010 BuildID: RH-7.4.3-0.17.el5 PRE-RELEASE $
06/08 08:24:10 ** $CondorPlatform: X86_64-LINUX_RHEL5 $
06/08 08:24:10 ** PID = 4202
06/08 08:24:10 ** Log last touched 6/8 08:24:09
06/08 08:24:10 ******************************************************
06/08 08:24:10 Using config source: /etc/condor/condor_config
06/08 08:24:10 Using local config sources:
06/08 08:24:10    /var/lib/condor/condor_config.local
06/08 08:24:10 DaemonCore: Command Socket at <10.16.45.161:43759>
06/08 08:24:10 Started DaemonCore process "/usr/sbin/condor_collector", pid and pgroup = 4204
06/08 08:24:13 Started DaemonCore process "/usr/sbin/condor_negotiator", pid and pgroup = 4206
06/08 08:24:13 Started DaemonCore process "/usr/sbin/condor_schedd", pid and pgroup = 4207
06/08 08:24:13 Started DaemonCore process "/usr/sbin/condor_startd", pid and pgroup = 4208
06/08 08:24:13 Started process "/usr/sbin/carod", pid and pgroup = 4209
06/08 08:30:51 Got SIGQUIT.  Performing fast shutdown.
06/08 08:30:51 Sent SIGQUIT to COLLECTOR (pid 4204)
06/08 08:30:51 Sent SIGQUIT to LL_DAEMON (pid 4209)
06/08 08:30:51 Sent SIGQUIT to NEGOTIATOR (pid 4206)
06/08 08:30:51 Sent SIGQUIT to SCHEDD (pid 4207)
06/08 08:30:51 Sent SIGQUIT to STARTD (pid 4208)
06/08 08:30:51 The COLLECTOR (pid 4204) exited with status 0
06/08 08:30:51 The NEGOTIATOR (pid 4206) exited with status 0
06/08 08:30:51 The SCHEDD (pid 4207) exited with status 0
06/08 08:30:51 The STARTD (pid 4208) exited with status 0
06/08 08:30:51 The LL_DAEMON (pid 4209) died due to signal 3 (Quit)
06/08 08:30:51 All daemons are gone.  Exiting.
06/08 08:30:51 **** condor_master (condor_MASTER) pid 4202 EXITING WITH STATUS 0

Comment 4 Robert Rati 2010-06-09 15:34:27 UTC
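This can be reproduced by restarting condor when it is configured to run the LL_DAEMON.  The issue is that carod wasn't catching SIGQUIT, which condor sends on a shutdown. The default action for an unhandled SIGQUIT is to terminate the process and dump core, which is where the core file came from.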

Fixed in:
condor-low-latency-1.1-0.2
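
For illustration only, here is a minimal sketch of the kind of change described above: a Python daemon that handles SIGQUIT the same way as SIGTERM and SIGINT so that a condor shutdown leads to a clean exit instead of a core dump. The actual carod patch is not shown in this report, so the names shutdown_requested, handle_shutdown, install_shutdown_handlers and run are hypothetical, and the cleanup step is only a placeholder.

```python
import os
import signal
import threading

# Hypothetical shutdown flag; carod's real internals are not shown in this report.
shutdown_requested = threading.Event()


def handle_shutdown(signum, frame):
    """Only record the request; the main loop does the actual cleanup."""
    shutdown_requested.set()


def install_shutdown_handlers():
    """Register one handler for every signal that should stop the daemon.

    SIGQUIT is included explicitly because its default action is to
    terminate the process and write a core file.
    """
    for sig in (signal.SIGINT, signal.SIGTERM, signal.SIGQUIT):
        signal.signal(sig, handle_shutdown)


def run():
    """Placeholder main loop: idle until a shutdown signal arrives."""
    install_shutdown_handlers()
    while not shutdown_requested.is_set():
        # Real work (for example polling the broker) would go here.
        shutdown_requested.wait(1.0)
    # Placeholder for cleanup, such as closing broker connections.
    return 0
```

Calling run() from the daemon's entry point and sending the process SIGQUIT (for example with kill -QUIT and its pid) should make it return 0 rather than die from the signal. Setting a flag in the handler and cleaning up in the main loop keeps the handler itself trivial, which is the usual recommendation for signal handlers.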

Comment 5 Martin Kudlej 2010-08-19 11:29:38 UTC
I've tested this on RHEL 4.8 and 5.5, on both i386 and x86_64, with condor-low-latency-1.1-0.2, and it works as expected. --> VERIFIED

