Bug 601677 - carod dumps corefile if condor is restarting and "grid" queue isn't in broker
Status: CLOSED ERRATA
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: grid
Version: Development
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: 1.3
Target Release: ---
Assigned To: Robert Rati
QA Contact: Martin Kudlej
Depends On:
Blocks:
Reported: 2010-06-08 08:55 EDT by Martin Kudlej
Modified: 2010-10-20 07:28 EDT

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2010-10-20 07:28:31 EDT
Attachments
reproducer (173 bytes, application/x-sh)
2010-06-08 08:55 EDT, Martin Kudlej
core dump (12.23 KB, text/plain)
2010-06-08 09:04 EDT, Martin Kudlej

Description Martin Kudlej 2010-06-08 08:55:43 EDT
Created attachment 422180
reproducer

Version-Release number of selected component (if applicable):
condor-low-latency-1.1-0.1.el5
condor-7.4.3-0.17.el5
condor-job-hooks-1.4-0.3.el5

How reproducible:
100%


Steps to Reproduce:
1. Set up low-latency scheduling (condor-low-latency, which runs carod).
2. Run the test script (the attached reproducer).

Actual results:
carod dumps a core file.

Expected results:
No core file is produced.
Comment 1 Martin Kudlej 2010-06-08 09:01:07 EDT
The script cmd_args.py is an example from http://git.fedorahosted.org/git/grid/carod.git
Comment 2 Martin Kudlej 2010-06-08 09:04:44 EDT
Created attachment 422182
core dump
Comment 3 Martin Kudlej 2010-06-08 09:05:43 EDT
MasterLog:
06/08 08:24:10 ******************************************************
06/08 08:24:10 ** condor_master (CONDOR_MASTER) STARTING UP
06/08 08:24:10 ** /usr/sbin/condor_master
06/08 08:24:10 ** SubsystemInfo: name=MASTER type=MASTER(2) class=DAEMON(1)
06/08 08:24:10 ** Configuration: subsystem:MASTER local:<NONE> class:DAEMON
06/08 08:24:10 ** $CondorVersion: 7.4.3 Jun  1 2010 BuildID: RH-7.4.3-0.17.el5 PRE-RELEASE $
06/08 08:24:10 ** $CondorPlatform: X86_64-LINUX_RHEL5 $
06/08 08:24:10 ** PID = 4202
06/08 08:24:10 ** Log last touched 6/8 08:24:09
06/08 08:24:10 ******************************************************
06/08 08:24:10 Using config source: /etc/condor/condor_config
06/08 08:24:10 Using local config sources:
06/08 08:24:10    /var/lib/condor/condor_config.local
06/08 08:24:10 DaemonCore: Command Socket at <10.16.45.161:43759>
06/08 08:24:10 Started DaemonCore process "/usr/sbin/condor_collector", pid and pgroup = 4204
06/08 08:24:13 Started DaemonCore process "/usr/sbin/condor_negotiator", pid and pgroup = 4206
06/08 08:24:13 Started DaemonCore process "/usr/sbin/condor_schedd", pid and pgroup = 4207
06/08 08:24:13 Started DaemonCore process "/usr/sbin/condor_startd", pid and pgroup = 4208
06/08 08:24:13 Started process "/usr/sbin/carod", pid and pgroup = 4209
06/08 08:30:51 Got SIGQUIT.  Performing fast shutdown.
06/08 08:30:51 Sent SIGQUIT to COLLECTOR (pid 4204)
06/08 08:30:51 Sent SIGQUIT to LL_DAEMON (pid 4209)
06/08 08:30:51 Sent SIGQUIT to NEGOTIATOR (pid 4206)
06/08 08:30:51 Sent SIGQUIT to SCHEDD (pid 4207)
06/08 08:30:51 Sent SIGQUIT to STARTD (pid 4208)
06/08 08:30:51 The COLLECTOR (pid 4204) exited with status 0
06/08 08:30:51 The NEGOTIATOR (pid 4206) exited with status 0
06/08 08:30:51 The SCHEDD (pid 4207) exited with status 0
06/08 08:30:51 The STARTD (pid 4208) exited with status 0
06/08 08:30:51 The LL_DAEMON (pid 4209) died due to signal 3 (Quit)
06/08 08:30:51 All daemons are gone.  Exiting.
06/08 08:30:51 **** condor_master (condor_MASTER) pid 4202 EXITING WITH STATUS 0
Comment 4 Robert Rati 2010-06-09 11:34:27 EDT
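This can be reproduced by restarting condor when it is configured to run the LL_DAEMON.  The issue is that carod wasn't catching SIGQUIT, which condor_master sends to its daemons on a fast shutdown (see the MasterLog in comment 3).  The default action for an unhandled SIGQUIT is to terminate the process and dump core.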

Fixed in:
condor-low-latency-1.1-0.2
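
For illustration only, here is a minimal sketch of how a Python daemon can catch SIGQUIT so that a fast shutdown ends cleanly instead of producing a core file. It is not the actual carod patch: the names shutdown_requested, handle_shutdown, install_shutdown_handlers and main_loop are hypothetical, and handling SIGTERM alongside SIGQUIT is an assumption about what a daemon run under condor_master would also want.

```python
import os
import signal
import threading

# Hypothetical shutdown flag; carod's real internals are not shown in this report.
shutdown_requested = threading.Event()


def handle_shutdown(signum, frame):
    """Record the shutdown request instead of letting the default action run."""
    shutdown_requested.set()


def install_shutdown_handlers():
    # Without a handler, SIGQUIT terminates the process and dumps core.
    # SIGTERM is included as an assumption (graceful shutdown).
    # signal.signal() must be called from the main thread.
    for sig in (signal.SIGTERM, signal.SIGQUIT):
        signal.signal(sig, handle_shutdown)


def main_loop(poll_interval=0.1):
    """Run until a shutdown signal arrives, then return exit status 0."""
    install_shutdown_handlers()
    while not shutdown_requested.is_set():
        # A real daemon would do its work here; this sketch only waits.
        shutdown_requested.wait(poll_interval)
    return 0


if __name__ == "__main__":
    raise SystemExit(main_loop())
```

With a handler like this installed, the process exits with status 0 on SIGQUIT, so the master would be expected to log a normal exit rather than "died due to signal 3 (Quit)".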
Comment 5 Martin Kudlej 2010-08-19 07:29:38 EDT
I've tested this on RHEL 4.8 and 5.5 (i386 and x86_64) with condor-low-latency-1.1-0.2 and it works as expected. --> VERIFIED