Bug 698946 - gridmanager crashes when an EC2/e job is submitted
Summary: gridmanager crashes when an EC2/e job is submitted
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor
Version: Development
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: 2.0
Assignee: Robert Rati
QA Contact: Luigi Toscano
URL:
Whiteboard:
Depends On:
Blocks: 679553
 
Reported: 2011-04-22 11:48 UTC by Luigi Toscano
Modified: 2011-06-27 14:11 UTC
CC List: 2 users

Fixed In Version: condor-7.6.1-0.4
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-06-27 14:11:33 UTC
Target Upstream Version:


Attachments
Sample job used for testing (387 bytes, text/plain)
2011-04-26 17:57 UTC, Luigi Toscano
no flags

Description Luigi Toscano 2011-04-22 11:48:08 UTC
Description of problem:
When an EC2/e job is submitted, gridmanager crashes.

SchedLog:
04/22/11 13:35:47 (pid:14819) Started condor_gmanager for owner condor pid=14918
04/22/11 13:35:55 (pid:14819) condor_gridmanager (PID 14918, owner condor) exited due to signal 11 (Segmentation fault).


GridmanagerLog.<user>:
04/22/11 13:35:50 [14918] ================================>  EC2Job::EC2Job 1 
04/22/11 13:35:50 [14918] Found job 2.0 --- inserting
04/22/11 13:35:50 [14918] (2.0) doEvaluateState called: gmState GM_HOLD, condorState 1
Stack dump for process 14918 at timestamp 1303472155 (11 frames)
condor_gridmanager(dprintf_dump_stack+0x4a)[0x81aedba]
condor_gridmanager[0x81d8a66]
[0xcfc420]
/lib/libc.so.6(cfree+0x34)[0x378ae4]
condor_gridmanager(_ZN6EC2JobD0Ev+0x59)[0x80b6f69]
condor_gridmanager(_Z15doContactScheddv+0xe3a)[0x80d8a2a]
condor_gridmanager(_ZN12TimerManager7TimeoutEv+0x390)[0x8158df0]
condor_gridmanager(_ZN10DaemonCore6DriverEv+0x282)[0x81460a2]
condor_gridmanager(main+0x116f)[0x815cb5f]
/lib/libc.so.6(__libc_start_main+0xdc)[0x324e9c]
condor_gridmanager[0x80b58c1]
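The mangled frames in the dump, such as `_ZN6EC2JobD0Ev`, are Itanium C++ ABI names, which GCC's libstdc++ can decode through `abi::__cxa_demangle` (the `c++filt` tool from binutils does the same from a shell). The sketch below assumes a GCC/Linux toolchain; `demangle` and `frameFunction` are hypothetical helper names. Decoded, the crashing frames read `EC2Job::~EC2Job()` (the `D0` deleting-destructor variant) called from `doContactSchedd()`, so the segfault occurs inside `free()` while an `EC2Job` object is being deleted.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Demangle one Itanium-ABI C++ symbol (the scheme GCC uses on Linux).
// Returns the input unchanged if it is not a valid mangled name.
std::string demangle(const std::string& mangled) {
    int status = 0;
    char* out = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) {
        return mangled;
    }
    std::string result(out);
    std::free(out);  // __cxa_demangle allocates its result with malloc
    return result;
}

// Extract and demangle the symbol from a glibc backtrace_symbols()-style
// frame such as "prog(_ZN6EC2JobD0Ev+0x59)[0x80b6f69]". Frames that carry
// no symbol, e.g. "[0xcfc420]", yield an empty string.
std::string frameFunction(const std::string& frame) {
    std::string::size_type open = frame.find('(');
    if (open == std::string::npos) return "";
    std::string::size_type end = frame.find_first_of("+)", open + 1);
    if (end == std::string::npos) return "";
    std::string symbol = frame.substr(open + 1, end - open - 1);
    return symbol.empty() ? "" : demangle(symbol);
}
```

Feeding each frame line of the dump through `frameFunction` turns `condor_gridmanager(_Z15doContactScheddv+0xe3a)[0x80d8a2a]` into `doContactSchedd()`; frames without a symbol are simply skipped.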


The testing environment was configured both manually and using wallaby.


Version-Release number of selected component (if applicable):
condor-7.6.1-0.1.el5
condor-classads-7.6.1-0.1.el5
condor-ec2-enhanced-hooks-1.1-3.el5
condor-qmf-7.6.1-0.1.el5
condor-wallaby-client-4.0-5.el5
python-condorec2e-1.1-3.el5
python-condorutils-1.5-2.el5

Found on RHEL5.6 and RHEL6.1Beta, i386/x86_64.

Comment 1 Luigi Toscano 2011-04-26 17:57:05 UTC
Created attachment 495003
Sample job used for testing

Comment 2 Robert Rati 2011-04-26 19:22:07 UTC
The issue was uninitialized pointers being freed. The EC2Job constructor
could return before all of its pointer members were initialized, and the
destructor then freed them.

Fixed upstream on V7_6-branch
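The failure pattern described in Comment 2 (a constructor that returns early before every pointer member is set, followed by a destructor that frees those members) can be sketched in C++ as below. This is a hypothetical illustration, not the actual EC2Job code or the upstream patch; `SketchJob`, `copyString`, and the member names are invented. The defensive fix shown is to null every pointer in the member initializer list, which makes `free()` in the destructor safe on every path because `free(NULL)` is a no-op.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical helper: malloc-backed string copy, released with free().
static char* copyString(const char* s) {
    std::size_t n = std::strlen(s) + 1;
    char* p = static_cast<char*>(std::malloc(n));
    if (p != nullptr) std::memcpy(p, s, n);
    return p;
}

// Hypothetical stand-in for a job object such as EC2Job; the member
// names are illustrative and are not taken from the real class.
class SketchJob {
public:
    explicit SketchJob(const char* remoteId)
        // Fix: every pointer member is nulled here, before the body runs,
        // so no return path can leave one holding an indeterminate value.
        : remoteJobId_(nullptr), keyPair_(nullptr), error_(nullptr) {
        if (remoteId == nullptr || remoteId[0] == '\0') {
            // Early exit on bad input. If the members were assigned only
            // further down (the bug pattern), they would still hold
            // garbage here and the destructor would free() it.
            error_ = copyString("missing remote job id");
            return;
        }
        remoteJobId_ = copyString(remoteId);
        keyPair_ = copyString("keypair");
    }

    ~SketchJob() {
        // free(nullptr) does nothing, so this is safe on every path.
        std::free(remoteJobId_);
        std::free(keyPair_);
        std::free(error_);
    }

    // Owning raw pointers: forbid copies to avoid double frees.
    SketchJob(const SketchJob&) = delete;
    SketchJob& operator=(const SketchJob&) = delete;

    bool ok() const { return error_ == nullptr; }
    const char* remoteJobId() const { return remoteJobId_; }

private:
    char* remoteJobId_;
    char* keyPair_;
    char* error_;
};
```

Constructing `SketchJob("")` takes the early-return path and is still destroyed cleanly; without the initializer list, the same path would pass indeterminate pointers to `free()`, which is undefined behavior and matches the `cfree` frame under the destructor in the stack dump. On compilers without C++11 support (RHEL 5 ships GCC 4.1), write `NULL` instead of `nullptr`.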

Comment 3 Luigi Toscano 2011-05-17 17:11:20 UTC
Fix applied; condor_gridmanager no longer crashes.

Verified on RHEL5.6/6.1, i386/x86_64.
condor-7.6.1-0.4

