Description of problem:
When an EC2/e job is submitted, the gridmanager crashes.

SchedLog:
04/22/11 13:35:47 (pid:14819) Started condor_gmanager for owner condor pid=14918
04/22/11 13:35:55 (pid:14819) condor_gridmanager (PID 14918, owner condor) exited due to signal 11 (Segmentation fault).

GridmanagerLog.<user>:
04/22/11 13:35:50 [14918] ================================> EC2Job::EC2Job 1
04/22/11 13:35:50 [14918] Found job 2.0 --- inserting
04/22/11 13:35:50 [14918] (2.0) doEvaluateState called: gmState GM_HOLD, condorState 1
Stack dump for process 14918 at timestamp 1303472155 (11 frames)
condor_gridmanager(dprintf_dump_stack+0x4a)[0x81aedba]
condor_gridmanager[0x81d8a66]
[0xcfc420]
/lib/libc.so.6(cfree+0x34)[0x378ae4]
condor_gridmanager(_ZN6EC2JobD0Ev+0x59)[0x80b6f69]
condor_gridmanager(_Z15doContactScheddv+0xe3a)[0x80d8a2a]
condor_gridmanager(_ZN12TimerManager7TimeoutEv+0x390)[0x8158df0]
condor_gridmanager(_ZN10DaemonCore6DriverEv+0x282)[0x81460a2]
condor_gridmanager(main+0x116f)[0x815cb5f]
/lib/libc.so.6(__libc_start_main+0xdc)[0x324e9c]
condor_gridmanager[0x80b58c1]

The testing environment was configured both manually and using wallaby.

Version-Release number of selected component (if applicable):
condor-7.6.1-0.1.el5
condor-classads-7.6.1-0.1.el5
condor-ec2-enhanced-hooks-1.1-3.el5
condor-qmf-7.6.1-0.1.el5
condor-wallaby-client-4.0-5.el5
python-condorec2e-1.1-3.el5
python-condorutils-1.5-2.el5

Found on RHEL5.6 and RHEL6.1Beta, i386/x86_64.
Created attachment 495003: Sample job used for testing
The issue was uninitialized pointers being freed: the EC2Job constructor could exit without initializing all of its pointers, and the destructor then called free() on them. Fixed upstream on V7_6-branch.
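For illustration only, here is a minimal sketch of the pattern that avoids this class of crash. It is not the actual EC2Job code or the upstream patch; the class SketchJob, its members, and the helper copy_string are hypothetical names. Every pointer member is set to null before any early return, so the destructor's free() calls are always safe.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: heap-allocated copy of a C string (caller frees).
static char *copy_string(const char *src) {
    assert(src != nullptr);
    std::size_t len = std::strlen(src) + 1;
    char *dst = static_cast<char *>(std::malloc(len));
    if (dst != nullptr) {
        std::memcpy(dst, src, len);
    }
    return dst;
}

// Hypothetical stand-in for a job object; NOT the real EC2Job class.
class SketchJob {
public:
    char *access_key;
    char *secret_key;
    bool valid;

    SketchJob(const char *access, const char *secret)
        // Initialize every pointer first, before any path can return.
        : access_key(nullptr), secret_key(nullptr), valid(false)
    {
        if (access == nullptr) {
            return;  // early exit: both pointers are still null
        }
        access_key = copy_string(access);

        if (secret == nullptr) {
            return;  // early exit: secret_key is still null
        }
        secret_key = copy_string(secret);
        valid = (access_key != nullptr && secret_key != nullptr);
    }

    ~SketchJob() {
        // free(nullptr) is a no-op, so this is safe after an early exit.
        std::free(access_key);
        std::free(secret_key);
    }

    // Non-copyable, to avoid double frees of the owned buffers.
    SketchJob(const SketchJob &) = delete;
    SketchJob &operator=(const SketchJob &) = delete;
};
```

Without the member initializer list, the early-return paths would leave access_key or secret_key holding indeterminate values, and the destructor would pass them to free(), which matches the cfree frame under the EC2Job destructor in the stack dump above.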
Fix applied; condor_gridmanager no longer crashes. Verified with condor-7.6.1-0.4 on RHEL5.6/6.1, i386/x86_64.