Bug 808137

Summary: condor_startd reconfig from partitionable slots to static slots crashes if job is running
Product: Red Hat Enterprise MRG
Reporter: Matthew Farrellee <matt>
Component: condor
Assignee: grid-maint-list <grid-maint-list>
Status: CLOSED WONTFIX
QA Contact: MRG Quality Engineering <mrgqe-bugs>
Severity: low
Priority: low
Version: 1.3
CC: matt, tstclair
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: All   
Doc Type: Bug Fix
Last Closed: 2016-05-26 19:57:30 UTC

Description Matthew Farrellee 2012-03-29 17:05:16 UTC
Description of problem:

The condor_startd accepts a reconfig that changes its slot configuration. If a job is running when the configuration switches from partitionable slots to static slots, the startd crashes with a segmentation fault.


Version-Release number of selected component (if applicable):

Likely all versions; confirmed on 7.6.7-0.8.


How reproducible:

100%


Steps to Reproduce:
0. Enable partitionable slots

 SLOT_TYPE_1=cpus=100%
 NUM_SLOTS=1
 NUM_SLOTS_TYPE_1 = 1
 SLOT_TYPE_1_PARTITIONABLE=true

1. service condor start

2. printf 'cmd=/bin/sleep\nargs=1d\nqueue\n' | condor_submit

3. Wait for job to start running

4. Disable partitionable slots

 #SLOT_TYPE_1=cpus=100%
 #NUM_SLOTS=1
 #NUM_SLOTS_TYPE_1 = 1
 #SLOT_TYPE_1_PARTITIONABLE=true

5. condor_reconfig
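
Step 4 can be scripted instead of edited by hand. The following is only a sketch: the helper name disable_pslot_knobs is hypothetical, and it assumes the four knobs from step 0 each sit on their own line in a single local config file whose path you pass in.

```shell
# Hypothetical helper (not part of Condor): comment out the
# partitionable-slot knobs from step 0 in the given config file.
disable_pslot_knobs() {
    config_file=$1
    tmp_file=$(mktemp) || return 1
    # Prefix each active knob assignment with '#'.
    # Lines that are already commented out do not match and are left alone.
    if sed -E 's/^[[:space:]]*(SLOT_TYPE_1|NUM_SLOTS|NUM_SLOTS_TYPE_1|SLOT_TYPE_1_PARTITIONABLE)[[:space:]]*=/#&/' \
        "$config_file" > "$tmp_file"; then
        # Write back through the original file to keep its ownership and mode.
        cat "$tmp_file" > "$config_file"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp_file"
    return "$status"
}
```

After running the helper on the local config file, step 5 (condor_reconfig) applies the change as before.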


Actual results:

In MasterLog -

 The STARTD (pid 10060) died due to signal 11 (Segmentation fault)

In StartLog -

 Stack dump for process 10060 at timestamp 1333040167 (10 frames)
 condor_startd(dprintf_dump_stack+0x63)[0x532f93]
 condor_startd(linux_sig_coredump(int)+0x40)[0x4aeeb0]
 /lib64/libpthread.so.0[0x376300f4a0]
 condor_startd(ResMgr::reconfig_resources()+0x1fb)[0x4830eb]
 condor_startd(main_config()+0x17)[0x470437]
 condor_startd(handle_dc_sighup(Service*, int)+0x1a)[0x4b110a]
 condor_startd(DaemonCore::Driver()+0x231)[0x4a2e91]
 condor_startd(main+0x10fb)[0x4b08ab]
 /lib64/libc.so.6(__libc_start_main+0xfd)[0x3762c1ec5d]
 condor_startd[0x469b99]
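
To confirm the crash without reading the log by hand, the MasterLog line above can be matched mechanically. This is a minimal sketch: the helper name startd_segfault_pids is hypothetical, the log path is whatever your installation uses, and the pattern assumes the message wording shown in this report.

```shell
# Hypothetical helper (not part of Condor): print the pid of every startd
# that the master reported as killed by signal 11 in the given MasterLog.
startd_segfault_pids() {
    # -n with the trailing 'p' prints only the lines where the substitution matched.
    sed -n -E 's/.*The STARTD \(pid ([0-9]+)\) died due to signal 11 .*/\1/p' "$1"
}
```

An empty result after step 5 would suggest the startd survived the reconfig; any printed pid can then be looked up in StartLog to find the matching stack dump.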


Expected results:

No crash.

Comment 2 Anne-Louise Tangring 2016-05-26 19:57:30 UTC
MRG-Grid is in maintenance and only customer escalations will be considered. This issue can be reopened if a customer escalation associated with it occurs.