Bug 808137 - condor_startd reconfig from partitionable slots to static slots crashes if job is running
Status: CLOSED WONTFIX
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor
Version: 1.3
Hardware: All
OS: All
Priority: low
Severity: low
Assigned To: grid-maint-list
QA Contact: MRG Quality Engineering
Reported: 2012-03-29 13:05 EDT by Matthew Farrellee
Modified: 2016-05-26 15:57 EDT
Doc Type: Bug Fix
Last Closed: 2016-05-26 15:57:30 EDT

Description Matthew Farrellee 2012-03-29 13:05:16 EDT
Description of problem:

The condor_startd allows a reconfig to change its slot configuration. If a job is running when the reconfig switches from partitionable slots to static slots, the startd crashes with a segmentation fault.


Version-Release number of selected component (if applicable):

Likely all, definitely 7.6.7-0.8.


How reproducible:

100%


Steps to Reproduce:
0. Enable partitionable slots

 SLOT_TYPE_1=cpus=100%
 NUM_SLOTS=1
 NUM_SLOTS_TYPE_1 = 1
 SLOT_TYPE_1_PARTITIONABLE=true

1. service condor start

2. printf 'cmd=/bin/sleep\nargs=1d\nqueue\n' | condor_submit

3. Wait for job to start running

4. Disable partitionable slots

 #SLOT_TYPE_1=cpus=100%
 #NUM_SLOTS=1
 #NUM_SLOTS_TYPE_1 = 1
 #SLOT_TYPE_1_PARTITIONABLE=true

5. condor_reconfig
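
Step 4 can be scripted when reproducing repeatedly. The following is only a sketch: the function name disable_partitionable_slots is hypothetical, and the path to the configuration file that holds the four settings from step 0 must be supplied by the caller, since this report does not say where they were placed.

```shell
# Comment out the four partitionable-slot settings from step 0 (this is step 4).
# Hypothetical helper: the config file path is passed in as $1.
disable_partitionable_slots() {
    conf_file="$1"
    # Prefix each matching setting with '#', keeping any leading whitespace.
    # Lines that are already commented out, and unrelated lines, are untouched.
    sed -E 's/^([[:space:]]*)((SLOT_TYPE_1|NUM_SLOTS|NUM_SLOTS_TYPE_1|SLOT_TYPE_1_PARTITIONABLE)[[:space:]]*=)/\1#\2/' \
        "$conf_file" > "$conf_file.tmp" && mv "$conf_file.tmp" "$conf_file"
}
```

Calling the function on the relevant config file and then running condor_reconfig corresponds to steps 4 and 5.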


Actual results:

In MasterLog -

 The STARTD (pid 10060) died due to signal 11 (Segmentation fault)

In StartLog -

 Stack dump for process 10060 at timestamp 1333040167 (10 frames)
 condor_startd(dprintf_dump_stack+0x63)[0x532f93]
 condor_startd(linux_sig_coredump(int)+0x40)[0x4aeeb0]
 /lib64/libpthread.so.0[0x376300f4a0]
 condor_startd(ResMgr::reconfig_resources()+0x1fb)[0x4830eb]
 condor_startd(main_config()+0x17)[0x470437]
 condor_startd(handle_dc_sighup(Service*, int)+0x1a)[0x4b110a]
 condor_startd(DaemonCore::Driver()+0x231)[0x4a2e91]
 condor_startd(main+0x10fb)[0x4b08ab]
 /lib64/libc.so.6(__libc_start_main+0xfd)[0x3762c1ec5d]
 condor_startd[0x469b99]
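
Reading the dump top-down, the first two condor_startd frames appear to be the signal handler writing the dump itself, so the fault most likely occurred in ResMgr::reconfig_resources(), reached from the SIGHUP handler through main_config(). A small sketch for pulling just the symbol names out of such a dump (the function name startd_frames is hypothetical, and the pattern assumes the frame format shown above):

```shell
# Print the symbol of every condor_startd frame in a stack dump read on stdin.
# Frames without a symbol (for example "condor_startd[0x469b99]") and frames
# from shared libraries are skipped.
startd_frames() {
    sed -n 's/^[[:space:]]*condor_startd(\(.*\)+0x[0-9a-f]*)\[0x[0-9a-f]*]$/\1/p'
}
```

The dump lines can be piped into startd_frames to get one symbol per line, in the same order as the original frames.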


Expected results:

No crash.
Comment 2 Anne-Louise Tangring 2016-05-26 15:57:30 EDT
MRG-Grid is in maintenance and only customer escalations will be considered. This issue can be reopened if a customer escalation associated with it occurs.
