
Bug 808137

Summary:	condor_startd reconfig from partitionable slots to static slots crashes if job is running
Product:	Red Hat Enterprise MRG	Reporter:	Matthew Farrellee <matt>
Component:	condor	Assignee:	grid-maint-list <grid-maint-list>
Status:	CLOSED WONTFIX	QA Contact:	MRG Quality Engineering <mrgqe-bugs>
Severity:	low	Docs Contact:
Priority:	low
Version:	1.3	CC:	matt, tstclair
Target Milestone:	---
Target Release:	---
Hardware:	All
OS:	All
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2016-05-26 19:57:30 UTC	Type:	---

Description Matthew Farrellee 2012-03-29 17:05:16 UTC

Description of problem:

The condor_startd allows a reconfig to change its slot configuration. If a job is running when the reconfig switches the startd from partitionable slots to static slots, the startd crashes with a segmentation fault.


Version-Release number of selected component (if applicable):

Likely all versions; confirmed on 7.6.7-0.8.


How reproducible:

100%


Steps to Reproduce:
0. Enable partitionable slots

 SLOT_TYPE_1=cpus=100%
 NUM_SLOTS=1
 NUM_SLOTS_TYPE_1 = 1
 SLOT_TYPE_1_PARTITIONABLE=true

1. service condor start

2. printf 'cmd=/bin/sleep\nargs=1d\nqueue\n' | condor_submit

3. Wait for job to start running

4. Disable partitionable slots

 #SLOT_TYPE_1=cpus=100%
 #NUM_SLOTS=1
 #NUM_SLOTS_TYPE_1 = 1
 #SLOT_TYPE_1_PARTITIONABLE=true

5. condor_reconfig
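
Steps 2, 3 and 5 above can be scripted; the following is a minimal sketch, not a tested harness. The helper names `submit_sleep_job` and `wait_for_running_job` are hypothetical and not part of Condor. The sketch assumes `condor_submit` and `condor_q` are on the PATH and that a `JobStatus` of 2 means the job is running. `printf` is used instead of `echo` because shells differ in whether `echo` expands `\n`.

```shell
# Hypothetical helpers for the reproduction steps; not shipped with Condor.

# Step 2: submit a one-day sleep job, one submit command per line.
submit_sleep_job() {
    printf 'cmd=/bin/sleep\nargs=1d\nqueue\n' | condor_submit
}

# Step 3: poll the queue until at least one job has JobStatus == 2 (running).
# $1 = maximum number of polls, one second apart (default 60).
# Returns 0 as soon as a running job is seen, 1 if the polls run out.
wait_for_running_job() {
    tries=${1:-60}
    while [ "$tries" -gt 0 ]; do
        running=$(condor_q -constraint 'JobStatus == 2' -format '%d\n' ClusterId)
        if [ -n "$running" ]; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
```

A possible invocation, with step 4 (editing the configuration) still done by hand before the reconfig: `submit_sleep_job && wait_for_running_job 120`, then edit the configuration and run `condor_reconfig`.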


Actual results:

In MasterLog:

 The STARTD (pid 10060) died due to signal 11 (Segmentation fault)
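
To check for this crash without reading the log by hand, a filter along these lines might help. It is a sketch: the function name `find_daemon_crashes` is hypothetical, and the pattern is derived only from the single MasterLog line quoted above, so other Condor versions may word the message differently.

```shell
# Hypothetical helper: reads MasterLog-style text on stdin and prints
# "DAEMON PID SIGNAL" for every line reporting a daemon killed by a signal.
# The pattern is an assumption based on the one line quoted in this report.
find_daemon_crashes() {
    sed -n 's/.*The \([A-Za-z_]*\) (pid \([0-9][0-9]*\)) died due to signal \([0-9][0-9]*\).*/\1 \2 \3/p'
}
```

For example, `find_daemon_crashes < "$(condor_config_val MASTER_LOG)"` should print `STARTD 10060 11` for the line above, assuming `MASTER_LOG` points at the master's log file on the affected host.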

In StartLog:

 Stack dump for process 10060 at timestamp 1333040167 (10 frames)
 condor_startd(dprintf_dump_stack+0x63)[0x532f93]
 condor_startd(linux_sig_coredump(int)+0x40)[0x4aeeb0]
 /lib64/libpthread.so.0[0x376300f4a0]
 condor_startd(ResMgr::reconfig_resources()+0x1fb)[0x4830eb]
 condor_startd(main_config()+0x17)[0x470437]
 condor_startd(handle_dc_sighup(Service*, int)+0x1a)[0x4b110a]
 condor_startd(DaemonCore::Driver()+0x231)[0x4a2e91]
 condor_startd(main+0x10fb)[0x4b08ab]
 /lib64/libc.so.6(__libc_start_main+0xfd)[0x3762c1ec5d]
 condor_startd[0x469b99]
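
The frames inside the condor_startd binary can be pulled out of such a dump for later symbol lookup. The following is a sketch under assumptions: the function name `startd_frames` is hypothetical, and the pattern matches only frames that carry a symbol, in the layout shown above.

```shell
# Hypothetical helper: reads a StartLog stack dump on stdin and prints
# "ADDRESS SYMBOL+OFFSET" for each condor_startd frame that has a symbol.
# Frames from shared libraries and the bare "condor_startd[...]" frame
# are skipped because they do not match the pattern.
startd_frames() {
    sed -n 's/^ *condor_startd(\(.*\))\[\(0x[0-9a-f]*\)] *$/\2 \1/p'
}
```

Applied to the dump above, the faulting frame comes out as `0x4830eb ResMgr::reconfig_resources()+0x1fb`. If matching debug symbols are installed, an address can then be passed to something like `addr2line -f -C -e /usr/sbin/condor_startd 0x4830eb`; the binary path here is an assumption and may differ per installation.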


Expected results:

No crash.

Comment 2 Anne-Louise Tangring 2016-05-26 19:57:30 UTC

MRG-Grid is in maintenance and only customer escalations will be considered. This issue can be reopened if a customer escalation associated with it occurs.