Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 519437

Summary:	Race condition can cause crash in collector and schedd if query workers are enabled
Product:	Red Hat Enterprise MRG	Reporter:	Robert Rati <rrati>
Component:	grid	Assignee:	Robert Rati <rrati>
Status:	CLOSED ERRATA	QA Contact:	Martin Kudlej <mkudlej>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	1.1	CC:	lbrindle, matt, mkudlej
Target Milestone:	1.2
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:	Grid bug fix C: The collector and schedd were susceptible to a race condition. C: The race condition could cause the collector and schedd to core dump in some circumstances. F: Ensured that the termination signal is processed in the correct order. R: The race condition no longer exists, and core dumps no longer occur. A race condition was causing the collector and schedd to core dump in some circumstances. The race condition was rectified by ensuring that the termination signal is handled correctly, and the core dumps no longer occur.	Story Points:	---
Clone Of:		Environment:
Last Closed:	2009-12-03 09:19:15 UTC	Type:	---
Bug Depends On:
Bug Blocks:	527551

Description Robert Rati 2009-08-26 15:47:27 UTC

Description of problem:
The collector and schedd are susceptible to a race condition that can cause a core dump.  If a shutdown request is sent to the collector or schedd while it is processing a command that forces it to fork a subprocess (condor_q/condor_status), the termination signal is processed before the child process is cleaned up.  When the child process's information is later cleaned up, a core dump results.
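
The ordering problem described above can be illustrated with a toy model. This is only a sketch, not HTCondor code and not the actual patch: the class ToyDaemon, its methods, and the pid 101 are all hypothetical, and "reap pending children before tearing down state" is an assumed way of getting the order right.

```python
class ToyDaemon:
    """Toy model of a daemon that forks query workers (hypothetical, not HTCondor code)."""

    def __init__(self, reap_before_shutdown):
        self.reap_before_shutdown = reap_before_shutdown
        self.workers = {}       # pid -> query handled by the forked worker
        self.shut_down = False

    def fork_worker(self, pid, query):
        # Record bookkeeping for a forked query worker.
        self.workers[pid] = query

    def on_child_exit(self, pid):
        if self.shut_down:
            # The worker table has already been torn down. In a real daemon
            # this is the point where freed state would be touched.
            raise RuntimeError("reaped worker %d after shutdown" % pid)
        del self.workers[pid]

    def on_sigterm(self, pending_exits):
        if self.reap_before_shutdown:
            # Assumed ordering fix: clean up exited children first.
            for pid in pending_exits:
                self.on_child_exit(pid)
            pending_exits.clear()
        self.workers = None
        self.shut_down = True


def run(reap_before_shutdown):
    """Replay the sequence: fork a worker, receive SIGTERM, then reap."""
    daemon = ToyDaemon(reap_before_shutdown)
    daemon.fork_worker(101, "condor_status -any")
    pending_exits = [101]   # the worker has exited but is not yet reaped
    daemon.on_sigterm(pending_exits)
    try:
        for pid in list(pending_exits):
            daemon.on_child_exit(pid)
    except RuntimeError:
        return "crash"
    return "clean"
```

Calling run(False) replays the faulty order (shutdown first, cleanup second), while run(True) replays the order in which child cleanup completes before shutdown.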

Comment 1 Robert Rati 2009-08-26 18:01:20 UTC

This can be reproduced on the collector by issuing:

condor_status -any & sleep 1 & killall -TERM condor_collector

Comment 2 Robert Rati 2009-08-27 15:38:35 UTC

Fixed in:
condor-7.3.2-0.5

Comment 3 Martin Kudlej 2009-10-09 10:55:37 UTC

I set CREATE_CORE_FILES = TRUE and set ulimit -c to a nonzero value. After running the command from comment 1, there is no core dump in the LOG directory.
Tested on RHEL 5.4/4.8 x86_64/i386 with condor-7.4.0-0.(5.el5/4.el4) and it works --> VERIFIED
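
A check like the one above could be scripted roughly as follows. This is a sketch under assumptions: the helper name find_core_files is hypothetical, the LOG directory path has to be supplied by the caller, and it assumes core files are named "core" or "core.<pid>", which depends on the kernel's core_pattern setting.

```python
import glob
import os


def find_core_files(log_dir):
    """Return sorted paths of regular files in log_dir whose names start with 'core'.

    Assumption: core dumps are written as 'core' or 'core.<pid>'.
    """
    pattern = os.path.join(log_dir, "core*")
    return sorted(p for p in glob.glob(pattern) if os.path.isfile(p))
```

An empty list after running the reproduction command would match the result reported here.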

Comment 4 Irina Boverman 2009-10-29 14:29:56 UTC

Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
please see bug summary.

Comment 5 Lana Brindley 2009-11-09 00:06:02 UTC

Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1 +1,8 @@
-please see bug summary.
+Grid bug fix
+
+C: The collector and schedd were susceptible to a race condition.
+C: The race condition could cause the collector and schedd to core dump in some circumstances.
+F: Ensured that the termination signal is processed in the correct order.
+R: The race condition no longer exists, and core dumps no longer occur
+
+A race condition was causing the collector and schedd to core dump in some circumstances. The race condition was rectified by ensuring that the termination signal is handled correctly, and the core dumps no longer occur.

Comment 7 errata-xmlrpc 2009-12-03 09:19:15 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-1633.html