729121 – Investigate Shutdown Semantics of Condor daemons

Summary: Investigate Shutdown Semantics of Condor daemons

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise MRG
Classification:	Red Hat
Component:	condor
Sub Component:
Version:	2.0
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	low
Target Milestone:	2.2
Target Release:	---
Assignee:	Timothy St. Clair
QA Contact:	MRG Quality Engineering
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2011-08-08 18:44 UTC by Timothy St. Clair
Modified:	2012-03-27 19:10 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2012-03-27 19:10:03 UTC
Target Upstream Version:
Embargoed:

Description Timothy St. Clair 2011-08-08 18:44:02 UTC

Description of problem:
The Condor Collector may execv /bin/true in order to bypass C++ destructors.

Version-Release number of selected component (if applicable):
2.0

How reproducible:
?

Steps to Reproduce:
1. Shut down the Collector

  
Actual results:
backup of /bin/true 

Expected results:
Normal shutdown. 


Additional info:
In daemon_core.

Comment 2 Martin Kudlej 2012-03-08 09:30:39 UTC

How can we reproduce this? It's easy to shut down the collector, but how can we recognize that the behaviour is wrong? Please be more specific.

Comment 3 Timothy St. Clair 2012-03-13 14:46:19 UTC

This is more of a work-item BZ than a bug; its purpose is to help define the behavior.

Comment 4 Timothy St. Clair 2012-03-27 19:10:03 UTC

So here is the logic behind it...

/* On Unix, we define our own exit() call. The reason is messy:
* Basically, daemonCore's Create_Thread calls fork() on Unix.
* When the forked child calls exit, however, all the class
* destructors are called. However, the code was never written in
* a way that expects the daemons to be forked. For instance, some
* global constructor in the schedd tells the gridmanager to shut down...
* certainly we do not want this happening in our forked child! Also,
* we've seen problems where the forked child gets stuck in libc realloc
* on Linux trying to free up space in the gsi libraries after being
* called by some global destructor. So.... for now, if we are
* forked via Create_Thread, we have our child exit _without_ calling
* any c++ destructors. How do we accomplish that magic feat? By
* exiting via a call to exec()! So here it is... we overload exit()
* inside of daemonCore -- we do it this way so we catch all calls to
* exit, including ones buried in dprintf etc. Note we dont want to
* do this via a macro setting, because some .C files that call exit
* do not include condor_daemon_core.h, and we don't want to put it
* into condor_common.h (because we only want to overload exit for
* daemonCore daemons). So doing it this way works for all cases.
*/
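
The mechanism that comment describes can be illustrated with a small standalone sketch. This is not daemonCore's code: the names NoisyGlobal, g_report_fd and child_ran_destructor are hypothetical, and the sketch assumes a Linux system where /bin/true exists. It forks a child, then has the child leave either through std::exit() (which runs destructors of static objects) or through execv("/bin/true") (which replaces the process image, so no destructors run), and reports to the parent over a pipe whether the destructor fired.

```cpp
#include <cassert>
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical: fd the destructor reports to. It stays -1 in the parent,
// so the parent's own static destruction has no side effect.
static int g_report_fd = -1;

// Hypothetical stand-in for a global object whose destructor has side
// effects (the comment's example is a schedd global that talks to the
// gridmanager).
struct NoisyGlobal {
    ~NoisyGlobal() {
        if (g_report_fd >= 0) {
            const char mark = 'D';
            ssize_t n = write(g_report_fd, &mark, 1);
            (void)n;
        }
    }
};
static NoisyGlobal g_noisy;

// Forks a child and returns true if the child's global destructor ran.
// exit_via_exec == true mimics the "exit by exec'ing /bin/true" trick.
bool child_ran_destructor(bool exit_via_exec) {
    int fds[2];
    int rc = pipe(fds);
    assert(rc == 0);
    (void)rc;

    pid_t pid = fork();
    assert(pid >= 0);
    if (pid == 0) {
        // Child: arm the destructor so it reports through the pipe.
        close(fds[0]);
        g_report_fd = fds[1];
        if (exit_via_exec) {
            char arg0[] = "true";
            char* const argv[] = {arg0, nullptr};
            execv("/bin/true", argv);  // process image replaced: no destructors
            _exit(127);                // exec failed; _exit also skips destructors
        }
        std::exit(0);  // normal exit: static destructors run in the child
    }

    // Parent: read at most one byte, then reap the child.
    close(fds[1]);
    char buf = 0;
    ssize_t n = read(fds[0], &buf, 1);
    close(fds[0]);
    int status = 0;
    waitpid(pid, &status, 0);
    return n == 1 && buf == 'D';
}
```

Calling child_ran_destructor(false) should report that the destructor ran in the forked child, while child_ran_destructor(true) should report that it did not. The sketch makes no attempt to reproduce the overloading of exit() itself, only the effect of leaving the process through exec.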

There is no real way to "fix this." It appears that this was a backstop fix given the architecture.

I will take cleanup into account when refactoring pieces, namely by using shared_ptrs where possible so individual destructors don't cause too much damage.
