Bug 729121 - Investigate Shutdown Semantics of Condor daemons
Summary: Investigate Shutdown Semantics of Condor daemons
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor
Version: 2.0
Hardware: All
OS: Linux
Priority: medium
Severity: low
Target Milestone: 2.2
Target Release: ---
Assignee: Timothy St. Clair
QA Contact: MRG Quality Engineering
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-08-08 18:44 UTC by Timothy St. Clair
Modified: 2012-03-27 19:10 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-03-27 19:10:03 UTC
Target Upstream Version:
Embargoed:



Description Timothy St. Clair 2011-08-08 18:44:02 UTC
Description of problem:
Condor Collector may execv /bin/true in order to bypass C++ destructors 

Version-Release number of selected component (if applicable):
2.0

How reproducible:
?

Steps to Reproduce:
1. Shut down the Collector

  
Actual results:
backup of /bin/true 

Expected results:
Normal shutdown. 


Additional info:
In daemon_core.

Comment 2 Martin Kudlej 2012-03-08 09:30:39 UTC
How can we reproduce this? It's easy to shut down the collector, but how can we recognize the wrong behaviour? Please be more specific.

Comment 3 Timothy St. Clair 2012-03-13 14:46:19 UTC
This is more of a work-item BZ than a bug; its purpose is to help define the behavior.

Comment 4 Timothy St. Clair 2012-03-27 19:10:03 UTC
So here is the logic behind it... 

/* On Unix, we define our own exit() call.  The reason is messy:
* Basically, daemonCore Create_Thread calls fork() on Unix.
* When the forked child calls exit, however, all the class
* destructors are called.  However, the code was never written in
* a way that expects the daemons to be forked.  For instance, some
* global constructor in the schedd tells the gridmanager to shutdown...
* certainly we do not want this happening in our forked child!  Also,
* we've seen problems where the forked child gets stuck in libc realloc
* on Linux trying to free up space in the gsi libraries after being
* called by some global destructor.  So.... for now, if we are
* forked via Create_Thread, we have our child exit _without_ calling
* any c++ destructors.  How do we accomplish that magic feat?  By
* exiting via a call to exec()!  So here it is... we overload exit()
* inside of daemonCore -- we do it this way so we catch all calls to
* exit, including ones buried in dprintf etc.  Note we don't want to
* do this via a macro setting, because some .C files that call exit
* do not include condor_daemon_core.h, and we don't want to put it
* into condor_common.h (because we only want to overload exit for
* daemonCore daemons).  So doing it this way works for all cases.
*/
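
The comment above can be illustrated with a small stand-alone sketch. This is not the daemonCore code: the names `GlobalResource`, `exit_without_destructors`, and `destructor_runs_in_child` are hypothetical, and the pipe exists only so the parent can observe whether a global destructor ran in the forked child. It assumes a POSIX system; if `/bin/true` cannot be executed, the sketch falls back to `_exit()`, which also skips destructors.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cassert>
#include <cstdio>
#include <cstdlib>

namespace {

// Where the destructor reports that it ran; -1 means "stay silent".
int g_report_fd = -1;

// Hypothetical stand-in for a global object whose destructor has side effects.
struct GlobalResource {
    ~GlobalResource() {
        if (g_report_fd >= 0) {
            const char mark = 'D';
            ssize_t ignored = write(g_report_fd, &mark, 1);
            (void)ignored;
        }
    }
};

GlobalResource g_resource;  // destroyed by exit(), but not by exec or _exit()

}  // namespace

// Leave the process without running atexit handlers or static destructors.
// Replacing the process image with /bin/true discards the C++ runtime state.
[[noreturn]] void exit_without_destructors(int status) {
    if (status == 0) {
        char arg0[] = "/bin/true";
        char* const argv[] = {arg0, nullptr};
        execv("/bin/true", argv);  // only returns on failure
    }
    _exit(status);  // fallback: also skips destructors
}

// Fork a child, let it leave via exit() or via the exec trick, and return
// how many destructor marks the parent received (or -1 on error).
int destructor_runs_in_child(bool use_exec_trick) {
    assert(g_report_fd == -1);  // the parent itself never reports

    int fds[2];
    if (pipe(fds) != 0) return -1;

    std::fflush(nullptr);  // avoid duplicated stdio output in the child
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);
        g_report_fd = fds[1];
        if (use_exec_trick) exit_without_destructors(0);
        std::exit(0);  // runs ~GlobalResource in the child
    }

    close(fds[1]);  // parent keeps only the read end so it sees EOF
    int count = 0;
    char c;
    while (read(fds[0], &c, 1) == 1) ++count;
    close(fds[0]);

    int status = 0;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status)) return -1;
    return count;
}
```

Calling `destructor_runs_in_child(false)` should report one destructor run in the child, while `destructor_runs_in_child(true)` should report none, which is the effect the overloaded exit() is after.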

There is no real way to "fix" this. The exec call appears to be a backstop fix imposed by the architecture.

I will take cleanup into account when refactoring pieces, namely by using shared_ptrs where possible so individual destructors don't cause too much damage.
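
One possible reading of that plan, sketched under assumptions: ownership moves from global objects to `std::shared_ptr` handles with a limited scope, and side effects on other processes become explicit calls instead of destructor behavior. The class `PeerLink` and the function `messages_sent_by_scoped_owner` are hypothetical and are not taken from the Condor sources.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a daemon component that talks to a peer process.
class PeerLink {
public:
    explicit PeerLink(std::vector<std::string>& wire) : wire_(wire) {}

    // Effects on the outside world are explicit, never implied by destruction.
    void request_peer_shutdown() { wire_.push_back("SHUTDOWN"); }

    ~PeerLink() = default;  // releases local state only

private:
    std::vector<std::string>& wire_;  // stands in for a network connection
};

// Create a scoped, shared owner and report how many messages reached the wire.
int messages_sent_by_scoped_owner(bool shut_down_peer) {
    std::vector<std::string> wire;
    {
        auto link = std::make_shared<PeerLink>(wire);
        std::shared_ptr<PeerLink> second_owner = link;  // shared ownership
        if (shut_down_peer) link->request_peer_shutdown();
    }  // last owner released here; the destructor sends nothing
    return static_cast<int>(wire.size());
}
```

With this shape, destroying the object (for example in a forked child) sends nothing to the peer; only an explicit `request_peer_shutdown()` call does.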

