Bug 729121

Summary: Investigate Shutdown Semantics of Condor daemons
Product: Red Hat Enterprise MRG
Reporter: Timothy St. Clair <tstclair>
Component: condor
Assignee: Timothy St. Clair <tstclair>
Status: CLOSED WONTFIX
QA Contact: MRG Quality Engineering <mrgqe-bugs>
Severity: low
Priority: medium
Version: 2.0
CC: matt, mkudlej, tstclair
Target Milestone: 2.2
Target Release: ---
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2012-03-27 19:10:03 UTC

Description Timothy St. Clair 2011-08-08 18:44:02 UTC
Description of problem:
The Condor Collector may execv /bin/true on shutdown in order to bypass C++ destructors.

Version-Release number of selected component (if applicable):
2.0

How reproducible:
?

Steps to Reproduce:
1. Shut down the Collector

  
Actual results:
backup of /bin/true 

Expected results:
Normal shutdown. 


Additional info:
In daemon_core.

Comment 2 Martin Kudlej 2012-03-08 09:30:39 UTC
How can we reproduce this? It's easy to shut down the collector, but how can we recognize the wrong behaviour? Please be more specific.

Comment 3 Timothy St. Clair 2012-03-13 14:46:19 UTC
This is more of a work-item BZ than a bug; it is meant to help define the behavior.

Comment 4 Timothy St. Clair 2012-03-27 19:10:03 UTC
So here is the logic behind it... 

/* On Unix, we define our own exit() call.  The reason is messy:
* Basically, daemonCore Create_Thread calls fork() on Unix.
* When the forked child calls exit, however, all the class
* destructors are called.  However, the code was never written in
* a way that expects the daemons to be forked.  For instance, some
* global constructor in the schedd tells the gridmanager to shutdown...
* certainly we do not want this happening in our forked child!  Also,
* we've seen problems where the forked child gets stuck in libc realloc
* on Linux trying to free up space in the gsi libraries after being
* called by some global destructor.  So.... for now, if we are
* forked via Create_Thread, we have our child exit _without_ calling
* any c++ destructors.  How do we accomplish that magic feat?  By
* exiting via a call to exec()!  So here it is... we overload exit()
* inside of daemonCore -- we do it this way so we catch all calls to
* exit, including ones buried in dprintf etc.  Note we don't want to
* do this via a macro setting, because some .C files that call exit
* do not include condor_daemon_core.h, and we don't want to put it
* into condor_common.h (because we only want to overload exit for
* daemonCore daemons).  So doing it this way works for all cases.
*/
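
To make the effect described in that comment concrete, here is a minimal standalone sketch. It is not the daemonCore code. The names g_report_fd, GlobalWithSideEffects and child_ran_global_destructor are hypothetical and exist only for this illustration. It forks a child that ends either through exit() or by exec'ing /bin/true, and uses a pipe to report whether the child ran a global destructor.

```cpp
#include <cassert>
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Write end of a pipe used to observe destructor calls; -1 means "not observing".
// (Hypothetical name, used only in this sketch.)
static int g_report_fd = -1;

// Hypothetical stand-in for a daemon global whose destructor has side effects.
struct GlobalWithSideEffects {
    ~GlobalWithSideEffects() {
        if (g_report_fd >= 0) {
            const char marker = 'D';
            ssize_t ignored = write(g_report_fd, &marker, 1);
            (void)ignored;
        }
    }
};
static GlobalWithSideEffects g_global;

// Forks a child that terminates either through exit() or by exec'ing /bin/true.
// Returns true if the child ran the global destructor before terminating.
bool child_ran_global_destructor(bool exit_via_exec) {
    int fds[2];
    if (pipe(fds) != 0) std::abort();

    pid_t pid = fork();
    if (pid < 0) std::abort();

    if (pid == 0) {
        // Child: only here is the destructor allowed to report.
        close(fds[0]);
        g_report_fd = fds[1];
        if (exit_via_exec) {
            char arg0[] = "true";
            char* const argv[] = {arg0, nullptr};
            execv("/bin/true", argv);  // replaces the process image; no destructors run
            _exit(127);                // reached only if execv fails; also skips destructors
        }
        std::exit(0);                  // runs destructors of objects with static storage
    }

    // Parent: wait for either the marker byte or end-of-file on the pipe.
    close(fds[1]);
    char buf = 0;
    ssize_t n = read(fds[0], &buf, 1);
    close(fds[0]);

    int status = 0;
    pid_t waited = waitpid(pid, &status, 0);
    assert(waited == pid);

    return n == 1 && buf == 'D';
}
```

Calling child_ran_global_destructor(false) exercises the plain exit() path, and child_ran_global_destructor(true) exercises the exec path. The sketch assumes /bin/true exists; if it does not, the _exit() fallback still skips the destructors.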

There is no real way to "fix" this. The exec-based exit appears to have been a backstop, given the architecture.

I will take cleanup into account when refactoring pieces, namely by using shared_ptrs where possible so individual destructors don't cause too much damage.
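
As a rough illustration of the shared_ptr idea, and only one possible reading of it, the sketch below ties a resource's cleanup to the point where its last owner lets go, rather than to a global destructor that runs at process exit. The Connection class and shutdown_in_order function are hypothetical and are not taken from the Condor code base.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical resource; its destructor records the cleanup in a caller-owned log.
class Connection {
public:
    Connection(std::string name, std::vector<std::string>& log)
        : name_(std::move(name)), log_(log) {}
    ~Connection() { log_.push_back("closed " + name_); }

private:
    std::string name_;
    std::vector<std::string>& log_;
};

// Two hypothetical components share one connection. Cleanup happens exactly
// once, at the moment the last owner releases it, on an explicit shutdown path.
std::vector<std::string> shutdown_in_order() {
    std::vector<std::string> log;

    auto conn = std::make_shared<Connection>("collector", log);
    std::shared_ptr<Connection> holder_a = conn;
    std::shared_ptr<Connection> holder_b = conn;
    assert(conn.use_count() == 3);

    conn.reset();      // creator gives up ownership; the object stays alive
    holder_a.reset();  // first component done; the object stays alive
    log.push_back("a released");
    holder_b.reset();  // last owner: the destructor runs here
    log.push_back("b released");

    return log;
}
```

In this sketch the log ends up as "a released", "closed collector", "b released": the destructor runs at a known point in the shutdown sequence, and a forked child that never reaches that point does not trigger it.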