Description of problem:
Condor Collector may execv /bin/true in order to bypass C++ destructors
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Shut down the Collector
backup of /bin/true
How can we reproduce this? It's easy to shut down the collector, but how can we recognize the incorrect behaviour? Please be more specific.
This is more of a work-item BZ than a bug; its purpose is to help define the behavior.
So here is the logic behind it...
/* On Unix, we define our own exit() call. The reason is messy:
* Basically, daemonCore Create_Thread calls fork() on Unix.
* When the forked child calls exit, however, all the class
* destructors are called. However, the code was never written in
* a way that expects the daemons to be forked. For instance, some
* global destructor in the schedd tells the gridmanager to shutdown...
* certainly we do not want this happening in our forked child! Also,
* we've seen problems where the forked child gets stuck in libc realloc
* on Linux trying to free up space in the gsi libraries after being
* called by some global destructor. So.... for now, if we are
* forked via Create_Thread, we have our child exit _without_ calling
* any c++ destructors. How do we accomplish that magic feat? By
* exiting via a call to exec()! So here it is... we overload exit()
* inside of daemonCore -- we do it this way so we catch all calls to
* exit, including ones buried in dprintf etc. Note we don't want to
* do this via a macro setting, because some .C files that call exit
* do not include condor_daemon_core.h, and we don't want to put it
* into condor_common.h (because we only want to overload exit for
* daemonCore daemons). So doing it this way works for all cases.
*/
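To make the behavior described in the comment easier to observe, here is a minimal, self-contained sketch. It is not the daemonCore code: GlobalWithSideEffect, g_report_fd, exit_without_destructors and destructor_runs_in_child are hypothetical names invented for this illustration. The sketch forks a child and compares a normal exit() with an exit via execv("/bin/true"), using a pipe to detect whether a global destructor ran in the child. It assumes a Unix system; if /bin/true cannot be executed it falls back to _exit(), which also skips destructors.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical: descriptor the global destructor reports to (-1 = silent).
static int g_report_fd = -1;

// Hypothetical stand-in for a daemon global whose destructor has a side effect.
struct GlobalWithSideEffect {
    ~GlobalWithSideEffect() {
        if (g_report_fd >= 0) {
            const char mark = 'D';
            ssize_t n = write(g_report_fd, &mark, 1);
            (void)n;
        }
    }
};
static GlobalWithSideEffect g_global;

// Leave the process without running global destructors by replacing the
// process image. /bin/true always exits 0, so `status` is only used by the
// fallback path when execv fails.
[[noreturn]] void exit_without_destructors(int status) {
    char arg0[] = "/bin/true";
    char *const argv[] = {arg0, nullptr};
    execv("/bin/true", argv);
    _exit(status);  // reached only if execv failed; also skips destructors
}

// Fork a child that exits normally (bypass == false) or via exec
// (bypass == true). Returns how many times the global destructor ran in the
// child, or -1 on error.
int destructor_runs_in_child(bool bypass) {
    int fds[2];
    if (pipe(fds) != 0) return -1;
    fflush(nullptr);  // avoid the child re-flushing buffered parent output
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);
        g_report_fd = fds[1];  // make the destructor observable in the child
        if (bypass) exit_without_destructors(0);
        exit(0);  // normal exit: global destructors run
    }
    close(fds[1]);
    int status = 0;
    waitpid(pid, &status, 0);
    char buf[8];
    ssize_t n = read(fds[0], buf, sizeof buf);  // 0 bytes means EOF
    close(fds[0]);
    return n < 0 ? -1 : static_cast<int>(n);
}
```

Calling destructor_runs_in_child(false) should report one destructor run in the child, while destructor_runs_in_child(true) should report none. That difference is the observable effect of the exec-based exit.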
There is no real way to "fix this." It appears that this was a backstop fix given the architecture.
I will take cleanup into account when refactoring pieces, namely by using shared_ptrs where possible so individual destructors don't cause too much damage.