Description of problem:

The Condor init script uses killproc, which does take down the condor_master, but not gracefully. The condor_master waits for all the daemons it manages to exit before exiting itself. killproc starts this process with a kill, but then escalates to a kill -9. As a result, the master does not get to do its normal cleanup, which includes removing SCHEDD.lock files in an HA Schedd setup.

It is possible to avoid the init script for everyday stopping of Condor, but many people will use it anyway.

Consider rewriting the init script to use condor_off -master, followed by condor_off -master -fast, and only as a last resort (drastic!) actually kill -9 the master.

Version-Release number of selected component (if applicable):

7.2.0-0.1 and before
Using condor_off is a bad idea in an init script for a few reasons, such as 1) it ignores the pidfile and 2) it may be denied by local policy.

A better solution is to use SIGQUIT, which initiates a fast shutdown.

This is fixed in 7.2.0-0.3.

NOTE: This change stops sending -KILL to the master, which means that if it hangs during shutdown, it will remain hung.
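
For illustration only, the stop logic described in this comment could look roughly like the sketch below. It is written in Python to keep it self-contained; the actual init script is shell and is not reproduced here. The function names, the 60-second timeout, and the polling interval are assumptions, not values taken from the Condor packaging.

```python
import os
import signal
import time


def read_pid(pidfile):
    """Return the PID stored in pidfile, or None if it is missing or invalid."""
    try:
        with open(pidfile) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def is_running(pid):
    """Check whether a process with this PID exists (signal 0 sends nothing)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def stop_master(pidfile, timeout=60.0, poll=0.5):
    """Send SIGQUIT to the PID in pidfile and wait for it to exit.

    Returns True once the process is gone, False if it is still running
    after `timeout` seconds. The timeout and poll values are hypothetical.
    """
    pid = read_pid(pidfile)
    if pid is None or not is_running(pid):
        return True  # nothing to stop
    try:
        os.kill(pid, signal.SIGQUIT)  # asks the master for a fast shutdown
    except ProcessLookupError:
        return True  # exited between the check and the signal
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not is_running(pid):
            return True
        time.sleep(poll)
    return False  # still running; deliberately no SIGKILL escalation
```

A False return corresponds to the case in the NOTE above: the master is left running rather than being killed, so any recovery is up to the administrator.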
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0036.html