Description of problem:
Currently, the condor RA uses the return value of daemon to determine whether the process started. That return value is not a reliable indicator of process state. Instead, the start operation should check for the condor_schedd process and return success or failure based on whether the process exists.
The RA will now wait 10 seconds for the condor process to appear before returning success.

Fixed upstream on: V7_6-branch
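For illustration, a start-time check of this kind could be sketched as below. This is not the code from the upstream fix, which is not shown in this report. The function names wait_for_process and process_running are hypothetical, and matching by process name with pgrep is a simplifying assumption: a setup with several named schedds on one node would need a more specific check.

```shell
#!/bin/sh
# Hypothetical sketch, not the actual resource agent code.

# Assumption: the daemon can be identified by its exact process name.
process_running() {
    pgrep -x "$1" >/dev/null 2>&1
}

# Check once per second, up to $2 times (default 10), for process $1.
# Returns 0 as soon as the process is seen, 1 if it never appears.
wait_for_process() {
    name=$1
    timeout=${2:-10}
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        if process_running "$name"; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}
```

A start operation could call `wait_for_process condor_schedd 10` after launching the daemon and report failure when it returns non-zero, rather than trusting the launcher's exit status.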
Tested on: RH-7.8.8-0.4.1

# ls -la /usr/sbin/condor_schedd
-rwxr-xr-x. 1 root root 12 Feb 20 10:00 /usr/sbin/condor_schedd

# clusvcadm -R "HA Schedd HASchedd1"
Local machine trying to restart service:HA Schedd HASchedd1...Success

# tail -f /var/log/cluster/rgmanager.log
Feb 20 10:07:45 rgmanager [condor] Stopping condor_schedd HASchedd1
Feb 20 10:07:45 rgmanager [condor] Starting condor_schedd HASchedd1
Feb 20 10:07:56 rgmanager [condor] Failed to start condor_schedd HASchedd1
Feb 20 10:08:05 rgmanager [netfs] Checking fs "Job Queue for HASchedd1", Level 0
Feb 20 10:08:15 rgmanager status on condor "HASchedd1" returned 7 (unspecified)
Feb 20 10:08:15 rgmanager [condor] Stopping condor_schedd HASchedd1
Feb 20 10:08:15 rgmanager [condor] Starting condor_schedd HASchedd1
Feb 20 10:08:25 rgmanager [netfs] Checking fs "Job Queue for HASchedd2", Level 0
Feb 20 10:08:25 rgmanager [netfs] Checking fs "Job Queue for HASchedd3", Level 0
Feb 20 10:08:26 rgmanager [condor] Failed to start condor_schedd HASchedd1

>>> VERIFIED
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2013-0564.html