Bug 828983 - condor resource agent start operation should have verification of startup
Status: CLOSED ERRATA
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor-cluster-resource-agent
Version: Development
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: 2.3
Target Release: ---
Assigned To: Robert Rati
QA Contact: Tomas Rusnak
Depends On:
Blocks:
Reported: 2012-06-05 13:23 EDT by Robert Rati
Modified: 2013-03-06 13:44 EST

See Also:
Fixed In Version: condor-7.8.6-0.1
Doc Type: Bug Fix
Doc Text:
Cause: The condor resource agent used with RHHA did not verify that a daemon had started during a start operation.
Consequence: The start operation could report success even though the daemon had not started.
Fix: The resource agent now waits 10 seconds for the process to appear.
Result: When a start operation reports success, the daemon has started.
Last Closed: 2013-03-06 13:44:15 EST
Type: Bug


Attachments: None
Description Robert Rati 2012-06-05 13:23:50 EDT
Description of problem:
Currently, the condor RA uses the return value of daemon to determine whether the process started up. That return value is not a reliable indicator of the process state. Instead, the start operation should check for the condor_schedd process and return success or failure based on whether the process exists.
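
The process check proposed above could look roughly like the following. This is only an illustrative Python sketch, not the resource agent's actual code (which this report does not show). The function name process_exists and its proc_root parameter are hypothetical, and the sketch assumes a Linux kernel that exposes /proc/<pid>/comm.

```python
import os


def process_exists(name, proc_root="/proc"):
    """Return True if any running process has the command name `name`.

    Hypothetical helper: scans <proc_root>/<pid>/comm, which the kernel
    truncates to 15 characters, so the wanted name is truncated the same way.
    """
    wanted = name.encode()[:15]
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a PID directory
        try:
            with open(os.path.join(proc_root, entry, "comm"), "rb") as f:
                if f.read().strip() == wanted:
                    return True
        except OSError:
            continue  # process exited (or is unreadable) while scanning
    return False


if __name__ == "__main__":
    print(process_exists("condor_schedd"))
```

The point of the design is that success is derived from the observed process table rather than from the exit status of the launcher.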

Comment 1 Robert Rati 2012-06-20 13:32:09 EDT
The RA will now wait 10 seconds for the condor process to appear before returning success.

Fixed upstream on:
V7_6-branch
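
For illustration, the wait described in this comment can be sketched as a polling loop with a deadline. This is a Python sketch under stated assumptions, not the upstream change itself: wait_for_process, is_running, and the 0.5-second polling interval are all hypothetical, and only the 10-second limit comes from this report.

```python
import time


def wait_for_process(is_running, timeout=10.0, interval=0.5):
    """Poll `is_running()` until it returns True or `timeout` seconds pass.

    `is_running` is a hypothetical zero-argument callable that reports
    whether the daemon process currently exists.
    Returns True if the process appeared in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        if is_running():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    # Stand-in check that never succeeds, so the start is reported as failed.
    started = wait_for_process(lambda: False, timeout=1.0, interval=0.1)
    print("start succeeded" if started else "start failed")
```

A start operation built this way reports failure when the process never appears within the limit.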
Comment 5 Tomas Rusnak 2013-02-20 10:10:52 EST
Tested on:
RH-7.8.8-0.4.1

# ls -la /usr/sbin/condor_schedd
-rwxr-xr-x. 1 root root 12 Feb 20 10:00 /usr/sbin/condor_schedd

# clusvcadm -R "HA Schedd HASchedd1"
Local machine trying to restart service:HA Schedd HASchedd1...Success

# tail -f /var/log/cluster/rgmanager.log
Feb 20 10:07:45 rgmanager [condor] Stopping condor_schedd HASchedd1
Feb 20 10:07:45 rgmanager [condor] Starting condor_schedd HASchedd1
Feb 20 10:07:56 rgmanager [condor] Failed to start condor_schedd HASchedd1
Feb 20 10:08:05 rgmanager [netfs] Checking fs "Job Queue for HASchedd1", Level 0
Feb 20 10:08:15 rgmanager status on condor "HASchedd1" returned 7 (unspecified)
Feb 20 10:08:15 rgmanager [condor] Stopping condor_schedd HASchedd1
Feb 20 10:08:15 rgmanager [condor] Starting condor_schedd HASchedd1
Feb 20 10:08:25 rgmanager [netfs] Checking fs "Job Queue for HASchedd2", Level 0
Feb 20 10:08:25 rgmanager [netfs] Checking fs "Job Queue for HASchedd3", Level 0
Feb 20 10:08:26 rgmanager [condor] Failed to start condor_schedd HASchedd1

>>> VERIFIED
Comment 7 errata-xmlrpc 2013-03-06 13:44:15 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0564.html
