Bug 828983 - condor resource agent start operation should have verification of startup
Summary: condor resource agent start operation should have verification of startup
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor-cluster-resource-agent
Version: Development
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: 2.3
Assignee: Robert Rati
QA Contact: Tomas Rusnak
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2012-06-05 17:23 UTC by Robert Rati
Modified: 2013-03-06 18:44 UTC
CC List: 4 users

Fixed In Version: condor-7.8.6-0.1
Doc Type: Bug Fix
Doc Text:
Cause: The condor resource agent used with RHHA did not verify that a daemon had started during a start operation.
Consequence: The start operation could report success even when the daemon had not started.
Fix: The resource agent now waits 10 seconds for the process to appear.
Result: When a start operation reports success, the daemon will always have started.
Clone Of:
Environment:
Last Closed: 2013-03-06 18:44:15 UTC
Target Upstream Version:
Embargoed:


Links
Red Hat Product Errata RHSA-2013:0564 (Private: 0; Priority: normal; Status: SHIPPED_LIVE)
Summary: Low: Red Hat Enterprise MRG Grid 2.3 security update
Last Updated: 2013-03-06 23:37:09 UTC

Description Robert Rati 2012-06-05 17:23:50 UTC
Description of problem:
Currently, the condor RA uses the return value of daemon to determine whether the process started. That return value is not a reliable indicator of the process state. Instead, the start operation should check for the condor_schedd process and return success or failure based on whether that process exists.
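A minimal sketch of the proposed check, assuming a bash-based agent on Linux. The function names process_running and check_schedd_started are hypothetical and are not taken from the condor RA. The sketch decides success from whether a process named condor_schedd exists, by scanning /proc/<pid>/comm for an exact name match, instead of trusting the exit status of the launch command. On systems with procps, pgrep -x condor_schedd performs the same test. Several schedds (HASchedd1, HASchedd2, ...) can run on one node, so a real agent would likely need to match the specific instance rather than any condor_schedd process.

```bash
#!/bin/bash
# Hypothetical helper: return 0 if any running process has a command name
# exactly equal to $1, 1 otherwise. Linux-only: reads /proc/<pid>/comm,
# which the kernel truncates to 15 characters ("condor_schedd" is 13).
process_running() {
    local name=$1 f comm
    for f in /proc/[0-9]*/comm; do
        # A process can exit between the glob and the read; skip it quietly.
        read -r comm 2>/dev/null < "$f" || continue
        if [ "$comm" = "$name" ]; then
            return 0
        fi
    done
    return 1
}

# Hypothetical start check: decide success from the existence of the
# condor_schedd process rather than from the launcher's exit status.
check_schedd_started() {
    process_running condor_schedd
}
```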

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Robert Rati 2012-06-20 17:32:09 UTC
The RA will now wait 10 seconds for the condor process to appear before returning success.
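The 10-second wait can be sketched as a polling loop. This is a minimal sketch, assuming bash on Linux; the function names process_running and wait_for_process are hypothetical. It polls once per second and returns as soon as the process is seen. This report does not say whether the actual RA returns early or always waits the full 10 seconds.

```bash
#!/bin/bash
# Hypothetical helper: return 0 if a process whose /proc/<pid>/comm equals
# $1 is running (Linux-only; comm is truncated to 15 characters).
process_running() {
    local name=$1 f comm
    for f in /proc/[0-9]*/comm; do
        # Skip processes that exit between the glob and the read.
        read -r comm 2>/dev/null < "$f" || continue
        [ "$comm" = "$name" ] && return 0
    done
    return 1
}

# Hypothetical: poll once per second for up to $2 seconds (default 10) for a
# process named $1. Return 0 as soon as it appears, 1 if it never does.
wait_for_process() {
    local name=$1 timeout=${2:-10} waited=0
    while [ "$waited" -lt "$timeout" ]; do
        process_running "$name" && return 0
        sleep 1
        waited=$((waited + 1))
    done
    # Final check so a process that appeared during the last sleep still counts.
    process_running "$name"
}
```

In a start operation, a nonzero result from wait_for_process condor_schedd 10 would be reported as a start failure. That is consistent with the roughly 10-second gap between the "Starting" and "Failed to start" messages in the test log in Comment 5.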

Fixed upstream on:
V7_6-branch

Comment 5 Tomas Rusnak 2013-02-20 15:10:52 UTC
Tested on:
RH-7.8.8-0.4.1

# ls -la /usr/sbin/condor_schedd
-rwxr-xr-x. 1 root root 12 Feb 20 10:00 /usr/sbin/condor_schedd

# clusvcadm -R "HA Schedd HASchedd1"
Local machine trying to restart service:HA Schedd HASchedd1...Success

# tail -f /var/log/cluster/rgmanager.log
Feb 20 10:07:45 rgmanager [condor] Stopping condor_schedd HASchedd1
Feb 20 10:07:45 rgmanager [condor] Starting condor_schedd HASchedd1
Feb 20 10:07:56 rgmanager [condor] Failed to start condor_schedd HASchedd1
Feb 20 10:08:05 rgmanager [netfs] Checking fs "Job Queue for HASchedd1", Level 0
Feb 20 10:08:15 rgmanager status on condor "HASchedd1" returned 7 (unspecified)
Feb 20 10:08:15 rgmanager [condor] Stopping condor_schedd HASchedd1
Feb 20 10:08:15 rgmanager [condor] Starting condor_schedd HASchedd1
Feb 20 10:08:25 rgmanager [netfs] Checking fs "Job Queue for HASchedd2", Level 0
Feb 20 10:08:25 rgmanager [netfs] Checking fs "Job Queue for HASchedd3", Level 0
Feb 20 10:08:26 rgmanager [condor] Failed to start condor_schedd HASchedd1

>>> VERIFIED

Comment 7 errata-xmlrpc 2013-03-06 18:44:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0564.html

