Bug 828983
Summary: | condor resource agent start operation should have verification of startup | ||
---|---|---|---|
Product: | Red Hat Enterprise MRG | Reporter: | Robert Rati <rrati> |
Component: | condor-cluster-resource-agent | Assignee: | Robert Rati <rrati> |
Status: | CLOSED ERRATA | QA Contact: | Tomas Rusnak <trusnak> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | Development | CC: | matt, mkudlej, trusnak, tstclair |
Target Milestone: | 2.3 | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | condor-7.8.6-0.1 | Doc Type: | Bug Fix |
Doc Text: |
Cause: The condor resource agent used with RHHA did not verify that a daemon had started during a start operation.
Consequence: The start operation could report success even though the daemon never started.
Fix: The resource agent now waits 10 seconds for the daemon process to appear before reporting success.
Result: When a start operation reports success, the daemon has started.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2013-03-06 18:44:15 UTC | Type: | Bug |
Description
Robert Rati
2012-06-05 17:23:50 UTC
The RA will now wait 10 seconds for the condor process to appear before returning success.
Fixed upstream on: V7_6-branch
Tested on:
RH-7.8.8-0.4.1
# ls -la /usr/sbin/condor_schedd
-rwxr-xr-x. 1 root root 12 Feb 20 10:00 /usr/sbin/condor_schedd
# clusvcadm -R "HA Schedd HASchedd1"
Local machine trying to restart service:HA Schedd HASchedd1...Success
# tail -f /var/log/cluster/rgmanager.log
Feb 20 10:07:45 rgmanager [condor] Stopping condor_schedd HASchedd1
Feb 20 10:07:45 rgmanager [condor] Starting condor_schedd HASchedd1
Feb 20 10:07:56 rgmanager [condor] Failed to start condor_schedd HASchedd1
Feb 20 10:08:05 rgmanager [netfs] Checking fs "Job Queue for HASchedd1", Level 0
Feb 20 10:08:15 rgmanager status on condor "HASchedd1" returned 7 (unspecified)
Feb 20 10:08:15 rgmanager [condor] Stopping condor_schedd HASchedd1
Feb 20 10:08:15 rgmanager [condor] Starting condor_schedd HASchedd1
Feb 20 10:08:25 rgmanager [netfs] Checking fs "Job Queue for HASchedd2", Level 0
Feb 20 10:08:25 rgmanager [netfs] Checking fs "Job Queue for HASchedd3", Level 0
Feb 20 10:08:26 rgmanager [condor] Failed to start condor_schedd HASchedd1
>>> VERIFIED
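The start verification described above (wait 10 seconds for the condor process to appear before returning success) can be sketched roughly as follows. This is only an illustration, not the code from the actual resource agent: the helper name `wait_for_start`, the one-second polling interval, and the use of `pgrep` as the liveness check are all assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical helper (not taken from the condor resource agent).
# Usage: wait_for_start TIMEOUT_SECONDS CHECK_COMMAND [ARGS...]
# Polls CHECK_COMMAND once per second until it succeeds or TIMEOUT_SECONDS
# polls have failed. Returns 0 if the check succeeded, 1 otherwise.
wait_for_start() {
    timeout=$1
    shift
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # The check command decides what "started" means (assumption:
        # the presence of the process is sufficient evidence).
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}

# Example call from a start operation (assumed process name and check):
#   if wait_for_start 10 pgrep -x condor_schedd; then
#       exit 0    # report success only once the daemon is visible
#   else
#       exit 1    # report a generic failure so the cluster manager can react
#   fi
```

With a check like this, a start operation against a broken binary (as in the test above, where /usr/sbin/condor_schedd is a 12-byte file) fails after the timeout instead of reporting success.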
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHSA-2013-0564.html