Bug 515858

Summary: RHEL 5: Documentation: Provide information about cluster service status check and failover timeout
Product: Red Hat Enterprise Linux 5
Component: Documentation-cluster
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: low
Version: 5.5
Target Milestone: rc
Keywords: Documentation
Hardware: All
OS: Linux
Doc Type: Bug Fix
Reporter: Paul Kennedy <pkennedy>
Assignee: Steven J. Levine <slevine>
QA Contact: ecs-bugs
CC: adstrong, jskeoch, lhh, mhideo, slevine, ssaha
Clones: 717008
Bug Blocks: 717008
Last Closed: 2011-07-25 13:08:58 UTC

Description Paul Kennedy 2009-08-06 04:06:35 UTC

Description of problem:

Add information about cluster service:

1. How do you increase the service status check interval? By default it appears to be set to 30 seconds.

2. Is there a failover timeout? For example, if a service has to be
relocated to another node, is there a number of seconds within which
the relocation must complete before the service is marked as failed?
If so, can you adjust this timeout?

Comment 4 RHEL Program Management 2010-08-09 18:17:31 UTC

This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated in the
current release, Red Hat is unfortunately unable to address this
request at this time. Red Hat invites you to ask your support
representative to propose this request, if appropriate and relevant,
in the next release of Red Hat Enterprise Linux.

Comment 5 Steven J. Levine 2011-05-23 20:26:02 UTC

I'm reassigning this to myself: I have documented these issues (at least for Conga) in RHEL 6, so I can look at whether this applies to RHEL 5.

Comment 6 Steven J. Levine 2011-06-03 17:40:06 UTC

Lon: This bug -- which has been around for a while -- just came into my purview and I'm not sure where to take it (I misinterpreted it at first glance).

Is there an answer to questions 1 and 2 in the bug description?  It doesn't look as if these are documented. This is a RHEL 5 bug, but I can find nothing about this in the current RHEL 6 documentation either.

Any advice about where to take this?

Comment 7 Lon Hohberger 2011-06-22 20:34:31 UTC

* rgmanager checks the status of individual resources, not whole services.  This is a change from clumanager on RHEL3, which periodically checked the status of the whole service.  Every 10 seconds, rgmanager scans the resource tree, looking for resources which have passed their "status check" interval.

* Each resource agent specifies the amount of time between periodic status checks.  Each resource utilizes these timeout values unless explicitly overridden in cluster.conf using the special <action> tag:

    <action name="status" depth="*" interval="10" />

  This tag is a special child of the resource itself in cluster.conf.  For example, suppose you have a file system resource for which you want to override the status check interval.  This becomes a bit confusing when placed next to child resources, unfortunately:

    <fs name="test" device="/dev/sdb3">
      <action name="status" depth="*" interval="10" />
      <nfsexport...>
      </nfsexport>
    </fs>

* Some agents provide multiple "depths" of checking.  For example, a normal file system status check (depth 0) is simply "is it mounted in the right place?".  A more intensive check is depth 10, which is "can I read a file from this?".  A still more intensive check is depth 20, which is "can I write to this file system?".  In the previous example, I used '*', which means "use these values for all depths".  The result is that the "test" file system is checked at the highest-defined depth provided by the resource agent (in this case, 20) every 10 seconds.

* By default, there is no timeout for starting, stopping, or failing over resources.  Some resources take an indeterminately long amount of time to start or stop.  Unfortunately, a failure to stop (including a timeout, when timeouts are enforced) renders the service inoperable (failed state).  You can, if desired, turn on timeout enforcement on each resource in a service individually by adding __enforce_timeouts="1" to the resource reference in cluster.conf.  See the following for an example.

   https://access.redhat.com/kb/docs/DOC-43572
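
To make the points above concrete, here is a rough cluster.conf sketch (not taken from the linked article) that combines a status check interval override with timeout enforcement. The service name, mount point, and the service attributes are placeholders invented for illustration. Putting the <action> tag under the definition in <resources> while the service refers to it by ref is an assumption; the example earlier in this comment defines the resource inline instead.

```xml
<!-- Hypothetical fragment: names, device, and mount point are placeholders. -->
<rm>
  <resources>
    <!-- Resource definition. The <action> child overrides the status check
         interval (10 seconds here) for all depths the fs agent defines.
         Placing it on the shared definition is an assumption. -->
    <fs name="test" device="/dev/sdb3" mountpoint="/mnt/test">
      <action name="status" depth="*" interval="10"/>
    </fs>
  </resources>
  <service name="example_service" autostart="1" recovery="relocate">
    <!-- Reference to the resource defined above. __enforce_timeouts="1"
         turns on timeout enforcement for this one resource only. -->
    <fs ref="test" __enforce_timeouts="1"/>
  </service>
</rm>
```

With enforcement on, an operation on "test" that runs past its timeout counts as a failure, so a stop that hangs would leave the service in the failed state described above. It is probably worth enabling only on resources whose start and stop times are well understood.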

Comment 10 Steven J. Levine 2011-07-18 16:02:12 UTC

This is the latest build:

Red_Hat_Enterprise_Linux-Cluster_Administration-5-web-en-US-5-30_el6eng

The typo is fixed here:

http://documentation-stage.bne.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5/html/Cluster_Administration/ap-status-check-CA.html