Bug 400211 - RFE: Provide a method to fail over a service to another node after X number of restarts.
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: rgmanager   
Version: 5.1
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Lon Hohberger
QA Contact:
URL:
Whiteboard:
Keywords:
Duplicates: 247139 431130
Depends On: 247139
Blocks: 429350 431130
Reported: 2007-11-26 21:49 UTC by Lon Hohberger
Modified: 2018-10-19 23:22 UTC (History)

Fixed In Version: RHBA-2008-0353
Doc Type: Bug Fix
Last Closed: 2008-05-21 14:30:46 UTC




External Trackers
Tracker: Red Hat Product Errata
ID: RHBA-2008:0353
Priority: normal
Status: SHIPPED_LIVE
Summary: rgmanager bug fix and enhancement update
Last Updated: 2008-05-20 12:46:24 UTC

Description Lon Hohberger 2007-11-26 21:49:32 UTC
In Red Hat clustering on RHEL4, the default recovery policy (used when no
recovery policy is specified) is restart - "Restart the service in the node the
service is currently located.".

The desired behavior is to retry a restart on the same node and, if that fails
again, to fail over to another node.

The current default behavior appears to be endless attempts to restart.

RHEL3 had relocate after X restarts as well as a notion of "false"
starts, which is a special kind of restart.

-- Additional comment from lhh@redhat.com on 2007-11-20 18:06 EST --

The following patch allows a user to set max_restarts="x" and
restart_expire_time="y" in cluster.conf for services and virtual machines.

Basically, restart_expire_time allows one to throttle restarts - that is, if X
restarts (max_restarts) occur within Y seconds (restart_expire_time), the
service is relocated instead of restarted. This requires
recovery_policy="restart" (which is the default).
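
To make the throttling rule concrete, here is a rough Python sketch of a sliding-window restart counter. It is not the rgmanager patch. The class name RestartThrottle and the method on_failure are hypothetical, and two details are my assumptions because the comment does not spell them out: up to max_restarts restarts are allowed inside the window before the next failure causes a relocate, and the restart history is cleared once a relocate is chosen.

```python
from collections import deque


class RestartThrottle:
    """Hypothetical helper illustrating the rule; not rgmanager code."""

    def __init__(self, max_restarts, restart_expire_time):
        self.max_restarts = max_restarts
        self.restart_expire_time = restart_expire_time  # seconds
        self._restarts = deque()  # timestamps of restarts still in the window

    def on_failure(self, now):
        """Return "restart" or "relocate" for a failure seen at time `now`."""
        # Forget restarts that are older than the expiry window.
        while self._restarts and now - self._restarts[0] >= self.restart_expire_time:
            self._restarts.popleft()

        if len(self._restarts) >= self.max_restarts:
            # Assumption: the history is cleared once the service moves away.
            self._restarts.clear()
            return "relocate"

        self._restarts.append(now)
        return "restart"


throttle = RestartThrottle(max_restarts=2, restart_expire_time=60)
decisions = [throttle.on_failure(t) for t in (0, 10, 20)]
print(decisions)
```

With max_restarts=2 and restart_expire_time=60, three failures inside one minute give two restarts followed by a relocate. If the failures are spread out so that old restarts expire, the service keeps restarting in place.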

The patch also makes the parsing of time values more robust; you can now enter
values such as 1h30m as part of resource metadata or in cluster.conf.
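
As an illustration of what parsing a value like 1h30m involves, here is a small Python sketch. It is not the parser from the patch. The function name parse_duration is hypothetical, and the set of unit suffixes (s, m, h, d) and the rule that a bare number means seconds are assumptions; the comment above only shows h and m in use.

```python
import re

# Assumed unit suffixes; only "h" and "m" appear in the bug comment.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}
_TOKEN = re.compile(r"(\d+)([smhd]?)")


def parse_duration(text):
    """Convert a string such as "1h30m" or "90" to a number of seconds."""
    text = text.strip().lower()
    if not text:
        raise ValueError("empty duration")

    total = 0
    pos = 0
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"cannot parse duration: {text!r}")
        value, unit = match.groups()
        # Assumption: a number without a suffix is a count of seconds.
        total += int(value) * _UNIT_SECONDS[unit or "s"]
        pos = match.end()
    return total


print(parse_duration("1h30m"))  # 5400
```

Rejecting unknown input with an error, rather than silently reading it as zero, seems the safer choice for a value that controls recovery behavior.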

-- Additional comment from lhh@redhat.com on 2007-11-20 18:55 EST --
Created an attachment (id=265531)
Tested patch


Patches in CVS.

Comment 2 Lon Hohberger 2007-11-26 21:52:12 UTC
*** Bug 247139 has been marked as a duplicate of this bug. ***

Comment 3 RHEL Product and Program Management 2007-11-26 21:54:28 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 5 Lon Hohberger 2008-02-01 15:15:33 UTC
*** Bug 431130 has been marked as a duplicate of this bug. ***

Comment 7 errata-xmlrpc 2008-05-21 14:30:46 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0353.html


