Bug 634277

Summary: RFE: Critical/Non-Critical services & resources
Product: Red Hat Enterprise Linux 6
Reporter: Lon Hohberger <lhh>
Component: rgmanager
Assignee: Lon Hohberger <lhh>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: medium
Docs Contact:
Priority: low
Version: 6.0
CC: cluster-maint, edamato, jwest, pvn, ssaha, tao, thunt
Target Milestone: rc
Keywords: FutureFeature, Triaged
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: rgmanager-3.0.12-11.el6
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of: 605733
: 674710
Environment:
Last Closed: 2011-05-19 14:18:14 UTC
Type: ---
Bug Depends On: 634298    
Bug Blocks: 655920, 674710    

Description Lon Hohberger 2010-09-15 17:43:03 UTC
+++ This bug was initially created as a clone of Bug #605733 +++

Description of problem:

RHCS defines only three recovery options for a failed process:
- Disable
- Restart and relocate if restart fails
- Relocate

There is no "Restart but do not relocate" option.

The use case is a configuration running multiple custom/flaky applications using the same storage and IP address. If an individual application fails, the customer wants to attempt restart(s), but if the restart of an individual application fails, there is absolutely no point in relocating, because it is unlikely to fix the problem and will just disrupt the other applications running on the same box.

--- Additional comment from lhh on 2010-09-15 13:29:19 EDT ---

There are three main components:

1) a restart-disable policy on the whole service which interacts
   with the existing max-restarts / restart-expire-time
2) non-critical independent subtrees: 
   - the ability to let designated resources fail
   - the ability to recover these resources
3) restart threshold policies on independent subtrees
   - the ability to define max-restarts / restart-expire-time
     on a per subtree basis
   - operation with normal independent subtrees:
     service goes into recovery when threshold is exceeded
   - operation with non-critical independent subtrees: 
     disable subtree when threshold is exceeded
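
To make components 2 and 3 concrete, here is a minimal Python sketch of the per-subtree threshold decision described above. It is only a model of the intended behaviour, not rgmanager code: the Subtree class, the Action enum, and the on_failure method are hypothetical names invented for illustration, and the exact counting rule (whether the threshold is hit on the Nth or N+1th failure) is an assumption.

```python
from collections import deque
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    RESTART_SUBTREE = "restart subtree in place"
    DISABLE_SUBTREE = "disable subtree, service keeps running"
    RECOVER_SERVICE = "service goes into recovery"


@dataclass
class Subtree:
    """Hypothetical model of an independent subtree with a restart threshold."""
    max_restarts: int            # restarts tolerated inside the window (assumed meaning)
    restart_expire_time: float   # window length in seconds
    non_critical: bool = False   # True models a non-critical independent subtree
    _restarts: deque = field(default_factory=deque, init=False, repr=False)

    def on_failure(self, now: float) -> Action:
        # Forget restarts that have aged out of the expiry window.
        while self._restarts and now - self._restarts[0] >= self.restart_expire_time:
            self._restarts.popleft()
        # Below the threshold: restart the subtree in place and record it.
        if len(self._restarts) < self.max_restarts:
            self._restarts.append(now)
            return Action.RESTART_SUBTREE
        # Threshold exceeded: the outcome depends on whether the subtree is critical.
        if self.non_critical:
            return Action.DISABLE_SUBTREE
        return Action.RECOVER_SERVICE


# A flaky, non-critical application failing three times within one minute.
flaky = Subtree(max_restarts=2, restart_expire_time=60.0, non_critical=True)
for t in (0.0, 10.0, 20.0):
    print(t, flaky.on_failure(t).value)
```

With these assumed settings the first two failures are restarted in place and the third disables only the subtree; the same sequence on a normal (critical) independent subtree would instead put the whole service into recovery.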

Comment 1 Lon Hohberger 2010-09-15 17:51:14 UTC
*** Bug 493660 has been marked as a duplicate of this bug. ***

Comment 2 Lon Hohberger 2010-12-01 19:30:41 UTC
There are 17 patches in STABLE31 addressing this issue.

Comment 5 Lon Hohberger 2011-02-01 18:13:48 UTC
http://git.fedorahosted.org/git?p=cluster.git;a=commit;h=16ab187d7733c653dddc3e1b9cd90524ccdf8947
http://git.fedorahosted.org/git?p=cluster.git;a=commit;h=ca924c428bbf149531f896b52c9ba6f1597c634b
http://git.fedorahosted.org/git?p=cluster.git;a=commit;h=5203d9eefe530a13525dc32d9f48568fbabfd495
http://git.fedorahosted.org/git?p=cluster.git;a=commit;h=0e5e14cd1471464edf14776bd7ac84d14623a03d
http://git.fedorahosted.org/git?p=cluster.git;a=commit;h=5907340776e360b327642f24f7ace0ae812b7a81
http://git.fedorahosted.org/git?p=cluster.git;a=commit;h=b631ffdb818f7cf3512840dd99b8844aa230b03d
http://git.fedorahosted.org/git?p=cluster.git;a=commit;h=abe50ac2721ec8124aa2a614c2a0a05e4cfa3ad7
http://git.fedorahosted.org/git?p=cluster.git;a=commit;h=57232d8ad1dde6927a7d8cd267d1f3813e2bf0ca
http://git.fedorahosted.org/git?p=cluster.git;a=commit;h=c5db095bea06e76e021577bd56d2658f90ebbecc
http://git.fedorahosted.org/git?p=cluster.git;a=commit;h=109a4f729592e2f9039ec369df440cbb21a078c7
http://git.fedorahosted.org/git?p=cluster.git;a=commit;h=c2fa8fe7c8f2a3cbf1023a170f3f78a8de559b7a
http://git.fedorahosted.org/git?p=cluster.git;a=commit;h=3b03b46fe7d3c7d747db9a2b7721cc56aef458f2
http://git.fedorahosted.org/git?p=cluster.git;a=commit;h=4e2261f72411aae2604d9d3b771f221b11ef4b6b
http://git.fedorahosted.org/git?p=cluster.git;a=commit;h=d11004237d32ef094cf515e9215be2430723dacf
http://git.fedorahosted.org/git?p=cluster.git;a=commit;h=0ffd512aea6ed74ed0127284d0112bcbffa33061
http://git.fedorahosted.org/git?p=cluster.git;a=commit;h=c7187470032d2ab7c32a6a3ae43a358e1a99656b
http://git.fedorahosted.org/git?p=cluster.git;a=commit;h=d100504de6eff5f83dba79319ad0bd560f7a57df

Comment 8 Lon Hohberger 2011-02-11 19:03:36 UTC
How it works, and what to expect:

https://bugzilla.redhat.com/show_bug.cgi?id=605733#c14

Comment 10 errata-xmlrpc 2011-05-19 14:18:14 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0750.html