Bug 621018 - Luci does not maintain service restart limit configuration
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: conga
Version: 5.5
Hardware: x86_64 Linux
Priority: low
Severity: low
Target Milestone: rc
Target Release: ---
Assigned To: Ryan McCabe
QA Contact: Cluster QE
Reported: 2010-08-03 21:38 EDT by Alan Staples
Modified: 2011-01-26 09:58 EST (History)
Doc Type: Bug Fix
Last Closed: 2011-01-26 09:58:25 EST
Attachments: None

Description Alan Staples 2010-08-03 21:38:21 EDT
Description of problem:
The luci administration page for "service" objects has a field called "Maximum number of restart failures before relocating". Even for a service that is in a failover domain with other nodes and is configured with a "Relocate" failover policy, luci will accept input into this field, but saving changes to the service does not commit this particular change.

Version-Release number of selected component (if applicable): luci-0.12.2-12.el5


How reproducible:
Reliably reproducible

Steps to Reproduce:
1. edit a service that has a failover policy of "Relocate"
2. enter a positive integer into the field "Maximum number of restart failures before relocating"
3. click "save changes"
4. open the service again, or view /etc/cluster/cluster.conf - the number of failures allowed is still effectively 0
  
Actual results:


Expected results:


Additional info:
My cluster is built within a VMware-server environment. Both cluster nodes are on the same physical VMware server. I am using a virtual private network for cluster communication. I have a QDisk.
Comment 2 Ryan McCabe 2010-11-16 13:56:10 EST
The max_restarts attribute is only relevant when the recovery policy is restart. The UI needs to be fixed to disable these fields when the recovery policy is something other than restart.
Comment 3 Alan Staples 2010-11-16 16:21:06 EST
(In reply to comment #2)
> The max_restarts attribute is only relevant when the recovery policy is
> restart. The UI needs to be fixed to disable these fields when the recovery
> policy is something other than restart.

The luci GUI states "Maximum number of restart failures before relocating", which indicates to me that this should actually only be valid with a relocate policy. That makes sense to me - attempt to restart before relocating the service, since restarting is likely to fix the problem and relocating is a relatively expensive process.

What you're saying is that this is actually the maximum number of restart attempts for a service before disabling the service group on that particular node?

I can't find reference to this parameter or even the feature in the current Red Hat Cluster Administration Guide.
Comment 4 Ryan McCabe 2010-11-16 17:09:07 EST
What you stated above is correct, to the best of my knowledge: restart X times, then relocate if the restart fails each time. I can't find any good documentation either, but here's a snippet from the rgmanager patch that added the feature, which confirms the explanation above:

+       /* Check restart counter/timer for this resource */
+       if (check_restart(svcName) > 0) {
+               clulog(LOG_NOTICE, "Restart threshold for %s exceeded; "
+                      "attempting to relocate\n", svcName);
+               return handle_relocate_req(svcName, RG_START_RECOVER, -1,
+                                          new_owner);
Comment 5 Lon Hohberger 2011-01-25 19:21:23 EST
Restart counters only apply when you are using the "restart" recovery policy.

The restart counter is per-host, and is zeroed each time the service is relocated - either manually or as a consequence of a failure recovery action.

That is, when "max_restarts" is exceeded within the given "restart_expire_time", rgmanager will relocate the failing service to another host in the cluster, at which point the restart counter is reset.

While this is not Red Hat documentation, it is quite accurate in describing how rgmanager's recovery policies work:

http://sources.redhat.com/cluster/wiki/ServicePolicies
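
The behavior described in this comment can be illustrated with a minimal sketch. This is not rgmanager's actual implementation. The struct, the function names (record_restart, reset_counter, count_relocations) and the fixed-size timestamp array are all hypothetical; only the attribute names max_restarts and restart_expire_time come from this thread. The sketch assumes that a restart stops counting once it is restart_expire_time seconds old, and that relocation is requested when the number of restarts inside the window exceeds max_restarts.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

#define MAX_TRACKED 32  /* sketch-only cap on remembered restarts */

/* Hypothetical per-host restart counter; NOT rgmanager's real structure. */
struct restart_counter {
    int    max_restarts;         /* mirrors the max_restarts attribute */
    time_t expire_time;          /* mirrors restart_expire_time, in seconds */
    time_t stamps[MAX_TRACKED];  /* times of restarts still inside the window */
    int    count;
};

/* Forget restarts that are expire_time seconds old or older. */
void purge_expired(struct restart_counter *rc, time_t now)
{
    int kept = 0;
    for (int i = 0; i < rc->count; i++) {
        if (now - rc->stamps[i] < rc->expire_time)
            rc->stamps[kept++] = rc->stamps[i];
    }
    rc->count = kept;
}

/*
 * Record one restart at time `now`.
 * Returns 1 if max_restarts has been exceeded within the window
 * (the caller should relocate), or 0 to restart in place.
 */
int record_restart(struct restart_counter *rc, time_t now)
{
    assert(rc != NULL);
    purge_expired(rc, now);
    if (rc->count < MAX_TRACKED)
        rc->stamps[rc->count++] = now;
    return rc->count > rc->max_restarts;
}

/* The counter is zeroed whenever the service is relocated. */
void reset_counter(struct restart_counter *rc)
{
    rc->count = 0;
}

/* Feed n failures spaced `gap` seconds apart; return how many relocations result. */
int count_relocations(struct restart_counter *rc, int n, time_t gap)
{
    int relocations = 0;
    time_t now = 0;
    for (int i = 0; i < n; i++, now += gap) {
        if (record_restart(rc, now)) {
            relocations++;
            reset_counter(rc);
        }
    }
    return relocations;
}
```

Under these assumptions, with max_restarts set to 2 and restart_expire_time set to 300, two failures a few seconds apart are handled by restarting in place and the third triggers a relocation, after which the count starts again from zero. Failures spaced further apart than the expiry window never accumulate, so they never trigger a relocation.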
