Bug 486717 - clusvcadm -e <service> -F handling bugs
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: rgmanager
Version: 5.2
Hardware: All
OS: Linux
Priority: low
Severity: medium
Assigned To: Lon Hohberger
QA Contact: Cluster QE
Duplicates: 486711
Reported: 2009-02-21 07:15 EST by Yevheniy Demchenko
Modified: 2009-09-02 07:04 EDT
Doc Type: Bug Fix
Last Closed: 2009-09-02 07:04:53 EDT


Attachments:
patch. (610 bytes, patch) - 2009-02-21 07:15 EST, Yevheniy Demchenko
patch. (882 bytes, patch) - 2009-02-23 09:29 EST, Yevheniy Demchenko
Description Yevheniy Demchenko 2009-02-21 07:15:33 EST
Created attachment 332822 [details]
patch.

Description of problem:
Expected behaviour of clusvcadm -e <service> -F is to follow the failover domain rules, i.e. to try starting the service on the node with the lowest priority value (the most preferred member) first and then, if that fails, on the node with the next priority value, and so on. Instead, if the first node is unable to start the service, clusvcadm falsely reports that it was started on some other node in the failover domain, but the service is not actually started.
This happens because the first node puts the service into the "recovering" state, and the following nodes do not handle a service in that state.
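In other words, an enable with failover should behave roughly like the sketch below: report the node that actually started the service, or an error if no domain member could. This is a conceptual illustration with hypothetical helper names, not rgmanager code.

/* Conceptual sketch of the behaviour expected from
 * "clusvcadm -e <service> -F".  try_start_on_node() is a hypothetical
 * stand-in for the start request sent to a domain member. */
#include <stdio.h>

/* Pretend only node 2 can start the service, like node02 in the
 * reproduction steps below; returns 0 on success. */
static int try_start_on_node(const char *svc, int node)
{
    printf("trying to start %s on node %d\n", svc, node);
    return node == 2 ? 0 : -1;
}

/* Try each member of the (restricted) failover domain, most preferred
 * first, and only report the node that really started the service
 * instead of claiming success as soon as a request was forwarded. */
static int enable_with_failover(const char *svc, const int *members, int n)
{
    for (int i = 0; i < n; i++)
        if (try_start_on_node(svc, members[i]) == 0)
            return members[i];
    return -1;
}

int main(void)
{
    int domain[] = { 1, 2 };    /* node01 (priority 1), node02 (priority 2) */
    int node = enable_with_failover("service:test_service", domain, 2);
    if (node > 0)
        printf("started on node %d\n", node);
    else
        printf("enable failed on every domain member\n");
    return 0;
}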

Version-Release number of selected component (if applicable):
The bug was initially found in rgmanager-2.0.38 and is still present in 2.0.46.


How reproducible:

Always
Steps to Reproduce:
1. Install a cluster with at least 3 nodes (node01, node02, node03)
2. Define a restricted failover domain test_domain (node01 - priority 1, node02 - priority 2)
3. Define a service test_service for this failover domain
4. Make the service unable to start on node01 (unmount its filesystem, uninstall the service, etc.)
5. Try "clusvcadm -e service:test_service -F" on node01
  
Actual results:
clusvcadm reports that the service was started on node02, but the service is not actually running.

Expected results:
service is started on node02

Additional info:
On node01:
<error>  Starting Service test_service > Failed
[30964] notice: start on service "test_service" returned 1 (generic error)
[30964] warning: #68: Failed to start service:test_service; return value: 1
[30964] notice: Stopping service service:test_service
<debug>  Verifying Configuration Of service:test_service
<info>   Stopping Service service:test_service
<error>  Monitoring Service service:test_service > Service Is Not Running
<info>   Stopping Service service:test_service > Succeed
[30964] notice: Service service:test_service is recovering
[30964] debug: Sent remote-start request to 2
[31038] debug: 1 events processed

On node02:
[29205] debug: Not starting service:test_service: recovery state
[29204] debug: 1 events processed

The attached patch makes clurgmgrd behave as expected.
diff -U 3 -r ./rgmanager-2.0.38.orig/src/daemons/rg_state.c ./rgmanager-2.0.38/src/daemons/rg_state.c
--- ./rgmanager-2.0.38.orig/src/daemons/rg_state.c	2008-03-27 21:12:36.000000000 +0100
+++ ./rgmanager-2.0.38/src/daemons/rg_state.c	2009-02-19 02:24:00.000000000 +0100
@@ -2061,7 +2061,11 @@
 			ret = RG_EFAIL;
 			goto out;
 		} else {
-			ret = svc_start_remote(svcName, RG_START_REMOTE, target);
+			if (request == RG_ENABLE) {
+			    ret = svc_start_remote(svcName, RG_START_RECOVER, target);
+			} else {
+			    ret = svc_start_remote(svcName, RG_START_REMOTE, target);
+			}
 		}
 
 		switch(ret) {
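Why the request type matters: as the node02 log above shows ("Not starting service:test_service: recovery state"), a node that receives a plain remote start skips a service that is already marked recovering, while a recover-type start is allowed to pick it up. The following is a simplified, self-contained model of that handoff, not rgmanager source; the constants and the state check are illustrative only.

/* Simplified model of the recovering-state handoff between nodes.
 * The enum values and the check below are illustrative; the real
 * logic lives in rg_state.c and the rgmanager headers. */
#include <stdio.h>

enum rg_request { RG_START_REMOTE, RG_START_RECOVER };
enum rg_state   { RG_STATE_STARTED, RG_STATE_RECOVERING };

/* What the target node effectively does with a forwarded start: refuse
 * a plain remote start for a service left in the recovering state, but
 * honour a recover-type start. */
static int handle_remote_start(enum rg_request req, enum rg_state state)
{
    if (state == RG_STATE_RECOVERING && req != RG_START_RECOVER) {
        printf("Not starting service: recovery state\n");
        return -1;
    }
    printf("starting service\n");
    return 0;
}

int main(void)
{
    /* Unpatched: node01 forwards RG_START_REMOTE -> node02 refuses. */
    handle_remote_start(RG_START_REMOTE, RG_STATE_RECOVERING);
    /* Patched: node01 forwards RG_START_RECOVER -> node02 starts it. */
    handle_remote_start(RG_START_RECOVER, RG_STATE_RECOVERING);
    return 0;
}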
Comment 1 Yevheniy Demchenko 2009-02-23 09:26:17 EST
The proposed patch still doesn't work under certain circumstances. Here is a revised one:
diff -U 3 -r ./rgmanager-2.0.38.orig/src/daemons/rg_state.c ./rgmanager-2.0.38/src/daemons/rg_state.c
--- ./rgmanager-2.0.38.orig/src/daemons/rg_state.c      2008-03-27 21:12:36.000000000 +0100
+++ ./rgmanager-2.0.38/src/daemons/rg_state.c   2009-02-23 15:04:00.000000000 +0100
@@ -2054,14 +2054,14 @@
                target = best_target_node(allowed_nodes, 0,
                                          svcName, 1);
                if (target == me) {
-                       ret = handle_start_remote_req(svcName, request);
+                       ret = handle_start_remote_req(svcName, (request==RG_ENABLE?RG_START_RECOVER:request));
                        if (ret == RG_EAGAIN)
                                goto out;
               } else if (target < 0) {
                        ret = RG_EFAIL;
                        goto out;
                } else {
-                       ret = svc_start_remote(svcName, RG_START_REMOTE, target);
+                           ret = svc_start_remote(svcName, (request==RG_ENABLE?RG_START_RECOVER:RG_START_REMOTE), target);
                }

                switch(ret) {
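To make the intent of the two one-line changes easier to see, the mapping applied when the start is forwarded to another domain member can be summarised as below (a sketch with locally defined stand-in constants, not the rgmanager headers; in the target == me branch the non-ENABLE case passes request through unchanged rather than forcing RG_START_REMOTE):

#include <stdio.h>

/* Stand-in values; the real constants come from rgmanager's headers. */
enum rg_request { RG_ENABLE, RG_START_REMOTE, RG_START_RECOVER };

/* An ENABLE that has to be retried on another domain member becomes a
 * recover-type start, so the target will pick up a service already in
 * the "recovering" state; anything else is still sent as a plain
 * remote start, as before the patch. */
static enum rg_request map_forwarded_request(enum rg_request request)
{
    return request == RG_ENABLE ? RG_START_RECOVER : RG_START_REMOTE;
}

int main(void)
{
    printf("%d\n", map_forwarded_request(RG_ENABLE));       /* 2 = RG_START_RECOVER */
    printf("%d\n", map_forwarded_request(RG_START_REMOTE)); /* 1 = RG_START_REMOTE  */
    return 0;
}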
Comment 2 Yevheniy Demchenko 2009-02-23 09:29:10 EST
Created attachment 332934 [details]
patch.
Comment 3 Lon Hohberger 2009-02-26 15:16:26 EST
*** Bug 486711 has been marked as a duplicate of this bug. ***
Comment 4 Lon Hohberger 2009-02-27 10:10:19 EST
http://git.fedorahosted.org/git/?p=cluster.git;a=commit;h=c0de9cfb54b5e3a8e0de4b95ae80d2ce5dae4aae

Pushed to RHEL5 / master / STABLE2 / STABLE3
Comment 7 errata-xmlrpc 2009-09-02 07:04:53 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1339.html
