Bug 192117

Summary: service only starts if added on the node for which it is default
Product: [Retired] Red Hat Cluster Suite
Reporter: Lenny Maiorani <lenny>
Component: rgmanager
Assignee: Lon Hohberger <lhh>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: medium
Priority: medium
Version: 4
CC: cluster-maint
Hardware: All
OS: Linux
Fixed In Version: RHBA-2007-0149
Doc Type: Bug Fix
Last Closed: 2007-05-10 21:16:43 UTC
Attachments:
Fix (as real patch)

Description Lenny Maiorani 2006-05-17 17:57:25 UTC
Description of problem:
When a new VIP is added to /etc/cluster/cluster.conf and the change is propagated by
running 'ccs_tool update' and 'cman_tool version', the VIP service is only started if
the update was performed on the node named in the service's failover domain.

Example: 
Add this line to the /etc/cluster/cluster.conf resources section:
<ip address="10.250.1.93/16" monitor_link="1"/>

Add this service:
<service autostart="1" domain="node2" name="10.250.1.93">
        <ip ref="10.250.1.93/16"/>
</service>

And update the configuration version number.

If all this is done on node1, the service will not be started. However, if it is
done on node2 it will be started.
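
For context, the fragments above sit inside the <rm> block of /etc/cluster/cluster.conf
roughly as follows. The <failoverdomains> layout shown here is an assumed, typical
one-node restricted domain, not copied from the affected configuration:

<rm>
        <failoverdomains>
                <!-- assumed: a domain named "node2" containing only node2 -->
                <failoverdomain name="node2" ordered="0" restricted="1">
                        <failoverdomainnode name="node2" priority="1"/>
                </failoverdomain>
        </failoverdomains>
        <resources>
                <ip address="10.250.1.93/16" monitor_link="1"/>
        </resources>
        <service autostart="1" domain="node2" name="10.250.1.93">
                <ip ref="10.250.1.93/16"/>
        </service>
</rm>

After bumping the config_version attribute on the <cluster> element, the change is
pushed with 'ccs_tool update' and activated with 'cman_tool version' as noted above.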


Version-Release number of selected component (if applicable):
1.9.46-1.3speed (patch listed on bug 182454)

How reproducible:
Always


Actual results:
Service doesn't get started

Expected results:
Service starts

Comment 1 Lenny Maiorani 2006-05-22 23:53:16 UTC
I have found some other problems with this version of the rgmanager. These are
much more important, but look related.

When a node panics, the VIPs on that node do not fail over to other nodes, and the
node they are on is reported as "unknown".

During a graceful shutdown, the VIPs appear to stay on the node which went down, but
they are in the Stopped state and the node name is shown in parentheses.

After a graceful shutdown, when the node is brought back online it only activates the
VIPs it owns by default.

Should the changes around ownership of VIPs be taken out of this patch? Is there
a bug in one of those checks?

Comment 2 Lon Hohberger 2006-07-20 17:14:41 UTC
This is the one we worked through, right?

Does this happen on the 1.4.2x or the current beta bits in the RHN channel?

Comment 3 Lenny Maiorani 2006-07-20 19:09:20 UTC
I have seen some additional symptoms:
When VIPs are first added to /etc/cluster/cluster.conf while the cluster service is
up, some (many) of the VIPs do not get started initially. The strange part is that
they will get started if you remove them from the file and add them back in exactly
the same way.
Even stranger, you can get them all started by removing just ONE of them and adding it back.

In short, the bug is still present in the rgmanager-1.9.46-U4pre1 you gave me.


Comment 4 dex chen 2006-07-20 19:38:16 UTC
When I took a closer look at this issue, I found that the "ip.sh" script is not
invoked when I run 'ccs_tool update /etc/cluster/cluster.conf' to pick up the newly
added VIP services. The end result is that the VIPs are not assigned to any physical
interface.

Comment 5 Lon Hohberger 2006-09-06 18:04:42 UTC
I wonder if this has been fixed in U4?

Comment 7 Lon Hohberger 2006-10-24 18:26:38 UTC
I think I know what this is, and no, it's not fixed in U4.

Comment 8 Lon Hohberger 2006-11-17 15:49:45 UTC
This change from head should fix it:

diff -u -r1.24 -r1.25
--- cluster/rgmanager/src/daemons/groups.c	2006/10/06 21:22:27	1.24
+++ cluster/rgmanager/src/daemons/groups.c	2006/10/23 22:47:01	1.25
@@ -1090,8 +1093,20 @@
 		if (curr->rn_resource->r_flags & RF_NEEDSTART)
 			need_init = 1;
 
-		if (get_rg_state_local(rg, &svcblk) < 0)
-			continue;
+		if (!need_init) {
+			if (get_rg_state_local(rg, &svcblk) < 0)
+				continue;
+		} else {
+			if (rg_lock(rg, &lockp) != 0)
+				continue;
+
+			if (get_rg_state(rg, &svcblk) < 0) {
+				rg_unlock(&lockp);
+				continue;
+			}
+
+			rg_unlock(&lockp);
+		}
 
 		if (!need_init && svcblk.rs_owner != my_id())
 			continue;
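
For anyone reading the hunk in isolation, here is a minimal, compilable sketch of the
logic it introduces (the types and helper stubs below are hypothetical stand-ins for
rgmanager internals, not the real declarations): a group flagged RF_NEEDSTART has no
local state block yet, so its state is read from the authoritative copy under the
cluster lock instead of via get_rg_state_local(), and the owner check is skipped for it.

#include <stdio.h>

#define RF_NEEDSTART 0x1

typedef struct { int rs_owner; } rg_state_t;   /* stand-in for rgmanager's state block */
typedef int rg_lock_t;                         /* stand-in for the cluster lock handle */

/* hypothetical stubs standing in for rgmanager internals */
static int get_rg_state_local(const char *rg, rg_state_t *st) { (void)rg; st->rs_owner = 1; return 0; }
static int get_rg_state(const char *rg, rg_state_t *st)       { (void)rg; st->rs_owner = 2; return 0; }
static int rg_lock(const char *rg, rg_lock_t *l)              { (void)rg; *l = 1; return 0; }
static int rg_unlock(rg_lock_t *l)                            { (void)l; return 0; }
static int my_id(void)                                        { return 2; }

/* Returns 1 if this node should handle (start) the group, 0 otherwise. */
static int should_handle_group(const char *rg, int r_flags)
{
        rg_state_t svcblk;
        rg_lock_t lockp;
        int need_init = (r_flags & RF_NEEDSTART) != 0;

        if (!need_init) {
                /* unchanged path: the locally cached state is sufficient */
                if (get_rg_state_local(rg, &svcblk) < 0)
                        return 0;
        } else {
                /* new path: a just-added group has no local state yet, so take
                   the cluster lock and read the authoritative copy */
                if (rg_lock(rg, &lockp) != 0)
                        return 0;
                if (get_rg_state(rg, &svcblk) < 0) {
                        rg_unlock(&lockp);
                        return 0;
                }
                rg_unlock(&lockp);
        }

        /* only skip the ownership check for groups that still need initialization */
        if (!need_init && svcblk.rs_owner != my_id())
                return 0;

        return 1;
}

int main(void)
{
        printf("new group: %d\n", should_handle_group("10.250.1.93", RF_NEEDSTART));
        printf("existing group owned elsewhere: %d\n", should_handle_group("other", 0));
        return 0;
}

That would explain why a freshly added service could previously fall through the old
'continue' and never be started when the update was processed on a node other than the
default owner in its domain.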

Comment 9 Lenny Maiorani 2006-11-17 17:24:40 UTC
Yes, this has fixed my problems. I have changed it slightly to retrofit it to RHEL4 U4...

diff -u -r1.24 -r1.25
--- cluster/rgmanager/src/daemons/groups.c	2006/10/06 21:22:27	1.24
+++ cluster/rgmanager/src/daemons/groups.c	2006/10/23 22:47:01	1.25
@@ -1090,8 +1093,20 @@
 		if (curr->rn_resource->r_flags & RF_NEEDSTART)
 			need_init = 1;
 
-		if (get_rg_state_local(name, &svcblk) < 0)
-			continue;
+		if (!need_init) {
+			if (get_rg_state_local(name, &svcblk) < 0)
+				continue;
+		} else {
+			if (rg_lock(name, &lockp) != 0)
+				continue;
+
+			if (get_rg_state(name, &svcblk) < 0) {
+				rg_unlock(name, lockp);
+				continue;
+			}
+
+			rg_unlock(name, lockp);
+		}
 
 		if (!need_init && svcblk.rs_owner != my_id())
 			continue;

Comment 10 Lon Hohberger 2006-11-27 22:21:53 UTC
Created attachment 142234 [details]
Fix (as real patch)

Comment 13 Red Hat Bugzilla 2007-05-10 21:16:43 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0149.html