Bug 192117

Summary: service only starts if added on the node for which it is default
Product: [Retired] Red Hat Cluster Suite
Reporter: Lenny Maiorani <lenny>
Component: rgmanager
Assignee: Lon Hohberger <lhh>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: medium
Priority: medium
Version: 4
CC: cluster-maint
Hardware: All
OS: Linux
Fixed In Version: RHBA-2007-0149
Doc Type: Bug Fix
Last Closed: 2007-05-10 21:16:43 UTC
Attachments:
Fix (as real patch)

Description Lenny Maiorani 2006-05-17 17:57:25 UTC
Description of problem:
When a new VIP is added to /etc/cluster/cluster.conf and the change is propagated by
running 'ccs_tool update' and 'cman_tool version', the VIP service is only started if
the update was performed on the node named in the service's failover domain.

Example: 
Add this line to the /etc/cluster/cluster.conf resources section:
<ip address="10.250.1.93/16" monitor_link="1"/>

Add this service:
<service autostart="1" domain="node2" name="10.250.1.93">
        <ip ref="10.250.1.93/16"/>
</service>

And update the configuration version number.

If all this is done on node1, the service will not be started. However, if it is
done on node2 it will be started.
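
For context, the fragments above sit inside the <rm> block of /etc/cluster/cluster.conf
roughly as follows. The <failoverdomains> layout shown here is an assumed, typical
one-node restricted domain, not copied from the affected configuration:

<rm>
        <failoverdomains>
                <!-- assumed: a domain named "node2" containing only node2 -->
                <failoverdomain name="node2" ordered="0" restricted="1">
                        <failoverdomainnode name="node2" priority="1"/>
                </failoverdomain>
        </failoverdomains>
        <resources>
                <ip address="10.250.1.93/16" monitor_link="1"/>
        </resources>
        <service autostart="1" domain="node2" name="10.250.1.93">
                <ip ref="10.250.1.93/16"/>
        </service>
</rm>

After bumping the config_version attribute on the <cluster> element, the change is
pushed with 'ccs_tool update' and activated with 'cman_tool version' as noted above.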


Version-Release number of selected component (if applicable):
1.9.46-1.3speed (patch listed on bug 182454)

How reproducible:
Always


Actual results:
Service doesn't get started

Expected results:
Service starts

Comment 1 Lenny Maiorani 2006-05-22 23:53:16 UTC
I have found some other problems with this version of the rgmanager. These are
much more important, but look related.

When a node panics, the VIPs on that node do not fail over to other nodes, and the
node they are on is reported as "unknown".

During a graceful shutdown, the VIPs appear to stay on the node which went down, but
they are in the Stopped state and the node name is shown in parentheses.

After a graceful shutdown, when the node is brought back online it only activates the
VIPs it owns by default.

Should the changes around ownership of VIPs be taken out of this patch? Is there
a bug in one of those checks?

Comment 2 Lon Hohberger 2006-07-20 17:14:41 UTC
This is the one we worked through, right?

Does this happen on the 1.4.2x or the current beta bits in the RHN channel?

Comment 3 Lenny Maiorani 2006-07-20 19:09:20 UTC
I have seen some additional symptoms:
When VIPs are first added to /etc/cluster/cluster.conf while the cluster service is
up, some (many) of the VIPs do not get started initially. The strange part is that
they will get started if you remove them from the file and add them back in exactly
the same way.
Even stranger, you can get them all started by removing just ONE of them and adding it back.

In short, the bug is still present in the rgmanager-1.9.46-U4pre1 you gave me.


Comment 4 dex chen 2006-07-20 19:38:16 UTC
When I took a closer look at this issue, I found that the "ip.sh" script is not
invoked when I run 'ccs_tool update /etc/cluster/cluster.conf' to pick up the newly
added VIP services. The end result is that the VIPs are not assigned to any physical
interface.

Comment 5 Lon Hohberger 2006-09-06 18:04:42 UTC
I wonder if this has been fixed in U4?

Comment 7 Lon Hohberger 2006-10-24 18:26:38 UTC
I think I know what this is, and no, it's not fixed in U4.

Comment 8 Lon Hohberger 2006-11-17 15:49:45 UTC
This change from head should fix it:

diff -u -r1.24 -r1.25
--- cluster/rgmanager/src/daemons/groups.c	2006/10/06 21:22:27	1.24
+++ cluster/rgmanager/src/daemons/groups.c	2006/10/23 22:47:01	1.25
@@ -1090,8 +1093,20 @@
 		if (curr->rn_resource->r_flags & RF_NEEDSTART)
 			need_init = 1;
 
-		if (get_rg_state_local(rg, &svcblk) < 0)
-			continue;
+		if (!need_init) {
+			if (get_rg_state_local(rg, &svcblk) < 0)
+				continue;
+		} else {
+			if (rg_lock(rg, &lockp) != 0)
+				continue;
+
+			if (get_rg_state(rg, &svcblk) < 0) {
+				rg_unlock(&lockp);
+				continue;
+			}
+
+			rg_unlock(&lockp);
+		}
 
 		if (!need_init && svcblk.rs_owner != my_id())
 			continue;
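
For anyone reading the hunk in isolation, here is a minimal, compilable sketch of the
logic it introduces (the types and helper stubs below are hypothetical stand-ins for
rgmanager internals, not the real declarations): a group flagged RF_NEEDSTART has no
local state block yet, so its state is read from the authoritative copy under the
cluster lock instead of via get_rg_state_local(), and the owner check is skipped for it.

#include <stdio.h>

#define RF_NEEDSTART 0x1

typedef struct { int rs_owner; } rg_state_t;   /* stand-in for rgmanager's state block */
typedef int rg_lock_t;                         /* stand-in for the cluster lock handle */

/* hypothetical stubs standing in for rgmanager internals */
static int get_rg_state_local(const char *rg, rg_state_t *st) { (void)rg; st->rs_owner = 1; return 0; }
static int get_rg_state(const char *rg, rg_state_t *st)       { (void)rg; st->rs_owner = 2; return 0; }
static int rg_lock(const char *rg, rg_lock_t *l)              { (void)rg; *l = 1; return 0; }
static int rg_unlock(rg_lock_t *l)                            { (void)l; return 0; }
static int my_id(void)                                        { return 2; }

/* Returns 1 if this node should handle (start) the group, 0 otherwise. */
static int should_handle_group(const char *rg, int r_flags)
{
        rg_state_t svcblk;
        rg_lock_t lockp;
        int need_init = (r_flags & RF_NEEDSTART) != 0;

        if (!need_init) {
                /* unchanged path: the locally cached state is sufficient */
                if (get_rg_state_local(rg, &svcblk) < 0)
                        return 0;
        } else {
                /* new path: a just-added group has no local state yet, so take
                   the cluster lock and read the authoritative copy */
                if (rg_lock(rg, &lockp) != 0)
                        return 0;
                if (get_rg_state(rg, &svcblk) < 0) {
                        rg_unlock(&lockp);
                        return 0;
                }
                rg_unlock(&lockp);
        }

        /* only skip the ownership check for groups that still need initialization */
        if (!need_init && svcblk.rs_owner != my_id())
                return 0;

        return 1;
}

int main(void)
{
        printf("new group: %d\n", should_handle_group("10.250.1.93", RF_NEEDSTART));
        printf("existing group owned elsewhere: %d\n", should_handle_group("other", 0));
        return 0;
}

That would explain why a freshly added service could previously fall through the old
'continue' and never be started when the update was processed on a node other than the
default owner in its domain.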

Comment 9 Lenny Maiorani 2006-11-17 17:24:40 UTC
Yes, this has fixed my problems. I have changed it slightly to retrofit it to RHEL4 U4...

diff -u -r1.24 -r1.25
--- cluster/rgmanager/src/daemons/groups.c	2006/10/06 21:22:27	1.24
+++ cluster/rgmanager/src/daemons/groups.c	2006/10/23 22:47:01	1.25
@@ -1090,8 +1093,20 @@
 		if (curr->rn_resource->r_flags & RF_NEEDSTART)
 			need_init = 1;
 
-		if (get_rg_state_local(name, &svcblk) < 0)
-			continue;
+		if (!need_init) {
+			if (get_rg_state_local(name, &svcblk) < 0)
+				continue;
+		} else {
+			if (rg_lock(name, &lockp) != 0)
+				continue;
+
+			if (get_rg_state(name, &svcblk) < 0) {
+				rg_unlock(name, lockp);
+				continue;
+			}
+
+			rg_unlock(name, lockp);
+		}
 
 		if (!need_init && svcblk.rs_owner != my_id())
 			continue;

Comment 10 Lon Hohberger 2006-11-27 22:21:53 UTC
Created attachment 142234 [details]
Fix (as real patch)

Comment 13 Red Hat Bugzilla 2007-05-10 21:16:43 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0149.html