Bug 192117
| Summary: | service only starts if added on the node for which it is default | | |
|---|---|---|---|
| Product: | [Retired] Red Hat Cluster Suite | Reporter: | Lenny Maiorani <lenny> |
| Component: | rgmanager | Assignee: | Lon Hohberger <lhh> |
| Status: | CLOSED ERRATA | QA Contact: | Cluster QE <mspqa-list> |
| Severity: | medium | Priority: | medium |
| Version: | 4 | CC: | cluster-maint |
| Hardware: | All | OS: | Linux |
| Fixed In Version: | RHBA-2007-0149 | Doc Type: | Bug Fix |
| Last Closed: | 2007-05-10 21:16:43 UTC | | |

Description
Lenny Maiorani
2006-05-17 17:57:25 UTC
I have found some other problems with this version of rgmanager. These are much more important, but look related. When a node panics, the VIPs on that node do not fail over to other nodes, and they report the node they are on as "unknown". During a graceful shutdown, the VIPs appear to stay on the node that went down, but they are in the Stopped state and have parentheses around the node name. After a graceful shutdown, bringing the node back online only activates the VIPs which it owns by default.

Should the changes around ownership of VIPs be taken out of this patch? Is there a bug in one of those checks?

This is the one we worked through, right? Does this happen on the 1.4.2x or the current beta bits in the RHN channel?

I saw some additional situations: when VIPs are first added to /etc/cluster/cluster.conf while the cluster service is up, some (many) VIPs do not get started initially. The weird thing is that they will get started if you remove them from the file and add them back using exactly the same approach. What's more, you can get them started by removing just ONE of them and adding it back.

In short, the bug is still valid in the rgmanager-1.9.46-U4pre1 you gave me.

When I took a closer look at this issue, I found that the "ip.sh" script is not invoked when I run ccs_tool update /etc/cluster/cluster.conf to pick up the newly added VIP services. The end result is that the VIPs are not assigned to any physical interface. I wonder if this has been fixed in U4?

I think I know what this is, and no, it's not fixed in U4.

This fix from head should fix it:

diff -u -r1.24 -r1.25
--- cluster/rgmanager/src/daemons/groups.c 2006/10/06 21:22:27 1.24
+++ cluster/rgmanager/src/daemons/groups.c 2006/10/23 22:47:01 1.25
@@ -1090,8 +1093,20 @@
         if (curr->rn_resource->r_flags & RF_NEEDSTART)
             need_init = 1;
 
-        if (get_rg_state_local(rg, &svcblk) < 0)
-            continue;
+        if (!need_init) {
+            if (get_rg_state_local(rg, &svcblk) < 0)
+                continue;
+        } else {
+            if (rg_lock(rg, &lockp) != 0)
+                continue;
+
+            if (get_rg_state(rg, &svcblk) < 0) {
+                rg_unlock(&lockp);
+                continue;
+            }
+
+            rg_unlock(&lockp);
+        }
 
         if (!need_init && svcblk.rs_owner != my_id())
             continue;

Yes, this has fixed my problems. I have changed it slightly to retro-fit RHEL4U4...

diff -u -r1.24 -r1.25
--- cluster/rgmanager/src/daemons/groups.c 2006/10/06 21:22:27 1.24
+++ cluster/rgmanager/src/daemons/groups.c 2006/10/23 22:47:01 1.25
@@ -1090,8 +1093,20 @@
         if (curr->rn_resource->r_flags & RF_NEEDSTART)
             need_init = 1;
 
-        if (get_rg_state_local(name, &svcblk) < 0)
-            continue;
+        if (!need_init) {
+            if (get_rg_state_local(name, &svcblk) < 0)
+                continue;
+        } else {
+            if (rg_lock(name, &lockp) != 0)
+                continue;
+
+            if (get_rg_state(name, &svcblk) < 0) {
+                rg_unlock(name, lockp);
+                continue;
+            }
+
+            rg_unlock(name, lockp);
+        }
 
         if (!need_init && svcblk.rs_owner != my_id())
             continue;

Created attachment 142234: Fix (as real patch)

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0149.html