Bug 822104 - rgmanager misses cluster.conf updates
Status: CLOSED DUPLICATE of bug 889098
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: rgmanager
Version: 5.8
Hardware: x86_64 Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Ryan McCabe
QA Contact: Cluster QE
Keywords: Reopened
Reported: 2012-05-16 06:49 EDT by Klaus Steinberger
Modified: 2013-06-13 16:13 EDT

Doc Type: Bug Fix
Last Closed: 2012-08-17 12:03:13 EDT
Type: Bug

Attachments
rgmanager Dump from node gar-ha-xen01 (53.03 KB, application/octet-stream)
2012-05-22 02:57 EDT, Klaus Steinberger
rgmanager dump from node gar-ha-xen02 (55.06 KB, application/octet-stream)
2012-05-22 02:58 EDT, Klaus Steinberger
rgmanager dump from node gar-ha-xen03 (54.12 KB, application/octet-stream)
2012-05-22 02:58 EDT, Klaus Steinberger

Description Klaus Steinberger 2012-05-16 06:49:07 EDT
Description of problem:
When we add or remove a VM service in cluster.conf and propagate the changed cluster.conf with ccs_tool update, only the node on which we ran ccs_tool recognizes the changes to the services.

All other cluster nodes show the old list of services. The only workaround we found was to migrate all services off the affected cluster node and then restart rgmanager.
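
The workaround described above could be sketched roughly as follows. This is only an illustration, not something taken from the report: the VM service names are hypothetical placeholders, the target node name is borrowed from the attachment titles, and the `run` helper only prints each command unless DRY_RUN=0 is set. It is meant to be run on the node that shows the stale service list.

```shell
#!/bin/sh
# Assumptions (not from the report): VM service names are placeholders,
# TARGET is the node that should take over the guests.
VMS="${VMS:-vm:guest1 vm:guest2}"
TARGET="${TARGET:-gar-ha-xen01}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Migrate every listed VM service to TARGET, then restart rgmanager locally.
drain_and_restart() {
    for vm in $VMS; do
        run clusvcadm -M "$vm" -m "$TARGET"
    done
    run service rgmanager restart
}
```

Calling `drain_and_restart` with the default DRY_RUN=1 only lists the commands, which makes it easy to review them before running anything against a live cluster.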

Version-Release number of selected component (if applicable):
rgmanager-2.0.52-28.el5.x86_64

How reproducible:
Every time we change cluster.conf.
One more hint: this bug has appeared since 5.7.


Steps to Reproduce:
1. Change a service in cluster.conf.
2. Propagate the change with "ccs_tool update /etc/cluster/cluster.conf".
3. Check with clustat on all nodes: only the node on which ccs_tool
   was run picks up the change. Further changes to cluster.conf do not
   help, even when they are made on another node.
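
The steps above can be sketched as a small shell helper. This is a minimal sketch under stated assumptions, not part of the original report: the node names are taken from the attachment titles, passwordless ssh between the nodes is assumed, and the `run` helper only prints each command unless DRY_RUN=0 is set. Step 1 (editing cluster.conf) is still done by hand.

```shell
#!/bin/sh
# Assumptions (not from the report): node list and ssh access between nodes.
NODES="${NODES:-gar-ha-xen01 gar-ha-xen02 gar-ha-xen03}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 2: propagate the edited configuration from the local node.
propagate_config() {
    run ccs_tool update /etc/cluster/cluster.conf
}

# Step 3: show the service list as seen by every node.
check_all_nodes() {
    for node in $NODES; do
        run ssh "$node" clustat
    done
}
```

After editing cluster.conf, calling `propagate_config` followed by `check_all_nodes` shows whether every node lists the same services.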
  
Actual results:


Expected results:


Additional info:
Comment 1 Lon Hohberger 2012-05-21 17:44:53 EDT
You'll need to collect an rgmanager dump from all nodes and add them here.  Information on how to do it is contained in the following knowledge base article:

https://access.redhat.com/knowledge/solutions/65620
Comment 2 Lon Hohberger 2012-05-21 17:46:07 EDT
Note that if rgmanager ends up waiting for fencing to complete, it will not process configuration changes.
Comment 3 RHEL Product and Program Management 2012-05-21 17:59:11 EDT
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux release.  Product Management has
requested further review of this request by Red Hat Engineering, for
potential inclusion in a Red Hat Enterprise Linux release for currently
deployed products.  This request is not yet committed for inclusion in
a release.
Comment 4 Klaus Steinberger 2012-05-22 02:57:02 EDT
Created attachment 585938
rgmanager Dump from node gar-ha-xen01
Comment 5 Klaus Steinberger 2012-05-22 02:58:05 EDT
Created attachment 585939
rgmanager dump from node gar-ha-xen02
Comment 6 Klaus Steinberger 2012-05-22 02:58:58 EDT
Created attachment 585940
rgmanager dump from node gar-ha-xen03
Comment 7 Klaus Steinberger 2012-05-22 03:01:19 EDT
(In reply to comment #2)
> Note that if rgmanager ends up waiting for fencing to complete, it will not
> process configuration changes.

OK, that is possible; it could be related to bz#822134.

But why does rgmanager work on the node calling ccs_tool, but not on the others?
Comment 8 Lon Hohberger 2012-05-22 16:19:53 EDT
The two nodes are blocked in _event_thread_f.  Do you also have logs from that specific incident?
Comment 9 Klaus Steinberger 2012-05-23 00:48:26 EDT
(In reply to comment #8)
> The two nodes are blocked in _event_thread_f.  Do you also have logs from
> that specific incident?

I have no idea what _event_thread_f is, or how it is triggered.
Comment 10 Lon Hohberger 2012-05-23 13:37:12 EDT
_event_thread_f dispatches event handlers.  The only thing that causes it to block is fencing (or should be).

The hang could be related to the other bug, but might not be.

You need to file a ticket with Red Hat Support and include your sosreports and so forth - preferably taken from a cluster in the 'broken' state.  The sosreports and/or cores should have enough information to triage the issue.

If you have already filed a ticket, please indicate it here.
Comment 11 Klaus Steinberger 2012-05-23 14:52:35 EDT
Too bad: I have a contract for a RHEL 6.2 cluster, but not for the 5.8 cluster, which runs on SL. I'm in the process of migrating the clusters to 6.2, but it needs some more time.
Comment 12 Ryan McCabe 2012-07-20 10:12:17 EDT

*** This bug has been marked as a duplicate of bug 822134 ***
Comment 13 Ryan McCabe 2012-07-31 12:28:35 EDT
Reopening.

I was able to reproduce this by running clusvcadm -r <svc> on two services in a loop, then updating the configuration while those were running.
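
A reproducer along those lines might look roughly like the sketch below. It is an illustration only, not the script actually used: the two service names and the number of rounds are hypothetical placeholders, and the `run` helper only prints each command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Assumptions (not from the report): service names and round count.
SVC_A="${SVC_A:-service:test1}"
SVC_B="${SVC_B:-service:test2}"
ROUNDS="${ROUNDS:-3}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Relocate both services repeatedly.
relocate_loop() {
    i=0
    while [ "$i" -lt "$ROUNDS" ]; do
        run clusvcadm -r "$SVC_A"
        run clusvcadm -r "$SVC_B"
        i=$((i + 1))
    done
}

# Start the relocation loop in the background and push a configuration
# update while it is still running.
reproduce() {
    relocate_loop &
    loop_pid=$!
    run ccs_tool update /etc/cluster/cluster.conf
    wait "$loop_pid"
}
```

Calling `reproduce` and then running clustat on each node should show whether any node kept the old service list. Because the loop runs in the background, the order of the relocations relative to the update is not deterministic.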
Comment 17 Ryan McCabe 2013-06-13 16:13:31 EDT

*** This bug has been marked as a duplicate of bug 889098 ***
