Description of problem:
When we add or remove a VM service in cluster.conf and propagate the changed cluster.conf through ccs_tool update, only the node on which we ran ccs_tool recognizes the changes to the services. All other cluster nodes show the old list of services. The only workaround we found was to migrate all services off the affected cluster node and then restart rgmanager.

Version-Release number of selected component (if applicable):
rgmanager-2.0.52-28.el5.x86_64

How reproducible:
Every time we change cluster.conf. One more hint: this bug has appeared since 5.7.

Steps to Reproduce:
1. Change a service in cluster.conf.
2. Propagate with "ccs_tool update /etc/cluster/cluster.conf".
3. Check with clustat on all nodes; only the node on which ccs_tool was run picks up the change. Further changes to cluster.conf do not help, even when made on another node.

Actual results:

Expected results:

Additional info:
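The check in step 3 can be made less error-prone by comparing what cluster.conf declares against what each node reports. The following is only a minimal sketch in Python, not part of rgmanager or any cluster tool. The function names, the sample configuration, and the node names are hypothetical, and it assumes that services are declared as <service> and <vm> elements directly under <rm> and that nodes report them as "service:NAME" and "vm:NAME". Collecting the per-node lists (for example from clustat output) is left to the reader.

```python
import xml.etree.ElementTree as ET


def configured_services(conf_xml):
    """Return (config_version, set of 'kind:name') parsed from cluster.conf text.

    Assumption: services are <service>/<vm> elements directly under <rm>.
    """
    root = ET.fromstring(conf_xml)
    version = root.get("config_version")
    names = set()
    rm = root.find("rm")
    if rm is not None:
        for kind in ("service", "vm"):
            for elem in rm.findall(kind):
                name = elem.get("name")
                if name:
                    names.add(f"{kind}:{name}")
    return version, names


def stale_nodes(expected, seen_by_node):
    """Return {node: (missing, extra)} for nodes whose list differs from expected."""
    report = {}
    for node, seen in seen_by_node.items():
        seen = set(seen)
        missing = expected - seen   # configured, but the node does not show it
        extra = seen - expected     # the node still shows it, but it was removed
        if missing or extra:
            report[node] = (sorted(missing), sorted(extra))
    return report


# Hypothetical sample data, for illustration only.
SAMPLE_CONF = """
<cluster name="example" config_version="7">
  <rm>
    <vm name="web"/>
    <service name="db"/>
  </rm>
</cluster>
"""

if __name__ == "__main__":
    version, expected = configured_services(SAMPLE_CONF)
    seen = {
        "node-a": ["vm:web", "service:db"],   # node where the update was run
        "node-b": ["service:db", "vm:old"],   # node still showing the old list
    }
    print("config_version:", version)
    print("stale nodes:", stale_nodes(expected, seen))
```

A node that appears in the result is still working from an old service list, which is the symptom described above.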
You'll need to collect an rgmanager dump from all nodes and add them here. Information on how to do it is contained in the following knowledge base article: https://access.redhat.com/knowledge/solutions/65620
Note that if rgmanager ends up waiting for fencing to complete, it will not process configuration changes.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux release for currently deployed products. This request is not yet committed for inclusion in a release.
Created attachment 585938 [details] rgmanager Dump from node gar-ha-xen01
Created attachment 585939 [details] rgmanager dump from node gar-ha-xen02
Created attachment 585940 [details] rgmanager dump from node gar-ha-xen03
(In reply to comment #2)
> Note that if rgmanager ends up waiting for fencing to complete, it will not
> process configuration changes.

OK, that is possible; it could be related to bz#822134. But why does rgmanager work on the node calling ccs_tool, but not on the others?
The two nodes are blocked in _event_thread_f. Do you also have logs from that specific incident?
(In reply to comment #8)
> The two nodes are blocked in _event_thread_f. Do you also have logs from
> that specific incident?

I have no idea what _event_thread_f is or how it is triggered.
_event_thread_f dispatches event handlers. The only thing that causes it to block is fencing (or should be). The hang could be related to the other bug, but might not be. You need to file a ticket with Red Hat Support and include your sosreports and so forth - preferably taken from a cluster in the 'broken' state. The sosreports and/or cores should have enough information to triage the issue. If you have already filed a ticket, please indicate it here.
Too bad: I have a contract for a RHEL 6.2 cluster, but not for the 5.8 cluster, which runs on SL. I'm in the process of migrating the clusters to 6.2, but it needs some more time.
*** This bug has been marked as a duplicate of bug 822134 ***
Reopening. I was able to reproduce this by running clusvcadm -r <svc> on two services in a loop, then updating the configuration while those were running.
*** This bug has been marked as a duplicate of bug 889098 ***