Red Hat Bugzilla – Bug 822104
rgmanager misses cluster.conf updates
Last modified: 2013-06-13 16:13:31 EDT
Description of problem:
When we add or remove a VM service in cluster.conf and propagate the changed cluster.conf through ccs_tool update, only the node on which we ran ccs_tool recognizes the changes to the services.
All other cluster nodes show the old list of services. The only workaround we found was to migrate all services off the affected cluster node and then restart rgmanager.
Version-Release number of selected component (if applicable):
How reproducible:
Every time we change cluster.conf.
One more hint: this bug has appeared since 5.7.
Steps to Reproduce:
1. Change a service in cluster.conf.
2. Propagate with "ccs_tool update /etc/cluster/cluster.conf".
3. Check with clustat on all nodes. Only the node on which ccs_tool was run picks up the change. Further changes to cluster.conf do not help, even if they are made on another node.
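The check in step 3 can be sketched as a small comparison of per-node service lists. This is only an illustration: how the service names are collected on each node (for example, by reading clustat output by hand) is left out, and the function name and the service names in the usage note are hypothetical.

```python
def find_stale_nodes(services_by_node, reference_node):
    """Compare every node's service list against the reference node.

    services_by_node: dict mapping node name -> iterable of service names
    reference_node:   the node on which ccs_tool update was run
    Returns {node: (missing, extra)} for each node whose list differs.
    """
    reference = set(services_by_node[reference_node])
    stale = {}
    for node, services in services_by_node.items():
        if node == reference_node:
            continue
        current = set(services)
        missing = sorted(reference - current)  # services this node has not picked up
        extra = sorted(current - reference)    # services this node still lists
        if missing or extra:
            stale[node] = (missing, extra)
    return stale
```

For instance, if gar-ha-xen01 ran ccs_tool and lists the hypothetical services vm:guest1 and vm:guest2 while gar-ha-xen02 lists only vm:guest1, the function reports gar-ha-xen02 with vm:guest2 missing. An empty result means every node agrees with the reference node.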
You'll need to collect an rgmanager dump from all nodes and add them here. Information on how to do it is contained in the following knowledge base article:
Note that if rgmanager ends up waiting for fencing to complete, it will not process configuration changes.
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux release. Product Management has
requested further review of this request by Red Hat Engineering, for
potential inclusion in a Red Hat Enterprise Linux release for currently
deployed products. This request is not yet committed for inclusion in
Created attachment 585938
rgmanager dump from node gar-ha-xen01
Created attachment 585939
rgmanager dump from node gar-ha-xen02
Created attachment 585940
rgmanager dump from node gar-ha-xen03
(In reply to comment #2)
> Note that if rgmanager ends up waiting for fencing to complete, it will not
> process configuration changes.
OK, that is possible; it could be related to bz#822134.
But why does rgmanager work on the node calling ccs_tool, but not on the others?
The two nodes are blocked in _event_thread_f. Do you also have logs from that specific incident?
(In reply to comment #8)
> The two nodes are blocked in _event_thread_f. Do you also have logs from
> that specific incident?
I have no idea what _event_thread_f is, or how it is triggered.
_event_thread_f dispatches event handlers. The only thing that causes it to block is fencing (or should be).
The hang could be related to the other bug, but might not be.
You need to file a ticket with Red Hat Support and include your sosreports and so forth - preferably taken from a cluster in the 'broken' state. The sosreports and/or cores should have enough information to triage the issue.
If you have already filed a ticket, please indicate it here.
Too bad: I have a contract for a RHEL 6.2 cluster, but not for the 5.8 cluster, which runs on SL. I'm in the process of migrating the clusters to 6.2, but it needs some more time.
*** This bug has been marked as a duplicate of bug 822134 ***
I was able to reproduce this by running clusvcadm -r <svc> on two services in a loop, then updating the configuration while those were running.
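A rough sketch of that reproducer, assuming the relocation loop runs in a background thread while the update is issued from the main thread. Only the two commands come from this report; the function names, the number of rounds, and the timing of the update are assumptions, and the service names passed in would be site-specific.

```python
import subprocess
import threading

# Command taken from the steps to reproduce in this report.
UPDATE_CMD = ["ccs_tool", "update", "/etc/cluster/cluster.conf"]


def relocate_loop(services, rounds, run):
    """Relocate each service in turn, `rounds` times, via clusvcadm -r."""
    for _ in range(rounds):
        for svc in services:
            run(["clusvcadm", "-r", svc])


def reproduce(services, rounds=10, run=subprocess.call):
    """Start relocations in the background, then push the config update.

    `run` is injectable so the sequence can be dry-run without a cluster.
    """
    worker = threading.Thread(target=relocate_loop, args=(services, rounds, run))
    worker.start()
    run(UPDATE_CMD)  # issued while the relocation loop is still active
    worker.join()
```

Passing a recording function as `run` (for example, a list's `append`) shows the commands that would be issued without touching a real cluster.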
*** This bug has been marked as a duplicate of bug 889098 ***