822104 – rgmanager misses cluster.conf updates

Summary: rgmanager misses cluster.conf updates

Keywords:
Status:	CLOSED DUPLICATE of bug 889098
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	rgmanager
Sub Component:
Version:	5.8
Hardware:	x86_64
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Ryan McCabe
QA Contact:	Cluster QE
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2012-05-16 10:49 UTC by Klaus Steinberger
Modified:	2013-06-13 20:13 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2012-08-17 16:03:13 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
rgmanager Dump from node gar-ha-xen01 (53.03 KB, application/octet-stream) 2012-05-22 06:57 UTC, Klaus Steinberger
rgmanager dump from node gar-ha-xen02 (55.06 KB, application/octet-stream) 2012-05-22 06:58 UTC, Klaus Steinberger
rgmanager dump from node gar-ha-xen03 (54.12 KB, application/octet-stream) 2012-05-22 06:58 UTC, Klaus Steinberger

Description Klaus Steinberger 2012-05-16 10:49:07 UTC

Description of problem:
When we add or remove a VM service in cluster.conf and propagate the changed cluster.conf through ccs_tool update, only the node on which we ran ccs_tool recognizes the changes to the services.

All other cluster nodes still show the old list of services. The only workaround we found was to migrate all services off the affected cluster node and then restart rgmanager.

Version-Release number of selected component (if applicable):
rgmanager-2.0.52-28.el5.x86_64

How reproducible:
Every time we change cluster.conf.
One more hint: this bug has been present since 5.7.


Steps to Reproduce:
1. Change a service in cluster.conf.
2. Propagate the change with "ccs_tool update /etc/cluster/cluster.conf".
3. Check with clustat on all nodes: only the node on which ccs_tool
   was run picks up the change. Further changes to cluster.conf do not
   help, even when they are made on another node.
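
The check in step 3 can be sketched as a comparison between the services defined in cluster.conf and the services each node reports. This is only a minimal sketch: the sample cluster.conf fragment, the service names, and the helper names configured_services and stale_nodes are hypothetical, and only the node names are taken from this report. Collecting each node's service list (for example from clustat) is left out on purpose.

```python
import xml.etree.ElementTree as ET

# Hypothetical cluster.conf fragment, used only for illustration.
SAMPLE_CONF = """\
<cluster name="example" config_version="7">
  <rm>
    <service name="web"/>
    <vm name="guest1"/>
    <vm name="guest2"/>
  </rm>
</cluster>
"""


def configured_services(cluster_conf_xml):
    """Return the top-level services and VMs under <rm> as 'type:name' strings."""
    root = ET.fromstring(cluster_conf_xml)
    rm = root.find("rm")
    if rm is None:
        return set()
    names = set()
    for tag in ("service", "vm"):
        for element in rm.findall(tag):
            name = element.get("name")
            if name:
                names.add("%s:%s" % (tag, name))
    return names


def stale_nodes(expected, seen_by_node):
    """Return the sorted node names whose reported services differ from `expected`."""
    return sorted(
        node for node, seen in seen_by_node.items() if set(seen) != set(expected)
    )


# Assumed observations: only gar-ha-xen01 has picked up the new VM.
observed = {
    "gar-ha-xen01": {"service:web", "vm:guest1", "vm:guest2"},
    "gar-ha-xen02": {"service:web", "vm:guest1"},
    "gar-ha-xen03": {"service:web", "vm:guest1"},
}
print(stale_nodes(configured_services(SAMPLE_CONF), observed))
```

Any node listed by stale_nodes would be one that still shows the old service list after the update.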
  
Actual results:


Expected results:


Additional info:

Comment 1 Lon Hohberger 2012-05-21 21:44:53 UTC

You'll need to collect an rgmanager dump from all nodes and add them here.  Information on how to do it is contained in the following knowledge base article:

https://access.redhat.com/knowledge/solutions/65620

Comment 2 Lon Hohberger 2012-05-21 21:46:07 UTC

Note that if rgmanager ends up waiting for fencing to complete, it will not process configuration changes.

Comment 3 RHEL Program Management 2012-05-21 21:59:11 UTC

This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux release.  Product Management has
requested further review of this request by Red Hat Engineering, for
potential inclusion in a Red Hat Enterprise Linux release for currently
deployed products.  This request is not yet committed for inclusion in
a release.

Comment 4 Klaus Steinberger 2012-05-22 06:57:02 UTC

Created attachment 585938
rgmanager Dump from node gar-ha-xen01

Comment 5 Klaus Steinberger 2012-05-22 06:58:05 UTC

Created attachment 585939
rgmanager dump from node gar-ha-xen02

Comment 6 Klaus Steinberger 2012-05-22 06:58:58 UTC

Created attachment 585940
rgmanager dump from node gar-ha-xen03

Comment 7 Klaus Steinberger 2012-05-22 07:01:19 UTC

(In reply to comment #2)
> Note that if rgmanager ends up waiting for fencing to complete, it will not
> process configuration changes.

OK, that is possible; it could be related to bz#822134.

But why does rgmanager work on the node that calls ccs_tool, but not on the others?

Comment 8 Lon Hohberger 2012-05-22 20:19:53 UTC

The two nodes are blocked in _event_thread_f.  Do you also have logs from that specific incident?

Comment 9 Klaus Steinberger 2012-05-23 04:48:26 UTC

(In reply to comment #8)
> The two nodes are blocked in _event_thread_f.  Do you also have logs from
> that specific incident?

I have no idea what _event_thread_f is or how it is triggered.

Comment 10 Lon Hohberger 2012-05-23 17:37:12 UTC

_event_thread_f dispatches event handlers.  The only thing that causes it to block is fencing (or should be).
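
To illustrate why a blocked dispatcher would miss configuration updates, here is a toy model. It is not rgmanager code: the class ToyDispatcher, the event names, and the use of a threading.Event to stand in for "waiting on fencing" are all assumptions made for this sketch. It shows only the general pattern of one thread handling queued events strictly in order.

```python
import queue
import threading


class ToyDispatcher:
    """One worker thread that handles queued events strictly in order."""

    def __init__(self):
        self.events = queue.Queue()
        self.handled = []
        self.fence_started = threading.Event()
        self.fence_done = threading.Event()
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def _run(self):
        while True:
            event = self.events.get()
            if event == "node_down":
                # Stand-in for a handler that waits until fencing completes.
                self.fence_started.set()
                self.fence_done.wait()
            self.handled.append(event)
            self.events.task_done()


def demo():
    """Return what was handled while blocked and after the block is released."""
    dispatcher = ToyDispatcher()
    dispatcher.events.put("node_down")
    dispatcher.fence_started.wait()       # the worker is now stuck in the handler
    dispatcher.events.put("config_update")
    while_blocked = list(dispatcher.handled)
    dispatcher.fence_done.set()           # "fencing" finishes
    dispatcher.events.join()              # wait until both events are handled
    return while_blocked, list(dispatcher.handled)


print(demo())
```

In this model the config_update event stays in the queue for as long as the earlier handler is blocked, and is processed only once that handler returns.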

The hang could be related to the other bug, but might not be.

You need to file a ticket with Red Hat Support and include your sosreports and so forth - preferably taken from a cluster in the 'broken' state.  The sosreports and/or cores should have enough information to triage the issue.

If you have already filed a ticket, please indicate it here.

Comment 11 Klaus Steinberger 2012-05-23 18:52:35 UTC

Too bad: I have a contract for a RHEL 6.2 cluster, but not for the 5.8 cluster, which runs on SL. I'm in the process of migrating the clusters to 6.2, but that needs some more time.

Comment 12 Ryan McCabe 2012-07-20 14:12:17 UTC


*** This bug has been marked as a duplicate of bug 822134 ***

Comment 13 Ryan McCabe 2012-07-31 16:28:35 UTC

Reopening.

I was able to reproduce this by running clusvcadm -r <svc> on two services in a loop, then updating the configuration while those were running.
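
The reproduction described above could be scripted roughly as follows. This is a sketch under assumptions, not the exact procedure used: the comment does not give the service names, the number of iterations, or the timing, so the names vm:guest1 and vm:guest2, the rounds and delay parameters, and the helper names are all hypothetical. The runner argument defaults to a dry run here (commands are only recorded); passing subprocess.call instead would actually execute them on a cluster node.

```python
import subprocess
import threading
import time


def relocate_loop(service, rounds, runner):
    """Ask clusvcadm to relocate `service`, `rounds` times in a row."""
    for _ in range(rounds):
        runner(["clusvcadm", "-r", service])


def reproduce(services, rounds, conf_path, runner, delay=0.0):
    """Relocate each service in its own loop and push a config update meanwhile."""
    workers = [
        threading.Thread(target=relocate_loop, args=(service, rounds, runner))
        for service in services
    ]
    for worker in workers:
        worker.start()
    time.sleep(delay)  # optional head start for the relocation loops
    runner(["ccs_tool", "update", conf_path])
    for worker in workers:
        worker.join()


# Dry run: record the commands instead of executing them.
recorded = []
reproduce(
    ["vm:guest1", "vm:guest2"],   # hypothetical service names
    3,
    "/etc/cluster/cluster.conf",
    recorded.append,
)
for command in recorded:
    print(" ".join(command))

# On a real cluster node (not executed here):
# reproduce(["vm:guest1", "vm:guest2"], 100, "/etc/cluster/cluster.conf",
#           subprocess.call, delay=1.0)
```

The order in which the recorded commands appear is not fixed, because the relocation loops and the update run concurrently.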

Comment 17 Ryan McCabe 2013-06-13 20:13:31 UTC


*** This bug has been marked as a duplicate of bug 889098 ***
