Bug 741345

Summary:	Cannot set manual multicasting address for CMAN
Product:	Red Hat Enterprise Linux 6
Component:	cluster
Version:	6.1
Status:	CLOSED DUPLICATE
Severity:	urgent
Priority:	unspecified
Reporter:	Matt <matthew.painter>
Assignee:	Fabio Massimo Di Nitto <fdinitto>
QA Contact:	Cluster QE <mspqa-list>
CC:	ccaulfie, cluster-maint, lhh, rpeterso, teigland
Target Milestone:	rc
Hardware:	Unspecified
OS:	Unspecified
Doc Type:	Bug Fix
Last Closed:	2011-09-26 18:00:36 UTC

Description Matt 2011-09-26 16:11:31 UTC

Description of problem:

I have been trying to set up a 3-node cluster on 6.1 using a Cisco switch, and therefore need a fixed multicast address (239.192.15.224 in this case).
 
All the docs say to add the following to cluster.conf:
 
        <cman>
                <multicast addr="239.192.15.224"/>
        </cman>
 
This seems to work: "cman_tool status" reports the correct multicast address. However, the Quorum status is "Activity Blocked", because the cluster nodes never join each other and each node forms its own cluster of 1.
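A quick way to compare the configured address with what the running node reports is sketched below. The helper name configured_mcast_addr is hypothetical, the path /etc/cluster/cluster.conf is the usual location but is an assumption here, and the sed expression only handles a <multicast> element written on a single line.

```shell
#!/bin/sh
# Hypothetical helper: print the addr attribute of the first
# <multicast .../> element found in the cluster.conf text on stdin.
# Assumes the element and its addr attribute sit on one line.
configured_mcast_addr() {
    sed -n 's/.*<multicast[^>]*addr="\([^"]*\)".*/\1/p' | head -n 1
}

# Usage on a cluster node (path is an assumption):
#   configured_mcast_addr < /etc/cluster/cluster.conf
#   cman_tool status | grep -i multicast
```

If the two addresses match but the nodes still do not see each other, the address itself is probably not the problem, which fits the symptom described here.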
 
*However* if I manually run "cman_tool leave" and then "cman_tool join -m 239.192.15.224", the cluster forms.

Version-Release number of selected component (if applicable):

6.1

How reproducible:

Every time.


Steps to Reproduce:
1. Set a manual multicast address in cluster.conf.
2. Run "service cman restart"; it times out waiting for quorum.
3. Workaround: on 2+ nodes, run:
   * cman_tool leave
   * cman_tool join -m <addr>
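
The leave/join sequence above can be wrapped in a small script to repeat it on each node. This is only a sketch: the function names and the DRY_RUN variable are hypothetical, and the only commands used are the two cman_tool invocations already listed. By default it prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical wrapper: with DRY_RUN=1 (the default) print the command,
# otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Leave the cluster, then rejoin with an explicit multicast address ($1).
# The join is skipped if the leave fails.
rejoin_with_mcast() {
    run cman_tool leave && run cman_tool join -m "$1"
}

# Usage (as root on each node):
#   DRY_RUN=0 rejoin_with_mcast 239.192.15.224
```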
  
Actual results:

Cluster does not form; startup times out waiting for quorum.

Expected results:

Should form a cluster without manual intervention.

Additional info:

Comment 2 Fabio Massimo Di Nitto 2011-09-26 17:09:48 UTC

Please provide the full cluster.conf and collect /var/log/cluster from all nodes when the failure appears.

This looks very similar to bug 720100, which has already been fixed in the 6.1 updates.

Also check that you are running the latest updates.

Comment 3 Matt 2011-09-26 17:52:20 UTC

Agreed, this looks like the same issue as 720100.

However, I am running:

cman-3.0.12-41.el6.x86_64

And looking at the bug for 720100, it looks like the fix should have been rolled in here?

If you let me know what the ttl change in cluster.conf should be, I can test and let you know if it is a dupe?

Thanks

Comment 4 Matt 2011-09-26 17:57:07 UTC

I have changed my cluster.conf to have <multicast ... ttl="255"/>, and verified that it is the same issue as 720100.

Comment 5 Fabio Massimo Di Nitto 2011-09-26 17:59:37 UTC

(In reply to comment #3)
> Agreed, this looks like the same issue as 720100.
> 
> However, I am running:
> 
> cman-3.0.12-41.el6.x86_64
> 
> And looking at the bug for 720100, it looks like the fix should have been
> rolled in here?

cluster-3.0.12-41.el6_1.1

This one has the fix. Your version does not.

> 
> If you let me know what the ttl change in cluster.conf should be, I can test
> and let you know if it is a dupe?


You can cross check the package changelogs too.

If this is not the problem, I will still need your logs and full cluster.conf.
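
The changelog cross-check could look roughly like the sketch below. The helper name changelog_mentions_bug is hypothetical, and it assumes the fix is recorded in the package changelog under its bug number, which may not hold for every build.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the changelog text on stdin mentions
# the bug number given as $1 as a whole word (e.g. "rhbz#720100").
changelog_mentions_bug() {
    grep -q -w -- "$1"
}

# Usage on a node (assumes the rpm CLI and that cman is installed):
#   rpm -q cman
#   rpm -q --changelog cman | changelog_mentions_bug 720100 \
#       && echo "changelog mentions 720100"
```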

Comment 6 Fabio Massimo Di Nitto 2011-09-26 18:00:36 UTC

(In reply to comment #4)
> I have changed my cluster.conf to have <multicast ... ttl="255"/>, and verified
> that it is the same issue as 720100.

OK, sorry, your comment arrived while I was writing mine.

Your packages are not updated. See my previous comment.

*** This bug has been marked as a duplicate of bug 720100 ***