Bug 861135

Summary: Cluster Traffic does not send IGMP joins
Product: Red Hat Enterprise Linux 5 Reporter: Rob Marti <robmartiwork>
Component: openais Assignee: Jan Friesse <jfriesse>
Status: CLOSED NOTABUG QA Contact: Cluster QE <mspqa-list>
Severity: high Docs Contact:
Priority: high    
Version: 5.8 CC: cluster-maint, edamato, mppz3wzs7k, rmarti, sdake
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-09-06 18:04:02 UTC Type: Bug

Description Rob Marti 2012-09-27 15:38:30 UTC
Description of problem:
Clusters do not build properly when plugged into a Cisco Nexus 7000 because multicast packets are not routed without an IGMP Join being sent.  Changing the cluster to use broadcast instead of multicast allows the cluster to work as designed, but only allows one cluster per subnet.  Information from my Networking person:

Problem Description: Multiple servers attached to four separate N7Ks that interact with each other through multicast. These servers are currently not communicating between the N7Ks, creating a "dual-brain" scenario. PIM is not configured on the N7Ks. Instead we are using IGMP snooping queriers that traverse a fabricpath VDC to communicate with the other N7Ks.

On inspection of the N7Ks show commands, the correct switch, AB1ServSwitch1a, has been elected as the primary IGMP Snooping querier.

Looking at the packet capture, multicast traffic was being sent from the Linux cluster devices. However, there was little or no IGMP traffic in any of the packet captures we did. As a result, the switches did not know which ports, let alone which switches, should receive the multicast traffic. To confirm it was a problem with IGMP Joins, and not multicast itself, Cisco had me set a static multicast path on AB1ServSwitch3a and 3b. The Linux cluster on that path was able to come up properly after this.

Cisco's explanation for why clusters worked before switching from the 4500s and why they don't work now is that any multicast traffic on the Catalyst platform is treated as an implicit IGMP Join, regardless of whether or not a join is actually received by the switch. With the N7Ks, however, the joins must be explicit, and the joins must come from the same source address as the multicast traffic.

The TAC number for this is 621910821


Version-Release number of selected component (if applicable):
cman-2.0.115-96.el5_8.3


How reproducible:
Always

Steps to Reproduce:
1.
2.
3.
  
Actual results:
Split brained/non-quorate cluster

Expected results:
Quorate functional cluster

Additional info:

Comment 1 Jan Friesse 2013-02-25 15:06:43 UTC
OpenAIS is correctly calling setsockopt (sockets->mcast_recv, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof (mreq)); the kernel should then take care of creating the IGMP join message. If it does not, there may be a problem either in glibc or in the kernel.

How did you find out that the IGMP join is not being sent?
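
For readers who want to reproduce the join outside of OpenAIS, the setsockopt call quoted above can be sketched in Python roughly as follows. This is a minimal sketch, not the OpenAIS code: the helper names build_mreq and join_group are hypothetical, and the group and port in the usage comment are assumptions chosen for illustration.

```python
import socket


def build_mreq(group, iface="0.0.0.0"):
    """Pack a struct ip_mreq: the group address followed by the local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(iface)


def join_group(group, port, iface="0.0.0.0"):
    """Open a UDP socket and ask the kernel to join the multicast group.

    The kernel, not the application, is responsible for emitting the IGMP
    membership report once IP_ADD_MEMBERSHIP succeeds.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    build_mreq(group, iface))
    return sock


# Hypothetical usage (group and port are illustrative assumptions):
#   sock = join_group("239.192.22.46", 5405)
#   ... watch the wire for an IGMP report while the socket stays open ...
#   sock.close()
```

Running this while capturing on the same interface gives a way to check whether the kernel sends a report at all, independent of the cluster stack.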

Comment 2 Rob Marti 2013-02-25 15:29:50 UTC
I started the cluster services and our networking people watched the packets at the switch, never seeing an IGMP Join.  We also used tcpdump/wireshark to try and find it.

Using a Cisco Catalyst switch, the cluster goes quorate without any issues (according to Cisco, because those switches treat any multicast traffic as an implicit join).  Swapping to the N7Ks leads to a non-quorate cluster.

I've moved on from that job (and now work for Red Hat) but I know this is still an issue.  I'll try and get one of my ex-coworkers copied on the bug.
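
When going through captures like the ones described in this comment, it helps to know which IGMP message types count as a join. The following is a rough sketch that classifies an IGMP message from the bytes following the IP header; describe_igmp is a hypothetical helper, it does not validate the checksum, and it only decodes the group address for the IGMPv1/v2 layouts.

```python
import socket
import struct

# IGMP message type codes (first byte of the IGMP header).
IGMP_TYPES = {
    0x11: "membership query",
    0x12: "IGMPv1 membership report",
    0x16: "IGMPv2 membership report",
    0x17: "IGMPv2 leave group",
    0x22: "IGMPv3 membership report",
}


def describe_igmp(payload):
    """Classify an IGMP message given the bytes that follow the IP header."""
    if len(payload) < 8:
        raise ValueError("IGMP message shorter than 8 bytes")
    msg_type, _max_resp, _checksum = struct.unpack("!BBH", payload[:4])
    name = IGMP_TYPES.get(msg_type, "unknown type 0x%02x" % msg_type)
    # In v1/v2 messages bytes 4-7 hold the group address.
    # IGMPv3 reports use a different layout, so only the type is returned.
    if msg_type in (0x11, 0x12, 0x16, 0x17):
        return "%s for %s" % (name, socket.inet_ntoa(payload[4:8]))
    return name
```

A "join" on the wire is a membership report (0x12, 0x16, or 0x22), so a capture filter limited to one report version can miss joins sent in another.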

Comment 3 Steven Dake 2013-02-25 15:33:05 UTC
IGMP join is handled by kernel.  You can see the kernel info in /proc/net/igmp.  Displaying this file would be helpful to determine if kernel saw the openais request to add membership.

Regards
-steve

Comment 4 Corey Crawford 2013-02-25 18:00:24 UTC
Here are the contents of /proc/net/igmp from one of the clusters that split-brains.
Idx	Device    : Count Querier	Group    Users Timer	Reporter
1	lo        :     0      V3
				010000E0     1 0:00000000		0
5	bond0     :     7      V3
				2E16C0EF     1 0:00000000		0
				FB0000E0     1 0:00000000		0
				010000E0     1 0:00000000		0
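
The Group column above is a hexadecimal dump of the address as stored in memory, so on a little-endian host the bytes appear reversed. A small sketch for decoding it, assuming a little-endian machine such as x86 (decode_group is a hypothetical helper name):

```python
import socket
import struct


def decode_group(hex_group):
    """Convert a Group value from /proc/net/igmp into dotted-quad form.

    Assumes the file was read on a little-endian host, where the kernel's
    hex dump shows the address bytes in reverse order.
    """
    return socket.inet_ntoa(struct.pack("<I", int(hex_group, 16)))


for entry in ("2E16C0EF", "FB0000E0", "010000E0"):
    print(entry, "->", decode_group(entry))
```

Under that assumption, 2E16C0EF decodes to 239.192.22.46 (an address in the organization-local multicast scope), FB0000E0 to 224.0.0.251 (mDNS), and 010000E0 to 224.0.0.1 (all hosts). The presence of the 239.192.22.46 entry on bond0 indicates that the kernel registered a membership for that group.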

Comment 5 Steven Dake 2013-02-25 18:49:15 UTC
Corey,

I would expect if you kill openais you will see one of the querier entries disappear.

I would recommend running this through Red Hat's GSS, since they have a better handle on how to properly configure your switches.  My guess is that you have IGMP timeouts (may not be proper term) turned on in the switch, and the Cisco switch is dropping the IGMP data from its querier tables after the timeout.

Regards
-steve

Comment 6 Corey Crawford 2013-02-25 19:48:54 UTC
Steve,
I can certainly put in a ticket, but I work for an educational institution with only self-support. Unless it's something on Satellite (Satellite's not clustered, no issue there) I believe we're on our own.

Thank you.

Corey Crawford

Comment 7 Jan Friesse 2013-02-26 08:32:24 UTC
One extra comment. Corosync uses ASM (Any Source Multicast). Maybe Cisco (or your Cisco configuration) prefers SSM (Source-Specific Multicast). Sadly, we don't support SSM in OpenAIS/Corosync.

You can test that by trying omping (should be in EPEL). If you get the expected behavior with the -M ssm option, we will know the source of your problem (sadly, not a solution).

Comment 8 Corey Crawford 2013-02-28 14:15:18 UTC
I don't know if it matters, but we're not using openais or corosync. I'm trying omping this morning.

Corey Crawford

Comment 9 Corey Crawford 2013-02-28 14:17:28 UTC
Oh, sorry. We're using openais, but the cluster I'm testing on is currently not running. 
We're not using corosync, though. It doesn't even seem to be in the RHEL5 repos.

Comment 10 Jan Friesse 2013-03-01 08:46:39 UTC
(In reply to comment #9)
> Oh, sorry. We're using openais, but the cluster I'm testing on is currently
> not running. 
> We're not using corosync, though. It doesn't even seem to be in the RHEL5
> repos.

Yeah, sorry for confusing you. Wherever I say corosync, I mean openais. The transport code is almost the same in both openais and corosync, and cluster (as cman) just executes openais (aisexec) or corosync, depending on the RHEL version.