Red Hat Bugzilla – Bug 861135
Cluster Traffic does not send IGMP joins
Last modified: 2013-09-06 14:04:02 EDT
Description of problem:
Clusters do not build properly when plugged into a Cisco Nexus 7000 because multicast packets are not routed without an IGMP Join being sent. Changing the cluster to use broadcast instead of multicast allows the cluster to work as designed, but only allows one cluster per subnet. Information from my Networking person:
Problem Description: Multiple servers attached to four separate N7Ks that interact with each other through multicast. These servers are currently not communicating between the N7Ks, creating a "dual-brain" scenario. PIM is not configured on the N7Ks. Instead, we are using IGMP snooping queriers that traverse a FabricPath VDC to communicate with the other N7Ks.
On inspection of the N7Ks show commands, the correct switch, AB1ServSwitch1a, has been elected as the primary IGMP Snooping querier.
Looking at the packet capture, multicast traffic was being sent from the Linux cluster devices. However, there was little or no IGMP traffic in any of the packet captures we did. As such, the switches did not know which ports, let alone which switches, to send multicast traffic to. To confirm that the problem was with IGMP joins, and not multicast itself, Cisco had me set a static multicast path on AB1ServSwitch3a and 3b. The Linux cluster on that path was able to come up properly after this.
Cisco's explanation for why clusters worked before switching from the 4500s and why they don't work now is that any multicast traffic on the Catalyst platform is treated as an implicit IGMP join, regardless of whether or not a join is actually received by the switch. With the N7Ks, however, the joins must be explicit, and the joins must come from the same source address as the multicast traffic.
The TAC number for this is 621910821
Version-Release number of selected component (if applicable):
Steps to Reproduce:
Actual results: Split-brained/non-quorate cluster
Expected results: Quorate functional cluster
OpenAIS is correctly calling setsockopt (sockets->mcast_recv, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof (mreq)); the kernel should then take care of creating the IGMP join message. If it does not, there may be a problem either in glibc or in the kernel.
How did you find out that the IGMP join is not sent?
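For anyone who wants to reproduce the join outside of openais, here is a minimal sketch of the same socket option issued from Python. The function names, the default interface address, and any group or port you pass in are assumptions for illustration and are not taken from the openais source.

```python
import socket
import struct


def build_mreq(group: str, interface_ip: str = "0.0.0.0") -> bytes:
    """Pack the equivalent of C's struct ip_mreq: group address, then local interface address."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(interface_ip))


def join_group(group: str, port: int, interface_ip: str = "0.0.0.0") -> socket.socket:
    """Open a UDP socket and ask the kernel to join `group` on the given interface."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Same option as in the openais call quoted above; the kernel, not the
    # application, is responsible for emitting the IGMP membership report.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    build_mreq(group, interface_ip))
    return sock
```

Calling join_group() with the cluster's multicast address while a capture is running on the same interface should show whether the host emits a membership report for a plain userspace join.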
I started the cluster services and our networking people watched the packets at the switch, never seeing an IGMP Join. We also used tcpdump/wireshark to try and find it.
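When going through a capture by hand, it can help to classify the IGMP payloads by their type byte. The sketch below is only an illustration: the function names are made up, and it assumes you have already extracted the raw IGMP payload (the bytes after the IP header) from the capture.

```python
import struct

# IGMP message types as defined in the IGMPv1/v2/v3 RFCs.
IGMP_TYPES = {
    0x11: "membership query",
    0x12: "IGMPv1 membership report",
    0x16: "IGMPv2 membership report",
    0x17: "IGMPv2 leave group",
    0x22: "IGMPv3 membership report",
}


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum used by IGMP."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def describe_igmp(payload: bytes) -> str:
    """Name the IGMP message type and report whether its checksum verifies."""
    if len(payload) < 8:
        raise ValueError("IGMP message must be at least 8 bytes")
    kind = IGMP_TYPES.get(payload[0], "unknown type 0x%02x" % payload[0])
    # A message with a correct checksum sums to zero when checked as a whole.
    ok = internet_checksum(payload) == 0
    return "%s (checksum %s)" % (kind, "ok" if ok else "bad")
```

A join shows up as one of the membership report types; a capture containing only queries from the switch and no reports from the cluster nodes would match what the networking team described.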
Using a Cisco Catalyst switch, the cluster goes quorate without any issues (according to Cisco, because those switches treat any multicast traffic as an implicit join). Swapping to the N7Ks leads to an unquorate cluster.
I've moved on from that job (and now work for Red Hat) but I know this is still an issue. I'll try and get one of my ex-coworkers copied on the bug.
The IGMP join is handled by the kernel. You can see the kernel's view in /proc/net/igmp. Displaying this file would help determine whether the kernel saw the openais request to add membership.
Here are the contents of /proc/net/igmp from one of the clusters that split-brains.
Idx Device    : Count Querier Group    Users Timer      Reporter
1   lo        :     0      V3
                              010000E0     1 0:00000000 0
5   bond0     :     7      V3
                              2E16C0EF     1 0:00000000 0
                              FB0000E0     1 0:00000000 0
                              010000E0     1 0:00000000 0
I would expect that if you kill openais, you will see one of the entries disappear.
I would recommend running this through Red Hat's GSS, since they have a better handle on how to properly configure your switches. My guess is that you have IGMP timeouts (may not be the proper term) turned on in the switch, and the Cisco switch is dropping the IGMP data from its querier tables after the timeout.
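The Group column is easier to read once it is converted to dotted-quad addresses. The following is a rough sketch of such a decoder; the function names are hypothetical, and it assumes a little-endian host such as x86, where the kernel prints the network-order address as a native integer so the bytes appear reversed.

```python
import socket
import struct


def decode_group(hex_group: str) -> str:
    """Convert a Group column value such as '2E16C0EF' to dotted-quad form.

    Assumption: little-endian host, so the hex digits are byte-reversed.
    """
    return socket.inet_ntoa(struct.pack("<I", int(hex_group, 16)))


def parse_igmp(text: str) -> dict:
    """Map each device name to the list of multicast groups it has joined."""
    groups = {}
    device = None
    for line in text.splitlines():
        # Everything before the first colon: 'Idx Device' on a device line,
        # 'Group Users Timer-flag' on a group line.
        head = line.partition(":")[0].split()
        if len(head) == 2 and head[0].isdigit():
            device = head[1]
            groups[device] = []
        elif device is not None and len(head) == 3:
            groups[device].append(decode_group(head[0]))
    return groups
```

Applied to the output above, the bond0 entries decode to 239.192.22.46, 224.0.0.251, and 224.0.0.1, so the kernel does list a membership in the 239.192.0.0/16 range on the bonded interface.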
I can certainly put in a ticket, but I work for an educational institution with only self-support. Unless it's something on Satellite (Satellite's not clustered, no issue there) I believe we're on our own.
One extra comment. Corosync uses ASM (Any Source Multicast). Maybe Cisco (or your Cisco configuration) likes SSM more. Sadly, we don't support SSM in OpenAIS/Corosync.
You can test that with omping (it should be in EPEL). If you get the expected behavior with the -M ssm option, we will know the source of your problem (sadly, not a solution).
I don't know if it matters, but we're not using openais or corosync. I'm trying omping this morning.
Oh, sorry. We're using openais, but the cluster I'm testing on is currently not running.
We're not using corosync, though. It doesn't even seem to be in the RHEL5 repos.
(In reply to comment #9)
> Oh, sorry. We're using openais, but the cluster I'm testing on is currently
> not running.
> We're not using corosync, though. It doesn't even seem to be in the RHEL5
Yes, sorry for confusing you. Wherever I say corosync, I mean openais. The transport code is almost the same in both openais and corosync, and the cluster (as cman) just executes openais (aisexec) or corosync, depending on the RHEL version.