Bug 902922 - need to disable multicast_snooping on bridges
Summary: need to disable multicast_snooping on bridges
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Virtualization Tools
Classification: Community
Component: libvirt
Version: unspecified
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Libvirt Maintainers
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-01-22 18:11 UTC by Brian J. Murrell
Modified: 2019-07-11 07:38 UTC
9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1199209
Environment:
Last Closed: 2016-04-09 22:17:07 UTC
Embargoed:



Description Brian J. Murrell 2013-01-22 18:11:43 UTC
Description of problem:

Corosync fails to work properly with KVM VMs.

Version-Release number of selected component (if applicable):

libvirt-0.10.2.2-3.fc18

How reproducible:

100%

Steps to Reproduce:
1. Build EL6 VMs.
2. Configure them to use corosync.
3. Reboot a cluster member.
4. Observe a constant error stream in syslog.
  
Actual results:

Corosync fails to manage the node.

Expected results:

Corosync manages the node.

Additional info:

Can be resolved by doing:

# echo 0 > /sys/class/net/virbr0/bridge/multicast_snooping

where virbr0 is the bridge attached to the VM interface that Corosync uses to communicate.
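The knob exists per bridge, so a host with several bridges needs the echo for each one. A minimal sketch of a loop over every bridge follows; the sysfs directory is a parameter only so the loop can be dry-run against a scratch tree — on a real host, call it as root with no argument:

```shell
# Disable IGMP snooping on every bridge that exposes the knob.
# $1 (optional): sysfs network class directory, normally /sys/class/net.
disable_snooping() {
    netdir=${1:-/sys/class/net}
    for knob in "$netdir"/*/bridge/multicast_snooping; do
        [ -e "$knob" ] || continue   # skip non-bridge interfaces
        echo 0 > "$knob"
    done
}
```

Note this is a one-shot change: the value reverts on reboot (and on libvirt network restart for virbr-style bridges).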

Comment 1 Curtis Taylor 2013-02-08 19:00:54 UTC
(I don't intend to hijack this BZ.  My intention is to ensure that if multicast_snooping is disabled by default, it's not just a knob that is turned to cover up a real problem with bridge multicast_snooping.) 

I have also noticed that RHCS inside KVM guests can experience multicast issues with snooping enabled. This happens with both the default.xml nat'ed bridge and non-nat'ed bridges with this config:

<network>
  <name>VMNET</name>
  <uuid>11576011-0d4d-78c9-b287-5debd1933daf</uuid>
  <bridge name='VMNET' stp='off' delay='0' />
  <mac address='52:54:00:98:BA:DD'/>
  <ip address='192.168.100.1' netmask='255.255.255.0'>
    <dhcp>
      <range start='192.168.100.200' end='192.168.100.254' />
    </dhcp>
  </ip>
</network>

Under a nat'ed bridge the default iptables rules are so restrictive that iptables alone keeps two guests' omping multicast packets from succeeding.

Under the non-nat'ed bridge config above, while trying to use fence_virtd/fence_xvm, I see behavior that at first appears sporadic but isn't: fence_xvm works until fence_virtd has been running on the host longer than multicast_membership_interval.

tcpdump in the guest sees the bridge-induced IGMP queries, but never sees a response from the KVM host running fence_virtd unless tcpdump puts the bridge into promiscuous mode or snooping is turned off. Based on that, it seems one problem with bridge snooping is that it does not deliver the IGMP queries to the host running the bridge.

My observations occur under RHEL6.3.

Comment 2 Brian J. Murrell 2013-03-14 20:48:30 UTC
(In reply to comment #1)
> 
> Under a nat'ed bridge the default iptables are so restrictive that iptables
> alone keeps two guest's omping multicast packets from succeeding.

That's bug 709418.  Unfortunately I have not gotten time to get back to that bug.

Comment 3 Brian J. Murrell 2013-03-14 20:49:37 UTC
Would anyone (i.e. libvirt-maintainers) care to comment on or triage this bug?

Comment 4 Dave Allan 2013-03-15 21:28:59 UTC
This looks like it might be a duplicate of BZ 880035. Does the guidance in that BZ help at all?

Comment 5 Brian J. Murrell 2013-03-15 21:36:59 UTC
It does sound similar, yes. I guess we won't know whether the netdev fix posted at the end of that bug solves it until we get a kernel containing it.

I will try:

# echo 1 > /sys/class/net/virbr0/bridge/multicast_querier

instead of:

# echo 0 > /sys/class/net/virbr0/bridge/multicast_snooping

as soon as I get an opportunity to test this work environment out again.
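The alternative from bug 880035 amounts to leaving snooping enabled but turning on the bridge's own IGMP querier. A small sketch of flipping that knob follows; the sysfs root is overridable purely so the sketch can be exercised against a scratch tree (on a real host the default /sys/class/net applies, and root is required):

```shell
# Enable the IGMP querier for one bridge. SYSNET is parameterized only
# for dry-running; it defaults to the real sysfs path.
SYSNET=${SYSNET:-/sys/class/net}

enable_querier() {
    knob="$SYSNET/$1/bridge/multicast_querier"   # $1: bridge name
    [ -e "$knob" ] || return 1                   # not a bridge
    echo 1 > "$knob"
}
```

For example, `enable_querier virbr0`. Like the snooping knob, this does not persist across reboots.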

Comment 6 Steven Ellis 2014-04-21 23:17:25 UTC
I've had similar issues running minidlna in a KVM guest: it regularly drops off the network. All hosts are bridged onto br0 with no NAT.

Virtualisation host is RHEL 6.5, fully patched as of last night.
Guests are a mix of RHEL, Debian, Ubuntu, and CentOS.
I have a number of physical clients trying to use DLNA to reach the minidlna server, and it keeps dropping off the network.

I've just done some quick tests with 

# echo 0 > /sys/class/net/br0/bridge/multicast_snooping

and it appears to work correctly now.

I then reset multicast_snooping and tried changing multicast_querier

# echo 1 > /sys/class/net/br0/bridge/multicast_snooping
# echo 1 > /sys/class/net/br0/bridge/multicast_querier

Again, the clients can now find the server.

I'll leave this in place and see how the devices perform over the next couple of days.

Comment 7 Nicolas Ecarnot 2014-06-11 14:54:09 UTC
One more victim here.
Running a 2-node (VM) ctdb cluster (CentOS 6.5) in oVirt 3.4.1; this cluster had been running very well for months.
Last week I upgraded the hypervisors from CentOS 6.4 to 6.5, and the guests were yum-upgraded recently.

Since then, the totem+corosync layer began to wobble, and googling around led me to echo 0 into /sys/class/net/[blahblahblah]/bridge/multicast_snooping.

I found this workaround just a couple of hours ago, and since then I have seen no more issues, so it sounds like a good workaround.
Though I would be glad to know more, and to avoid having to rely on a _workaround_...

Comment 8 dhyan mishra 2014-11-24 20:36:13 UTC
We are setting up a lot of Red Hat VMs that provide IPv6 services. Without IGMP snooping disabled (via the command below), the IPv6 services are not reachable on VMs that use bridged interfaces.

This is repeatable. 

Shouldn't there be an exception for passing IPv6 neighbor discovery traffic from the bridge to the VM? The guest VM can reach any IPv6 host, but clients cannot reach the IPv6 VM unless the VM runs ping6 to the client first.

This needs to be addressed in new Red Hat releases. We have multiple bridge interfaces and multiple VM servers, and running the command on all our servers does not seem like a solution:


echo 0 > /sys/devices/virtual/net/brXXXX/bridge/multicast_snooping


Please address this. Thank you.

Comment 9 dhyan mishra 2014-11-25 19:20:42 UTC
Red Hat KVM claims to fully support IPv6, and KVM works with bridged interfaces for normal deployments. The link below describes the issue in further detail.

https://www.v13.gr/blog/?p=378


Again, without this workaround, RH KVM cannot run IPv6 services:

echo 0 > /sys/devices/virtual/net/brXXXX/bridge/multicast_snooping


However, disabling multicast_snooping may cause other issues. It would be better to fix the underlying problem instead of relying on a workaround.

Comment 10 Robert McSwain 2015-01-23 15:56:53 UTC
Looking into the knowledge base turns up https://access.redhat.com/solutions/784373, and while this works as a workaround at boot, a fix that doesn't require disabling multicast_snooping would be ideal for getting IPv6 connectivity to the guest VMs.
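One way to make the setting survive reboots and network restarts — an assumption to verify, since the hook path and the set of operations depend on the libvirt version — is libvirt's /etc/libvirt/hooks/network hook script, sketched here with an overridable sysfs root purely for dry-running:

```shell
#!/bin/sh
# Sketch of a libvirt "network" hook that disables snooping whenever a
# network comes up. libvirtd invokes the hook with the network name as
# $1 and the operation as $2; the "started" operation is an assumption
# to check against your libvirt's hook documentation.
SYSNET=${SYSNET:-/sys/class/net}

on_network_event() {
    # $1: network name, $2: operation reported by libvirtd
    if [ "$2" = "started" ]; then
        for knob in "$SYSNET"/*/bridge/multicast_snooping; do
            [ -e "$knob" ] || continue
            echo 0 > "$knob"
        done
    fi
}

on_network_event "$@"
```

The script must be executable, and libvirtd picks it up on its next restart; this only covers libvirt-managed networks, not bridges defined in the host's network scripts.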

Comment 11 Robert McSwain 2015-02-23 15:21:14 UTC
Hello,

Are there any updates on this? Is the workaround mentioned for startup actually correct? Thanks!

Comment 12 Robert McSwain 2015-03-02 14:54:48 UTC
Hello,

Is anyone available to pick this up and have a look? Thank you.

Comment 13 Ján Tomko 2015-03-02 15:59:05 UTC
Hello,

This bug is filed against upstream libvirt (Product: Virtualization Tools).
For issues in RHEL, a bug against Product: Red Hat Enterprise Linux would be more accurate and get higher visibility.

Comment 14 Robert McSwain 2015-03-05 16:25:41 UTC
I'll re-file this bug against RHEL and get some traction there, as well as confirm this customer hit the issue upstream too. Otherwise we can close this out. Thanks, Ján!

Comment 15 Cole Robinson 2016-04-09 22:17:07 UTC
A bug was opened against RHEL, which ended up with a kernel.git commit:

commit 47cc84ce0c2fe75c99ea5963c4b5704dd78ead54
Author: Thadeu Lima de Souza Cascardo <cascardo>
Date:   Fri May 22 12:18:59 2015 -0300

    bridge: fix parsing of MLDv2 reports


That commit is in kernel 4.1 and newer. Since that appears to be the root issue, closing. If anyone is still hitting issues with the latest libvirt and a new kernel, I suggest filing a new bug (but feel free to reference this one).
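Since distro kernels backport fixes, a version check is only a rough guide; still, a minimal sketch of testing for mainline 4.1+ (major.minor comparison only — backports such as the RHEL one that closed the downstream bug will not be detected):

```shell
# Rough check: is the running kernel at least mainline 4.1, where the
# MLDv2-parsing fix landed? Distro backports are NOT detected.
has_mld_fix() {
    ver=${1:-$(uname -r)}
    major=${ver%%.*}
    rest=${ver#*.}
    minor=${rest%%.*}
    minor=${minor%%[!0-9]*}   # strip suffixes like "1-rc2"
    [ "$major" -gt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -ge 1 ]; }
}
```

For example, `has_mld_fix || echo "kernel may predate the fix"` — but treat a negative result on RHEL/CentOS kernels as inconclusive.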

