Description of problem:
Corosync fails to work properly with KVM VMs.

Version-Release number of selected component (if applicable):
0.10.2.2-3.fc18

How reproducible:
100%

Steps to Reproduce:
1. Build EL6 VMs.
2. Configure them to use corosync.
3. Reboot a cluster member.
4. Observe a constant error stream in syslog.

Actual results:
Corosync fails to manage the node.

Expected results:
Corosync manages the node.

Additional info:
Can be resolved by running:

# echo 0 > /sys/class/net/virbr0/bridge/multicast_snooping

for the bridge carrying the VM interface that Corosync uses to communicate.
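If more than one bridge is affected, the single echo above can be generalized. This is a hedged sketch, not part of the original report: the disable_snooping helper and its sysfs-root parameter are my own inventions for illustration (the parameter exists only so the loop can be dry-run against a scratch directory instead of the live /sys):

```shell
#!/bin/sh
# Sketch: disable multicast snooping on every Linux bridge found under a
# sysfs-like root. The root argument defaults to the real /sys/class/net;
# passing a scratch directory lets you dry-run the loop safely.
disable_snooping() {
    root="${1:-/sys/class/net}"
    for knob in "$root"/*/bridge/multicast_snooping; do
        [ -e "$knob" ] || continue   # skip non-bridge interfaces
        echo 0 > "$knob"
    done
}
```

Run as root against the real /sys to apply it; note the setting does not persist across reboots.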
(I don't intend to hijack this BZ. My intention is to ensure that if multicast_snooping is disabled by default, it's not just a knob that is turned to cover up a real problem with bridge multicast_snooping.)

I have also noticed that RHCS inside KVM guests can experience multicast issues with snooping enabled. This happens with both the default.xml NAT'ed bridge and non-NAT'ed bridges with this config:

<network>
  <name>VMNET</name>
  <uuid>11576011-0d4d-78c9-b287-5debd1933daf</uuid>
  <bridge name='VMNET' stp='off' delay='0' />
  <mac address='52:54:00:98:BA:DD'/>
  <ip address='192.168.100.1' netmask='255.255.255.0'>
    <dhcp>
      <range start='192.168.100.200' end='192.168.100.254' />
    </dhcp>
  </ip>
</network>

Under a NAT'ed bridge the default iptables rules are so restrictive that iptables alone keeps two guests' omping multicast packets from succeeding.

Under the non-NAT'ed bridge config above, trying to use fence_virtd/fence_xvm, I see ~sporadic~ behavior. Well, it's not actually sporadic, though it at first appears so. In fact, fence_xvm works until fence_virtd has been running on the host longer than multicast_membership_interval.

tcpdump in the guest sees the bridge-induced IGMP queries but never sees a response from the KVM host running fence_virtd unless tcpdump puts the bridge into promiscuous mode or snooping is turned off. Based on that, it seems as though one problem with bridge snooping is that it is not delivering the IGMP queries to the host running the bridge.

My observations are under RHEL 6.3.
(In reply to comment #1)
> Under a nat'ed bridge the default iptables are so restrictive that iptables
> alone keeps two guest's omping multicast packets from succeeding.

That's bug 709418. Unfortunately I have not had time to get back to that bug.
Would anyone (i.e. libvirt-maintainers) care to comment on or triage this bug?
This looks like it might be a duplicate of BZ 880035. Does the guidance in that BZ help at all?
It does sound similar, yes. I guess we won't know whether the netdev fix posted at the end of that bug solves it until we get a kernel containing it.

I will try:

# echo 1 > /sys/class/net/virbr0/bridge/multicast_querier

instead of:

# echo 0 > /sys/class/net/virbr0/bridge/multicast_snooping

as soon as I get an opportunity to test this work environment again.
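For the record, the querier alternative keeps snooping enabled while making the bridge send its own IGMP/MLD queries. A hedged sketch of the two writes above; the enable_querier helper and its directory argument are illustrative additions, there so the writes can be tried against a scratch directory before touching a live bridge:

```shell
#!/bin/sh
# Sketch: keep snooping on, but enable the bridge's own multicast querier.
# bridge_dir is the bridge's sysfs directory,
# e.g. /sys/class/net/virbr0/bridge on a live system.
enable_querier() {
    bridge_dir="$1"
    echo 1 > "$bridge_dir/multicast_snooping"
    echo 1 > "$bridge_dir/multicast_querier"
}
```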
I've had similar issues running minidlna in a KVM guest, where it regularly drops off the network.

All hosts are bridged onto br0 with no NAT. The virtualisation host is RHEL 6.5, fully patched as of last night. Guests are a mix of RHEL, Debian, Ubuntu and CentOS. I have a number of physical clients trying to use DLNA against the minidlna server, and it keeps dropping off the network.

I've just done some quick tests with:

# echo 0 > /sys/class/net/br0/bridge/multicast_snooping

and it appears to work correctly now. I then reset multicast_snooping and tried changing multicast_querier instead:

# echo 1 > /sys/class/net/virbr0/bridge/multicast_snooping
# echo 1 > /sys/class/net/virbr0/bridge/multicast_querier

Again the clients can now find the server. I'll leave this in place and see how the devices perform over the next couple of days.
One more victim here. I'm running a 2-node (VM) ctdb cluster (CentOS 6.5) on oVirt 3.4.1; it had been running very well for months. Last week I upgraded the hypervisors from CentOS 6.4 to 6.5, and the guests were yum-upgraded recently. Since then, the totem+corosync layer began to wobble, and googling around led me to echo 0 to /sys/class/net/[blahblahblah]/bridge/multicast_snooping. I found this workaround just a couple of hours ago, but since then I have seen no more issues. So it sounds like a good workaround, though I would be glad to know more, and to avoid having to use a _workaround_...
We are setting up a lot of Red Hat VMs that serve IPv6 services. Without the command below to disable IGMP snooping, the IPv6 services are not reachable on VMs that use bridged interfaces. This is repeatable. Shouldn't there be an exception for passing IPv6 neighbor discovery from the bridge to the VM? The guest VM can reach any IPv6 host, but clients cannot reach the IPv6 VM unless the VM runs ping6 to the client first. This needs to be addressed in new Red Hat releases.

We have multiple bridge interfaces and multiple VM servers, and running this command on all our servers does not seem like the solution:

# echo 0 > /sys/devices/virtual/net/brXXXX/bridge/multicast_snooping

Please address this. Thank you.
Red Hat KVM claims to fully support IPv6, and KVM works with bridge interfaces for normal deployment. The link below describes the issue in further detail:

https://www.v13.gr/blog/?p=378

Again, without this workaround, RH KVM cannot run IPv6 services:

# echo 0 > /sys/devices/virtual/net/brXXXX/bridge/multicast_snooping

However, disabling multicast_snooping may cause other issues. It would be better to fix the underlying problem than to rely on a workaround.
Looking into the knowledge base turns up https://access.redhat.com/solutions/784373, and while this works as a workaround on boot, a fix that doesn't require the disabling of multicast_snooping would be ideal for getting IPv6 connectivity to the guest VMs.
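For persisting the workaround across boots, the KB-style approach can be re-applied from an init script such as /etc/rc.local. A hedged sketch only: apply_workaround, the bridge names, and the sysfs-root parameter are all illustrative additions (the parameter exists so the loop can be tested against a scratch directory instead of the live /sys):

```shell
#!/bin/sh
# Sketch: re-apply the snooping workaround at boot for a list of bridges.
# First argument is the sysfs root; remaining arguments are bridge names.
apply_workaround() {
    root="${1:-/sys/class/net}"
    shift
    for br in "$@"; do
        knob="$root/$br/bridge/multicast_snooping"
        if [ -w "$knob" ]; then
            echo 0 > "$knob"
        fi
    done
}

# e.g. from /etc/rc.local (bridge names are examples):
# apply_workaround /sys/class/net br0 virbr0
```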
Hello, Are there any updates on this? Is the workaround mentioned for startup actually correct? Thanks!
Hello, Is anyone available to pick this up and have a look? Thank you.
Hello, This bug is filed against upstream libvirt (Product: Virtualization Tools). For issues in RHEL, a bug against Product: Red Hat Enterprise Linux would be more accurate and get higher visibility.
I'll re-open this bug against RHEL and get some traction there as well as confirm this customer hit the issue upstream as well. Otherwise we can close this out. Thanks, Ján!
A bug was opened against RHEL, which ended up with a kernel.git commit:

commit 47cc84ce0c2fe75c99ea5963c4b5704dd78ead54
Author: Thadeu Lima de Souza Cascardo <cascardo>
Date:   Fri May 22 12:18:59 2015 -0300

    bridge: fix parsing of MLDv2 reports

which is in kernel 4.1 and newer. Since that appears to be the root issue, closing. If anyone is still hitting issues with the latest libvirt and new kernels, I suggest filing a new bug (but feel free to reference this issue).