Description of problem: This is a strange one... please bear with me. A colleague and I have begun to experience unreliable host to guest multicast. As far as we can tell, the problems started after updating to libvirt-0.9.11.7-1.fc17. Guest to guest and guest to host multicast appear to be unaffected. We have spent a number of days trying to work through this, but not a lot of what we are seeing makes sense. We'd really appreciate it if someone could work with us to get this resolved. We have been using a combination of omping and fence-virtd for testing.

Random datapoints:
- I had been happily using libvirt-0.9.11.6 without incident
- after upgrading to libvirt-0.9.11.7 (to confirm the behaviour seen by David), $subject occurred
- downgrading to libvirt-0.9.11.3-1.fc17 did not make the symptoms go away
- the symptoms (dis)appear seemingly at random over a period of hours (same command, nothing else being done on/to the host or guest)
- turning on promiscuous mode seems to help
- it can happen that omping works fine for one multicast address but fails for another
- the use of fence_virtd may be related - exercising it several times in a row seems to trigger the problem, but stopping it does not seem to help

Version-Release number of selected component (if applicable): libvirt-0.9.11.7-1.fc17

How reproducible: Semi-random

Steps to Reproduce:
1. [HOST + GUEST] install omping
2. [HOST + GUEST] omping 192.168.122.1 192.168.122.101
3.
Actual results:

Guest:

[root@pcmk-1 ~]# omping 192.168.122.1 192.168.122.101
192.168.122.1 : joined (S,G) = (*, 232.43.211.234), pinging
192.168.122.1 : unicast, seq=1, size=69 bytes, dist=0, time=0.289ms
192.168.122.1 : multicast, seq=1, size=69 bytes, dist=0, time=0.293ms
192.168.122.1 : unicast, seq=2, size=69 bytes, dist=0, time=0.206ms
192.168.122.1 : multicast, seq=2, size=69 bytes, dist=0, time=0.216ms
192.168.122.1 : unicast, seq=3, size=69 bytes, dist=0, time=0.224ms
192.168.122.1 : multicast, seq=3, size=69 bytes, dist=0, time=0.235ms
192.168.122.1 : unicast, seq=4, size=69 bytes, dist=0, time=0.178ms
192.168.122.1 : multicast, seq=4, size=69 bytes, dist=0, time=0.189ms
192.168.122.1 : unicast, seq=5, size=69 bytes, dist=0, time=0.229ms
192.168.122.1 : multicast, seq=5, size=69 bytes, dist=0, time=0.240ms
192.168.122.1 : unicast, seq=6, size=69 bytes, dist=0, time=0.205ms
192.168.122.1 : multicast, seq=6, size=69 bytes, dist=0, time=0.215ms
192.168.122.1 : unicast, seq=7, size=69 bytes, dist=0, time=0.220ms
192.168.122.1 : multicast, seq=7, size=69 bytes, dist=0, time=0.231ms
192.168.122.1 : unicast, seq=8, size=69 bytes, dist=0, time=0.224ms
192.168.122.1 : multicast, seq=8, size=69 bytes, dist=0, time=0.235ms
192.168.122.1 : waiting for response msg
192.168.122.1 : server told us to stop
192.168.122.1 : unicast, xmt/rcv/%loss = 8/8/0%, min/avg/max/std-dev = 0.178/0.222/0.289/0.032
192.168.122.1 : multicast, xmt/rcv/%loss = 8/8/0%, min/avg/max/std-dev = 0.189/0.232/0.293/0.030

Host:

[02:57 PM] root@f17 ~ ☺ # omping 192.168.122.1 192.168.122.101
192.168.122.101 : waiting for response msg
...
192.168.122.101 : waiting for response msg
192.168.122.101 : joined (S,G) = (*, 232.43.211.234), pinging
192.168.122.101 : unicast, seq=1, size=69 bytes, dist=0, time=0.158ms
192.168.122.101 : unicast, seq=2, size=69 bytes, dist=0, time=0.305ms
192.168.122.101 : unicast, seq=3, size=69 bytes, dist=0, time=0.495ms
192.168.122.101 : unicast, seq=4, size=69 bytes, dist=0, time=0.249ms
192.168.122.101 : unicast, seq=5, size=69 bytes, dist=0, time=0.261ms
192.168.122.101 : unicast, seq=6, size=69 bytes, dist=0, time=0.255ms
192.168.122.101 : unicast, seq=7, size=69 bytes, dist=0, time=0.266ms
^C
192.168.122.101 : unicast, xmt/rcv/%loss = 7/7/0%, min/avg/max/std-dev = 0.158/0.284/0.495/0.103
192.168.122.101 : multicast, xmt/rcv/%loss = 7/0/100%, min/avg/max/std-dev = 0.000/0.000/0.000/0.000
[02:59 PM] root@f17 ~ ☺ #

Expected results: Host receives unicast _and_ multicast messages.

Additional info: Testing with fence_virt/fence_xvm

[HOST]

# yum install -y fence-virtd-multicast fence-virtd-libvirt fence-virtd
# cat << EOF > /etc/fence_virt.conf
fence_virtd {
	listener = "multicast";
	backend = "libvirt";
}

listeners {
	multicast {
		key_file = "/etc/cluster/fence_xvm.key";
		address = "232.43.211.234";
		family = "ipv4";
		port = "1229";
		# Needed on Fedora systems
		interface = "virbr0";
	}
}

backends {
	libvirt {
		uri = "qemu:///system";
	}
}
EOF
# cat << EOF > /etc/cluster/fence_xvm.key
redhat
EOF
# service fence_virtd start

[GUEST]

# cat << EOF > /etc/cluster/fence_xvm.key
redhat
EOF
# fence_xvm -o list -w 0 -m 232.43.211.234

Should get a list of available VMs
The host to guest behavior Andrew has provided data for is the easiest to reproduce. I just want to note that I have seen some strange behavior with guest to guest multicast as well when using corosync. It appears multicast between the guests works for a period of time, but starts to fall apart after a few minutes. This is similar to what Andrew has outlined with the host to guest omping testing. Putting the host bridge device in promiscuous mode appears to resolve my host to guest multicast problems, but the guest to guest multicast problems still occur. I am using libvirt-0.9.11.7.
Finally something that makes sense... Booting with kernel-3.3.7-1.fc17.x86_64 makes things functional again. The next most recent kernel I have is 3.4.0-1.fc17.x86_64, which displays the problem behaviour.
Any chance you can try using a 3.6 kernel just to see if upstream is still affected?
(In reply to comment #3)
> Any chance you can try using a 3.6 kernel just to see if upstream is still
> affected?

3.6.3-1.fc17.x86_64 is also affected. This is the kernel I first noticed the issue with. I noticed just now that 3.6.7 is also available; this also had the bad behaviour. I tend not to reboot the host for months at a time, so it is possible that I had never used 3.4.0-1.fc17.x86_64 until this morning (or 3.6.3 until last Wednesday).
First thing that comes to mind - try disabling multicast snooping in the bridge and see if it affects anything.
(In reply to comment #5)
> First thing comes to mind - try disabling multicast snooping in
> bridge and see if it affects anything.

How would I do that? A quick google didn't turn up anything obvious.
You can disable multicast snooping by:

echo 0 > /sys/class/net/virbr0/bridge/multicast_snooping

The problem still exists even if we disable it, btw.
Oh, you need to enable multicast_querier by:

echo 1 > /sys/class/net/virbr0/bridge/multicast_querier
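For anyone following along, the two sysfs toggles above can be read back in one go. A minimal sketch, assuming only the standard bridge sysfs layout; the helper name bridge_mcast_status and its optional root argument are my own, for illustration:

```shell
#!/bin/sh
# Print the multicast snooping/querier state of every bridge found under
# a sysfs-like tree (defaults to /sys/class/net). bridge_mcast_status is
# an illustrative helper, not part of any tool; the optional argument
# exists only so the logic can be exercised against a fake tree.
bridge_mcast_status() {
    root="${1:-/sys/class/net}"
    for d in "$root"/*/bridge; do
        # plain interfaces have no bridge/ subdirectory, so skip them
        [ -e "$d/multicast_snooping" ] || continue
        b=$(basename "$(dirname "$d")")
        printf '%s snooping=%s querier=%s\n' "$b" \
            "$(cat "$d/multicast_snooping")" \
            "$(cat "$d/multicast_querier" 2>/dev/null)"
    done
}

bridge_mcast_status
```

On a host showing this bug you would expect something like `virbr0 snooping=1 querier=0` until the workaround is applied; on kernels without the multicast_querier toggle the querier field simply prints empty.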
I no longer experience the issue with 3.6.7-4.fc17.x86_64 once I run both commands:

echo 1 > /sys/class/net/virbr0/bridge/multicast_querier
echo 0 > /sys/class/net/virbr0/bridge/multicast_snooping
Andrew, you probably don't want multicast snooping disabled, the network will be flooded without it. Enabling multicast_querier is enough.
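If enabling the querier does the trick, one way to persist it across reboots is a systemd oneshot unit. This is only a sketch under assumptions: the unit name virbr0-querier.service is my own invention, and it assumes virbr0 already exists once libvirtd.service has started:

```ini
# /etc/systemd/system/virbr0-querier.service -- hypothetical unit name
[Unit]
Description=Enable IGMP querier on virbr0
After=libvirtd.service

[Service]
Type=oneshot
ExecStart=/bin/sh -c 'echo 1 > /sys/class/net/virbr0/bridge/multicast_querier'

[Install]
WantedBy=multi-user.target
```

Enable it with `systemctl enable virbr0-querier.service`. Since libvirt may create bridges after this unit has run, a udev rule keyed on interface creation is more robust; treat this only as the simplest persistent form of the echo above.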
Right, just turning on multicast_querier is enough to make everything work. But that's just a work-around, right? Multicast should work out of the box, shouldn't it?
It is not a workaround, please check wikipedia: http://en.wikipedia.org/wiki/IGMP_snooping#IGMP_querier

"In order for IGMP, and thus IGMP snooping, to function, a multicast router must exist on the network and generate IGMP queries. The tables created for snooping (holding the member ports for each multicast group) are associated with the querier. Without a querier the tables are not created and snooping will not work. Furthermore IGMP general queries must be unconditionally forwarded by all switches involved in IGMP snooping.[1] Some IGMP snooping implementations include full querier capability. Others are able to proxy and retransmit queries from the multicast router."
Ok, I think I understand all that. However, the behaviour change is clearly a regression (something that used to work doesn't anymore), and worse, the new behaviour is random. Sometimes multicast works, sometimes not - with no discernible change to the system in between. If this option cannot be turned on by default, then IMHO it would be better if multicast did not function at all on virbr0 until the multicast_querier option is enabled. At least that way the system would behave predictably.
I can't reproduce this problem on F16 host running:

Linux daikengo 3.6.7-4.fc16.x86_64 #1 SMP Tue Nov 20 20:33:31 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

[root@daikengo bridge]# cat multicast_snooping
1
[root@daikengo bridge]# cat multicast_querier
0

I am not using libvirt to manage the bridge, I create it manually via /etc/sysconfig/.... That might be a big difference and a test to perform (is libvirt issuing some random commands? is it setting odd parameters by default?)

(In reply to comment #0)
> As far as we can tell, the problems started after updating to
> - i had been happily using libvirt-0.9.11.6 without incident
> - after upgrading to libvirt-0.9.11.7 (to confirm the behaviour seen by
> David) $subject occurred
> - downgrading to libvirt-0.9.11.3-1.fc17 did not make the symptoms go away
> - the symptoms (dis)appear seemingly at random over a period of hours (same
> command, nothing else being done on/to the host or guest).

Try to check whether any parameter in the bridge config has changed or looks suspicious. I think that the upgrade might have updated the config in /etc/libvirt/... but the downgrade did not revert the changes (it's something to investigate).

It might be a useful datapoint to try installing the kernel from F16 and see if the issue is still there. That should isolate the issue even further. If the kernel from F16 works, then we can probably assume that libvirt is ok and we can investigate possible kernel config and patch differences between F16 and F17.
This toggle was introduced by this commit:

commit c5c23260594c5701af66ef754916775ba6a46bbc
Author: Herbert Xu <herbert.org.au>
Date:   Fri Apr 13 02:37:42 2012 +0000

    bridge: Add multicast_querier toggle and disable queries by default

    Sending general queries was implemented as an optimisation to speed up
    convergence on start-up. In order to prevent interference with multicast
    routers a zero source address has to be used.

    Unfortunately these packets appear to cause some multicast-aware switches
    to misbehave, e.g., by disrupting multicast packets to us.

    Since the multicast snooping feature still functions without sending our
    own queries, this patch will change the default to not send queries.

    For those that need queries in order to speed up convergence on start-up,
    a toggle is provided to restore the previous behaviour.

    Signed-off-by: Herbert Xu <herbert.org.au>
    Signed-off-by: David S. Miller <davem>

which was merged in v3.5-rc1.
> For those that need queries in order to speed up convergence on start-up,
> a toggle is provided to restore the previous behaviour.

So if I understand correctly, this toggle makes multicast work faster, yes? If so, how does this explain the on-off-on-off-on behaviour I was seeing? Without this toggle I would expect multicast to be non-functional initially and then start working at some future point, whereas we saw that multicast started off working and then stopped. Do you understand our confusion?
Can you make sure your firewall is turned off? I noticed the default iptables rules can block multicast packets on host. I don't have to turn on multicast querier after turning off firewall.
(In reply to comment #14)
> I can't reproduce this problem on F16 host running:
>
> Linux daikengo 3.6.7-4.fc16.x86_64 #1 SMP Tue Nov 20 20:33:31 UTC 2012
> x86_64 x86_64 x86_64 GNU/Linux

I was able to reproduce this on F16 with an earlier kernel:

# uname -r
3.6.2-1.fc16.x86_64

I'm updating to the latest F16 kernel and will retest.
(In reply to comment #17)
> Can you make sure your firewall is turned off? I noticed the default
> iptables rules can block multicast packets on host. I don't have to turn on
> multicast querier after turning off firewall.

The firewall is the first thing I turn off when I install Fedora. /bin/systemctl status iptables.service confirms that it is off.
3.6.7 (F16) works pretty well for me:

% cat /sys/class/net/virbr0/bridge/multicast_snooping
1
% cat /sys/class/net/virbr0/bridge/multicast_querier
0
% omping -m 224.8.8.8 192.168.122.{1,45}
192.168.122.45 : waiting for response msg
192.168.122.45 : waiting for response msg
192.168.122.45 : waiting for response msg
192.168.122.45 : waiting for response msg
192.168.122.45 : waiting for response msg
192.168.122.45 : waiting for response msg
192.168.122.45 : joined (S,G) = (*, 224.8.8.8), pinging
192.168.122.45 : unicast, seq=1, size=69 bytes, dist=0, time=4.458ms
192.168.122.45 : multicast, seq=1, size=69 bytes, dist=0, time=4.923ms
192.168.122.45 : unicast, seq=2, size=69 bytes, dist=0, time=1.691ms
192.168.122.45 : multicast, seq=2, size=69 bytes, dist=0, time=2.897ms
192.168.122.45 : unicast, seq=3, size=69 bytes, dist=0, time=1.750ms
192.168.122.45 : multicast, seq=3, size=69 bytes, dist=0, time=2.674ms
192.168.122.45 : unicast, seq=4, size=69 bytes, dist=0, time=1.540ms
192.168.122.45 : multicast, seq=4, size=69 bytes, dist=0, time=2.438ms
192.168.122.45 : unicast, seq=5, size=69 bytes, dist=0, time=1.457ms
192.168.122.45 : multicast, seq=5, size=69 bytes, dist=0, time=2.549ms
192.168.122.45 : unicast, seq=6, size=69 bytes, dist=0, time=2.420ms
192.168.122.45 : multicast, seq=6, size=69 bytes, dist=0, time=2.599ms
192.168.122.45 : unicast, seq=7, size=69 bytes, dist=0, time=1.607ms
192.168.122.45 : multicast, seq=7, size=69 bytes, dist=0, time=2.512ms
192.168.122.45 : unicast, seq=8, size=69 bytes, dist=0, time=0.694ms
192.168.122.45 : multicast, seq=8, size=69 bytes, dist=0, time=0.978ms
192.168.122.45 : waiting for response msg
^C
192.168.122.45 : unicast, xmt/rcv/%loss = 8/8/0%, min/avg/max/std-dev = 0.694/1.952/4.458/1.116
192.168.122.45 : multicast, xmt/rcv/%loss = 8/8/0%, min/avg/max/std-dev = 0.978/2.696/4.923/1.075
^C%
% uname -r
3.6.7-4.fc16.x86_64
% ifconfig virbr0
virbr0    Link encap:Ethernet  HWaddr 52:54:00:2E:23:92
          inet addr:192.168.122.1  Bcast:192.168.122.255  Mask:255.255.255.0
          inet6 addr: fe80::5054:ff:fe2e:2392/64 Scope:Link
          inet6 addr: fd00:1:2:3::1/64 Scope:Global
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:155604 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2087 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:11989229 (11.4 MiB)  TX bytes:234197 (228.7 KiB)
I'm using 3.6.7-4 with the same multicast flags as you. I can confirm this does _not_ work. I really wish it magically disappeared in the same fashion it showed up :(

Try this, I can reproduce it easily this way.

1. omping on host
2. omping on guest

multicast and unicast traffic should be going fine.

3. now start a second guest

After a few seconds, only unicast traffic remains.

-------------------------
. . .
192.168.122.80 : multicast, seq=253, size=69 bytes, dist=0, time=0.181ms
192.168.122.80 : unicast, seq=254, size=69 bytes, dist=0, time=0.153ms
192.168.122.80 : multicast, seq=254, size=69 bytes, dist=0, time=0.179ms
192.168.122.80 : unicast, seq=255, size=69 bytes, dist=0, time=0.164ms
192.168.122.80 : multicast, seq=255, size=69 bytes, dist=0, time=0.189ms
192.168.122.80 : unicast, seq=256, size=69 bytes, dist=0, time=0.147ms
192.168.122.80 : multicast, seq=256, size=69 bytes, dist=0, time=0.172ms
192.168.122.80 : unicast, seq=257, size=69 bytes, dist=0, time=0.288ms
192.168.122.80 : unicast, seq=258, size=69 bytes, dist=0, time=0.145ms
192.168.122.80 : unicast, seq=259, size=69 bytes, dist=0, time=0.192ms
192.168.122.80 : unicast, seq=260, size=69 bytes, dist=0, time=0.168ms
. . .
192.168.122.80 : unicast, xmt/rcv/%loss = 328/328/0%, min/avg/max/std-dev = 0.116/0.268/1.015/0.072
192.168.122.80 : multicast, xmt/rcv/%loss = 328/256/21%, min/avg/max/std-dev = 0.129/0.297/1.003/0.067
Also, multicast between two guests doesn't work for me either, only unicast makes it through.

I have noticed that using promiscuous mode on the virbr0 bridge (ip link set virbr0 promisc on) will help host -> guest multicast. Meaning if I shut down and restart guests, things appear to continue to work. Nothing I do helps guest -> guest multicast though.

We really need to get to the bottom of this. I'm seeing others in the HA community encounter this problem as well. Let me know if there is anything else I can do to help you all reproduce/test this.

----- guest to guest, only unicast makes it -----

omping 192.168.122.80 192.168.122.81
192.168.122.80 : waiting for response msg
192.168.122.80 : joined (S,G) = (*, 232.43.211.234), pinging
. . .
192.168.122.80 : unicast, seq=8, size=69 bytes, dist=0, time=0.400ms
192.168.122.80 : unicast, seq=9, size=69 bytes, dist=0, time=0.363ms
192.168.122.80 : unicast, seq=10, size=69 bytes, dist=0, time=0.306ms
^C
192.168.122.80 : unicast, xmt/rcv/%loss = 10/10/0%, min/avg/max/std-dev = 0.184/0.347/0.400/0.063
192.168.122.80 : multicast, xmt/rcv/%loss = 10/0/100%, min/avg/max/std-dev = 0.0
(In reply to comment #21)
> Try this, I can reproduce it easily this way.
>
> 1. omping on host
> 2. omping on guest
>
> multicast and unicast traffic should be going fine.
>
> 3. now start a second guest
>
> After a few seconds, only unicast traffic remains.

Before I get everyone off track, I just want to point out that these steps to reproduce the issue have proven to be unreliable for me. Something about starting/stopping other guests appears to affect this issue, but I've yet to find a consistent set of steps that reproduces this 100% of the time.

Also, I've just seen something else kind of interesting related to this. Now for some reason I'm in a state where the guest can receive both unicast+multicast, but the host only receives unicast.

-- Vossel
David, did you add '224.0.0.0/4 dev virbr0' to your route table? If not, can you try to see if it will make any difference for you?
Ran another test today using the multicast route.

# route -n | grep virbr0
192.168.122.0   0.0.0.0         255.255.255.0   U     0      0        0 virbr0
224.0.0.0       0.0.0.0         240.0.0.0       U     0      0        0 virbr0
# cat /sys/class/net/virbr0/bridge/multicast_snooping
1
# cat /sys/class/net/virbr0/bridge/multicast_querier
0
# uname -r
3.6.7-4.fc17.x86_64

Same results. Unicast traffic only between host and guest after around sequence number 260.

It is interesting to me that it seems like multicast traffic falls apart around the same time consistently. Or at least in comment 21 the test appears to start failing around sequence number 256... In my latest test it was around 260.

# omping 192.168.122.1 192.168.122.80
. . .
192.168.122.1 : unicast, seq=259, size=69 bytes, dist=0, time=0.198ms
192.168.122.1 : multicast, seq=259, size=69 bytes, dist=0, time=0.210ms
192.168.122.1 : unicast, seq=260, size=69 bytes, dist=0, time=0.316ms
192.168.122.1 : multicast, seq=260, size=69 bytes, dist=0, time=0.329ms
192.168.122.1 : unicast, seq=261, size=69 bytes, dist=0, time=0.251ms
192.168.122.1 : unicast, seq=262, size=69 bytes, dist=0, time=0.264ms
192.168.122.1 : unicast, seq=263, size=69 bytes, dist=0, time=0.318ms
192.168.122.1 : unicast, seq=264, size=69 bytes, dist=0, time=0.292ms
192.168.122.1 : unicast, seq=265, size=69 bytes, dist=0, time=0.306ms
I just posted a fix in upstream: http://marc.info/?l=linux-netdev&m=136270854710666&w=2
*** Bug 926953 has been marked as a duplicate of this bug. ***
(In reply to comment #26)
> I just posted a fix in upstream:
> http://marc.info/?l=linux-netdev&m=136270854710666&w=2

Sounds promising! Is there a Fedora or RHEL kernel that includes the fix so we can test?
Not at the moment; there was a lot of discussion upstream, and I didn't see a conclusion. Cong, is this going to make it into Linus' tree?
Yes, I am still working on the patches.
In upstream, the following commits fix this bug:

commit 9f00b2e7cf241fa389733d41b615efdaa2cb0f5b
Author: Cong Wang <amwang>
Date:   Tue May 21 21:52:55 2013 +0000

    bridge: only expire the mdb entry when query is received

commit 6b7df111ece130fa979a0c4f58e53674c1e47d3e
Author: Cong Wang <amwang>
Date:   Tue May 21 21:52:56 2013 +0000

    bridge: send query as soon as leave is received

And both commits are backported to RHEL6 and RHEL5. I am sorry I don't have a fedora kernel build for this, you probably need to wait for rawhide to include them.
(In reply to Cong Wang from comment #31)
> In upstream, the following commits fix this bug:
>
> commit 9f00b2e7cf241fa389733d41b615efdaa2cb0f5b
> Author: Cong Wang <amwang>
> Date: Tue May 21 21:52:55 2013 +0000
>
> bridge: only expire the mdb entry when query is received
>
> commit 6b7df111ece130fa979a0c4f58e53674c1e47d3e
> Author: Cong Wang <amwang>
> Date: Tue May 21 21:52:56 2013 +0000
>
> bridge: send query as soon as leave is received
>
> And both commits are backported to RHEL6 and RHEL5. I am sorry I don't have
> a fedora kernel build for this, you probably need to wait for rawhide to
> include them.

Those aren't in Linus' tree. I can't find them in Dave Miller's net-next tree either. Could you point me to the tree that contains them?
(In reply to Josh Boyer from comment #32)
> (In reply to Cong Wang from comment #31)
> > In upstream, the following commits fix this bug:
> >
> > commit 9f00b2e7cf241fa389733d41b615efdaa2cb0f5b
> > Author: Cong Wang <amwang>
> > Date: Tue May 21 21:52:55 2013 +0000
> >
> > bridge: only expire the mdb entry when query is received
> >
> > commit 6b7df111ece130fa979a0c4f58e53674c1e47d3e
> > Author: Cong Wang <amwang>
> > Date: Tue May 21 21:52:56 2013 +0000
> >
> > bridge: send query as soon as leave is received
> >
> > And both commits are backported to RHEL6 and RHEL5. I am sorry I don't have
> > a fedora kernel build for this, you probably need to wait for rawhide to
> > include them.
>
> Those aren't in Linus' tree. I can't find them in Dave Miller's net-next
> tree either. Could you point me to the tree that contains them?

Ah, nevermind. I found them.

https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/net/bridge?id=9f00b2e7cf241fa389733d41b615efdaa2cb0f5b
https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/net/bridge?id=6b7df111ece130fa979a0c4f58e53674c1e47d3e
*** Bug 970846 has been marked as a duplicate of this bug. ***
(In reply to Cong Wang from comment #31)
> And both commits are backported to RHEL6 and RHEL5. I am sorry I don't have
> a fedora kernel build for this, you probably need to wait for rawhide to
> include them.

Any idea which upstream and/or Fedora/rawhide kernel(s) these might go into or have gone into?
(In reply to Brian J. Murrell from comment #35)
> (In reply to Cong Wang from comment #31)
> >
> > And both commits are backported to RHEL6 and RHEL5. I am sorry I don't have
> > a fedora kernel build for this, you probably need to wait for rawhide to
> > include them.
>
> Any idea of which upstream and/or fedora/rawhide kernel(s) these might
> go/have went into?

None. The patches are only in linux-next at this point.

Cong, if they're simple enough to backport stand-alone, I can put them into Fedora now. Do they depend on any other changes queued up in linux-next?
To add my voice to the din: I got hit by this on F19 beta TC5. If it's at all possible to backport, it would certainly be appreciated. I know a lot of people use KVM guests as nodes to learn HA clustering and that, in turn, requires fence_virtd/fence_xvm, which in turn needs this to work. It can be worked around *after* the user identifies the problem.

Thanks for all the good work!
I use this udev rule to work around the issue for virbr* devices:

# cat /etc/udev/rules.d/61-virbr-querier.rules
ACTION=="add", SUBSYSTEM=="net", RUN+="/etc/sysconfig/network-scripts/vnet_querier_enable"

# cat /etc/sysconfig/network-scripts/vnet_querier_enable
#!/bin/sh
if [[ $INTERFACE == virbr* ]]; then
	/bin/echo 1 > /sys/devices/virtual/net/$INTERFACE/bridge/multicast_querier
fi
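One caveat on the script above: `[[ $INTERFACE == virbr* ]]` is a bashism, which only works because /bin/sh is bash on Fedora. A strictly POSIX sketch of the same match (the virbr_match helper name is mine, for illustration; in the real hook udev supplies $INTERFACE):

```shell
#!/bin/sh
# POSIX-sh rewrite of the virbr* test used by the udev hook above.
# virbr_match is an illustrative helper; udev exports INTERFACE when it
# runs the script for a newly added network device.
virbr_match() {
    case "$1" in
        virbr*) return 0 ;;
        *)      return 1 ;;
    esac
}

if virbr_match "${INTERFACE:-}"; then
    echo 1 > "/sys/devices/virtual/net/$INTERFACE/bridge/multicast_querier"
fi
```

When INTERFACE is unset or does not start with virbr, the script exits quietly without touching sysfs.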
Trapier's method is what I used as the workaround as well, and it works perfectly for me.
(In reply to Josh Boyer from comment #36)
> Cong, if they're simple enough to backport stand-alone, I can put them into
> Fedora now. Do they depend on any other changes queued up in linux-next?

No, they don't depend on other patches. They are already backported to RHEL5 and RHEL6.
Sorry, after merging this patch I found some problems in this code; it results in a reboot of the system because of some timer problems. Have you ever seen this?

[ 3.593054] device 108324108806000 entered promiscuous mode
[ 3.595331] vsw75a3336f581d: port 3(108324108806000) entered forwarding state
[ 3.595343] vsw75a3336f581d: port 3(108324108806000) entered forwarding state
[ 1.436559] vsw75a3336f581d: port 3(108324108806000) entered forwarding state
[ 4.178588] vsw75a3336f581d: port 3(108324108806000) entered disabled state
[ 4.179013] vsw75a3336f581d: port 3(108324108806000) entered disabled state
[ 4.179139] vsw75a3336f581d: port 3(108324108806000) entered disabled state
[ 4.193998] device 108324108806000 entered promiscuous mode
[ 4.196297] vsw75a3336f581d: port 3(108324108806000) entered forwarding state
[ 4.196309] vsw75a3336f581d: port 3(108324108806000) entered forwarding state
[ 1.430543] ------------[ cut here ]------------
[ 1.430560] kernel BUG at kernel/timer.c:1102!
[ 1.430579] invalid opcode: 0000 [#1] SMP
[ 1.430611] Modules linked in: nfsv4 bridge stp llc tun fuse nfsd auth_rpcgss nfs_acl nfs lockd dns_resolver fscache sunrpc iptable_filter ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack ip_tables x_tables iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi dm_round_robin dm_multipath scsi_dh watch_reboot(O) sffs(O) cl_lock(O) cl_softdog(O) kvm_intel kvm loop joydev hid_generic usbhid hid iTCO_wdt dm_mod evdev crc32c_intel aesni_intel aes_x86_64 aes_generic ablk_helper cryptd microcode lpc_ich mfd_core thermal pcspkr ehci_hcd acpi_cpufreq mperf button usbcore e1000e usb_common processor thermal_sys [last unloaded: kvm]
[ 1.431188] CPU 0
[ 1.431195] Pid: 0, comm: swapper/0 Tainted: G O 3.6.7-vtp #126 Intel Corporation S1200BTL/S1200BTL
[ 1.431322] RIP: 0010:[<ffffffff81046988>] [<ffffffff81046988>] cascade+0x51/0x74
...
[ 1.432457] Call Trace:
[ 1.432496] <IRQ>
[ 1.432502]
[ 1.432547] [<ffffffff81046b27>] ? run_timer_softirq+0x8f/0x29f
[ 1.432597] [<ffffffff81288118>] ? timerqueue_add+0x80/0xa0
[ 1.432647] [<ffffffff81040c96>] ? __do_softirq+0xa3/0x1c0
[ 1.432698] [<ffffffff81072506>] ? ktime_get+0x62/0x6a
[ 1.432747] [<ffffffff81076f09>] ? clockevents_program_event+0x9a/0xb6
[ 1.432802] [<ffffffff8146403c>] ? call_softirq+0x1c/0x30
[ 1.432853] [<ffffffff8100f8fa>] ? do_softirq+0x3a/0x78
[ 1.432902] [<ffffffff81040f70>] ? irq_exit+0x3a/0x91
[ 1.432951] [<ffffffff810281c4>] ? smp_apic_timer_interrupt+0x73/0x81
[ 1.433005] [<ffffffff8146394a>] ? apic_timer_interrupt+0x6a/0x70
[ 1.433055] <EOI>
[ 1.433061]
[ 1.433105] [<ffffffff81056fb2>] ? enqueue_hrtimer+0x1c/0x66
[ 1.433154] [<ffffffff812c74d4>] ? intel_idle+0xdf/0x10a
[ 1.433204] [<ffffffff812c74af>] ? intel_idle+0xba/0x10a
[ 1.433254] [<ffffffff81013795>] ? paravirt_sched_clock+0x5/0x8
[ 1.433307] [<ffffffff8138185f>] ? cpuidle_enter_state+0xa/0x31
[ 1.433358] [<ffffffff81381945>] ? cpuidle_idle_call+0xbf/0x154
[ 1.433410] [<ffffffff81014f8b>] ? cpu_idle+0x9c/0xe6
[ 1.433460] [<ffffffff81916b41>] ? start_kernel+0x39b/0x3a6
[ 1.433510] [<ffffffff819165bc>] ? repair_env_string+0x57/0x57
[ 1.433562] [<ffffffff819163d3>] ? x86_64_start_kernel+0x101/0x10c
[ 1.433613] Code: 04 24 48 8b 46 08 48 89 44 24 08 48 89 20 48 89 36 48 89 76 08 48 8b 34 24 48 8b 1e eb 1d 48 8b 4e 18 48 83 e1 fe 48 39 cd 74 02 <0f> 0b 48 89 ef e8 82 e6 ff ff 48 89 de 48 8b 1b 4c 39 ee 75 de
[ 1.434070] RIP [<ffffffff81046988>] cascade+0x51/0x74
[ 1.434122] RSP <ffff88042f003e48>
And I have seen various call traces for this problem, such as:

[ 3.498986] Call Trace:
[ 3.499032] [<ffffffff810786a2>] ? __tick_nohz_idle_enter+0x2ed/0x315
[ 3.499088] [<ffffffff81078772>] ? tick_nohz_idle_enter+0x53/0x5a
[ 3.499144] [<ffffffff81014f4f>] ? cpu_idle+0x60/0xe6
[ 3.499196] [<ffffffff8144dff5>] ? start_secondary+0x1d2/0x1d8
[ 3.499248] Code: 93 28 1c 00 00 48 89 54 24 28 48 89 ca 4a 8b 6c 04 10 83 e2 3f 89 d6 48 63 fa 48 c1 e7 04 4c 8d 54 3d 00 49 8b 3a 4d 89 d5 eb 1a <f6> 47 18 01 75 11 4c 8b 57 10 41 b9 01 00 00 00 49 39 c2 49 0f
[ 3.499747] RIP [<ffffffff81047161>] get_next_timer_interrupt+0x125/0x20f
[ 3.499807] RSP <ffff88042cd49e38>
[ 3.499850] CR2: 0000000000000018
Since you are using a 3.6.7 kernel, I assume you backported them yourself? I have never seen that before; can you try the latest net-next? Thanks.
Patches applied to F17-rawhide.
OK, I will try, thanks. But I still have a question: why was this patch not merged into the stable kernel 3.9.7, which was released a few days ago?
I also found that this code has a problem in br_multicast_del_pg(): it modifies the timer without confirming that the timer has been set up. And I guess the reboot problem is related to the network: when there is a multicast router sending IGMP query packets, the problem appears after some time.
kernel-3.9.8-100.fc17 has been submitted as an update for Fedora 17. https://admin.fedoraproject.org/updates/FEDORA-2013-9123/kernel-3.9.8-100.fc17
kernel-3.9.8-300.fc19 has been submitted as an update for Fedora 19. https://admin.fedoraproject.org/updates/kernel-3.9.8-300.fc19
Oh, comment 47.5 must have gotten dropped -- where the Fedora 18 kernel update announcement was made. :-) Seriously though, what about Fedora 18? It's the latest/stable, for those not able to jump onto beta/RCs, or even upgrade to F19 the moment it's released.
(In reply to Brian J. Murrell from comment #49)
> Oh, comment 47.5 must have gotten dropped -- where the Fedora 18 kernel
> update announcement was made. :-)
>
> Seriously though, what about Fedora 18? It's the latest/stable, for those
> not able to jump onto beta/RCs, or even upgrade to F19 the moment it's
> released.

Heh, I was waiting for that, too. If/when it's posted, I'll test it and give karma.
(In reply to Brian J. Murrell from comment #49)
> Oh, comment 47.5 must have gotten dropped -- where the Fedora 18 kernel
> update announcement was made. :-)
>
> Seriously though, what about Fedora 18? It's the latest/stable, for those
> not able to jump onto beta/RCs, or even upgrade to F19 the moment it's
> released.

It's in F18 and will be in the next build. We have 3 different maintainers handling the various releases so things get done at different times occasionally.
I want to know when this patch will be merged into the stable kernel at https://kernel.org.
(In reply to LiYonghua from comment #46)
> And I found that this code has a problem in the function of
> br_multicast_del_pg, it will modify the timer without confirming that the
> timer had been setup?

Right... something like this:

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 81befac..69af490 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -270,7 +270,7 @@ static void br_multicast_del_pg(struct net_bridge *br,
 		del_timer(&p->timer);
 		call_rcu_bh(&p->rcu, br_multicast_free_pg);
 
-		if (!mp->ports && !mp->mglist &&
+		if (!mp->ports && !mp->mglist && mp->timer_armed &&
 		    netif_running(br->dev))
 			mod_timer(&mp->timer, jiffies);

Can you test this patch? Thanks!
(In reply to LiYonghua from comment #45)
> OK. I will try, thanks. But I still have a question, why this patch was not
> merged into the stable kernel 3.9.7, which is released a few days ago?

Because the patches are not suitable for stable, too risky.
Package kernel-3.9.8-300.fc19:
* should fix your issue,
* was pushed to the Fedora 19 testing repository,
* should be available at your local mirror within two days.

Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.9.8-300.fc19'
as soon as you are able to, then reboot.

Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-11901/kernel-3.9.8-300.fc19
then log in and leave karma (feedback).
kernel-3.9.8-200.fc18 has been submitted as an update for Fedora 18. https://admin.fedoraproject.org/updates/kernel-3.9.8-200.fc18
(In reply to Fedora Update System from comment #56)
> kernel-3.9.8-200.fc18 has been submitted as an update for Fedora 18.
> https://admin.fedoraproject.org/updates/kernel-3.9.8-200.fc18

This did not fix the problem for me. I removed Trapier Marshall's udev rule/script from comment #38, installed the kernel in comment #56 and rebooted.

== Host:

lemass:/home/digimer# uname -a
Linux lemass.alteeve.ca 3.9.8-200.fc18.x86_64 #1 SMP Fri Jun 28 14:45:36 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
lemass:/home/digimer# cat /sys/devices/virtual/net/virbr0/bridge/multicast_querier
0
lemass:/home/digimer# fence_xvm -o list
pcmk1   83f6abdc-bb48-d794-4aca-13f091f32c8b on
pcmk2   2d778455-de7d-a9fa-994c-69d7b079fda8 on

== Guest

[root@pcmk1 ~]# fence_xvm -o list
Timed out waiting for response
Operation failed

== Host

lemass:/home/digimer# echo 1 > /sys/devices/virtual/net/virbr0/bridge/multicast_querier

== Guest

[root@pcmk1 ~]# fence_xvm -o list
pcmk1   83f6abdc-bb48-d794-4aca-13f091f32c8b on
pcmk2   2d778455-de7d-a9fa-994c-69d7b079fda8 on
FWIW, fiddling with /sys/class/net/virbr0/bridge/multicast_querier never did correct the behaviour for me. I always have to do:

echo 0 > /sys/class/net/virbr0/bridge/multicast_snooping

in order to have working multicast between guests. Maybe there is more than one issue lurking here.
kernel-3.9.8-100.fc17 has been pushed to the Fedora 17 stable repository. If problems still persist, please make note of it in this bug report.
kernel-3.9.8-300.fc19 has been pushed to the Fedora 19 stable repository. If problems still persist, please make note of it in this bug report.
Is there an fc18 kernel update with this fix in the pipeline as well?
(In reply to Brian from comment #61)
> Is there an fc18 kernel update with this fix in the pipeline as well?

It would be the one mentioned in comment #56.
Whoops - my apologies. I've been lurking this report for a couple of months and my pattern recog. didn't trigger on those changes. Thanks for the cluebyfour :)
(In reply to Josh Boyer from comment #62)
> (In reply to Brian from comment #61)
> > Is there an fc18 kernel update with this fix in the pipeline as well?
> 
> It would be the one mentioned in comment #56.

Which didn't work for me. Did it work for others?
(In reply to digimer from comment #64)
> (In reply to Josh Boyer from comment #62)
> > It would be the one mentioned in comment #56.
> 
> Which didn't work for me. Did it work for others?

Thus far, yes; all my hypervisor-homed multicast applications have survived the 240-second IGMP interest discovery pruning they had previously been falling victim to. I'm migrating the VMs back now to see that they also function as desired.
When my KVM host system is running kernel-3.9.8-200.fc18, and I shut down a KVM guest, the host system throws an oops. It's 100% reproducible so far (4 oopses in 4 tests) and the start of the call trace is in br_multicast_* calls. I'm pretty sure the fix for this bug is itself broken. See bug 980254.
(In reply to James Ralston from comment #66)
> When my KVM host system is running kernel-3.9.8-200.fc18, and I shut down a
> KVM guest, the host system throws an oops. It's 100% reproducible so far (4
> oopses in 4 tests) and the start of the call trace is in br_multicast_*
> calls.
> 
> I'm pretty sure the fix for this bug is itself broken. See bug 980254.

Can you try my patch in comment #53?
For all of those having trouble with vhost and/or bridging in guests, please try the scratch build below when it completes. It contains the patch from bug 880035 for the timer fix and the use-after-free fix for vhost-net backported to 3.9.8. http://koji.fedoraproject.org/koji/taskinfo?taskID=5569247
Sigh. Of course, it would help if I didn't typo the patch. Anyway, here is a scratch build that should actually finish building: http://koji.fedoraproject.org/koji/taskinfo?taskID=5569571
Third time is a charm. This one actually looks like it built. Sigh, sorry about that. http://koji.fedoraproject.org/koji/taskinfo?taskID=5569631
kernel-3.9.9-201.fc18 has been submitted as an update for Fedora 18. https://admin.fedoraproject.org/updates/kernel-3.9.9-201.fc18
kernel-3.9.9-201.fc18 has been pushed to the Fedora 18 stable repository. If problems still persist, please make note of it in this bug report.
The new kernel seems to solve the multicast problem but exposes a new one. Running a RHEL 6.2 cluster on top of oVirt or KVM, on both Fedora 18 and Fedora 19, seems to work fine now; multicast is running smoothly. But after creating a cluster, trying to create a GFS2 filesystem on a (virtual) iSCSI LUN makes the KVM host show this in dmesg:

[14991.447577] BUG: unable to handle kernel paging request at 000000020b38b01c
[14991.447612] IP: [<ffffffff8113cada>] put_compound_page+0xaa/0x270
[14991.447637] PGD 0
[14991.447647] Oops: 0000 [#1] SMP
[14991.447663] Modules linked in: vhost_net macvtap macvlan fuse ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_CHECKSUM iptable_mangle tun bridge stp llc nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack acer_wmi arc4 snd_hda_codec_hdmi snd_hda_codec_conexant ath9k ath9k_common rtsx_pci_sdmmc mmc_core iTCO_wdt mperf iTCO_vendor_support rtsx_pci_ms memstick sparse_keymap snd_hda_intel ath9k_hw ath mac80211 cfg80211 coretemp kvm_intel kvm snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev media atl1c mei rfkill snd_page_alloc snd_timer snd rtsx_pci lpc_ich mfd_core i2c_i801 serio_raw soundcore joydev microcode wmi uinput binfmt_misc dm_crypt crc32_pclmul i915 crc32c_intel
[14991.447988]  ghash_clmulni_intel i2c_algo_bit drm_kms_helper drm i2c_core video
[14991.448016] CPU 1
[14991.448026] Pid: 11006, comm: vhost-11005 Not tainted 3.9.9-302.fc19.x86_64 #1 Acer TravelMate 5760/BAV50_HR
[14991.448059] RIP: 0010:[<ffffffff8113cada>]  [<ffffffff8113cada>] put_compound_page+0xaa/0x270
[14991.448088] RSP: 0018:ffff88017bd1dc50  EFLAGS: 00010286
[14991.448106] RAX: ffff8802201cd400 RBX: ffff8802201ce000 RCX: 0000000000000003
[14991.448128] RDX: 0000000000000140 RSI: 0000000000000246 RDI: ffff8802201ce000
[14991.448151] RBP: ffff88017bd1dc68 R08: 0000000000000001 R09: 0000000000000010
[14991.448173] R10: 0000000000000000 R11: 0000000000000000 R12: 000000020b38b000
[14991.448198] R13: ffffffffa0585fd4 R14: 000000000000000c R15: ffff88016ee24600
[14991.448220] FS:  0000000000000000(0000) GS:ffff88024fa40000(0000) knlGS:0000000000000000
[14991.448246] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[14991.448265] CR2: 000000020b38b01c CR3: 0000000106530000 CR4: 00000000000427e0
[14991.448288] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[14991.448311] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[14991.448334] Process vhost-11005 (pid: 11006, threadinfo ffff88017bd1c000, task ffff88018834c650)
[14991.448361] Stack:
[14991.448368]  ffff8802201ce000 ffff88016ee24600 ffffffffa0585fd4 ffff88017bd1dc80
[14991.448397]  ffffffff8113cceb 0000000000000012 ffff88017bd1dca0 ffffffff8152d0a7
[14991.448425]  ffff88016ee24600 ffff88016ee24600 ffff88017bd1dcb8 ffffffff8152d13a
[14991.448453] Call Trace:
[14991.448466]  [<ffffffffa0585fd4>] ? tun_get_user+0x724/0x810 [tun]
[14991.448487]  [<ffffffff8113cceb>] put_page+0x4b/0x60
[14991.448505]  [<ffffffff8152d0a7>] skb_release_data+0x87/0x100
[14991.448524]  [<ffffffff8152d13a>] __kfree_skb+0x1a/0xb0
[14991.448541]  [<ffffffff8152d202>] kfree_skb+0x32/0x90
[14991.448559]  [<ffffffffa0585fd4>] tun_get_user+0x724/0x810 [tun]
[14991.448580]  [<ffffffffa0586117>] tun_sendmsg+0x57/0x80 [tun]
[14991.448601]  [<ffffffffa05f2a78>] handle_tx+0x1c8/0x640 [vhost_net]
[14991.448622]  [<ffffffffa05f2f25>] handle_tx_kick+0x15/0x20 [vhost_net]
[14991.448644]  [<ffffffffa05ef81d>] vhost_worker+0xed/0x190 [vhost_net]
[14991.448666]  [<ffffffffa05ef730>] ? __vhost_add_used_n+0x100/0x100 [vhost_net]
[14991.449637]  [<ffffffff810802a0>] kthread+0xc0/0xd0
[14991.450603]  [<ffffffff810801e0>] ? insert_kthread_work+0x40/0x40
[14991.451570]  [<ffffffff8164f26c>] ret_from_fork+0x7c/0xb0
[14991.452535]  [<ffffffff810801e0>] ? insert_kthread_work+0x40/0x40
[14991.453499] Code: 48 c7 c7 27 08 9f 81 e8 55 02 f2 ff 48 89 df e8 1d f7 ff ff 85 c0 74 a4 eb 9a 4c 8b 67 30 48 8b 07 f6 c4 80 74 cd 4c 39 e7 74 c8 <41> 8b 54 24 1c 85 d2 74 bf 8d 4a 01 89 d0 f0 41 0f b1 4c 24 1c
[14991.455725] RIP  [<ffffffff8113cada>] put_compound_page+0xaa/0x270
[14991.456757]  RSP <ffff88017bd1dc50>
[14991.457750] CR2: 000000020b38b01c
[14991.462898] ---[ end trace 49f39f00e7965faf ]---