From https://bugs.launchpad.net/bugs/1078305:

If I send a broadcast message like this to the limited broadcast address:

echo a | nc -bu 255.255.255.255 5000

then the resulting packet looks like this on the sender side:

14:36:17.997662 02:00:c0:a8:7a:fb (oui Unknown) > Broadcast, ethertype IPv4 (0x0800), length 44: (tos 0x0, ttl 64, id 25278, offset 0, flags [DF], proto UDP (17), length 30)
    192.168.122.251.42141 > 255.255.255.255.5000: [bad udp cksum 476f!] UDP, length 2

However, another VM on the same host sees the following packet:

14:36:19.247793 02:00:c0:a8:7a:fb (oui Unknown) > Broadcast, ethertype IPv4 (0x0800), length 44: (tos 0x0, ttl 64, id 25278, offset 0, flags [DF], proto UDP (17), length 30)
    192.168.122.1.42141 > 255.255.255.255.5000: [bad udp cksum 3b71!] UDP, length 2

So the source MAC address and other headers are untouched, but the source IP address is changed to the default gateway's!

If I use the subnet-specific broadcast address, then the packets are left untouched, i.e.:

echo a | nc -bu 192.168.122.255 5000

14:41:33.313490 02:00:c0:a8:7a:fb (oui Unknown) > Broadcast, ethertype IPv4 (0x0800), length 44: (tos 0x0, ttl 64, id 38571, offset 0, flags [DF], proto UDP (17), length 30)
    192.168.122.251.46821 > 192.168.122.255.5000: [bad udp cksum aee5!] UDP, length 2

14:41:34.563615 02:00:c0:a8:7a:fb (oui Unknown) > Broadcast, ethertype IPv4 (0x0800), length 44: (tos 0x0, ttl 64, id 38571, offset 0, flags [DF], proto UDP (17), length 30)
    192.168.122.251.46821 > 192.168.122.255.5000: [bad udp cksum aee5!] UDP, length 2

This is a workaround:

IFACE=$(route -n | grep '^0.0.0.0 ' | sed 's/.* //g')
main_broadcast_addr=$(ip -4 addr show "$IFACE" | grep 'inet .* brd ' | sed 's/.* brd //;s/\([0-9.]*\).*/\1/')
iptables -t nat -A OUTPUT -d 255.255.255.255 -p tcp -j DNAT --to-destination $main_broadcast_addr
iptables -t nat -A OUTPUT -d 255.255.255.255 -p udp -j DNAT --to-destination $main_broadcast_addr
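
The workaround finds the interface and its broadcast address by running grep and sed over the output of route -n and ip addr, which depends on the column layout. Below is a minimal sketch of the same lookup with awk. It assumes a single default route and that iproute2 prints the usual "dev <name>" and "brd <addr>" tokens. The function names default_iface and first_broadcast_addr are made up for illustration and are not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: read `ip -4 route show default` output on stdin and
# print the interface named after the first "dev" token.
default_iface() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Hypothetical helper: read `ip -4 addr show <iface>` output on stdin and
# print the address following the first "brd" token on an "inet" line.
first_broadcast_addr() {
    awk '$1 == "inet" { for (i = 2; i < NF; i++) if ($i == "brd") { print $(i + 1); exit } }'
}

# Intended use (iptables needs root; nothing below is executed here):
#   IFACE=$(ip -4 route show default | default_iface)
#   main_broadcast_addr=$(ip -4 addr show "$IFACE" | first_broadcast_addr)
#   iptables -t nat -A OUTPUT -d 255.255.255.255 -p udp -j DNAT --to-destination "$main_broadcast_addr"
```

Both helpers only parse text on stdin, so they can be checked against captured command output before any iptables rule is touched. If the interface has no broadcast address, first_broadcast_addr prints nothing, and the caller should check for an empty result before adding the DNAT rule.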
I think I'm hitting this bug... This is quite serious! As it turns out, because of it, VRRP (keepalived) does not work!

VRRP advertisement packets get sent out from all hosts that should be broadcasting. They get sent to "vrrp.mcast.net", which is 224.0.0.18. When the other hosts see this traffic, it looks like:

14:28:11.947845 IP (tos 0xc0, ttl 255, id 18, offset 0, flags [none], proto VRRP (112), length 40)
    192.168.142.1 > vrrp.mcast.net: VRRPv2, Advertisement, vrid 42, prio 254, authtype simple, intvl 3s, length 20, addrs: 192.168.142.3 auth "password"

You'll see that 192.168.142.1 is actually the router IP, which is assigned to virbr1.

For fun, I decided to forcefully delete this address:

sudo ip a del 192.168.142.1/24 dev virbr1

And now things are still broken, but the source IP address seen in the packets is 192.168.1.106.

Hmmm, what address is that? Turns out it's the wlan0 address, which virbr1 is "nat"-ed to in order to give the VMs access to the outside world.

$ virsh -c qemu:///system net-dumpxml gluster
<network connections='5' ipv6='yes'>
  <name>gluster</name>
  <uuid>c858e830-3ed1-44d6-8df2-62fcb408b825</uuid>
  <forward mode='nat'>
    <nat>
      <port start='1024' end='65535'/>
    </nat>
  </forward>
  <bridge name='virbr1' stp='on' delay='0' />
  <mac address='52:54:00:85:5e:33'/>
  <ip address='192.168.142.1' netmask='255.255.255.0'>
  </ip>
</network>

So because the source address is wrong, the VRRP packets are discarded, and VRRP doesn't work!

Setting the mcast_src_ip option in VRRP doesn't help: even if you forcefully specify it, the source address still gets rewritten. libvirt seems to just choose its own address instead!

I haven't decided on a proper workaround yet, since packets should really use the VRRP multicast address, and not the local broadcast address...

Any suggestions welcome. Fixes even more so!

Cheers
I think this might be related: http://www.redhat.com/archives/libvir-list/2013-September/msg01315.html
(In reply to purpleidea from comment #2)
> I think this might be related:
>
> http://www.redhat.com/archives/libvir-list/2013-September/msg01315.html

Sorry, start of thread looks like:
http://www.redhat.com/archives/libvir-list/2013-September/msg01311.html

And maybe this is fixed upstream. If so, maybe someone has a commit id and knows what version it is fixed in...
Looks to be related to: https://bugzilla.redhat.com/show_bug.cgi?id=709418
Looks like the relevant commit is: 51e184e9821c3740ac9b52055860d683f27b0ab6
According to git tag --contains 51e184e9821c3740ac9b52055860d683f27b0ab6, it looks like that commit is available in F20 but not in 1.0.5.8 (F19).

Oh well. I guess it's time to upgrade. Hope these comments help someone else.
Sounds like this was fixed