Description of problem:
Running an HVM guest over xen (kernel-xen >= 2.6.18-261.el5xen), there is a network outage during live migration. This issue doesn't exist with kernel-xen-2.6.18-259.el5 (tested with ping: usually 2~3 packets lost with <= 259, maybe 30~40 packets lost with >= 261).

Version-Release number of selected component (if applicable):
kernel-xen-2.6.18-261.el5 (or maybe kernel-xen-2.6.18-260.el5, which was removed from brew)

How reproducible:
80%

Steps to Reproduce:
1. boot up an HVM guest (I tested with RHEL5.6; it should not be related to a specific guest)
2. live migrate the guest

Actual results:
There is a network outage during live migration.

Expected results:
Only a few packets should be lost, for example:

Additional info:
No issue with a PV guest on the 261 kernel.
Only a few packets (1~2) are lost while keeping a ping to some host running from the guest during live migration.
Bisection results show that commit 2255bd6 ("[net] bridge: fix initial packet flood if !STP") possibly introduced the regression. At least this time I tried:
[1] 257: no issue on either of the two pairs of hosts
[2] 259: network outage on both of the two pairs of hosts
[3] 259 kernel with 2255bd6 reverted: no issue on either of the two pairs of hosts

I'll try more times tomorrow; our network environment may also have an impact on this issue.
Thanks Qixiang, can you please attach "brctl show"?
(In reply to comment #9)
> Thanks Qixiang, can you please attach "brctl show"?

$ brctl show
bridge name	bridge id		STP enabled	interfaces
virbr0		8000.000000000000	yes
xenbr0		8000.fee60976b239	no		vif3.0
							tap0
							peth0
							vif0.0

After I tried 'brctl stp xenbr0 on' on one of the hosts, it lost connection immediately, and I can't connect to it now even after a reboot; finding someone to fix it.
(In reply to comment #10)
> after I tried 'brctl stp xenbr0 on' with one of the host, it lost connection
> immediately, and can't connect to it now even after reboot, finding someone to
> fix it.

Spanning tree should always be off; otherwise the switch ports are in the learning state and drop packets. Make sure your forwarding delay is 0 too.
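The "brctl show" check above can be automated. Below is a minimal sketch (the helper name `stp_enabled_bridges` is hypothetical, and the sample output is the capture from this bug) that parses `brctl show`-style output and flags any bridge with STP enabled:

```python
# Hypothetical helper: parse `brctl show` output and report bridges
# that have STP enabled, since STP learning delay drops packets
# during live migration. SAMPLE is the output captured in this bug.
SAMPLE = """\
bridge name     bridge id               STP enabled     interfaces
virbr0          8000.000000000000       yes
xenbr0          8000.fee60976b239       no              vif3.0
                                                        tap0
                                                        peth0
                                                        vif0.0
"""

def stp_enabled_bridges(output):
    bridges = []
    for line in output.splitlines()[1:]:       # skip the header line
        fields = line.split()
        # bridge rows carry a "8000.xxxx" bridge id in the second column;
        # interface continuation lines have only one field and are skipped
        if len(fields) >= 3 and fields[1].startswith("8000."):
            if fields[2] == "yes":
                bridges.append(fields[0])
    return bridges

print(stp_enabled_bridges(SAMPLE))  # → ['virbr0']
```

The forwarding delay isn't shown by `brctl show`; it would be set to zero separately with `brctl setfd <bridge> 0` and inspected with `brctl showstp <bridge>`.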
I have reproduced the issue using two 2.6.18-274.el5xen hosts (hp-z800-02.lab.bos.redhat.com and hp-z600-02.lab.bos.redhat.com), ping-pong migrating a RHEL-5.6 HVM guest.

The root cause could be the same as with bug 720347: the tapX interface provided by qemu-dm may be added to the bridge only after the gARP appeared on tapX. With PV drivers, we were able to make the guest wait for the host (see bug 713585 / bug 720347). I'm not sure how we could make the HVM guest wait.

Would it be an acceptable excuse to say "use PV-on-HVM drivers instead"? Normally, one should use the emulated drivers only during installation, then install / select the PV-on-HVM drivers.

After upgrading both host kernels to https://brewweb.devel.redhat.com/taskinfo?taskID=3478097 and switching the RHEL-5.6 HVM guest to a type=netfront vif, the problem went away. (The above brew RPMs are also available under sftp://shell.devel.redhat.com/home/brq/lersek/public_html/bz719294.)

Thoughts? Thanks.
We'll have to work the bisection/regression angle a bit more.
On 07/29/11 17:06, Paolo Bonzini wrote:
> We might try reproducing it with KVM and, if it works, find how QEMU sends
> gratuitous ARPs and whether ours does it differently. If not, clone the
> bug, wait for KVM smart people to fix it and backport. :)
(In reply to comment #18)
> On 07/29/11 17:06, Paolo Bonzini wrote:
>
> > We might try reproducing it with KVM and, if it works, find how QEMU sends
> > gratuitous ARPs and whether ours does it differently.

It works; I tested live migration of a RHEL-6.1 KVM guest between two RHEL-6.1 hosts (guest network driver: 8139cp). Pinging from my laptop was undisturbed.

I tried to capture some ARP packets with tcpdump on both hosts, listening on the br0 interface. I couldn't see the expected gratuitous ARPs! Not on em1 either.
The latter.
(In reply to comment #19)
> I tried to capture some ARP packets with tcpdump, on both hosts, listening to
> the br0 interface. I couldn't see the expected gratuitous ARPs!

The problem was that I specified "arp" on the tcpdump command line. When the migration completes, this seems to be sent, 4-5 times:

11:22:49.601381 52:54:00:fa:14:0d > Broadcast, ethertype Unknown (0x0835), length 60:
	0x0000:  ffff ffff ffff 5254 00fa 140d 0835 0001  ......RT.....5..
	0x0010:  0800 0604 0003 5254 00fa 140d 0000 0000  ......RT........
	0x0020:  5254 00fa 140d 0000 0000 0000 0000 0000  RT..............
	0x0030:  0000 0000 0000 0000 0000 0000            ............

This has been fixed in bug 715141; I should have updated qemu-kvm from .160 to .167.
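The captured frame can be decoded by hand. Here is a hedged sketch (field layout inferred from the hex dump above, not taken from qemu source) showing that the body is an ordinary ARP-style announce carried under the unregistered ethertype 0x0835:

```python
import struct

# The exact 60 bytes captured above (Ethernet header included).
raw = bytes.fromhex(
    "ffffffffffff525400fa140d08350001"
    "080006040003525400fa140d00000000"
    "525400fa140d00000000000000000000"
    "000000000000000000000000"
)

def decode(frame):
    # Ethernet: dst(6) src(6) ethertype(2), then an ARP-shaped body:
    # htype, ptype, hlen, plen, op, sender MAC/IP, target MAC/IP.
    (ethertype,) = struct.unpack("!H", frame[12:14])
    htype, ptype, hlen, plen, op = struct.unpack("!HHBBH", frame[14:22])
    sender_mac = ":".join(f"{b:02x}" for b in frame[22:28])
    return ethertype, op, sender_mac

ethertype, op, sender = decode(raw)
print(f"ethertype={ethertype:#06x} op={op} sender={sender}")
# → ethertype=0x0835 op=3 sender=52:54:00:fa:14:0d
```

The 0x0835 ethertype is what no real protocol uses, which is exactly why the "arp" tcpdump filter missed it.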
According to the thread under [1], even Xen-3.4.3's qemu doesn't send grat ARPs (i.e. for HVM domains). I'm claiming this is not a regression (see also comment 2 & comment 15) and removing the Regression keyword accordingly.

It seems that we can't backport anything from upstream Xen. Paolo suggested looking at qemu-kvm-rhel6/savevm.c and porting qemu_announce_self()'s functionality. We could perhaps extend the RHEL-5 xen-userspace code:
- send the gARP at the end of qemu_loadvm() [tools/ioemu/vl.c],
- send the packet by way of qemu_send_packet(),
- the list of NICs is stored in the "nd_table" array.

Changing component to "xen". We have to decide if we want to implement this new feature, or just recommend the PV drivers (see comment 16). I think it's worth a single try, but not more effort than that. Upstream seems to have moved on, so I believe it would be RHEL-only.

[1] http://lists.xensource.com/archives/html/xen-users/2011-03/msg00314.html
(In reply to comment #23) > We could perhaps extend the RHEL-5 xen-userspace code: > - send the gARP at the end of qemu_loadvm() [tools/ioemu/vl.c], > - send the packet by way of qemu_send_packet(), > - list of NICs is stored in the "nd_table" array. (I mean Paolo figured out all of this.)
Reproduced network outage with two RHEL5-Server-U7 hosts and a fullvirt RHEL-5.7 guest, pinged from my laptop.
attachment 517836 [details] (see brew link in comment 29) seems to do the trick for fullvirt; I ping-pong migrated the guest a few times, and ping only lost a single reply each time.

Also I was running tcpdump on one of the hosts; the following was captured at the end of each migration:

# tcpdump -e -vvv -X -l -n rarp
13:12:11.118500 00:16:3e:27:0f:ef > Broadcast, ethertype Reverse ARP (0x8035), length 60: rarp who-is 00:16:3e:27:0f:ef tell 00:16:3e:27:0f:ef
	0x0000:  0001 0800 0604 0003 0016 3e27 0fef 0000  ..........>'....
	0x0010:  0000 0016 3e27 0fef 0000 0000 0000 0000  ....>'..........
	0x0020:  0000 0000 0000 0000 0000 0000 0000       ..............
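The announce frame captured above has a simple fixed layout. The following is a hedged Python sketch reconstructing it (an illustrative reimplementation for reference, not the patch's actual C code): a RARP "who-is myself" request, broadcast, with the guest's MAC in both the sender and target hardware fields and zeroed IP fields, padded to the 60-byte Ethernet minimum.

```python
import struct

def build_rarp_announce(mac):
    """Build a RARP self-announce frame like the one captured above.
    `mac` is the guest NIC's 6-byte hardware address."""
    # Ethernet header: broadcast dst, guest src, ethertype 0x8035 (RARP)
    eth = b"\xff" * 6 + mac + struct.pack("!H", 0x8035)
    # ARP-shaped body: htype=1 (Ethernet), ptype=0x0800 (IPv4),
    # hlen=6, plen=4, op=3 (reverse request, "who-is")
    body = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 3)
    body += mac + b"\x00" * 4     # sender MAC / sender IP (0.0.0.0)
    body += mac + b"\x00" * 4     # target MAC / target IP (0.0.0.0)
    frame = eth + body
    return frame + b"\x00" * (60 - len(frame))   # pad to 60 bytes

mac = bytes.fromhex("00163e270fef")              # MAC from the capture
frame = build_rarp_announce(mac)
print(len(frame), frame[12:14].hex())            # → 60 8035
```

Because RARP is harmless and well known to switch administrators, this frame only refreshes the switches' MAC-to-port mappings; nothing answers it.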
Changing the fullvirt RHEL-5.7 guest's vif to type=netfront (PV-on-HVM), the qemu-dm command line changed too: no "-net" switches at all (while with pure HVM, there's "-net nic,vlan=1,macaddr=00:16:3e:27:0f:ef,model=rtl8139 -net tap,vlan=1,bridge=xenbr0").

Ping-pong live-migrating the PV-on-HVM guest resulted in no RARPs (checked with tcpdump), but the netfront driver took care of the grat ARPs, so ping continued to work.

TODO:
- check PV
- check if upstream needs / can have this
(In reply to comment #32)
> - check if upstream needs / can have this

They won't care:
(1) recent Xen relies on upstream qemu:
http://blog.xen.org/index.php/2011/05/13/xen-support-upstreamed-to-qemu/
(2) upstream qemu has been equipped with the RARP notification for ages; git blaming qemu-kvm-rhel6/savevm.c:announce_self_create() returns commit 18995b98:

commit 18995b9808dc48897bda6ed93ce3e978191f7251
Author: Nolan <nolan>
Date:   Thu Oct 15 16:53:55 2009 -0700

    Send a RARP packet after migration.

    Currently, after a migration qemu sends a broadcast packet to update
    switches' MAC->port mappings. Unfortunately, it picks a random
    (constant) ethertype and crosses its fingers that no one else is
    using it.

    This patch causes it to send a RARP packet instead. RARP was chosen
    for 2 reasons. One, it is always harmless, and will continue to be so
    even as new ethertypes are allocated. Two, it is what VMware ESX
    sends, so people who write filtering rules for switches already know
    about it.

    I also changed the code to send SELF_ANNOUNCE_ROUNDS packets, instead
    of SELF_ANNOUNCE_ROUNDS + 1, and added a simple backoff scheme.

    Signed-off-by: Nolan Leake <nolan <at> sigbus.net>
    Signed-off-by: Anthony Liguori <aliguori.com>
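The "SELF_ANNOUNCE_ROUNDS packets with a simple backoff" behavior the commit describes can be sketched roughly as follows (the round count matches qemu's macro name, but the delay values here are illustrative assumptions, not qemu's actual timer constants):

```python
# Hedged sketch of the self-announce pattern from commit 18995b98:
# send SELF_ANNOUNCE_ROUNDS announce packets, with the delay between
# successive packets growing each round. Delay values are made up.
SELF_ANNOUNCE_ROUNDS = 5

def announce_schedule(initial_ms=50, step_ms=100):
    """Return the per-round delays (ms) before each announce packet."""
    delays, d = [], initial_ms
    for _ in range(SELF_ANNOUNCE_ROUNDS):
        delays.append(d)
        d += step_ms          # back off: wait longer before the next round
    return delays

print(announce_schedule())    # → [50, 150, 250, 350, 450]
```

Repeating the packet with a backoff covers switches that miss the first announce (e.g. while their ports are still settling) without flooding the segment.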
Testing with a RHEL-5.7 PV guest:
- qemu-dm doesn't have -net switches,
- ping-pong live migration + ping from the outside works,
- no RARP packets (tcpdump).
Fix built into xen-3.0.3-134.el5
Created attachment 542430 [details] ping results log
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2012-0160.html