Bug 719294 - HVM guest network outage during live migration
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: xen
Version: 5.7
Hardware: All
OS: Linux
Severity: urgent
Assigned To: Laszlo Ersek
Virtualization Bugs
Blocks: 695369
Reported: 2011-07-06 07:10 EDT by Qixiang Wan
Modified: 2013-01-09 19:02 EST
15 users on CC

Fixed In Version: xen-3.0.3-134.el5
Doc Type: Bug Fix
Last Closed: 2012-02-21 00:55:33 EST

Attachments
ping results log (3.94 KB, text/plain)
2011-12-08 03:40 EST, Shengnan Wang

Description Qixiang Wan 2011-07-06 07:10:22 EDT
Description of problem:
When running an HVM guest on xen (kernel-xen >= 2.6.18-261.el5xen), there is a network outage during live migration. The issue doesn't exist with kernel-xen-2.6.18-259.el5 (tested with ping: usually 2~3 packets are lost with <= 259, but 30~40 packets may be lost with >= 261).

Version-Release number of selected component (if applicable):
kernel-xen-2.6.18-261.el5 (possibly also kernel-xen-2.6.18-260.el5, which was removed from brew)

How reproducible:
80%

Steps to Reproduce:
1. boot up an HVM guest (I tested with RHEL 5.6; the issue should not be related to a specific guest)
2. live migrate the guest
  
Actual results:
There is a network outage during live migration.

Expected results:
Only a few packets should be lost, as with the <= 259 kernels.

Additional info:
No issue with a PV guest on the 261 kernel.
Comment 1 Qixiang Wan 2011-07-06 07:13:02 EDT
Only a few packets (1~2) are lost while continuously pinging a host from the guest during live migration.
Comment 8 Qixiang Wan 2011-07-06 12:29:36 EDT
Bisection results suggest that the commit:

2255bd6 [net] bridge: fix initial packet flood if !STP

introduced the regression; at least this is what I tried this time:

[1] 257: no issue on either pair of hosts
[2] 259: network outage on both pairs of hosts
[3] 259 with 2255bd6 reverted: no issue on either pair of hosts

I'll try a few more times tomorrow; our network environment may also have an impact on this issue.
Comment 9 Paolo Bonzini 2011-07-07 03:03:15 EDT
Thanks Qixiang, can you please attach "brctl show"?
Comment 10 Qixiang Wan 2011-07-07 03:10:31 EDT
(In reply to comment #9)
> Thanks Qixiang, can you please attach "brctl show"?

$ brctl show
bridge name	bridge id		STP enabled	interfaces
virbr0		8000.000000000000	yes		
xenbr0		8000.fee60976b239	no		vif3.0
							tap0
							peth0
							vif0.0

after I tried 'brctl stp xenbr0 on' on one of the hosts, it lost its connection immediately, and I can't connect to it now even after a reboot; I'm finding someone to fix it.
Comment 12 Dor Laor 2011-07-07 05:38:27 EDT
(In reply to comment #10)
> (In reply to comment #9)
> > Thanks Qixiang, can you please attach "brctl show"?
> 
> $ brctl show
> bridge name bridge id  STP enabled interfaces
> virbr0  8000.000000000000 yes  
> xenbr0  8000.fee60976b239 no  vif3.0
>        tap0
>        peth0
>        vif0.0
> 
> after I tried 'brctl stp xenbr0 on' with one of the host, it lost connection
> immediately,and can't connect to it now even after reboot, finding someone to
> fix it.

Spanning tree should always be off; otherwise the switch ports sit in the learning state and drop packets.
Make sure your forwarding delay is 0 too.
Comment 16 Laszlo Ersek 2011-07-13 10:07:46 EDT
I have reproduced the issue using two 2.6.18-274.el5xen hosts (hp-z800-02.lab.bos.redhat.com and hp-z600-02.lab.bos.redhat.com), ping-pong migrating a RHEL-5.6 HVM guest.

The root cause could be the same as with bug 720347: the tapX interface provided by qemu-dm may be added to the bridge only after the gARP appeared on tapX.

With PV drivers, we were able to make the guest wait for the host. (See bug 713585 / bug 720347.) I'm not sure how we could make the HVM guest wait. Would it be an acceptable excuse to say "use PV-on-HVM drivers instead"? Normally, one should use the emulated drivers only during installation, then install / select the PV-on-HVM drivers.

After upgrading both host kernels to

      https://brewweb.devel.redhat.com/taskinfo?taskID=3478097

and switching the RHEL-5.6 HVM guest to a type=netfront vif, the problem went away.

(The above brew RPMs are also available under sftp://shell.devel.redhat.com/home/brq/lersek/public_html/bz719294.)

Thoughts? Thanks.
Comment 17 Laszlo Ersek 2011-07-29 10:41:23 EDT
We'll have to work the bisection/regression angle a bit more.
Comment 18 Laszlo Ersek 2011-07-29 11:48:21 EDT
On 07/29/11 17:06, Paolo Bonzini wrote:

> We might try reproducing it with KVM and, if it works, find how QEMU sends 
> gratuitous ARPs and whether ours does it differently. If not, clone the 
> bug, wait for KVM smart people to fix it and backport. :)
Comment 19 Laszlo Ersek 2011-08-08 09:06:26 EDT
(In reply to comment #18)
> On 07/29/11 17:06, Paolo Bonzini wrote:
> 
> > We might try reproducing it with KVM and, if it works, find how QEMU sends 
> > gratuitous ARPs and whether ours does it differently.

It works; I tested live migration of a RHEL-6.1 KVM guest between two RHEL-6.1 hosts. (Guest network driver: 8139cp.) Pinging from my laptop was undisturbed.

I tried to capture some ARP packets with tcpdump, on both hosts, listening to the br0 interface. I couldn't see the expected gratuitous ARPs! Not on em1 either.
Comment 21 Paolo Bonzini 2011-08-08 10:05:47 EDT
The latter.
Comment 22 Laszlo Ersek 2011-08-08 11:54:26 EDT
(In reply to comment #19)

> I tried to capture some ARP packets with tcpdump, on both hosts, listening to
> the br0 interface. I couldn't see the expected gratuitous ARPs!

The problem was that I specified "arp" on the tcpdump command line. When the migration completes, this seems to be sent 4-5 times:

11:22:49.601381 52:54:00:fa:14:0d > Broadcast, ethertype Unknown (0x0835), length 60: 
        0x0000:  ffff ffff ffff 5254 00fa 140d 0835 0001  ......RT.....5..
        0x0010:  0800 0604 0003 5254 00fa 140d 0000 0000  ......RT........
        0x0020:  5254 00fa 140d 0000 0000 0000 0000 0000  RT..............
        0x0030:  0000 0000 0000 0000 0000 0000            ............

This has been fixed in bug 715141; I should have updated qemu-kvm from .160 to .167.
Comment 23 Laszlo Ersek 2011-08-08 12:24:29 EDT
According to the thread under [1], even Xen-3.4.3's qemu doesn't send gratuitous ARPs (i.e. for HVM domains).

I'm claiming this is not a regression (see also comment 2 & comment 15) and removing the Regression keyword accordingly.

It seems that we can't backport anything from upstream Xen. Paolo suggested looking at qemu-kvm-rhel6/savevm.c and porting qemu_announce_self()'s functionality. We could perhaps extend the RHEL-5 xen-userspace code:
- send the gARP at the end of qemu_loadvm() [tools/ioemu/vl.c],
- send the packet by way of qemu_send_packet(),
- the list of NICs is stored in the "nd_table" array.
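A rough sketch of the frame that plan would inject. The names qemu_loadvm(), qemu_send_packet() and nd_table come from the plan above; the helper function below and its stand-alone form are invented here purely for illustration, not taken from any actual patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build one gratuitous ARP request frame for a NIC.
 * In the sketched xen-userspace change, something like this would run at
 * the end of qemu_loadvm() for every nd_table[] entry, and the resulting
 * buffer would be handed to qemu_send_packet(). */
static size_t build_garp_frame(uint8_t buf[60], const uint8_t mac[6],
                               const uint8_t ip[4])
{
    memset(buf, 0, 60);              /* minimum Ethernet frame, zero-padded */
    memset(buf, 0xff, 6);            /* destination: broadcast */
    memcpy(buf + 6, mac, 6);         /* source: guest MAC */
    buf[12] = 0x08; buf[13] = 0x06;  /* ethertype: ARP (0x0806) */
    buf[14] = 0x00; buf[15] = 0x01;  /* hardware type: Ethernet */
    buf[16] = 0x08; buf[17] = 0x00;  /* protocol type: IPv4 */
    buf[18] = 6;    buf[19] = 4;     /* hardware/protocol address lengths */
    buf[20] = 0x00; buf[21] = 0x01;  /* opcode: ARP request */
    memcpy(buf + 22, mac, 6);        /* sender MAC */
    memcpy(buf + 28, ip, 4);         /* sender IP */
    /* target MAC stays zero; "gratuitous" means target IP == sender IP */
    memcpy(buf + 38, ip, 4);
    return 60;
}
```

One catch with this gARP approach: unlike the guest's netfront driver, qemu-dm doesn't necessarily know the guest IP address, which is presumably why the eventual fix (comment 31) announces itself via RARP instead.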

Changing component to "xen".

We have to decide if we want to implement this new feature, or just recommend the PV drivers (see comment 16). I think it's worth a single try, but not more effort than that. Upstream seems to have moved on, so I believe it would be RHEL only.

[1] http://lists.xensource.com/archives/html/xen-users/2011-03/msg00314.html
Comment 24 Laszlo Ersek 2011-08-08 12:31:36 EDT
(In reply to comment #23)

> We could perhaps extend the RHEL-5 xen-userspace code:
> - send the gARP at the end of qemu_loadvm() [tools/ioemu/vl.c],
> - send the packet by way of qemu_send_packet(),
> - list of NICs is stored in the "nd_table" array.

(I mean Paolo figured out all of this.)
Comment 30 Laszlo Ersek 2011-08-12 12:56:54 EDT
Reproduced network outage with two RHEL5-Server-U7 hosts and a fullvirt RHEL-5.7 guest, pinged from my laptop.
Comment 31 Laszlo Ersek 2011-08-12 13:16:15 EDT
attachment 517836 [details] (see brew link in comment 29) seems to do the trick for fullvirt; I ping-pong migrated the guest a few times, and ping only lost a single reply each time. Also I was running tcpdump on one of the hosts; the following was captured at the end of each migration:

# tcpdump -e -vvv -X -l -n rarp

13:12:11.118500 00:16:3e:27:0f:ef > Broadcast, ethertype Reverse ARP (0x8035), length 60: rarp who-is 00:16:3e:27:0f:ef tell 00:16:3e:27:0f:ef
        0x0000:  0001 0800 0604 0003 0016 3e27 0fef 0000  ..........>'....
        0x0010:  0000 0016 3e27 0fef 0000 0000 0000 0000  ....>'..........
        0x0020:  0000 0000 0000 0000 0000 0000 0000       ..............
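The captured frame can be reconstructed byte for byte. A minimal sketch (the helper name is invented; offsets follow the classic RARP-over-Ethernet layout) that produces exactly the payload shown in the dump above:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch of the RARP self-announce frame seen in the tcpdump capture:
 * broadcast destination, ethertype 0x8035 (Reverse ARP), opcode 3
 * ("who-is"), and the guest MAC as both sender and target hardware
 * address, with the IP fields left zero. */
static size_t build_rarp_announce(uint8_t buf[60], const uint8_t mac[6])
{
    memset(buf, 0, 60);              /* pad to the 60-byte Ethernet minimum */
    memset(buf, 0xff, 6);            /* destination: broadcast */
    memcpy(buf + 6, mac, 6);         /* source: guest MAC */
    buf[12] = 0x80; buf[13] = 0x35;  /* ethertype: Reverse ARP */
    buf[14] = 0x00; buf[15] = 0x01;  /* hardware type: Ethernet */
    buf[16] = 0x08; buf[17] = 0x00;  /* protocol type: IPv4 */
    buf[18] = 6;    buf[19] = 4;     /* hardware/protocol address lengths */
    buf[20] = 0x00; buf[21] = 0x03;  /* opcode: RARP request ("who-is") */
    memcpy(buf + 22, mac, 6);        /* sender MAC */
    memcpy(buf + 32, mac, 6);        /* target MAC; IP fields stay zero */
    return 60;
}
```

Because the frame carries no IP address at all, it works without qemu knowing anything about the guest's network configuration; it only refreshes the switches' MAC-to-port mappings.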
Comment 32 Laszlo Ersek 2011-08-12 13:27:11 EDT
After changing the fullvirt RHEL-5.7 guest's vif to type=netfront (PV-on-HVM), the qemu-dm command line changed too: no "-net" switches at all (while with pure HVM, there's "-net nic,vlan=1,macaddr=00:16:3e:27:0f:ef,model=rtl8139 -net tap,vlan=1,bridge=xenbr0").

Ping-pong live-migrating the PV-on-HVM guest resulted in no RARPs (checked with tcpdump), but the netfront driver took care of the grat ARPs, so ping continued to work.

TODO:
- check PV
- check if upstream needs / can have this
Comment 33 Laszlo Ersek 2011-08-12 13:44:31 EDT
(In reply to comment #32)

> - check if upstream needs / can have this

They won't care:

(1) recent Xen relies on upstream qemu:
http://blog.xen.org/index.php/2011/05/13/xen-support-upstreamed-to-qemu/

(2) upstream qemu has been equipped with the RARP notification for ages; a git blame on qemu-kvm-rhel6/savevm.c:announce_self_create() returns commit 18995b98:

commit 18995b9808dc48897bda6ed93ce3e978191f7251
Author: Nolan <nolan@sigbus.net>
Date:   Thu Oct 15 16:53:55 2009 -0700

    Send a RARP packet after migration.
    
    Currently, after a migration qemu sends a broadcast packet to update
    switches' MAC->port mappings.
    
    Unfortunately, it picks a random (constant) ethertype and crosses its
    fingers that no one else is using it.
    
    This patch causes it to send a RARP packet instead.  RARP was chosen for
    2 reasons.  One, it is always harmless, and will continue to be so even
    as new ethertypes are allocated.  Two, it is what VMware ESX sends, so
    people who write filtering rules for switches already know about it.
    
    I also changed the code to send SELF_ANNOUNCE_ROUNDS packets, instead of
    SELF_ANNOUNCE_ROUNDS + 1, and added a simple backoff scheme.
    
    Signed-off-by: Nolan Leake <nolan <at> sigbus.net>
    Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Comment 34 Laszlo Ersek 2011-08-12 14:10:02 EDT
Testing with a RHEL-5.7 PV guest:
- qemu-dm doesn't have -net switches,
- ping-pong live migration + ping from the outside works,
- no RARP packets (tcpdump).
Comment 41 Miroslav Rezanina 2011-09-07 07:44:23 EDT
Fix built into xen-3.0.3-134.el5
Comment 43 Shengnan Wang 2011-12-08 03:40:17 EST
Created attachment 542430 [details]
ping results log
Comment 45 errata-xmlrpc 2012-02-21 00:55:33 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0160.html
