Bug 788349 - live migration outage time increasing as the number of vifs increases
Summary: live migration outage time increasing as the number of vifs increases
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: xen
Version: 5.8
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: rc
Target Release: ---
Assignee: Xen Maintainance List
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2012-02-08 05:46 UTC by Shengnan Wang
Modified: 2012-02-10 06:37 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-02-10 06:37:07 UTC
Target Upstream Version:



Description Shengnan Wang 2012-02-08 05:46:02 UTC
Description of problem:
Create a PV guest with several vifs. A network outage occurs during live migration, and ping shows that some packets are lost. The number of lost packets is roughly equal to the number of vifs. In testing, pinging one of the guest IPs during live migration lost 3 packets for a guest with 3 vifs, 8 packets for a guest with 8 vifs, and 31~32 packets for a guest with 31 vifs.

Version-Release number of selected component (if applicable):
host: 2.6.18-308.el5xen
guest: 2.6.18-308.el5xen
xen: xen-3.0.3-135.el5

How reproducible:
Always

Steps to Reproduce:
1. Create a PV guest with multiple vifs, for example 31.
2. Ping one of the guest IPs.
3. Live migrate the guest.
4. After live migration, check the ping result.
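
Step 4 can be scripted. The following is a minimal sketch, assuming the ping output comes from Linux iputils ping and ends with its usual "packets transmitted, ... received" summary line. The function name parse_ping_summary, the sample text, and the address 192.0.2.10 are illustrative and not taken from the test run.

```python
import re


def parse_ping_summary(output):
    """Return (transmitted, received, lost) from an iputils ping summary line."""
    match = re.search(r"(\d+) packets transmitted, (\d+) received", output)
    if match is None:
        raise ValueError("no ping summary line found")
    transmitted = int(match.group(1))
    received = int(match.group(2))
    return transmitted, received, transmitted - received


# Hypothetical sample output, made up for illustration only.
sample = """--- 192.0.2.10 ping statistics ---
60 packets transmitted, 29 received, 51% packet loss, time 59012ms
"""

transmitted, received, lost = parse_ping_summary(sample)
print(f"lost {lost} of {transmitted} packets")
```

Comparing the lost count with the number of vifs configured on the guest gives the figure quoted in this report.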

Actual results:
A network outage occurs during live migration, and the number of lost packets is roughly equal to the number of vifs.
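
The relationship can be checked against the figures given in the description. This is a rough sketch that assumes ping was run with its default interval of one second, so that each lost packet corresponds to about one second of outage. The constant and function names are illustrative.

```python
# Assumption: default ping interval of one second between packets.
PING_INTERVAL_S = 1.0

# (number of vifs, fewest lost packets, most lost packets) from the description.
REPORTED = [(3, 3, 3), (8, 8, 8), (31, 31, 32)]


def outage_seconds(lost_packets, interval_s=PING_INTERVAL_S):
    """Approximate outage time implied by a count of consecutive lost packets."""
    return lost_packets * interval_s


def lost_per_vif(vifs, lost_low, lost_high):
    """Return the lowest and highest lost-packet count per vif."""
    return lost_low / vifs, lost_high / vifs


for vifs, low, high in REPORTED:
    ratio_low, ratio_high = lost_per_vif(vifs, low, high)
    print(f"{vifs} vifs: {ratio_low:.2f}-{ratio_high:.2f} lost per vif, "
          f"about {outage_seconds(low):.0f}-{outage_seconds(high):.0f} s outage")
```

Under that assumption, every reported run works out to roughly one lost packet, and so about one second of outage, per vif.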

Expected results:
Few packets (1~2) are lost during live migration.

Additional info:
1. Tested on 2.6.18-308.el5 (5.8-20120202.0) with xen-135. The problem occurs in both i386 and x86_64 PV guests (5.8-20120202.0).
2. Tested on 2.6.18-308.el5 (5.8-20120202.0) with xen-135. The problem occurs in both 5.7 and 6.2 PV guests (only x86_64 PV guests were tested).
3. Tested on 2.6.18-308.el5 (5.8-20120202.0) with xen-134. The problem occurs.
4. It works well on 2.6.18-308.el5 (5.8-20120202.0) with xen-133.

Comment 6 Paolo Bonzini 2012-02-09 13:15:08 UTC
I do not understand if it's 3 seconds, or it's 1 second but the ping is touching all the vifs (so you have 1 packet lost per vif).

Also, it's quite possible that the behavior was already there before we introduced the RFE at bug 609589, and was fixed as a side-effect of this.  In that case, calling it a regression would not be the whole story...

Comment 7 Qixiang Wan 2012-02-09 13:22:48 UTC
(In reply to comment #6)
> I do not understand if it's 3 seconds, or it's 1 second but the ping is
> touching all the vifs (so you have 1 packet lost per vif).
> 

3 packets in about 3 seconds, or about 1 packet per second. So it is not only the number of lost packets that increased, but also the outage time.

Comment 8 Miroslav Rezanina 2012-02-10 06:37:07 UTC
(In reply to comment #6)
> I do not understand if it's 3 seconds, or it's 1 second but the ping is
> touching all the vifs (so you have 1 packet lost per vif).
> 
> Also, it's quite possible that the behavior was already there before we
> introduced the RFE at bug 609589, and was fixed as a side-effect of this.  In
> that case, calling it a regression would not be the whole story...

Yes, before the RFE, the source domain was destroyed before the target one was started, which meant an even longer wait than after this RFE and the follow-up wait for devices to be released.

As such, and because of the risk involved in changing the migration procedure, this behavior won't be changed. As stated in comment #2, this problem is unlikely to appear in practice.

