Description of problem:
Create a PV guest with several vifs. During live migration there is a network outage, and some packets are lost, as observed with ping. The number of lost packets is roughly equal to the number of vifs. In testing, pinging one of the guest IPs during live migration lost 3 packets for a guest with 3 vifs, 8 packets for a guest with 8 vifs, and 31~32 packets for a guest with 31 vifs.

Version-Release number of selected component (if applicable):
host: 2.6.18-308.el5xen
guest: 2.6.18-308.el5xen
xen: xen-3.0.3-135.el5

How reproducible:
Always

Steps to Reproduce:
1. Create a PV guest with multiple vifs, for example 31.
2. Ping one of the guest IPs.
3. Live migrate the guest.
4. After live migration, check the ping result.

Actual results:
There is a network outage during live migration, and the number of lost packets is roughly equal to the number of vifs.

Expected results:
Only a few packets (1~2) are lost during live migration.

Additional info:
1. Tested on 2.6.18-308.el5 (5.8-20120202.0) with xen-135. The problem occurs with both i386 and x86_64 PV guests (5.8-20120202.0).
2. Tested on 2.6.18-308.el5 (5.8-20120202.0) with xen-135. The problem occurs with both 5.7 and 6.2 PV guests (only x86_64 PV guests were tested).
3. Tested on 2.6.18-308.el5 (5.8-20120202.0) with xen-134. The problem occurs.
4. It works well on 2.6.18-308.el5 (5.8-20120202.0) with xen-133.
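To make step 4 ("check the ping result") less ambiguous, the saved ping output can be scanned for missing sequence numbers. The following is only a minimal sketch, not part of the original test procedure. It assumes the ping output contains `icmp_seq=N` on each reply line; the function names `find_gaps` and `summarize` are hypothetical. It reports how many packets were lost and whether they form one contiguous run (a single outage) or several separate gaps. Losses at the very start or end of the capture cannot be detected this way.

```python
import re

# Assumption: each reply line of the ping output contains "icmp_seq=N".
SEQ_RE = re.compile(r"icmp_seq=(\d+)")


def find_gaps(ping_output):
    """Return (first_missing, last_missing) pairs for each run of missing sequence numbers."""
    seqs = sorted({int(m.group(1)) for m in SEQ_RE.finditer(ping_output)})
    gaps = []
    for prev, cur in zip(seqs, seqs[1:]):
        if cur - prev > 1:
            gaps.append((prev + 1, cur - 1))
    return gaps


def summarize(ping_output):
    """Summarize total lost packets, number of gaps, and the longest gap."""
    gaps = find_gaps(ping_output)
    sizes = [end - start + 1 for start, end in gaps]
    return {
        "lost": sum(sizes),
        "gaps": len(gaps),
        "longest_gap": max(sizes, default=0),
    }


if __name__ == "__main__":
    # Illustrative sample only, not real test data from this bug.
    sample = "\n".join(
        "64 bytes from 192.0.2.10: icmp_seq=%d ttl=64 time=0.3 ms" % n
        for n in (1, 2, 6, 7)
    )
    print(summarize(sample))
```

With ping at its default interval of one packet per second, a single gap whose length matches the vif count would indicate one continuous outage of roughly that many seconds, rather than one lost packet per vif at scattered times.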
I do not understand if it's 3 seconds, or it's 1 second but the ping is touching all the vifs (so you have 1 packet lost per vif). Also, it's quite possible that the behavior was already there before we introduced the RFE at bug 609589, and was fixed as a side-effect of this. In that case, calling it a regression would not be the whole story...
(In reply to comment #6)
> I do not understand if it's 3 seconds, or it's 1 second but the ping is
> touching all the vifs (so you have 1 packet lost per vif).

3 packets in about 3 seconds, that is, about 1 packet per second. So it is not only the number of lost packets that increased, but also the outage time.
(In reply to comment #6)
> I do not understand if it's 3 seconds, or it's 1 second but the ping is
> touching all the vifs (so you have 1 packet lost per vif).
>
> Also, it's quite possible that the behavior was already there before we
> introduced the RFE at bug 609589, and was fixed as a side-effect of this. In
> that case, calling it a regression would not be the whole story...

Yes. Before the RFE, the source domain was destroyed before the target one was started, which meant an even longer wait than after this RFE and the follow-up wait for devices to be released. For that reason, and because of the risk involved in changing the migration procedure, this behavior won't be changed. As stated in #2, this problem is unlikely to show up in practice.