Bug 555910 - xen migration fails when a full virt guest uses the xen-vnif driver
Summary: xen migration fails when a full virt guest uses the xen-vnif driver
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen
Version: 5.4
Hardware: All
OS: Linux
low
medium
Target Milestone: rc
: ---
Assignee: Miroslav Rezanina
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Duplicates: 560577 (view as bug list)
Depends On:
Blocks: 514489 5.5_Known-Issues 573926 593725
 
Reported: 2010-01-15 21:44 UTC by Bill Braswell
Modified: 2018-11-14 17:51 UTC (History)
17 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Migrating a guest that is using the xen-vnif drivers as a fully virtualized guest under Xen will produce a deadlock in the XenBus. This bug, however, does not present if the IOEMU driver is used or if the system has no active network interface. (BZ#555910)
Clone Of:
: 573926 (view as bug list)
Environment:
Last Closed: 2011-01-13 21:00:40 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
guest config used when reproducing (620 bytes, application/octet-stream)
2010-03-05 16:49 UTC, Andrew Jones
no flags Details


Links
System: Red Hat Product Errata
ID: RHSA-2011:0017
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 5.6 kernel security and bug fix update
Last Updated: 2011-01-13 10:37:42 UTC

Description Bill Braswell 2010-01-15 21:44:43 UTC
Description of problem:
When a guest uses the xen-vnif drivers as a full-virt guest under Xen, it will completely freeze when you migrate it. The guest will migrate, but when it loads on the new dom0 it will be frozen and will not respond on either the network or the console/virt-manager. The new dom0 shows the guest using 100% of the allocated CPU at this time.

Version-Release number of selected component (if applicable):
rhel5u4 x86_64 dom0s
rhel4u8 and rhel5u4 x86_64 guests

Comment 1 Michal Novotny 2010-01-18 11:28:22 UTC
(In reply to comment #0)
> Description of problem:
> When a guest is using the xen-vnif drivers as a full-virt guest under xen.  The
> guest will completely freeze when you migrate the guest.  The guest will
> migrate but when it loads on the new dom0 it will be frozen and will not
> respond on ether the network or on the console/virt-manager.  The new dom0
> shows the guest using 100% of the allocated cpu at this time.
> 
> Version-Release number of selected component (if applicable):
> rhel5u4 x86_64 dom0s
> rhel4u8 and rhel5u4 x86_64 guests    

Well, I tried installing a RHEL 5.4 guest and enabling the serial console. I was able to connect to the console after migration, but the network was not available. The xen-vnif device wrote:

netfront: Initialising virtual ethernet driver.
netfront: device eth1 has copying receive path.
netfront: device eth1 has copying receive path.

to the dmesg log in the guest. However, when I tested with the PV drivers and without the console enabled, it was not working at all: the guest was stuck in virt-manager/virt-viewer and no console was available (since no console=ttyS0 setting was present in /boot/grub/grub.conf). When I enabled the console for the guest, I was able to see the graphics window in virt-manager and enter a login/password, but after logging in I was unable to see anything in the graphics window... When trying to connect to the console, everything was working fine except the network, which was dead...

Michal

Comment 5 Miroslav Rezanina 2010-02-03 12:50:16 UTC
*** Bug 560577 has been marked as a duplicate of this bug. ***

Comment 18 Andrew Jones 2010-03-05 16:46:35 UTC
I can reproduce this on my own machine.  Here's what I did and I'll attach my config file.

* I installed a basic 5.4 x86_64 full virt guest.
* I booted it up and then installed the -187 kernel.
* I modified the config file to use vnif.
* booted
* xm save
* xm restore
* saw the guest was hung and xm top showed cpu% at 100.

It didn't reproduce on my first try with 1 vcpu and 512 MB RAM. I changed the config to 2 vcpus and 1024 MB RAM and it reproduced. However, mrezanin tells me it reproduces with 1 vcpu and 512 MB RAM as well; it just doesn't reproduce every time. I'll do some more experiments and extract a core dump to look at on Monday.

Comment 19 Andrew Jones 2010-03-05 16:49:37 UTC
Created attachment 398078 [details]
guest config used when reproducing

Comment 20 Michal Novotny 2010-03-05 16:56:08 UTC
(In reply to comment #18)
> I can reproduce this on my own machine.  Here's what I did and I'll attach my
> config file.
> 
> * I installed a basic 5.4 x86_64 full virt guest.
> * I booted it up and then installed the -187 kernel.
> * I modified the config file to use vnif.
> * booted
> * xm save
> * xm restore
> * saw the guest was hung and xm top showed cpu% at 100.
> 
> It didn't reproduce my first try with 1 vcpu and 512Mb ram. I changed the
> config to 2 vcpus and 1024Mb ram and it reproduced. However, mrezanin tells me
> it reproduces with 1 vcpu and 512Mb ram as well, it just doesn't reproduce
> everytime.  I'll do some more experiments and extract a core dump to look at on
> Monday.    

That's right according to my testing. In fact it's not 100% reproducible, but only 98% or so. The problem is that even with the same kernel-xen, guest kernel, and xen, it can work on the first attempt but not on the second one.

Michal

Comment 21 Andrew Jones 2010-03-08 18:19:57 UTC
My current theory is that there is a deadlock involving the xen_suspend kernel thread and the xenbus_thread kernel thread. The stack from the core dump is below, but I also tried some other experiments, such as recreating the problem with a 4.8 non-SMP guest.

The stack was similar with the 4.8 non-SMP kernel: we get hung in read_reply() on suspend, and when we resume we're still hung. I verified that it was the suspend path by repeatedly restoring the same save file that had worked (for me, save/restore works about 75% of the time). The good save file always restored fine.

There was a commit (c5cae66) to upstream Linux a few months ago related to a hang on suspend in read_reply(). Unfortunately we don't share much of the affected code, so it only serves as a reference.

PID: 2008   TASK: ffff81001e2c4820  CPU: 0   COMMAND: "suspend"
              START: thread_return (schedule) at ffffffff80063f96
  [ffff810013807ca8] thread_return at ffffffff80063ff8
  [ffff810013807d78] read_reply at ffffffff8815e1fa
  [ffff810013807d88] __bitmap_weight at ffffffff80015e83
  [ffff810013807d90] autoremove_wake_function at ffffffff800a19e6
  [ffff810013807dc0] ap_suspend at ffffffff8815d83c
  [ffff810013807dc8] __smp_call_function_many at ffffffff800775eb
  [ffff810013807e68] _spin_unlock_irqrestore at ffffffff80065b50
  [ffff810013807e98] __xen_suspend at ffffffff8815d96a
  [ffff810013807ea8] keventd_create_kthread at ffffffff800a17ce
  [ffff810013807ed0] xen_suspend at ffffffff8815d5f6
  [ffff810013807ed8] xen_suspend at ffffffff8815d605
  [ffff810013807ee0] xen_suspend at ffffffff8815d5f6
  [ffff810013807ee8] kthread at ffffffff80032bdc
  [ffff810013807f48] child_rip at ffffffff8005efb1
  [ffff810013807f58] keventd_create_kthread at ffffffff800a17ce
  [ffff810013807fc8] kthread at ffffffff80032ade
  [ffff810013807fd8] child_rip at ffffffff8005efa7
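
For reference, the wait the backtrace shows the suspend thread stuck in follows the usual xenstore request/reply pattern: read_reply() sleeps until the xenbus reader thread queues a reply. A rough sketch of that pattern, modelled on the upstream xenbus_xs.c rather than our exact RHEL-5 source (structure and field names here are the upstream ones and are illustrative only):

/* Simplified from upstream drivers/xen/xenbus/xenbus_xs.c; illustrative only. */
static void *read_reply(enum xsd_sockmsg_type *type, unsigned int *len)
{
	struct xs_stored_msg *msg;
	char *body;

	spin_lock(&xs_state.reply_lock);
	while (list_empty(&xs_state.reply_list)) {
		spin_unlock(&xs_state.reply_lock);
		/* Sleep until xenbus_thread() queues the reply.  If the reply
		 * can no longer be delivered once the suspend has started,
		 * this wait never finishes -- the hang seen in the stack above. */
		wait_event(xs_state.reply_waitq,
			   !list_empty(&xs_state.reply_list));
		spin_lock(&xs_state.reply_lock);
	}
	msg = list_entry(xs_state.reply_list.next, struct xs_stored_msg, list);
	list_del(&msg->list);
	spin_unlock(&xs_state.reply_lock);

	*type = msg->hdr.type;
	if (len)
		*len = msg->hdr.len;
	body = msg->u.reply.body;
	kfree(msg);
	return body;
}

If this theory holds, any xenstore request that is still in flight when the suspend kicks off can leave the suspend thread parked in this wait.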

Comment 22 Michal Novotny 2010-03-09 09:20:47 UTC
(In reply to comment #21)
> My current theory is that there is a deadlock involving the xen_suspend kernel
> thread and the xenbus_thread kernel thread. The stack from the core dump is
> below, but I also tried some other experiments, such as recreating with a 4.8
> non-smp guest.
> 
> The stack was similar with the 4.8 non-smp kernel, namely we get hung in
> read_reply() on suspend, then when we resume we're still hung. I tested that it
> was the suspend path by repeatedly restoring the same save-file that worked
> (for me save-restore works about 75%). The good save-file always restored fine.
> 
> There was a commit (c5cae66) to upstream linux related to a hang on suspend and
> read_reply() a few months ago. Unfortunately we don't share much of the
> affected code, so it only serves as a reference.
> 
> PID: 2008   TASK: ffff81001e2c4820  CPU: 0   COMMAND: "suspend"
>               START: thread_return (schedule) at ffffffff80063f96
>   [ffff810013807ca8] thread_return at ffffffff80063ff8
>   [ffff810013807d78] read_reply at ffffffff8815e1fa
>   [ffff810013807d88] __bitmap_weight at ffffffff80015e83
>   [ffff810013807d90] autoremove_wake_function at ffffffff800a19e6
>   [ffff810013807dc0] ap_suspend at ffffffff8815d83c
>   [ffff810013807dc8] __smp_call_function_many at ffffffff800775eb
>   [ffff810013807e68] _spin_unlock_irqrestore at ffffffff80065b50
>   [ffff810013807e98] __xen_suspend at ffffffff8815d96a
>   [ffff810013807ea8] keventd_create_kthread at ffffffff800a17ce
>   [ffff810013807ed0] xen_suspend at ffffffff8815d5f6
>   [ffff810013807ed8] xen_suspend at ffffffff8815d605
>   [ffff810013807ee0] xen_suspend at ffffffff8815d5f6
>   [ffff810013807ee8] kthread at ffffffff80032bdc
>   [ffff810013807f48] child_rip at ffffffff8005efb1
>   [ffff810013807f58] keventd_create_kthread at ffffffff800a17ce
>   [ffff810013807fc8] kthread at ffffffff80032ade
>   [ffff810013807fd8] child_rip at ffffffff8005efa7    

Well, looking at the comment you posted, this may be right. Although our codebase may be quite different, isn't that a good place to start, since we know that some read_reply() call is probably hanging, which could result in this problem? Also, why does it happen only sometimes? Any ideas?

Michal

Comment 23 Andrew Jones 2010-03-09 10:04:48 UTC
(In reply to comment #22)
> our codebase may be really different isn't that a good point to start since we
> know that some read_reply() call should be hang which could result into this
> problem? Also, why does it happen only sometimes? Any ideas?

Please don't quote entire BZ comments; it clutters the bug. Yes, I think the read_reply() suspend-hang similarity with upstream is a good place to start, which is why I made that comment. If this theory is correct, then it only happens sometimes because xenbus needs to be doing something at the same time that the suspend is started.

Comment 24 Michal Novotny 2010-03-09 10:32:03 UTC
(In reply to comment #23)
> (In reply to comment #22)
> > our codebase may be really different isn't that a good point to start since we
> > know that some read_reply() call should be hang which could result into this
> > problem? Also, why does it happen only sometimes? Any ideas?
> 
> Please don't quote entire BZ comments, it clutters the bug. Yes, I think the
> read_reply() suspend hang similarities with upstream is a good place to start,
> which is why I made that comment. If this theory is correct, then it only
> happens sometimes because xenbus needs to be doing something at the same time
> that the suspend is started.    

Ok, good. So we have a reasonable explanation for why it happens only sometimes, and a good place to start investigating the fix. That's perfect.

Just out of curiosity: is it connected to the netfront driver? By xenbus, do you mean the xenbus code for the netfront driver?

Thanks,
Michal

Comment 25 Andrew Jones 2010-03-09 14:43:27 UTC
(In reply to comment #24)
> Just out of curiosity: Is it connected to netfront driver? By xenbus you mean
> xenbus code for netfront driver?

I'm not sure if it's strictly connected to netfront. All xen split drivers use xenbus. However, I didn't get it to reproduce with blkfront when I tried a few times, so either it's netfront related, or netfront just does a better job at triggering it.

Comment 26 Michal Novotny 2010-03-09 14:49:21 UTC
(In reply to comment #25)
> (In reply to comment #24)
> > Just out of curiosity: Is it connected to netfront driver? By xenbus you mean
> > xenbus code for netfront driver?
> 
> I'm not sure if it's strictly connected to netfront. All xen split drivers use
> xenbus. However, I didn't get it to reproduce with blkfront when I tried a few
> times, so either it's netfront related, or netfront just does a better job at
> triggering it.    

Ok, maybe it's somewhere in the netfront code if you can't reproduce it with blkfront. Or is it reproducible with blkfront in at least one case out of, say, 100 tries, or is it not reproducible at all?

Michal

Comment 27 Miroslav Rezanina 2010-03-10 13:48:01 UTC
(In reply to comment #21)
> There was a commit (c5cae66) to upstream linux related to a hang on suspend and
> read_reply() a few months ago. Unfortunately we don't share much of the
> affected code, so it only serves as a reference.

Checking the mentioned commit, I found that upstream calls the xenbus suspend only after freeze_processes(), whereas our code calls xenbus_suspend() before preempt_disable(). After moving the call to after preempt_disable(), I am no longer able to reproduce the problem.

I'm going to do more migration-related testing to prove that this fix solves the problem.

Patch is:
-------
diff --git a/drivers/xen/core/reboot.c b/drivers/xen/core/reboot.c
index 929c590..0297982 100644
--- a/drivers/xen/core/reboot.c
+++ b/drivers/xen/core/reboot.c
@@ -183,10 +183,11 @@ static int __do_suspend(void *ignore)
        if (err)
                return err;
 
-       xenbus_suspend();
 
        preempt_disable();
 
+       xenbus_suspend();
+
        mm_pin_all();
        local_irq_disable();
        preempt_enable();
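
For clarity, this is how the relevant part of __do_suspend() reads with the patch applied, reconstructed purely from the diff context above (unrelated lines elided):

        /* __do_suspend() after the patch, reconstructed from the diff above. */
        preempt_disable();

        xenbus_suspend();       /* now runs with preemption disabled, analogous
                                   to upstream suspending xenbus only after
                                   freeze_processes(), as noted above */

        mm_pin_all();
        local_irq_disable();
        preempt_enable();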

Comment 32 Miroslav Rezanina 2010-03-16 09:37:15 UTC
Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
A wrong ordering in the suspend procedure causes a deadlock in xenbus if the xen-vnif driver is used. The deadlock is carried over the suspend and the guest is blocked after resuming.

This does not occur with the ioemu driver or without an active network interface, since there is no xenbus usage during suspend.

Comment 33 Miroslav Rezanina 2010-03-18 06:10:20 UTC
I tested save/restore with the new patch: 300 save/restore cycles without a freeze.

Comment 35 Ryan Lerch 2010-03-21 22:39:42 UTC
Technical note updated. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,3 +1 @@
-A wrong ordering in the suspend procedure causes a deadlock in xenbus if the xen-vnif driver is used. The deadlock is carried over the suspend and the guest is blocked after resuming.
+Migrating a guest that is using the xen-vnif drivers as a fully virtualized guest under Xen will produce a deadlock in the XenBus. This bug, however, does not present if the IOEMU driver is used or if the system has no active network interface. (BZ#555910)-
-This does not occur with the ioemu driver or without an active network interface, since there is no xenbus usage during suspend.

Comment 41 Miroslav Rezanina 2010-04-07 07:09:58 UTC
Additional testing shows that the patch fixes the problem only for guests with vcpus > 1. With vcpus = 1, the problem still occurs.

Comment 42 Miroslav Rezanina 2010-04-07 07:30:13 UTC
Additional testing shows that the patch fixes the problem only for guests with vcpus > 1. With vcpus = 1, the problem still occurs.

Comment 50 Jarod Wilson 2010-08-11 00:11:52 UTC
in kernel-2.6.18-211.el5
You can download this test kernel from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.

Comment 51 Miroslav Rezanina 2010-09-07 10:20:33 UTC
Hi Jarod,
we have a complete fix for this problem in bz #629773. Will you keep this patch even though it is no longer needed? Or can this be handled as a dup of #629773?

Comment 52 Paolo Bonzini 2010-09-07 12:00:37 UTC
The question is, does this bug still exist (and is it merely made latent by 629773)? Does upstream have this patch?

This patch still fixes the issue whenever the hypervisor cannot be upgraded, so I think it should go in.

Comment 53 Andrew Jones 2010-09-07 12:15:31 UTC
Paolo, I'm not sure we understand how this kernel patch helps (maybe it just makes the bug latent), or whether there could be unwanted side effects from it. A good analysis of how it can't hurt us, plus some testing results on an HV with the new patch, could convince me to keep it, because, as you say, it would be nice to improve things for guests not running on 5.6 HVs.

Comment 54 Jarod Wilson 2010-09-07 21:29:26 UTC
(In reply to comment #51)
> Hi Jarod,
> we have complete fix for this problem in bz #629773. Will you keep this patch
> as it is no more needed? Or can be this handled as dup of #629773?

I'll do whatever you guys tell me is best. :)

If the conclusion is that this does need to be reverted, then please send a revert patch to the mailing list for tracking purposes.

Comment 57 Binbin Yu 2010-12-22 06:17:14 UTC
Test steps:
1. Create a rhel5u5 HVM guest, guest1, with 1 vcpu, 1G memory, and vif = [ "mac=00:16:36:6d:95:b1,bridge=xenbr0,script=vif-bridge,type=netfront", "mac=00:16:36:6d:95:b2,bridge=xenbr0,script=vif-bridge,type=netfront" ]
2. Prepare xend and NFS for migration
3. xm migrate guest1 $dst-host-ip
4. Check guest1 via virt-manager, vncviewer, or the console
5. Run "ping www.redhat.com" inside guest1
6. Run "ping $guest1-eth1" on the source host and the destination host

host:
rhel5.5, x86_64
xen-3.0.3-120.el5
kernel-xen-2.6.18-231.el5

Reproduce: this bug was reported against rhel5u4 hosts with rhel4u8 or rhel5u4 guests, but I am verifying it on a rhel5u5 host with a rhel5u5 guest, as this is a rhel5.6 errata bug. I could not reproduce this bug on the rhel5.5 64-bit host, but the rhel4u8-64-hvm guest reboots every time it loads on the new dom0 during migration.
rhel4u8-64-hvm guest: kernel-2.6.9-89.EL

For example, migrating it back to the src host:
[src-host xen]# xm li
Name                                      ID Mem(MiB) VCPUs State   Time(s)
Domain-0                                   0     6892     4 r-----  26649.8
rhel6-32-hvm-2                             9     1031     1 -b----     25.6
[src-host xen]# xm migrate 9 10.66.65.81
[src-host xen]# xm li
Name                                      ID Mem(MiB) VCPUs State   Time(s)
Domain-0                                   0     6892     4 r-----  26659.5
[src-host xen]# xm li
Name                                      ID Mem(MiB) VCPUs State   Time(s)
Domain-0                                   0     6892     4 r-----  26692.7
rhel6-32-hvm-2                            11     1031     1 -b----     14.7



Verified with:
guest: rhel5.5 hvm x86_64 (kernel-2.6.18-194.el5)
At step 4: logged in to guest1 successfully via virt-manager, vncviewer, and the console
At step 5: ping always gets a reply correctly inside guest1
At step 6: ping always gets a reply correctly on both hosts


According to the results above, setting the bug status to VERIFIED.

Comment 58 Binbin Yu 2010-12-24 08:18:02 UTC
Also verified with kernel-xen-2.6.18-238.el5

Comment 60 errata-xmlrpc 2011-01-13 21:00:40 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html

