Bug 723854

Summary: windows i386 guest BSOD/hang after multiple migration on x86_64 host with AMD CPU
Product: Red Hat Enterprise Linux 5
Reporter: Pengzhen Cao <pcao>
Component: kernel-xen
Assignee: Xen Maintainance List <xen-maint>
Status: CLOSED DUPLICATE
QA Contact: Virtualization Bugs <virt-bugs>
Severity: high
Docs Contact:
Priority: high
Version: 5.7
CC: drjones, leiwang, lersek, pbonzini, qwan, xen-maint
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-08-30 15:09:18 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On:    
Bug Blocks: 514489    
Attachments:
win2003 32bit BSOD after migration screenshot (Flags: none)
xm-dmesg (Flags: none)
xend.log (Flags: none)
qemu-dm.log (Flags: none)

Description Pengzhen Cao 2011-07-21 12:02:56 UTC
Description of problem:
A Windows i386 HVM guest will BSOD (win2003 32bit) or hang (winXP/win2003 32bit) after several local or live migrations.

Version-Release number of selected component (if applicable):
kernel-xen-2.6.18-274.el5, xen-3.0.3-132.el5

How reproducible:
30%

Steps to Reproduce:
1. Install kernel-xen/xen on one or two hosts with AMD CPUs
2. Create a Windows i386 32bit guest (e.g. win2003 32bit, winxp 32bit)
3. Perform a local or live migration several times
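
The migration loop in step 3 could be scripted roughly as follows. This is only a sketch, not the procedure used for this report: it assumes the `xm migrate` command with its `--live` option, a hypothetical guest name `win2003-32`, and `localhost` as the target for a local migration. The round count of 3 reflects the "about 3" migrations mentioned under Actual results.

```python
import subprocess


def build_migrate_commands(domain, target_host, rounds, live=True):
    """Return one `xm migrate` argument list per migration round.

    `domain` and `target_host` are placeholders chosen by the tester;
    nothing in the bug report fixes their values.
    """
    cmd = ["xm", "migrate"]
    if live:
        cmd.append("--live")
    cmd += [domain, target_host]
    return [list(cmd) for _ in range(rounds)]


def run_migrations(commands, runner=subprocess.check_call):
    """Run each command in order and return how many completed.

    `runner` raises on a non-zero exit status by default, so the loop
    stops at the first failed migration.
    """
    completed = 0
    for argv in commands:
        runner(argv)
        completed += 1
    return completed


if __name__ == "__main__":
    # Hypothetical guest name; local migration targets the same host.
    commands = build_migrate_commands("win2003-32", "localhost", rounds=3)
    print("completed migrations:", run_migrations(commands))
```

The guest still has to be checked by hand (or over VNC) after each round, since a hung or blue-screened guest does not make `xm migrate` itself fail.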
  
Actual results:
The Windows guest will BSOD (win2k3) or hang (win2k3, winxp) after about 3 migrations.

Expected results:
Guest running fine

Additional info:
1. This issue only happens with AMD CPUs; the hosts we tried have Opteron 1220 or Athlon 5200 CPUs. It works fine on Intel hosts.
2. With kernels 272 and 273, the guest hangs but there is no BSOD (both winxp and win2003). Only with the 274 kernel does the win2k3 32bit guest BSOD.
3. With the older kernels 268 and 238, the guest is much less likely to hang (10% or so), and there is also no BSOD.

Comment 1 Pengzhen Cao 2011-07-21 12:09:16 UTC
Created attachment 514193 [details]
win2003 32bit BSOD after migration screenshot

Comment 2 Pengzhen Cao 2011-07-21 12:10:16 UTC
Created attachment 514194 [details]
xm-dmesg

Comment 3 Pengzhen Cao 2011-07-21 12:10:39 UTC
Created attachment 514195 [details]
xend.log

Comment 4 Pengzhen Cao 2011-07-21 12:11:01 UTC
Created attachment 514196 [details]
qemu-dm.log

Comment 5 Andrew Jones 2011-07-21 12:26:28 UTC
This is possibly a dup of bug 605617. We should test the patch posted for that bug on these systems.

Comment 6 Pengzhen Cao 2011-07-22 02:10:17 UTC
(In reply to comment #5)
> This is possibly a dup of bug 605617. We should test the patch posted for that
> bug on these systems.

Maybe, but it is not exactly the same error. The chance of failure seems higher after the 272 kernel fix, and a BSOD is more likely on the 274 kernel than on 272/273 or any earlier one. This is why I filed this bug.
I can test if you build a kernel with that fix.

Comment 7 RHEL Program Management 2011-08-04 04:20:28 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 11 Paolo Bonzini 2011-08-30 14:23:11 UTC
Stop 0xB8 was also observed by Andrew (bug 605617 comment 17), so closing as duplicate.

Comment 12 Paolo Bonzini 2011-08-30 15:09:18 UTC

*** This bug has been marked as a duplicate of bug 605617 ***