Bug 613187

Summary: xen Windows 2008 guest crashes on RHEL 5.4
Product: Red Hat Enterprise Linux 5 Reporter: Bill Braswell <bbraswel>
Component: kernel-xen Assignee: Michal Novotny <minovotn>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium Docs Contact:
Priority: low    
Version: 5.4 CC: areis, drjones, herrold, james.brown, kzhang, leiwang, minovotn, mshao, pbonzini, tao, xen-maint
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-01-13 21:42:27 UTC Type: ---
Attachments:
Description: Test fix for xen HV (Flags: none)

Description Bill Braswell 2010-07-09 21:49:03 UTC
The customer is running Windows 2008 guests on a RHEL 5.4 xen system that uses Intel Xeon processors.  When running a cygwin bash shell on a Windows guest, the guest crashes with the following:

Exception: STATUS_ACCESS_VIOLATION at eip=7710FBA2
eax=00000001 ebx=00000000 ecx=00000000 edx=00000000 esi=0028C850 edi=004C5490
ebp=0028C7EC esp=0028C798 program=C:\cygwin\bin\bash.exe, pid 2456, thread main
cs=0023 ds=002B es=002B fs=0053 gs=002B ss=002B
Stack trace:
Frame     Function  Args
0028C7EC  7710FBA2  (0028C850, 0028C8C8, 00010000, 00000000)
0028C854  74CC7ED4  (004C50E8, 40000001, 00000000, 004C5490)
0028C888  76A31AE9  (0028C8C8, 40000001, 00000000, 00000001)
0028C9D8  61095772  (611588E0, 0028CA18, 0028CA14, 00010000)
0028CA28  61095BFD  (0028CA64, 00010000, 00010000, 00000000)
0028CA78  6109689A  (0028CAF0, 0028CB00, 0028CAFC, 00000000)
0028CB18  610B5178  (00DDECA0, 00000000, FFFFFFFF, FFFFFFFF)
0028CB48  00411DB3  (00DDECA0, 00000000, 004784B4, 0042F953)
0028CB78  00403868  (00000001, 00DD84C8, 004784B4, 72FB1748)
0028CD38  00403377  (0028D008, 00000002, 611A0CCE, 61006DDA)
0028CD78  61006DDA  (00000000, 0028CDB0, 610066E0, 7EFDE000) 

This is discussed in http://www.mail-archive.com/kvm@vger.kernel.org/msg09173.html, http://sourceware.org/ml/cygwin/2008-01/msg00582.html, http://old.nabble.com/Xen-3.2.1---Win-2003-2008-Server-64-bit-guests:-cygwin-bash-builtin-%22test%22-crashes-td19001336.html and http://lists.xensource.com/archives/html/xen-devel/2009-11/msg00164.html

There is a problem in the processing of the “mov %gs...” instruction.  On AMD processors the gs register is somehow being restored, but on Intel processors it is not.  Even the technical commenters in those threads do not seem to know what restores the register on AMD.

It appears that it still crashes in 3.4.0, but something in 3.4.1 did fix the problem.  However, it looks like it is one of those, “something fixed it but we're not sure what” issues.

However, I have been told we have no plans to move to xen 3.4.1.

Steps to reproduce (from the customer):
1. Build a 64-bit Windows 2008 Server guest on a xen VM.
2. Install cygwin with the bash shell.
3. Run the following command in a bash shell:
   test -a test

Comment 1 Michal Novotny 2010-07-12 09:53:36 UTC
I'm investigating this now and found something relevant in the KVM kernel code at [1]. Unfortunately, the vmx_get_msr() function appears to be KVM-only and is not available in Xen. Judging by the path in the patch file (kernel/x86/vmx.c), this is most probably handled in the KVM kernel module itself, so the equivalent problem may be a kernel-xen bug, although I'm not 100% sure. It could still be a problem in VMX Assist in the Xen user space, which is used on Intel CPUs, but I think the bug is most probably in the hypervisor, which belongs to the kernel-xen component.

Bill, you wrote that it still crashes with 3.4.0 but has been fixed in 3.4.1. Was the same xen kernel version used in both cases? Which version was used with xen-3.4.0 and which with xen-3.4.1? This information would help a lot in determining the component.

Thanks,
Michal

[1] http://patchwork.kernel.org/patch/7092/

Comment 2 Michal Novotny 2010-07-12 11:07:59 UTC
Created attachment 431139 [details]
Test fix for xen HV

(In reply to comment #1)

Well, I found some relevant information about this one. This has been fixed in xen-unstable.hg c/s 19953 (vmx: Fix handling of FS/GS base MSRs), available at [1]. Since the change is in the xen/arch/x86/hvm/vmx/vmx.c file, it is hypervisor-related, so the component is kernel-xen.

Also, I tried the code described in [2] to make it fail on a 64-bit Linux guest, but it didn't crash:

#include <setjmp.h>

/* setjmp() returns 0 first, then 1 after longjmp() jumps back to it. */
jmp_buf env;
int main(void) { if (setjmp(env)) return 0; longjmp(env, 1); }

This didn't crash the application when compiled with gcc. I also tried with Windows 2003 x64, but it didn't crash there either. I'm currently downloading and installing Windows 2008 to test the patch.

Michal

[1] http://xenbits.xensource.com/xen-unstable.hg?rev/fe4c6845a9d7
[2] http://lists.xensource.com/archives/html/xen-devel/2009-11/msg00164.html

Comment 4 Michal Novotny 2010-07-12 13:10:56 UTC
Bill,
I tried installing Windows 2008 x64 edition with the following package versions and was unable to reproduce the problem:

kernel-xen-2.6.18-194.3.1.el5
xen-3.0.3-113.el5(virttest30.g9810091)

Given comment #2, I guess this has already been fixed in the kernel-xen component, since I was unable to reproduce it.

Just a note: Windows 2008 is *not* Windows 2008R2 - Windows 2008R2 is the successor to Windows 2008 (R1). Could you please guide customers to reproduce using the package versions listed above?

You can try with the following packages:
 kernel-xen - http://people.redhat.com/jwilson/el5/194.el5/
 xen - http://people.redhat.com/mrezanin/xen/

Could you please guide customers to those links (they are publicly available, so there should be no problem accessing them), have them download the appropriate versions for their architecture, and ask them to try again?

Thanks,
Michal

Comment 5 Michal Novotny 2010-07-12 13:44:02 UTC
Oh, sorry, my bad. The problem was with the Windows permissions on the C:\cygwin folder (even for the Administrator user), which is why I was unable to reproduce it. After moving to a read-write location (e.g. C:\Documents and Settings\Administrator\Data), I was able to reproduce it.

Also, I tried it with my patch applied and it worked fine.

Test command: "test -e / ; echo hi"
Before my patch applied: bash shell just exited
After my patch applied: bash shell echoed "hi" and continued
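
The before/after check above can be scripted so that the result does not depend on watching the terminal. This is only a minimal sketch, assuming it is sourced from a working bash inside the guest; the function name check_test_builtin is hypothetical, and the test command is the one quoted above.

```shell
# Hypothetical helper: run the reproducer in a child bash and report
# whether the child got past the "test" builtin and echoed "hi".
check_test_builtin() {
    # The child shell runs exactly the test command quoted above.
    out=$(bash -c 'test -e / ; echo hi' 2>/dev/null)
    # "case" is used instead of "[" or "test" so that the parent shell
    # does not itself run the builtin under investigation.
    case $out in
        hi) echo "OK: shell echoed hi and continued" ;;
        *)  echo "FAIL: shell exited before echo"; return 1 ;;
    esac
}
```

Calling check_test_builtin prints one line and returns non-zero when the child shell exits before reaching the echo, which matches the "before my patch" behaviour described above.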

So according to my testing, this appears to be a plausible fix.

Michal

Comment 6 Michal Novotny 2010-07-12 17:20:17 UTC
Bill, could you please point customers to the build at [1], which I've just created and put on my people page? It's working fine for me, so I'd like customers to test this kernel/hypervisor version. According to the IT they're using the x86_64 architecture, so there's an RPM for their architecture, and kernel-xen alone should be enough.

Could you please send me the results of their testing?

Thanks,
Michal

[1] http://people.redhat.com/minovotn/kernel-xen/

Comment 7 Michal Novotny 2010-07-20 13:04:56 UTC
Bill,
any updates on this?

Thanks,
Michal

Comment 13 RHEL Program Management 2010-08-27 18:29:51 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 16 Jarod Wilson 2010-09-10 21:40:21 UTC
in kernel-2.6.18-219.el5
You can download this test kernel from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.
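
To check whether a host is already running a build at or after the one named above, the release number in the kernel version string can be compared with 219. This is a minimal sketch, assuming the version string follows the 2.6.18-<release>.el5 pattern seen in this report; the function name has_gs_fix is hypothetical, and the running hypervisor only matches the installed kernel-xen package after a reboot.

```shell
# Hypothetical helper: succeed (0) if the given kernel version string is
# 2.6.18-219.el5 or a later 2.6.18-based build, fail (1) if it is older,
# and return 2 if the string does not match the expected pattern.
has_gs_fix() {
    ver=$1
    case $ver in
        2.6.18-*) ;;
        *) return 2 ;;
    esac
    rel=${ver#2.6.18-}      # e.g. "194.3.1.el5xen"
    rel=${rel%%.*}          # leading release number, e.g. "194"
    case $rel in
        ''|*[!0-9]*) return 2 ;;
    esac
    [ "$rel" -ge 219 ]
}

# Example: check the kernel version reported by the running system.
if has_gs_fix "$(uname -r)"; then
    echo "kernel is 2.6.18-219.el5 or later"
else
    echo "kernel is older than 2.6.18-219.el5 or not recognised"
fi
```

The comparison only looks at the first numeric component after "2.6.18-", so a z-stream build such as 2.6.18-194.3.1.el5 is treated as release 194.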

Comment 20 Lei Wang 2010-12-23 10:05:35 UTC
Test with:
host: x86_64 (Intel(R) Xeon(R) CPU           W3520  @ 2.67GHz)
xen-3.0.3-120.el5
guest: Win2008-64

Test steps:
1. Install cygwin with shells on the Win2008-64 guest.
2. Run the following command in the bash shell:
   test -e / ; echo hi

Reproduced the bug with kernel-xen-2.6.18-215.el5:
no guest crash, but the bash shell just exited at step 2,
the same as described in comment 5.

Verified the fix with kernel-xen-2.6.18-238.el5:
the bash shell echoed "hi" and continued at step 2.

According to the test results above, move to VERIFIED.

Comment 22 errata-xmlrpc 2011-01-13 21:42:27 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html