Bug 1879149

Summary: Fail to boot Win10 32-bit guest (BSOD) on EPYC host
Product: Red Hat Enterprise Linux 7
Reporter: FuXiangChun <xfu>
Component: qemu-kvm-rhev
Assignee: Marek Kedzierski <mkedzier>
Status: CLOSED CURRENTRELEASE
QA Contact: liunana <nanliu>
Severity: medium
Priority: medium
Version: 7.5
CC: ailan, chayang, dgilbert, jinzhao, juzhang, lijin, mkedzier, qzhang, virt-maint, yfu, yuhuang
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Unspecified   
Last Closed: 2020-09-25 08:15:05 UTC
Attachments:
Description                            Flags
screenshot                             none
analysis of Windows kernel crashdump   none

Description FuXiangChun 2020-09-15 14:28:22 UTC
Description of problem:
A Win10 32-bit guest fails to boot on an EPYC host.

Version-Release number of selected component (if applicable):
qemu-kvm: qemu-kvm-rhev-2.10.0-21.el7_5.10.x86_64
kernel: kernel-3.10.0-862.14.4.el7.x86_64
spice: spice-server-0.14.0-2.el7_5.5.x86_64
seabios: seabios-bin-1.11.0-2.el7.noarch
seavgabios: seavgabios-bin-1.11.0-2.el7.noarch
sgabios: sgabios-bin-0.20110622svn-4.el7.noarch
ipxe: ipxe-roms-qemu-20170123-1.git4e85b27.el7_4.1.noarch
virtio-win: virtio-win-1.9.12-4.el7.iso

How reproducible:
100%

Steps to Reproduce:
1. /usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1'  \
-sandbox off  \
-machine pc  \
-nodefaults \
-device qxl-vga,bus=pci.0,addr=0x2 \
-device pci-bridge,id=pci_bridge,bus=pci.0,addr=0x4,chassis_nr=1 \
-m 30720  \
-smp 32,maxcpus=32,cores=16,threads=1,sockets=2  \
-cpu 'EPYC',+kvm_pv_unhalt \
-drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=unsafe,format=qcow2,file=/home/win10-32-virtio.qcow2 \
-device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,serial=SYSTEM_DISK0,bus=pci.0,addr=0x8 \
-rtc base=localtime,clock=host  \
-boot menu=off,strict=off,order=cdn,once=d  \
-no-hpet \
-enable-kvm \
-vnc :1 \
-monitor stdio

Actual results:
The guest hits a BSOD during boot.

Expected results:
The VM boots and works normally.

Additional info:

Comment 2 FuXiangChun 2020-09-15 14:38:36 UTC
Created attachment 1714947 [details]
screenshot

Comment 3 FuXiangChun 2020-09-15 14:39:18 UTC
The guest boots successfully if the Opteron_G5 CPU model is used instead of EPYC.

Comment 12 Marek Kedzierski 2020-09-24 12:35:05 UTC
Created attachment 1716330 [details]
analysis of Windows kernel crashdump

Comment 13 Marek Kedzierski 2020-09-24 12:40:49 UTC
This problem is not related to virtio-win drivers.

According to KVM traces:
msr_read c001102c = 0x (#GP)

which is consistent with Windows kernel analysis:

....
STACK_TEXT:  
8a403a38 83c505af 00000011 00000000 83c5035c hal!HalpErrataApplyPerProcessor+0x2112
8a403a4c 83c4ffb2 00000011 00000000 80d6c150 hal!HalpErrataInitSystem+0x4f
8a403a70 83c4ff78 80d6c150 83c4ff4d 8a403c2c hal!HalpInitSystemHelper+0x2c
8a403a78 83c4ff4d 8a403c2c 83acd216 00000001 hal!HalpInitSystemPhase1+0x18
8a403a80 83acd216 00000001 80d6c150 00000000 hal!HalInitSystem+0x1d
8a403c2c 83877f7f 00000000 8a403c70 8343a166 nt!Phase1InitializationDiscard+0x15e
8a403c38 8343a166 80d6c150 8f048f88 00000000 nt!Phase1Initialization+0x21
8a403c70 8358d9bd 83877f5e 80d6c150 00000000 nt!PspSystemThreadStartup+0x4a
8a403c7c 00000000 00000000 38573847 38793866 nt!KiThreadStartup+0x15
...

EXCEPTION_RECORD:  8a4038d8 -- (.exr 0xffffffff8a4038d8)
ExceptionAddress: 83c51f4a (hal!HalpErrataApplyPerProcessor+0x00002112)
   ExceptionCode: c0000005 (Access violation)
  ExceptionFlags: 00000000
NumberParameters: 0
 
CONTEXT:  8a403480 -- (.cxr 0xffffffff8a403480)
eax=8a403a02 ebx=00000117 ecx=c001102c edx=8a403a37 esi=00000000 edi=00000004
eip=83c51f4a esp=8a403a30 ebp=8a403a38 iopl=0         nv up ei ng nz ac pe cy
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00210297
hal!HalpErrataApplyPerProcessor+0x2112:
83c51f4a 0f32            rdmsr
....

So the guest executes rdmsr on MSR 0xc001102c (ecx=c001102c), which results in a fault.

A similar issue was already reported (and that report provides a fix):

https://bugzilla.redhat.com/show_bug.cgi?id=1593190
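
For anyone who wants to repeat the trace check, here is a minimal sketch. It assumes trace lines in the same "msr_read <index> = <value> (#GP)" shape as the line quoted above. The function name faulting_msr_reads is hypothetical, and how the trace text is captured is left open because this report does not say which tracing tool was used.

```shell
# Hypothetical helper: read KVM trace text on stdin and print the unique
# MSR indices whose reads were rejected with #GP.
# Assumes lines shaped like: msr_read c001102c = 0x... (#GP)
faulting_msr_reads() {
    sed -n -E 's/.*msr_read ([0-9a-fA-F]+) = .*\(#GP\).*/\1/p' | sort -u
}
```

Piping a saved trace file through the function (for example "faulting_msr_reads < trace.txt", where trace.txt is a placeholder name) lists each faulting MSR index once, which makes it easy to compare against the ecx value in the guest crash dump.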

Comment 14 Marek Kedzierski 2020-09-24 13:19:53 UTC
(In reply to Marek Kedzierski from comment #13)

So the fix is to set ignore_msrs=1 for the kvm kernel module.
After that, the machine boots.
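
A minimal sketch of applying that setting on the host, assuming the usual sysfs location for kvm module parameters. The function name enable_ignore_msrs and the file name kvm-ignore-msrs.conf are hypothetical choices, and both paths are variables so they can be adjusted. The function must run as root on a real host.

```shell
# Assumed default locations; override the variables if your host differs.
PARAM_FILE="${PARAM_FILE:-/sys/module/kvm/parameters/ignore_msrs}"
MODPROBE_CONF="${MODPROBE_CONF:-/etc/modprobe.d/kvm-ignore-msrs.conf}"

# Hypothetical helper: turn on ignore_msrs now and keep it on later.
enable_ignore_msrs() {
    # Change the parameter of the already loaded kvm module.
    echo 1 > "$PARAM_FILE"
    # Make the setting persistent for future loads of the kvm module.
    echo "options kvm ignore_msrs=1" > "$MODPROBE_CONF"
}
```

Note that ignore_msrs applies to all unhandled MSR accesses from every guest on the host, so it is broader than a targeted kernel fix for this one MSR.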

Comment 15 Dr. David Alan Gilbert 2020-09-24 14:58:09 UTC
(In reply to Marek Kedzierski from comment #14)
> So the fix is to set ignore_msrs=1 for the kvm kernel module.
> After that, the machine boots.

Well, maybe, but we have the fix in kernel-3.10.0-1053.el7 according to:
https://bugzilla.redhat.com/show_bug.cgi?id=1593190#c46

Why are you running such an old host kernel?

Comment 16 FuXiangChun 2020-09-25 05:21:47 UTC
(In reply to Dr. David Alan Gilbert from comment #15)
> Well, maybe, but we have the fix in kernel-3.10.0-1053.el7 according to:
> https://bugzilla.redhat.com/show_bug.cgi?id=1593190#c46
> 
> Why are you running such an old host kernel?

I hit this bug on a RHEL 7.5 host. I updated the host kernel to the latest RHEL 7.5.z version (3.10.0-862.52.1.el7.x86_64), and the issue is gone. So the latest RHEL 7.5.z kernel fixes this bug. Thanks.
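
A quick way to check whether a host kernel is at least as new as the build reported working here can be sketched as follows. The helper names version_ge and check_host_kernel are hypothetical, the comparison relies on GNU sort's -V option, and the threshold is an assumption: it is only the build confirmed working in this comment, so the first fixed build may well be older.

```shell
# Hypothetical helper: succeeds when version $1 is the same as or newer than $2.
# Relies on GNU sort's -V (version sort) option.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Assumption: this build is used as the threshold only because it is the one
# reported working in this bug; the first fixed build may be older.
KNOWN_GOOD="3.10.0-862.52.1.el7.x86_64"

# Hypothetical helper: classify a kernel release string against KNOWN_GOOD.
check_host_kernel() {
    if version_ge "$1" "$KNOWN_GOOD"; then
        echo "ok"
    else
        echo "older than known-good"
    fi
}
```

On a host, the current kernel can be checked with: check_host_kernel "$(uname -r)"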

Comment 17 Dr. David Alan Gilbert 2020-09-25 08:15:05 UTC
As per comment 16, this was actually fixed a long time ago.