Bug 754704

Summary: BSOD of guest occurs when shutting down multiple screens Windows guest.
Product: Red Hat Enterprise Linux 8
Reporter: Marian Krcmarik <mkrcmari>
Component: spice-qxl-xddm
Assignee: Alon Levy <alevy>
Status: CLOSED CURRENTRELEASE
QA Contact: Desktop QE <desktop-qa-list>
Severity: high
Docs Contact:
Priority: unspecified
Version: ---
CC: acathrow, bcao, dblechte, pvine
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-08-09 08:20:07 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Marian Krcmarik 2011-11-17 13:05:59 UTC
Description of problem:
The guest sometimes hits a BSOD when a multi-screen Windows guest is shut down. My setup has 4 monitors, and I connect through the user portal with the spice client to a 4-screen guest (4 qxl devices). The ic147 tools are installed on the guest (the non-surface qxl driver). I will attach a link to the memory dump that was produced, which indicates a fault in the qxl driver. In addition, qemu sometimes outputs something like the following (I don't know whether it is related):
virtio_ioport_write: unexpected address 0x13 value 0x1

The BSOD does not happen very often. More often the guest ends up in the PAUSED state and qemu outputs something like:
KVM internal error. Suberror: 1
rax 0000000000000050 rbx 0000000099c111bc rcx 0000000000000050 rdx 00000000fbe9ec54
rsi 00000000fbe9ec54 rdi 0000000099c111d0 rsp 0000000088cfa33c rbp 0000000088cfa344
r8  0000000000000000 r9  0000000000000000 r10 0000000000000000 r11 0000000000000000
r12 0000000000000000 r13 0000000000000000 r14 0000000000000000 r15 0000000000000000
rip 0000000090604ac6 rflags 00010202
cs 0008 (00000000/ffffffff p 1 dpl 0 db 1 s 1 type b l 0 g 1 avl 0)
ds 0023 (00000000/ffffffff p 1 dpl 3 db 1 s 1 type 3 l 0 g 1 avl 0)
es 0023 (00000000/ffffffff p 1 dpl 3 db 1 s 1 type 3 l 0 g 1 avl 0)
ss 0010 (00000000/ffffffff p 1 dpl 0 db 1 s 1 type 3 l 0 g 1 avl 0)
fs 0030 (82977c00/00003748 p 1 dpl 0 db 1 s 1 type 3 l 0 g 0 avl 0)
gs 0000 (00000000/ffffffff p 0 dpl 0 db 0 s 0 type 0 l 0 g 0 avl 0)
tr 0028 (801db000/000020ab p 1 dpl 0 db 0 s 0 type b l 0 g 0 avl 0)
ldt 0000 (00000000/ffffffff p 0 dpl 0 db 0 s 0 type 0 l 0 g 0 avl 0)
gdt 80b95000/3ff
idt 80b95400/7ff
cr0 80010031 cr2 86b59840 cr3 3e9de400 cr4 6f8 cr8 0 efer 800
emulation failure, check dmesg for details

and

# dmesg | grep kvm
kvm: 23622: cpu0 unhandled wrmsr: 0x198 data 0
kvm: 23622: cpu1 unhandled wrmsr: 0x198 data 0
kvm: 25252: cpu0 unhandled wrmsr: 0x198 data 0
kvm: 25252: cpu1 unhandled wrmsr: 0x198 data 0
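
When comparing several runs, it can help to tally these messages per qemu process and MSR. The following is a minimal sketch, assuming the dmesg line format shown above; the function name `summarize_unhandled_wrmsr` and the `SAMPLE` text are illustrative and not part of any tool mentioned in this report.

```python
import re
from collections import Counter

# Matches lines such as: "kvm: 23622: cpu0 unhandled wrmsr: 0x198 data 0"
# (format assumed from the dmesg excerpt in this report).
WRMSR_RE = re.compile(
    r"kvm: (?P<pid>\d+): cpu(?P<cpu>\d+) unhandled wrmsr: "
    r"(?P<msr>0x[0-9a-fA-F]+) data (?P<data>[0-9a-fA-F]+)"
)


def summarize_unhandled_wrmsr(dmesg_text):
    """Count unhandled WRMSR messages per (pid, msr) pair."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = WRMSR_RE.search(line)
        if match:
            key = (int(match.group("pid")), int(match.group("msr"), 16))
            counts[key] += 1
    return counts


# The four lines reported above.
SAMPLE = """\
kvm: 23622: cpu0 unhandled wrmsr: 0x198 data 0
kvm: 23622: cpu1 unhandled wrmsr: 0x198 data 0
kvm: 25252: cpu0 unhandled wrmsr: 0x198 data 0
kvm: 25252: cpu1 unhandled wrmsr: 0x198 data 0
"""

for (pid, msr), n in sorted(summarize_unhandled_wrmsr(SAMPLE).items()):
    print(f"pid {pid}: MSR {msr:#x} written {n} time(s)")
```

For the excerpt above this reports two writes to MSR 0x198 for each of the two process IDs, one per vCPU.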

I then tried to trigger a BSOD on such a paused guest, but the guest does not "crash" when the nmi command is sent through the qemu monitor once the guest is in the PAUSED state.
I am not really sure whether this is related to the BSOD.
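
As a reading aid for the register dump above, the control register values can be decoded into their architectural flag names. This is a small sketch, not output from any tool used here: the bit tables list only a subset of the x86 CR0 and EFER flags, and the helper name `decode_flags` is made up for illustration.

```python
# Subset of architectural x86 flag bits (bit position -> name).
CR0_BITS = {
    0: "PE", 1: "MP", 2: "EM", 3: "TS", 4: "ET", 5: "NE",
    16: "WP", 18: "AM", 29: "NW", 30: "CD", 31: "PG",
}
EFER_BITS = {0: "SCE", 8: "LME", 10: "LMA", 11: "NXE"}


def decode_flags(value, bit_names):
    """Return the names of all known flag bits set in value, lowest bit first."""
    return [name for bit, name in sorted(bit_names.items()) if value & (1 << bit)]


# Values copied from the "cr0 ... efer ..." line of the dump.
cr0 = int("80010031", 16)
efer = int("800", 16)

print(decode_flags(cr0, CR0_BITS))    # ['PE', 'ET', 'NE', 'WP', 'PG']
print(decode_flags(efer, EFER_BITS))  # ['NXE']
```

With PE and PG set in CR0 and LMA clear in EFER, the vCPU was in 32-bit protected mode with paging enabled when the emulation failure was reported, which matches the Windows7x32 guest described in this report.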

No problems occur when shutting down a single-monitor guest. I could not reproduce the BSOD without the tools but with the qxl driver installed (although with the BSOD I may just have been lucky), and the second problem, the PAUSED state, does not seem to be reproducible with only qxl and no virtio-serial.
I don't know how closely these two problems are related, so I am mentioning both to be safe. I can reproduce the PAUSED-state problem with the new off-screen surface qxl driver (-13). I have not yet reproduced the BSOD with it; I will keep trying.

Version-Release number of selected component (if applicable):
RHEVM ic147
Windows7x32 guest with ic147 tools 

How reproducible:
BSOD sometimes, PAUSED state often

Steps to Reproduce:
1. Connect to a multi-screen Windows guest using spice
2. Shut the guest down
  
Actual results:
BSOD, or the guest ends up in the PAUSED state and does not respond

Expected results:
Graceful shutdown

Additional info:

Comment 2 Marian Krcmarik 2011-11-20 21:35:01 UTC
I reproduced the BSOD with the off-screen surface driver qxl-win-0.1-13 installed on the guest, but unfortunately I could not get a dump, because dump creation always got stuck in the "Initializing disk for crash dump" phase.

Reproducing the "PAUSED state guest" problem is much easier, and it happens more often. I am not really sure whether it is related to the BSOD, or whether the BSOD may be the cause of the "emulation error".

I would appreciate it if anyone could take a look and maybe say a little bit more.

Information about the host:

qemu-kvm-0.12.1.2-2.207.el6.x86_64
libvirt-python-0.9.4-21.el6.x86_64
libvirt-0.9.4-21.el6.x86_64
qemu-kvm-debuginfo-0.12.1.2-2.207.el6.x86_64
spice-server-0.8.2-5.el6.x86_64
vdsm-4.9-110.el6.x86_64
qemu-img-0.12.1.2-2.207.el6.x86_64
libvirt-client-0.9.4-21.el6.x86_64
vdsm-cli-4.9-110.el6.x86_64
gpxe-roms-qemu-0.9.7-6.9.el6.noarch
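
To compare host package sets between reproduction attempts, the list above can be split into name, version, release, and architecture. This is a minimal sketch that assumes every entry follows the usual `name-version-release.arch` layout, as all entries above do; the helper `split_nvra` is hypothetical and does not use the rpm libraries.

```python
def split_nvra(package):
    """Split 'name-version-release.arch' into its four parts.

    Assumes the architecture follows the last '.', and that version and
    release are the last two '-'-separated fields before it.
    """
    rest, arch = package.rsplit(".", 1)
    name, version, release = rest.rsplit("-", 2)
    return name, version, release, arch


# A few entries copied from the host package list.
for pkg in (
    "qemu-kvm-0.12.1.2-2.207.el6.x86_64",
    "spice-server-0.8.2-5.el6.x86_64",
    "gpxe-roms-qemu-0.9.7-6.9.el6.noarch",
):
    print(split_nvra(pkg))
```

Splitting from the right keeps package names that themselves contain hyphens, such as qemu-kvm-debuginfo, intact.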

Comment 3 Marian Krcmarik 2012-02-07 19:33:54 UTC
I split the KVM internal error out of this bug and created a new one. This bug remains for the qxl-related guest BSOD.

Comment 4 Alon Levy 2012-02-22 11:52:55 UTC
What is the KVM internal error bug number?

Comment 5 Marian Krcmarik 2012-02-22 12:35:53 UTC
(In reply to comment #4)
> What is the KVM internal error bug number?

https://bugzilla.redhat.com/show_bug.cgi?id=788227

I debugged with Gleb, so you can possibly ask him about details.

Comment 6 Alon Levy 2013-05-15 20:02:58 UTC
Marian, the "virtio_ioport_write: unexpected address 0x13 value 0x1" message suggests checking with the newest virtio drivers. Can you see whether you can still reproduce?

(your later comment also suggests this:
"""
I could not reproduce without tools but with qxl driver as well (but with BSOD I was maybe just lucky) but the second problem with PAUSED state seems to be not reproducible with only qxl and no virtio-serial.
""")

Thanks,
Alon

Comment 7 Marian Krcmarik 2013-05-21 15:51:29 UTC
(In reply to Alon Levy from comment #6)
> Marian, the "virtio_ioport_write: unexpected address 0x13 value 0x1"
> reference suggests to check the newest virtio drivers, can you see if you
> can reproduce?
> 
> (your later comment also suggests this:
> """
> I could not reproduce without tools but with qxl driver as well (but with
> BSOD I was maybe just lucky) but the second problem with PAUSED state seems
> to be not reproducible with only qxl and no virtio-serial.
> """)
> 
> Thanks,
> Alon

Well, it has been a while since I reported this bug (1.5 years), so all of the involved components have changed. As I stated, I could not reproduce with the off-screen surface driver, but an older driver was in use at that time.
I tried to reproduce, but not very hard, and nothing showed up. The dumps are still available, though.

Comment 8 Alon Levy 2013-08-08 15:11:39 UTC
Since I cannot reproduce this, and neither can the original reporter, I prefer to close this for now and reopen it if someone can reproduce it, even though I'm sure there is a bug in shutdown per Gleb's comment 9 on #788227.

Alon

Comment 9 David Blechter 2013-08-09 08:20:07 UTC
Based on https://bugzilla.redhat.com/show_bug.cgi?id=754704#c8, this bug is closed.