Bug 788227

Summary: KVM internal error when shutting down guest with multi-monitor spice session
Product: Red Hat Enterprise Linux 6
Reporter: Marian Krcmarik <mkrcmari>
Component: kernel
Assignee: David Blechter <dblechte>
Status: CLOSED WONTFIX
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: medium
Priority: unspecified
Version: 6.2
CC: knoel, kraxel
Target Milestone: rc
Hardware: Unspecified
OS: Linux
Doc Type: Bug Fix
Last Closed: 2014-11-18 14:53:08 UTC
Attachments:
backtrace from PAUSED qemu-kvm process

Description Marian Krcmarik 2012-02-07 19:29:21 UTC
Description of problem:
I have a setup with 4 monitors, and I use the spice client to connect to a guest with 4 screens (4 qxl devices). A KVM internal error sometimes occurs when shutting down a Windows guest with multiple screens. The guest has virtio-serial installed (as well as the qxl graphics driver and spice-vdagent). The guest ends up in the PAUSED state, and the qemu monitor is still responsive. I have a feeling that it happens when I move or click the mouse on the screens during shutdown; I have not seen it on a single-monitor guest setup yet.

qemu-kvm output:
KVM internal error. Suberror: 1
rax 0000000000000050 rbx 0000000000000050 rcx 0000000000000050 rdx 00000000fcc39b54
rsi 00000000fcc39b54 rdi 000000009ddeb800 rsp 000000008cc132b8 rbp 000000008cc132c0
r8  0000000000000000 r9  0000000000000000 r10 0000000000000000 r11 0000000000000000
r12 0000000000000000 r13 0000000000000000 r14 0000000000000000 r15 0000000000000000
rip 00000000ac2e69c6 rflags 00010202
cs 0008 (00000000/ffffffff p 1 dpl 0 db 1 s 1 type b l 0 g 1 avl 0)
ds 0023 (00000000/ffffffff p 1 dpl 3 db 1 s 1 type 3 l 0 g 1 avl 0)
es 0023 (00000000/ffffffff p 1 dpl 3 db 1 s 1 type 3 l 0 g 1 avl 0)
ss 0010 (00000000/ffffffff p 1 dpl 0 db 1 s 1 type 3 l 0 g 1 avl 0)
fs 0030 (82744c00/00003748 p 1 dpl 0 db 1 s 1 type 3 l 0 g 0 avl 0)
gs 0000 (00000000/ffffffff p 0 dpl 0 db 0 s 0 type 0 l 0 g 0 avl 0)
tr 0028 (801da000/000020ab p 1 dpl 0 db 0 s 0 type b l 0 g 0 avl 0)
ldt 0000 (00000000/ffffffff p 0 dpl 0 db 0 s 0 type 0 l 0 g 0 avl 0)
gdt 80b95000/3ff
idt 80b95400/7ff
cr0 80010031 cr2 88570840 cr3 7f1973e0 cr4 6f8 cr8 0 efer 800
emulation failure, check dmesg for details

and the host kernel log:

dmesg | grep kvm
kvm: 20793: cpu0 unimplemented perfctr wrmsr: 0xc1 data 0xabcd
kvm: 21084: cpu0 unimplemented perfctr wrmsr: 0xc1 data 0xabcd
kvm: 23719: cpu0 unhandled wrmsr: 0x198 data 0
kvm: 23719: cpu1 unhandled wrmsr: 0x198 data 0
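The control registers in the dump above determine how the guest translates the address in rdi (0x9ddeb800) that the failing instruction writes to. A minimal Python sketch, assuming the standard x86 bit positions (CR0.PG is bit 31, CR4.PAE is bit 5, EFER.LMA is bit 10): cr0=0x80010031 and cr4=0x6f8 with LMA clear in efer=0x800 indicate 32-bit PAE paging, so the linear address splits into a 2-bit PDPT index, two 9-bit table indices and a 12-bit page offset. The helper names are illustrative only, and the sketch does not read the guest's page tables, so it cannot produce the physical address itself.

```python
# Bit positions from the Intel SDM; the helper names below are illustrative.
CR0_PG = 1 << 31    # paging enabled
CR4_PAE = 1 << 5    # physical address extension
EFER_LMA = 1 << 10  # long mode active


def paging_mode(cr0: int, cr4: int, efer: int) -> str:
    """Classify the guest paging mode from control-register values."""
    if not cr0 & CR0_PG:
        return "none"
    if efer & EFER_LMA:
        return "4-level"
    return "pae" if cr4 & CR4_PAE else "32-bit"


def split_pae_address(linear: int) -> dict:
    """Split a 32-bit linear address into PAE indices (assumes 4 KiB pages)."""
    return {
        "pdpt_index": (linear >> 30) & 0x3,   # bits 31:30
        "pd_index": (linear >> 21) & 0x1FF,   # bits 29:21
        "pt_index": (linear >> 12) & 0x1FF,   # bits 20:12
        "offset": linear & 0xFFF,             # bits 11:0
    }


if __name__ == "__main__":
    # Values copied from the register dump in the description.
    print(paging_mode(cr0=0x80010031, cr4=0x6F8, efer=0x800))  # pae
    parts = split_pae_address(0x9DDEB800)  # rdi
    print({k: hex(v) for k, v in parts.items()})
    # {'pdpt_index': '0x2', 'pd_index': '0xee', 'pt_index': '0x1eb', 'offset': '0x800'}
```

Whether the walk actually ends at a 4 KiB page table or at a 2 MiB large page depends on the PS bit of the page-directory entry, which the dump does not show.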


qemu-kvm cli:
/usr/libexec/qemu-kvm -S -M rhel6.1.0 -enable-kvm -m 2048 \
  -smp 2,sockets=2,cores=1,threads=1 -name Win7 \
  -uuid 807c6eca-0eb0-b4c4-7164-47afd27c036b -nodefconfig -nodefaults \
  -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/Win7.monitor,server,nowait \
  -mon chardev=charmonitor,id=monitor,mode=control \
  -rtc base=localtime -no-shutdown -boot order=d,menu=on \
  -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 \
  -drive file=/dev/rootvg/Windows7_test,if=none,id=drive-ide0-0-0,format=raw,cache=none,aio=native \
  -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 \
  -drive file=/usr/share/rhev-guest-tools-iso/RHEV-toolsSetup_3.0_29.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw \
  -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 \
  -netdev tap,fd=23,id=hostnet0 \
  -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:03:d9:0c,bus=pci.0,addr=0x3 \
  -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 \
  -chardev spicevmc,id=charchannel0,name=vdagent \
  -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 \
  -usb -spice port=3010,addr=0.0.0.0,disable-ticketing,disable-copy-paste \
  -vga qxl -global qxl-vga.vram_size=67108864 \
  -device qxl,id=video1,vram_size=67108864,bus=pci.0,addr=0x7 \
  -device qxl,id=video2,vram_size=9437184,bus=pci.0,addr=0x8 \
  -device qxl,id=video3,vram_size=9437184,bus=pci.0,addr=0x9 \
  -device intel-hda,id=sound0,bus=pci.0,addr=0x4 \
  -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 \
  -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5

Version-Release number of selected component (if applicable):
$ uname -a
Linux dhcp-29-89.brq.redhat.com 2.6.32-217.el6.x86_64 #1 SMP Sat Nov 5 17:49:25 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
qemu-kvm-0.12.1.2-2.209.el6_2.1.x86_64
spice-server-0.8.2-5.el6.x86_64
Windows7x32 guest 

How reproducible:
1/5

Steps to Reproduce:
1. Shut down a multi-monitor guest (with virtio-serial installed) that has an open spice session, possibly clicking or moving the mouse during shutdown.
  
Actual results:
KVM internal error; the guest ends up in the PAUSED state.

Expected results:
The guest shuts down cleanly.

Additional info:

Comment 1 Marian Krcmarik 2012-02-07 19:32:25 UTC
Created attachment 560041
backtrace from PAUSED qemu-kvm process

Comment 2 Gleb Natapov 2012-02-07 19:36:47 UTC
When the guest is paused after the error, run "x/i $rip" in the monitor.

Comment 4 Marian Krcmarik 2012-02-07 20:13:47 UTC
(In reply to comment #2)
> When guest is paused after the error do "x/i $rip" in the monitor.

virsh # qemu-monitor-command Win7 --hmp "x/i $rip"
unknown register

This?

Comment 5 Gleb Natapov 2012-02-07 20:22:00 UTC
(In reply to comment #4)
> (In reply to comment #2)
> > When guest is paused after the error do "x/i $rip" in the monitor.
> 
> virsh # qemu-monitor-command Win7 --hmp "x/i $rip"
> unknown register
> 
> This?

Maybe it is $eip. Or just copy the rip address from the register dump; for the dump in the description it would be: "x/i 0xac2e69c6"
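Copying rip out of the dump can also be scripted. A minimal Python sketch, assuming the dump uses the "name value" layout shown in the description, where the 64-bit registers are printed as 16 hex digits; the function names are hypothetical. It only builds the HMP command string, which can then be passed to virsh qemu-monitor-command --hmp as in the comments above.

```python
from __future__ import annotations

import re

# 64-bit general-purpose registers and rip are printed as 16 hex digits,
# e.g. "rip 00000000ac2e69c6". rflags and the segment/control registers use
# other widths or names and are deliberately not matched.
GPR_PATTERN = re.compile(r"\b(r[a-z0-9]{1,3})\s+([0-9a-f]{16})\b")


def parse_registers(dump: str) -> dict[str, int]:
    """Return {register name: value} for the 64-bit registers in a KVM dump."""
    return {name: int(value, 16) for name, value in GPR_PATTERN.findall(dump)}


def disassemble_command(dump: str) -> str:
    """Build the HMP 'x/i' command that disassembles the instruction at rip."""
    return f"x/i {parse_registers(dump)['rip']:#x}"


if __name__ == "__main__":
    # Excerpt of the register dump from the description.
    dump = (
        "rsi 00000000fcc39b54 rdi 000000009ddeb800 rsp 000000008cc132b8 rbp 000000008cc132c0\n"
        "rip 00000000ac2e69c6 rflags 00010202\n"
    )
    print(disassemble_command(dump))  # x/i 0xac2e69c6
```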

Comment 6 Marian Krcmarik 2012-02-07 20:29:55 UTC
(In reply to comment #5)
> (In reply to comment #4)
> > (In reply to comment #2)
> > > When guest is paused after the error do "x/i $rip" in the monitor.
> > 
> > virsh # qemu-monitor-command Win7 --hmp "x/i $rip"
> > unknown register
> > 
> > This?
> 
> May be it is $eip. Or just copy rip address from the register dump. For the
> dump from comment #1 it will be: "x/i 0xac2e69c6"

virsh # qemu-monitor-command Win7 --hmp "x/i $eip"
0x00000000ac2e69c6:  movntdq %xmm0,(%edi)


virsh # qemu-monitor-command Win7 --hmp "x/i 0xac2e69c6"
0x00000000ac2e69c6:  movntdq %xmm0,(%edi)

Comment 7 Gleb Natapov 2012-02-07 20:50:39 UTC
(In reply to comment #6)
> virsh # qemu-monitor-command Win7 --hmp "x/i 0xac2e69c6"
> 0x00000000ac2e69c6:  movntdq %xmm0,(%edi)

Can you also provide the output of the "info pci" monitor command for the guest?

Comment 8 Marian Krcmarik 2012-02-07 20:55:04 UTC
(In reply to comment #7)
> (In reply to comment #6)
> > virsh # qemu-monitor-command Win7 --hmp "x/i 0xac2e69c6"
> > 0x00000000ac2e69c6:  movntdq %xmm0,(%edi)
> 
> Can you also provide output of "info pci" monitor command for the guest?

qemu-monitor-command Win7 --hmp "info pci"
  Bus  0, device   0, function 0:
    Host bridge: PCI device 8086:1237
      id ""
  Bus  0, device   1, function 0:
    ISA bridge: PCI device 8086:7000
      id ""
  Bus  0, device   1, function 1:
    IDE controller: PCI device 8086:7010
      BAR4: I/O at 0xc000 [0xc00f].
      id ""
  Bus  0, device   1, function 2:
    USB controller: PCI device 8086:7020
      IRQ 11.
      BAR4: I/O at 0xc020 [0xc03f].
      id ""
  Bus  0, device   1, function 3:
    Bridge: PCI device 8086:7113
      IRQ 9.
      id ""
  Bus  0, device   2, function 0:
    VGA controller: PCI device 1b36:0100
      IRQ 10.
      BAR0: 32 bit memory at 0xf0000000 [0xf3ffffff].
      BAR1: 32 bit memory at 0xe0000000 [0xe3ffffff].
      BAR2: 32 bit memory at 0xf4000000 [0xf4001fff].
      BAR3: I/O at 0xc040 [0xc05f].
      BAR6: 32 bit memory at 0xffffffffffffffff [0x0000fffe].
      id ""
  Bus  0, device   3, function 0:
    Ethernet controller: PCI device 10ec:8139
      IRQ 5.
      BAR0: I/O at 0xc100 [0xc1ff].
      BAR1: 32 bit memory at 0xf4020000 [0xf40200ff].
      BAR6: 32 bit memory at 0xffffffffffffffff [0x0000fffe].
      id "net0"
  Bus  0, device   4, function 0:
    Class 0403: PCI device 8086:2668
      IRQ 11.
      BAR0: 32 bit memory at 0xffffffffffffffff [0x00003ffe].
      id "sound0"
  Bus  0, device   5, function 0:
    RAM controller: PCI device 1af4:1002
      IRQ 10.
      BAR0: I/O at 0xc200 [0xc21f].
      id "balloon0"
  Bus  0, device   6, function 0:
    Class 0780: PCI device 1af4:1003
      IRQ 10.
      BAR0: I/O at 0xffffffffffffffff [0x001e].
      BAR1: 32 bit memory at 0xffffffffffffffff [0x00000ffe].
      id "virtio-serial0"
  Bus  0, device   7, function 0:
    Display controller: PCI device 1b36:0100
      IRQ 5.
      BAR0: 32 bit memory at 0xffffffffffffffff [0x03fffffe].
      BAR1: 32 bit memory at 0xffffffffffffffff [0x03fffffe].
      BAR2: 32 bit memory at 0xffffffffffffffff [0x00001ffe].
      BAR3: I/O at 0xffffffffffffffff [0x001e].
      id "video1"
  Bus  0, device   8, function 0:
    Display controller: PCI device 1b36:0100
      IRQ 11.
      BAR0: 32 bit memory at 0xec000000 [0xefffffff].
      BAR1: 32 bit memory at 0xf5000000 [0xf5ffffff].
      BAR2: 32 bit memory at 0xf6000000 [0xf6001fff].
      BAR3: I/O at 0xc260 [0xc27f].
      id "video2"
  Bus  0, device   9, function 0:
    Display controller: PCI device 1b36:0100
      IRQ 10.
      BAR0: 32 bit memory at 0xf8000000 [0xfbffffff].
      BAR1: 32 bit memory at 0xfd000000 [0xfdffffff].
      BAR2: 32 bit memory at 0xf6002000 [0xf6003fff].
      BAR3: I/O at 0xc280 [0xc29f].
      id "video3"

Comment 9 Gleb Natapov 2012-02-08 15:24:19 UTC
Here are the findings from a long IRC debugging session:

The failing instruction is movntdq %xmm0,(%edi), and we indeed do not emulate it, but it should not normally be used for MMIO. The instruction tries to access the address in %edi (0x9ddeb800, see rdi in the register dump). After walking the guest page tables, we saw that it maps to physical address 0xeb400000. In the "info pci" output from comment #8, no PCI device claims this address, but there is one unconfigured QXL device at device 7 (all of its BARs read 0xffffffffffffffff). After a reboot this device looks like:

  Bus  0, device   7, function 0:
    Display controller: PCI device 1b36:0100
      IRQ 5.
      BAR0: 32 bit memory at 0xe8000000 [0xebffffff].
      BAR1: 32 bit memory at 0xe4000000 [0xe7ffffff].
      BAR2: 32 bit memory at 0xf4046000 [0xf4047fff].
      BAR3: I/O at 0xc240 [0xc25f].
      id "video1"

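So its BAR0 (0xe8000000-0xebffffff) claims the address the movntdq instruction tried to access. It looks like the QXL driver accesses the device's memory after the device has been unconfigured. During normal operation such accesses are not emulated, since QXL BARs are RAM-backed memory, not MMIO; once the BAR is unmapped, the access instead falls through to KVM's instruction emulator, which cannot handle movntdq.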
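The check made in this comment, whether any configured BAR covers the faulting physical address, can be sketched in Python. This assumes the memory BAR ranges are transcribed by hand from the "info pci" listings above, that unassigned BARs (shown as 0xffffffffffffffff) claim nothing, and that the other devices kept their ranges after the reboot, since only video1 was re-listed; the Bar type and find_owner function are hypothetical.

```python
from typing import List, NamedTuple, Optional

UNASSIGNED = 0xFFFFFFFFFFFFFFFF  # how "info pci" shows an unconfigured BAR


class Bar(NamedTuple):
    device: str
    name: str
    start: int
    end: int  # inclusive upper bound, as printed in brackets by "info pci"


def find_owner(bars: List[Bar], addr: int) -> Optional[Bar]:
    """Return the configured BAR whose [start, end] range contains addr."""
    for bar in bars:
        if bar.start != UNASSIGNED and bar.start <= addr <= bar.end:
            return bar
    return None


FAULT_ADDR = 0xEB400000  # physical address found by the page-table walk

# Configured memory BARs from comment #8 (assumed unchanged after reboot).
others = [
    Bar("device 2 (VGA)", "BAR0", 0xF0000000, 0xF3FFFFFF),
    Bar("device 2 (VGA)", "BAR1", 0xE0000000, 0xE3FFFFFF),
    Bar("video2", "BAR0", 0xEC000000, 0xEFFFFFFF),
    Bar("video2", "BAR1", 0xF5000000, 0xF5FFFFFF),
    Bar("video3", "BAR0", 0xF8000000, 0xFBFFFFFF),
    Bar("video3", "BAR1", 0xFD000000, 0xFDFFFFFF),
]

# video1 unconfigured (comment #8) versus configured after reboot (above).
before_reboot = others + [Bar("video1", "BAR0", UNASSIGNED, UNASSIGNED)]
after_reboot = others + [
    Bar("video1", "BAR0", 0xE8000000, 0xEBFFFFFF),
    Bar("video1", "BAR1", 0xE4000000, 0xE7FFFFFF),
]

print(find_owner(before_reboot, FAULT_ADDR))  # None
owner = find_owner(after_reboot, FAULT_ADDR)
print(owner.device, owner.name)  # video1 BAR0
```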

Comment 11 Karen Noel 2012-04-27 19:18:35 UTC
Gleb indicated in the IRC chat below that the fix should be in the qxl driver.

<gleb_> knoel_wfh: and 788227 is technically is kvm bug since we do not emulate the mmx instruction, but it triggers due to windows driver bug
<gleb_> knoel_wfh: and for rhel6 we'd rather fix it in Windows driver
<knoel_wfh> gleb_: Which windows driver?
<gleb_> knoel_wfh: qxl is our driver

Comment 12 RHEL Program Management 2012-05-03 05:43:08 UTC
Since RHEL 6.3 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 14 David Blechter 2014-11-18 14:53:08 UTC
Closing as WONTFIX. This bug is about multiple qxl devices, which are no longer supported.