Bug 688883

Summary: qemu-kvm process quits when windows guest doing S3 w/ qxl device
Product: Red Hat Enterprise Linux 8 Reporter: Xiaoqing Wei <xwei>
Component: spice-qxl-xddm Assignee: Alon Levy <alevy>
Status: CLOSED CURRENTRELEASE QA Contact: Desktop QE <desktop-qa-list>
Severity: high Docs Contact:
Priority: high    
Version: --- CC: acathrow, alevy, bazulay, chayang, cmeadors, cpelland, dblechte, ddumas, gcosta, iheim, juzhang, mhasko, michen, mkenneth, mkrcmari, qzhang, Rhev-m-bugs, rhod, sgordon, tburke, uril, virt-maint
Target Milestone: rc Keywords: TestBlocker
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: qemu-kvm-0.12.1.2-2.183.el6 do -0.1-12 Doc Type: Bug Fix
Doc Text:
When Windows guests were suspended to memory (S3), the qemu-kvm process would end, sometimes generating a core dump. Updates to the Windows SPICE driver, in conjunction with updates to qemu-kvm, ensure that suspend and resume now work for Windows guests.
Story Points: ---
Clone Of:
: 706711 (view as bug list) Environment:
Last Closed: 2014-01-21 15:10:35 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 723480    
Bug Blocks: 565939, 706711    
Attachments:
Description Flags
echo nop > current_tracer // echo kvm > set_event none
win7 doing s3 with sm101 qxl driver none

Description Xiaoqing Wei 2011-03-18 11:42:46 UTC
Created attachment 486217 [details]
echo nop > current_tracer     // echo  kvm > set_event

Description of problem:
When a Windows guest using the qxl video adapter does S3,
the qemu-kvm process quits,
sometimes producing a core dump.

Guests tested:
winXP-32, win2003-32/64, win7-32

win7-64 was also tested, but this driver does not work on it: it shows a yellow "!" in Device Manager after installing the driver and rebooting.

Version-Release number of selected component (if applicable):
qemu-kvm-0.12.1.2-2.150.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot a winXP guest with the qxl video adapter using:
qemu-kvm -name 'XP' -monitor stdio -chardev socket,id=serial_gJRf,path=/tmp/serial-20110307-181453-vBnV,server,nowait -device isa-serial,chardev=serial_gJRf -drive file='kvm-test/tests/kvm/images/winXP-32-virtio.qcow2',index=0,if=ide,media=disk,cache=none,format=qcow2,aio=native -device virtio-net-pci,netdev=idY5P6kp,mac=9a:ef:ac:4d:b3:92,netdev=idY5P6kp,id=ndev00idY5P6kp,bus=pci.0,addr=0x3 -netdev tap,id=idY5P6kp,vhost=on,ifname='t0-181453-vBnV',script='kvm-test/tests/kvm/scripts/qemu-ifup-switch',downscript='no' -m 2048 -smp 2,cores=1,threads=1,sockets=2 -cpu cpu64-rhel6,+sse2,+x2apic -spice port=8000,disable-ticketing -rtc base=localtime,clock=host,driftfix=none  -boot order=cdn,once=c,menu=off   -usbdevice tablet -enable-kvm \
 -vga qxl
2. Install the latest video driver (0.1-4) in the guest, and reboot the guest to apply the driver.
3. After the reboot, select Standby; the qemu-kvm process now quits and prints:

" handle_dev_destroy_surfaces: 
handle_dev_destroy_surfaces: 
handle_dev_destroy_surfaces: 
qxl_phys2virt: PANIC !qxl->guest_slots[slot].active failed"
  
Actual results:
host qemu-kvm quits
Expected results:
guest completes S3 successfully

Additional info:
kernel-2.6.32-120.el6.x86_64
vgabios-0.6b-3.5.el6.noarch

less /proc/cpuinfo
processor       : 3
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Core(TM)2 Quad CPU    Q9400  @ 2.66GHz
stepping        : 10
cpu MHz         : 2659.665
cache size      : 3072 KB





info of xp core dump:
Core was generated by `qemu-kvm -name XP -monitor stdio -chardev socket,id=serial_gJRf,path=/tmp/seria'.
Program terminated with signal 6, Aborted.
#0  0x00000039144329a5 in raise () from /lib64/libc.so.6

(gdb) bt
#0  0x00000039144329a5 in raise () from /lib64/libc.so.6
#1  0x0000003914434185 in abort () from /lib64/libc.so.6
#2  0x00000037b0010727 in validate_virt (info=<value optimized out>, virt=<value optimized out>, 
    slot_id=<value optimized out>, add_size=<value optimized out>, group_id=<value optimized out>)
    at red_memslots.c:83
#3  0x00000037b00107cc in get_virt (info=<value optimized out>, addr=<value optimized out>, 
    add_size=<value optimized out>, group_id=1) at red_memslots.c:122
#4  0x00000037b002cb98 in handle_dev_create_primary_surface (listener=0x7fc4cfe2a8e0, 
    events=<value optimized out>) at red_worker.c:9865
#5  handle_dev_input (listener=0x7fc4cfe2a8e0, events=<value optimized out>) at red_worker.c:9989
#6  0x00000037b002c295 in red_worker_main (arg=<value optimized out>) at red_worker.c:10286
#7  0x0000003914c077e1 in start_thread () from /lib64/libpthread.so.0
#8  0x00000039144e151d in clone () from /lib64/libc.so.6



I also tried to ftrace qemu-kvm and collected a trace when qemu-kvm quit. However, with ftrace enabled, qemu-kvm quit without producing a core dump, so the attached ftrace file does not correspond to the backtrace above.

Comment 2 Alon Levy 2011-03-20 10:27:32 UTC
Hi Xiaoqing Wei,
Can you try this scratch build and see if it solves the problem?
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=3188963

When trying to reproduce (with an older driver, doing suspend) I didn't get any
assert, but I also didn't manage to exit the suspend without doing a
system_reset from the monitor - how do you usually get out of S3?

Alon

Comment 3 Xiaoqing Wei 2011-03-22 10:52:32 UTC
(In reply to comment #2)
> Hi Xiaoqing Wei,
> Can you try this scratch build and see if it solves the problem?
> http://brewweb.devel.redhat.com/brew/taskinfo?taskID=3188963
> 

Hi Alon,
This build still fails and dumps core.
qemu outputs:

(qemu) reds_handle_main_link: 
reds_show_new_channel: channel 1:0, connected successfully, over Non Secure link
handle_dev_input: mouse mode 2
reds_main_handle_message: net test: latency 0.515000 ms, bitrate 90343641 bps (86.158410 Mbps)
reds_show_new_channel: channel 4:0, connected successfully, over Non Secure link
red_dispatcher_set_cursor_peer: 
handle_dev_input: cursor connect
reds_show_new_channel: channel 2:0, connected successfully, over Non Secure link
red_dispatcher_set_peer: 
handle_dev_input: connect
handle_new_display_channel: jpeg disabled
handle_new_display_channel: zlib-over-glz disabled
reds_show_new_channel: channel 3:0, connected successfully, over Non Secure link
inputs_link: 
handle_dev_destroy_surfaces: 
handle_dev_destroy_surfaces: 
handle_dev_destroy_surfaces: 
handle_dev_destroy_surfaces: 
id 0, group 0, virt start 0, virt end ffffffffffffffff, generation 0, delta 0
validate_virt: panic: virtual address out of range
    virt=0x0+0x1d4c00 slot_id=1 group_id=1
    slot=0x0-0x0 delta=0x0
Aborted (core dumped)

(gdb) bt
#0  0x00000036daa329a5 in raise () from /lib64/libc.so.6
#1  0x00000036daa34185 in abort () from /lib64/libc.so.6
#2  0x00000036dfe10727 in ?? () from /usr/lib64/libspice-server.so.1
#3  0x00000036dfe107cc in ?? () from /usr/lib64/libspice-server.so.1
#4  0x00000036dfe2cb78 in ?? () from /usr/lib64/libspice-server.so.1
#5  0x00000036dfe2c275 in ?? () from /usr/lib64/libspice-server.so.1
#6  0x00000036db2077e1 in start_thread () from /lib64/libpthread.so.0
#7  0x00000036daae151d in clone () from /lib64/libc.so.6
(gdb) 

qemu-kvm-0.12.1.2-2.113.el6.test.x86_64
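
The panic above reports an address range checked against a slot whose bounds are 0x0-0x0. The following is only a simplified model of that kind of range check, using the values from the log; it is not the actual code in red_memslots.c, and the names `MemSlot` and `validate_virt` as written here are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class MemSlot:
    """Simplified stand-in for a memory slot: only the virtual address bounds."""
    virt_start: int = 0
    virt_end: int = 0


def validate_virt(slot: MemSlot, virt: int, add_size: int) -> bool:
    """Return True when [virt, virt + add_size) lies inside the slot bounds."""
    return slot.virt_start <= virt and virt + add_size <= slot.virt_end


# Values from the log: slot_id=1 group_id=1 is reported as slot=0x0-0x0,
# and the requested range is virt=0x0+0x1d4c00.
empty_slot = MemSlot(virt_start=0x0, virt_end=0x0)
in_range = validate_virt(empty_slot, virt=0x0, add_size=0x1D4C00)

# The other slot printed in the log (id 0, group 0) spans the whole address space.
full_slot = MemSlot(virt_start=0x0, virt_end=0xFFFFFFFFFFFFFFFF)
in_full_range = validate_virt(full_slot, virt=0x0, add_size=0x1D4C00)
```

Under this model any non-empty request against an empty slot fails the check, which is consistent with the "virtual address out of range" message for a slot that is no longer set up when the primary surface is created.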

> When trying to reproduce (with an older driver, doing suspend) I didn't get any
> assert, but I also didn't manage to exit the suspend without doing a
> system_reset from the monitor - how do you usually get out of S3?
> 
The Windows guest comes back from S3 automatically, so I didn't do anything until it prompted me to log in.


> Alon


Best Regards,
Xiaoqing

Comment 4 Marian Krcmarik 2011-03-22 19:18:29 UTC
I reproduced the bug after a while. The important note is that It appears when using the qxl-win build from brew in version 0.1-4, with older builds of qxl-win It does not appear.

Note: I used latest scratch build of qemu:
qemu-kvm-0.12.1.2-2.151.spice_hdg2.el6.x86_64

Comment 5 Xiaoqing Wei 2011-03-23 02:55:17 UTC
(In reply to comment #4)
> I reproduced the bug after a while. The important note is that It appears when
> using the qxl-win build from brew in version 0.1-4, with older builds of
> qxl-win It does not appear.
> 
> Note: I used latest scratch build of qemu:
> qemu-kvm-0.12.1.2-2.151.spice_hdg2.el6.x86_64

Hi, when I use the qemu-kvm build from Alon, even the oldest qxl-win (0.1-1) reproduces the problem.
Could you please give me your scratch build?

BTW, what were your kernel, vgabios and spice-server versions when it didn't appear?

Mine are kernel-2.6.32-120.el6.x86_64 / vgabios-0.6b-3.5.el6.noarch / spice-server-0.7.2-4.el6.x86_64, and it is always reproducible.

Best Regards
Xiaoqing

Comment 7 Qunfang Zhang 2011-03-24 11:00:30 UTC
(In reply to comment #4)
> I reproduced the bug after a while. The important note is that It appears when
> using the qxl-win build from brew in version 0.1-4, with older builds of
> qxl-win It does not appear.
> 
> Note: I used latest scratch build of qemu:
> qemu-kvm-0.12.1.2-2.151.spice_hdg2.el6.x86_64

Hi, Marian
I tried this scratch build of qemu as well. I installed qxl-win-0.1-4 on a win2k8-32 guest. After doing S3, the guest quit with the same error as in the bug description. I then copied a fresh backup image and installed qxl-win-0.1-3 instead. The problem still exists.

Comment 8 Qunfang Zhang 2011-03-28 09:30:16 UTC
Added the TestBlocker keyword because WHQL requires Windows guests to do S3.

Comment 9 Uri Lublin 2011-04-07 14:35:10 UTC
*** Bug 676826 has been marked as a duplicate of this bug. ***

Comment 16 Xiaoqing Wei 2011-04-19 06:51:24 UTC
Created attachment 493099 [details]
win7 doing s3 with sm101 qxl driver

Comment 24 Dor Laor 2011-04-21 11:24:08 UTC
I cleared the blocker flag and the TestBlocker keyword at the same time.
Alon, please transfer it to the right qxl driver component.

Comment 25 Qunfang Zhang 2011-05-18 08:01:11 UTC
Set the "TestBlocker" keyword because without a correct spice+qxl driver we cannot pass the WHQL job "NDIS Test 6.5 (MPE)".

Comment 26 Alon Levy 2011-05-22 13:20:08 UTC
Dor, actually our solution involves a patch to qemu-kvm as well (qxl device only), in addition to the driver. I'll clone this bug for qemu-kvm.

btw, the solution is to add two I/Os, one for preparing to sleep and one for returning from sleep, which tell the qxl device to ignore resets. Crude, but who needs to support complex logic of taking down a device when there is a better solution? Of course, if it were possible to tell why we are being reset even that wouldn't be required, but I understand there is no way to get that via ACPI on real hardware, and that is what qemu is modelled after.

Alon
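
The idea described above can be sketched roughly as follows. This is a toy model only, assuming a device with two extra I/O commands that toggle a flag; the class name, port constants and fields are hypothetical and do not reflect the actual qemu-kvm or driver patches.

```python
class QxlDeviceSketch:
    """Toy model of a device that can be told to ignore resets around S3."""

    # Hypothetical I/O command numbers, chosen only for this illustration.
    IO_PREPARE_SLEEP = 0
    IO_RETURN_FROM_SLEEP = 1

    def __init__(self) -> None:
        self.ignore_resets = False
        # Stands in for driver-created state (memory slots, surfaces).
        self.slots_active = True

    def io_write(self, port: int) -> None:
        """Handle a guest write to one of the two sleep-related I/Os."""
        if port == self.IO_PREPARE_SLEEP:
            self.ignore_resets = True
        elif port == self.IO_RETURN_FROM_SLEEP:
            self.ignore_resets = False
        else:
            raise ValueError(f"unknown I/O port {port}")

    def reset(self) -> bool:
        """Reset the device; return False if the reset was ignored."""
        if self.ignore_resets:
            return False
        self.slots_active = False
        return True


dev = QxlDeviceSketch()
dev.io_write(QxlDeviceSketch.IO_PREPARE_SLEEP)      # driver: about to enter S3
reset_during_sleep = dev.reset()                     # reset arrives while asleep
state_kept = dev.slots_active
dev.io_write(QxlDeviceSketch.IO_RETURN_FROM_SLEEP)  # driver: resumed
reset_after_resume = dev.reset()                     # ordinary reset works again
```

In this sketch a reset that arrives between the two I/Os leaves the driver-created state untouched, so nothing that the driver still expects to be active has been torn down when the guest resumes.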

Comment 30 Alon Levy 2011-10-05 19:10:51 UTC
Skipped 0.1-10 and 0.1-11. The former was a tagging error; the latter had a version of the surprise removal hack with a 2-second timeout. I replaced it with 60 seconds and verified that the surprise removal test still passes (didn't rerun the whole WHQL).

Comment 32 Michal Haško 2011-10-21 07:49:25 UTC
VERIFIED on:
Windows XP
Windows 7 32 bit
Windows 7 64 bit
qemu-kvm-0.12.1.2-2.184.el6.x86_64
qxl-win-0.1-12
seabios-0.6.1.2-4.el6.x86_64 (older seabios because the newest has S3 disabled)

Comment 33 Stephen Gordon 2012-02-29 18:47:24 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
When Windows guests were suspended to memory (S3), the qemu-kvm process would end, sometimes generating a core dump. Updates to the Windows SPICE driver, in conjunction with updates to qemu-kvm, ensure that suspend and resume now work for Windows guests.