Bug 803187

Summary: Guest mouse and keyboard got unresponsive after resume from S3 with virtio devices
Product: Red Hat Enterprise Linux 6
Reporter: Qunfang Zhang <qzhang>
Component: kernel
Assignee: Amit Shah <amit.shah>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: high
Priority: high
Version: 6.3
CC: acathrow, amit.shah, areis, bcao, bsarathy, dyasny, flang, juzhang, michen, mkenneth, rhod, tburke, virt-maint
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Fixed In Version: kernel-2.6.32-262.el6
Doc Type: Bug Fix
Last Closed: 2012-06-20 08:34:20 UTC
Bug Blocks: 804019    
Attachments:
isa serial log with kernel-2.6.32-254.el6bz803187 (flags: none)

Description Qunfang Zhang 2012-03-14 07:19:42 UTC
Description of problem:
Boot a guest with a virtio block device, suspend it to memory (S3), and then resume it. After resume, the guest mouse and keyboard become unresponsive. This was tested without kvmclock because there is a separate S3 issue related to kvmclock; see Bug 803132. I tried the scenarios below, and the problem may be related to the virtio block driver:

0. No virtio devices at all -> guest works well after S3
1. No virtio serial, but virtio nic, blk and balloon added -> guest unresponsive after S3
2. Only virtio net, no other virtio devices -> guest works well after S3
3. Only virtio blk, no other virtio devices -> guest unresponsive after S3
4. Only virtio balloon, no other virtio devices -> guest works well after S3
5. Only virtio serial, no other virtio devices -> guest works well after S3
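
The reasoning behind "may be related to the virtio block driver" can be written down as a small set computation. This is only a sketch over the six runs listed above (the short device names are labels chosen for illustration), and it cannot rule out other devices; Comment 3 later reports the problem without virtio blk.

```python
# Results copied from scenarios 0-5 above:
# (virtio devices attached, guest responsive after S3?)
SCENARIOS = [
    (set(), True),                        # 0. no virtio devices
    ({"net", "blk", "balloon"}, False),   # 1. nic, blk, balloon (no serial)
    ({"net"}, True),                      # 2. only net
    ({"blk"}, False),                     # 3. only blk
    ({"balloon"}, True),                  # 4. only balloon
    ({"serial"}, True),                   # 5. only serial
]


def suspect_devices(scenarios):
    """Devices present in every failing run and absent from every passing run."""
    failing = [devs for devs, ok in scenarios if not ok]
    passing = [devs for devs, ok in scenarios if ok]
    if not failing:
        return set()
    in_all_failures = set.intersection(*failing)
    seen_in_passing = set().union(*passing)
    return in_all_failures - seen_in_passing


print(sorted(suspect_devices(SCENARIOS)))  # ['blk']
```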

Version-Release number of selected component (if applicable):
Host:
kernel-2.6.32-251.el6.x86_64
qemu-kvm-0.12.1.2-2.246.el6.x86_64
seabios-0.6.1.2-12.enableS3S4.v1.el6.x86_64

Guest:
kernel-2.6.32-251.el6.x86_64

The seabios build was provided by Amit and has S3 and S4 enabled. Download it from:
https://bugzilla.redhat.com/show_bug.cgi?id=761586#c7

How reproducible:
Always

Steps to Reproduce:
1. Boot a guest with a virtio block device and without kvmclock:
/usr/libexec/qemu-kvm -M rhel6.3.0 -cpu Conroe,-kvmclock -enable-kvm -m 4G -smp 2,sockets=1,cores=2,threads=1 -name rhel6.3 -uuid 4c84db67-faf8-4498-9829-19a3d6431d9d -rtc base=localtime,driftfix=slew -drive file=/home/rhel6.3-64-virtio.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,werror=stop,rerror=stop,aio=native -device virtio-blk-pci,bus=pci.0,drive=drive-virtio-disk0,id=virtio-disk0,addr=0x5 -netdev tap,id=hostnet0,script=/etc/qemu-ifup -device e1000,netdev=hostnet0,id=net0,mac=00:1a:2a:42:10:66,bus=pci.0,addr=0x3 -usb -device usb-tablet,id=input0   -boot c -monitor stdio  -drive file=/home/boot.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -vnc :10  -qmp tcp:0:4444,server,nowait

2. Suspend the guest to memory from inside the guest:
# echo mem > /sys/power/state

3. Resume the guest by sending the "system_wakeup" QEMU command or by pressing a key in the guest screen.
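
For step 3, the wakeup can also be sent over the QMP socket that the command line opens with -qmp tcp:0:4444,server,nowait. The following is a minimal sketch, assuming it runs on the host and that QMP sends one JSON message per line; the function names are made up for illustration, while qmp_capabilities and system_wakeup are the QMP commands being issued.

```python
import json
import socket


def qmp_command(name, arguments=None):
    """Encode one QMP command as a newline-terminated JSON message."""
    msg = {"execute": name}
    if arguments:
        msg["arguments"] = arguments
    return (json.dumps(msg) + "\n").encode("utf-8")


def read_message(reader):
    """Read one JSON message (one line) from the QMP stream."""
    line = reader.readline()
    if not line:
        raise ConnectionError("QMP connection closed")
    return json.loads(line)


def wait_for_reply(reader):
    """Skip asynchronous events until a 'return' or 'error' reply arrives."""
    while True:
        msg = read_message(reader)
        if "return" in msg:
            return msg["return"]
        if "error" in msg:
            raise RuntimeError(msg["error"])


def wake_guest(sock):
    """Negotiate capabilities, then ask QEMU to wake the suspended guest."""
    reader = sock.makefile("r", encoding="utf-8")
    read_message(reader)                          # greeting banner from QEMU
    sock.sendall(qmp_command("qmp_capabilities"))
    wait_for_reply(reader)
    sock.sendall(qmp_command("system_wakeup"))
    return wait_for_reply(reader)


def wake_guest_at(host="127.0.0.1", port=4444):
    """Connect to the QMP port from the command line above (assumed local)."""
    with socket.create_connection((host, port)) as sock:
        return wake_guest(sock)
```

Calling wake_guest_at() on the host while the guest is in S3 should have the same effect as typing system_wakeup in the monitor.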
  
Actual results:
The guest keyboard and mouse become unresponsive after resume. Most of the time this happens immediately after resume; sometimes it happens after a while. In the latter case, suspend to memory again and the issue reproduces.

Expected results:
Guest should work well after resume from S3.

Additional info:

Comment 1 Qunfang Zhang 2012-03-14 08:09:04 UTC
Update:
Ping still works after the guest resumes from S3, although the mouse and keyboard do not respond.
I also tested the older kernel-231, and the problem exists there as well.

Comment 2 Qunfang Zhang 2012-03-19 06:51:44 UTC
Today I found that the guest hangs while resuming from S3 with other virtio devices attached. The guest consumes 100% CPU.
The command line is:
/usr/libexec/qemu-kvm -M rhel6.3.0 -cpu Conroe,-kvmclock -enable-kvm -m 4G -smp 2,sockets=1,cores=2,threads=1 -name rhel6.3 -uuid 4c84db67-faf8-4498-9829-19a3d6431d9d -rtc base=localtime,driftfix=slew -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x4 -drive file=/home/rhel6.3-64.raw,if=none,format=raw,id=scsi0 -device virtio-scsi-pci,id=scsi0 -device scsi-hd,drive=scsi0 -netdev tap,vhost=on,id=hostnet0,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:2a:42:10:66,bus=pci.0 -chardev socket,id=charchannel0,path=/tmp/qzhang-test,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -usb -device usb-tablet,id=input0 -boot c -monitor stdio -device virtio-balloon-pci,bus=pci.0,id=balloon0 -qmp tcp:0:4444,server,nowait -chardev socket,id=charserial0,path=/tmp/qzhang-isa,server,nowait -device isa-serial,chardev=charserial0,id=serial0 -spice port=5930,disable-ticketing -vga qxl -global qxl-vga.vram_size=33554432


I then re-tested after removing all virtio devices from the above command line except the virtio-scsi system disk. The guest no longer hangs.

Comment 3 Qunfang Zhang 2012-03-19 07:56:36 UTC
Hit the problem in Comment 0 even without virtio block but have other virtio devices, such as virtio balloon, serial and nic.

Comment 4 Dor Laor 2012-03-19 12:49:44 UTC
(In reply to comment #3)
> Hit the problem in Comment 0 even without virtio block but have other virtio
> devices, such as virtio balloon, serial and nic.

Can you test w/o *any* virtio device? We like to isolate the mouse from virtio.
Another request is to disable kvmclock (-cpu MODEL,-kvmclock).

Comment 5 Qunfang Zhang 2012-03-20 02:22:24 UTC
(In reply to comment #4)
> (In reply to comment #3)
> > Hit the problem in Comment 0 even without virtio block but have other virtio
> > devices, such as virtio balloon, serial and nic.
> 
> Can you test w/o *any* virtio device? We like to isolate the mouse from virtio.
Hi, Dor
As described in Comment 0, scenario 0 (no virtio devices at all): the guest works well after S3.

> Another request is to disable kvmclock (-cpu MODEL,-kvmclock).
Yes, kvmclock was disabled when I hit this problem; please also refer to the command lines in Comment 0 and Comment 2. :)

Comment 6 Amit Shah 2012-03-28 06:44:53 UTC
Cause is in qemu; qemu resets devices after waking up from s3, causing the guest and host virtio state to go out of sync.

Comment 7 Amit Shah 2012-03-28 08:30:39 UTC
Mike just confirmed that this happened with virtio-net on Windows too.  On Windows, disabling the adapter and then enabling it gets network working again.

Comment 8 Amit Shah 2012-03-28 08:42:13 UTC
(In reply to comment #7)
> On
> Windows, disabling the adapter and then enabling it gets network working again.

Similarly, for Linux guests, running rmmod and then modprobe on the virtio-net driver makes it work again.
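
The reload workaround can be sketched as follows. This assumes the guest driver is built as the virtio_net module and that the commands run as root inside the guest; the helper names are made up for illustration. The network interface goes away briefly while the module is unloaded.

```python
import subprocess


def reload_commands(module="virtio_net"):
    """Commands that unload and then reload a kernel module."""
    return [["rmmod", module], ["modprobe", module]]


def reload_module(module="virtio_net", run=subprocess.check_call):
    """Run the unload/reload sequence; `run` is injectable for dry runs."""
    for cmd in reload_commands(module):
        run(cmd)


# Dry run: print the commands instead of executing them.
reload_module(run=lambda cmd: print(" ".join(cmd)))
```

Calling reload_module() with the default run argument executes the two commands and raises if either one fails.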

Comment 9 Amit Shah 2012-03-29 08:34:09 UTC
(In reply to comment #6)
> Cause is in qemu; qemu resets devices after waking up from s3, causing the
> guest and host virtio state to go out of sync.

Moving back to kernel.  Gerd tells me a reset is to be expected even for S3.  The fix then will be to handle this the same way in the kernel as S4 is done.  Patches are posted upstream.
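
To illustrate why a device reset across S3 leaves the two sides out of sync unless the driver sets the device up again, here is a toy model. It is not the kernel patch and not QEMU code; every class and method name is hypothetical, and a single "queue ready" flag stands in for all virtio device state.

```python
class HostDevice:
    """Toy stand-in for the host (QEMU) side of one virtio device."""

    def __init__(self):
        self.queue_ready = False

    def reset(self):
        # A reset drops everything the driver configured earlier.
        self.queue_ready = False


class GuestDriver:
    """Toy stand-in for the guest driver; caches what it believes the host knows."""

    def __init__(self, device):
        self.device = device
        self.believes_ready = False

    def init_queues(self):
        self.device.queue_ready = True
        self.believes_ready = True

    def resume(self, reinit):
        # reinit=True models an S4-style restore: set the device up again.
        if reinit:
            self.init_queues()

    def in_sync(self):
        return self.believes_ready == self.device.queue_ready


def s3_cycle(reinit):
    dev = HostDevice()
    drv = GuestDriver(dev)
    drv.init_queues()   # normal boot
    dev.reset()         # device is reset across S3
    drv.resume(reinit)
    return drv.in_sync()


print(s3_cycle(reinit=False))  # False: guest thinks the queue is live, host does not
print(s3_cycle(reinit=True))   # True: driver re-initialized the device on resume
```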

Comment 12 Qunfang Zhang 2012-03-30 07:04:41 UTC
Created attachment 573894 [details]
isa serial log with kernel-2.6.32-254.el6bz803187

Comment 14 Qunfang Zhang 2012-03-30 10:30:18 UTC
Filed a new bug 808391 to track the issue described in comment 11 and comment 12.

Comment 17 RHEL Program Management 2012-04-01 06:39:44 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.

Comment 18 Aristeu Rozanski 2012-04-10 15:08:17 UTC
Patch(es) available on kernel-2.6.32-262.el6

Comment 22 Qunfang Zhang 2012-04-16 07:52:09 UTC
Verified this bug on kernel-2.6.32-262.el6: repeated the test more than 20 times and could not reproduce the issue.

CLI:
/usr/libexec/qemu-kvm -M rhel6.3.0 -cpu Conroe -enable-kvm -m 2G -smp 2,sockets=1,cores=2,threads=1 -name rhel6.3 -uuid 4c84db67-faf8-4498-9829-19a3d6431d9d -rtc base=localtime,driftfix=slew -drive file=/home/rhel6.3-64-new.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,werror=stop,rerror=stop,aio=native -device virtio-blk-pci,bus=pci.0,drive=drive-virtio-disk0,id=virtio-disk0,addr=0x5 -netdev tap,id=hostnet0,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:2a:42:10:66,bus=pci.0,addr=0x3 -usb -device usb-tablet,id=input0 -boot c -monitor stdio -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -vnc :10 -qmp tcp:0:4444,server,nowait -bios /usr/share/seabios/bios-pm.bin -chardev socket,path=/tmp/qzhang-test,server,nowait,id=isa1 -device isa-serial,chardev=isa1,id=isa-serial1 -device virtio-serial-pci,id=virtio-serial0,max_ports=31,vectors=4,bus=pci.0 -chardev socket,id=channel0,path=/tmp/virtio-serial,server,nowait -device virtserialport,chardev=channel0,name=org.linux-kvm.port.0,bus=virtio-serial0.0,id=port0 -device virtio-balloon-pci,bus=pci.0,id=balloon0

Steps:
1. Boot the guest with the above command line.
2. Ping an external host through the virtio NIC.
3. Balloon the guest memory down to a smaller value.
4. Transfer some data through virtio serial.
5. Suspend the guest to memory:
# pm-suspend
6. Resume the guest by pressing a key.
7. Repeat steps 2-6 20 times.
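
The guest-side part of this loop (steps 2, 4 and 5) could be scripted roughly as below. This is a sketch under assumptions: the ping target is a placeholder address, the port path assumes the guest exposes the port named org.linux-kvm.port.0 under /dev/virtio-ports/, and pm-suspend is assumed to return only after the guest has been resumed. Ballooning (step 3) and the key press (step 6) are done from the host side and are not covered here.

```python
import subprocess

PING_TARGET = "192.0.2.1"  # placeholder for the external host (assumption)
SERIAL_PORT = "/dev/virtio-ports/org.linux-kvm.port.0"  # assumed guest path


def send_serial(path, payload=b"s3-test\n"):
    """Step 4: write a few bytes to the virtio serial port."""
    with open(path, "wb", buffering=0) as port:
        port.write(payload)


def run_cycles(count=20, run=subprocess.check_call, send=send_serial):
    """Repeat the guest-side steps; stops at the first failing command."""
    for _ in range(count):
        run(["ping", "-c", "3", PING_TARGET])  # step 2
        send(SERIAL_PORT)                      # step 4
        run(["pm-suspend"])                    # step 5, returns after resume
    return count
```

Because subprocess.check_call raises on a non-zero exit status, a ping that fails after a resume ends the loop at the cycle where the network stopped working.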

Result: The guest resumes successfully and the guest devices (mouse, keyboard, network, balloon, virtio serial) still work well.
So this issue is fixed. I will change the status to VERIFIED.

Comment 24 errata-xmlrpc 2012-06-20 08:34:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-0862.html