Bug 803132

Summary: [Kernel-251] Guest got reboot instead of wakeup after resume from S3 with kvmclock
Product: Red Hat Enterprise Linux 6
Component: kernel
Version: 6.3
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: high
Priority: urgent
Target Milestone: rc
Keywords: Regression
Reporter: Qunfang Zhang <qzhang>
Assignee: Marcelo Tosatti <mtosatti>
QA Contact: Virtualization Bugs <virt-bugs>
CC: amit.shah, areis, flang, ibolling, jjaburek, juzhang, michen, riel, shu, syeghiay, tburke, vbenes, virt-maint
Fixed In Version: kernel-2.6.32-258.el6
Doc Type: Bug Fix
Last Closed: 2012-06-20 08:34:16 UTC

Description Qunfang Zhang 2012-03-14 02:39:13 UTC
Description of problem:
Boot a RHEL6.3 guest and suspend it to memory (S3). I used Amit's private seabios tree that enables S3. The guest reboots instead of waking up. This happens whether or not the guest uses virtio drivers. Re-testing with kernel-250 does not show the problem, so this appears to be a regression. I am not sure whether it is related to the "KVM steal time suspend/resume bugfix" patches in kernel-251.

seabios tree that enabled S3:
https://bugzilla.redhat.com/show_bug.cgi?id=761586#c7

Version-Release number of selected component (if applicable):
Host:
kernel-2.6.32-251.el6.x86_64
qemu-kvm-0.12.1.2-2.246.el6.x86_64
seabios-0.6.1.2-12.enableS3S4.v1.el6.x86_64

Guest:
kernel-2.6.32-251.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot a guest (the problem occurs even without any virtio driver):
/usr/libexec/qemu-kvm -M rhel6.3.0 -cpu Conroe -enable-kvm -m 4G -smp 2,sockets=1,cores=2,threads=1 -name rhel6.3 -uuid 4c84db67-faf8-4498-9829-19a3d6431d9d -rtc base=localtime,driftfix=slew -drive file=/home/rhel6.3-64-virtio.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,werror=stop,rerror=stop,aio=native -device ide-drive,bus=ide.0,unit=1,drive=drive-virtio-disk0,id=virtio-disk0 -netdev tap,id=hostnet0,script=/etc/qemu-ifup -device e1000,netdev=hostnet0,id=net0,mac=00:1a:2a:42:10:66,bus=pci.0,addr=0x3 -boot c -monitor stdio -drive file=/home/boot.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -spice port=5930,disable-ticketing -vga qxl -global qxl-vga.vram_size=33554432 -qmp tcp:0:4444,server,nowait

2. Inside guest: echo mem > /sys/power/state

3. Resume the guest by clicking the PS/2 mouse, pressing keys on the keyboard, or sending the "system_wakeup" qemu command.
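
Step 3 could also be scripted against the QMP socket that the command line in step 1 opens (-qmp tcp:0:4444,server,nowait). The following is only a minimal Python sketch. It assumes this qemu-kvm build accepts system_wakeup over QMP as well as on the human monitor, and that the script runs on the host. The helper names qmp_message, read_reply, and wake_guest are hypothetical.

```python
import json
import socket


def qmp_message(command):
    """Encode an argument-less QMP command as one JSON line."""
    return (json.dumps({"execute": command}) + "\r\n").encode("ascii")


def read_reply(lines):
    """Return the first reply or error, skipping asynchronous events."""
    for raw in lines:
        message = json.loads(raw)
        if "return" in message or "error" in message:
            return message
    raise ConnectionError("QMP stream closed before a reply arrived")


def wake_guest(host="127.0.0.1", port=4444, timeout=5.0):
    """Negotiate QMP capabilities, then ask QEMU to wake the guest from S3."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        stream = sock.makefile("rwb")
        json.loads(stream.readline())  # greeting banner sent by QEMU on connect
        reply = None
        for command in ("qmp_capabilities", "system_wakeup"):
            stream.write(qmp_message(command))
            stream.flush()
            reply = read_reply(stream)
        return reply


if __name__ == "__main__":
    # Assumed usage: run on the host after the guest has entered S3.
    print(wake_guest())
```

QMP requires the qmp_capabilities handshake before any other command, which is why the sketch sends it first. Events that arrive between the command and its reply are skipped rather than treated as the answer.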
  
Actual results:
The guest reboots.

Expected results:
The guest should resume and return to the state it was in before S3.

Additional info:
Re-tested with virtio block and virtio nic: hit the same problem.
Re-tested with kernel-250: the guest resumes successfully with the given command line.

Comment 3 Qunfang Zhang 2012-03-14 05:33:15 UTC
Update:
Test on the host installed kernel-152. then suspend host to mem. Host can resume successfully.

Comment 4 Qunfang Zhang 2012-03-14 05:56:36 UTC
(In reply to comment #3)
> Update:
> Test on the host installed kernel-152. then suspend host to mem. Host can
> resume successfully.

Sorry for the typo, host kernel should be kernel-2.6.32-251.el6.x86_64.

Comment 5 Amit Shah 2012-03-14 06:33:08 UTC
QE has confirmed the guest resumes from s3 as expected (without reboots) if kvmclock is disabled by adding '-cpu host,-kvmclock' to the qemu invocation.
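
When testing the workaround above, it may help to confirm inside the guest that kvmclock is really not in use. A minimal Python sketch, assuming the guest exposes the standard clocksource sysfs file and that kvmclock registers under the name "kvm-clock". The helper names are hypothetical.

```python
from pathlib import Path

# Standard sysfs location of the active clocksource on Linux guests.
CLOCKSOURCE = Path(
    "/sys/devices/system/clocksource/clocksource0/current_clocksource"
)


def is_kvmclock(name):
    """Return True if the given clocksource name is the KVM paravirtual clock."""
    return name.strip() == "kvm-clock"


def guest_uses_kvmclock(path=CLOCKSOURCE):
    """Read the active clocksource from sysfs and check it against kvm-clock."""
    return is_kvmclock(path.read_text())


if __name__ == "__main__":
    # Assumed usage: run inside the guest before suspending it.
    print("kvmclock active:", guest_uses_kvmclock())
```

With '-cpu host,-kvmclock' the check would be expected to report False, since the guest has to fall back to another clocksource.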

There is also a separate bug: with virtio devices, the mouse and keyboard stop responding after resume from s3. This may be related to spice or virtio-serial; a new bug will be opened for that.

Comment 6 RHEL Program Management 2012-03-21 22:29:29 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.

Comment 7 Aristeu Rozanski 2012-03-28 20:59:14 UTC
Patch(es) available on kernel-2.6.32-258.el6

Comment 10 Marcelo Tosatti 2012-03-29 03:04:09 UTC
*** Bug 803874 has been marked as a duplicate of this bug. ***

Comment 11 Matthew Garrett 2012-03-30 14:14:51 UTC
*** Bug 806820 has been marked as a duplicate of this bug. ***

Comment 12 Matthew Garrett 2012-03-30 14:15:11 UTC
*** Bug 806748 has been marked as a duplicate of this bug. ***

Comment 13 Qunfang Zhang 2012-04-05 09:38:17 UTC
Hi, Marcelo and Amit
I'm verifying this bug with kernel-260 but am blocked by bug 808391. Could you take a look at bug 808391? It reproduces more often than 1 in 20 attempts. Sometimes it happens on the first S3, and sometimes after one or more S3/resume cycles followed by a guest reboot.

Comment 14 Qunfang Zhang 2012-04-05 10:33:39 UTC
I also tested a rhel6.3-32 guest; sometimes the guest still reboots instead of waking up, but it is not 100% reproducible. I hit it twice in 15 to 20 attempts.
So I will re-assign this bug.

Comment 15 Marcelo Tosatti 2012-04-10 14:51:09 UTC
(In reply to comment #14)
> I also tested a rhel6.3-32 guest; sometimes the guest still reboots instead
> of waking up, but it is not 100% reproducible. I hit it twice in 15 to 20
> attempts.
> So I will re-assign this bug.

Please collect logs with the 32-bit rhel6.3 guest (which appears immune to bug 808391) and a kernel newer than kernel-2.6.32-258.el6.

Comment 16 Qunfang Zhang 2012-04-11 07:55:56 UTC
Marcelo,
I could not reproduce it in 40 attempts after updating the guest kernel to the latest version, 262. I will test the 64-bit rhel6.3 guest later, and if both pass, I will change the status to VERIFIED.

Comment 17 Marcelo Tosatti 2012-04-13 00:06:56 UTC
(In reply to comment #16)
> Marcelo,
> I could not reproduce it in 40 attempts after updating the guest kernel to the
> latest version, 262. I will test the 64-bit rhel6.3 guest later, and if both
> pass, I will change the status to VERIFIED.

OK, please do that. It should be enough to verify on 32-bit (BZ 808391 is a separate bug).

Comment 18 Karen Noel 2012-04-13 14:33:49 UTC
Because the fix is already submitted, change back to ON_QA.

Bug 808391 is moved to 6.4.0, so you cannot complete the testing until 6.4. Should you mark this fix verified based on only the 32-bit tests?  That should be appropriate for S3, which is not supported anyway.

Comment 19 Qunfang Zhang 2012-04-16 06:37:15 UTC
Hi, Karen and Marcelo
In kernel-251:
rhel6.3-32: not very easy to reproduce; I reproduced it only twice in about 30 attempts.
rhel6.3-64: reproduced 100% of the time in all my attempts.

In kernel-262:
rhel6.3-32: per comment 16, cannot reproduce in 40 attempts.
rhel6.3-64: cannot reproduce this bug. The guest may still hit bug 808391, but in some test rounds it hit neither 808391 nor this bug (803132).

Based on the above, I would like to set the status to VERIFIED. Please correct me if that is wrong.

Thanks,
Qunfang
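
Reproduction counts like the ones in this comment could be gathered with a small harness instead of by hand. A minimal Python sketch, assuming the guest's /proc/sys/kernel/random/boot_id is read before suspend and again after wakeup: the value stays the same across an S3 resume and changes after a reboot. The run_cycle callable, which would perform one suspend/wakeup and return the two boot IDs, is hypothetical and left to the tester.

```python
from typing import Callable, Tuple


def rebooted(boot_id_before: str, boot_id_after: str) -> bool:
    """A changed boot ID means the guest rebooted instead of resuming."""
    return boot_id_before.strip() != boot_id_after.strip()


def count_reboots(run_cycle: Callable[[], Tuple[str, str]], attempts: int) -> int:
    """Run `attempts` S3 cycles and count how many ended in a reboot.

    `run_cycle` is a hypothetical callable supplied by the tester; it must
    return (boot_id_before, boot_id_after) for one suspend/wakeup cycle.
    """
    reboots = 0
    for _ in range(attempts):
        before, after = run_cycle()
        if rebooted(before, after):
            reboots += 1
    return reboots
```

A result of 0 over 40 attempts would correspond to the "cannot reproduce" outcome reported for kernel-262, and a result equal to the number of attempts to the 100% case seen on kernel-251 with the 64-bit guest.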

Comment 21 errata-xmlrpc 2012-06-20 08:34:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-0862.html