Bug 694801

Summary: Guest fail to resume from S4 if guest using kvmclock
Product: Red Hat Enterprise Linux 6
Reporter: Chao Yang <chayang>
Component: kernel
Assignee: Red Hat Kernel Manager <kernel-mgr>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium
Priority: medium
Version: 6.1
CC: amit.shah, bcao, drjones, ehabkost, juzhang, michen, mkenneth, qzhang, shuang, shu, sluo, tburke, virt-maint
Keywords: Reopened
Hardware: x86_64
OS: Linux
Fixed In Version: kernel-2.6.32-251.el6
Doc Type: Bug Fix
Last Closed: 2012-06-20 07:39:52 UTC
Bug Blocks: 720669, 748534, 753024, 756082, 767187, 786141
Attachments: rhel6-kvmclock-s4.patch (flags: none)

Description Chao Yang 2011-04-08 13:18:35 UTC

Description of problem:
Boot a RHEL 5.6.z guest on an AMD host. The guest suspends to disk successfully, but on resume from S4 it shows only a black screen.
CLI:
/usr/libexec/qemu-kvm -M rhel6.1.0 -enable-kvm -m 6144 -smp 4 -name rhel5.6-64 -uuid `uuidgen` -rtc base=utc,clock=host,driftfix=slew -no-kvm-pit-reinjection -boot c -drive file=/dev/chayang/RHEL5.6-64,if=none,id=drive-virtio-0-0,media=disk,format=qcow2,cache=none -device ide-drive,drive=drive-virtio-0-0,id=virt0-0-0 -netdev tap,id=hostnet1 -device rtl8139,netdev=hostnet1,id=net1,mac=52:54:40:81:11:53 -usb -device usb-tablet,id=input1 -vnc :0 -monitor stdio -balloon none

Version-Release number of selected component (if applicable):
qemu-kvm-0.12.1.2-2.156.el6.x86_64

How reproducible:
100%

Additional info:
If clock=pmtmr is added to the guest kernel command line, the issue does not occur.
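
A quick way to confirm which case a guest is in: the sketch below checks the kernel command line for a given parameter and prints the active clocksource. The function names are hypothetical (not from this report), and the optional path arguments exist only so the helpers can be pointed at test files instead of /proc/cmdline and sysfs.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not from the bug report.

# Return 0 if the exact parameter (e.g. clock=pmtmr) appears as a whole word
# in the kernel command line file (default: /proc/cmdline).
cmdline_has_param() {
    param=$1
    file=${2:-/proc/cmdline}
    # One word per line, then an exact fixed-string whole-line match.
    tr ' ' '\n' < "$file" | grep -qxF -- "$param"
}

# Print the clocksource the guest is currently using.
# The directory defaults to the sysfs location used elsewhere in this report.
current_clocksource() {
    dir=${1:-/sys/devices/system/clocksource/clocksource0}
    cat "$dir/current_clocksource"
}
```

In a guest this could be used as `cmdline_has_param clock=pmtmr && echo "override present"`, followed by `current_clocksource` to see what the kernel actually selected.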

Comment 2 Chao Yang 2011-04-08 13:23:08 UTC

Another issue: suspend the guest with clock=pmtmr to mem; after resuming from
S4, networking is unavailable. The guest cannot ping remote hosts, and the
network has to be restarted to restore connectivity.

Comment 3 RHEL Program Management 2011-04-09 06:00:12 UTC

Since RHEL 6.1 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 4 Chao Yang 2011-04-11 10:45:42 UTC

Hit the same issue on a RHEL 6.1 guest, with and without kvmclock, on an AMD host.
Guest kernel: 2.6.32-128.el6.x86_64
CLI: 
/usr/libexec/qemu-kvm -M rhel6.1.0 -enable-kvm -m 6144 -smp 4 -name rhel5.6-64 -uuid `uuidgen` -rtc base=utc,clock=host,driftfix=slew -no-kvm-pit-reinjection -boot c -drive file=/root/rhel6.1-ide.qcow2,if=none,id=drive-virtio-0-0,media=disk,format=qcow2,cache=none -device virtio-blk-pci,drive=drive-virtio-0-0,id=virt0-0-0 -netdev tap,id=hostnet1 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:40:81:11:53 -usb -device usb-tablet,id=input1 -vnc :0 -monitor stdio -balloon none -serial unix:/home/seri,nowait,server
In guest:
# cat /sys/devices/system/clocksource/clocksource0/current_clocksource   
kvm-clock

Steps:
1. Trigger S4 in the guest:
# echo disk>/sys/power/state
2. After the suspend completes, boot the guest again to resume from S4.
Actual Result:
Only a black screen is shown, and the monitor exits with:
(qemu) Guest moved used index from 36574 to 0

# nc -U /home/seri 
�could not read byte from child: Success
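
The steps above can be sketched as a small guest-side helper. This is only a sketch: request_s4 is a hypothetical name, it has to run as root in the guest, and it refuses to hibernate unless kvm-clock is the active clocksource so that the run actually exercises the kvmclock configuration. The path arguments default to the files used above and can be overridden for dry runs.

```shell
#!/bin/sh
# Sketch of the reproduction steps; request_s4 is a hypothetical name.

request_s4() {
    clk_file=${1:-/sys/devices/system/clocksource/clocksource0/current_clocksource}
    state_file=${2:-/sys/power/state}

    # Read the active clocksource; return 2 if it cannot be read.
    clk=$(cat "$clk_file") || return 2

    # Only hibernate when the guest is really running on kvm-clock.
    if [ "$clk" != "kvm-clock" ]; then
        echo "clocksource is $clk, not kvm-clock; not hibernating" >&2
        return 1
    fi

    # Same request as "echo disk>/sys/power/state" in step 1.
    echo disk > "$state_file"
}
```

After the guest powers off, it still has to be started again with the same qemu-kvm command line to perform the resume.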

Comment 5 juzhang 2011-04-11 11:03:05 UTC

Since kvmclock is our default clocksource, I suggest fixing this in RHEL 6.1. Mark rhel-6.1.0? and blocker?

Comment 8 juzhang 2011-04-12 02:08:29 UTC

(In reply to comment #4)

Hi, chayang

   Would you please retest without virtio devices? The command line in comment 0 has no virtio devices, but the one in comment 4 includes a virtio NIC and a virtio block device.

Comment 9 Chao Yang 2011-04-12 05:18:12 UTC

(In reply to comment #8)

Hi Junyi, Dor,
  My mistake, I did not check the command line carefully. I retried 3 times without virtio devices: VNC still shows a black screen when resuming from S4 with kvm-clock, but qemu no longer complains "(qemu) Guest moved used index from 36574 to 0". The image is a fresh install with kernel 2.6.32-130.el6.x86_64.
Steps:
1. boot rhel6.1-64 image
/usr/libexec/qemu-kvm -M rhel6.1.0 -enable-kvm -m 4096 -smp 4 -name rhel6.1-64 -uuid `uuidgen` -rtc base=utc,clock=host,driftfix=slew -no-kvm-pit-reinjection -boot nc -drive file=/root/RHEL-Server-6.1-64.qcow2,if=none,id=drive-ide0-0-0,media=disk,format=qcow2,cache=none -device ide-drive,drive=drive-ide0-0-0,id=ide0-0-0 -netdev tap,id=hostnet1 -device rtl8139,netdev=hostnet1,id=net1,mac=52:54:40:81:11:53 -usb -device usb-tablet,id=input1 -vnc :0 -monitor stdio -balloon none -serial unix:/home/seri,nowait,server 
2. connect with nc
# nc -U /home/seri 
3. echo disk > /sys/power/state
4. resume from S4
Actual result:
The guest suspends to S4, but shows a black screen when resuming from S4.
# nc -U /home/seri 
�


Host info:
# rpm -qa|grep qemu
qemu-kvm-debuginfo-0.12.1.2-2.156.el6.x86_64
qemu-kvm-0.12.1.2-2.156.el6.x86_64
qemu-img-0.12.1.2-2.156.el6.x86_64
gpxe-roms-qemu-0.9.7-6.7.el6.noarch
qemu-kvm-tools-0.12.1.2-2.156.el6.x86_64
# uname -r
2.6.32-130.el6.x86_64
processor	: 3
vendor_id	: AuthenticAMD
cpu family	: 16
model		: 2
model name	: AMD Phenom(tm) 9600B Quad-Core Processor
stepping	: 3
cpu MHz		: 1150.000
cache size	: 512 KB
physical id	: 0
siblings	: 4
core id		: 3
cpu cores	: 4
apicid		: 3
initial apicid	: 3
fpu		: yes
fpu_exception	: yes
cpuid level	: 5
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs npt lbrv svm_lock
bogomips	: 4587.42
TLB size	: 1024 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

----------------S4 without kvm-clock------------------------
Booting the same guest with the same command line, plus "-cpu cpu64-rhel6,-kvmclock" on the qemu command line and clock=pmtmr on the guest kernel line, the guest appears to suspend to and resume from S4 successfully.
CLI:
/usr/libexec/qemu-kvm -M rhel6.1.0 -enable-kvm -m 4096 -smp 4 -cpu cpu64-rhel6,-kvmclock -name rhel6.1-64 -uuid `uuidgen` -rtc base=utc,clock=host,driftfix=slew -no-kvm-pit-reinjection -boot nc -drive file=/root/RHEL-Server-6.1-64.qcow2,if=none,id=drive-ide0-0-0,media=disk,format=qcow2,cache=none -device ide-drive,drive=drive-ide0-0-0,id=ide0-0-0 -netdev tap,id=hostnet1 -device rtl8139,netdev=hostnet1,id=net1,mac=52:54:40:81:11:53 -usb -device usb-tablet,id=input1 -vnc :0 -monitor stdio -balloon none -serial unix:/home/seri,nowait,server

Comment 10 juzhang 2011-04-14 13:02:07 UTC

Based on comment 7 and comment 9, reopening this issue temporarily and proposing it for RHEL 6.2. Please correct me if this is a mistake.

Comment 11 Eduardo Habkost 2011-05-30 14:27:19 UTC

Moving to NEW, bugs assigned to virt-maint shouldn't be on ASSIGNED.

Comment 13 Zachary Amsden 2011-06-07 01:25:06 UTC

I totally misunderstood this bug.  The guest needs to be able to survive an S4 suspend of the HOST, but here we are doing S4 suspend of the GUEST.

Not sure why that is being done, but I'm pretty sure that has never been tested before with KVM clock.  There are a number of possible things that could go wrong here; we may be failing to re-register the kvm clock on boot and then end up using stale data.

In all probability, this is a guest bug and not going to be fixable from the hypervisor (which can't detect that the guest is doing an S4 resume).

Comment 14 Rik van Riel 2011-09-12 13:43:45 UTC

Time to figure out how exactly suspend & resume interact with the clock code.  Chances are the clock code is not designed to deal with the clock source changing address between S4 suspend and S4 resume...

Comment 16 Rik van Riel 2011-11-11 21:19:11 UTC

I just fixed some potentially related issues in bug 751742. Might as well look at this one next (after Jarod has built some official test kernels with the up to date clock code).

Comment 17 Rik van Riel 2011-11-14 20:46:22 UTC

Well, it reproduces here with the very latest RHEL6 kernel.

I caught a core dump and will be extracting serial console debugging info too, to see what is going on.

Comment 18 Dor Laor 2011-12-08 10:13:12 UTC

*** Bug 716706 has been marked as a duplicate of this bug. ***

Comment 19 Amit Shah 2012-02-08 10:15:30 UTC

So I've been trying a few things to fix this, and it turns out the guest doesn't hang on resume, it merely stalls for a while.  In my testing it takes about 15 seconds for the guest to resume after the hibernation image is loaded.

The guest's timekeeping does go off after resume, though.  Also, a second s4 attempt freezes the resumed guest for a very long period.  I gave up waiting for the guest to unstall after about 20 minutes.
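
One way to put a number on the timekeeping error after resume is to compare the system clock against the emulated RTC. The sketch below assumes the RTC is exposed at /sys/class/rtc/rtc0/since_epoch and keeps UTC (as with -rtc base=utc); with base=localtime the result is offset by the time zone. clock_skew is a hypothetical helper name, not something used in this report.

```shell
#!/bin/sh
# Hypothetical helper: print the number of seconds by which the system clock
# is ahead of the RTC (negative if it is behind).
# Assumes the RTC holds UTC and is readable as seconds since the epoch.
clock_skew() {
    rtc_file=${1:-/sys/class/rtc/rtc0/since_epoch}
    rtc=$(cat "$rtc_file") || return 1
    now=$(date +%s)
    echo $((now - rtc))
}
```

Running `clock_skew` once before hibernating and once after resume gives two values to compare; a large change between them would point at the guest clock rather than the RTC.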

However, the stall vs hang thing clarifies things a lot.

Comment 20 Dor Laor 2012-02-08 21:34:56 UTC

Amit, Marcelo said that the page the guest and host share for the pvclock is not being synced after resume, so it's a basic issue we need to start with first.

Comment 21 Amit Shah 2012-02-09 12:23:47 UTC

(In reply to comment #20)

I just saw Marcelo's patch upstream, but that's not enough; I have had a similar patch cooked up for a while, and it still doesn't solve the issue for me completely.  I'll take this discussion upstream.

Comment 23 Marcelo Tosatti 2012-02-16 17:02:05 UTC

Created attachment 562544
rhel6-kvmclock-s4.patch

Comment 24 Aristeu Rozanski 2012-03-12 13:50:24 UTC

Patch(es) available on kernel-2.6.32-251.el6

Comment 27 Qunfang Zhang 2012-04-27 05:05:44 UTC

Tested on kernel-2.6.32-262.el6, repeating more than 20 times with RHEL 6.3 32-bit and 64-bit guests using kvmclock.  The guests hibernate and resume successfully.

Related packages:
kernel-2.6.32-262.el6.x86_64 (for both host and guest)
qemu-kvm-0.12.1.2-2.285.el6.x86_64
seabios-0.6.1.2-19.el6.x86_64

CLI:

/usr/libexec/qemu-kvm -M rhel6.3.0 -cpu Conroe -enable-kvm -uuid d782bf5c-e817-411b-a9cf-545ae7c0f101 -rtc base=localtime,driftfix=slew -m 8G -smp 2,sockets=1,cores=2,threads=1 -name rhel6.3-64 -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x4 -drive file=/home/rhel6.3-64-virtio.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,serial=zhang,cache=none,werror=stop,rerror=stop,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,scsi=off -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:42:0b:00,bus=pci.0,addr=0x3 -chardev socket,id=charchannel0,path=/tmp/socket-1,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev spicevmc,id=charchannel1,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 -monitor stdio -boot c -qmp tcp:0:5555,server,nowait -spice port=5930,disable-ticketing -global qxl-vga.vram_size=67108864 -vga qxl -device qxl,id=video1,vram_size=67108864,bus=pci.0,addr=0x7 -device sga -device virtio-balloon-pci,id=balloon0,bus=pci.0,id=0x6 -bios /usr/share/seabios/bios-pm.bin  -chardev socket,path=/tmp/qzhang-test,server,nowait,id=isa1 -device isa-serial,chardev=isa1,id=isa-serial1

So this bug is fixed.

Comment 28 Qunfang Zhang 2012-04-27 05:11:32 UTC

Based on above comment 27, setting to VERIFIED.

Comment 29 juzhang 2012-05-02 03:35:16 UTC

Moving this issue to the kernel component, since the fix is in the kernel.

Comment 31 RHEL Program Management 2012-05-02 03:49:57 UTC

This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.

Comment 33 errata-xmlrpc 2012-06-20 07:39:52 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-0862.html