Bug 996038

Summary:

It takes 8 to 30 minutes or more to resume a RHEL guest from S4

Product:

Red Hat Enterprise Linux 6

Reporter:

Chao Yang <chayang>

Component:

qemu-kvm

Assignee:

Marcel Apfelbaum <marcel>

Status:

CLOSED WONTFIX

QA Contact:

Virtualization Bugs <virt-bugs>

Severity:

high

Priority:

high

Version:

6.5

CC:

acathrow, amit.shah, bsarathy, chayang, dyuan, flang, juzhang, michen, mkenneth, qzhang, rhod, shuang, shu, virt-maint, zhwang

Keywords:

Regression

Hardware:

Unspecified

OS:

Unspecified

Doc Type:

Bug Fix

Last Closed:

2014-03-23 12:50:32 UTC

Type:

Bug

Bug Blocks:

912287, 1056252

Attachments:

Description	Flags
dmesg from serial	none

Description Chao Yang 2013-08-12 09:29:14 UTC

Description of problem:
Booted a RHEL 6.4 guest with a newer kernel installed, suspended it to disk, then tried to resume it, but it took a long time to resume.

Version-Release number of selected component (if applicable):
qemu-kvm-0.12.1.2-2.382.el6.x86_64
seabios-0.6.1.2-28.el6.x86_64
kernel-2.6.32-407.el6.x86_64 (both guest and host)


How reproducible:
100%

Steps to Reproduce:
1. Boot a RHEL guest with:
 /usr/libexec/qemu-kvm -name test -M rhel6.5.0 -cpu host -enable-kvm -m 4096 \
   -smp 4,sockets=4,cores=2,threads=1,maxcpus=8 \
   -rtc base=utc,clock=host,driftfix=slew \
   -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
   -device ich9-usb-ehci1,id=ehci,addr=3.0 \
   -drive file=/home/usb-storage.qcow2,if=none,id=usb-storage,format=qcow2,cache=none,werror=stop,rerror=stop,aio=native \
   -device usb-storage,drive=usb-storage,id=usb_storage_1 \
   -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=4.0 \
   -chardev socket,id=channel0,path=/tmp/socket,server,nowait \
   -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=channel0,name=port0 \
   -device virtio-scsi-pci,id=scsi,addr=5.0 \
   -drive file=/home/scsi-storage.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,werror=stop,rerror=stop,aio=native \
   -device scsi-disk,bus=scsi.0,drive=drive-virtio-disk0,id=virtio-disk0,lun=0 \
   -drive file=/home/ide.qcow2,if=none,id=ide,format=qcow2,cache=none,werror=stop,rerror=stop,aio=native \
   -device ide-drive,drive=ide,bus=ide.1,unit=1 \
   -drive file=/home/device_interrupt.raw,if=none,id=drive-virtio-0-0,media=disk,format=raw,cache=none \
   -device virtio-blk-pci,drive=drive-virtio-0-0,id=virt0-0-0,bootindex=1,addr=6.0 \
   -netdev tap,id=hostnet1 \
   -device e1000,netdev=hostnet1,id=net1,mac=00:1a:4a:42:48:12,bus=pci.0,addr=a.0 \
   -netdev tap,id=hostnet0,vhost=on \
   -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:42:48:ab,bus=pci.0,addr=7.0 \
   -spice port=5900,disable-ticketing,seamless-migration=on -k en-us -vga cirrus -vnc :1 \
   -device intel-hda,id=sound0,bus=pci.0,addr=8.0 \
   -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 \
   -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=9.0 \
   -monitor stdio -serial unix:/tmp/serial,server,nowait \
   -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0

2. Suspend the guest to disk by running, inside the guest:
 echo disk > /sys/power/state

3. Resume it by starting the guest again with exactly the same command line as in step 1.

Actual results:
It took more than 8 minutes. 

Expected results:


Additional info:
1. The guest appeared to hang at "Suspending console(s) (use no_console_suspend to debug)" for a long time.
2. The guest clock caught up to real time after resuming from S4.

Comment 1 Chao Yang 2013-08-12 09:29:56 UTC

Created attachment 785624 [details]
dmesg from serial

Comment 3 Chao Yang 2013-08-12 10:01:08 UTC

I also tested S3 with the same command line; the guest resumed quickly instead of taking many minutes.

Comment 4 Ademar Reis 2013-08-12 22:37:12 UTC

(In reply to chayang from comment #0)
> Description of problem:
> Booted a RHEL 6.4 guest with a newer kernel installed, suspended it to disk,
> then tried to resume it, but it took a long time to resume.
> 

Looks like a regression then; can you confirm? Even though we don't support S3/S4, it would be good to keep this use case working, especially considering it's a RHEL6 host/RHEL6 guest setup.

Comment 5 Chao Yang 2013-08-13 03:14:20 UTC

(In reply to Ademar de Souza Reis Jr. from comment #4)
> (In reply to chayang from comment #0)
> > Description of problem:
> > Booted a RHEL 6.4 guest with a newer kernel installed, suspended it to disk,
> > then tried to resume it, but it took a long time to resume.
> > 
> 
> Looks like a regression then; can you confirm? Even though we don't support
> S3/S4, it would be good to keep this use case working, especially considering
> it's a RHEL6 host/RHEL6 guest setup.

Yes, this is a regression.
With qemu-kvm-0.12.1.2-2.355.el6.x86_64 and exactly the same command line as in comment 0, the guest resumed from S4 in less than 20 seconds. Adding the 'Regression' keyword.

Comment 7 Amit Shah 2013-08-13 11:09:29 UTC

Please narrow down the builds which introduce the regression.  Right now we know -355 is the good build and -382 is the bad one.

Comment 8 Chao Yang 2013-08-14 14:51:01 UTC

(In reply to Amit Shah from comment #7)
> Please narrow down the builds which introduce the regression.  Right now we
> know -355 is the good build and -382 is the bad one.

I retested with -382 and -381; this time it took only about 1 minute to resume. With -377, however, it took only about 5 seconds to resume.
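Narrowing the range between a known-good and a known-bad build, as requested in comment 7, amounts to a binary search over build numbers. Below is a minimal bash sketch, assuming every build number in the range exists and that the regression enters at a single build. `is_good` and `bisect_builds` are hypothetical helpers, and the good/bad verdict for each build still comes from repeating the S4 steps by hand:

```bash
#!/usr/bin/env bash
# Sketch: bisect qemu-kvm build numbers between a known-good and a known-bad build.
# Assumption: every build number in the range exists, and the regression appears
# at exactly one build (all earlier builds good, that build and later ones bad).

# Hypothetical test step: install qemu-kvm-0.12.1.2-2.<N>.el6, repeat the S4
# suspend/resume steps manually, then answer whether the resume was fast.
is_good() {
  local answer
  read -r -p "Did build -$1 resume from S4 quickly? [y/n] " answer
  [[ $answer == y ]]
}

# Prints the first bad build number, given a good build ($1) and a bad build ($2).
bisect_builds() {
  local good=$1 bad=$2 mid
  while (( bad - good > 1 )); do
    mid=$(( (good + bad) / 2 ))
    if is_good "$mid"; then
      good=$mid
    else
      bad=$mid
    fi
  done
  echo "$bad"
}
```

For example, `bisect_builds 377 381` (-377 good, -381 bad, per the retest above) would need at most two more test runs.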

Comment 9 Qunfang Zhang 2013-08-15 02:50:10 UTC

Some of our testers report that it takes more than half an hour to resume a guest from S4, so our test cases that include S4 steps are blocked.

Comment 10 langfang 2013-08-15 03:11:51 UTC

Hit this problem with the latest versions:
Host:
# uname -r
2.6.32-412.el6.x86_64
# rpm -q qemu-kvm
qemu-kvm-0.12.1.2-2.387.el6.x86_64

Guest:
2.6.32-412.el6.x86_64

Steps:
1. Boot the guest:
 /usr/libexec/qemu-kvm -M rhel6.5.0 -m 4G -smp 2 \
   -device virtio-scsi-pci,bus=pci.0,addr=0x5,id=scsi0 \
   -drive file=/mnt/rhel6.5-newinstall.qcow2,if=none,id=drive-scsi0-0-0,media=disk,cache=none,format=qcow2,werror=stop,rerror=stop,aio=native \
   -device scsi-hd,drive=drive-scsi0-0-0,bus=scsi0.0,scsi-id=0,lun=0,id=flang,bootindex=1 \
   -spice port=5840,disable-ticketing -vga qxl \
   -qmp tcp:0:5556,server,nowait \
   -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 \
   -serial unix:/tmp/tty0,server,nowait \
   -boot menu=on -monitor stdio \
   -device virtio-balloon-pci,bus=pci.0,id=balloon0 \
   -netdev tap,vhost=on,id=hostnet0,script=/etc/qemu-ifup \
   -device virtio-net-pci,netdev=hostnet0,mac=00:10:20:2c:45:23,bus=pci.0,addr=0x4,id=net0 \
   -drive file=/home/RHEL6.4-20130123.0-Server-x86_64-DVD1.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw \
   -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x7 \
   -chardev socket,id=channel0,path=/tmp/serial,server,nowait \
   -device virtserialport,chardev=channel0,name=org.linux-kvm.port.0,bus=virtio-serial0.0,id=port1

2. Suspend the guest to disk (S4)

Results:

S4 (QEMU quit after about 4 minutes) ---> booted the guest with the same command line ---> after about 20 minutes, the guest still had not resumed

Comment 13 Amit Shah 2013-08-21 08:31:01 UTC

(In reply to chayang from comment #8)
> (In reply to Amit Shah from comment #7)
> > Please narrow down the builds which introduce the regression.  Right now we
> > know -355 is the good build and -382 is the bad one.
> 
> I retested with -382 and -381, this time it just took about 1 minute to
> resume. But with -377, it only took about 5 seconds to resume.

The changes between -377 and -381 don't include anything that touches ACPI or anything else that should affect S4.

Comment 18 langfang 2013-08-27 12:13:36 UTC

Tested this on the latest versions:

Host:
# uname -r 
2.6.32-414.el6.x86_64
# rpm -q qemu-kvm
qemu-kvm-0.12.1.2-2.398.el6.x86_64
# rpm -q seabios
seabios-0.6.1.2-28.el6.x86_64


Guest: Windows


Results:
win8 (32-bit) ---> S4 ---> resume ---> successful
win8 (64-bit) ---> S4 ---> resume ---> successful
win7 (32-bit) ---> S4 ---> resume ---> successful
win7 (64-bit) ---> S4 ---> resume ---> BSOD

Bug 1001616 - win7-64 guest bsod while enter s4 state

Comment 22 Qunfang Zhang 2014-09-01 01:43:38 UTC

*** Bug 1135383 has been marked as a duplicate of this bug. ***