Bug 888626

Summary: Windows guest consumes high CPU when doing S3/S4 with qemu-ga service enabled
Product: Red Hat Enterprise Linux 6 Reporter: Qunfang Zhang <qzhang>
Component: qemu-kvm Assignee: Jeff Cody <jcody>
Status: CLOSED WONTFIX QA Contact: Virtualization Bugs <virt-bugs>
Severity: high Docs Contact:
Priority: high    
Version: 6.4 CC: acathrow, amit.shah, areis, bsarathy, jcody, juzhang, lcapitulino, lnovich, michen, mkenneth, rhod, sluo, virt-maint
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-06-05 22:15:26 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 912287    

Description Qunfang Zhang 2012-12-19 03:23:46 UTC
Description of problem:
Boot a Win7 guest, install the Windows qemu-ga.exe, and then perform S3/S4.
(1) S3: The guest consumes 100% CPU (-smp 2) after resume, and sometimes the guest cannot resume.
(2) S4: The guest consumes 200% CPU (-smp 2) and fails to suspend to disk (I waited about 5 minutes).
(3) With the qemu-ga service stopped, S3/S4 work well.

Version-Release number of selected component (if applicable):
Host:
kernel-2.6.32-348.el6.x86_64
qemu-kvm-0.12.1.2-2.346.el6.x86_64

Guest:
Win7-32, installed the executable from qemu-guest-agent-win32-0.12.1.2-2.346.el6.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Boot a win7 guest.

 /usr/libexec/qemu-kvm -M rhel6.4.0 -cpu SandyBridge -enable-kvm -m 2048 -smp 2,sockets=2,cores=1,threads=1 -enable-kvm -name win7 -uuid 255874cf-ceee-458a-b9e7-757dcf4d97bb -k en-us -rtc base=localtime,clock=host,driftfix=slew -no-kvm-pit-reinjection -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=input0 -drive file=/home/win7-32-virtio.qcow2,if=none,id=disk0,format=qcow2,werror=stop,rerror=stop,aio=native -device ide-drive,bus=ide.0,unit=1,drive=disk0,id=disk0  -drive file=/home/boot.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,drive=drive-ide0-1-0,bus=ide.1,unit=0,id=cdrom -netdev tap,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=44:37:E6:5E:91:5E,bus=pci.0,addr=0x5 -monitor stdio -qmp tcp:0:6666,server,nowait -chardev socket,path=/tmp/isa-serial,server,nowait,id=isa1 -device isa-serial,chardev=isa1,id=isa-serial1 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x8 -chardev socket,id=charchannel0,path=/tmp/serial-socket,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,path=/tmp/foo,server,nowait,id=foo -device virtconsole,chardev=foo,id=console0 -chardev spicevmc,id=charchannel1,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x9 -spice port=5930,disable-ticketing -vga qxl -k en-us -boot c -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 -device virtserialport,bus=virtio-serial0.0,chardev=qga0,name=org.qemu.guest_agent.0 -global  PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0

2. Install the qemu-guest-agent-win32-0.12.1.2-2.346.el6.x86_64 package on a RHEL host and take the executable from it.

3. Install qemu-ga.exe inside the Windows guest.
#c:\qemu-ga>qemu-ga.exe --service install

4. In the guest, open "services.msc" and check whether the qemu-ga service is started. If not, start the qemu-ga service.

5. Suspend the guest to memory (S3) or to disk (S4).
  
Actual results:
(1) S3: The guest consumes 100% CPU (-smp 2) after resume, and sometimes the guest cannot resume.
(2) S4: The guest consumes 200% CPU (-smp 2) and fails to suspend to disk; it always gets stuck on the black screen (I waited about 5 minutes).

Expected results:
Guest S3/S4 should work well with qemu-ga service started.

Additional info:

Comment 2 Qunfang Zhang 2012-12-19 03:37:06 UTC
Host top info after resume from S3:

Swap:  8028152k total,     1032k used,  8027120k free,  4249044k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                    
32152 root      20   0 2608m 2.1g 5432 R 120.3 28.1   1:59.91 qemu-kvm     


Host top info during S4 (always stuck at the suspend stage; it never finishes):

top - 19:20:44 up 1 day,  6:05,  6 users,  load average: 1.70, 0.85, 0.36
Tasks: 225 total,   2 running, 223 sleeping,   0 stopped,   0 zombie
Cpu(s): 48.4%us,  1.7%sy,  0.0%ni, 31.5%id, 18.3%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:   7883316k total,  6805156k used,  1078160k free,   151508k buffers
Swap:  8028152k total,     1152k used,  8027000k free,  4083496k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                    
23749 root      20   0 2410m 2.0g 4572 S 202.5 26.5   2:06.45 qemu-kvm
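
For repeated measurements, the %CPU column can also be sampled non-interactively. The following is only a minimal sketch and was not part of the original test run: the function names are hypothetical, and it assumes the procps top column layout shown above, in which %CPU is the ninth field.

```shell
#!/bin/sh
# Hypothetical helpers (not from the original report).

# Extract the %CPU field from one process line of top output.
# Assumes the procps column layout shown above, where %CPU is field 9.
cpu_from_top_line() {
    printf '%s\n' "$1" | awk '{ print $9 }'
}

# Run top in batch mode for the given PID and print its %CPU.
# Two iterations are taken one second apart and the last matching
# line is used, so the value reflects the sampling interval.
sample_cpu() {
    pid=$1
    line=$(top -b -n 2 -d 1 -p "$pid" | awk -v pid="$pid" '$1 == pid' | tail -n 1)
    cpu_from_top_line "$line"
}
```

For example, "sample_cpu 32152" would print the current %CPU of the qemu-kvm process from the S3 case, assuming that PID is still alive.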

Comment 3 Amit Shah 2012-12-19 06:00:23 UTC
Can you please test Linux guest as well?

Comment 4 Qunfang Zhang 2012-12-19 06:14:44 UTC
Amit, 
This issue only happens on Windows guests.

Thanks,
Qunfang

Comment 5 Qunfang Zhang 2012-12-20 08:07:57 UTC
This issue also happens after repeating other qemu-ga commands.

After the following loop finishes:
 for i in $(seq 1 1000) ; do echo '{ "execute": "guest-ping"}' | nc -U /tmp/qga.sock ; sleep 2; echo $i;  done

the qemu-ga process inside the guest consumes 25% of the host CPU resources. Before running the above script, the qemu-ga CPU usage is 0. Even a few minutes after the script has finished, the CPU usage is still 25%.
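
To rule out malformed requests as a cause, the guest-ping loop can be written so that the JSON is quoted and each reply is checked. This is a sketch under assumptions rather than the script that was actually run: the helper names are hypothetical, the default socket path is the /tmp/qga.sock chardev from the command line in step 1, nc is assumed to support -U as in the loop above, and the check relies on qemu-ga answering a successful guest-ping with a "return" object.

```shell
#!/bin/sh
# Hypothetical helpers (not from the original report).

# Print the guest-ping request as valid JSON.
qga_ping_request() {
    printf '%s\n' '{"execute": "guest-ping"}'
}

# Succeed only if the reply contains a "return" key and no "error" key.
qga_reply_ok() {
    case $1 in
        *'"error"'*) return 1 ;;
        *'"return"'*) return 0 ;;
        *) return 1 ;;
    esac
}

# Send one guest-ping to the agent socket and check the reply.
# Assumes nc supports -U (Unix-domain sockets), as in the original loop.
qga_ping() {
    sock=${1:-/tmp/qga.sock}
    reply=$(qga_ping_request | nc -U "$sock")
    qga_reply_ok "$reply"
}
```

A loop such as "for i in $(seq 1 1000); do qga_ping || echo "ping $i failed"; sleep 2; done" would then report any request that the agent did not answer successfully.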

Comment 6 Qunfang Zhang 2012-12-20 08:59:23 UTC
Performing S3 reproduces the issue easily, whereas running the script in comment 5 reproduces it only randomly.
I tested win7-32 and win7-64, winXP-32, and win2k8-r2; all of them have the problem.

Comment 17 Ademar Reis 2014-06-05 22:15:26 UTC
S3/S4 support is tech-preview in RHEL6 and it'll be promoted to fully supported
at some point, but only in RHEL7.

Therefore we're closing all S3/S4 related bugs in RHEL6. New bugs will be
considered only if they're regressions or break some important use-case or
certification.

RHEL7 is being more extensively tested and effort from QE is underway in
certifying that this particular bug is not present there.

Please reopen with a justification if you believe this bug should not be
closed. We'll consider them on a case-by-case basis following a best effort
approach.


Thank you.