Bug 816893 - qemu-ga: commands may fail before a 'guest-ping'
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: qemu-kvm
Version: 6.3
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Luiz Capitulino
QA Contact: Virtualization Bugs
Duplicates: 826696
Blocks: 804141 820481 822062 831387

Reported: 2012-04-27 05:10 EDT by Qunfang Zhang
Modified: 2013-02-21 02:51 EST (History)
Fixed In Version: qemu-kvm-0.12.1.2-2.297.el6
Doc Type: Bug Fix
Last Closed: 2013-02-21 02:34:30 EST
Type: Bug


Attachments:
when second time do S4 (24.63 KB, image/png), 2012-10-09 22:52 EDT, langfang

Description Qunfang Zhang 2012-04-27 05:10:16 EDT
Description of problem:
Boot a guest and start the "qemu-ga" service inside it. Then send a command to the guest, for example {"execute":"guest-suspend-ram"} or {"execute":"guest-suspend-disk"}. Instead of suspending the guest and returning {"return": {}}, qemu-ga returns an "Unsupported" error.
If I then send {"execute":"guest-ping"} and re-send {"execute":"guest-suspend-ram"}, it works.
This issue happens when qemu-ga is started as a service inside the guest.
If I run "#qemu-ga -m virtio-serial -p /dev/virtio-ports/org.qemu.guest_agent.0" by hand instead of starting the qemu-ga service, the problem does not occur.

Version-Release number of selected component (if applicable):
Guest: 
qemu-guest-agent-0.12.1.2-2.285.el6.x86_64
kernel-2.6.32-262.el6.x86_64

Host:
kernel-2.6.32-262.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.285.el6.x86_64
seabios-0.6.1.2-19.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1.Boot a guest with virtio serial 
 /usr/libexec/qemu-kvm -M rhel6.3.0 -cpu Conroe -enable-kvm -uuid d782bf5c-e817-411b-a9cf-545ae7c0f101 -rtc base=localtime,driftfix=slew -m 8G -smp 2,sockets=1,cores=2,threads=1 -name rhel6.3-64 -drive file=/home/rhel6.3-64-virtio.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,serial=zhang,cache=none,werror=stop,rerror=stop,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,scsi=off -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:42:0b:00,bus=pci.0,addr=0x3 -monitor stdio -boot c -qmp tcp:0:5555,server,nowait -vnc :10 -device sga -device virtio-balloon-pci,id=balloon0,bus=pci.0,id=0x6 -bios /usr/share/seabios/bios-pm.bin  -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 -device  virtio-serial -device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0


2. Install qemu-guest-agent-0.12.1.2-2.285.el6.x86_64 inside guest

3. Inside guest: #service qemu-ga start

4. On host: 
#nc -U /tmp/qga.sock
{"execute":"guest-suspend-ram"}

5. On host:
{"execute":"guest-ping"}
{"execute":"guest-suspend-ram"}
  
Actual results:
After step 4: 
{"execute":"guest-suspend-ram"}
{"error": {"class": "Unsupported", "data": {}}}

After step 5:
{"execute":"guest-ping"}
{"return": {}}
{"execute":"guest-suspend-ram"}
{"return": {}}

The guest suspends to memory correctly.

Expected results:
After step 4, the guest suspends to memory successfully.

Additional info:
If step 3 is replaced with:
#qemu-ga -m virtio-serial -p /dev/virtio-ports/org.qemu.guest_agent.0
(instead of "service qemu-ga start")
the problem does not occur.
Comment 1 Luiz Capitulino 2012-04-27 08:56:07 EDT
Is this 100% reproducible or is it difficult to reproduce? If it's difficult to reproduce, then this is likely to be bug 805533.
Comment 2 Luiz Capitulino 2012-04-27 08:57:21 EDT
By the way, the bug description says "some commands", have you tested this against other commands or does it only happen with guest-suspend-ram?
Comment 3 Alex Jia 2012-04-27 11:35:20 EDT
(In reply to comment #1)
> Is this 100% reproducible or is it difficult to reproduce? If it's difficult to
> reproduce, then this is likely to be bug 805533.

Hi Luiz,
I originally found the bug on the libvirt side (bug 766958, see comment 48). If I start qemu-ga as a service, e.g. with 'service qemu-ga start', I can reproduce the issue 100% of the time.

Regards,
Alex
Comment 4 Luiz Capitulino 2012-04-27 14:20:22 EDT
Ok, I'll investigate this soon.
Comment 5 Qunfang Zhang 2012-04-27 22:56:54 EDT
(In reply to comment #1)
> Is this 100% reproducible or is it difficult to reproduce? If it's difficult to
> reproduce, then this is likely to be bug 805533.
Yes, as Alex replied, it is 100% reproducible when using the qemu-ga service.

(In reply to comment #2)
> By the way, the bug description says "some commands", have you tested this
> against other commands or does it only happen with guest-suspend-ram?

The following commands will not work before a 'guest-ping':
{"execute":"guest-suspend-ram"}
{"error": {"class": "Unsupported", "data": {}}}

{"execute":"guest-suspend-disk"}
{"error": {"class": "Unsupported", "data": {}}}

{"execute":"guest-suspend-hybrid"}
{"error": {"class": "Unsupported", "data": {}}}

{"execute":"guest-sync"}
{"error": {"class": "InvalidParameterType", "data": {"name": "id", "expected": "integer"}}}

Other supported commands work before 'guest-ping'.
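
The guest-sync error above looks different in kind from the "Unsupported" errors: the message says an integer "id" was expected, and the request was sent without arguments. A minimal Python sketch of building the request with an id and checking the echoed value follows. The helper names are hypothetical and not part of qemu-ga.

```python
import json
import random


def build_guest_sync(sync_id=None):
    """Return (id, request line) for a guest-sync call carrying an integer id."""
    if sync_id is None:
        # Any integer works; a random one helps match the reply to the request.
        sync_id = random.randint(1, 2**31 - 1)
    request = {"execute": "guest-sync", "arguments": {"id": sync_id}}
    return sync_id, json.dumps(request) + "\n"


def sync_reply_matches(reply_line, sync_id):
    """True if the agent echoed the same id back in its "return" field."""
    reply = json.loads(reply_line)
    return reply.get("return") == sync_id
```

The request line can be pasted into the nc session; sync_reply_matches() returns False for an error reply such as the InvalidParameterType one shown above.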
Comment 6 Luiz Capitulino 2012-05-02 12:45:17 EDT
Thanks a lot Qunfang for the clarification.

Haven't started looking at this yet, is this urgent?
Comment 7 Daniel Veillard 2012-05-08 03:51:03 EDT
Cf. comment 63 on bug 766958: this is annoying for QE testing of the
S4 feature, at least!

Daniel
Comment 8 Luiz Capitulino 2012-05-09 16:22:12 EDT
I've started looking at this today, so let me update you about my progress.

First, yes, it's a qemu-ga bug and I can reproduce it on RHEL6.3 and on upstream. It took me a long time to debug this because most of the debugging code I added made the bug go away. Even enabling logging in an upstream qemu-ga setup makes the bug go away.

One important piece of information is that the bug only happens when --daemon is passed to qemu-ga. That's probably why it works if you run it by hand. Another is that I already know what causes the Unsupported error to be returned, although I don't know why it happens or how --daemon is related. Lastly, a call to g_logv() seems to serialize things, and that's why guest-ping makes things work.

We have three options here:

1. keep investigating the real cause and propose a fix
2. (test and if it works) backport the "make guest-shutdown and guest-suspend-* synchronous" patches (not posted upstream yet though)
3. Add a hack to libvirt to issue guest-ping before calling the suspend functions

I'm almost sure this is a race or a stupid bug in bios_supports_mode(), and since the series mentioned in item 2 reworks that function entirely, I really think that series is going to be the upstream fix.

In any case, I'll debug this a bit more tomorrow and could do item 1 for RHEL6.3 if it turns out to be simple.

Item 3 should be our last resort.
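
For illustration only, the ping-first workaround from item 3 could look roughly like the Python sketch below. It is not libvirt code: the function names are hypothetical, and it assumes each agent reply arrives as a single newline-terminated JSON line, as in the nc transcripts in this report. It has no timeout handling, which a real caller would want for suspend commands.

```python
import json
import socket


def qga_call(stream, command, arguments=None):
    """Send one command on an open agent stream and return the decoded reply."""
    request = {"execute": command}
    if arguments is not None:
        request["arguments"] = arguments
    stream.write((json.dumps(request) + "\n").encode())
    stream.flush()
    return json.loads(stream.readline())


def ping_then_call(sock_path, command, arguments=None):
    """Issue guest-ping first, then the real command (the item 3 workaround)."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(sock_path)
        with sock.makefile("rwb") as stream:
            ping = qga_call(stream, "guest-ping")
            if "error" in ping:
                raise RuntimeError(f"guest-ping failed: {ping['error']}")
            return qga_call(stream, command, arguments)
```

With the socket path used in this report, a call would be ping_then_call("/tmp/qga.sock", "guest-suspend-ram").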
Comment 9 Luiz Capitulino 2012-05-09 16:32:42 EDT
Oh, I knew it could be something "stupid". Looks like we're messing with fds in qemu-ga. Will only have time to fully confirm this tomorrow though.
Comment 10 Luiz Capitulino 2012-05-10 15:56:56 EDT
Yes, this is caused by a bug in how qemu-ga handles its fds. I've posted the fix upstream (will post the link here when it appears in the archive).
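
The exact mechanism is not spelled out in this comment. As general background, and as an assumption about the class of bug rather than a description of the actual fix: on POSIX systems a newly created descriptor gets the lowest free number, so a daemon that closes its standard fds without reopening them can have a later pipe() land on fd 0, 1 or 2. A small Python sketch showing that behavior in a forked child (the function name is hypothetical):

```python
import os


def pipe_fds_after_closing_stdin():
    """Fork a child that closes fd 0, then report the fds its pipe() returns."""
    report_r, report_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: mimic a daemon that closes stdin without reopening it.
        try:
            os.close(report_r)
            try:
                os.close(0)
            except OSError:
                pass  # fd 0 was already closed in the parent
            r, w = os.pipe()
            os.write(report_w, f"{r},{w}".encode())
        finally:
            os._exit(0)
    # Parent: read the child's report and reap it.
    os.close(report_w)
    data = os.read(report_r, 64)
    os.close(report_r)
    os.waitpid(pid, 0)
    r, w = data.decode().split(",")
    return int(r), int(w)
```

The read end comes back as fd 0 in the child. Reopening fds 0, 1 and 2 on /dev/null after daemonizing is the usual way to keep later descriptors from reusing those numbers.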

Now, we're past snapshot3 for RHEL6.3 and from now on only blockers will be accepted. This obviously isn't a blocker, but will certainly impact suspend testing. So I'll check if it's possible to get this into a z-stream, otherwise this will have to be moved to 6.4.
Comment 12 Luiz Capitulino 2012-05-10 17:08:50 EDT
Upstream fix:

http://lists.gnu.org/archive/html/qemu-devel/2012-05/msg01507.html
Comment 14 Luiz Capitulino 2012-05-14 14:25:48 EDT
As this one has missed the deadline, I'll get it fixed for 6.4 first and then will propose it for 6.3.z.

This is not urgent, but I think it can hurt qemu-ga testing.
Comment 15 Luiz Capitulino 2012-05-31 10:22:19 EDT
*** Bug 826696 has been marked as a duplicate of this bug. ***
Comment 20 langfang 2012-10-09 06:45:14 EDT
Tested this bug with the following versions:
host:
# uname -r
2.6.32-315.el6.x86_64
# rpm -q qemu-kvm
qemu-kvm-0.12.1.2-2.320.el6.x86_64
guest:
# uname -r
2.6.32-325.el6.x86_64

steps:
1.boot guest:
/usr/libexec/qemu-kvm -M rhel6.3.0 -cpu Penryn -enable-kvm -uuid `uuidgen` -rtc base=localtime,driftfix=slew -m 8G -smp 2,sockets=1,cores=2,threads=1 -name rhel6.3-64 -drive file=/home/RHEL-Server-6.3-64-virtio.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,serial=zhang,cache=none,werror=stop,rerror=stop,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,scsi=off -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=net0,mac=04:1a:4a:42:0b:00,bus=pci.0,addr=0x3 -monitor stdio -boot c -qmp tcp:0:5555,server,nowait -vnc :10 -device sga -device virtio-balloon-pci,id=balloon0,bus=pci.0,id=0x6  -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 -device  virtio-serial -device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0  -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0
2. Inside guest: #service qemu-ga start

3. On host:
# nc -U /tmp/qga.sock
{"execute":"guest-suspend-ram"}
{"error": {"class": "Unsupported", "data": {}}}
{"execute":"guest-suspend-hybrid"}
{"error": {"class": "Unsupported", "data": {}}}
{"execute":"guest-ping"}
{"return": {}}
{"execute":"guest-suspend-ram"}--->do S3
{"return": {}}
{"execute":"guest-suspend-disk"}--->do S4
{"return": {}}

4. Boot the guest with the same CLI (the guest resumes from S4)
5. Inside guest:
#service qemu-ga status
qemu-ga (pid  2227) is running ...
6. On host:
# nc -U /tmp/qga.sock
{"execute":"guest-suspend-hybrid"}
{"error": {"class": "Unsupported", "data": {}}}
{"execute":"guest-suspend-ram"}
{"return": {}}
{"execute":"guest-suspend-hybrid"}
{"error": {"class": "Unsupported", "data": {}}}
{"execute":"guest-suspend-hybrid"}
{"error": {"class": "Unsupported", "data": {}}}
{"execute":"guest-suspend-ram"}--->do S3
{"return": {}}



Per the above test, this issue still exists, so I am reassigning this bug.
Comment 21 Luiz Capitulino 2012-10-09 08:49:33 EDT
Which version of qemu-ga did you install in the *guest*? You should install qemu-guest-agent .297 or later.
Comment 22 langfang 2012-10-09 22:49:53 EDT
Luiz, thank you very much for the reminder.

Verified this bug again with the following versions:
host:
# uname -r
2.6.32-315.el6.x86_64
# rpm -q qemu-kvm
qemu-kvm-0.12.1.2-2.320.el6.x86_64
guest:
# uname -r
2.6.32-325.el6.x86_64
qemu-guest-agent-0.12.1.2-2.321.el6.x86_64


steps:
1.boot guest 
/usr/libexec/qemu-kvm -M rhel6.3.0 -cpu Penryn -enable-kvm -uuid `uuidgen` -rtc base=localtime,driftfix=slew -m 8G -smp 2,sockets=1,cores=2,threads=1 -name rhel6.3-64 -drive file=/home/RHEL-Server-6.3-64-virtio.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,serial=zhang,cache=none,werror=stop,rerror=stop,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,scsi=off -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=net0,mac=04:1a:4a:42:0b:00,bus=pci.0,addr=0x3 -monitor stdio -boot c -qmp tcp:0:5555,server,nowait -vnc :10 -device sga -device virtio-balloon-pci,id=balloon0,bus=pci.0,id=0x6  -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 -device  virtio-serial -device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0  -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0
2.Inside guest: #service qemu-ga start

3. On host:
# nc -U /tmp/qga.sock
{"execute":"guest-suspend-hybrid"}
{"error": {"class": "Unsupported", "data": {}}}
{"execute":"guest-sync"}
{"error": {"class": "InvalidParameterType", "data": {"name": "id", "expected": "integer"}}}
{"execute":"guest-suspend-ram"}--->do S3
{"execute":"guest-suspend-disk"}---->do S4
4. Boot the guest with the same CLI
Inside guest:
#service qemu-ga status
qemu-ga (pid  2227) is running ...

on host:
# nc -U /tmp/qga.sock
{"execute":"guest-sync","arguments":{"id":1234}}
{"return": 1234}
{"execute":"guest-suspend-hybrid"}
{"error": {"class": "Unsupported", "data": {}}}
{"execute":"guest-suspend-ram"}--->second S3
{"execute":"guest-suspend-disk"}--->second S4; the guest hangs (see the attachment showing the guest display)


In the above test, after running "service qemu-ga start", the guest can suspend to mem/disk successfully the first time, but after resuming, the second S4 hangs the guest.
Comment 23 langfang 2012-10-09 22:52:15 EDT
Created attachment 624476 [details]
when second time do S4
Comment 24 langfang 2012-10-09 23:11:06 EDT
When the guest hangs:

#top
Tasks: 161 total,   1 running, 160 sleeping,   0 stopped,   0 zombie
Cpu(s): 25.1%us,  0.0%sy,  0.0%ni, 74.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   7509328k total,  5932360k used,  1576968k free,    48968k buffers
Swap: 58720240k total,      628k used, 58719612k free,  3930996k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                          
 1968 root      20   0 8764m 1.5g 4408 S 101.4 21.6   2:29.89 qemu-kvm  


Additional info:
I tried replacing step 2 of comment 22 with
#qemu-ga -m virtio-serial -p /dev/virtio-ports/org.qemu.guest_agent.0
(instead of "service qemu-ga start")
and the problem also occurs.

Note: when the guest hangs, wait about 2 minutes and the guest will show a Call Trace.

Do you think we need to report another bug to track this problem?
Comment 25 langfang 2012-10-09 23:21:32 EDT
Based on the above test, I will change this bug to VERIFIED. As for the problem I hit, I will check whether there is an existing bug or whether it is a new issue. Thanks.
Comment 26 Luiz Capitulino 2012-10-10 09:36:07 EDT
Yes, the issue you found is unrelated to qemu-ga and this bug. It's either a qemu issue or a guest kernel issue.

Please, open a new bz for it. I also recommend reproducing the problem without qemu-ga (ie. doing echo mem > /sys/power/state directly), as this will make it easier to investigate the problem.
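
Before writing to /sys/power/state directly, it can help to check which sleep states the guest kernel advertises. A minimal Python sketch, to be run inside the guest; the function names are hypothetical, and it assumes the file holds a space-separated list of states such as "mem disk":

```python
from pathlib import Path

STATE_FILE = Path("/sys/power/state")


def parse_sleep_states(text):
    """Split the contents of /sys/power/state into a set of state names."""
    return set(text.split())


def guest_supports(state, state_file=STATE_FILE):
    """True if `state` (e.g. "mem" or "disk") is listed by the kernel."""
    try:
        return state in parse_sleep_states(state_file.read_text())
    except OSError:
        # File missing or unreadable: treat as no suspend support.
        return False
```

If guest_supports("mem") is true, the echo command above should be accepted by the kernel; whether the guest then suspends and resumes cleanly is the thing under test.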
Comment 27 langfang 2012-10-12 01:05:52 EDT
(In reply to comment #26)
> Yes, the issue you found is unrelated to qemu-ga and this bug. It's either,
> a qemu issue or a guest kernel issue.
> 
> Please, open a new bz for it. I also recommend reproducing the problem
> without qemu-ga (ie. doing echo mem > /sys/power/state directly), as this
> will make it easier to investigate the problem.

Hi Luiz, thanks very much for your suggestion. For this issue, I have opened a new bug: https://bugzilla.redhat.com/show_bug.cgi?id=864780.
Comment 29 errata-xmlrpc 2013-02-21 02:34:30 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0527.html
