Bug 816893
| Summary: | qemu-ga: commands may fail before a 'guest-ping' | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Qunfang Zhang <qzhang> | ||||
| Component: | qemu-kvm | Assignee: | Luiz Capitulino <lcapitulino> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | high | ||||||
| Version: | 6.3 | CC: | acathrow, ajia, areis, bsarathy, bugproxy, dyasny, flang, jcody, jkachuck, juzhang, lcapitulino, michen, minovotn, mkenneth, veillard, virt-maint | ||||
| Target Milestone: | rc | ||||||
| Target Release: | --- | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | qemu-kvm-0.12.1.2-2.297.el6 | Doc Type: | Bug Fix | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2013-02-21 07:34:30 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | |||||||
| Bug Blocks: | 804141, 820481, 822062, 831387 | ||||||
| Attachments: |
Is this 100% reproducible or is it difficult to reproduce? If it's difficult to reproduce, then this is likely to be bug 805533.

By the way, the bug description says "some commands". Have you tested this against other commands, or does it only happen with guest-suspend-ram?

(In reply to comment #1)
> Is this 100% reproducible or is it difficult to reproduce? If it's difficult to
> reproduce, then this is likely to be bug 805533.
Hi Luiz,
I originally found the bug on the libvirt side (bug 766958, see comment 48). If I start qemu-ga as a service, such as 'service qemu-ga start', then I can reproduce the issue 100% of the time.
Regards,
Alex

Ok, I'll investigate this soon.

(In reply to comment #1)
> Is this 100% reproducible or is it difficult to reproduce? If it's difficult to
> reproduce, then this is likely to be bug 805533.
Yes, as Alex replied, when using the qemu-ga service it's 100% reproducible.
(In reply to comment #2)
> By the way, the bug description says "some commands", have you tested this
> against other commands or does it only happen with guest-suspend-ram?
The following commands do not work before a 'guest-ping':
{"execute":"guest-suspend-ram"}
{"error": {"class": "Unsupported", "data": {}}}
{"execute":"guest-suspend-disk"}
{"error": {"class": "Unsupported", "data": {}}}
{"execute":"guest-suspend-hybrid"}
{"error": {"class": "Unsupported", "data": {}}}
{"execute":"guest-sync"}
{"error": {"class": "InvalidParameterType", "data": {"name": "id", "expected": "integer"}}}
Other supported commands work before 'guest-ping'.

Thanks a lot Qunfang for the clarification.

Haven't started looking at this yet, is this urgent?

C.f. comment 63 on bug 766958: that's annoying for QE testing of the S4 feature at least!
Daniel

I've started looking at this today, so let me update you about my progress.
First, yes, it's a qemu-ga bug and I can reproduce it on RHEL6.3 and on upstream. It took me very long to debug this because most of the debugging code I added made the bug go away. Even enabling logging on an upstream qemu-ga setup makes the bug go away.
One important piece of information is that the bug only happens when --daemon is passed to qemu-ga. That's probably why it works if you run it by hand.
Another important piece of information is that I already know what causes the Unsupported error to be returned, although I don't know why it happens nor why --daemon is related.
Lastly, a call to g_logv() seems to serialize things and that's why guest-ping makes things work.
We have three options here:
1. keep investigating the real cause and propose a fix
2. (test and if it works) backport the "make guest-shutdown and guest-suspend-* synchronous" patches (not posted upstream yet though)
3. Add a hack to libvirt to issue guest-ping before calling the suspend functions
As I'm almost sure this is a race or stupid bug in bios_supports_mode() and as the series mentioned in item 2 re-works that function entirely, I really think that that's going to be the upstream fix. In any case, I'll debug this a bit more tomorrow and could do item 1 for RHEL6.3 if it turns out to be simple. Item 3 should be our last resort.

Oh, knew it could be something "stupid". Looks like we're messing with fds in qemu-ga. Will only have time to fully confirm this tomorrow though.

Yes, this is caused by a bug in how qemu-ga handles its fds. I've posted the fix upstream (will post the link here when it appears in the archive).
Now, we're past snapshot3 for RHEL6.3 and from now on only blockers will be accepted. This obviously isn't a blocker, but it will certainly impact suspend testing. So I'll check if it's possible to get this into a z-stream, otherwise this will have to be moved to 6.4.

As this one has missed the deadline, I'll get it fixed for 6.4 first and then will propose it for 6.3.z. This is not urgent, but I think it can hurt qemu-ga testing.

*** Bug 826696 has been marked as a duplicate of this bug. ***

Tested this bug with the following versions:
host:
# uname -r
2.6.32-315.el6.x86_64
# rpm -q qemu-kvm
qemu-kvm-0.12.1.2-2.320.el6.x86_64
guest:
# uname -r
2.6.32-325.el6.x86_64
steps:
1.boot guest:
/usr/libexec/qemu-kvm -M rhel6.3.0 -cpu Penryn -enable-kvm -uuid `uuidgen` -rtc base=localtime,driftfix=slew -m 8G -smp 2,sockets=1,cores=2,threads=1 -name rhel6.3-64 -drive file=/home/RHEL-Server-6.3-64-virtio.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,serial=zhang,cache=none,werror=stop,rerror=stop,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,scsi=off -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=net0,mac=04:1a:4a:42:0b:00,bus=pci.0,addr=0x3 -monitor stdio -boot c -qmp tcp:0:5555,server,nowait -vnc :10 -device sga -device virtio-balloon-pci,id=balloon0,bus=pci.0,id=0x6 -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 -device virtio-serial -device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0 -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0
2.Inside guest: #service qemu-ga start
3. On host:
nc -U /tmp/qga.sock
{"execute":"guest-suspend-ram"}
{"error": {"class": "Unsupported", "data": {}}}
{"execute":"guest-suspend-hybrid"}
{"error": {"class": "Unsupported", "data": {}}}
{"execute":"guest-ping"}
{"return": {}}
{"execute":"guest-suspend-ram"}--->do S3
{"return": {}}
{"execute":"guest-suspend-disk"}--->do S4
{"return": {}}
4.boot guest with same CLI (guest resume from S4)
5.Inside guest:
#service qemu-ga status
qemu-ga (pid 2227) is running ...
6.on host
# nc -U /tmp/qga.sock
{"execute":"guest-suspend-hybrid"}
{"error": {"class": "Unsupported", "data": {}}}
{"execute":"guest-suspend-ram"}
{"return": {}}
{"execute":"guest-suspend-hybrid"}
{"error": {"class": "Unsupported", "data": {}}}
{"execute":"guest-suspend-hybrid"}
{"error": {"class": "Unsupported", "data": {}}}
{"execute":"guest-suspend-ram"}--->do S3
{"return": {}}
From the above test, this issue still exists, so I am reassigning this bug.

Which version of qemu-ga did you install in the *guest*? You should install qemu-guest-agent .297 or later.

Luiz, thank you very much for the reminder.
Verified this bug again with the following versions:
host:
# uname -r
2.6.32-315.el6.x86_64
# rpm -q qemu-kvm
qemu-kvm-0.12.1.2-2.320.el6.x86_64
guest:
# uname -r
2.6.32-325.el6.x86_64
qemu-guest-agent-0.12.1.2-2.321.el6.x86_64
steps:
1.boot guest
/usr/libexec/qemu-kvm -M rhel6.3.0 -cpu Penryn -enable-kvm -uuid `uuidgen` -rtc base=localtime,driftfix=slew -m 8G -smp 2,sockets=1,cores=2,threads=1 -name rhel6.3-64 -drive file=/home/RHEL-Server-6.3-64-virtio.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,serial=zhang,cache=none,werror=stop,rerror=stop,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,scsi=off -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=net0,mac=04:1a:4a:42:0b:00,bus=pci.0,addr=0x3 -monitor stdio -boot c -qmp tcp:0:5555,server,nowait -vnc :10 -device sga -device virtio-balloon-pci,id=balloon0,bus=pci.0,id=0x6 -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 -device virtio-serial -device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0 -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0
2.Inside guest: #service qemu-ga start
3. On host:
# nc -U /tmp/qga.sock
{"execute":"guest-suspend-hybrid"}
{"error": {"class": "Unsupported", "data": {}}}
{"execute":"guest-sync"}
{"error": {"class": "InvalidParameterType", "data": {"name": "id", "expected": "integer"}}}
{"execute":"guest-suspend-ram"}--->do S3
{"execute":"guest-suspend-disk"}---->do S4
4.boot guest with same CLI
inside guest:
#service qemu-ga status
qemu-ga (pid 2227) is running ...
on host:
# nc -U /tmp/qga.sock
{"execute":"guest-sync","arguments":{"id":1234}}
{"return": 1234}
{"execute":"guest-suspend-hybrid"}
{"error": {"class": "Unsupported", "data": {}}}
{"execute":"guest-suspend-ram"}-->second time do S3
{"execute":"guest-suspend-disk"}--->second time do S4,guest hang,see the attachment about show on guest
From the above test: after running "service qemu-ga start", the guest can suspend to memory/disk successfully the first time, but after resume, doing S4 a second time makes the guest hang.
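
For scripted verification, the guest-sync exchanges shown in the transcript above (an InvalidParameterType error when "id" is missing, and the id echoed back when it is supplied) can be generated and checked mechanically. The following is only a minimal sketch: `build_guest_sync` and `classify_reply` are hypothetical helper names invented for this note, and they are not part of qemu-ga or libvirt.

```python
import json


def build_guest_sync(sync_id):
    """Return the JSON text of a guest-sync request carrying an integer id.

    Hypothetical helper: the request shape is copied from the transcript
    in this bug, {"execute":"guest-sync","arguments":{"id":1234}}.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(sync_id, bool) or not isinstance(sync_id, int):
        raise TypeError("guest-sync id must be an integer")
    return json.dumps({"execute": "guest-sync", "arguments": {"id": sync_id}})


def classify_reply(line):
    """Decode one reply line into ("return", value) or ("error", class)."""
    reply = json.loads(line)
    if "error" in reply:
        return ("error", reply["error"].get("class"))
    return ("return", reply.get("return"))
```

With these helpers, a reply of ("error", "Unsupported") to a suspend command can be told apart from a successful ("return", ...) reply without reading the transcript by eye.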
Created attachment 624476 [details]
when second time do S4
When the guest hangs:
# top
Tasks: 161 total, 1 running, 160 sleeping, 0 stopped, 0 zombie
Cpu(s): 25.1%us, 0.0%sy, 0.0%ni, 74.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 7509328k total, 5932360k used, 1576968k free, 48968k buffers
Swap: 58720240k total, 628k used, 58719612k free, 3930996k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1968 root 20 0 8764m 1.5g 4408 S 101.4 21.6 2:29.89 qemu-kvm

Additional info: I tried step 2 from comment 22 using
# qemu-ga -m virtio-serial -p /dev/virtio-ports/org.qemu.guest_agent.0
(instead of "service qemu-ga start") and also hit the problem.
Note: when the guest hangs, wait about 2 minutes and the guest will show a call trace.
Do you think we need to report another bug to track this problem?

Based on the above test, I will change this bug to verified. As for the problem I hit, I will check whether there is an existing bug or whether it is a new issue. Thanks.

Yes, the issue you found is unrelated to qemu-ga and this bug. It's either a qemu issue or a guest kernel issue.
Please open a new bz for it. I also recommend reproducing the problem without qemu-ga (i.e. doing echo mem > /sys/power/state directly), as this will make it easier to investigate the problem.

(In reply to comment #26)
> Yes, the issue you found is unrelated to qemu-ga and this bug. It's either,
> a qemu issue or a guest kernel issue.
>
> Please, open a new bz for it. I also recommend reproducing the problem
> without qemu-ga (ie. doing echo mem > /sys/power/state directly), as this
> will make it easier to investigate the problem.
Hi Luiz, thanks very much for your suggestion. I have opened a new bug for this issue: https://bugzilla.redhat.com/show_bug.cgi?id=864780.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHBA-2013-0527.html
Description of problem:
Boot a guest and start the "qemu-ga" service inside the guest. Then send some commands to the guest, for example {"execute":"guest-suspend-ram"} or {"execute":"guest-suspend-disk"}. Instead of suspending the guest and returning {"return": {}}, the agent returns an "Unsupported" error. If I then send {"execute":"guest-ping"} and re-send {"execute":"guest-suspend-ram"}, it works.
This issue happens when qemu-ga is started as a service inside the guest. If I run "# qemu-ga -m virtio-serial -p /dev/virtio-ports/org.qemu.guest_agent.0" instead of starting the qemu-ga service, the problem does not occur.

Version-Release number of selected component (if applicable):
Guest:
qemu-guest-agent-0.12.1.2-2.285.el6.x86_64
kernel-2.6.32-262.el6.x86_64
Host:
kernel-2.6.32-262.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.285.el6.x86_64
seabios-0.6.1.2-19.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot a guest with virtio serial
/usr/libexec/qemu-kvm -M rhel6.3.0 -cpu Conroe -enable-kvm -uuid d782bf5c-e817-411b-a9cf-545ae7c0f101 -rtc base=localtime,driftfix=slew -m 8G -smp 2,sockets=1,cores=2,threads=1 -name rhel6.3-64 -drive file=/home/rhel6.3-64-virtio.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,serial=zhang,cache=none,werror=stop,rerror=stop,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,scsi=off -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:42:0b:00,bus=pci.0,addr=0x3 -monitor stdio -boot c -qmp tcp:0:5555,server,nowait -vnc :10 -device sga -device virtio-balloon-pci,id=balloon0,bus=pci.0,id=0x6 -bios /usr/share/seabios/bios-pm.bin -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 -device virtio-serial -device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0
2. Install qemu-guest-agent-0.12.1.2-2.285.el6.x86_64 inside guest
3. Inside guest:
# service qemu-ga start
4. On host:
# nc -U /tmp/qga.sock
{"execute":"guest-suspend-ram"}
5. On host:
{"execute":"guest-ping"}
{"execute":"guest-suspend-ram"}

Actual results:
After step 4:
{"execute":"guest-suspend-ram"}
{"error": {"class": "Unsupported", "data": {}}}
After step 5:
{"execute":"guest-ping"}
{"return": {}}
{"execute":"guest-suspend-ram"}
{"return": {}}
The guest suspends to memory correctly.

Expected results:
After step 4, the guest suspends to memory successfully.

Additional info:
If step 3 is replaced with
# qemu-ga -m virtio-serial -p /dev/virtio-ports/org.qemu.guest_agent.0
(instead of "service qemu-ga start"), the problem goes away.
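
The sequence in steps 4 and 5 (a guest-ping followed by the real command) can also be driven from a script instead of typing into nc. The following is only a rough sketch of that workaround for affected agent versions, not the fix that was shipped. The function names `qga_call`, `ping_then_run` and `connect` are hypothetical; the socket path /tmp/qga.sock is taken from the command line in step 1, and the 5 second timeout is an arbitrary assumption, added because the later verification transcripts show no reply line after a successful suspend.

```python
import json
import socket


def qga_call(chan, command, arguments=None):
    """Send one request on `chan` and return the decoded reply, or None.

    `chan` is a binary read/write file object wrapping the agent socket,
    for example the result of sock.makefile("rwb").
    """
    request = {"execute": command}
    if arguments is not None:
        request["arguments"] = arguments
    chan.write(json.dumps(request).encode() + b"\n")
    chan.flush()
    try:
        line = chan.readline()
    except socket.timeout:
        return None  # no reply arrived before the socket timeout
    # An empty read means the peer closed the connection.
    return json.loads(line) if line.strip() else None


def ping_then_run(chan, command):
    """Issue guest-ping first, then `command`; return the reply to `command`."""
    ping = qga_call(chan, "guest-ping")
    if ping is None or "error" in ping:
        raise RuntimeError("guest-ping failed: %r" % (ping,))
    return qga_call(chan, command)


def connect(path="/tmp/qga.sock", timeout=5.0):
    """Open the host-side agent socket (path assumed from step 1)."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    sock.connect(path)
    return sock.makefile("rwb")
```

On the host, ping_then_run(connect(), "guest-suspend-ram") would then correspond to step 5. A return value of None only means that no reply line was read, so it should not be treated as an error by itself.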