Bug 1049858

| Field | Value |
|---|---|
| Summary | Guest agent commands hang after restoring the guest from a save file |
| Product | Red Hat Enterprise Linux 6 |
| Component | qemu-kvm |
| Version | 6.5 |
| Hardware | x86_64 |
| OS | Unspecified |
| Status | CLOSED WONTFIX |
| Severity | medium |
| Priority | unspecified |
| Target Milestone | rc |
| Target Release | --- |
| Reporter | zhenfeng wang <zhwang> |
| Assignee | Ademar Reis <areis> |
| QA Contact | Virtualization Bugs <virt-bugs> |
| CC | acathrow, ajia, amit.shah, areis, bsarathy, dyuan, gsun, jdenemar, jiahu, juzhang, juzhou, marcel, mkenneth, mzhan, qzhang, sluo, virt-maint, ydu, zhwang |
| Doc Type | Bug Fix |
| Type | Bug |
| Clones | 1049860 (view as bug list) |
| Bug Blocks | 912287 |
| Last Closed | 2014-06-05 22:16:45 UTC |
| Attachments | 847502 (libvirtd log while the guest agent hangs), 847551 (libvirtd log while the guest agent hangs, RHEL 7 host) |
Description

zhenfeng wang, 2014-01-08 11:33:58 UTC

Can you provide libvirtd logs?

Created attachment 847502 [details]
The libvirtd log captured while the guest agent hangs
According to the libvirt logs, qemu-ga responded to the "guest-sync" command and libvirt is waiting for the "guest-suspend-ram" command to either return an error or result in a suspended domain. This is a known issue with the qemu-ga design and our interaction with it, which is covered by bug 1028927. The question is why qemu-ga does not report any error while still failing to actually suspend the guest. I'm moving this bug to qemu-kvm for further investigation. BTW, what OS runs in the guest? And does changing it (as in RHEL 6 vs. RHEL 7) make any difference?

*** Bug 1049860 has been marked as a duplicate of this bug. ***

Hi Jiri,
My guest OS is a RHEL 6 guest, and I can also reproduce this issue on RHEL 7 with a RHEL 7 guest; I will attach the RHEL 7 libvirtd log later. BTW, bug 1028927 was cloned from bug 890648, and as Peter said in bug 890648 comment 12, the issue in this bug is separate from bug 890648. So I think bug 1028927 cannot cover bug 1049860; we should still regard 1049860 as a bug cloned from this one and continue to use it to track this issue in RHEL 7, right? Please help recheck it, thanks.

Ah, thanks. In that case, I'll reopen 1049860 as this issue affects both RHEL-6 and RHEL-7 guests.

Created attachment 847551 [details]
The libvirtd log captured while the guest agent hangs on a RHEL 7 host
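As a side note on the analysis above: the same guest-sync / guest-suspend-ram exchange that libvirt performs can be driven by hand against the agent socket, which makes it easier to see whether the agent answers at all after a restore. This is a minimal sketch, assuming a guest started with -chardev socket,path=/tmp/qga.sock,... as in the transcripts below; the socket path and the guest-sync id value are illustrative, not taken from the bug report.

```
# nc -U /tmp/qga.sock              <- socket path is an assumption for this sketch
{"execute":"guest-sync","arguments":{"id":1234}}
{"return": 1234}
{"execute":"guest-suspend-ram"}
```

guest-sync should echo the id back. guest-suspend-ram normally returns no reply on success (the guest simply enters S3), so the interesting signal here is either a JSON error, or a prompt that stays silent while the guest visibly fails to suspend.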
Tried this scenario using savevm/loadvm with the guest agent; the guest agent commands did not hang after restoring the guest from the saved snapshot.

host info:
# uname -r && rpm -q qemu-kvm-rhev
2.6.32-448.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.424.el6.x86_64

guest info:
2.6.32-448.el6.x86_64
qemu-guest-agent-0.12.1.2-2.424.el6.x86_64

Steps:
1. Launch a QEMU guest.
# /usr/libexec/qemu-kvm -M pc -S -cpu SandyBridge -enable-kvm -m 2048 -smp 2,sockets=2,cores=1,threads=1 -no-kvm-pit-reinjection -usb -device usb-tablet,id=input0 -name sluo -uuid 990ea161-6b67-47b2-b803-19fb01d30d30 -rtc base=localtime,clock=host,driftfix=slew -device virtio-serial-pci,id=virtio-serial0,max_ports=16,vectors=0,bus=pci.0,addr=0x3 -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 -device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0 -drive file=/home/RHEL6.5-20131019.1_Server_x86_64.qcow2,if=none,id=drive-virtio-disk,format=qcow2,cache=none,aio=native,werror=stop,rerror=stop -device virtio-blk-pci,vectors=0,bus=pci.0,addr=0x4,scsi=off,drive=drive-virtio-disk,id=virtio-disk,bootindex=1 -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=virtio-net-pci0,mac=00:01:02:B6:40:21,bus=pci.0,addr=0x5 -device virtio-balloon-pci,id=ballooning,bus=pci.0,addr=0x6 -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -k en-us -boot menu=on -qmp tcp:0:4444,server,nowait -serial unix:/tmp/ttyS0,server,nowait -vnc :2 -spice disable-ticketing,port=5932 -monitor stdio

2. Install the guest agent RPM and start the qemu-ga service in the guest.
# service qemu-ga restart
Stopping qemu-ga: [ OK ]
Starting qemu-ga: [ OK ]

3. Check whether the guest agent works.
# nc -U /tmp/qga.sock
{"execute":"guest-ping"}
{"return": {}}

4. savevm.
(qemu) savevm snap1
(qemu) info status
VM status: running

5. Check whether the guest agent works.
{"execute":"guest-ping"}
{"return": {}}

6. loadvm.
(qemu) loadvm snap1
inputs_detach_tablet:
(qemu) info status
VM status: running

7. Check whether the guest agent works.
{"execute":"guest-ping"}
{"return": {}}
{"execute":"guest-shutdown"} <----- shuts down the guest correctly.

Best Regards,
sluo

Also tried a Windows guest (Win7 64-bit); the result is the same as comment #8.

areis, is virsh save/restore equivalent to QEMU savevm/loadvm? I cannot reproduce this issue with the QEMU command line.

host info:
# uname -r && rpm -q qemu-kvm-rhev
2.6.32-448.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.424.el6.x86_64
guest info:
win7 64bit
qemu-ga-win-7.0-8
virtio-win-prewhql-0.1-79

Best Regards,
sluo

(In reply to Sibiao Luo from comment #9)
> Also tried a Windows guest (Win7 64-bit); the result is the same as comment #8.
>
> areis, is virsh save/restore equivalent to QEMU savevm/loadvm? I cannot
> reproduce this issue with the QEMU command line.

Jiri, can you please tell us what's the difference between the direct save/restore procedure (by hand) and the save/restore feature from virsh?

> host info:
> # uname -r && rpm -q qemu-kvm-rhev
> 2.6.32-448.el6.x86_64
> qemu-kvm-rhev-0.12.1.2-2.424.el6.x86_64
> guest info:
> win7 64bit
> qemu-ga-win-7.0-8
> virtio-win-prewhql-0.1-79

savevm/loadvm is used by libvirt for doing snapshots, which is a bit different. The save/restore virsh commands make use of migration. That is, save will migrate the domain to a file, and restore will take the file and feed it to a new qemu process with the -incoming option.
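To make the distinction above concrete: virsh save roughly amounts to stopping the vCPUs and running a migration whose target is a file, and virsh restore roughly amounts to starting a new qemu process with the same command line plus -incoming pointing at that file. The following is a minimal by-hand sketch; the state-file path is illustrative, and the next comment demonstrates the same idea with gzip.

```
(qemu) stop
(qemu) migrate "exec:cat > /tmp/vmstate.img"

/usr/libexec/qemu-kvm <same options as before> -incoming "exec:cat /tmp/vmstate.img"
(qemu) info status
(qemu) cont
```

The final cont is only needed if the restored guest comes up paused.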
(In reply to Jiri Denemark from comment #11)
> savevm/loadvm is used by libvirt for doing snapshots, which is a bit
> different.
Yes, savevm/loadvm is just for snapshots.

> The save/restore virsh commands make use of migration. That is,
> save will migrate the domain to a file, and restore will take the file and
> feed it to a new qemu process with the -incoming option.
I also tried offline migration to an external state file (instead of savevm/loadvm) with both RHEL 6 and Win7 64-bit guests, and the guest agent still works after the offline migration. So I don't think this is a qemu-kvm issue; maybe zhwang used the wrong qemu-ga-win package.

host info:
# uname -r && rpm -q qemu-kvm-rhev
2.6.32-448.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.424.el6.x86_64
guest info:
rhel6: kernel-2.6.32-448.el6.x86_64
win7 64bit: qemu-ga-win-7.0-8, virtio-win-prewhql-0.1-79

1. Boot up a KVM guest with the guest agent.
e.g.: ... -device virtio-serial-pci,id=virtio-serial0,max_ports=16,vectors=0,bus=pci.0,addr=0x3 -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 -device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0

2. Check whether the guest agent works.
# nc -U /tmp/qga.sock
{"execute":"guest-ping"}

3. Save the VM state into a compressed file.
(qemu) stop
(qemu) info status
VM status: paused
(qemu) migrate "exec:gzip -c > /home/sluo.gz"

4. Load the VM state from the compressed file.
<qemu-command-line> -incoming "exec: gzip -c -d /home/sluo.gz"
(qemu) info status
VM status: running

5. Check whether the guest agent works.
# nc -U /tmp/qga.sock
{"execute":"guest-ping"}
{"execute":"guest-shutdown"}

Results:
After step 2, the guest agent works correctly.
# nc -U /tmp/qga.sock
{"execute":"guest-ping"}
{"return": {}}
After step 5, the guest agent still works correctly.
# nc -U /tmp/qga.sock
{"execute":"guest-ping"}
{"return": {}}
{"execute":"guest-shutdown"} <------- shuts down the guest successfully.

Best Regards,
sluo

(In reply to zhenfeng wang from comment #0)
> Version:
> virtio-win-1.6.7-2.el6.noarch
Maybe the qemu-ga-win package you used (virtio-win-1.6.7-2.el6) was too old; there have been many changes to qemu-ga-win recently. You should use the latest qemu-ga-win package for your regular libvirt testing, e.g.:
https://brewweb.devel.redhat.com/packageinfo?packageID=44209
Please refer to comment #12, and could you check whether it works well with the latest qemu-ga-win package above? Thanks.
> qemu-kvm-rhev-0.12.1.2-2.415.el6_5.3.x86_64
> kernel-2.6.32-432.el6.x86_64
> libvirt-0.10.2-29.el6_5.2.x86_64

Best Regards,
sluo

Hi Sibiao,
Maybe this bug has no relationship with the virtio-win package, since I originally hit this issue with a RHEL guest, and I can also reproduce it on the libvirt side with a RHEL 6 guest on a RHEL 6 host. The following are my reproduction steps.

pkg info:
libvirt-0.10.2-29.el6_5.7.x86_64
kernel-2.6.32-431.14.1.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.424.el6.x86_64
qemu-guest-agent-0.12.1.2-2.424.el6.x86_64

Steps:
1. Prepare a guest with a qemu-ga environment and start the guest.
# virsh start rhel6n
# ps aux | grep rhel6n
qemu 25257 1.9 0.8 1729292 288924 ? Sl 11:21 0:05 /usr/libexec/qemu-kvm -name rhel6n -S -M rhel6.5.0 -enable-kvm -m 1024 -realtime mlock=off -smp 2,maxcpus=3,sockets=3,cores=1,threads=1 -uuid ce3a1930-0c66-9229-8cf8-6aad9da6ade1 -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/rhel6n.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/var/lib/libvirt/images/rhel6n.img,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=27 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:8a:31:05,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/rhel6n.agent,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -device usb-tablet,id=input0 -spice port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on -vga qxl -global qxl-vga.ram_size=67108864 -global qxl-vga.vram_size=67108864 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -incoming fd:24 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7

2. Run the following commands. The guest agent hangs after the guest is restored from the save file.
# virsh dompmsuspend rhel6n --target mem
Domain rhel6n successfully suspended
# virsh list
 Id    Name                           State
----------------------------------------------------
 5     rhel6n                         pmsuspended
# virsh dompmwakeup rhel6n
Domain rhel6n successfully woken up
# virsh list
 Id    Name                           State
----------------------------------------------------
 5     rhel6n                         running
# virsh save rhel6n /tmp/rhel6n123.save
Domain rhel6n saved to /tmp/rhel6n123.save
# virsh restore /tmp/rhel6n123.save
Domain restored from /tmp/rhel6n123.save
# virsh dompmsuspend rhel6n --target mem
^C

3. Check the guest's status in another terminal.
# virsh list
 Id    Name                           State
----------------------------------------------------
 4     rhel6n                         running

4. Log in to the guest and trigger S3 inside the guest; the guest wakes up automatically and immediately after the S3 operation is issued inside the guest.
guest# pm-suspend
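For completeness, the reproduction above can be wrapped in a small script so the hang shows up as a timeout rather than a terminal stuck until ^C. This is a hedged sketch only: the domain name, save path, sleep, and 60-second timeout are illustrative choices, not values taken from the bug report.

```sh
#!/bin/sh
# Illustrative reproducer sketch; domain name, paths and timeouts are assumptions.
DOM=rhel6n
SAVE=/tmp/${DOM}.save

virsh dompmsuspend "$DOM" --target mem    # S3 via the guest agent
sleep 5                                   # give the guest a moment to enter S3
virsh dompmwakeup "$DOM"
virsh save "$DOM" "$SAVE"
virsh restore "$SAVE"

# After the restore, dompmsuspend is expected to either succeed or fail
# promptly; if it has not completed after 60 seconds, treat it as the hang.
if ! timeout 60 virsh dompmsuspend "$DOM" --target mem; then
    echo "dompmsuspend did not complete within 60s (or failed) after save/restore"
fi
```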
Hi Amit,
Could you help check whether this bug is an ACPI problem rather than a qemu-ga issue? Thanks in advance.

Best Regards,
sluo

I'm sorry I haven't been able to get to this; I hope Marcel can provide more info. I am going to try to reproduce this without the qemu agent in the next few days.

S3/S4 support is tech-preview in RHEL 6 and it will be promoted to fully supported at some point, but only in RHEL 7. Therefore we are closing all S3/S4-related bugs in RHEL 6. New bugs will be considered only if they are regressions or break some important use case or certification. RHEL 7 is being more extensively tested, and an effort from QE is underway to certify that this particular bug is not present there. Please reopen with a justification if you believe this bug should not be closed. We'll consider such requests on a case-by-case basis, following a best-effort approach. Thank you.
Since the bug is closed, I stopped looking into it.