Bug 890648
Summary: guest agent commands will hang if the guest agent crashes while executing a command

| Field | Value |
|---|---|
| Product | Red Hat Enterprise Linux 7 |
| Component | libvirt |
| Version | 7.0 |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED ERRATA |
| Severity | medium |
| Priority | medium |
| Reporter | zhenfeng wang <zhwang> |
| Assignee | Michal Privoznik <mprivozn> |
| QA Contact | Virtualization Bugs <virt-bugs> |
| CC | cwei, dyuan, eblake, gsun, jdenemar, juzhang, mprivozn, mzhan, pkrempa, rbalakri, shyu, zhwang |
| Target Milestone | rc |
| Target Release | 7.2 |
| Keywords | Upstream |
| Fixed In Version | libvirt-1.2.17-1.el7 |
| Doc Type | Bug Fix |
| Type | Bug |
| Last Closed | 2015-11-19 05:36:46 UTC |
|  | 892079 970161 1028927 1080376 (view as bug list) |
| Bug Depends On | 970161, 1122151 |
| Bug Blocks | 892079, 896690, 1028927, 1105185, 1167336, 1167392 |
Attachments: 672705 (the DLL file for qemu-agent), 672706 (the guest's XML), 1042920 (libvirt's log while the guest agent lost control)
Description
zhenfeng wang
2012-12-28 13:04:05 UTC
This request was not resolved in time for the current release. Red Hat invites you to ask your support representative to propose this request, if still desired, for consideration in the next release of Red Hat Enterprise Linux.

Created attachment 672705 [details]: the DLL file for qemu-agent

Created attachment 672706 [details]: the guest's XML
I think this bug is fixed by this patch: https://www.redhat.com/archives/libvir-list/2013-January/msg00520.html However, that patch fixes bug 892079, which means one of these bugs is a clone of the other.

I have just reported a new bug, 928661: libvirtd crashes when destroying a Linux guest that has executed a series of S3 and save/restore operations. That bug may be related to this one; please refer to it while fixing this one. Thanks.

The main reason the virsh command blocks is that the QEMU guest agent in Windows crashes on the request to perform a suspend to disk. I created bug https://bugzilla.redhat.com/show_bug.cgi?id=970161 to track that issue. The same hang can also happen with the Linux guest agent, or with a malicious guest that ignores the command read after libvirt syncs with the guest agent. I'm changing the summary of this bug to reflect this.

Fixing this bug would require a major rework of the guest agent infrastructure. Additionally, the problem is limited to the one guest whose agent misbehaves and cannot affect other guests or management connections. I'm moving it to 6.6 to re-evaluate the fix afterwards.

I found similar issues with the steps from bug 928661 on RHEL 6.5.

Version:
libvirt-0.10.2-29.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.411.el6.x86_64
qemu-guest-agent-0.12.1.2-2.411.el6.x86_64.rpm
kernel-2.6.32-421.el6.x86_64

1. # getenforce
Enforcing

2. Prepare a guest with a qemu-ga environment and add the following to the domain XML:
...
<pm>
  <suspend-to-mem enabled='yes'/>
  <suspend-to-disk enabled='yes'/>
</pm>
...
<channel type='unix'>
  <source mode='bind' path='/var/lib/libvirt/qemu/r6.agent'/>
  <target type='virtio' name='org.qemu.guest_agent.0'/>
  <address type='virtio-serial' controller='0' bus='0' port='1'/>
</channel>
...

[root@intel-5130-16-2 ~]# virsh list --all
 Id    Name                           State
----------------------------------------------------
 30    r6                             running

3. Run the following commands:
[root@intel-5130-16-2 ~]# virsh dompmsuspend r6 --target mem
Domain r6 successfully suspended
[root@intel-5130-16-2 ~]# virsh dompmwakeup r6
Domain r6 successfully woken up
[root@intel-5130-16-2 ~]# virsh save r6 /tmp/r6.save
Domain r6 saved to /tmp/r6.save
[root@intel-5130-16-2 ~]# virsh restore /tmp/r6.save
Domain restored from /tmp/r6.save
[root@intel-5130-16-2 ~]# virsh dompmsuspend r6 --target mem
^C    <====== hangs here
[root@intel-5130-16-2 ~]# virsh save r6 /tmp/r6.save
error: Failed to save domain r6 to /tmp/r6.save
error: Timed out during operation: cannot acquire state change lock
[root@intel-5130-16-2 ~]# virsh list --all
 Id    Name                           State
----------------------------------------------------
 30    r6                             running

Hi Peter,
Since bug 970161 has been fixed, this bug no longer depends on it. I retested following the steps in comment 0 with the latest packages and found that both S3 and S4 complete successfully; however, the issue in comment 10 still exists. Should we continue to fix that issue in this bug, or open another BZ to track it? Please have a look, thanks.
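The suspend/wakeup/save/restore loop from the transcript above can also be driven programmatically. Below is a minimal sketch using the libvirt Python bindings; the domain name, save path, and connection URI are taken from (or assumed for) the transcript and are not prescribed by the report:

```python
# Minimal sketch of the reproduction loop from the transcript above, using
# the libvirt Python bindings. Domain name "r6" and the save path are the
# ones used in the transcript; adjust for your environment.
import libvirt

conn = libvirt.open('qemu:///system')
dom = conn.lookupByName('r6')

# S3 via the guest agent, then wake the domain back up.
dom.pMSuspendForDuration(libvirt.VIR_NODE_SUSPEND_TARGET_MEM, 0)
dom.pMWakeup()

# Save the running domain to disk and restore it.
dom.save('/tmp/r6.save')
conn.restore('/tmp/r6.save')

# The second suspend right after the restore is the call that hangs in the
# report when the guest agent is not actually listening yet.
dom = conn.lookupByName('r6')   # re-look-up the domain after restore
dom.pMSuspendForDuration(libvirt.VIR_NODE_SUSPEND_TARGET_MEM, 0)
```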
pkg info:
virtio-win-1.6.7-2.el6.noarch
qemu-kvm-rhev-0.12.1.2-2.415.el6_5.3.x86_64
kernel-2.6.32-432.el6.x86_64
libvirt-0.10.2-29.el6_5.2.x86_64

steps
1. Prepare two guests, one win7 guest and one rhel65 guest; the guests' XML is in the attachment.
# virsh list --all
 Id    Name                           State
----------------------------------------------------
 11    win7                           running
 10    rhel65                         running

2. Install the qemu-ga service in both guests, then start the service.

3. After the qemu-ga service has started successfully, run the S3/S4 operations on both guests; they succeed.
# virsh dompmsuspend win7 --target mem
Domain win72 successfully suspended
# virsh list
 Id    Name                           State
----------------------------------------------------
 4     win7                           pmsuspended
# virsh dompmwakeup win7
Domain win72 successfully woken up
# virsh list
 Id    Name                           State
----------------------------------------------------
 4     win7                           running
# virsh dompmsuspend win7 --target disk
Domain win72 successfully suspended
# virsh start win7
Domain win72 started
# virsh list
 Id    Name                           State
----------------------------------------------------
 5     win7                           running

4. Repeat step 3 with the rhel65 guest; the RHEL guest can also complete S3/S4 successfully.

5. Follow the comment 10 steps with the rhel65 guest; the result is the same as in comment 10.
# virsh dompmsuspend rhel65 --target mem
Domain rhel65 successfully suspended
# virsh dompmwakeup rhel65
Domain rhel65 successfully woken up
# virsh save rhel65 /tmp/rhel65.save
Domain rhel65 saved to /tmp/rhel65.save
# virsh restore /tmp/rhel65.save
Domain restored from /tmp/rhel65.save
# virsh dompmsuspend rhel65 --target mem
^C    <====== hangs here
# virsh save rhel65 /tmp/rhel65.save
error: Failed to save domain rhel65 to /tmp/rhel65.save
error: Timed out during operation: cannot acquire state change lock

6. Follow the comment 10 steps with the win7 guest. While testing this scenario we have to restart the guest agent service after resuming from S3/S4 because of bug 888694, and I hit another issue: the win7 guest cannot be logged into after it is restored from the save file.
# virsh dompmsuspend win7 --target mem
Domain win7 successfully suspended
# virsh dompmwakeup win7
Domain win7 successfully woken up
Restart the guest agent service in the guest.
# virsh save win7 /tmp/1.save
Domain win7 saved to /tmp/1.save
# virsh restore /tmp/1.save
Domain restored from /tmp/1.save
# virsh list
 Id    Name                           State
----------------------------------------------------
 18    rheltest2                      running
 24    win7                           running
Trying to log into the win7 guest fails; we cannot log in again.

Please open a new bug regarding the issue in the comment above. This bug now tracks the case where the guest agent crashes while libvirt is attempting to use it.

Hi Peter,
Thanks for your response; I have filed bugs for the issue in comment 11 (bug 1049858 and bug 1049860).

(In reply to zhenfeng wang from comment #11)
> # virsh restore /tmp/rhel65.save
> Domain restored from /tmp/rhel65.save
>
> # virsh dompmsuspend rhel65 --target mem
> ^C    <====== hangs here
>
> # virsh save rhel65 /tmp/rhel65.save
> error: Failed to save domain rhel65 to /tmp/rhel65.save
> error: Timed out during operation: cannot acquire state change lock

In order to fix this bug, libvirt needs to know whether qemu-ga is listening or not. Currently we do something that is not bulletproof: before executing any real command that may change the guest's state (e.g. dompmsuspend), libvirt pings the guest agent. If it replies, we know it is listening and then issue the real command. However, qemu-ga may be stopped in the meantime (e.g. killed), which leaves libvirt stuck on that domain (libvirt serializes state-changing calls on a domain).
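That ping-then-command window is visible from the management side as well. Below is a minimal sketch of the same pattern using the libvirt-python qemu module, with a finite timeout so a crashed agent turns into an error instead of an indefinite wait; the domain name and timeout value are illustrative assumptions, and this is not libvirt's internal implementation:

```python
# Hedged sketch of the ping-then-command pattern described above, driven
# from the libvirt_qemu module rather than libvirt internals. Passing a
# finite timeout (in seconds) bounds the wait even if the agent crashes
# between the ping and the real command.
import json
import libvirt
import libvirt_qemu

conn = libvirt.open('qemu:///system')
dom = conn.lookupByName('rhel65')          # domain name is an assumption

def agent(dom, command, timeout=5):
    """Send one guest-agent command and decode the JSON reply."""
    reply = libvirt_qemu.qemuAgentCommand(dom, json.dumps(command), timeout, 0)
    return json.loads(reply)

# Step 1: the "ping" - proves the agent answered at least once.
agent(dom, {"execute": "guest-ping"})

# Step 2: the real command. If qemu-ga dies in between, this call is the
# one that would otherwise block; the timeout turns the hang into an error.
try:
    print(agent(dom, {"execute": "guest-network-get-interfaces"}))
except libvirt.libvirtError as e:
    print("agent did not answer:", e)
```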
This bug was not selected to be addressed in Red Hat Enterprise Linux 6. We will look at it again within the Red Hat Enterprise Linux 7 product.

*** Bug 1028927 has been marked as a duplicate of this bug. ***

Upstream libvirt has a proposed solution that adds new events to inform libvirt of when the qga connection changes state (basically, when the guest opens or closes the device). Libvirt could use this to learn definitively when the agent has closed (probably crashed), and therefore need not wait forever for an answer from the agent. I don't know if it will make qemu 2.1, though. https://lists.gnu.org/archive/html/qemu-devel/2014-05/msg06366.html

(In reply to Eric Blake from comment #20)
> Upstream libvirt has a proposed solution that adds new events to inform
Make that: upstream qemu has proposed new events.

The upstream qemu event is in qemu 2.1; we can use the existence of VSERPORT_CHANGE in query-events to learn whether we can rely on it:

commit e2ae6159de2482ee5e22532301eb7f2795828d07
Author: Laszlo Ersek <lersek>
Date:   Thu Jun 26 17:50:02 2014 +0200

    virtio-serial: report frontend connection state via monitor

    Libvirt wants to know about the guest-side connection state of some
    virtio-serial ports (in particular the one(s) assigned to guest agent(s)).
    Report such states with a new monitor event.

    RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1080376

    Signed-off-by: Laszlo Ersek <lersek>
    Reviewed-by: Eric Blake <eblake>
    Reviewed-by: Amit Shah <amit.shah>
    Signed-off-by: Luiz Capitulino <lcapitulino>

It is also possible to poll the current state upon libvirtd reconnect if an event was missed:

commit 32a97ea1711f43388e178b7c43e02143a61e47ee
Author: Laszlo Ersek <lersek>
Date:   Thu Jun 26 17:50:03 2014 +0200

    char: report frontend open/closed state in 'query-chardev'

    In addition to the on-line reporting added in the previous patch, allow
    libvirt to query frontend state independently of events.

    Libvirt's path to identify the guest agent channel it cares about differs
    between the event added in the previous patch and the QMP response field
    added here. The event identifies the frontend device, by "id". The
    'query-chardev' QMP command identifies the backend device (again by "id").
    The association is under libvirt's control.

    RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1080376

    Reviewed-by: Amit Shah <amit.shah>
    Signed-off-by: Laszlo Ersek <lersek>
    Reviewed-by: Eric Blake <eblake>
    Signed-off-by: Luiz Capitulino <lcapitulino>
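Both mechanisms from the two commits above can be exercised directly over QMP. A minimal sketch, assuming a stand-alone QEMU started with a QMP socket at /tmp/qmp-sock and an agent channel labelled charchannel0 (both assumptions; a libvirt-managed monitor should not be grabbed like this):

```python
# Hedged sketch: talk QMP directly to check whether this QEMU emits
# VSERPORT_CHANGE and what 'query-chardev' reports for the agent channel.
# The monitor socket path and channel label are assumptions for illustration.
import json
import socket

MONITOR = '/tmp/qmp-sock'        # e.g. started with -qmp unix:/tmp/qmp-sock,server

def qmp(sock_file, command):
    sock_file.write(json.dumps(command) + '\n')
    sock_file.flush()
    # Skip async events; return the first message carrying 'return' or 'error'.
    while True:
        msg = json.loads(sock_file.readline())
        if 'return' in msg or 'error' in msg:
            return msg

s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
s.connect(MONITOR)
f = s.makefile('rw')
f.readline()                                     # QMP greeting
qmp(f, {"execute": "qmp_capabilities"})

events = qmp(f, {"execute": "query-events"})['return']
print('VSERPORT_CHANGE supported:',
      any(e['name'] == 'VSERPORT_CHANGE' for e in events))

# Poll the backend state in case an event was missed (cf. commit 32a97ea1).
for chardev in qmp(f, {"execute": "query-chardev"})['return']:
    if chardev['label'] == 'charchannel0':       # agent channel label (assumed)
        print('agent frontend open:', chardev.get('frontend-open'))
```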
I hit an issue where the dompmsuspend --target disk command sometimes hangs if the guest has two or more CPUs. I confirmed with Peter that it is closely related to this bug, so I am tracking it here.

pkginfo:
kernel-3.10.0-212.el7.x86_64
libvirt-1.2.8-10.el7.x86_64
qemu-kvm-rhev-2.1.2-15.el7.x86_64

steps
1. Prepare a guest with 2 CPUs and the guest agent service installed.
# virsh dumpxml rhel7.0
--
<vcpu placement='static'>2</vcpu>
--
<pm>
  <suspend-to-mem enabled='yes'/>
  <suspend-to-disk enabled='yes'/>
</pm>
--
<channel type='unix'>
  <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/rhel7.0.org.qemu.guest_agent.0'/>
  <target type='virtio' name='org.qemu.guest_agent.0'/>
  <address type='virtio-serial' controller='0' bus='0' port='2'/>
</channel>

2. Start the guest.
# virsh start rhel7.0

3. Execute S3 on the guest, then wake it up.
# virsh dompmsuspend rhel7.0 --target mem
Domain rhel7.0 successfully suspended
# virsh dompmwakeup rhel7.0
Domain rhel7.0 successfully woken up
# virsh list
 Id    Name                           State
----------------------------------------------------
 20    rhel7.0                        running

4. Execute S4 on the guest; it hangs. (Sometimes it cannot be reproduced on the first or second attempt; repeat steps 2-4 several times and the issue appears.)
# virsh dompmsuspend rhel7.0 --target disk
^C

5. S3/S4 always succeed when the guest is configured with a single CPU.

6. Log excerpt from libvirtd.log:
# cat /var/log/libvirt/libvirtd.log
--
2014-12-04 06:17:28.574+0000: 25783: debug : virObjectRef:296 : OBJECT_REF: obj=0x7fe180018860
2014-12-04 06:17:28.574+0000: 25783: error : qemuAgentIO:634 : internal error: End of file from monitor
2014-12-04 06:17:28.574+0000: 25783: debug : qemuAgentIO:667 : Error on monitor internal error: End of file from monitor
2014-12-04 05:50:59.000+0000: 23611: warning : qemuDomainObjBeginJobInternal:1391 : Cannot start job (modify, none) for domain rhel7.0; current job is (modify, none) owned by (23532, 0)
2014-12-04 05:50:59.000+0000: 23611: error : qemuDomainObjBeginJobInternal:1396 : Timed out during operation: cannot acquire state change lock

Patch proposed upstream: https://www.redhat.com/archives/libvir-list/2015-May/msg00143.html

Moving to POST:

commit 2af51483cc2fa43b70b41b4aaa88eeb77701f590
Author:     Michal Privoznik <mprivozn>
AuthorDate: Thu May 7 11:19:38 2015 +0200
Commit:     Michal Privoznik <mprivozn>
CommitDate: Thu May 7 11:31:17 2015 +0200

    processSerialChangedEvent: Close agent monitor early

    https://bugzilla.redhat.com/show_bug.cgi?id=890648

    So, imagine you've issued an API that involves the guest agent. For
    instance, you want to query the guest's IP addresses. The API acquires
    QUERY_JOB, locks the guest agent and issues the agent command. However,
    for some reason the guest agent replies to the initial ping correctly,
    but then crashes tragically while executing the real command (in this
    case guest-network-get-interfaces). Since the initial ping went well,
    libvirt thinks the guest agent is accessible and awaits a reply to the
    real command. But that reply will never come. What will come is a
    monitor event. Our handler (processSerialChangedEvent) tries to acquire
    MODIFY_JOB, which fails, obviously, because the other thread executing
    the API already holds a job. So the event handler exits early, and the
    QUERY_JOB is never released nor ended.

    The way to solve this is to put a flag somewhere in the monitor
    internals. The flag is called @running and agent commands are issued iff
    the flag is set. The flag itself is set when we connect to the agent
    socket, and cleared whenever we see a DISCONNECT event from the agent.
    Moreover, we must wake up all the threads waiting for the agent. This is
    done by signalling the condition they are waiting on.

    Signed-off-by: Michal Privoznik <mprivozn>

v1.2.15-43-g2af5148
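The mechanism described in that commit message can be illustrated with ordinary threading primitives. A minimal sketch, with hypothetical names (AgentMonitor, deliver_reply, and so on) standing in for libvirt's actual C internals: a running flag gates new agent commands, and the disconnect handler clears it and wakes every waiter so nobody sleeps forever on a reply that will never arrive.

```python
# Hedged sketch of the approach from commit 2af51483, modeled with Python
# threading primitives rather than libvirt's C internals.
import threading

class AgentMonitor:
    def __init__(self):
        self.cond = threading.Condition()
        self.running = False        # set when the agent socket connects
        self.reply = None

    def connect(self):
        with self.cond:
            self.running = True

    def notify_disconnect(self):
        """Called from the serial-change/DISCONNECT event handler."""
        with self.cond:
            self.running = False
            self.cond.notify_all()  # wake threads blocked in command()

    def deliver_reply(self, reply):
        with self.cond:
            self.reply = reply
            self.cond.notify_all()

    def command(self, cmd, timeout=None):
        with self.cond:
            if not self.running:
                raise RuntimeError("guest agent is not connected")
            self.reply = None
            # (send cmd on the agent socket here)
            while self.reply is None and self.running:
                if not self.cond.wait(timeout):
                    raise TimeoutError("guest agent did not reply")
            if self.reply is None:
                raise RuntimeError("guest agent disconnected while executing " + cmd)
            return self.reply
```

The important property is that a waiter can be woken for two reasons, a reply arrived or the agent went away, and it must distinguish the two instead of waiting indefinitely.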
Hi Michal,
While verifying this bug I hit an issue: the guest agent stays in the 'disconnected' state after waking up a guest configured with 2 CPUs from the 'pmsuspended' state. Can you help check it? Thanks.

pkginfo:
libvirt-1.2.16-1.el7.x86_64

steps
1. Start a guest with 2 CPUs and the guest agent installed.
# virsh dumpxml rhel7.0
--
<vcpu placement='static'>2</vcpu>
--
<pm>
  <suspend-to-mem enabled='yes'/>
  <suspend-to-disk enabled='yes'/>
</pm>
--
<channel type='unix'>
  <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/rhel7.0.org.qemu.guest_agent.0'/>
  <target type='virtio' name='org.qemu.guest_agent.0' state='connected'/>
  <alias name='channel1'/>
  <address type='virtio-serial' controller='0' bus='0' port='2'/>
</channel>

2. Do S3 on the guest.
# virsh dompmsuspend rhel7.0 --target mem
Domain rhel7.0 successfully suspended
# virsh list
 Id    Name                           State
----------------------------------------------------
 15    rhel7.0                        pmsuspended
# virsh dumpxml rhel7.0
--
<channel type='unix'>
  <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/rhel7.0.org.qemu.guest_agent.0'/>
  <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
  <alias name='channel1'/>
  <address type='virtio-serial' controller='0' bus='0' port='2'/>
</channel>

3. Wake up the guest and check the agent status with virsh dumpxml: the guest agent is still in the 'disconnected' state, and commands that depend on the guest agent fail.
# virsh dompmwakeup rhel7.0
Domain rhel7.0 successfully woken up
# virsh dumpxml rhel7.0
--
<channel type='unix'>
  <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/rhel7.0.org.qemu.guest_agent.0'/>
  <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
  <alias name='channel1'/>
  <address type='virtio-serial' controller='0' bus='0' port='2'/>
</channel>
# virsh dompmsuspend rhel7.0 --target mem
error: Domain rhel7.0 could not be suspended
error: Guest agent is not responding: QEMU guest agent is not connected

4. Restarting the libvirtd service, or restarting the guest agent service inside the guest, brings the guest agent back to the 'connected' state.

5. A guest with 1 CPU works as expected.

(In reply to zhenfeng wang from comment #29)
> Hi Michal
> I met an issue during verify this bug that the guest agent will stay in
> 'disconnected' status after wakeup a guest which configured 2 cpus from
> 'pmsuspended' status, can you help check it, thanks
>
> pkginfo
> libvirt-1.2.16-1.el7.x86_64

Interesting. I'm unable to reproduce with this libvirt version. What's the qemu version? Can you please attach debug logs so that I can narrow down the problem? Thanks!

1. pkginfo:
qemu-kvm-rhev-2.3.0-2.el7.x86_64
qemu-guest-agent-2.3.0-1.el7.x86_64
2. The key to reproducing this issue is having 2 CPUs configured in the guest's XML, just like comment 29.
3. It is best to try the reproduction steps several times, especially the S3 -> wakeup sequence.

Created attachment 1042920 [details]: libvirt's log while the guest agent lost control
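The connected/disconnected transitions shown above do not have to be polled via dumpxml; libvirt emits an agent lifecycle event for them (visible as virDomainEventAgentLifecycle in the logs later in this bug). A minimal sketch using the Python bindings, assuming the domain name from the steps above and a libvirt new enough to provide the agent lifecycle event (>= 1.2.11):

```python
# Hedged sketch: watch the agent lifecycle events libvirt emits instead of
# polling 'virsh dumpxml' for the channel state attribute.
import libvirt

def on_agent_change(conn, dom, state, reason, opaque):
    connected = (state == libvirt.VIR_CONNECT_DOMAIN_EVENT_AGENT_LIFECYCLE_STATE_CONNECTED)
    print(dom.name(), 'agent', 'connected' if connected else 'disconnected')

libvirt.virEventRegisterDefaultImpl()            # must precede the connection
conn = libvirt.open('qemu:///system')
dom = conn.lookupByName('rhel7.0')               # domain name is an assumption
conn.domainEventRegisterAny(dom, libvirt.VIR_DOMAIN_EVENT_ID_AGENT_LIFECYCLE,
                            on_agent_change, None)

while True:                                      # simple event loop
    libvirt.virEventRunDefaultImpl()
```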
https://bugzilla.redhat.com/show_bug.cgi?id=1236924 is probably related to this issue. I think the patch that fixes the problem was just sent to the list: https://www.redhat.com/archives/libvir-list/2015-June/msg01612.html I'm going to review it. Until then, let's move this back to ASSIGNED.

Patch has been pushed upstream:

commit f1caa42777ff5433fb15f05f62d2ff717876eeac
Author:     Peter Krempa <pkrempa>
AuthorDate: Tue Jun 30 10:46:50 2015 +0200
Commit:     Peter Krempa <pkrempa>
CommitDate: Tue Jun 30 13:18:02 2015 +0200

    qemu: Close the agent connection only on agent channel events

    processSerialChangedEvent processes events for all channels. Commit
    2af51483 broke all agent interaction if a channel other than the agent
    closes, since it did not check that the event actually originated from
    the guest agent channel.

    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1236924
    Fixes up: https://bugzilla.redhat.com/show_bug.cgi?id=890648

v1.2.17-rc1-5-gf1caa42
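In other words, the serial-change handler has to filter by channel before touching the agent. A hedged sketch of that check, reusing the hypothetical AgentMonitor from the earlier sketch (the vm object and its attributes are illustrative and not libvirt's API):

```python
# Hedged sketch of the check added by commit f1caa42: only tear down the
# agent connection when the serial-change event is for the guest agent
# channel itself. 'vm' and 'vm.agent' are hypothetical wrapper objects.
AGENT_TARGET_NAME = 'org.qemu.guest_agent.0'

def process_serial_changed_event(vm, channel_target_name, connected):
    # Events for unrelated virtio-serial channels must be ignored here;
    # closing the agent for all of them was the regression being fixed.
    if channel_target_name != AGENT_TARGET_NAME:
        return
    if connected:
        vm.agent.connect()
    else:
        vm.agent.notify_disconnect()   # wakes any thread stuck waiting on a reply
```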
Hi Michal,
The issue in comment 29 still exists when retesting with libvirt-1.2.17-1.el7 on a guest with a desktop installation. It works well on a guest without a desktop installation; please help check it, thanks.

(In reply to zhenfeng wang from comment #36)
> Hi Michal
> The issue in comment 29 was still exsiting while re-test it with
> libvirt-1.2.17-1.el7 with a guest with desktop installation. BTW, it works
> well with the guest without desktop installation,please help check it thanks.

What do you mean by 'desktop installation'?

I mean the guest has a graphical desktop installed.

Hi Michal,
Any progress on the issue from comment 36? Do you need me to provide more information about it?

Sorry for the delay, I was debugging this issue. From my findings:

1) I finally managed to reproduce the issue.

2) What is happening can be seen from this log snippet:

2015-07-16 08:32:38.818+0000: 7748: info : libvirt version: 1.2.17, package: 2.el7 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2015-07-10-07:33:51, x86-035.build.eng.bos.redhat.com)
2015-07-16 08:34:12.642+0000: 7751: debug : virDomainPMSuspendForDuration:728 : dom=0x7f7c18002050, (VM: name=rhel7.0, uuid=336ba55b-5631-46a8-b57e-f4e1ce7dfed4), target=0 duration=0 flags=0
2015-07-16 08:34:12.644+0000: 7751: debug : qemuAgentCommand:1135 : Send command '{"execute":"guest-suspend-ram"}' for write, seconds = -2
2015-07-16 08:34:13.358+0000: 7748: info : qemuMonitorIOProcess:452 : QEMU_MONITOR_IO_PROCESS: mon=0x7f7c1000e580 buf={"timestamp": {"seconds": 1437035653, "microseconds": 358639}, "event": "VSERPORT_CHANGE", "data": {"open": false, "id": "channel0"}} len=135
2015-07-16 08:34:13.897+0000: 7748: info : qemuMonitorIOProcess:452 : QEMU_MONITOR_IO_PROCESS: mon=0x7f7c1000e580 buf={"timestamp": {"seconds": 1437035653, "microseconds": 897380}, "event": "SUSPEND"} len=84
2015-07-16 08:34:23.502+0000: 7749: debug : virDomainPMWakeup:772 : dom=0x7f7c20003160, (VM: name=rhel7.0, uuid=336ba55b-5631-46a8-b57e-f4e1ce7dfed4), flags=0
2015-07-16 08:34:23.502+0000: 7749: info : qemuMonitorSend:1033 : QEMU_MONITOR_SEND_MSG: mon=0x7f7c1000e580 msg={"execute":"system_wakeup","id":"libvirt-17"} fd=-1
2015-07-16 08:34:23.515+0000: 7748: info : qemuMonitorIOProcess:452 : QEMU_MONITOR_IO_PROCESS: mon=0x7f7c1000e580 buf={"timestamp": {"seconds": 1437035663, "microseconds": 514883}, "event": "WAKEUP"} len=83
2015-07-16 08:35:11.420+0000: 7748: info : qemuMonitorIOProcess:452 : QEMU_MONITOR_IO_PROCESS: mon=0x7f7c1000e580 buf={"timestamp": {"seconds": 1437035711, "microseconds": 419909}, "event": "VSERPORT_CHANGE", "data": {"open": true, "id": "channel0"}} len=134

So, at 08:34:12 I suspended the domain. One second after that, QEMU sent an event saying the qemu-ga socket had been closed in the guest. This is correct: nobody can be listening in a suspended system, right? Then, after ten seconds, I woke the domain up. But a strange thing happened: it took a really long time, nearly 50 seconds, until qemu-ga started listening again. Therefore I think this is a qemu bug (if anything; maybe it really does take that long to fully wake the system up). Then I noticed that the guest's display was blank during this time, so I doubt qemu alone is at fault and maybe we need to dig deeper. At any rate, I don't think what you have found is a libvirt bug. In fact, it shows how well libvirt is driven by qemu events.

Thanks for Michal's reply. I have filed a bug against qemu and will verify the original bug ASAP. https://bugzilla.redhat.com/show_bug.cgi?id=1244064
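The roughly 50-second gap can be pulled out of such a log mechanically. A small sketch that scans a libvirtd debug log like the one above for the WAKEUP event and the next VSERPORT_CHANGE open:true event; the log path is an assumption:

```python
# Hedged sketch: measure how long after the WAKEUP event the agent channel
# reported open again, by scanning a libvirtd debug log like the one above.
import re
from datetime import datetime

TS = re.compile(r'^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\+0000')

wakeup = reopen = None
with open('/var/log/libvirt/libvirtd.log') as log:
    for line in log:
        m = TS.match(line)
        if not m:
            continue
        ts = datetime.strptime(m.group(1), '%Y-%m-%d %H:%M:%S.%f')
        if '"event": "WAKEUP"' in line:
            wakeup = ts
        elif '"event": "VSERPORT_CHANGE"' in line and '"open": true' in line and wakeup:
            reopen = ts
            break

if wakeup and reopen:
    print('agent channel reopened %.1f s after WAKEUP' % (reopen - wakeup).total_seconds())
```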
Verified the bug with libvirt-1.2.17-3.el7.x86_64: libvirt closes the agent fd when it sees the guest agent go to the 'disconnected' state and re-opens the agent fd when the agent comes back to the 'connected' state. Verification steps follow.

1. Prepare a running guest with the guest agent configured and S3/S4 enabled.
# virsh dumpxml virt-tests-vm1
--
<pm>
  <suspend-to-mem enabled='yes'/>
  <suspend-to-disk enabled='yes'/>
</pm>
--
<channel type='unix'>
  <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/virt-tests-vm1.org.qemu.guest_agent.0'/>
  <target type='virtio' name='org.qemu.guest_agent.0' state='connected'/>
  <alias name='channel0'/>
  <address type='virtio-serial' controller='0' bus='0' port='1'/>
</channel>

2. Do S3 on the guest; after S3 completes the agent is in the 'disconnected' state.
# virsh dompmsuspend virt-tests-vm1 --target mem
Domain virt-tests-vm1 successfully suspended
# virsh dumpxml virt-tests-vm1
--
<channel type='unix'>
  <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/virt-tests-vm1.org.qemu.guest_agent.0'/>
  <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
  <alias name='channel0'/>
  <address type='virtio-serial' controller='0' bus='0' port='1'/>
</channel>

3. Re-execute the step 2 command; the expected error is returned.
# virsh dompmsuspend virt-tests-vm1 --target mem
error: Domain virt-tests-vm1 could not be suspended
error: Requested operation is not valid: domain is not running

4. Check libvirtd.log; the agent has been closed.
# cat /var/log/libvirt/libvirtd.log
--
2015-08-03 05:20:17.736+0000: 3148: info : qemuMonitorJSONIOProcessLine:206 : QEMU_MONITOR_RECV_REPLY: mon=0x7fea5800b920 reply={"return": [{"frontend-open": false, "filename": "disconnected:unix:/var/lib/libvirt/qemu/channel/target/virt-tests-vm1.org.qemu.guest_agent.0,server", "label": "charchannel0"}
--
2015-08-03 05:22:01.410+0000: 3329: debug : qemuAgentNotifyClose:816 : mon=0x7fea58009710
--
2015-08-03 05:22:01.410+0000: 3152: debug : qemuDomainObjExitAgent:1834 : Exited agent (agent=0x7fea58009710 vm=0x7fea1420d560 name=virt-tests-vm1)
2015-08-03 05:22:01.410+0000: 3329: debug : qemuAgentClose:829 : mon=0x7fea58009710

5. Wake up the guest; the agent goes back to the 'connected' state.
# virsh dompmwakeup virt-tests-vm1
Domain virt-tests-vm1 successfully woken up
<channel type='unix'>
  <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/virt-tests-vm1.org.qemu.guest_agent.0'/>
  <target type='virtio' name='org.qemu.guest_agent.0' state='connected'/>
  <alias name='channel0'/>
  <address type='virtio-serial' controller='0' bus='0' port='1'/>
</channel>
# cat /var/log/libvirt/libvirtd.log
--
2015-08-03 05:26:23.988+0000: 3329: debug : qemuAgentOpen:778 : New mon 0x7fea1428b2e0 fd=21 watch=18
2015-08-03 05:26:23.988+0000: 3329: info : virObjectNew:202 : OBJECT_NEW: obj=0x7fea14000a30 classname=virDomainEventAgentLifecycle
2015-08-03 05:26:23.988+0000: 3329: debug : virDomainEventAgentLifecycleDispose:496 : obj=0x7fea14000a30

6. Re-execute S3/S4; both succeed.
# virsh dompmsuspend virt-tests-vm1 --target mem
Domain virt-tests-vm1 successfully suspended
# virsh dompmwakeup virt-tests-vm1
Domain virt-tests-vm1 successfully woken up
# virsh dompmsuspend virt-tests-vm1 --target disk
Domain virt-tests-vm1 successfully suspended

7. Start the guest; it resumes where it left off.
# virsh start virt-tests-vm1

8. Re-execute S3, then save and restore the guest; all of the operations succeed.
# virsh dompmsuspend virt-tests-vm1 --target mem
Domain virt-tests-vm1 successfully suspended
[root@zhwangrhel71 ~]# virsh dompmwakeup virt-tests-vm1
Domain virt-tests-vm1 successfully woken up
# virsh save virt-tests-vm1 /tmp/virt-tests-vm1.save
Domain virt-tests-vm1 saved to /tmp/virt-tests-vm1.save
# virsh restore /tmp/virt-tests-vm1.save
Domain restored from /tmp/virt-tests-vm1.save

9. Repeat steps 6-8 several times; all of them give the expected result.

According to the steps above, marking this bug VERIFIED.
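For callers, the practical consequence of the verified behaviour is that the channel's state attribute (or the lifecycle event shown earlier) can be consulted before issuing agent-dependent commands. A minimal sketch, assuming the domain name from the verification steps above:

```python
# Hedged sketch: read the agent channel's 'state' attribute from the live
# domain XML before issuing an agent-dependent command, mirroring the
# dumpxml checks in the verification steps.
import xml.etree.ElementTree as ET
import libvirt

def agent_connected(dom):
    root = ET.fromstring(dom.XMLDesc(0))
    for target in root.findall("./devices/channel/target"):
        if target.get('name') == 'org.qemu.guest_agent.0':
            return target.get('state') == 'connected'
    return False

conn = libvirt.open('qemu:///system')
dom = conn.lookupByName('virt-tests-vm1')       # domain name is an assumption
if agent_connected(dom):
    dom.pMSuspendForDuration(libvirt.VIR_NODE_SUSPEND_TARGET_MEM, 0)
else:
    print('guest agent is not connected; skipping dompmsuspend')
```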
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2202.html