Bug 890648 - guest agent commands will hang if the guest agent crashes while executing a command
Summary: guest agent commands will hang if the guest agent crashes while executing a command
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 7.2
Assignee: Michal Privoznik
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Duplicates: 1028927 (view as bug list)
Depends On: 970161 1122151
Blocks: 896690 892079 1028927 1105185 1167336 1167392
 
Reported: 2012-12-28 13:04 UTC by zhenfeng wang
Modified: 2015-11-19 05:36 UTC (History)
CC List: 12 users

Fixed In Version: libvirt-1.2.17-1.el7
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 892079 970161 1028927 1080376 (view as bug list)
Environment:
Last Closed: 2015-11-19 05:36:46 UTC
Target Upstream Version:
Embargoed:


Attachments
The DLL files for qemu-ga (2.34 MB, application/octet-stream)
2013-01-05 05:13 UTC, zhenfeng wang
The guest's xml (3.01 KB, text/plain)
2013-01-05 05:16 UTC, zhenfeng wang
libvirt's log while the guest agent lost control (10.39 MB, text/plain)
2015-06-25 02:36 UTC, zhenfeng wang


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2015:2202 0 normal SHIPPED_LIVE libvirt bug fix and enhancement update 2015-11-19 08:17:58 UTC

Description zhenfeng wang 2012-12-28 13:04:05 UTC
Description of problem:
S3/S4 operations fail for a Windows guest running the guest agent service.
Version-Release number of selected component (if applicable):
libvirt-0.10.2-13.el6.x86_64
kernel-2.6.32-348.el6.x86_64  
qemu-kvm-0.12.1.2-2.346.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Install the virtio-win-1.5.4-1.el6.noarch package to get the virtio-serial and spice+qxl drivers.
The virtio-serial driver is in:
# ls /usr/share/virtio-win/virtio-win-1.5.4.iso
/usr/share/virtio-win/virtio-win-1.5.4.iso

The spice+qxl driver is in:
# ls /usr/share/virtio-win/   -----you need to make an ISO file from this directory
# mkisofs -o /var/lib/libvirt/images/virtiowin.iso /usr/share/virtio-win/
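The resulting ISO can then be attached to the running guest as a CD-ROM so the drivers can be installed from inside Windows (a sketch; the hdb target is a placeholder):
# virsh attach-disk win7-32 /var/lib/libvirt/images/virtiowin.iso hdb --type cdrom --mode readonly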

2. Prepare a Windows guest with the virtio-serial and spice+qxl drivers installed.
# virsh dumpxml win7x86
  <domain type='kvm'>
  <name>win7-32</name>
  <uuid>ad61420e-b3c6-b50e-16ab-73009cbf9b6d</uuid>
  <memory unit='KiB'>1048576</memory>
  <currentMemory unit='KiB'>1048576</currentMemory>
  <vcpu placement='static'>1</vcpu>
  <os>
    <type arch='i686' machine='rhel6.4.0'>hvm</type>
    <loader>/usr/share/seabios/bios.bin</loader>
    <boot dev='hd'/>
  </os>

---
  <pm>
    <suspend-to-mem enabled='yes'/>
    <suspend-to-disk enabled='yes'/>
  </pm>
---
  <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/win7-32.agent'/>
      <target type='virtio' name='org.qemu.guest_agent.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <channel type='spicevmc'>
      <target type='virtio' name='com.redhat.spice.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='2'/>
    </channel>
    <graphics type='spice' autoport='yes'/>
    <video>
      <model type='qxl' vram='65536' heads='1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>

----

3. Install the qemu-guest-agent-win32-0.12.1.2-2.346.el6.x86_64 package on a RHEL host to get the executable:
# ll /usr/share/qemu-kvm/qemu-ga-win32/
total 464
-rwxr-xr-x. 1 root root 467160 Dec 14 12:23 qemu-ga.exe
-r--r--r--. 1 root root   1155 Dec 14 12:16 README.txt

4. Install the qemu-ga service in the guest.
Create a folder named qemu-ga in the Windows guest, then copy qemu-ga.exe and the three DLL files listed in README.txt into that directory:
#c:\qemu-ga> dir
qemu-ga.exe
iconv.dll
libglib-2.0.0.dll
libintl-8.dll
README.txt

#c:\qemu-ga\qemu-ga.exe --service install

5. Check the qemu-ga service status with services.msc:
#c:\services.msc
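The service state can also be checked and started from a command prompt instead of services.msc (assuming the agent registered under the service name "qemu-ga", which is what "qemu-ga.exe --service install" should use by default):
#c:\sc query qemu-ga
#c:\net start qemu-ga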
6. Perform the S3/S4 operations on the host:
# time virsh dompmsuspend win7-32 --target mem
Domain win7-32 successfully suspended

real    10m17.578s
user    0m0.030s
sys    0m0.041s

# virsh domstate --reason win7-32
shut off (shutdown)

libvirtd did not hang, but the domain S3 failed.

# time virsh dompmsuspend win7-32 --target disk
Domain win7-32 successfully suspended

real    10m20.538s
user    0m0.036s
sys    0m0.043s

# virsh domstate --reason win7-32
shut off (shutdown)

libvirtd did not hang, but the domain S4 failed.

So libvirt S3/S4 support for Windows guests still fails.

Try with the qemu QMP commands:
# virsh start win7-32
Domain win7-32 started

# ps aux|grep qemu
qemu     20399 45.8  0.0 1603380 39544 ?       Sl   06:17   0:02
/usr/libexec/qemu-kvm -name win7-32 -S -M rhel6.4.0 -enable-kvm -m 1024
-smp 1,sockets=1,cores=1,threads=1 -uuid
752d19d2-6b22-d848-52c8-66c0a0b7891a -nodefconfig -nodefaults -chardev
socket,id=charmonitor,path=/var/lib/libvirt/qemu/win7-32.monitor,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc
-no-shutdown -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0
-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device
virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive
file=/var/lib/libvirt/images/win7-32.img,if=none,id=drive-ide0-0-0,format=raw,cache=none
-device
ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1
-drive
file=/usr/share/virtio-win/virtio-win-1.5.4.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw
-device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0
-netdev tap,fd=29,id=hostnet0 -device
rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:01:57:87,bus=pci.0,addr=0x3
-chardev pty,id=charserial0 -device
isa-serial,chardev=charserial0,id=serial0 -chardev
spicevmc,id=charchannel0,name=vdagent -device
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0
-chardev
socket,id=charchannel1,path=/var/lib/libvirt/qemu/win7-32.agent,server,nowait
-device
virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0
-spice port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on
-vga qxl -global qxl-vga.vram_size=67108864 -device
intel-hda,id=sound0,bus=pci.0,addr=0x4 -device
hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6

# nc -U /var/lib/libvirt/qemu/win7-32.agent
{ "execute": "guest-ping"}

No response here.

So it could be a problem with the libvirt XML settings.

But running qemu directly with the same image succeeds:
/usr/libexec/qemu-kvm -M rhel6.4.0 -enable-kvm -m 2048 -smp 2,sockets=2,cores=1,threads=1 -enable-kvm -name win7-32 -uuid 61b6c504-5a8b-4fe1-8347-6c929b750dde -k en-us -rtc base=localtime,clock=host,driftfix=slew -no-kvm-pit-reinjection -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=input0 -drive file=/var/lib/libvirt/images/win7-32.img,if=none,id=disk0,format=raw,werror=stop,rerror=stop,aio=native -device ide-drive,bus=ide.0,unit=1,drive=disk0,id=disk0  -monitor stdio -qmp tcp:0:6666,server,nowait -chardev socket,path=/tmp/isa-serial,server,nowait,id=isa1 -device isa-serial,chardev=isa1,id=isa-serial1 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x8 -chardev socket,id=charchannel0,path=/tmp/serial-socket,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,path=/tmp/foo,server,nowait,id=foo -device virtconsole,chardev=foo,id=console0 -chardev spicevmc,id=charchannel1,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x9 -spice port=5930,disable-ticketing -vga qxl -global qxl-vga.vram_size=67108864 -k en-us -boot c -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 -device virtserialport,bus=virtio-serial0.0,chardev=qga0,name=org.qemu.guest_agent.0  -global  PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0

# nc -U /tmp/qga.sock
 { "execute": "guest-ping"}
 {"return": {}}
 { "execute": "guest-sync-delimited", "arguments": { "id": 123456 } }
{"return": 123456}
 { "execute": "guest-suspend-ram" } or { "execute": "guest-suspend-disk" }
 The guest can successfully resume from S3 with (qemu) system_wakeup, and resume from S4.

Actual results:
The virsh S3/S4 commands hang while the qemu-ga service is running in the Windows guest.

Expected results:
S3/S4 operations should succeed for a Windows guest running the guest agent service.
Additional info:
The S3/S4 operations work fine with a RHEL guest; the guest XML uses the same configuration as the RHEL guest.

Comment 2 RHEL Program Management 2013-01-01 06:47:21 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 3 zhenfeng wang 2013-01-05 05:13:01 UTC
Created attachment 672705 [details]
The DLL files for qemu-ga

Comment 4 zhenfeng wang 2013-01-05 05:16:11 UTC
Created attachment 672706 [details]
The guest's xml

Comment 5 Michal Privoznik 2013-01-09 14:36:15 UTC
I think this bug is fixed by this patch:

https://www.redhat.com/archives/libvir-list/2013-January/msg00520.html

However, the patch fixes bug 892079, which means either this bug is a clone of that one or vice versa.

Comment 7 zhenfeng wang 2013-03-28 10:31:27 UTC
I just reported a new bug, 928661: libvirtd crashes when destroying a Linux guest that has executed a series of S3 and save/restore operations. This bug may be related to that one; you can reference it while fixing this one. Thanks.

Comment 8 Peter Krempa 2013-06-03 15:10:52 UTC
The main reason the virsh command blocks is that the qemu guest agent in Windows crashes on the request to perform a suspend to disk. I created bug https://bugzilla.redhat.com/show_bug.cgi?id=970161 to track that issue.

This might also happen with the Linux guest agent, or with a malicious guest that ignores the command read after libvirt syncs with the guest agent. I'm changing the summary of this bug to reflect this.

Comment 9 Peter Krempa 2013-06-27 08:08:50 UTC
Fixing this bug would require a major rework of the guest agent infrastructure. Additionally, the problem is limited to the one guest whose agent misbehaves; it cannot influence other guests or management connections.

I'm moving it to 6.6 to re-evaluate the fix afterwards.

Comment 10 Hu Jianwei 2013-10-14 10:07:56 UTC
I found similar issues with the steps from bug 928661 on RHEL 6.5.

Version:
libvirt-0.10.2-29.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.411.el6.x86_64
qemu-guest-agent-0.12.1.2-2.411.el6.x86_64.rpm
kernel-2.6.32-421.el6.x86_64

1.# getenforce
Enforcing

2. Prepare a guest with a qemu-ga environment and add the config below to the domain XML:
...
<pm>
    <suspend-to-mem enabled='yes'/>
    <suspend-to-disk enabled='yes'/>
</pm>
...
<channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/r6.agent'/>
      <target type='virtio' name='org.qemu.guest_agent.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
</channel>
...

[root@intel-5130-16-2 ~]# virsh list --all
 Id    Name                           State
----------------------------------------------------
 30    r6                             running
3. Run the following commands:
[root@intel-5130-16-2 ~]# virsh dompmsuspend r6 --target mem
Domain r6 successfully suspended
[root@intel-5130-16-2 ~]# virsh dompmwakeup r6
Domain r6 successfully woken up
[root@intel-5130-16-2 ~]# virsh save r6 /tmp/r6.save

Domain r6 saved to /tmp/r6.save

[root@intel-5130-16-2 ~]# virsh restore /tmp/r6.save 
Domain restored from /tmp/r6.save

[root@intel-5130-16-2 ~]# virsh dompmsuspend r6 --target mem
^C                                                                <======hung here.
[root@intel-5130-16-2 ~]# virsh save r6 /tmp/r6.save
error: Failed to save domain r6 to /tmp/r6.save
error: Timed out during operation: cannot acquire state change lock

[root@intel-5130-16-2 ~]# virsh list --all
 Id    Name                           State
----------------------------------------------------
 30    r6                             running

[root@intel-5130-16-2 ~]#

Comment 11 zhenfeng wang 2013-12-26 07:40:05 UTC
Hi Peter
Since bug 970161 has been fixed, this bug no longer depends on it. I just tested this bug following the comment 0 steps with the latest packages and found that both S3 and S4 can be done successfully. However, the issue in comment 10 still exists, so I wonder whether we will continue to fix that issue in this bug or open another BZ to track it. Please help have a look, thanks.

pkg info
virtio-win-1.6.7-2.el6.noarch
qemu-kvm-rhev-0.12.1.2-2.415.el6_5.3.x86_64
kernel-2.6.32-432.el6.x86_64
libvirt-0.10.2-29.el6_5.2.x86_64

steps
1. Prepare two guests, one win7 guest and one rhel65 guest; the guests' XML is in the attachment.
# virsh list --all
 Id    Name                           State
----------------------------------------------------
 11    win7                          running
 10    rhel65                        running

2.Install the qemu-ga service in the two guests, then start the service

3. After the qemu-ga service starts successfully, do the S3/S4 operations with the two guests; they succeed:
# virsh dompmsuspend win7 --target mem
Domain win72 successfully suspended
# virsh list
 Id    Name                           State
----------------------------------------------------
 4     win7                         pmsuspended

# virsh dompmwakeup win7
Domain win72 successfully woken up
# virsh list
 Id    Name                           State
----------------------------------------------------
 4     win7                          running

# virsh dompmsuspend win7 --target disk
Domain win72 successfully suspended
# virsh start win7
Domain win72 started

# virsh list
 Id    Name                           State
----------------------------------------------------
 5     win7                          running

4. Re-do step 3's operations with the rhel65 guest; the RHEL guest can also do S3/S4 successfully.

5. Follow the comment 10 steps with the rhel65 guest; we get the same result as in comment 10:
# virsh dompmsuspend rhel65 --target mem
Domain rhel65 successfully suspended

# virsh dompmwakeup rhel65
Domain rhel65 successfully woken up

# virsh save rhel65 /tmp/rhel65.save

Domain rhel65 saved to /tmp/rhel65.save

# virsh restore /tmp/rhel65.save 
Domain restored from /tmp/rhel65.save

# virsh dompmsuspend rhel65 --target mem
^C                                                                <======hung here.

# virsh save rhel65 /tmp/rhel65.save 
error: Failed to save domain rhel65 to /tmp/rhel65.save
error: Timed out during operation: cannot acquire state change lock

6. Follow the comment 10 steps with the win7 guest. While testing this scenario, we had to restart the guest agent service after resuming from S3/S4 because of bug 888694, and I met another issue: we can't log in to the win7 guest after it is restored from the save file.

# virsh dompmsuspend win7 --target mem
Domain win7 successfully suspended

# virsh dompmwakeup win7
Domain win7 successfully woken up

Restart the guest agent service in the guest

# virsh save win7 /tmp/1.save

Domain win7 saved to /tmp/1.save

# virsh restore /tmp/1.save 
Domain restored from /tmp/1.save

# virsh list
 Id    Name                           State
----------------------------------------------------
 18    rheltest2                      running
 24    win7                           running

Try to log in to the win7 guest: we find we can't log in to it again.

Comment 12 Peter Krempa 2014-01-07 13:01:14 UTC
Please open a new bug for the issue in the comment above. This bug is now tracking the case where the guest agent crashes while libvirt is attempting to use it.

Comment 13 zhenfeng wang 2014-01-08 11:46:23 UTC
Hi Peter
Thanks for your response. I have filed bugs for the issues in comment 11 (bug 1049858 and bug 1049860).

Comment 16 Michal Privoznik 2014-03-25 09:41:21 UTC
(In reply to zhenfeng wang from comment #11)
> # virsh restore /tmp/rhel65.save 
> Domain restored from /tmp/rhel65.save
> 
> # virsh dompmsuspend rhel65 --target mem
> ^C                                                                <======hung here.
> 
> # virsh save rhel65 /tmp/rhel65.save 
> error: Failed to save domain rhel65 to /tmp/rhel65.save
> error: Timed out during operation: cannot acquire state change lock
> 

In order to fix this bug libvirt needs to know whether qemu-ga is listening or not. Currently, we do something that is not bulletproof: prior to executing any real command that may change the guest's state (e.g. dompmsuspend), libvirt pings the guest agent. If it replies, we know it's listening, and we then issue the real command. However, qemu-ga may be stopped in the meantime (e.g. killed), resulting in libvirt being stuck on the domain (libvirt serializes state-changing calls on a domain).
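A minimal transcript over the agent socket illustrates the window (a hypothetical session; the socket path is the one from comment 0, and guest-network-get-interfaces stands in for any real command):

# nc -U /var/lib/libvirt/qemu/win7-32.agent
{"execute":"guest-ping"}
{"return": {}}                              <====== agent was alive at ping time
{"execute":"guest-network-get-interfaces"}
                                            <====== agent crashes here; the reply
                                                    never arrives and the caller
                                                    waits forever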

Comment 18 Jiri Denemark 2014-04-04 21:37:38 UTC
This bug was not selected to be addressed in Red Hat Enterprise Linux 6. We will look at it again within the Red Hat Enterprise Linux 7 product.

Comment 19 Michal Privoznik 2014-05-29 13:55:02 UTC
*** Bug 1028927 has been marked as a duplicate of this bug. ***

Comment 20 Eric Blake 2014-06-24 14:42:35 UTC
Upstream libvirt has a proposed solution that adds new events to inform libvirt when the qga connection changes state (basically, when the guest opens or closes the device); this could be used in libvirt to learn definitively when the agent has closed (probably crashed), so that libvirt need not wait forever for an answer from the agent.  I don't know if it will make qemu 2.1, though.
https://lists.gnu.org/archive/html/qemu-devel/2014-05/msg06366.html

Comment 21 Eric Blake 2014-06-24 14:43:36 UTC
(In reply to Eric Blake from comment #20)
> Upstream libvirt has a proposed solution that adds new events to inform

Make that: upstream qemu has proposed new events

Comment 22 Eric Blake 2014-06-30 17:19:46 UTC
Upstream qemu event is in qemu 2.1; we can use the existence of VSERPORT_CHANGE in query-events to learn if we can rely on it:

commit e2ae6159de2482ee5e22532301eb7f2795828d07
Author: Laszlo Ersek <lersek>
Date:   Thu Jun 26 17:50:02 2014 +0200

    virtio-serial: report frontend connection state via monitor
    
    Libvirt wants to know about the guest-side connection state of some
    virtio-serial ports (in particular the one(s) assigned to guest agent(s)).
    Report such states with a new monitor event.
    
    RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1080376
    Signed-off-by: Laszlo Ersek <lersek>
    Reviewed-by: Eric Blake <eblake>
    Reviewed-by: Amit Shah <amit.shah>
    Signed-off-by: Luiz Capitulino <lcapitulino>

It is also possible to poll the current state upon libvirtd reconnect if an event was missed:

commit 32a97ea1711f43388e178b7c43e02143a61e47ee
Author: Laszlo Ersek <lersek>
Date:   Thu Jun 26 17:50:03 2014 +0200

    char: report frontend open/closed state in 'query-chardev'
    
    In addition to the on-line reporting added in the previous patch, allow
    libvirt to query frontend state independently of events.
    
    Libvirt's path to identify the guest agent channel it cares about differs
    between the event added in the previous patch and the QMP response field
    added here. The event identifies the frontend device, by "id". The
    'query-chardev' QMP command identifies the backend device (again by "id").
    The association is under libvirt's control.
    
    RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1080376
    
    Reviewed-by: Amit Shah <amit.shah>
    Signed-off-by: Laszlo Ersek <lersek>
    Reviewed-by: Eric Blake <eblake>
    Signed-off-by: Luiz Capitulino <lcapitulino>
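Taken together, the two interfaces could be probed from a QMP session roughly like this (an illustrative sketch only; the VSERPORT_CHANGE event shape matches the one libvirtd logs in comment 40, the "frontend-open" field matches the query-chardev reply in comment 42, and "..." marks elided fields):

# nc -U /path/to/monitor.sock
{"QMP": {"version": {...}, "capabilities": []}}
{"execute": "qmp_capabilities"}
{"return": {}}
{"execute": "query-events"}
{"return": [..., {"name": "VSERPORT_CHANGE"}, ...]}      <====== event is supported
{"execute": "query-chardev"}
{"return": [..., {"label": "charchannel1", "frontend-open": true, ...}, ...]}

And asynchronously, whenever the guest opens or closes the port:
{"event": "VSERPORT_CHANGE", "data": {"open": false, "id": "channel1"}, "timestamp": {...}}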

Comment 24 zhenfeng wang 2015-01-09 06:20:41 UTC
I met an issue where the pm-suspend-disk command sometimes hangs if the guest CPU number is >= 2. I confirmed it with Peter, and he said the issue is closely related to this bug, so I'm tracking that issue here.

pkginfo
kernel-3.10.0-212.el7.x86_64
libvirt-1.2.8-10.el7.x86_64
qemu-kvm-rhev-2.1.2-15.el7.x86_64


steps
1. Prepare a guest with 2 CPUs and the guest agent service installed:
#virsh dumpxml rhel7.0
--
 <vcpu placement='static'>2</vcpu>
--
  <pm>
    <suspend-to-mem enabled='yes'/>
    <suspend-to-disk enabled='yes'/>
  </pm>
--
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/rhel7.0.org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='2'/>
    </channel>


2.Start the guest
#virsh start rhel7.0

3. Execute S3 on the guest, then wake it up:
# virsh dompmsuspend rhel7.0 --target mem
Domain rhel7.0 successfully suspended
# virsh dompmwakeup rhel7.0
Domain rhel7.0 successfully woken up
# virsh list
 Id    Name                           State
----------------------------------------------------
 20    rhel7.0                        running

4. Execute S4 on the guest; it will hang (sometimes it can't be reproduced on the first or second try; repeat steps 2~4 several times and the issue happens):

# virsh dompmsuspend rhel7.0 --target disk
^C

5. S3/S4 always execute successfully when the guest CPU number is set to 1.

6. The following log is from libvirtd.log:
#cat /var/log/libvirt/libvirtd.log
--

2014-12-04 06:17:28.574+0000: 25783: debug : virObjectRef:296 : OBJECT_REF: obj=0x7fe180018860
2014-12-04 06:17:28.574+0000: 25783: error : qemuAgentIO:634 : internal error: End of file from monitor
2014-12-04 06:17:28.574+0000: 25783: debug : qemuAgentIO:667 : Error on monitor internal error: End of file from monitor

2014-12-04 05:50:59.000+0000: 23611: warning : qemuDomainObjBeginJobInternal:1391 : Cannot start job (modify, none) for domain rhel7.0; current job is (modify, none) owned by (23532, 0)
2014-12-04 05:50:59.000+0000: 23611: error : qemuDomainObjBeginJobInternal:1396 : Timed out during operation: cannot acquire state change lock
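To pull just the relevant lines out of a large debug log when hunting this hang, a filter along these lines helps (assuming debug logging is enabled in /etc/libvirt/libvirtd.conf; the function names are the ones visible in the snippet above):

# grep -E 'qemuAgentIO|qemuDomainObjBeginJobInternal' /var/log/libvirt/libvirtd.log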

Comment 26 Michal Privoznik 2015-05-06 09:04:45 UTC
Patch proposed upstream:

https://www.redhat.com/archives/libvir-list/2015-May/msg00143.html

Comment 27 Michal Privoznik 2015-05-07 14:52:15 UTC
Moving to POST:

commit 2af51483cc2fa43b70b41b4aaa88eeb77701f590
Author:     Michal Privoznik <mprivozn>
AuthorDate: Thu May 7 11:19:38 2015 +0200
Commit:     Michal Privoznik <mprivozn>
CommitDate: Thu May 7 11:31:17 2015 +0200

    processSerialChangedEvent: Close agent monitor early
    
    https://bugzilla.redhat.com/show_bug.cgi?id=890648
    
    So, imagine you've issued an API that involves guest agent. For
    instance, you want to query guest's IP addresses. So the API acquires
    QUERY_JOB, locks the guest agent and issues the agent command.
    However, for some reason, guest agent replies to initial ping
    correctly, but then crashes tragically while executing real command
    (in this case guest-network-get-interfaces). Since initial ping went
    well, libvirt thinks guest agent is accessible and awaits reply to the
    real command. But it will never come. What will is a monitor event.
    Our handler (processSerialChangedEvent) will try to acquire
    MODIFY_JOB, which will fail obviously because the other thread that's
    executing the API already holds a job. So the event handler exits
    early, and the QUERY_JOB is never released nor ended.
    
    The way how to solve this is to put flag somewhere in the monitor
    internals. The flag is called @running and agent commands are issued
    iff the flag is set. The flag itself is set when we connect to the
    agent socket. And unset whenever we see DISCONNECT event from the
    agent. Moreover, we must wake up all the threads waiting for the
    agent. This is done by signalizing the condition they're waiting on.
    
    Signed-off-by: Michal Privoznik <mprivozn>

v1.2.15-43-g2af5148
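libvirt of this era also emits agent lifecycle events (the virDomainEventAgentLifecycle objects visible in the comment 42 log below), so the connect/disconnect transitions can be watched live from virsh while reproducing (a sketch; the exact output format may vary between versions):

# virsh event rhel7.0 --event agent-lifecycle --loop
event 'agent-lifecycle' for domain rhel7.0: state: 'disconnected' reason: 'channel'
event 'agent-lifecycle' for domain rhel7.0: state: 'connected' reason: 'channel'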

Comment 29 zhenfeng wang 2015-06-23 10:15:14 UTC
Hi Michal
I met an issue while verifying this bug: the guest agent stays in 'disconnected' status after waking a guest configured with 2 CPUs from 'pmsuspended' status. Can you help check it? Thanks.

pkginfo
libvirt-1.2.16-1.el7.x86_64

steps
1. Start a guest with 2 CPUs and the guest agent installed:
#virsh dumpxml rhel7.0
--
 <vcpu placement='static'>2</vcpu>
--
  <pm>
    <suspend-to-mem enabled='yes'/>
    <suspend-to-disk enabled='yes'/>
  </pm>

--
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/rhel7.0.org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='connected'/>
      <alias name='channel1'/>
      <address type='virtio-serial' controller='0' bus='0' port='2'/>
    </channel>

2. Do S3 with the guest:
# virsh dompmsuspend rhel7.0 --target mem
Domain rhel7.0 successfully suspended
# virsh list
 Id    Name                           State
----------------------------------------------------
 15    rhel7.0                        pmsuspended

#virsh dumpxml rhel7.0
--
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/rhel7.0.org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel1'/>
      <address type='virtio-serial' controller='0' bus='0' port='2'/>
    </channel>


3. Wake up the guest and check the guest agent status with virsh dumpxml. The guest agent is still in 'disconnected' status, and commands that depend on the guest agent fail:
# virsh dompmwakeup rhel7.0
Domain rhel7.0 successfully woken up


#virsh dumpxml rhel7.0
--
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/rhel7.0.org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel1'/>
      <address type='virtio-serial' controller='0' bus='0' port='2'/>
    </channel>

# virsh dompmsuspend rhel7.0 --target mem
error: Domain rhel7.0 could not be suspended
error: Guest agent is not responding: QEMU guest agent is not connected

4. Restarting the libvirtd service, or restarting the guest agent service inside the guest, brings the guest agent back to 'connected' status.

5. A guest with 1 CPU works as expected.
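Besides grepping the dumpxml, a quick way to probe what libvirt thinks about the agent is to push a ping through libvirt itself (the error text is the same as the dompmsuspend failure in step 3 when the agent is considered disconnected):

# virsh qemu-agent-command rhel7.0 '{"execute":"guest-ping"}'
error: Guest agent is not responding: QEMU guest agent is not connected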

Comment 30 Michal Privoznik 2015-06-24 14:38:08 UTC
(In reply to zhenfeng wang from comment #29)
> Hi Michal
> I met an issue during verify this bug that the guest agent will stay in
> 'disconnected' status after wakeup a guest which configured 2 cpus from
> 'pmsuspended' status, can you help check it, thanks
> 
> pkginfo
> libvirt-1.2.16-1.el7.x86_64

Interesting. I'm unable to reproduce with this libvirt version. What's the qemu version? Can you please attach debug logs so that I can narrow down the problem? Thanks!

Comment 31 zhenfeng wang 2015-06-25 02:34:03 UTC
1.pkginfo
qemu-kvm-rhev-2.3.0-2.el7.x86_64
qemu-guest-agent-2.3.0-1.el7.x86_64

2. The key to reproducing this issue is that you must have 2 CPUs configured in the guest's XML, just like comment 29.

3. It's better to try the reproduce steps several times, especially the S3 --> wakeup sequence.
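Since the problem is intermittent, a small loop makes "try several times" easier (a sketch; the domain name and sleep intervals are placeholders):

# for i in $(seq 1 10); do
    virsh dompmsuspend rhel7.0 --target mem
    sleep 10
    virsh dompmwakeup rhel7.0
    sleep 10
    virsh dumpxml rhel7.0 | grep guest_agent   # watch the state= attribute
  done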

Comment 32 zhenfeng wang 2015-06-25 02:36:42 UTC
Created attachment 1042920 [details]
libvirt's log while the guest agent lost control

Comment 33 Peter Krempa 2015-06-30 08:57:21 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=1236924 is probably related to this issue.

Comment 34 Michal Privoznik 2015-06-30 09:44:46 UTC
I think the patch that fixes the problem was just sent to the list:

https://www.redhat.com/archives/libvir-list/2015-June/msg01612.html

I'm gonna review it. Until then, let's move this back to ASSIGNED.

Comment 35 Michal Privoznik 2015-06-30 13:15:45 UTC
Patch has been pushed upstream:

commit f1caa42777ff5433fb15f05f62d2ff717876eeac
Author:     Peter Krempa <pkrempa>
AuthorDate: Tue Jun 30 10:46:50 2015 +0200
Commit:     Peter Krempa <pkrempa>
CommitDate: Tue Jun 30 13:18:02 2015 +0200

    qemu: Close the agent connection only on agent channel events
    
    processSerialChangedEvent processes events for all channels. Commit
    2af51483 broke all agent interaction if a channel other than the agent
    closes since it did not check that the event actually originated from
    the guest agent channel.
    
    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1236924
    Fixes up: https://bugzilla.redhat.com/show_bug.cgi?id=890648

v1.2.17-rc1-5-gf1caa42

Comment 36 zhenfeng wang 2015-07-07 06:59:35 UTC
Hi Michal
The issue in comment 29 still exists when re-testing with libvirt-1.2.17-1.el7 on a guest with a desktop installation. BTW, it works well on a guest without a desktop installation. Please help check it, thanks.

Comment 37 Michal Privoznik 2015-07-08 14:29:27 UTC
(In reply to zhenfeng wang from comment #36)
> Hi Michal
> The issue in comment 29 still exists when re-testing with
> libvirt-1.2.17-1.el7 on a guest with a desktop installation. BTW, it works
> well on a guest without a desktop installation. Please help check it, thanks.

what do you mean by 'desktop installation'?

Comment 38 zhenfeng wang 2015-07-09 02:22:54 UTC
I mean the guest has a graphical desktop.

Comment 39 zhenfeng wang 2015-07-16 07:50:31 UTC
Hi Michal
Any solution for my comment 36 issue? Do you need me to offer some more info about it?

Comment 40 Michal Privoznik 2015-07-16 08:52:55 UTC
Sorry for the delay, I was debugging this issue. From my findings:

1) Finally, I've managed to successfully reproduce the issue

2) What's happening can be seen from this log snippet:

2015-07-16 08:32:38.818+0000: 7748: info : libvirt version: 1.2.17, package: 2.el7 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2015-07-10-07:33:51, x86-035.build.eng.bos.redhat.com)

2015-07-16 08:34:12.642+0000: 7751: debug : virDomainPMSuspendForDuration:728 : dom=0x7f7c18002050, (VM: name=rhel7.0, uuid=336ba55b-5631-46a8-b57e-f4e1ce7dfed4), target=0 duration=0 flags=0
2015-07-16 08:34:12.644+0000: 7751: debug : qemuAgentCommand:1135 : Send command '{"execute":"guest-suspend-ram"}' for write, seconds = -2
2015-07-16 08:34:13.358+0000: 7748: info : qemuMonitorIOProcess:452 : QEMU_MONITOR_IO_PROCESS: mon=0x7f7c1000e580 buf={"timestamp": {"seconds": 1437035653, "microseconds": 358639}, "event": "VSERPORT_CHANGE", "data": {"open": false, "id": "channel0"}}
 len=135
2015-07-16 08:34:13.897+0000: 7748: info : qemuMonitorIOProcess:452 : QEMU_MONITOR_IO_PROCESS: mon=0x7f7c1000e580 buf={"timestamp": {"seconds": 1437035653, "microseconds": 897380}, "event": "SUSPEND"}
 len=84

2015-07-16 08:34:23.502+0000: 7749: debug : virDomainPMWakeup:772 : dom=0x7f7c20003160, (VM: name=rhel7.0, uuid=336ba55b-5631-46a8-b57e-f4e1ce7dfed4), flags=0
2015-07-16 08:34:23.502+0000: 7749: info : qemuMonitorSend:1033 : QEMU_MONITOR_SEND_MSG: mon=0x7f7c1000e580 msg={"execute":"system_wakeup","id":"libvirt-17"}
 fd=-1
2015-07-16 08:34:23.515+0000: 7748: info : qemuMonitorIOProcess:452 : QEMU_MONITOR_IO_PROCESS: mon=0x7f7c1000e580 buf={"timestamp": {"seconds": 1437035663, "microseconds": 514883}, "event": "WAKEUP"}
 len=83

2015-07-16 08:35:11.420+0000: 7748: info : qemuMonitorIOProcess:452 : QEMU_MONITOR_IO_PROCESS: mon=0x7f7c1000e580 buf={"timestamp": {"seconds": 1437035711, "microseconds": 419909}, "event": "VSERPORT_CHANGE", "data": {"open": true, "id": "channel0"}}
 len=134

So, at 08:34:12 I suspended the domain. One second after that, QEMU sent an event saying the qemu-ga socket had been closed in the guest. This is correct: nobody can be listening in a suspended system, right? Then, after ten seconds, I woke the domain up. But a strange thing happened: it took a really long while, nearly 50 seconds, until qemu-ga started listening again. Therefore I think this is a qemu bug (if anything; maybe it really does take that long to fully wake up a system). I also noticed that the guest's display was blank during this time, so I doubt qemu is the only culprit here and maybe we need to dig deeper. At any rate, I don't think what you've found is a libvirt bug. In fact, it shows how well libvirt is driven by qemu events.

Comment 41 zhenfeng wang 2015-07-20 02:01:31 UTC
Thanks for Michal's reply. I have filed a bug against qemu and will verify the original bug ASAP.
https://bugzilla.redhat.com/show_bug.cgi?id=1244064

Comment 42 zhenfeng wang 2015-08-03 10:21:00 UTC
Verified the bug with libvirt-1.2.17-3.el7.x86_64. libvirt closes the agent fd when it finds the guest agent in 'disconnected' status and re-opens the agent fd when it finds the guest agent back in 'connected' status. Verification steps are as follows.

1. Prepare a running guest with the guest agent configured and S3/S4 enabled:
#virsh dumpxml virt-tests-vm1
--
  <pm>
    <suspend-to-mem enabled='yes'/>
    <suspend-to-disk enabled='yes'/>
  </pm>
--
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/virt-tests-vm1.org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='connected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>

2. Do S3 with the guest; after S3 finishes, the agent is in 'disconnected' status:
# virsh dompmsuspend virt-tests-vm1 --target mem
Domain virt-tests-vm1 successfully suspended

#virsh dumpxml virt-tests-vm1
--
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/virt-tests-vm1.org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>


3. Re-execute step 2's command; we get the expected error:
# virsh dompmsuspend virt-tests-vm1 --target mem
error: Domain virt-tests-vm1 could not be suspended
error: Requested operation is not valid: domain is not running

4. Check libvirtd.log; we can find that the agent has been closed:
#cat /var/log/libvirt/libvirtd.log
--
2015-08-03 05:20:17.736+0000: 3148: info : qemuMonitorJSONIOProcessLine:206 : QEMU_MONITOR_RECV_REPLY: mon=0x7fea5800b920 reply={"return": [{"frontend-open": false, "filename": "disconnected:unix:/var/lib/libvirt/qemu/channel/target/virt-tests-vm1.org.qemu.guest_agent.0,server", "label": "charchannel0"}
--
2015-08-03 05:22:01.410+0000: 3329: debug : qemuAgentNotifyClose:816 : mon=0x7fea58009710
--
2015-08-03 05:22:01.410+0000: 3152: debug : qemuDomainObjExitAgent:1834 : Exited agent (agent=0x7fea58009710 vm=0x7fea1420d560 name=virt-tests-vm1)
2015-08-03 05:22:01.410+0000: 3329: debug : qemuAgentClose:829 : mon=0x7fea58009710

5. Wake up the guest; the agent goes back to 'connected' status:
# virsh dompmwakeup virt-tests-vm1
Domain virt-tests-vm1 successfully woken up

    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/virt-tests-vm1.org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='connected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>

#cat /var/log/libvirt/libvirtd.log
--
2015-08-03 05:26:23.988+0000: 3329: debug : qemuAgentOpen:778 : New mon 0x7fea1428b2e0 fd =21 watch=18
2015-08-03 05:26:23.988+0000: 3329: info : virObjectNew:202 : OBJECT_NEW: obj=0x7fea14000a30 classname=virDomainEventAgentLifecycle
2015-08-03 05:26:23.988+0000: 3329: debug : virDomainEventAgentLifecycleDispose:496 : obj=0x7fea14000a30


6. Re-execute S3/S4; both of them succeed:

# virsh dompmsuspend virt-tests-vm1 --target mem
Domain virt-tests-vm1 successfully suspended
# virsh dompmwakeup virt-tests-vm1
Domain virt-tests-vm1 successfully woken up
# virsh dompmsuspend virt-tests-vm1 --target disk
Domain virt-tests-vm1 successfully suspended

7. Start the guest; it comes back to the place where it left off.
#virsh start virt-tests-vm1

8. Re-execute S3, then save/restore the guest; all of the operations succeed:
# virsh dompmsuspend virt-tests-vm1 --target mem
Domain virt-tests-vm1 successfully suspended
[root@zhwangrhel71 ~]# virsh dompmwakeup virt-tests-vm1
Domain virt-tests-vm1 successfully woken up
# virsh save virt-tests-vm1 /tmp/virt-tests-vm1.save

Domain virt-tests-vm1 saved to /tmp/virt-tests-vm1.save
# virsh restore /tmp/virt-tests-vm1.save 
Domain restored from /tmp/virt-tests-vm1.save

9. Repeat steps 6~8 several times; all of them give the expected results.

Based on the steps above, marking this bug as verified.

Comment 44 errata-xmlrpc 2015-11-19 05:36:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2202.html

