Bug 1244064 - the guest agent will always stay in 'disconnected' status after wakeup a guest which configured 2 cpus from 'pmsuspended' status
Summary: the guest agent will always stay in 'disconnected' status after wakeup a gues...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.2
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Markus Armbruster
QA Contact: Xueqiang Wei
URL:
Whiteboard:
Depends On:
Blocks: 1288337
Reported: 2015-07-17 02:39 UTC by zhenfeng wang
Modified: 2016-11-07 20:28 UTC (History)

Fixed In Version: qemu-kvm-rhev-2.5.0-1.el7
Doc Type: Bug Fix
Doc Text:
Cause: Event VSERPORT_CHANGE is rate-limited.
Consequence: When the guest triggers several VSERPORT_CHANGE events in quick succession, rate-limiting drops some of them. This is okay when they are all for the same port: only intermediate state changes can get dropped. It is not okay when they are for different ports: any state change can get dropped, even the last one, which makes the event unsuitable for tracking port state accurately and defeats libvirt's tracking of port state. In particular, the connection to the guest agent can be lost after wakeup from S3 with multiple CPUs.
Fix: Rate-limit separately for each port.
Result: Even when rate-limiting drops events, libvirt tracks port state with sufficient accuracy. The connection to the guest agent is fine after wakeup from S3.
Clone Of:
Environment:
Last Closed: 2016-11-07 20:28:59 UTC
Target Upstream Version:




Links
System: Red Hat Product Errata
ID: RHBA-2016:2673
Priority: normal
Status: SHIPPED_LIVE
Summary: qemu-kvm-rhev bug fix and enhancement update
Last Updated: 2016-11-08 01:06:13 UTC

Description zhenfeng wang 2015-07-17 02:39:02 UTC
1. Start a guest with 2 CPUs, a guest agent, and a graphical desktop
# virsh dumpxml rhel7.0
--
 <vcpu placement='static'>2</vcpu>
--
  <pm>
    <suspend-to-mem enabled='yes'/>
    <suspend-to-disk enabled='yes'/>
  </pm>

--
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/rhel7.0.org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='connected'/>
      <alias name='channel1'/>
      <address type='virtio-serial' controller='0' bus='0' port='2'/>
    </channel>

2. Suspend the guest to memory (S3)
# virsh dompmsuspend rhel7.0 --target mem
Domain rhel7.0 successfully suspended
# virsh list
 Id    Name                           State
----------------------------------------------------
 15    rhel7.0                        pmsuspended

# virsh dumpxml rhel7.0
--
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/rhel7.0.org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel1'/>
      <address type='virtio-serial' controller='0' bus='0' port='2'/>
    </channel>


3. Wake up the guest and check the guest agent status with virsh dumpxml. The guest agent is still in 'disconnected' status, and commands that depend on the guest agent fail to execute.
# virsh dompmwakeup rhel7.0
Domain rhel7.0 successfully woken up


# virsh dumpxml rhel7.0
--
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/rhel7.0.org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel1'/>
      <address type='virtio-serial' controller='0' bus='0' port='2'/>
    </channel>

# virsh dompmsuspend rhel7.0 --target mem
error: Domain rhel7.0 could not be suspended
error: Guest agent is not responding: QEMU guest agent is not connected

4. Restarting the libvirtd service, or restarting the guest agent service inside the guest, brings the guest agent back to 'connected' status.

5. A guest with 1 CPU works as expected.
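
The check in step 3 can also be scripted. The following is a minimal sketch, not part of the original report: it reads the state attribute of the guest agent channel from domain XML such as virsh dumpxml prints. The helper name agent_state and the bare <domain>/<devices> wrapper in the sample are assumptions made for illustration; the channel element is copied from step 3.

```python
import xml.etree.ElementTree as ET

# Channel name used for the guest agent in this report.
AGENT_CHANNEL = "org.qemu.guest_agent.0"


def agent_state(domain_xml):
    """Return the guest agent channel state, or None if no such channel exists.

    Hypothetical helper: it only inspects <target type='virtio' name=... state=...>
    elements, which is where the domain XML in this report carries the state.
    """
    root = ET.fromstring(domain_xml)
    for target in root.iter("target"):
        if target.get("type") == "virtio" and target.get("name") == AGENT_CHANNEL:
            return target.get("state")
    return None


# Sample input: the channel from step 3, wrapped in a minimal domain element.
SAMPLE_XML = """
<domain type='kvm'>
  <devices>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/rhel7.0.org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel1'/>
      <address type='virtio-serial' controller='0' bus='0' port='2'/>
    </channel>
  </devices>
</domain>
"""

print(agent_state(SAMPLE_XML))
```

In practice the full output of virsh dumpxml for the domain would be passed in instead of SAMPLE_XML.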

Comment 1 zhenfeng wang 2015-07-17 02:40:17 UTC
Here is some debugging info from a libvirt developer, taken from bug 890648:
https://bugzilla.redhat.com/show_bug.cgi?id=890648#c40


 What's happening can be seen from this log snippet:

2015-07-16 08:32:38.818+0000: 7748: info : libvirt version: 1.2.17, package: 2.el7 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2015-07-10-07:33:51, x86-035.build.eng.bos.redhat.com)

2015-07-16 08:34:12.642+0000: 7751: debug : virDomainPMSuspendForDuration:728 : dom=0x7f7c18002050, (VM: name=rhel7.0, uuid=336ba55b-5631-46a8-b57e-f4e1ce7dfed4), target=0 duration=0 flags=0
2015-07-16 08:34:12.644+0000: 7751: debug : qemuAgentCommand:1135 : Send command '{"execute":"guest-suspend-ram"}' for write, seconds = -2
2015-07-16 08:34:13.358+0000: 7748: info : qemuMonitorIOProcess:452 : QEMU_MONITOR_IO_PROCESS: mon=0x7f7c1000e580 buf={"timestamp": {"seconds": 1437035653, "microseconds": 358639}, "event": "VSERPORT_CHANGE", "data": {"open": false, "id": "channel0"}}
 len=135
2015-07-16 08:34:13.897+0000: 7748: info : qemuMonitorIOProcess:452 : QEMU_MONITOR_IO_PROCESS: mon=0x7f7c1000e580 buf={"timestamp": {"seconds": 1437035653, "microseconds": 897380}, "event": "SUSPEND"}
 len=84

2015-07-16 08:34:23.502+0000: 7749: debug : virDomainPMWakeup:772 : dom=0x7f7c20003160, (VM: name=rhel7.0, uuid=336ba55b-5631-46a8-b57e-f4e1ce7dfed4), flags=0
2015-07-16 08:34:23.502+0000: 7749: info : qemuMonitorSend:1033 : QEMU_MONITOR_SEND_MSG: mon=0x7f7c1000e580 msg={"execute":"system_wakeup","id":"libvirt-17"}
 fd=-1
2015-07-16 08:34:23.515+0000: 7748: info : qemuMonitorIOProcess:452 : QEMU_MONITOR_IO_PROCESS: mon=0x7f7c1000e580 buf={"timestamp": {"seconds": 1437035663, "microseconds": 514883}, "event": "WAKEUP"}
 len=83

2015-07-16 08:35:11.420+0000: 7748: info : qemuMonitorIOProcess:452 : QEMU_MONITOR_IO_PROCESS: mon=0x7f7c1000e580 buf={"timestamp": {"seconds": 1437035711, "microseconds": 419909}, "event": "VSERPORT_CHANGE", "data": {"open": true, "id": "channel0"}}
 len=134

So, at 08:34:12 I suspended the domain. One second later, QEMU sent an event saying that the qemu-ga socket had been closed in the guest. This is correct: nobody can be listening in a suspended system, right? Then, ten seconds later, I woke the domain up. But a strange thing happened: it took a really long while until qemu-ga started listening again, nearly 50 seconds. Therefore I think this is a qemu bug (if anything; maybe it really takes that long to fully wake up a system). Then I noticed that the guest's display was blank during this time, so I doubt it's qemu alone here, and maybe we need to dig deeper. At any rate, I don't think that what you've found is a libvirt bug. In fact, it shows how well libvirt is driven by qemu events.
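
The "nearly 50 seconds" figure can be recomputed from the QMP timestamps in the log. The following is a small sketch, not something used in the original analysis: it extracts the JSON payload after "buf=" from two of the log lines quoted above and subtracts the event timestamps. The helper names parse_qmp_event and event_time_us are hypothetical.

```python
import json


def parse_qmp_event(line):
    """Return the QMP message embedded after 'buf=' in a libvirtd log line, or None."""
    _, sep, payload = line.partition("buf=")
    if not sep:
        return None
    try:
        return json.loads(payload)
    except json.JSONDecodeError:
        return None


def event_time_us(event):
    """Convert a QMP event timestamp to integer microseconds since the epoch."""
    ts = event["timestamp"]
    return ts["seconds"] * 1_000_000 + ts["microseconds"]


# The WAKEUP and the final VSERPORT_CHANGE lines, copied from the log above.
WAKEUP_LINE = (
    '2015-07-16 08:34:23.515+0000: 7748: info : qemuMonitorIOProcess:452 : '
    'QEMU_MONITOR_IO_PROCESS: mon=0x7f7c1000e580 buf={"timestamp": '
    '{"seconds": 1437035663, "microseconds": 514883}, "event": "WAKEUP"}'
)
REOPEN_LINE = (
    '2015-07-16 08:35:11.420+0000: 7748: info : qemuMonitorIOProcess:452 : '
    'QEMU_MONITOR_IO_PROCESS: mon=0x7f7c1000e580 buf={"timestamp": '
    '{"seconds": 1437035711, "microseconds": 419909}, "event": "VSERPORT_CHANGE", '
    '"data": {"open": true, "id": "channel0"}}'
)

wakeup = parse_qmp_event(WAKEUP_LINE)
reopen = parse_qmp_event(REOPEN_LINE)
delay_us = event_time_us(reopen) - event_time_us(wakeup)

# prints "channel0 reopened 47.905 s after WAKEUP"
print(f"{reopen['data']['id']} reopened {delay_us / 1_000_000:.3f} s after {wakeup['event']}")
```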

Comment 3 Marc-Andre Lureau 2015-07-20 22:16:16 UTC
I can't reproduce this on a Fedora host with rhel7 qemu-guest-agent-2.1.0-4.el7.x86_64.

What's the version of qemu on your host? Can you reproduce this with a Fedora host, or is this a RHEL 7-only bug?

thanks

Comment 4 zhenfeng wang 2015-08-05 02:44:29 UTC
Hi Marc-Andre,
Sorry for the late reply. I can't reproduce it with a Fedora host, but I can still reproduce it with RHEL. Here is my package info:
host:
libvirt-1.2.17-3.el7.x86_64
qemu-kvm-rhev-2.3.0-14.el7.x86_64

guest:
qemu-guest-agent-2.3.0-2.el7.x86_64

Comment 9 Marc-Andre Lureau 2015-08-11 21:13:17 UTC
sent fix:
http://lists.nongnu.org/archive/html/qemu-devel/2015-08/msg01285.html

(I am working on some follow-up patches to throttle VSERPORT_CHANGE too)

Comment 10 Ademar Reis 2015-09-16 12:54:05 UTC
Even though this is not a supported scenario (suspending/resuming a guest), we have patches upstream and this should be included in the next rebase.

Comment 11 Markus Armbruster 2015-10-12 12:43:53 UTC
Proposed patches to throttle VSERPORT_CHANGE properly:
http://lists.gnu.org/archive/html/qemu-devel/2015-09/msg06649.html

Comment 12 Markus Armbruster 2015-10-30 14:27:46 UTC
Upstream commits
7f1e7b2 docs: Document QMP event rate limiting
7de0be6 monitor: Throttle event VSERPORT_CHANGE separately by "id"
a24712a monitor: Turn monitor_qapi_event_state[] into a hash table
8681dff glib: add compatibility interface for g_hash_table_add()
b9b03ab monitor: Split MonitorQAPIEventConf off MonitorQAPIEventState
1824c41 monitor: Switch from timer_new() to timer_new_ns()
93f8f98 monitor: Simplify event throttling
688b4b7 monitor: Reduce casting of QAPI event QDict
7f02784 qstring: Make conversion from QObject * accept null
2d6421a qlist: Make conversion from QObject * accept null
fcf73f6 qfloat qint: Make conversion from QObject * accept null
89cad9f qdict: Make conversion from QObject * accept null
14b6160 qbool: Make conversion from QObject * accept null
c7c4621 qobject: Drop QObject_HEAD
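
To illustrate why throttling separately by "id" (commit 7de0be6 above) matters, here is a deliberately simplified model of event rate limiting. It is a sketch under stated assumptions, not QEMU's monitor code: each throttle key has one pending slot, a newer event overwrites the pending one, and the pending event is delivered when the interval expires. The names Event, throttle and final_state, the 1.0 s interval, the timings, and the second port "channel1" are all made up for the example.

```python
from collections import namedtuple

# Hypothetical event record: arrival time in seconds, port id, open flag.
Event = namedtuple("Event", "time port open")


def throttle(events, interval, key):
    """Simplified rate limiter: at most one delivery per key per interval."""
    last_sent = {}  # key -> time of the last delivery for that key
    pending = {}    # key -> the single event waiting for the interval to expire
    delivered = []

    def flush(now):
        # Deliver every pending event whose interval has expired by `now`.
        for k in sorted(pending, key=lambda k: last_sent[k]):
            due = last_sent[k] + interval
            if due <= now:
                delivered.append(pending.pop(k))
                last_sent[k] = due

    for ev in sorted(events, key=lambda e: e.time):
        flush(ev.time)
        k = key(ev)
        if k in last_sent and ev.time < last_sent[k] + interval:
            pending[k] = ev  # overwrites (drops) any older pending event
        else:
            delivered.append(ev)
            last_sent[k] = ev.time
    flush(float("inf"))
    return delivered


def final_state(delivered):
    """Port state as a listener would track it from the delivered events."""
    state = {}
    for ev in delivered:
        state[ev.port] = ev.open
    return state


events = [
    Event(0.000, "channel0", False),  # agent port closes
    Event(0.010, "channel0", True),   # agent port reopens
    Event(0.020, "channel1", False),  # a different port changes state
]

# One throttle for the whole event type: channel0's reopen is overwritten.
per_event = throttle(events, interval=1.0, key=lambda e: "VSERPORT_CHANGE")
# One throttle per port id: each port's last state change survives.
per_port = throttle(events, interval=1.0, key=lambda e: e.port)

print("per event type:", final_state(per_event))
print("per port id:   ", final_state(per_port))
```

With a single throttle for the event type, the listener ends up believing channel0 is closed although its last real state is open, which matches the symptom in this report. With one throttle per port, only intermediate changes of the same port can be dropped.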

Comment 17 Xueqiang Wei 2016-08-18 10:55:58 UTC
According to Comment 7, I retested five times and all runs passed, so I am marking this issue as verified. Details below:

host:
kernel-3.10.0-461.el7.x86_64
qemu-kvm-rhev-2.6.0-12.el7
libvirt-2.0.0-1.el7.x86_64

guest:
kernel-3.10.0-456.el7.x86_64
qemu-guest-agent-2.3.0-4.el7.x86_64

some logs:
# virsh start bug1244064
Domain bug1244064 started

# virsh dompmsuspend bug1244064 --target mem
Domain bug1244064 successfully suspended

# virsh list
 Id    Name                           State
----------------------------------------------------
 12    bug1244064                     pmsuspended

wakeup with spice input

# virsh list
 Id    Name                           State
----------------------------------------------------
 12    bug1244064                     running

# virsh dompmsuspend bug1244064 --target mem
Domain bug1244064 successfully suspended

# virsh list
 Id    Name                           State
----------------------------------------------------
 12    bug1244064                     pmsuspended

wakeup with spice input

# virsh list
 Id    Name                           State
----------------------------------------------------
 12    bug1244064                     running

# virsh dompmsuspend bug1244064 --target mem
Domain bug1244064 successfully suspended

# virsh list
 Id    Name                           State
----------------------------------------------------
 12    bug1244064                     pmsuspended


wakeup with spice input

# virsh list
 Id    Name                           State
----------------------------------------------------
 12    bug1244064                     running

# virsh dompmsuspend bug1244064 --target mem
Domain bug1244064 successfully suspended

# virsh list
 Id    Name                           State
----------------------------------------------------
 12    bug1244064                     pmsuspended

wakeup with spice input

# virsh list
 Id    Name                           State
----------------------------------------------------
 12    bug1244064                     running

# virsh dompmsuspend bug1244064 --target mem
Domain bug1244064 successfully suspended

# virsh list
 Id    Name                           State
----------------------------------------------------
 12    bug1244064                     pmsuspended

Comment 18 Xueqiang Wei 2016-08-22 07:25:45 UTC
Retested on the latest version and all runs passed. Details below:

host:
kernel-3.10.0-461.el7.x86_64
qemu-kvm-rhev-2.6.0-21.el7
libvirt-2.0.0-1.el7.x86_64

guest:
kernel-3.10.0-456.el7.x86_64
qemu-guest-agent-2.5.0-2.el7.x86_64

some logs:
# virsh start bug1244064
Domain bug1244064 started

# virsh dompmsuspend bug1244064 --target mem
Domain bug1244064 successfully suspended

# virsh list
 Id    Name                           State
----------------------------------------------------
 12    bug1244064                     pmsuspended

check the guest agent status with virsh dumpxml
# virsh dumpxml bug1244064
<channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-15-bug1244064/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='2'/>
    </channel>

wakeup with spice input

# virsh list
 Id    Name                           State
----------------------------------------------------
 12    bug1244064                     running

check the guest agent status with virsh dumpxml
# virsh dumpxml bug1244064
<channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-15-bug1244064/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='connected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='2'/>
    </channel>

# virsh dompmsuspend bug1244064 --target mem
Domain bug1244064 successfully suspended

# virsh list
 Id    Name                           State
----------------------------------------------------
 12    bug1244064                     pmsuspended

check the guest agent status with virsh dumpxml
# virsh dumpxml bug1244064
<channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-15-bug1244064/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='2'/>
    </channel>

wakeup with spice input

# virsh list
 Id    Name                           State
----------------------------------------------------
 12    bug1244064                     running

check the guest agent status with virsh dumpxml
# virsh dumpxml bug1244064
<channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-15-bug1244064/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='connected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='2'/>
    </channel>

# virsh dompmsuspend bug1244064 --target mem
Domain bug1244064 successfully suspended

# virsh list
 Id    Name                           State
----------------------------------------------------
 12    bug1244064                     pmsuspended

check the guest agent status with virsh dumpxml
# virsh dumpxml bug1244064
<channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-15-bug1244064/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='2'/>
    </channel>

wakeup with spice input

# virsh list
 Id    Name                           State
----------------------------------------------------
 12    bug1244064                     running

check the guest agent status with virsh dumpxml
# virsh dumpxml bug1244064
<channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-15-bug1244064/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='connected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='2'/>
    </channel>

# virsh dompmsuspend bug1244064 --target mem
Domain bug1244064 successfully suspended

# virsh list
 Id    Name                           State
----------------------------------------------------
 12    bug1244064                     pmsuspended

check the guest agent status with virsh dumpxml
# virsh dumpxml bug1244064
<channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-15-bug1244064/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='2'/>
    </channel>

wakeup with spice input

# virsh list
 Id    Name                           State
----------------------------------------------------
 12    bug1244064                     running

check the guest agent status with virsh dumpxml
# virsh dumpxml bug1244064
<channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-15-bug1244064/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='connected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='2'/>
    </channel>

# virsh dompmsuspend bug1244064 --target mem
Domain bug1244064 successfully suspended

# virsh list
 Id    Name                           State
----------------------------------------------------
 12    bug1244064                     pmsuspended

check the guest agent status with virsh dumpxml
# virsh dumpxml bug1244064
<channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-15-bug1244064/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='2'/>
    </channel>

Comment 20 errata-xmlrpc 2016-11-07 20:28:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2673.html

