Bug 1244064
| Summary: | the guest agent will always stay in 'disconnected' status after wakeup a guest which configured 2 cpus from 'pmsuspended' status | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | zhenfeng wang <zhwang> |
| Component: | qemu-kvm-rhev | Assignee: | Markus Armbruster <armbru> |
| Status: | CLOSED ERRATA | QA Contact: | Xueqiang Wei <xuwei> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 7.2 | CC: | armbru, chayang, dyuan, gsun, huding, jen, juzhang, mzhan, virt-maint, xfu |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | qemu-kvm-rhev-2.5.0-1.el7 | Doc Type: | Bug Fix |
| Doc Text: |
Cause: event VSERPORT_CHANGE is rate-limited
Consequence: when the guest triggers several VSERPORT_CHANGE in quick succession, rate-limiting drops some. Okay when they're all for the same port: only intermediate state changes can get dropped. Not okay when they're for different ports: any state change can get dropped, even the last one, and that makes the event unsuitable for tracking port state accurately. Defeats libvirt's tracking of port state. In particular, the connection to the guest agent can be lost after wakeup from S3 with multiple CPUs.
Fix: rate-limit separately for each port.
Result: even when rate-limiting drops events, libvirt tracks port state with sufficient accuracy. The connection to the guest agent is fine after wakeup from S3.
|
Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2016-11-07 20:28:59 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1288337 | ||
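The cause and fix described in the Doc Text above can be illustrated with a small model. The sketch below is a simplified Python simulation, not QEMU's actual C implementation: the `Throttle` class, the one-second interval, and the event sequence are all assumptions made for illustration. It lets the first event for a key through, then keeps only the most recent pending event for that key until the interval ends. Keying by event name alone lets an event for one port overwrite the pending event of another port; keying by port id does not.

```python
from typing import Callable, Dict, Hashable, List, Tuple

Event = Tuple[str, bool]  # (port id, open); hypothetical stand-in for VSERPORT_CHANGE data


class Throttle:
    """Toy rate limiter: one pending slot per key, latest event wins."""

    def __init__(self, interval: float, key: Callable[[Event], Hashable]) -> None:
        self.interval = interval
        self.key = key
        self.window_end: Dict[Hashable, float] = {}
        self.pending: Dict[Hashable, Event] = {}
        self.delivered: List[Event] = []

    def flush(self, now: float) -> None:
        """Deliver pending events whose rate-limit window has expired."""
        for k in list(self.window_end):
            while k in self.window_end and self.window_end[k] <= now:
                if k in self.pending:
                    self.delivered.append(self.pending.pop(k))
                    self.window_end[k] += self.interval  # delivery opens a new window
                else:
                    del self.window_end[k]

    def emit(self, now: float, event: Event) -> None:
        self.flush(now)
        k = self.key(event)
        if k in self.window_end:
            self.pending[k] = event  # replaces any older pending event for this key
        else:
            self.delivered.append(event)
            self.window_end[k] = now + self.interval


def by_event_name(event: Event) -> Hashable:
    return "VSERPORT_CHANGE"  # all ports share one slot (behaviour before the fix)


def by_port_id(event: Event) -> Hashable:
    return event[0]  # one slot per port (behaviour after the fix)


def final_port_state(events, key, interval: float = 1.0) -> Dict[str, bool]:
    """Replay timed events through the throttle and return the state a listener would see."""
    throttle = Throttle(interval, key)
    for when, event in events:
        throttle.emit(when, event)
    throttle.flush(float("inf"))  # let every pending event drain
    state: Dict[str, bool] = {}
    for port, is_open in throttle.delivered:
        state[port] = is_open
    return state


# Assumed sequence: channel0 closes and reopens, channel1 opens, all within one interval.
EVENTS = [
    (0.0, ("channel0", False)),
    (0.1, ("channel0", True)),
    (0.2, ("channel1", True)),
]

print(final_port_state(EVENTS, by_event_name))
print(final_port_state(EVENTS, by_port_id))
```

With this sequence, keying by event name leaves the listener believing channel0 is closed although its last real state was open, while keying by port id reports both ports as open.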
Here is some debugging info from the libvirt dev, from bug 890648:
https://bugzilla.redhat.com/show_bug.cgi?id=890648#c40

What's happening can be seen from this log snippet:

    2015-07-16 08:32:38.818+0000: 7748: info : libvirt version: 1.2.17, package: 2.el7 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2015-07-10-07:33:51, x86-035.build.eng.bos.redhat.com)
    2015-07-16 08:34:12.642+0000: 7751: debug : virDomainPMSuspendForDuration:728 : dom=0x7f7c18002050, (VM: name=rhel7.0, uuid=336ba55b-5631-46a8-b57e-f4e1ce7dfed4), target=0 duration=0 flags=0
    2015-07-16 08:34:12.644+0000: 7751: debug : qemuAgentCommand:1135 : Send command '{"execute":"guest-suspend-ram"}' for write, seconds = -2
    2015-07-16 08:34:13.358+0000: 7748: info : qemuMonitorIOProcess:452 : QEMU_MONITOR_IO_PROCESS: mon=0x7f7c1000e580 buf={"timestamp": {"seconds": 1437035653, "microseconds": 358639}, "event": "VSERPORT_CHANGE", "data": {"open": false, "id": "channel0"}} len=135
    2015-07-16 08:34:13.897+0000: 7748: info : qemuMonitorIOProcess:452 : QEMU_MONITOR_IO_PROCESS: mon=0x7f7c1000e580 buf={"timestamp": {"seconds": 1437035653, "microseconds": 897380}, "event": "SUSPEND"} len=84
    2015-07-16 08:34:23.502+0000: 7749: debug : virDomainPMWakeup:772 : dom=0x7f7c20003160, (VM: name=rhel7.0, uuid=336ba55b-5631-46a8-b57e-f4e1ce7dfed4), flags=0
    2015-07-16 08:34:23.502+0000: 7749: info : qemuMonitorSend:1033 : QEMU_MONITOR_SEND_MSG: mon=0x7f7c1000e580 msg={"execute":"system_wakeup","id":"libvirt-17"} fd=-1
    2015-07-16 08:34:23.515+0000: 7748: info : qemuMonitorIOProcess:452 : QEMU_MONITOR_IO_PROCESS: mon=0x7f7c1000e580 buf={"timestamp": {"seconds": 1437035663, "microseconds": 514883}, "event": "WAKEUP"} len=83
    2015-07-16 08:35:11.420+0000: 7748: info : qemuMonitorIOProcess:452 : QEMU_MONITOR_IO_PROCESS: mon=0x7f7c1000e580 buf={"timestamp": {"seconds": 1437035711, "microseconds": 419909}, "event": "VSERPORT_CHANGE", "data": {"open": true, "id": "channel0"}} len=134

So, at 08:34:12 I've suspended the domain. Then one second after that QEMU sent an event that the qemu-ga socket had been closed in the guest. This is correct, nobody can be listening in a suspended system, right? Then, after ten seconds I woke the domain up. But a strange thing happened: it took a really long while until qemu-ga started listening again. Nearly 50 seconds. Therefore I think this is a qemu bug (if anything; maybe it really takes long to fully wake up a system). Then, I've noticed that the guest's display was blank during this time, so I doubt it's qemu alone here and maybe we need to dig deeper. At any rate, I don't think that what you've found is a libvirt bug. In fact it shows how well libvirt is driven by qemu events.

I can't reproduce on a fedora host & rhel7 qemu-guest-agent-2.1.0-4.el7.x86_64. What's your version of qemu on the host? Can you reproduce with a fedora host, or is this a rhel7-only bug? Thanks

Hi Marc-Andre, sorry for the late reply. I can't reproduce it with a fedora host, but can still reproduce it with rhel. My package info was as follows:

    host: libvirt-1.2.17-3.el7.x86_64 qemu-kvm-rhev-2.3.0-14.el7.x86_64
    guest: qemu-guest-agent-2.3.0-2.el7.x86_64

Sent fix: http://lists.nongnu.org/archive/html/qemu-devel/2015-08/msg01285.html (I am working on some follow-up patches to throttle VSERPORT_CHANGE too)

Even though this is not a supported scenario (suspending/resuming a guest), we have patches upstream and this should be included in the next rebase.
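The "nearly 50 seconds" figure in the log analysis above can be checked directly from the QMP event timestamps. The sketch below is only an illustration: the event payloads are copied from the log snippet, while the helper names `timestamp_us` and `wakeup_to_port_open_us` are hypothetical and not part of libvirt or QEMU.

```python
import json
from typing import Iterable, Optional

# QMP event payloads copied from the libvirtd log snippet in this report.
QMP_EVENTS = [
    '{"timestamp": {"seconds": 1437035653, "microseconds": 358639}, '
    '"event": "VSERPORT_CHANGE", "data": {"open": false, "id": "channel0"}}',
    '{"timestamp": {"seconds": 1437035653, "microseconds": 897380}, "event": "SUSPEND"}',
    '{"timestamp": {"seconds": 1437035663, "microseconds": 514883}, "event": "WAKEUP"}',
    '{"timestamp": {"seconds": 1437035711, "microseconds": 419909}, '
    '"event": "VSERPORT_CHANGE", "data": {"open": true, "id": "channel0"}}',
]


def timestamp_us(event: dict) -> int:
    """Convert a QMP timestamp object to integer microseconds."""
    ts = event["timestamp"]
    return ts["seconds"] * 1_000_000 + ts["microseconds"]


def wakeup_to_port_open_us(lines: Iterable[str], port_id: str) -> Optional[int]:
    """Microseconds between WAKEUP and the next 'open' VSERPORT_CHANGE for port_id."""
    wakeup_us = None
    for line in lines:
        event = json.loads(line)
        if event["event"] == "WAKEUP":
            wakeup_us = timestamp_us(event)
        elif event["event"] == "VSERPORT_CHANGE" and wakeup_us is not None:
            data = event["data"]
            if data["id"] == port_id and data["open"]:
                return timestamp_us(event) - wakeup_us
    return None


delay_us = wakeup_to_port_open_us(QMP_EVENTS, "channel0")
if delay_us is not None:
    print(f"agent port reopened {delay_us / 1_000_000:.6f} s after WAKEUP")
```

For the events above the gap is about 47.9 seconds, consistent with the estimate in the comment.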
Proposed patches to throttle VSERPORT_CHANGE properly:
http://lists.gnu.org/archive/html/qemu-devel/2015-09/msg06649.html

Upstream commits:

    7f1e7b2 docs: Document QMP event rate limiting
    7de0be6 monitor: Throttle event VSERPORT_CHANGE separately by "id"
    a24712a monitor: Turn monitor_qapi_event_state[] into a hash table
    8681dff glib: add compatibility interface for g_hash_table_add()
    b9b03ab monitor: Split MonitorQAPIEventConf off MonitorQAPIEventState
    1824c41 monitor: Switch from timer_new() to timer_new_ns()
    93f8f98 monitor: Simplify event throttling
    688b4b7 monitor: Reduce casting of QAPI event QDict
    7f02784 qstring: Make conversion from QObject * accept null
    2d6421a qlist: Make conversion from QObject * accept null
    fcf73f6 qfloat qint: Make conversion from QObject * accept null
    89cad9f qdict: Make conversion from QObject * accept null
    14b6160 qbool: Make conversion from QObject * accept null
    c7c4621 qobject: Drop QObject_HEAD

According to Comment 7, retested five times and all results passed, so this issue is verified.
The details are as below:

    host: kernel-3.10.0-461.el7.x86_64 qemu-kvm-rhev-2.6.0-12.el7 libvirt-2.0.0-1.el7.x86_64
    guest: kernel-3.10.0-456.el7.x86_64 qemu-guest-agent-2.3.0-4.el7.x86_64

some logs:

    # virsh start bug1244064
    Domain bug1244064 started

    # virsh dompmsuspend bug1244064 --target mem
    Domain bug1244064 successfully suspended
    # virsh list
     Id    Name                           State
    ----------------------------------------------------
     12    bug1244064                     pmsuspended

wakeup with spice input

    # virsh list
     Id    Name                           State
    ----------------------------------------------------
     12    bug1244064                     running

    # virsh dompmsuspend bug1244064 --target mem
    Domain bug1244064 successfully suspended
    # virsh list
     Id    Name                           State
    ----------------------------------------------------
     12    bug1244064                     pmsuspended

wakeup with spice input

    # virsh list
     Id    Name                           State
    ----------------------------------------------------
     12    bug1244064                     running

    # virsh dompmsuspend bug1244064 --target mem
    Domain bug1244064 successfully suspended
    # virsh list
     Id    Name                           State
    ----------------------------------------------------
     12    bug1244064                     pmsuspended

wakeup with spice input

    # virsh list
     Id    Name                           State
    ----------------------------------------------------
     12    bug1244064                     running

    # virsh dompmsuspend bug1244064 --target mem
    Domain bug1244064 successfully suspended
    # virsh list
     Id    Name                           State
    ----------------------------------------------------
     12    bug1244064                     pmsuspended

wakeup with spice input

    # virsh list
     Id    Name                           State
    ----------------------------------------------------
     12    bug1244064                     running

    # virsh dompmsuspend bug1244064 --target mem
    Domain bug1244064 successfully suspended
    # virsh list
     Id    Name                           State
    ----------------------------------------------------
     12    bug1244064                     pmsuspended

Retested on the latest version and all results passed.
The details are as below:

    host: kernel-3.10.0-461.el7.x86_64 qemu-kvm-rhev-2.6.0-21.el7 libvirt-2.0.0-1.el7.x86_64
    guest: kernel-3.10.0-456.el7.x86_64 qemu-guest-agent-2.5.0-2.el7.x86_64

some logs:

    # virsh start bug1244064
    Domain bug1244064 started

    # virsh dompmsuspend bug1244064 --target mem
    Domain bug1244064 successfully suspended
    # virsh list
     Id    Name                           State
    ----------------------------------------------------
     12    bug1244064                     pmsuspended

check the guest agent status with virsh dumpxml

    # virsh dumpxml bug1244064
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-15-bug1244064/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='2'/>
    </channel>

wakeup with spice input

    # virsh list
     Id    Name                           State
    ----------------------------------------------------
     12    bug1244064                     running

check the guest agent status with virsh dumpxml

    # virsh dumpxml bug1244064
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-15-bug1244064/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='connected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='2'/>
    </channel>

    # virsh dompmsuspend bug1244064 --target mem
    Domain bug1244064 successfully suspended
    # virsh list
     Id    Name                           State
    ----------------------------------------------------
     12    bug1244064                     pmsuspended

check the guest agent status with virsh dumpxml

    # virsh dumpxml bug1244064
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-15-bug1244064/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='2'/>
    </channel>

wakeup with spice input

    # virsh list
     Id    Name                           State
    ----------------------------------------------------
     12    bug1244064                     running

check the guest agent status with virsh dumpxml

    # virsh dumpxml bug1244064
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-15-bug1244064/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='connected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='2'/>
    </channel>

    # virsh dompmsuspend bug1244064 --target mem
    Domain bug1244064 successfully suspended
    # virsh list
     Id    Name                           State
    ----------------------------------------------------
     12    bug1244064                     pmsuspended

check the guest agent status with virsh dumpxml

    # virsh dumpxml bug1244064
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-15-bug1244064/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='2'/>
    </channel>

wakeup with spice input

    # virsh list
     Id    Name                           State
    ----------------------------------------------------
     12    bug1244064                     running

check the guest agent status with virsh dumpxml

    # virsh dumpxml bug1244064
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-15-bug1244064/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='connected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='2'/>
    </channel>

    # virsh dompmsuspend bug1244064 --target mem
    Domain bug1244064 successfully suspended
    # virsh list
     Id    Name                           State
    ----------------------------------------------------
     12    bug1244064                     pmsuspended

check the guest agent status with virsh dumpxml

    # virsh dumpxml bug1244064
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-15-bug1244064/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='2'/>
    </channel>

wakeup with spice input

    # virsh list
     Id    Name                           State
    ----------------------------------------------------
     12    bug1244064                     running

check the guest agent status with virsh dumpxml

    # virsh dumpxml bug1244064
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-15-bug1244064/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='connected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='2'/>
    </channel>

    # virsh dompmsuspend bug1244064 --target mem
    Domain bug1244064 successfully suspended
    # virsh list
     Id    Name                           State
    ----------------------------------------------------
     12    bug1244064                     pmsuspended

check the guest agent status with virsh dumpxml

    # virsh dumpxml bug1244064
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-15-bug1244064/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='2'/>
    </channel>

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://rhn.redhat.com/errata/RHBA-2016-2673.html
|
1. Start a guest with 2 CPUs, guest agent and graphical desktop

    # virsh dumpxml rhel7.0
    --
    <vcpu placement='static'>2</vcpu>
    --
    <pm>
      <suspend-to-mem enabled='yes'/>
      <suspend-to-disk enabled='yes'/>
    </pm>
    --
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/rhel7.0.org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='connected'/>
      <alias name='channel1'/>
      <address type='virtio-serial' controller='0' bus='0' port='2'/>
    </channel>

2. Do S3 with the guest

    # virsh dompmsuspend rhel7.0 --target mem
    Domain rhel7.0 successfully suspended
    # virsh list
     Id    Name                           State
    ----------------------------------------------------
     15    rhel7.0                        pmsuspended
    # virsh dumpxml rhel7.0
    --
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/rhel7.0.org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel1'/>
      <address type='virtio-serial' controller='0' bus='0' port='2'/>
    </channel>

3. Wake up the guest and check the guest agent status with virsh dumpxml. The guest agent is still in 'disconnected' status, and commands that depend on the guest agent fail to execute.

    # virsh dompmwakeup rhel7.0
    Domain rhel7.0 successfully woken up
    # virsh dumpxml rhel7.0
    --
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/rhel7.0.org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel1'/>
      <address type='virtio-serial' controller='0' bus='0' port='2'/>
    </channel>
    # virsh dompmsuspend rhel7.0 --target mem
    error: Domain rhel7.0 could not be suspended
    error: Guest agent is not responding: QEMU guest agent is not connected

4. Restarting the libvirtd service, or restarting the guest agent service inside the guest, brings the guest agent back to 'connected' status.

5. A guest with 1 CPU works as expected.
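The manual check in the steps above (reading the `state` attribute of the guest agent channel from `virsh dumpxml`) could be scripted roughly as follows. This is a minimal sketch: the helper names `agent_channel_state` and `dump_domain_xml` are hypothetical, and the sample XML is the channel element from this report wrapped in trimmed `<domain>`/`<devices>` elements for illustration.

```python
import subprocess
import xml.etree.ElementTree as ET
from typing import Optional

AGENT_CHANNEL = "org.qemu.guest_agent.0"


def agent_channel_state(domain_xml: str, name: str = AGENT_CHANNEL) -> Optional[str]:
    """Return the 'state' attribute of the named channel target, or None if absent."""
    root = ET.fromstring(domain_xml)
    for channel in root.iter("channel"):
        target = channel.find("target")
        if target is not None and target.get("name") == name:
            return target.get("state")
    return None


def dump_domain_xml(domain: str) -> str:
    """Fetch the live domain XML via 'virsh dumpxml' (requires a libvirt host)."""
    result = subprocess.run(
        ["virsh", "dumpxml", domain], check=True, capture_output=True, text=True
    )
    return result.stdout


# Channel element as shown in this report, inside an assumed minimal wrapper.
SAMPLE_XML = """
<domain>
  <devices>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/rhel7.0.org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel1'/>
      <address type='virtio-serial' controller='0' bus='0' port='2'/>
    </channel>
  </devices>
</domain>
"""

print(agent_channel_state(SAMPLE_XML))
```

On a real host, `agent_channel_state(dump_domain_xml("rhel7.0"))` would be called once after the suspend and once after the wakeup, and the two results compared.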