Bug 2186668
| Summary: | Make qga poller code more robust when parsing vcpuinfo | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Germano Veit Michel <gveitmic> |
| Component: | vdsm | Assignee: | Nobody <nobody> |
| Status: | CLOSED DUPLICATE | QA Contact: | Lukas Svaty <lsvaty> |
| Severity: | low | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.5.3 | CC: | ldixon, lsurette, srevivo, tgolembi, ycui |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-04-18 21:09:03 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
I spun up a Win2016 VM from the old template, which should have the old version of QGA installed. It's showing the following:

virtio-win-guest-tools 1.9.10
RHEV-Tools 4.43.10
RHV-Spice-Agent64 4.43.3
REV-Application-Provisioning-Tool 4.34.4
QEMU guest agent: 7.6.2

I am not sure where it's getting that version of QEMU guest agent. This is a screenshot of the versions: https://share.getcloudapp.com/yAuJJKY6

When I run the following on the host that is running the VM named lynnwin01.ad.shadowman.dev, I see this:

virsh -c qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf guestvcpus lynnwin01.ad.shadowman.dev
error: internal error: 'can-offline' missing in reply of guest-get-vcpus

So it doesn't appear that it's picking up the guest CPUs, but it does report the IP address.

Similarly, whenever I start a Win2016 VM that uses this template and the old QGA, I see these in the vdsm.log on that host:

2023-04-14 15:50:57,060-0400 ERROR (qgapoller/3) [virt.periodic.Operation] <bound method QemuGuestAgentPoller._poller of <vdsm.virt.qemuguestagent.QemuGuestAgentPoller object at 0x7fdaa8fa32e8>> operation failed (periodic:204)
TypeError: argument of type 'NoneType' is not iterable
2023-04-14 15:51:02,076-0400 ERROR (qgapoller/4) [virt.periodic.Operation] <bound method QemuGuestAgentPoller._poller of <vdsm.virt.qemuguestagent.QemuGuestAgentPoller object at 0x7fdaa8fa32e8>> operation failed (periodic:204)
TypeError: argument of type 'NoneType' is not iterable

If I stop the Win2016 VM on the host, those errors stop. And the errors return whenever a Win2016 VM is run.

I don't know why I couldn't reproduce it, but it's probably a symptom of this: https://bugzilla.redhat.com/show_bug.cgi?id=1438735

Anyway, I think VDSM code should be more robust for this.
If the reported problem breaks Guest Info from QGA for the problematic guest, that is acceptable. But it's stopping the entire monitoring cycle, so many VMs that have nothing to do with this and are working fine are not polled and are missing guest info; their data is just not being collected. The exception stops the monitoring cycle for all subsequent VMs; that's the problem.

This bug was already fixed as part of this BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2120381#c12

Ohh, thanks Tomáš! Not sure how I missed that :(

Lynn, turns out you did upgrade to the very latest, but just 2 days later a newer version, with the fix, was available:

vdsm-4.50.2.2-1.el8ev.x86_64 Tue Mar 28 23:57:02 2023

Fix is here: https://access.redhat.com/errata/RHBA-2022:8694

However, you did the right thing in upgrading that ancient Guest Agent. Closing as duplicate.

*** This bug has been marked as a duplicate of bug 2120381 ***
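The cycle-wide failure discussed in the comments above (one guest's exception preventing the remaining VMs from being polled) can be illustrated with a minimal sketch. This is not VDSM's code or the fix shipped for bug 2120381; `poll_all`, `query`, and the logger name are hypothetical names chosen for the example. It only shows the general idea of catching errors per VM so that one bad reply does not end the pass.

```python
import logging

log = logging.getLogger("qga-sketch")  # hypothetical logger name


def poll_all(vm_ids, query):
    """Poll every VM once; a failure on one VM must not skip the others.

    `query` is a hypothetical callable that takes a VM id and returns that
    VM's guest info, or raises if the guest agent reply is unusable.
    Returns (results, failed): a dict of successful replies keyed by VM id
    and a list of the VM ids whose query raised.
    """
    results = {}
    failed = []
    for vm_id in vm_ids:
        try:
            results[vm_id] = query(vm_id)
        except Exception:
            # Record which VM is broken, then keep going with the next one.
            log.exception("guest info query failed for vm=%s", vm_id)
            failed.append(vm_id)
    return results, failed
```

Returning the list of failed VM ids alongside the results would also make it easier to see which guest needs its agent upgraded, instead of several unrelated VMs showing missing info.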
Description of problem:

A very old (actually unknown) QGA version on Windows guests apparently does not return much for the 'guestvcpus' command, or at least does not return the "online" key. If that happens, VDSM's qga poller blows up here every 5 seconds on that VM:

2023-04-12 21:34:09,188-0400 ERROR (qgapoller/3) [virt.periodic.Operation] <bound method QemuGuestAgentPoller._poller of <vdsm.virt.qemuguestagent.QemuGuestAgentPoller object at 0x7f33601920f0>> operation failed (periodic:204)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/virt/periodic.py", line 202, in __call__
    self._func()
  File "/usr/lib/python3.6/site-packages/vdsm/virt/qemuguestagent.py", line 493, in _poller
    vm_id, self._qga_call_get_vcpus(vm_obj))
  File "/usr/lib/python3.6/site-packages/vdsm/virt/qemuguestagent.py", line 814, in _qga_call_get_vcpus
    if 'online' in vcpus:
TypeError: argument of type 'NoneType' is not iterable

The problem with this is that it breaks the entire monitoring cycle, not just the affected VM. So, for example, if there are 10 VMs on the host and the 4th VM has this QGA issue, then 6 VMs on the host don't have their QGA polled, resulting in 6 VMs with missing IPs and other info. It is hard to track down which VM needs its QGA upgraded to the latest version if the user has 6 VMs with missing info.

Upgrading QGA to the latest shipped version fixes the issue.

Version-Release number of selected component (if applicable):
RHV 4.4 SP1

How reproducible:
* Still trying to figure out exactly which QGA version that was.

Steps to Reproduce:
* Unknown QGA so far
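The `TypeError` in the traceback comes from applying `in` to a reply that is `None`. A minimal sketch of a more defensive parse is shown below. It is not the actual VDSM code or its fix: `guest_cpu_count` and `count_cpulist` are hypothetical helpers, the reply is assumed to be either `None` or a dict whose "online" value is a CPU list string such as "0-3", and -1 is an assumed sentinel for "unknown".

```python
def count_cpulist(cpulist):
    """Count the CPUs named in a list string such as '0-3,5'."""
    count = 0
    for part in cpulist.split(','):
        part = part.strip()
        if not part:
            continue
        if '-' in part:
            low, high = part.split('-', 1)
            count += int(high) - int(low) + 1
        else:
            int(part)  # raises ValueError if the entry is not a number
            count += 1
    return count


def guest_cpu_count(vcpus):
    """Return the number of online guest CPUs, or -1 if it cannot be told.

    `vcpus` stands in for the guest agent reply; the shape assumed here
    (None, or a dict with an 'online' CPU list string) is an assumption.
    """
    if not isinstance(vcpus, dict):
        return -1  # covers the None reply seen in the traceback
    online = vcpus.get('online')
    if not isinstance(online, str):
        return -1  # key missing or of an unexpected type
    try:
        return count_cpulist(online)
    except ValueError:
        return -1  # malformed CPU list
```

With a guard like this, an old agent that returns nothing useful would yield an "unknown" CPU count for that one guest rather than raising inside the poller.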