Bug 1073478
Summary: | VDSM internal exception gathering balloon info while shutting down | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Ilanit Stein <istein> | ||||||||||
Component: | vdsm | Assignee: | Francesco Romani <fromani> | ||||||||||
Status: | CLOSED ERRATA | QA Contact: | Ilanit Stein <istein> | ||||||||||
Severity: | high | Docs Contact: | |||||||||||
Priority: | unspecified | ||||||||||||
Version: | 3.4.0 | CC: | acathrow, adahms, bazulay, fromani, iheim, istein, jentrena, lpeer, mavital, michal.skrivanek, ofrenkel, Rhev-m-bugs, sherold, yeylon | ||||||||||
Target Milestone: | --- | Keywords: | Reopened, Triaged, ZStream | ||||||||||
Target Release: | 3.5.0 | ||||||||||||
Hardware: | Unspecified | ||||||||||||
OS: | Unspecified | ||||||||||||
Whiteboard: | virt | ||||||||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||||||||
Doc Text: |
Previously, the state of suspended virtual machines was incorrectly displayed as 'Down' under certain conditions. This was caused by the logic used to poll the status of virtual machines, whereby an uncaught exception would cause the status of virtual machines to be reported incorrectly. Now, this logic has been revised so that the status of virtual machines is reported correctly.
|
Story Points: | --- | ||||||||||
Clone Of: | |||||||||||||
: | 1098148 (view as bug list) | Environment: | |||||||||||
Last Closed: | 2014-06-09 13:29:27 UTC | Type: | Bug | ||||||||||
Regression: | --- | Mount Type: | --- | ||||||||||
Documentation: | --- | CRM: | |||||||||||
Verified Versions: | Category: | --- | |||||||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||||
Embargoed: | |||||||||||||
Bug Depends On: | |||||||||||||
Bug Blocks: | 1098148, 1142923, 1156165 | ||||||||||||
Attachments: |
|
Created attachment 871466 [details]
vdsm.log
Do you still have vdsm+qemu logs? would be helpful I mean libvirt logs Created attachment 872409 [details]
libvirt log
Created attachment 872410 [details]
VM qemu log
vm moves to down because exception in getVmStats is not treated well in the engine, and the vm is mistaken to be down. this is documented and fixed in bug 1072282 we need to understand why vdsm fails in getVmStats, though, so this is not a duplicate (in bug 1072282 exception is due to network issues) Thread-105985::ERROR::2014-03-06 13:26:40,972::BindingXMLRPC::1072::vds::(wrapper) libvirt error Traceback (most recent call last): File "/usr/share/vdsm/BindingXMLRPC.py", line 1065, in wrapper res = f(*args, **kwargs) File "/usr/share/vdsm/BindingXMLRPC.py", line 414, in getAllVmStats response = self.vmGetStats(s['vmId']) File "/usr/share/vdsm/BindingXMLRPC.py", line 405, in vmGetStats return vm.getStats() File "/usr/share/vdsm/API.py", line 357, in getStats stats = v.getStats().copy() File "/usr/share/vdsm/vm.py", line 2750, in getStats stats = self._getStatsInternal() File "/usr/share/vdsm/vm.py", line 2850, in _getStatsInternal stats['balloonInfo'] = self._getBalloonInfo() File "/usr/share/vdsm/vm.py", line 4644, in _getBalloonInfo cur_mem = self._dom.info()[2] File "/usr/share/vdsm/vm.py", line 859, in f ret = attr(*args, **kwargs) File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 92, in wrapper ret = f(*args, **kwargs) File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1874, in info if ret is None: raise libvirtError ('virDomainGetInfo() failed', dom=self) libvirtError: Domain not found: no domain with matching uuid '534197a4-a88d-4378-8081-7b789b2edc65' may be a consequence of commit 8fedf8bde3c28edb07add23c3e9b72681cb48e49, and a race between libvirt notification and sampling threads. The fact this is hard to reproduce seems to confirm that. Adding some extra checks in the sampling threads won't hurt. (In reply to Francesco Romani from comment #7) > may be a consequence of commit 8fedf8bde3c28edb07add23c3e9b72681cb48e49, and > a race between libvirt notification and sampling threads. The fact this is > hard to reproduce seems to confirm that. Adding some extra checks in the > sampling threads won't hurt. Sorry, not sampling threads but engine polling. Still a race, but a different one :) *** Bug 1070649 has been marked as a duplicate of this bug. *** was: Suspend VM ends up with VM in down status if a getVmStats call fails inside VDSM for whatever reason, older engine are confused (fixed in https://bugzilla.redhat.com/show_bug.cgi?id=1072282) and mistanely think the VM is down. This bug is about how to properly handle the failure inside VDSM, as explained in https://bugzilla.redhat.com/show_bug.cgi?id=1073478#c6 https://bugzilla.redhat.com/show_bug.cgi?id=1073478#c7 https://bugzilla.redhat.com/show_bug.cgi?id=1073478#c8 As the original flow reported in this bug is not reproducible, how should this bug be verified? Being a timing issue, is indeed very hard to reproduce. The only answer I can offer is to try to recreate the race issuing the getVmStats command VERY frequently (something like 10 times per second), for example with a shell script, and to shutdown a VM in the meantime. Hopefully in a few try you can get an error like the one being fixed. Can't think yet of a better or simpler way to reproduce this issue. Closing the bug since it can't be reproduced. *** Bug 1070649 has been marked as a duplicate of this bug. *** (In reply to Ilanit Stein from comment #14) Please do not close the bug if you can't reproduce it after the fix. That means it should be VERIFIED or you should say you can't verify it…but that doesn't mean the fix is invalid - on the contrary, it indicates it seems to work. unfortunately it's too late for fixing this in 3.4 GA so retargeting to 3.4.1. It's important to get this fixed Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2014-0504.html |
Created attachment 871465 [details] engine.log Description of problem: Suspend VM (rhel6.5), shutdown VM, Run VM, suspend VM - The VM had turned into down state. Version-Release number of selected component (if applicable): av2 Expected results: VM should be suspended Additional info: I didn't reproduce it. Events (1st suspend 13:24, 2nd suspend 13:26): 2014-Mar-06, 13:26 VM downtime is down. Exit message: SaveState succeeded 2014-Mar-06, 13:26 User admin@internal got disconnected from VM downtime. 2014-Mar-06, 13:26 Suspending VM downtime was initiated by User admin (Host: lilach-vdsb.tlv.redhat.com). 2014-Mar-06, 13:25 User admin@internal is connected to VM downtime. 2014-Mar-06, 13:25 user admin initiated console session for VM downtime 2014-Mar-06, 13:25 VM downtime started on Host lilach-vdsb.tlv.redhat.com 2014-Mar-06, 13:24 user admin initiated console session for VM downtime 2014-Mar-06, 13:24 VM downtime was started by admin (Host: lilach-vdsb.tlv.redhat.com). 2014-Mar-06, 13:24 Suspended VM downtime powered off by admin. 2014-Mar-06, 13:24 VM downtime on Host lilach-vdsb.tlv.redhat.com is suspended. 2014-Mar-06, 13:23 Suspending VM downtime was initiated by User admin (Host: lilach-vdsb.tlv.redhat.com).