Bug 1612454
| Summary: | Live merge fails with "NotConnectedError: VM '%VM_ID%' was not defined yet or was undefined" after an extensive snapshots operations | ||||||
|---|---|---|---|---|---|---|---|
| Product: | [oVirt] vdsm | Reporter: | Elad <ebenahar> | ||||
| Component: | Core | Assignee: | Tal Nisan <tnisan> | ||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Evelina Shames <eshames> | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 4.20.31 | CC: | aefrat, bugs, tnisan | ||||
| Target Milestone: | ovirt-4.3.1 | Keywords: | Automation | ||||
| Target Release: | --- | Flags: | rule-engine:
ovirt-4.3+
|
||||
| Hardware: | x86_64 | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2019-03-13 16:40:30 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | |||||||
| Bug Blocks: | 1583424 | ||||||
| Attachments: |
|
||||||
Ala, this error from Engine log looks familiar, didn't you fix a similar problem not long ago? The engine error is probably fixed by https://gerrit.ovirt.org/#/c/91930/ The thing is I that the failure at vdsm side is probably the root cause of the issue. This requires more investigations and may be related to BZ 1583424, that is also in the "extensive operations" area. OK, will target to 4.3 as BZ 1583424 This bug has not been marked as blocker for oVirt 4.3.0. Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1. We have done tremendous changes to snapshot flows in 4.3, my guess is that this bug will not reproduce due to that, moving to ON_QA to confirm this Verified on engine-4.3.1.1-0.1.el7, vdsm-4.30.9-1.el7 after certain flows of extensive snapshots creation and deletion for a VM. This bugzilla is included in oVirt 4.3.1 release, published on February 28th 2019. Since the problem described in this bug report should be resolved in oVirt 4.3.1 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report. |
Created attachment 1473302 [details] logs and flow screenshot Description of problem: After a certain flow of extensive snapshots creation and deletion for a VM, live merge failed on VDSM. The exact flow is described in the VM events screenshots taken from webadmdin. This was discovered during random automation testing focused on snapshots, i.e, testing randomly picked operations, like snapshot creation and deletion. Version-Release number of selected component (if applicable): rhvm-4.2.5.2-0.1.el7ev.noarch vdsm-4.20.35-1.el7ev.x86_64 libvirt-3.9.0-14.el7_5.6.x86_64 qemu-kvm-rhev-2.10.0-21.el7_5.4.x86_64 RHEL7.5 kernel - 3.10.0-862.10.2.el7.x86_64 How reproducible: Reproduced once Steps to Reproduce: The exact scenario is described in the attached screenshots. The VM was created from a template as a thin copy with disk on iSCSI Actual results: Task 7143de03-6b02-40ff-bbe2-6d72c5f6859e starts for getVolumeSize: 2018-08-04 17:31:48,107+0300 DEBUG (mailbox-hsm/4) [storage.TaskManager.Task] (Task='7143de03-6b02-40ff-bbe2-6d72c5f6859e') moving from state preparing -> state preparing (task:602) 2018-08-04 17:31:48,112+0300 INFO (mailbox-hsm/4) [vdsm.api] START getVolumeSize(sdUUID='5893e521-2452-493f-9295-731ee100016d', spUUID='817f9aad-e41a-45e2-891d-cb52182cf8c7', imgUUID='33f54dea-5482-44dd-9600-d8dae943db18', volUUID='664e68ce-2d3f-4813-b10b-f66ce26a444b', options=None) from=internal, task_id=7143de03-6b02-40ff-bbe2-6d72c5f6859e (api:46) Live merge fails: 2018-08-04 17:31:48,623+0300 INFO (mailbox-hsm/4) [virt.vm] (vmId='7741e7b1-5838-4762-8f2c-dc2a14fb0259') Extend volume 664e68ce-2d3f-4813-b10b-f66ce26a444b completed <Clock(total=5.76, extend-volume=4.39, refresh-volume=0.85)> (vm:1393) 2018-08-04 17:31:48,625+0300 ERROR (mailbox-hsm/4) [storage.TaskManager.Task] (Task='02a1f113-6c6e-4201-a7b4-82f7b0095932') Unexpected error (task:875) Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run return fn(*args, **kargs) File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 1401, in __afterVolumeExtension self._resume_if_needed() File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 1419, in _resume_if_needed self.cont() File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 1524, in cont self._underlyingCont() File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 4226, in _underlyingCont hooks.before_vm_cont(self._dom.XMLDesc(0), self._custom) File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 47, in __getattr__ % self.vmid) NotConnectedError: VM '7741e7b1-5838-4762-8f2c-dc2a14fb0259' was not defined yet or was undefined Expected results: Live merge should succeed Additional info: logs and flow screenshot engine.log: 2018-08-04 17:32:08,522+03 ERROR [org.ovirt.engine.core.bll.MergeStatusCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-1) [snapshots_delete_b51bac20-4be5-4f48] Failed to live merge. Top volume 659220 47-6740-4d4c-b9a3-a36a3e833efc is still in qemu chain [6380b945-cb30-4a64-b072-56344fbd986b, 722a7396-db4b-4234-a93c-45e9bf029f10, 664e68ce-2d3f-4813-b10b-f66ce26a444b, 539f3b8b-e4cf-47e4-9a86-677fdd9420a8, 6592 2047-6740-4d4c-b9a3-a36a3e833efc]