Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1612454

Summary: Live merge fails with "NotConnectedError: VM '%VM_ID%' was not defined yet or was undefined" after an extensive snapshots operations
Product: [oVirt] vdsm Reporter: Elad <ebenahar>
Component: CoreAssignee: Tal Nisan <tnisan>
Status: CLOSED CURRENTRELEASE QA Contact: Evelina Shames <eshames>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.20.31CC: aefrat, bugs, tnisan
Target Milestone: ovirt-4.3.1Keywords: Automation
Target Release: ---Flags: rule-engine: ovirt-4.3+
Hardware: x86_64   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-03-13 16:40:30 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1583424    
Attachments:
Description Flags
logs and flow screenshot none

Description Elad 2018-08-04 15:50:25 UTC
Created attachment 1473302 [details]
logs and flow screenshot

Description of problem:
After a certain flow of extensive snapshots creation and deletion for a VM, live merge failed on VDSM.
The exact flow is described in the VM events screenshots taken from webadmdin.
This was discovered during random automation testing focused on snapshots, i.e, testing randomly picked operations, like snapshot creation and deletion.


Version-Release number of selected component (if applicable):
rhvm-4.2.5.2-0.1.el7ev.noarch
vdsm-4.20.35-1.el7ev.x86_64
libvirt-3.9.0-14.el7_5.6.x86_64
qemu-kvm-rhev-2.10.0-21.el7_5.4.x86_64
RHEL7.5
kernel - 3.10.0-862.10.2.el7.x86_64


How reproducible:
Reproduced once

Steps to Reproduce:
The exact scenario is described in the attached screenshots.
The VM was created from a template as a thin copy with disk on iSCSI


Actual results:

Task 7143de03-6b02-40ff-bbe2-6d72c5f6859e starts for getVolumeSize:

2018-08-04 17:31:48,107+0300 DEBUG (mailbox-hsm/4) [storage.TaskManager.Task] (Task='7143de03-6b02-40ff-bbe2-6d72c5f6859e') moving from state preparing -> state preparing (task:602)
2018-08-04 17:31:48,112+0300 INFO  (mailbox-hsm/4) [vdsm.api] START getVolumeSize(sdUUID='5893e521-2452-493f-9295-731ee100016d', spUUID='817f9aad-e41a-45e2-891d-cb52182cf8c7', imgUUID='33f54dea-5482-44dd-9600-d8dae943db18', volUUID='664e68ce-2d3f-4813-b10b-f66ce26a444b', options=None) from=internal, task_id=7143de03-6b02-40ff-bbe2-6d72c5f6859e (api:46)
 

Live merge fails:


2018-08-04 17:31:48,623+0300 INFO  (mailbox-hsm/4) [virt.vm] (vmId='7741e7b1-5838-4762-8f2c-dc2a14fb0259') Extend volume 664e68ce-2d3f-4813-b10b-f66ce26a444b completed <Clock(total=5.76, extend-volume=4.39, refresh-volume=0.85)> (vm:1393)
2018-08-04 17:31:48,625+0300 ERROR (mailbox-hsm/4) [storage.TaskManager.Task] (Task='02a1f113-6c6e-4201-a7b4-82f7b0095932') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 1401, in __afterVolumeExtension
    self._resume_if_needed()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 1419, in _resume_if_needed
    self.cont()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 1524, in cont
    self._underlyingCont()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 4226, in _underlyingCont
    hooks.before_vm_cont(self._dom.XMLDesc(0), self._custom)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 47, in __getattr__
    % self.vmid)
NotConnectedError: VM '7741e7b1-5838-4762-8f2c-dc2a14fb0259' was not defined yet or was undefined



Expected results:
Live merge should succeed

Additional info:
logs and flow screenshot



engine.log:

2018-08-04 17:32:08,522+03 ERROR [org.ovirt.engine.core.bll.MergeStatusCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-1) [snapshots_delete_b51bac20-4be5-4f48] Failed to live merge. Top volume 659220
47-6740-4d4c-b9a3-a36a3e833efc is still in qemu chain [6380b945-cb30-4a64-b072-56344fbd986b, 722a7396-db4b-4234-a93c-45e9bf029f10, 664e68ce-2d3f-4813-b10b-f66ce26a444b, 539f3b8b-e4cf-47e4-9a86-677fdd9420a8, 6592
2047-6740-4d4c-b9a3-a36a3e833efc]

Comment 1 Tal Nisan 2018-08-05 19:59:06 UTC
Ala, this error from Engine log looks familiar, didn't you fix a similar problem not long ago?

Comment 2 Ala Hino 2018-08-06 14:27:16 UTC
The engine error is probably fixed by https://gerrit.ovirt.org/#/c/91930/

The thing is I that the failure at vdsm side is probably the root cause of the issue. This requires more investigations and may be related to BZ 1583424, that is also in the "extensive operations" area.

Comment 3 Tal Nisan 2018-08-06 15:12:12 UTC
OK, will target to 4.3 as BZ 1583424

Comment 5 Sandro Bonazzola 2019-01-28 09:36:48 UTC
This bug has not been marked as blocker for oVirt 4.3.0.
Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.

Comment 6 Tal Nisan 2019-03-01 15:30:22 UTC
We have done tremendous changes to snapshot flows in 4.3, my guess is that this bug will not reproduce due to that, moving to ON_QA to confirm this

Comment 7 Evelina Shames 2019-03-07 16:16:35 UTC
Verified on engine-4.3.1.1-0.1.el7, vdsm-4.30.9-1.el7 after certain flows of extensive snapshots creation and deletion for a VM.

Comment 8 Sandro Bonazzola 2019-03-13 16:40:30 UTC
This bugzilla is included in oVirt 4.3.1 release, published on February 28th 2019.

Since the problem described in this bug report should be
resolved in oVirt 4.3.1 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.