Bug 1404332

Summary: Auto-generated snapshot remains in 'locked' state when trying to delete it after live storage migration
Product: [oVirt] ovirt-engine
Reporter: Eyal Shenitzky <eshenitz>
Component: BLL.Storage
Assignee: Ala Hino <ahino>
Status: CLOSED CURRENTRELEASE
QA Contact: Raz Tamir <ratamir>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 4.0.6
CC: ahino, bugs, danken, eshenitz, gklein, tnisan
Target Milestone: ovirt-4.0.6
Keywords: Automation, Regression
Target Release: ---
Flags: rule-engine: ovirt-4.0.z+, gklein: blocker+
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-01-18 07:25:45 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Attachments:
  engine and vdsm logs (flags: none)
  New engine and vdsm logs (flags: none)

Description Eyal Shenitzky 2016-12-13 15:42:24 UTC
Created attachment 1231254 [details]
engine and vdsm logs

Description of problem:

When running a VM with a disk and trying to live-migrate the disk,
an exception is thrown in vdsm and the auto-generated snapshot remains in 'locked' state:

periodic/1::ERROR::2016-12-13 17:03:45,477::executor::232::Executor::(_execute_task) Unhandled exception in Task(callable=<BlockjobMonitor vm=1f6853e1-dfc1-4288-8a12-b2f0b0f5eb65 at 0x2bbb5d0>, timeout=7.5)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 230, in _execute_task
    task.callable()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 303, in __call__
    self._execute()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 374, in _execute
    self._vm.updateVmJobs()
  File "/usr/share/vdsm/virt/vm.py", line 4506, in updateVmJobs
    self._vmJobs = self.queryBlockJobs()
  File "/usr/share/vdsm/virt/vm.py", line 4529, in queryBlockJobs
    drive = self._findDriveByUUIDs(storedJob['disk'])
  File "/usr/share/vdsm/virt/vm.py", line 3248, in _findDriveByUUIDs
    raise LookupError("No such drive: '%s'" % drive)
LookupError: No such drive: '{'domainID': '64025f2c-8e50-4b1d-97cc-d2fd01cd804c', 'imageID': 'f8027bc7-3db0-401b-88af-37d550b670e7', 'volumeID': '9cfb1b83-c9ad-4aa3-a5c3-4415c90dbede', 'poolID': '4f13324a-1cf9-4fd5-a469-8bce75cbe897'}'
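
For context, here is a minimal sketch (an illustration, not vdsm's actual code) of the lookup that fails above: queryBlockJobs resolves each stored block job's drive by matching the domain/image/volume/pool UUIDs against the VM's current drives, and raises LookupError when nothing matches, as happens once the migrated drive's volume chain has changed. The helper name and data shapes are assumptions taken from the traceback:

def find_drive_by_uuids(drives, uuids):
    # Match a stored block-job entry to one of the VM's current drives
    # by the identifiers that appear in the traceback above.
    keys = ('domainID', 'imageID', 'volumeID', 'poolID')
    for drive in drives:
        if all(drive.get(k) == uuids.get(k) for k in keys):
            return drive
    # After live storage migration the stored volumeID may no longer
    # match any current drive, producing the error seen in the log.
    raise LookupError("No such drive: '%s'" % (uuids,))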



Version-Release number of selected component (if applicable):
Engine -  4.0.6.3-0.1.el7ev
VDSM - 4.18.20-1.el7ev.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create a VM with a disk
2. Start the VM
3. Move the disk to another storage domain (see the sketch below)
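
For reference, a hedged sketch of scripting these steps with ovirt-engine-sdk4; the engine URL, credentials, disk name, and target storage domain below are placeholder assumptions:

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

# Connect to the engine (placeholder URL and credentials).
connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='secret',
    insecure=True,  # lab setup; use ca_file verification in production
)

# Find the disk attached to the running VM (placeholder name).
disks_service = connection.system_service().disks_service()
disk = disks_service.list(search='name=vm1_Disk1')[0]
disk_service = disks_service.disk_service(disk.id)

# Moving the disk while the VM is up triggers live storage migration;
# the engine creates an Auto-generated snapshot and should remove it
# once the migration completes.
disk_service.move(storage_domain=types.StorageDomain(name='target_sd'))

connection.close()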

Actual results:
An exception is thrown and the auto-generated snapshot remains in 'locked' state.

Expected results:
The auto-generated snapshot should be deleted when live storage migration completes

Additional info:
vdsm and engine logs attached

Comment 1 Ala Hino 2016-12-13 18:53:59 UTC
Hi Eyal,

How many hosts do you have in this environment?
If you have multiple hosts, can you please send the SPM and HSM (the host running the VM) logs?

Thanks.

Comment 2 Raz Tamir 2016-12-13 23:06:40 UTC
Created attachment 1231362 [details]
New engine and vdsm logs

Ala,
The environment contains 3 hosts.
In the newly attached logs, the VM runs on an HSM host.

Comment 3 Tal Nisan 2016-12-14 13:37:21 UTC
Targeting 4.0.6 until we get a clear answer on whether this is a blocker or not

Comment 4 Raz Tamir 2016-12-14 14:29:03 UTC
Live storage migration is blocked because of this, so this is a blocker

Comment 5 Dan Kenigsberg 2016-12-15 14:11:34 UTC
In the HSM log I see:


Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 372, in wrapper
    return f(*a, **kw)
  File "/usr/share/vdsm/virt/vm.py", line 5002, in run
    self.teardown_top_volume()
  File "/usr/share/vdsm/virt/vm.py", line 4990, in teardown_top_volume
    sd.teardownVolume(self.drive.imageID, self.job['topVolume'])
  File "/usr/share/vdsm/storage/sdc.py", line 50, in __getattr__
    return getattr(self.getRealDomain(), attrName)
AttributeError: 'NfsStorageDomain' object has no attribute 'teardownVolume'

This was solved by Idce11c424c03c1b04f1ed3c84baca85a675c80ea in vdsm-v4.18.21.
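
An illustration of the kind of fix this implies (an assumption, not the actual patch): block storage domains must deactivate the logical volume backing a volume on teardown, while for a file-based domain such as NFS the volume is a plain file and teardown can be a no-op. The AttributeError suggests the file domain class simply lacked the method; the classes and helper below are hypothetical:

def deactivate_lv(sd_uuid, vol_uuid):
    # Stub for the LVM deactivation a block domain would perform.
    pass

class FileStorageDomain(object):
    def teardownVolume(self, imgUUID, volUUID):
        # File-based volumes are regular files; nothing to deactivate.
        pass

class BlockStorageDomain(object):
    sdUUID = 'sd-uuid-placeholder'

    def teardownVolume(self, imgUUID, volUUID):
        # Deactivate the LV backing the volume (hypothetical helper).
        deactivate_lv(self.sdUUID, volUUID)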

Please state the tested vdsm version, preferably re-testing with the most recent build of vdsm.

Comment 6 Dan Kenigsberg 2016-12-15 14:17:54 UTC
Please test with v4.18.21.

Comment 7 Eyal Shenitzky 2016-12-18 07:03:25 UTC
Verified with the following versions:
------------------------------------------
vdsm - 4.18.21-1.el7ev.x86_64
rhevm - 4.0.6.3-0.1.el7ev


Verified with the following scenario:

1. Create a VM with a disk
2. Start the VM
3. Move the disk to another storage domain

Moving to VERIFIED!