Bug 968933 - engine: after LSM fails because src domain is inaccessible, we cannot cold move the disk due to 'factory threw an exception' error in vdsm
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.2.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 3.5.0
Assignee: Vered Volansky
QA Contact: Kevin Alon Goldblatt
URL:
Whiteboard: storage
Depends On:
Blocks: rhev3.5beta 1156165
 
Reported: 2013-05-30 10:11 UTC by Dafna Ron
Modified: 2016-02-10 16:53 UTC
CC: 11 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-02-16 13:41:35 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:
abaron: Triaged+


Attachments
logs (1.28 MB, application/x-gzip), 2013-05-30 10:11 UTC, Dafna Ron

Description Dafna Ron 2013-05-30 10:11:08 UTC
Created attachment 754756 [details]
logs

Description of problem:

In an iSCSI setup with a 2-host cluster, I created a VM on the master domain and then started it. When the VM was up, I started an LSM for its disk and blocked the master domain (the src domain only) from both hosts using iptables.
After the master domain reconstruct, the src domain was put into Inactive, the VM paused, and the LSM failed. I restored connectivity to the storage, and once the domain was activated I resumed the VM and then powered it off.
When I then tried to move the VM's disk offline (with the VM down), we get an error from vdsm:

ImagePathError: Image path does not exist or cannot be accessed/created: ('/rhev/data-center/7fd33b43-a9f4-4eb7-a885-e9583a929ceb/7414f930-bbdb-4ec6-8132-4640cbb3c722/images/aecc0c48-1174-4400-a0e8-002f6d3784a6',)

and then

ResourceAcqusitionFailed: Could not acquire resource. Probably resource factory threw an exception.: ()
101d31fa-e9e5-4c9c-a5b1-6705e0af6cdd::DEBUG::2013-05-30 12:47:49,264::task::869::TaskManager.Task::(_run) Task=`101d31fa-e9e5-4c9c-a5b1-6705e0af6cdd`::Task._run: 101d31fa-e9e5-4c9c-a5b1-6705e0af6cdd () {} failed - stopping task
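
For context on the two errors above: the full traceback below shows that validateImagePath() attempts a plain os.mkdir() on the image directory and gets errno 2 (ENOENT), i.e. a parent directory under /rhev/data-center/<pool>/<domain>/images is no longer there after the reconstruct. os.mkdir() is not recursive, so a single missing parent is enough to make it fail even once the domain is active again. A minimal sketch of that behaviour (hypothetical paths under a temp directory, not the real repo tree):

import errno
import os
import tempfile

base = tempfile.mkdtemp()  # stands in for /rhev/data-center/<pool> in this sketch
image_dir = os.path.join(base, 'missing-domain', 'images', 'missing-image')

try:
    os.mkdir(image_dir, 0o755)  # non-recursive, like the os.mkdir call in the traceback
except OSError as e:
    # Fails with errno 2 (ENOENT) because the parent directories do not exist,
    # matching the "No such file or directory" error in the vdsm log.
    print('mkdir failed: %s' % e)
    assert e.errno == errno.ENOENT

os.makedirs(image_dir, 0o755)  # the recursive variant creates the missing parents
print('makedirs created the directory: %s' % os.path.isdir(image_dir))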

Version-Release number of selected component (if applicable):

sf17.2
vdsm-4.10.2-22.0.el6ev.x86_64

How reproducible:

100%

Steps to Reproduce:
1. Create two iSCSI storage domains located on different servers.
2. Create a template and copy it to both domains.
3. Create a VM from the template on the master domain.
4. Run the VM and start an LSM on its disk.
5. When the task is sent to vdsm, block the master domain on both hosts using iptables.
6. Once the reconstruct has finished, the VM is paused and the src domain becomes inactive; restore connectivity to the master domain from both hosts.
7. Once the domain becomes active, resume the VM.
8. Power off the VM and try to move the disk offline.

Actual results:

We fail to move the disk offline; vdsm returns a resource acquisition error.

Expected results:

We should be able to move the VM's disk.

Additional info: logs

Opening this bug as medium severity since a vdsm restart resolves the issue.
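
For anyone triaging a similar failure, here is a small diagnostic sketch (not part of vdsm; the path is copied from the ImagePathError in this bug) that walks the image path and reports the first missing component, which tells you whether the pool link, the domain link, or the images directory disappeared after the reconstruct:

import os

# Path taken from the error in this bug; replace it with the path from the log under test.
image_dir = ('/rhev/data-center/7fd33b43-a9f4-4eb7-a885-e9583a929ceb/'
             '7414f930-bbdb-4ec6-8132-4640cbb3c722/images/'
             'aecc0c48-1174-4400-a0e8-002f6d3784a6')

path = '/'
for part in image_dir.strip('/').split('/'):
    path = os.path.join(path, part)
    if not os.path.exists(path):
        print('first missing component: %s' % path)
        break
else:
    print('the full image path exists')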


101d31fa-e9e5-4c9c-a5b1-6705e0af6cdd::ERROR::2013-05-30 12:47:49,259::blockVolume::403::Storage.Volume::(validateImagePath) Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/blockVolume.py", line 401, in validateImagePath
    os.mkdir(imageDir, 0755)
OSError: [Errno 2] No such file or directory: '/rhev/data-center/7fd33b43-a9f4-4eb7-a885-e9583a929ceb/7414f930-bbdb-4ec6-8132-4640cbb3c722/images/aecc0c48-1174-4400-a0e8-002f6d3784a6'
101d31fa-e9e5-4c9c-a5b1-6705e0af6cdd::WARNING::2013-05-30 12:47:49,260::resourceManager::520::ResourceManager::(registerResource) Resource factory failed to create resource '7414f930-bbdb-4ec6-8132-4640cbb3c722_imageNS.aecc0c48-1174-4400-a0e8-002f6d3784a6'. Canceling request.
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/resourceManager.py", line 518, in registerResource
    obj = namespaceObj.factory.createResource(name, lockType)
  File "/usr/share/vdsm/storage/resourceFactories.py", line 193, in createResource
    lockType)
  File "/usr/share/vdsm/storage/resourceFactories.py", line 122, in __getResourceCandidatesList
    imgUUID=resourceName)
  File "/usr/share/vdsm/storage/image.py", line 316, in getChain
    srcVol = volclass(self.repoPath, sdUUID, imgUUID, uuidlist[0])
  File "/usr/share/vdsm/storage/blockVolume.py", line 80, in __init__
    volume.Volume.__init__(self, repoPath, sdUUID, imgUUID, volUUID)
  File "/usr/share/vdsm/storage/volume.py", line 128, in __init__
    self.validate()
  File "/usr/share/vdsm/storage/blockVolume.py", line 89, in validate
    volume.Volume.validate(self)
  File "/usr/share/vdsm/storage/volume.py", line 140, in validate
    self.validateImagePath()
  File "/usr/share/vdsm/storage/blockVolume.py", line 404, in validateImagePath
    raise se.ImagePathError(imageDir)
ImagePathError: Image path does not exist or cannot be accessed/created: ('/rhev/data-center/7fd33b43-a9f4-4eb7-a885-e9583a929ceb/7414f930-bbdb-4ec6-8132-4640cbb3c722/images/aecc0c48-1174-4400-a0e8-002f6d3784a6',)
101d31fa-e9e5-4c9c-a5b1-6705e0af6cdd::DEBUG::2013-05-30 12:47:49,263::resourceManager::186::ResourceManager.Request::(cancel) ResName=`7414f930-bbdb-4ec6-8132-4640cbb3c722_imageNS.aecc0c48-1174-4400-a0e8-002f6d3784a6`ReqID=`2d8de0e9-1b8d-4acc-aa2a-6d1bbafbe18e`::Canceled request
101d31fa-e9e5-4c9c-a5b1-6705e0af6cdd::WARNING::2013-05-30 12:47:49,263::resourceManager::180::ResourceManager.Request::(cancel) ResName=`7414f930-bbdb-4ec6-8132-4640cbb3c722_imageNS.aecc0c48-1174-4400-a0e8-002f6d3784a6`ReqID=`2d8de0e9-1b8d-4acc-aa2a-6d1bbafbe18e`::Tried to cancel a processed request
101d31fa-e9e5-4c9c-a5b1-6705e0af6cdd::ERROR::2013-05-30 12:47:49,263::task::850::TaskManager.Task::(_setError) Task=`101d31fa-e9e5-4c9c-a5b1-6705e0af6cdd`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 857, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/task.py", line 318, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/share/vdsm/storage/securable.py", line 68, in wrapper
    return f(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1743, in moveImage
  with nested(rmanager.acquireResource(srcImageResourcesNamespace, imgUUID, srcLock),
  File "/usr/share/vdsm/storage/resourceManager.py", line 468, in acquireResource
    raise se.ResourceAcqusitionFailed()
ResourceAcqusitionFailed: Could not acquire resource. Probably resource factory threw an exception.: ()
101d31fa-e9e5-4c9c-a5b1-6705e0af6cdd::DEBUG::2013-05-30 12:47:49,264::task::869::TaskManager.Task::(_run) Task=`101d31fa-e9e5-4c9c-a5b1-6705e0af6cdd`::Task._run: 101d31fa-e9e5-4c9c-a5b1-6705e0af6cdd () {} failed - stopping task
101d31fa-e9e5-4c9c-a5b1-6705e0af6cdd::DEBUG::2013-05-30 12:47:49,264::task::1194::TaskManager.Task::(stop) Task=`101d31fa-e9e5-4c9c-a5b1-6705e0af6cdd`::stopping in state running (force False)
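
Note that the error reported back to the engine is the generic ResourceAcqusitionFailed rather than the underlying ImagePathError: as the traceback shows, the factory's exception is only logged when the request is canceled, and acquireResource() then raises its own error with no reference to the original cause. An illustrative sketch of that pattern (not the vdsm implementation; all names here are made up):

import os


class ResourceAcquisitionFailed(Exception):
    """Generic error, analogous to the ResourceAcqusitionFailed seen above."""


def create_resource(image_dir):
    # Stand-in for the resource factory step: it fails with OSError when a
    # parent of the image directory is missing, as in the traceback above.
    os.mkdir(image_dir, 0o755)


def acquire_resource(image_dir):
    try:
        create_resource(image_dir)
    except Exception as exc:
        # The original cause is only logged; callers get a generic error,
        # which is why the engine just sees 'factory threw an exception'.
        print('resource factory failed: %s' % exc)
        raise ResourceAcquisitionFailed()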

Comment 1 Dafna Ron 2013-05-30 10:11:48 UTC
Also, until vdsm is restarted, the VM will not run:

Thread-745::ERROR::2013-05-30 13:06:45,085::vm::704::vm.Vm::(_startUnderlyingVm) vmId=`50b98a53-88ce-4364-863b-e8e80f1b0674`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 664, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/libvirtvm.py", line 1458, in _run
    self.preparePaths(devices[vm.DISK_DEVICES])
  File "/usr/share/vdsm/vm.py", line 725, in preparePaths
    drive['path'] = self.cif.prepareVolumePath(drive, self.id)
  File "/usr/share/vdsm/clientIF.py", line 275, in prepareVolumePath
    raise vm.VolumeError(drive)
VolumeError: Bad volume specification {'index': 0, 'iface': 'ide', 'reqsize': '0', 'format': 'cow', 'bootOrder': '1', 'poolID': '7fd33b43-a9f4-4eb7-a885-e9583a929ceb', 'volumeID': '7ae20c35-27d0-4e9e-848d-ab480b05472f', 'apparentsize': '1073741824', 'imageID': 'aecc0c48-1174-4400-a0e8-002f6d3784a6', 'specParams': {}, 'readonly': 'false', 'domainID': '7414f930-bbdb-4ec6-8132-4640cbb3c722', 'optional': 'false', 'deviceId': 'aecc0c48-1174-4400-a0e8-002f6d3784a6', 'truesize': '1073741824', 'address': {' controller': '0', ' target': '0', 'unit': '0', ' bus': '0', ' type': 'drive'}, 'device': 'disk', 'shared': 'false', 'propagateErrors': 'off', 'type': 'disk'}

Comment 4 Allon Mureinik 2014-08-31 07:18:13 UTC
I could not reproduce this error - after shutting the VM down, cold move succeeds.
Moving to ON_QA to verify.

Comment 5 Kevin Alon Goldblatt 2014-09-17 15:28:32 UTC
I was able to move the disk of the VM after shutting the VM down. I am moving this defect to Verified. 

I followed the scenario described above; however, there was a change in the observed behavior. See my remarks in step 6:

Steps to Reproduce:
1. Create two iSCSI storage domains located on different servers.
2. Create a template and copy it to both domains.
3. Create a VM from the template on the master domain.
4. Run the VM and start an LSM on its disk.
5. When the task is sent to vdsm, block the master domain on both hosts using iptables.
6. Once the reconstruct has finished, the VM is paused >>>>(VM was not paused)<<<< and the src domain becomes inactive; restore connectivity to the master >>>>>(Master was transferred)<<<<< domain from both hosts.
7. Once the domain becomes active, resume the VM.
8. Power off the VM and try to move the disk offline.

