Bug 955661

Summary: [vdsm] remove mapping fails after extend storage domain with force=True
Product: Red Hat Enterprise Virtualization Manager
Reporter: Elad <ebenahar>
Component: ovirt-engine
Assignee: Daniel Erez <derez>
Status: CLOSED CURRENTRELEASE
QA Contact: Elad <ebenahar>
Severity: high
Priority: unspecified
Version: unspecified
CC: acanan, acathrow, amureini, bazulay, derez, fsimonce, iheim, jentrena, jkt, lpeer, Rhev-m-bugs, scohen, yeylon
Target Milestone: ---
Keywords: Reopened
Target Release: 3.4.0
Hardware: x86_64
OS: Unspecified
Whiteboard: storage
Fixed In Version: ovirt-3.4.0-alpha1
Doc Type: Bug Fix
Last Closed: 2013-07-04 11:31:01 UTC
Type: Bug
oVirt Team: Storage
Bug Blocks: 1078909, 1142926
Attachments:
  logs
  logs 2
  logs

Description Elad 2013-04-23 13:33:40 UTC
Created attachment 738998 [details]
logs

Description of problem:

[vdsm] vdsm fails to extend storage domain due to remove mapping failure (with force=True).


Version-Release number of selected component (if applicable):

vdsm-4.10.2-15.0.el6ev.x86_64


How reproducible:

Happens only when force-overriding a LUN while extending a storage domain. Cannot say for sure whether it is 100% reproducible.

Steps to Reproduce:

On an FC environment:
1) Extend the storage domain with force=true (extend the storage domain with a 'used' LUN)
Actual results:

vdsm fails to extend storage domain with:

Thread-146025::INFO::2013-04-23 11:42:59,918::logUtils::40::dispatcher::(wrapper) Run and protect: extendStorageDomain(sdUUID='151df758-5a9d-4935-b4aa-c2edf61e1c9a', spUUID='885fe5c0-67d3-4060-a24b-849bebcb5d59',
devlist=['3514f0c542a000796'], force=True, options=None)
Thread-146025::DEBUG::2013-04-23 11:42:59,919::resourceManager::175::ResourceManager.Request::(__init__) ResName=`Storage.151df758-5a9d-4935-b4aa-c2edf61e1c9a`ReqID=`4dfb89c4-c178-48b8-b2fc-d8fa0980fd40`::Request
was made in '/usr/share/vdsm/storage/resourceManager.py' line '174' at '__init__'
Thread-146025::DEBUG::2013-04-23 11:42:59,920::resourceManager::486::ResourceManager::(registerResource) Trying to register resource 'Storage.151df758-5a9d-4935-b4aa-c2edf61e1c9a' for lock type 'shared'
Thread-146025::DEBUG::2013-04-23 11:42:59,920::resourceManager::528::ResourceManager::(registerResource) Resource 'Storage.151df758-5a9d-4935-b4aa-c2edf61e1c9a' is free. Now locking as 'shared' (1 active user)
Thread-146025::DEBUG::2013-04-23 11:42:59,921::resourceManager::212::ResourceManager.Request::(grant) ResName=`Storage.151df758-5a9d-4935-b4aa-c2edf61e1c9a`ReqID=`4dfb89c4-c178-48b8-b2fc-d8fa0980fd40`::Granted req
uest
Thread-146025::DEBUG::2013-04-23 11:42:59,922::task::811::TaskManager.Task::(resourceAcquired) Task=`b3075fd1-4083-48e8-b5c7-b2003a8d9ac3`::_resourcesAcquired: Storage.151df758-5a9d-4935-b4aa-c2edf61e1c9a (shared)
Thread-146025::DEBUG::2013-04-23 11:42:59,922::task::974::TaskManager.Task::(_decref) Task=`b3075fd1-4083-48e8-b5c7-b2003a8d9ac3`::ref 1 aborting False
Thread-146025::INFO::2013-04-23 11:42:59,923::blockSD::779::Storage.StorageDomain::(readMetadataMapping) META MAPPING: {u'PV0': {u'uuid': u'edKPNs-XuXn-qVYt-7vIn-PO2j-VGnI-l7vzDm', u'pestart': u'0', u'mapoffset':
u'0', u'pecount': u'157', 'guid': u'3514f0c542a00077e'}}
Thread-146025::ERROR::2013-04-23 11:43:00,013::task::850::TaskManager.Task::(_setError) Task=`b3075fd1-4083-48e8-b5c7-b2003a8d9ac3`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 857, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 41, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 682, in extendStorageDomain
    pool.extendSD(sdUUID, devlist, force)
  File "/usr/share/vdsm/storage/securable.py", line 68, in wrapper
    return f(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 2014, in extendSD
    sdCache.produce(sdUUID).extend(devlist, force)
  File "/usr/share/vdsm/storage/blockSD.py", line 695, in extend
    lvm.extendVG(self.sdUUID, devices, force)
  File "/usr/share/vdsm/storage/lvm.py", line 914, in extendVG
    _initpvs(pvs, int(vg.vg_mda_size) / 2 ** 20, force)
  File "/usr/share/vdsm/storage/lvm.py", line 677, in _initpvs
    _initpvs_removeHolders()
  File "/usr/share/vdsm/storage/lvm.py", line 667, in _initpvs_removeHolders
    os.path.basename(device))
  File "/usr/share/vdsm/storage/devicemapper.py", line 166, in removeMappingsHoldingDevice
    removeMapping(getDevName(holder))
  File "/usr/share/vdsm/storage/devicemapper.py", line 137, in removeMapping
    return getProxy().removeDeviceMapping(deviceName)
  File "/usr/share/vdsm/supervdsm.py", line 76, in __call__
    return callMethod()
  File "/usr/share/vdsm/supervdsm.py", line 67, in <lambda>
    **kwargs)
  File "<string>", line 2, in removeDeviceMapping
  File "/usr/lib64/python2.6/multiprocessing/managers.py", line 740, in _callmethod
    raise convert_to_error(kind, result)
Exception: Could not remove mapping `151df758--5a9d--4935--b4aa--c2edf61e1c9a-ids`



Expected results:

extendStorageDomain should succeed.


Additional info:  See logs attached

Comment 1 Ayal Baron 2013-04-28 10:05:42 UTC
> 
> 
> Expected results:
> 
> extendStorageDomain should succeed.
> 

Using 'force' does not guarantee success (nothing can).
1. Need to understand why vdsm failed to remove the mapping (it could be because the device is in use, or because of something we can actually fix) - what was 'using' the device in this case?
2. The expected behaviour in case of failure is a faster recovery.

In either case, this is not very severe.

Comment 2 Elad 2013-04-29 07:12:19 UTC
(In reply to comment #1)

Those LUNs were indeed in use (probably they were present in an old setup), but I don't think that should cause vdsm to fail to remove the mapping. AFAIK, vdsm is supposed to override the data on a 'used' LUN in a specific order, regardless of whether it is 'used'.

Comment 3 Ayal Baron 2013-05-08 11:23:07 UTC
From the log it appears the dmsetup remove failed, which probably indicates that the LV was still open (which would mean there is nothing vdsm can do).
Can you reproduce this?

Comment 4 Ayal Baron 2013-05-08 11:24:35 UTC
Unfortunately superVdsm does not log the error there. It might be failing because, by the time we wanted to remove the device, it was no longer there, in which case we should check for eexist and proceed. Anyway, this is guesswork.
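The tolerant behaviour suggested here could look roughly like the sketch below. This is an illustrative outline, not the actual vdsm code: `remove_mapping_if_present`, the `remover` callable, and the `mapper_dir` parameter are all hypothetical names, assuming the remover wraps something like `dmsetup remove`.

```python
import errno
import os


def remove_mapping_if_present(device_name, remover, mapper_dir="/dev/mapper"):
    """Hypothetical sketch: treat an already-vanished mapping as success.

    `remover` is assumed to wrap `dmsetup remove` and raise OSError on
    failure; if the mapping disappeared before (or while) we removed it,
    we return instead of propagating the error.
    """
    path = os.path.join(mapper_dir, device_name)
    if not os.path.exists(path):
        return False  # already gone; nothing to do
    try:
        remover(device_name)
    except OSError as e:
        if e.errno == errno.ENOENT:
            return False  # raced with another remover; still fine
        raise
    return True
```

Any other failure (e.g. the device is still open) is still raised, which matches the behaviour seen in the traceback above.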

Comment 5 Haim 2013-07-04 11:41:47 UTC
The problem here is that 'dmsetup remove' fails because the removal is not done in the required order.

Each device depends on a different one, so you need to start from the last in the list.

Usually it works; not sure it's worth the effort. Adding the devmapper keyword.
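The ordering described here can be sketched by walking the sysfs holder chain depth-first, so the deepest holder is removed first. This is a minimal illustration, not vdsm's implementation; `holders_deepest_first` is a hypothetical helper, and the `sysfs` parameter exists only to make the walk testable.

```python
import os


def holders_deepest_first(device, sysfs="/sys/block"):
    """Hypothetical sketch: collect the holders of `device` depth-first.

    A holder may itself be held by another mapping, so a post-order walk
    of /sys/block/<dev>/holders yields the deepest holders first; removing
    the returned list front to back respects the dependency order.
    """
    ordered = []

    def walk(dev):
        holders_dir = os.path.join(sysfs, dev, "holders")
        if os.path.isdir(holders_dir):
            for holder in os.listdir(holders_dir):
                walk(holder)  # recurse: the holder may itself be held
        ordered.append(dev)

    walk(device)
    return ordered[:-1]  # drop `device` itself; holders remain, deepest first
```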

Comment 6 Federico Simoncelli 2013-07-09 11:33:58 UTC
The domain extension request was for domain 151df758-5a9d-4935-b4aa-c2edf61e1c9a, using device 3514f0c542a000796:

Thread-145763::INFO::2013-04-23 11:32:48,786::logUtils::40::dispatcher::(wrapper) Run and protect: extendStorageDomain(sdUUID='151df758-5a9d-4935-b4aa-c2edf61e1c9a', spUUID='885fe5c0-67d3-4060-a24b-849bebcb5d59', devlist=['3514f0c542a000796'], force=True, options=None)

The problem is that this device was already part of the storage domain, combining data from getDeviceList and VGTagMetadataRW:

     {'GUID': '3514f0c542a000796',
      'capacity': '107374182400',
      'devtype': 'FCP',
      'fwrev': '1.05',
      'logicalblocksize': '512',
      'pathlist': [],
      'pathstatus': [{'lun': '1',
                      'physdev': 'sde',
                      'state': 'active',
                      'type': 'FCP'},
                     {'lun': '1',
                      'physdev': 'sdc',
                      'state': 'active',
                      'type': 'FCP'}],
      'physicalblocksize': '4096',
      'productID': 'XtremApp',
      'pvUUID': 'edKPNs-XuXn-qVYt-7vIn-PO2j-VGnI-l7vzDm',
      'serial': 'SXtremIO_XtremApp_e5b3e7136cb2400e91fdb955bdf9caf9',
      'status': 'used',
      'vendorID': 'XtremIO',
      'vgUUID': 'bq1SbR-yrR6-FErS-4Ta4-JxpK-Iq3n-puyCu1'},

Thread-49121::DEBUG::2013-04-23 07:05:50,018::persistentDict::234::Storage.PersistentDict::(refresh) read lines (VGTagMetadataRW)=['VGUUID=bq1SbR-yrR6-FErS-4Ta4-JxpK-Iq3n-puyCu1', 'PHYBLKSIZE=4096', 'LOCKPOLICY=', 'VERSION=3', 'LOGBLKSIZE=512', 'SDUUID=151df758-5a9d-4935-b4aa-c2edf61e1c9a', 'LEASERETRIES=3', 'LOCKRENEWALINTERVALSEC=5', 'CLASS=Data', u'PV0=pv:3514f0c542a00077e,uuid:edKPNs-XuXn-qVYt-7vIn-PO2j-VGnI-l7vzDm,pestart:0,pecount:157,mapoffset:0', 'DESCRIPTION=SD-FC1', 'TYPE=FCP', 'LEASETIMESEC=60', 'IOOPTIMEOUTSEC=10', 'MASTER_VERSION=7', 'ROLE=Master', 'POOL_DESCRIPTION=DC-FC', 'POOL_UUID=885fe5c0-67d3-4060-a24b-849bebcb5d59', u'POOL_DOMAINS=70cb8838-3a86-4e9a-a2ea-4e5bb0d6c303:Active,151df758-5a9d-4935-b4aa-c2edf61e1c9a:Active', 'POOL_SPM_ID=1', '_SHA_CKSUM=31f9809ca71b59d26a3ddb7a86e0c7a67348572a', 'POOL_SPM_LVER=1']

Looking at pvUUID, you'll see that SDUUID=151df758-5a9d-4935-b4aa-c2edf61e1c9a is using edKPNs-XuXn-qVYt-7vIn-PO2j-VGnI-l7vzDm and device 3514f0c542a000796 has pvUUID edKPNs-XuXn-qVYt-7vIn-PO2j-VGnI-l7vzDm as well.

In fact, what removeMappingsHoldingDevice is trying to do is remove the "ids" mapping, which is in use because the storage domain is active:

Exception: Could not remove mapping `151df758--5a9d--4935--b4aa--c2edf61e1c9a-ids`

This should have been prevented by the frontend and backend.
We could consider adding a check in vdsm as well (even though, API-wise, force=True is a last-resort option meaning you know what you're doing).
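Such a vdsm-side check could look roughly like this; the function name and data shapes are illustrative only, assuming a GUID-to-pvUUID map as returned by getDeviceList and the set of PV UUIDs from the domain's PV0..PVn metadata mapping.

```python
def luns_already_in_domain(devlist, pv_uuid_by_guid, domain_pv_uuids):
    """Hypothetical sketch of the suggested check: flag any requested
    device whose PV UUID already appears in the domain's PV mapping.

    devlist: device GUIDs requested for the extend/create operation.
    pv_uuid_by_guid: GUID -> pvUUID, as reported by getDeviceList.
    domain_pv_uuids: PV UUIDs already in the domain metadata (PV0, PV1, ...).
    """
    return [guid for guid in devlist
            if pv_uuid_by_guid.get(guid) in domain_pv_uuids]
```

With the data in this comment, device 3514f0c542a000796 would be flagged because its pvUUID (edKPNs-XuXn-qVYt-7vIn-PO2j-VGnI-l7vzDm) is already PV0 of domain 151df758-5a9d-4935-b4aa-c2edf61e1c9a, so the extend could be rejected before any mapping removal is attempted.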

Comment 7 Ayal Baron 2013-07-24 09:01:49 UTC
Fede, correct me if I'm wrong, but there are 2 LUNs here with the same PV UUID?

Haim, can you reproduce this without replicating LUNs?

Comment 8 Elad 2013-07-27 22:29:54 UTC
I've managed to reproduce it:

Thread-39575::ERROR::2013-07-24 09:22:54,884::task::850::TaskManager.Task::(_setError) Task=`063e7c51-b4c9-4786-8ea9-49fa3f39a2e5`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 857, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 2034, in createVG
    (force.capitalize() == "True")))
  File "/usr/share/vdsm/storage/lvm.py", line 874, in createVG
    _initpvs(pvs, metadataSize, force)
  File "/usr/share/vdsm/storage/lvm.py", line 687, in _initpvs
    _initpvs_removeHolders()
  File "/usr/share/vdsm/storage/lvm.py", line 677, in _initpvs_removeHolders
    os.path.basename(device))
  File "/usr/share/vdsm/storage/devicemapper.py", line 146, in removeMappingsHoldingDevice
    removeMapping(getDevName(holder))
  File "/usr/share/vdsm/storage/devicemapper.py", line 117, in removeMapping
    return getProxy().removeDeviceMapping(deviceName)
  File "/usr/share/vdsm/supervdsm.py", line 50, in __call__
    return callMethod()
  File "/usr/share/vdsm/supervdsm.py", line 48, in <lambda>
    **kwargs)
  File "<string>", line 2, in removeDeviceMapping
  File "/usr/lib64/python2.6/multiprocessing/managers.py", line 740, in _callmethod
    raise convert_to_error(kind, result)
Exception: Could not remove mapping `35fb7c84--28db--40c7--8047--c6074d0a041f-ids`



It occurred because the engine let me use one of the master domain's devices when I tried to create a new SD.

Logs will be attached.

Comment 9 Elad 2013-07-27 22:34:38 UTC
Created attachment 779240 [details]
logs 2

Comment 10 Aharon Canan 2013-08-14 11:06:45 UTC
Seems like the same issue, reproduced using is9.1 when trying to create a new SD.

On 10.35.116.3 (SPM) - Could not remove mapping `508ba8de--379c--4d39--bb81--b0c0c03bacac-ids`

[root@camel-vdsc ~]# lvs
  LV                                   VG                                   Attr      LSize   Pool Origin Data%  Move Log Cpy%Sync Convert
  ids                                  508ba8de-379c-4d39-bb81-b0c0c03bacac -wi-ao--- 128.00m                                             
  inbox                                508ba8de-379c-4d39-bb81-b0c0c03bacac -wi-a---- 128.00m                                             
  leases                               508ba8de-379c-4d39-bb81-b0c0c03bacac -wi-a----   2.00g                                             
  master                               508ba8de-379c-4d39-bb81-b0c0c03bacac -wi-a----   1.00g                                             
  metadata                             508ba8de-379c-4d39-bb81-b0c0c03bacac -wi------ 512.00m                                             
  outbox                               508ba8de-379c-4d39-bb81-b0c0c03bacac -wi-a---- 128.00m                                             
  4807dcb6-6f42-4de2-a2e4-d49bcbcb21b3 a8b4e8aa-e0e3-416e-996f-4fd8785ab329 -wi------  10.00g                                             
  ids                                  a8b4e8aa-e0e3-416e-996f-4fd8785ab329 -wi-ao--- 128.00m                                             
  inbox                                a8b4e8aa-e0e3-416e-996f-4fd8785ab329 -wi-a---- 128.00m                                             
  leases                               a8b4e8aa-e0e3-416e-996f-4fd8785ab329 -wi-a----   2.00g                                             
  master                               a8b4e8aa-e0e3-416e-996f-4fd8785ab329 -wi-ao---   1.00g                                             
  metadata                             a8b4e8aa-e0e3-416e-996f-4fd8785ab329 -wi-a---- 512.00m                                             
  outbox                               a8b4e8aa-e0e3-416e-996f-4fd8785ab329 -wi-a---- 128.00m                                             
  lv_root                              vg0                                  -wi-ao--- 224.88g                                             
  lv_swap                              vg0                                  -wi-ao---   7.81g          

from logs
=========
Thread-3133::ERROR::2013-08-14 13:44:20,307::task::850::TaskManager.Task::(_setError) Task=`0e0a27f8-0bf0-495a-97ec-ccaeeadd2565`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 857, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 2051, in createVG
    (force.capitalize() == "True")))
  File "/usr/share/vdsm/storage/lvm.py", line 874, in createVG
    _initpvs(pvs, metadataSize, force)
  File "/usr/share/vdsm/storage/lvm.py", line 687, in _initpvs
    _initpvs_removeHolders()
  File "/usr/share/vdsm/storage/lvm.py", line 677, in _initpvs_removeHolders
    os.path.basename(device))
  File "/usr/share/vdsm/storage/devicemapper.py", line 146, in removeMappingsHoldingDevice
    removeMapping(getDevName(holder))
  File "/usr/share/vdsm/storage/devicemapper.py", line 117, in removeMapping
    return getProxy().removeDeviceMapping(deviceName)
  File "/usr/share/vdsm/supervdsm.py", line 50, in __call__
    return callMethod()
  File "/usr/share/vdsm/supervdsm.py", line 48, in <lambda>
    **kwargs)
  File "<string>", line 2, in removeDeviceMapping
  File "/usr/lib64/python2.6/multiprocessing/managers.py", line 740, in _callmethod
    raise convert_to_error(kind, result)
Exception: Could not remove mapping `508ba8de--379c--4d39--bb81--b0c0c03bacac-ids`

Comment 11 Aharon Canan 2013-08-14 11:10:13 UTC
Created attachment 786501 [details]
logs

Comment 12 Ayal Baron 2013-09-02 15:15:08 UTC
Daniel,

The issue is that they're using a LUN that already belongs to an SD to create a new SD. The question is: how is this allowed?
In any event, vdsm failing is more than fine.

Comment 15 Allon Mureinik 2013-09-29 11:17:38 UTC
Daniel, please add a link to gerrit in the tracker

Comment 16 Sandro Bonazzola 2014-01-14 08:45:03 UTC
ovirt 3.4.0 alpha has been released

Comment 17 Elad 2014-01-15 14:12:02 UTC
Using a LUN which is in use by another domain in the setup is now blocked by the backend. Checked using REST; I get the following message when I try to do it:

<fault>
<reason>Operation Failed</reason>
<detail>
[Cannot add Storage. The following LUNs are already part of existing storage domains: 3514f0c54d5c00006 (iscsi1-1).]
</detail>
</fault>


Checked on ovirt-3.4.0-alpha1:
ovirt-engine-3.4.0-0.2.master.20140106180914.el6.noarch

Comment 19 Itamar Heim 2014-06-12 14:11:51 UTC
Closing as part of 3.4.0

Comment 20 Daniel Erez 2014-09-15 16:36:17 UTC
*** Bug 1133938 has been marked as a duplicate of this bug. ***