Bug 955661
Summary: [vdsm] remove mapping fails after extend storage domain with force=True

| Field | Value | Field | Value |
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Elad <ebenahar> |
| Component: | ovirt-engine | Assignee: | Daniel Erez <derez> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Elad <ebenahar> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | unspecified | CC: | acanan, acathrow, amureini, bazulay, derez, fsimonce, iheim, jentrena, jkt, lpeer, Rhev-m-bugs, scohen, yeylon |
| Target Milestone: | --- | Keywords: | Reopened |
| Target Release: | 3.4.0 | | |
| Hardware: | x86_64 | | |
| OS: | Unspecified | | |
| Whiteboard: | storage | | |
| Fixed In Version: | ovirt-3.4.0-alpha1 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2013-07-04 11:31:01 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1078909, 1142926 | | |
| Attachments: | | | |
> Expected results:
>
> extendStorageDomain should succeed.
Using 'force' does not guarantee success (nothing can).
1. We need to understand why vdsm failed to remove the mapping (it could be because the device is in use, or because of something we can actually fix) - what was 'using' the device in this case?
2. The expected behaviour in case of failure is a faster recovery.
In either case, this is not very severe.
(In reply to comment #1) Those LUNs were indeed in use (probably they were present in an old setup), but I don't think that should be a reason for vdsm to fail to remove the mapping. AFAIK vdsm is supposed to override the data in the 'used' LUN in a specific order, and it does not matter whether they are 'used'.

From the log it appears the dmsetup remove failed, which probably indicates that the LV was still open (which would indicate there is nothing vdsm can do). Can you reproduce this? Unfortunately superVdsm does not log the error there. It might be failing because by the time we wanted to remove the device it was no longer there (in which case we should check for EEXIST and proceed). Anyway, this is guesswork.

The problem here is that 'dmsetup remove' fails since the removal is not done in the required order: each device depends on a different one, so you need to start from the last in the list. Usually it works; not sure it's worth the effort. Adding the devmapper keyword.

The domain extension request was for domain 151df758-5a9d-4935-b4aa-c2edf61e1c9a, using device 3514f0c542a000796:

```
Thread-145763::INFO::2013-04-23 11:32:48,786::logUtils::40::dispatcher::(wrapper) Run and protect: extendStorageDomain(sdUUID='151df758-5a9d-4935-b4aa-c2edf61e1c9a', spUUID='885fe5c0-67d3-4060-a24b-849bebcb5d59', devlist=['3514f0c542a000796'], force=True, options=None)
```

The problem is that this device was already part of the storage domain. Combining data from getDeviceList and VGTagMetadataRW:

```
{'GUID': '3514f0c542a000796', 'capacity': '107374182400', 'devtype': 'FCP', 'fwrev': '1.05', 'logicalblocksize': '512', 'pathlist': [], 'pathstatus': [{'lun': '1', 'physdev': 'sde', 'state': 'active', 'type': 'FCP'}, {'lun': '1', 'physdev': 'sdc', 'state': 'active', 'type': 'FCP'}], 'physicalblocksize': '4096', 'productID': 'XtremApp', 'pvUUID': 'edKPNs-XuXn-qVYt-7vIn-PO2j-VGnI-l7vzDm', 'serial': 'SXtremIO_XtremApp_e5b3e7136cb2400e91fdb955bdf9caf9', 'status': 'used', 'vendorID': 'XtremIO', 'vgUUID': 'bq1SbR-yrR6-FErS-4Ta4-JxpK-Iq3n-puyCu1'}

Thread-49121::DEBUG::2013-04-23 07:05:50,018::persistentDict::234::Storage.PersistentDict::(refresh) read lines (VGTagMetadataRW)=['VGUUID=bq1SbR-yrR6-FErS-4Ta4-JxpK-Iq3n-puyCu1', 'PHYBLKSIZE=4096', 'LOCKPOLICY=', 'VERSION=3', 'LOGBLKSIZE=512', 'SDUUID=151df758-5a9d-4935-b4aa-c2edf61e1c9a', 'LEASERETRIES=3', 'LOCKRENEWALINTERVALSEC=5', 'CLASS=Data', u'PV0=pv:3514f0c542a00077e,uuid:edKPNs-XuXn-qVYt-7vIn-PO2j-VGnI-l7vzDm,pestart:0,pecount:157,mapoffset:0', 'DESCRIPTION=SD-FC1', 'TYPE=FCP', 'LEASETIMESEC=60', 'IOOPTIMEOUTSEC=10', 'MASTER_VERSION=7', 'ROLE=Master', 'POOL_DESCRIPTION=DC-FC', 'POOL_UUID=885fe5c0-67d3-4060-a24b-849bebcb5d59', u'POOL_DOMAINS=70cb8838-3a86-4e9a-a2ea-4e5bb0d6c303:Active,151df758-5a9d-4935-b4aa-c2edf61e1c9a:Active', 'POOL_SPM_ID=1', '_SHA_CKSUM=31f9809ca71b59d26a3ddb7a86e0c7a67348572a', 'POOL_SPM_LVER=1']
```

Looking at pvUUID, you'll see that SDUUID=151df758-5a9d-4935-b4aa-c2edf61e1c9a is using edKPNs-XuXn-qVYt-7vIn-PO2j-VGnI-l7vzDm, and device 3514f0c542a000796 has pvUUID edKPNs-XuXn-qVYt-7vIn-PO2j-VGnI-l7vzDm as well. In fact what removeMappingsHoldingDevice is trying to do is remove the "ids" mapping, which is in use given that the storage domain is active:

```
Exception: Could not remove mapping `151df758--5a9d--4935--b4aa--c2edf61e1c9a-ids`
```

This should have been prevented by the frontend and backend. We could consider adding a check in vdsm as well (even though, API-wise, force=True is a last-resort option meaning you know what you're doing).

Fede, correct me if I'm wrong, but are there 2 LUNs here with the same PV UUID?

Haim, can you reproduce this without replicating LUNs?
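The ordering issue described above (each mapping must be removed only after everything stacked on it is gone) can be sketched in Python. This is not vdsm's actual implementation; it is a minimal illustration of walking the kernel's `/sys/block/<dev>/holders` tree and removing device-mapper mappings leaf-first with `dmsetup remove`:

```python
import os
import subprocess

SYS_BLOCK = "/sys/block"


def holders(dev):
    """Return kernel names of devices stacked on top of `dev`
    (e.g. dm mappings holding a multipath device open).
    Returns [] if the device has no holders directory."""
    path = os.path.join(SYS_BLOCK, dev, "holders")
    return os.listdir(path) if os.path.isdir(path) else []


def remove_holders_leaf_first(dev):
    """Recurse into holders first, so each mapping is removed only
    after everything stacked on it has already been removed."""
    for holder in holders(dev):
        remove_holders_leaf_first(holder)
        # dm devices appear in /sys/block as dm-N; resolve the map name
        with open(os.path.join(SYS_BLOCK, holder, "dm", "name")) as f:
            name = f.read().strip()
        subprocess.check_call(["dmsetup", "remove", name])
```

Even with correct ordering, removal still fails if the leaf mapping itself is open (as with the `ids` lease volume here), so ordering alone would not have fixed this bug.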
I've managed to reproduce it:

```
Thread-39575::ERROR::2013-07-24 09:22:54,884::task::850::TaskManager.Task::(_setError) Task=`063e7c51-b4c9-4786-8ea9-49fa3f39a2e5`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 857, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 2034, in createVG
    (force.capitalize() == "True")))
  File "/usr/share/vdsm/storage/lvm.py", line 874, in createVG
    _initpvs(pvs, metadataSize, force)
  File "/usr/share/vdsm/storage/lvm.py", line 687, in _initpvs
    _initpvs_removeHolders()
  File "/usr/share/vdsm/storage/lvm.py", line 677, in _initpvs_removeHolders
    os.path.basename(device))
  File "/usr/share/vdsm/storage/devicemapper.py", line 146, in removeMappingsHoldingDevice
    removeMapping(getDevName(holder))
  File "/usr/share/vdsm/storage/devicemapper.py", line 117, in removeMapping
    return getProxy().removeDeviceMapping(deviceName)
  File "/usr/share/vdsm/supervdsm.py", line 50, in __call__
    return callMethod()
  File "/usr/share/vdsm/supervdsm.py", line 48, in <lambda>
    **kwargs)
  File "<string>", line 2, in removeDeviceMapping
  File "/usr/lib64/python2.6/multiprocessing/managers.py", line 740, in _callmethod
    raise convert_to_error(kind, result)
Exception: Could not remove mapping `35fb7c84--28db--40c7--8047--c6074d0a041f-ids`
```

It occurred because the engine let me use one of the master domain's devices when I tried to create a new SD. Logs will be attached.

Created attachment 779240 [details]
logs 2
Seems like the same issue, reproduced using is9.1 when trying to create a new SD. On 10.35.116.3 (SPM): Could not remove mapping `508ba8de--379c--4d39--bb81--b0c0c03bacac-ids`

```
[root@camel-vdsc ~]# lvs
  LV                                   VG                                   Attr      LSize   Pool Origin Data% Move Log Cpy%Sync Convert
  ids                                  508ba8de-379c-4d39-bb81-b0c0c03bacac -wi-ao--- 128.00m
  inbox                                508ba8de-379c-4d39-bb81-b0c0c03bacac -wi-a---- 128.00m
  leases                               508ba8de-379c-4d39-bb81-b0c0c03bacac -wi-a---- 2.00g
  master                               508ba8de-379c-4d39-bb81-b0c0c03bacac -wi-a---- 1.00g
  metadata                             508ba8de-379c-4d39-bb81-b0c0c03bacac -wi------ 512.00m
  outbox                               508ba8de-379c-4d39-bb81-b0c0c03bacac -wi-a---- 128.00m
  4807dcb6-6f42-4de2-a2e4-d49bcbcb21b3 a8b4e8aa-e0e3-416e-996f-4fd8785ab329 -wi------ 10.00g
  ids                                  a8b4e8aa-e0e3-416e-996f-4fd8785ab329 -wi-ao--- 128.00m
  inbox                                a8b4e8aa-e0e3-416e-996f-4fd8785ab329 -wi-a---- 128.00m
  leases                               a8b4e8aa-e0e3-416e-996f-4fd8785ab329 -wi-a---- 2.00g
  master                               a8b4e8aa-e0e3-416e-996f-4fd8785ab329 -wi-ao--- 1.00g
  metadata                             a8b4e8aa-e0e3-416e-996f-4fd8785ab329 -wi-a---- 512.00m
  outbox                               a8b4e8aa-e0e3-416e-996f-4fd8785ab329 -wi-a---- 128.00m
  lv_root                              vg0                                  -wi-ao--- 224.88g
  lv_swap                              vg0                                  -wi-ao--- 7.81g
```

From the logs:

```
Thread-3133::ERROR::2013-08-14 13:44:20,307::task::850::TaskManager.Task::(_setError) Task=`0e0a27f8-0bf0-495a-97ec-ccaeeadd2565`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 857, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 2051, in createVG
    (force.capitalize() == "True")))
  File "/usr/share/vdsm/storage/lvm.py", line 874, in createVG
    _initpvs(pvs, metadataSize, force)
  File "/usr/share/vdsm/storage/lvm.py", line 687, in _initpvs
    _initpvs_removeHolders()
  File "/usr/share/vdsm/storage/lvm.py", line 677, in _initpvs_removeHolders
    os.path.basename(device))
  File "/usr/share/vdsm/storage/devicemapper.py", line 146, in removeMappingsHoldingDevice
    removeMapping(getDevName(holder))
  File "/usr/share/vdsm/storage/devicemapper.py", line 117, in removeMapping
    return getProxy().removeDeviceMapping(deviceName)
  File "/usr/share/vdsm/supervdsm.py", line 50, in __call__
    return callMethod()
  File "/usr/share/vdsm/supervdsm.py", line 48, in <lambda>
    **kwargs)
  File "<string>", line 2, in removeDeviceMapping
  File "/usr/lib64/python2.6/multiprocessing/managers.py", line 740, in _callmethod
    raise convert_to_error(kind, result)
Exception: Could not remove mapping `508ba8de--379c--4d39--bb81--b0c0c03bacac-ids`
```

Created attachment 786501 [details]
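A quick way to read the lvs output above: LVM's lv_attr string encodes whether a device is open in its sixth character, and the `ids` lease volume of the old domain (`-wi-ao---`) is exactly the mapping vdsm could not remove. A minimal sketch, assuming lvs' default lv_attr layout:

```python
def is_open(lv_attr):
    """Return True if the LV is open, i.e. the 6th lv_attr character
    is 'o' (e.g. a lease volume held open by sanlock)."""
    return len(lv_attr) >= 6 and lv_attr[5] == "o"


# From the output above: 'ids' is active and open, so removing its
# dm mapping fails; 'inbox' is merely active and could be unmapped.
assert is_open("-wi-ao---")       # ids
assert not is_open("-wi-a----")   # inbox
```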
logs
Daniel, the issue is that they're using a LUN that already belongs to an SD to create a new SD. The question is how this is allowed. In any event, vdsm failing is more than fine.

Daniel, please add a link to gerrit in the tracker.

oVirt 3.4.0 alpha has been released.

Using a LUN which is in use by another domain in the setup is now blocked from the backend.

Checked using REST. I'm getting the following message when I try to do it:

```
<fault>
  <reason>Operation Failed</reason>
  <detail>
    [Cannot add Storage. The following LUNs are already part of existing storage domains: 3514f0c54d5c00006 (iscsi1-1).]
  </detail>
</fault>
```

Checked on ovirt-3.4.0-alpha1: ovirt-engine-3.4.0-0.2.master.20140106180914.el6.noarch

Closing as part of 3.4.0

*** Bug 1133938 has been marked as a duplicate of this bug. ***
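The actual fix lives in the engine backend (Java), but the kind of validation it performs can be illustrated with a short Python sketch. Names here are hypothetical, not the engine's API; the error text mirrors the REST fault quoted above:

```python
def validate_new_domain_luns(requested_guids, existing_domains):
    """Reject any LUN GUID already part of an existing storage domain.

    `existing_domains` maps domain name -> set of LUN GUIDs.
    Raises ValueError listing each conflicting GUID and its domain.
    """
    # Invert the mapping: LUN GUID -> owning domain name
    in_use = {guid: name
              for name, guids in existing_domains.items()
              for guid in guids}
    conflicts = [(g, in_use[g]) for g in requested_guids if g in in_use]
    if conflicts:
        detail = ", ".join("%s (%s)" % c for c in conflicts)
        raise ValueError(
            "Cannot add Storage. The following LUNs are already part "
            "of existing storage domains: " + detail)


# Example: re-using the LUN from the REST check above is rejected.
try:
    validate_new_domain_luns(
        ["3514f0c54d5c00006"],
        {"iscsi1-1": {"3514f0c54d5c00006"}})
except ValueError as e:
    print(e)
```

With this check in place, force=True never reaches vdsm for a LUN that another domain still owns, which is why the dmsetup failure can no longer be triggered from the UI or REST.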
Created attachment 738998 [details]
logs

Description of problem:
[vdsm] vdsm fails to extend a storage domain due to a remove mapping failure (with force=True).

Version-Release number of selected component (if applicable):
vdsm-4.10.2-15.0.el6ev.x86_64

How reproducible:
Happens only when force-overriding a LUN while extending a storage domain; cannot know for sure if it is 100% reproducible.

Steps to Reproduce:
On an FC env.:
1) Extend a storage domain with force=true (extend the storage domain with a 'used' LUN)

Actual results:
vdsm fails to extend the storage domain with:

```
Thread-146025::INFO::2013-04-23 11:42:59,918::logUtils::40::dispatcher::(wrapper) Run and protect: extendStorageDomain(sdUUID='151df758-5a9d-4935-b4aa-c2edf61e1c9a', spUUID='885fe5c0-67d3-4060-a24b-849bebcb5d59', devlist=['3514f0c542a000796'], force=True, options=None)
Thread-146025::DEBUG::2013-04-23 11:42:59,919::resourceManager::175::ResourceManager.Request::(__init__) ResName=`Storage.151df758-5a9d-4935-b4aa-c2edf61e1c9a`ReqID=`4dfb89c4-c178-48b8-b2fc-d8fa0980fd40`::Request was made in '/usr/share/vdsm/storage/resourceManager.py' line '174' at '__init__'
Thread-146025::DEBUG::2013-04-23 11:42:59,920::resourceManager::486::ResourceManager::(registerResource) Trying to register resource 'Storage.151df758-5a9d-4935-b4aa-c2edf61e1c9a' for lock type 'shared'
Thread-146025::DEBUG::2013-04-23 11:42:59,920::resourceManager::528::ResourceManager::(registerResource) Resource 'Storage.151df758-5a9d-4935-b4aa-c2edf61e1c9a' is free. Now locking as 'shared' (1 active user)
Thread-146025::DEBUG::2013-04-23 11:42:59,921::resourceManager::212::ResourceManager.Request::(grant) ResName=`Storage.151df758-5a9d-4935-b4aa-c2edf61e1c9a`ReqID=`4dfb89c4-c178-48b8-b2fc-d8fa0980fd40`::Granted request
Thread-146025::DEBUG::2013-04-23 11:42:59,922::task::811::TaskManager.Task::(resourceAcquired) Task=`b3075fd1-4083-48e8-b5c7-b2003a8d9ac3`::_resourcesAcquired: Storage.151df758-5a9d-4935-b4aa-c2edf61e1c9a (shared)
Thread-146025::DEBUG::2013-04-23 11:42:59,922::task::974::TaskManager.Task::(_decref) Task=`b3075fd1-4083-48e8-b5c7-b2003a8d9ac3`::ref 1 aborting False
Thread-146025::INFO::2013-04-23 11:42:59,923::blockSD::779::Storage.StorageDomain::(readMetadataMapping) META MAPPING: {u'PV0': {u'uuid': u'edKPNs-XuXn-qVYt-7vIn-PO2j-VGnI-l7vzDm', u'pestart': u'0', u'mapoffset': u'0', u'pecount': u'157', 'guid': u'3514f0c542a00077e'}}
Thread-146025::ERROR::2013-04-23 11:43:00,013::task::850::TaskManager.Task::(_setError) Task=`b3075fd1-4083-48e8-b5c7-b2003a8d9ac3`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 857, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 41, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 682, in extendStorageDomain
    pool.extendSD(sdUUID, devlist, force)
  File "/usr/share/vdsm/storage/securable.py", line 68, in wrapper
    return f(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 2014, in extendSD
    sdCache.produce(sdUUID).extend(devlist, force)
  File "/usr/share/vdsm/storage/blockSD.py", line 695, in extend
    lvm.extendVG(self.sdUUID, devices, force)
  File "/usr/share/vdsm/storage/lvm.py", line 914, in extendVG
    _initpvs(pvs, int(vg.vg_mda_size) / 2 ** 20, force)
  File "/usr/share/vdsm/storage/lvm.py", line 677, in _initpvs
    _initpvs_removeHolders()
  File "/usr/share/vdsm/storage/lvm.py", line 667, in _initpvs_removeHolders
    os.path.basename(device))
  File "/usr/share/vdsm/storage/devicemapper.py", line 166, in removeMappingsHoldingDevice
    removeMapping(getDevName(holder))
  File "/usr/share/vdsm/storage/devicemapper.py", line 137, in removeMapping
    return getProxy().removeDeviceMapping(deviceName)
  File "/usr/share/vdsm/supervdsm.py", line 76, in __call__
    return callMethod()
  File "/usr/share/vdsm/supervdsm.py", line 67, in <lambda>
    **kwargs)
  File "<string>", line 2, in removeDeviceMapping
  File "/usr/lib64/python2.6/multiprocessing/managers.py", line 740, in _callmethod
    raise convert_to_error(kind, result)
Exception: Could not remove mapping `151df758--5a9d--4935--b4aa--c2edf61e1c9a-ids`
```

Expected results:
extendStorageDomain should succeed.

Additional info:
See logs attached
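The traceback ends at removeDeviceMapping, which fails because the `ids` mapping is still open. A defensive variant could check the mapping's open reference count first via `dmsetup info` before attempting removal. A sketch only: the helper names are hypothetical, not vdsm's API, though the `dmsetup info -c -o open` invocation is standard:

```python
import subprocess


def parse_open_count(raw):
    """Parse the single-column output of
    `dmsetup info -c --noheadings -o open <mapping>`."""
    if isinstance(raw, bytes):
        raw = raw.decode()
    return int(raw.strip())


def open_count(mapping):
    """Return the open reference count for a dm mapping; a non-zero
    count means `dmsetup remove` would fail with 'Device or resource busy'."""
    out = subprocess.check_output(
        ["dmsetup", "info", "-c", "--noheadings", "-o", "open", mapping])
    return parse_open_count(out)


def safe_remove(mapping):
    """Remove a dm mapping only when nothing holds it open, giving a
    clear error instead of an opaque supervdsm exception."""
    if open_count(mapping) > 0:
        raise RuntimeError(
            "mapping %s is still open; refusing to remove" % mapping)
    subprocess.check_call(["dmsetup", "remove", mapping])
```

Such a pre-check would have turned the opaque "Could not remove mapping" exception into an actionable message, though the real fix (blocking in-use LUNs in the backend) prevents the situation entirely.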