Bug 846323
| Summary: | vdsm: remove storage domain fails on format command with Failed reload error | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Dafna Ron <dron> | ||||||
| Component: | vdsm | Assignee: | Eduardo Warszawski <ewarszaw> | ||||||
| Status: | CLOSED ERRATA | QA Contact: | Leonid Natapov <lnatapov> | ||||||
| Severity: | high | Docs Contact: | |||||||
| Priority: | high | ||||||||
| Version: | 6.3 | CC: | abaron, bazulay, cpelland, ewarszaw, hateya, iheim, ilvovsky, jbiddle, lpeer, ykaul | ||||||
| Target Milestone: | rc | Keywords: | Regression | ||||||
| Target Release: | --- | ||||||||
| Hardware: | x86_64 | ||||||||
| OS: | Linux | ||||||||
| Whiteboard: | storage | ||||||||
| Fixed In Version: | vdsm-4.9.6-36.0 | Doc Type: | Bug Fix | ||||||
| Doc Text: |
Previously, removing multiple domains with extended luns would occasionally result in a race which would cause one of the removals to fail. This has been corrected so that only one lvm operation is used at a time, preventing the race and allowing domains to be successfully removed.
|
Story Points: | --- | ||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2012-12-04 19:04:54 UTC | Type: | Bug | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Attachments: |
|
||||||||
once the domain failed on format we cannot remove it with the following errors (attaching full logs for that too):
Thread-83692::ERROR::2012-08-07 16:10:59,475::sdc::145::Storage.StorageDomainCache::(_findDomain) Error while looking for domain `7a627b01-4deb-4e10-aa5a-d8bffadd8a70`
Traceback (most recent call last):
File "/usr/share/vdsm/storage/sdc.py", line 140, in _findDomain
return mod.findDomain(sdUUID)
File "/usr/share/vdsm/storage/blockSD.py", line 1097, in findDomain
return BlockStorageDomain(BlockStorageDomain.findDomainPath(sdUUID))
File "/usr/share/vdsm/storage/blockSD.py", line 290, in __init__
metadata = selectMetadata(sdUUID)
File "/usr/share/vdsm/storage/blockSD.py", line 259, in selectMetadata
if len(mdProvider) > 0:
File "/usr/share/vdsm/storage/persistentDict.py", line 61, in __len__
return len(self.keys())
File "/usr/share/vdsm/storage/persistentDict.py", line 105, in keys
return list(self.__iter__())
File "/usr/share/vdsm/storage/persistentDict.py", line 102, in __iter__
return ifilter(lambda k: k in self._validatorDict, self._dict.__iter__())
File "/usr/share/vdsm/storage/persistentDict.py", line 219, in __iter__
with self._accessWrapper():
File "/usr/lib64/python2.6/contextlib.py", line 16, in __enter__
return self.gen.next()
File "/usr/share/vdsm/storage/persistentDict.py", line 147, in _accessWrapper
self.refresh()
File "/usr/share/vdsm/storage/persistentDict.py", line 224, in refresh
lines = self._metaRW.readlines()
File "/usr/share/vdsm/storage/blockSD.py", line 224, in readlines
lvm.activateLVs(self._vgName, self._lvName)
File "/usr/share/vdsm/storage/lvm.py", line 1027, in activateLVs
_setLVAvailability(vgName, toActivate, "y")
File "/usr/share/vdsm/storage/lvm.py", line 719, in _setLVAvailability
raise error(str(e))
CannotActivateLogicalVolumes: Cannot activate Logical Volumes: ('General Storage Exception: ("5 [] [\' /dev/mapper/1Dafna-si13-011343925: read failed after 0 of 4096 at 53687025664: Input/output error\', \' /dev/mapper/1Dafna-si13-011
343925: read failed after 0 of 4096 at 53687083008: Input/output error\', \' /dev/mapper/1Dafna-si13-011343925: read failed after 0 of 4096 at 0: Input/output error\', \' WARNING: Error counts reached a limit of 3. Device /dev/mapper/
1Dafna-si13-011343925 was disabled\', \' /dev/mapper/1Dafna-si13-021343925: read failed after 0 of 4096 at 53687025664: Input/output error\', \' /dev/mapper/1Dafna-si13-021343925: read failed after 0 of 4096 at 53687083008: Input/outp
ut error\', \' /dev/mapper/1Dafna-si13-021343925: read failed after 0 of 4096 at 0: Input/output error\', \' WARNING: Error counts reached a limit of 3. Device /dev/mapper/1Dafna-si13-021343925 was disabled\', \' One or more specifie
d logical volume(s) not found.\']\\n7a627b01-4deb-4e10-aa5a-d8bffadd8a70/[\'metadata\']",)',)
Thread-83692::ERROR::2012-08-07 16:10:59,477::task::853::TaskManager.Task::(_setError) Task=`b0dcdaab-f7ce-447e-bb56-80acc1b9125e`::Unexpected error
Traceback (most recent call last):
File "/usr/share/vdsm/storage/task.py", line 861, in _run
return fn(*args, **kargs)
File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
res = f(*args, **kwargs)
File "/usr/share/vdsm/storage/hsm.py", line 2226, in formatStorageDomain
self._recycle(sd)
File "/usr/share/vdsm/storage/hsm.py", line 2187, in _recycle
dom.format(dom.sdUUID)
File "/usr/share/vdsm/storage/sdc.py", line 47, in __getattr__
dom = self.getRealDomain()
File "/usr/share/vdsm/storage/sdc.py", line 51, in getRealDomain
return self._cache._realProduce(self._sdUUID)
File "/usr/share/vdsm/storage/sdc.py", line 123, in _realProduce
dom = self._findDomain(sdUUID)
File "/usr/share/vdsm/storage/sdc.py", line 147, in _findDomain
raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: ('7a627b01-4deb-4e10-aa5a-d8bffadd8a70',)
Thread-83695::INFO::2012-08-07 16:10:59,479::logUtils::39::dispatcher::(wrapper) Run and protect: getStoragePoolInfo, Return response: {'info': {'spm_id': 1, 'master_uuid': 'a93432d7-b235-4e4b-800d-43b3a42ba2ea', 'name': 'iSCSI', 'versi
on': '3', 'domains': 'a93432d7-b235-4e4b-800d-43b3a42ba2ea:Active', 'pool_status': 'connected', 'isoprefix': '', 'type': 'ISCSI', 'master_ver': 2, 'lver': 0}, 'dominfo': {'a93432d7-b235-4e4b-800d-43b3a42ba2ea': {'status': 'Active', 'dis
kfree': '49123688448', 'alerts': [], 'disktotal': '53284438016'}}}
Thread-83692::DEBUG::2012-08-07 16:10:59,479::task::872::TaskManager.Task::(_run) Task=`b0dcdaab-f7ce-447e-bb56-80acc1b9125e`::Task._run: b0dcdaab-f7ce-447e-bb56-80acc1b9125e ('7a627b01-4deb-4e10-aa5a-d8bffadd8a70', False) {} failed - stopping task
Thread-83695::DEBUG::2012-08-07 16:10:59,480::task::1172::TaskManager.Task::(prepare) Task=`74c2df23-2a82-4420-acfc-d6ee4ca591c4`::finished: {'info': {'spm_id': 1, 'master_uuid': 'a93432d7-b235-4e4b-800d-43b3a42ba2ea', 'name': 'iSCSI', 'version': '3', 'domains': 'a93432d7-b235-4e4b-800d-43b3a42ba2ea:Active', 'pool_status': 'connected', 'isoprefix': '', 'type': 'ISCSI', 'master_ver': 2, 'lver': 0}, 'dominfo': {'a93432d7-b235-4e4b-800d-43b3a42ba2ea': {'status': 'Active', 'diskfree': '49123688448', 'alerts': [], 'disktotal': '53284438016'}}}
Thread-83692::DEBUG::2012-08-07 16:10:59,480::task::1199::TaskManager.Task::(stop) Task=`b0dcdaab-f7ce-447e-bb56-80acc1b9125e`::stopping in state preparing (force False)
Thread-83695::DEBUG::2012-08-07 16:10:59,481::task::588::TaskManager.Task::(_updateState) Task=`74c2df23-2a82-4420-acfc-d6ee4ca591c4`::moving from state preparing -> state finished
Thread-83692::DEBUG::2012-08-07 16:10:59,481::task::978::TaskManager.Task::(_decref) Task=`b0dcdaab-f7ce-447e-bb56-80acc1b9125e`::ref 1 aborting True
Thread-83695::DEBUG::2012-08-07 16:10:59,482::resourceManager::809::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {'Storage.5acd19f1-c608-41f6-a4e4-4dc90850356e': < ResourceRef 'Storage.5acd19f1-c608-41f6-a4e4-4dc90850356e', isValid: 'True' obj: 'None'>}
Thread-83692::INFO::2012-08-07 16:10:59,482::task::1157::TaskManager.Task::(prepare) Task=`b0dcdaab-f7ce-447e-bb56-80acc1b9125e`::aborting: Task is aborted: 'Storage domain does not exist' - code 358
Thread-83695::DEBUG::2012-08-07 16:10:59,482::resourceManager::844::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Created attachment 602753 [details]
second remove failed logs
verified on si19 Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2012-1508.html |
Created attachment 602746 [details] logs Description of problem: I removed two domains (both extended with 2-3 luns) and failed remove on one of the domains during format. error in vdsm shows that it failed with reload on one of the luns seems like there might be a race since the lun we fail to reload is a lun that belonged to the second domain that I already removed (so we are trying to read from cache). Version-Release number of selected component (if applicable): si13 How reproducible: unknown Steps to Reproduce: 1. create several storage domain (try to extend 2 also) 2. remove the domains from UI 3. Actual results: we fail to remove one of the domains on format with reload error. Expected results: we should not fail on remove. Additional info: full vdsm and engine logs UI error: Error while executing action Remove Storage Domain: Error in storage domain action backend error: 2012-08-07 15:17:28,383 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (ajp-/127.0.0.1:8009-15) [3beddaa7] Failed in FormatStorageDomainVDS method 2012-08-07 15:17:28,383 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (ajp-/127.0.0.1:8009-15) [3beddaa7] Error code StorageDomainActionError and error message VDSGenericException: VDSErrorException: Failed to Form atStorageDomainVDS, error = Error in storage domain action: ('sdUUID=7a627b01-4deb-4e10-aa5a-d8bffadd8a70',) 2012-08-07 15:17:28,383 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (ajp-/127.0.0.1:8009-15) [3beddaa7] Command org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand return value Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusOnlyReturnForXmlRpc mStatus Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusForXmlRpc mCode 350 mMessage Error in storage domain action: ('sdUUID=7a627b01-4deb-4e10-aa5a-d8bffadd8a70',) 2012-08-07 15:17:28,383 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (ajp-/127.0.0.1:8009-15) [3beddaa7] Vds: blond-vdsh 2012-08-07 15:17:28,383 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (ajp-/127.0.0.1:8009-15) [3beddaa7] Command FormatStorageDomainVDS execution failed. Exception: VDSErrorException: VDSGenericException: VDSErrorException: Fa iled to FormatStorageDomainVDS, error = Error in storage domain action: ('sdUUID=7a627b01-4deb-4e10-aa5a-d8bffadd8a70',) 2012-08-07 15:17:28,383 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8009-15) [3beddaa7] FINISH, FormatStorageDomainVDSCommand, log id: a77b96f 2012-08-07 15:17:28,383 ERROR [org.ovirt.engine.core.bll.storage.RemoveStorageDomainCommand] (ajp-/127.0.0.1:8009-15) [3beddaa7] Command org.ovirt.engine.core.bll.storage.RemoveStorageDomainCommand throw Vdc Bll exception. With error me ssage VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to FormatStorageDomainVDS, error = Error in storage domain action: ('sdUUID=7a627b01-4deb-4e10-aa5a-d8bff add8a70',) Thread-81104::ERROR::2012-08-07 15:16:04,169::task::853::TaskManager.Task::(_setError) Task=`d35dd9ad-5458-4f32-872a-a86c6449e133`::Unexpected error Traceback (most recent call last): File "/usr/share/vdsm/storage/task.py", line 861, in _run return fn(*args, **kargs) File "/usr/share/vdsm/logUtils.py", line 38, in wrapper res = f(*args, **kwargs) File "/usr/share/vdsm/storage/hsm.py", line 2226, in formatStorageDomain self._recycle(sd) File "/usr/share/vdsm/storage/hsm.py", line 2187, in _recycle dom.format(dom.sdUUID) File "/usr/share/vdsm/storage/blockSD.py", line 782, in format lvm.removeVG(sdUUID) File "/usr/share/vdsm/storage/lvm.py", line 825, in removeVG pvs = [pv.name for pv in _lvminfo.getAllPvs() if pv.vg_name == vgName] File "/usr/share/vdsm/storage/lvm.py", line 68, in __getattr__ raise AttributeError("Failed reload: %s" % self.name) AttributeError: Failed reload: /dev/mapper/1Dafna-si13-021344240 Thread-81104::DEBUG::2012-08-07 15:16:04,185::task::872::TaskManager.Task::(_run) Task=`d35dd9ad-5458-4f32-872a-a86c6449e133`::Task._run: d35dd9ad-5458-4f32-872a-a86c6449e133 ('7a627b01-4deb-4e10-aa5a-d8bffadd8a70', False) {} failed - s topping task