Bug 846323
Summary: | vdsm: remove storage domain fails on format command with Failed reload error | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Dafna Ron <dron> | ||||||
Component: | vdsm | Assignee: | Eduardo Warszawski <ewarszaw> | ||||||
Status: | CLOSED ERRATA | QA Contact: | Leonid Natapov <lnatapov> | ||||||
Severity: | high | Docs Contact: | |||||||
Priority: | high | ||||||||
Version: | 6.3 | CC: | abaron, bazulay, cpelland, ewarszaw, hateya, iheim, ilvovsky, jbiddle, lpeer, ykaul | ||||||
Target Milestone: | rc | Keywords: | Regression | ||||||
Target Release: | --- | ||||||||
Hardware: | x86_64 | ||||||||
OS: | Linux | ||||||||
Whiteboard: | storage | ||||||||
Fixed In Version: | vdsm-4.9.6-36.0 | Doc Type: | Bug Fix | ||||||
Doc Text: |
Previously, removing multiple domains with extended luns would occasionally result in a race which would cause one of the removals to fail. This has been corrected so that only one lvm operation is used at a time, preventing the race and allowing domains to be successfully removed.
|
Story Points: | --- | ||||||
Clone Of: | Environment: | ||||||||
Last Closed: | 2012-12-04 19:04:54 UTC | Type: | Bug | ||||||
Regression: | --- | Mount Type: | --- | ||||||
Documentation: | --- | CRM: | |||||||
Verified Versions: | Category: | --- | |||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||
Embargoed: | |||||||||
Attachments: |
|
once the domain failed on format we cannot remove it with the following errors (attaching full logs for that too): Thread-83692::ERROR::2012-08-07 16:10:59,475::sdc::145::Storage.StorageDomainCache::(_findDomain) Error while looking for domain `7a627b01-4deb-4e10-aa5a-d8bffadd8a70` Traceback (most recent call last): File "/usr/share/vdsm/storage/sdc.py", line 140, in _findDomain return mod.findDomain(sdUUID) File "/usr/share/vdsm/storage/blockSD.py", line 1097, in findDomain return BlockStorageDomain(BlockStorageDomain.findDomainPath(sdUUID)) File "/usr/share/vdsm/storage/blockSD.py", line 290, in __init__ metadata = selectMetadata(sdUUID) File "/usr/share/vdsm/storage/blockSD.py", line 259, in selectMetadata if len(mdProvider) > 0: File "/usr/share/vdsm/storage/persistentDict.py", line 61, in __len__ return len(self.keys()) File "/usr/share/vdsm/storage/persistentDict.py", line 105, in keys return list(self.__iter__()) File "/usr/share/vdsm/storage/persistentDict.py", line 102, in __iter__ return ifilter(lambda k: k in self._validatorDict, self._dict.__iter__()) File "/usr/share/vdsm/storage/persistentDict.py", line 219, in __iter__ with self._accessWrapper(): File "/usr/lib64/python2.6/contextlib.py", line 16, in __enter__ return self.gen.next() File "/usr/share/vdsm/storage/persistentDict.py", line 147, in _accessWrapper self.refresh() File "/usr/share/vdsm/storage/persistentDict.py", line 224, in refresh lines = self._metaRW.readlines() File "/usr/share/vdsm/storage/blockSD.py", line 224, in readlines lvm.activateLVs(self._vgName, self._lvName) File "/usr/share/vdsm/storage/lvm.py", line 1027, in activateLVs _setLVAvailability(vgName, toActivate, "y") File "/usr/share/vdsm/storage/lvm.py", line 719, in _setLVAvailability raise error(str(e)) CannotActivateLogicalVolumes: Cannot activate Logical Volumes: ('General Storage Exception: ("5 [] [\' /dev/mapper/1Dafna-si13-011343925: read failed after 0 of 4096 at 53687025664: Input/output error\', \' /dev/mapper/1Dafna-si13-011 343925: read failed after 0 of 4096 at 53687083008: Input/output error\', \' /dev/mapper/1Dafna-si13-011343925: read failed after 0 of 4096 at 0: Input/output error\', \' WARNING: Error counts reached a limit of 3. Device /dev/mapper/ 1Dafna-si13-011343925 was disabled\', \' /dev/mapper/1Dafna-si13-021343925: read failed after 0 of 4096 at 53687025664: Input/output error\', \' /dev/mapper/1Dafna-si13-021343925: read failed after 0 of 4096 at 53687083008: Input/outp ut error\', \' /dev/mapper/1Dafna-si13-021343925: read failed after 0 of 4096 at 0: Input/output error\', \' WARNING: Error counts reached a limit of 3. Device /dev/mapper/1Dafna-si13-021343925 was disabled\', \' One or more specifie d logical volume(s) not found.\']\\n7a627b01-4deb-4e10-aa5a-d8bffadd8a70/[\'metadata\']",)',) Thread-83692::ERROR::2012-08-07 16:10:59,477::task::853::TaskManager.Task::(_setError) Task=`b0dcdaab-f7ce-447e-bb56-80acc1b9125e`::Unexpected error Traceback (most recent call last): File "/usr/share/vdsm/storage/task.py", line 861, in _run return fn(*args, **kargs) File "/usr/share/vdsm/logUtils.py", line 38, in wrapper res = f(*args, **kwargs) File "/usr/share/vdsm/storage/hsm.py", line 2226, in formatStorageDomain self._recycle(sd) File "/usr/share/vdsm/storage/hsm.py", line 2187, in _recycle dom.format(dom.sdUUID) File "/usr/share/vdsm/storage/sdc.py", line 47, in __getattr__ dom = self.getRealDomain() File "/usr/share/vdsm/storage/sdc.py", line 51, in getRealDomain return self._cache._realProduce(self._sdUUID) File "/usr/share/vdsm/storage/sdc.py", line 123, in _realProduce dom = self._findDomain(sdUUID) File "/usr/share/vdsm/storage/sdc.py", line 147, in _findDomain raise se.StorageDomainDoesNotExist(sdUUID) StorageDomainDoesNotExist: Storage domain does not exist: ('7a627b01-4deb-4e10-aa5a-d8bffadd8a70',) Thread-83695::INFO::2012-08-07 16:10:59,479::logUtils::39::dispatcher::(wrapper) Run and protect: getStoragePoolInfo, Return response: {'info': {'spm_id': 1, 'master_uuid': 'a93432d7-b235-4e4b-800d-43b3a42ba2ea', 'name': 'iSCSI', 'versi on': '3', 'domains': 'a93432d7-b235-4e4b-800d-43b3a42ba2ea:Active', 'pool_status': 'connected', 'isoprefix': '', 'type': 'ISCSI', 'master_ver': 2, 'lver': 0}, 'dominfo': {'a93432d7-b235-4e4b-800d-43b3a42ba2ea': {'status': 'Active', 'dis kfree': '49123688448', 'alerts': [], 'disktotal': '53284438016'}}} Thread-83692::DEBUG::2012-08-07 16:10:59,479::task::872::TaskManager.Task::(_run) Task=`b0dcdaab-f7ce-447e-bb56-80acc1b9125e`::Task._run: b0dcdaab-f7ce-447e-bb56-80acc1b9125e ('7a627b01-4deb-4e10-aa5a-d8bffadd8a70', False) {} failed - stopping task Thread-83695::DEBUG::2012-08-07 16:10:59,480::task::1172::TaskManager.Task::(prepare) Task=`74c2df23-2a82-4420-acfc-d6ee4ca591c4`::finished: {'info': {'spm_id': 1, 'master_uuid': 'a93432d7-b235-4e4b-800d-43b3a42ba2ea', 'name': 'iSCSI', 'version': '3', 'domains': 'a93432d7-b235-4e4b-800d-43b3a42ba2ea:Active', 'pool_status': 'connected', 'isoprefix': '', 'type': 'ISCSI', 'master_ver': 2, 'lver': 0}, 'dominfo': {'a93432d7-b235-4e4b-800d-43b3a42ba2ea': {'status': 'Active', 'diskfree': '49123688448', 'alerts': [], 'disktotal': '53284438016'}}} Thread-83692::DEBUG::2012-08-07 16:10:59,480::task::1199::TaskManager.Task::(stop) Task=`b0dcdaab-f7ce-447e-bb56-80acc1b9125e`::stopping in state preparing (force False) Thread-83695::DEBUG::2012-08-07 16:10:59,481::task::588::TaskManager.Task::(_updateState) Task=`74c2df23-2a82-4420-acfc-d6ee4ca591c4`::moving from state preparing -> state finished Thread-83692::DEBUG::2012-08-07 16:10:59,481::task::978::TaskManager.Task::(_decref) Task=`b0dcdaab-f7ce-447e-bb56-80acc1b9125e`::ref 1 aborting True Thread-83695::DEBUG::2012-08-07 16:10:59,482::resourceManager::809::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {'Storage.5acd19f1-c608-41f6-a4e4-4dc90850356e': < ResourceRef 'Storage.5acd19f1-c608-41f6-a4e4-4dc90850356e', isValid: 'True' obj: 'None'>} Thread-83692::INFO::2012-08-07 16:10:59,482::task::1157::TaskManager.Task::(prepare) Task=`b0dcdaab-f7ce-447e-bb56-80acc1b9125e`::aborting: Task is aborted: 'Storage domain does not exist' - code 358 Thread-83695::DEBUG::2012-08-07 16:10:59,482::resourceManager::844::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {} Created attachment 602753 [details]
second remove failed logs
verified on si19 Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2012-1508.html |
Created attachment 602746 [details] logs Description of problem: I removed two domains (both extended with 2-3 luns) and failed remove on one of the domains during format. error in vdsm shows that it failed with reload on one of the luns seems like there might be a race since the lun we fail to reload is a lun that belonged to the second domain that I already removed (so we are trying to read from cache). Version-Release number of selected component (if applicable): si13 How reproducible: unknown Steps to Reproduce: 1. create several storage domain (try to extend 2 also) 2. remove the domains from UI 3. Actual results: we fail to remove one of the domains on format with reload error. Expected results: we should not fail on remove. Additional info: full vdsm and engine logs UI error: Error while executing action Remove Storage Domain: Error in storage domain action backend error: 2012-08-07 15:17:28,383 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (ajp-/127.0.0.1:8009-15) [3beddaa7] Failed in FormatStorageDomainVDS method 2012-08-07 15:17:28,383 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (ajp-/127.0.0.1:8009-15) [3beddaa7] Error code StorageDomainActionError and error message VDSGenericException: VDSErrorException: Failed to Form atStorageDomainVDS, error = Error in storage domain action: ('sdUUID=7a627b01-4deb-4e10-aa5a-d8bffadd8a70',) 2012-08-07 15:17:28,383 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (ajp-/127.0.0.1:8009-15) [3beddaa7] Command org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand return value Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusOnlyReturnForXmlRpc mStatus Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusForXmlRpc mCode 350 mMessage Error in storage domain action: ('sdUUID=7a627b01-4deb-4e10-aa5a-d8bffadd8a70',) 2012-08-07 15:17:28,383 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (ajp-/127.0.0.1:8009-15) [3beddaa7] Vds: blond-vdsh 2012-08-07 15:17:28,383 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (ajp-/127.0.0.1:8009-15) [3beddaa7] Command FormatStorageDomainVDS execution failed. Exception: VDSErrorException: VDSGenericException: VDSErrorException: Fa iled to FormatStorageDomainVDS, error = Error in storage domain action: ('sdUUID=7a627b01-4deb-4e10-aa5a-d8bffadd8a70',) 2012-08-07 15:17:28,383 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8009-15) [3beddaa7] FINISH, FormatStorageDomainVDSCommand, log id: a77b96f 2012-08-07 15:17:28,383 ERROR [org.ovirt.engine.core.bll.storage.RemoveStorageDomainCommand] (ajp-/127.0.0.1:8009-15) [3beddaa7] Command org.ovirt.engine.core.bll.storage.RemoveStorageDomainCommand throw Vdc Bll exception. With error me ssage VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to FormatStorageDomainVDS, error = Error in storage domain action: ('sdUUID=7a627b01-4deb-4e10-aa5a-d8bff add8a70',) Thread-81104::ERROR::2012-08-07 15:16:04,169::task::853::TaskManager.Task::(_setError) Task=`d35dd9ad-5458-4f32-872a-a86c6449e133`::Unexpected error Traceback (most recent call last): File "/usr/share/vdsm/storage/task.py", line 861, in _run return fn(*args, **kargs) File "/usr/share/vdsm/logUtils.py", line 38, in wrapper res = f(*args, **kwargs) File "/usr/share/vdsm/storage/hsm.py", line 2226, in formatStorageDomain self._recycle(sd) File "/usr/share/vdsm/storage/hsm.py", line 2187, in _recycle dom.format(dom.sdUUID) File "/usr/share/vdsm/storage/blockSD.py", line 782, in format lvm.removeVG(sdUUID) File "/usr/share/vdsm/storage/lvm.py", line 825, in removeVG pvs = [pv.name for pv in _lvminfo.getAllPvs() if pv.vg_name == vgName] File "/usr/share/vdsm/storage/lvm.py", line 68, in __getattr__ raise AttributeError("Failed reload: %s" % self.name) AttributeError: Failed reload: /dev/mapper/1Dafna-si13-021344240 Thread-81104::DEBUG::2012-08-07 15:16:04,185::task::872::TaskManager.Task::(_run) Task=`d35dd9ad-5458-4f32-872a-a86c6449e133`::Task._run: d35dd9ad-5458-4f32-872a-a86c6449e133 ('7a627b01-4deb-4e10-aa5a-d8bffadd8a70', False) {} failed - s topping task