Bug 846323 - vdsm: remove storage domain fails on format command with Failed reload error
Summary: vdsm: remove storage domain fails on format command with Failed reload error
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: vdsm
Version: 6.3
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Eduardo Warszawski
QA Contact: Leonid Natapov
URL:
Whiteboard: storage
Depends On:
Blocks:
 
Reported: 2012-08-07 13:15 UTC by Dafna Ron
Modified: 2022-07-09 05:37 UTC
CC List: 10 users

Fixed In Version: vdsm-4.9.6-36.0
Doc Type: Bug Fix
Doc Text:
Previously, removing multiple storage domains with extended LUNs could trigger a race condition that caused one of the removals to fail. This has been corrected so that only one LVM operation runs at a time, preventing the race and allowing the domains to be removed successfully.
Clone Of:
Environment:
Last Closed: 2012-12-04 19:04:54 UTC
Target Upstream Version:
Embargoed:


Attachments
logs (534.89 KB, application/x-gzip), 2012-08-07 13:15 UTC, Dafna Ron
second remove failed logs (447.70 KB, application/x-gzip), 2012-08-07 13:28 UTC, Dafna Ron


Links
Red Hat Product Errata RHSA-2012:1508 (SHIPPED_LIVE): Important: rhev-3.1.0 vdsm security, bug fix, and enhancement update, last updated 2012-12-04 23:48:05 UTC

Description Dafna Ron 2012-08-07 13:15:28 UTC
Created attachment 602746: logs

Description of problem:

I removed two storage domains (both extended with 2-3 LUNs), and the remove failed for one of them during the format step.

The error in vdsm shows that the format failed while reloading one of the LUNs.

This looks like a race: the LUN we fail to reload belonged to the second domain, which I had already removed, so we are trying to read a stale entry from the cache.
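
To make the suspected interleaving concrete, here is a minimal, self-contained sketch; every name in it (Stale, LvmCache, get_all_pvs, remove_domain_luns) is a hypothetical stand-in for illustration, not the real vdsm API:

class Stale(Exception):
    pass

class LvmCache:
    """Toy stand-in for a shared LVM info cache."""
    def __init__(self, devices):
        self.cached = set(devices)    # what the cache still lists
        self.present = set(devices)   # what actually exists on the host

    def remove_domain_luns(self, luns):
        # Tearing down one domain removes its LUNs from the host, but
        # the shared cache entries are not dropped at the same moment.
        self.present -= set(luns)

    def get_all_pvs(self):
        # A reload scans every cached device; one vanished device makes
        # the whole reload fail and aborts the caller.
        for dev in self.cached:
            if dev not in self.present:
                raise Stale("Failed reload: %s" % dev)
        return sorted(self.cached)

cache = LvmCache(["/dev/mapper/lunA1", "/dev/mapper/lunB1"])
cache.remove_domain_luns(["/dev/mapper/lunA1"])  # first removal finishes
try:
    cache.get_all_pvs()                          # second removal reloads
except Stale as err:
    print(err)   # Failed reload: /dev/mapper/lunA1

The point is only the ordering: the first removal invalidates a device that the shared cache still lists, so the second removal's reload trips over it.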

Version-Release number of selected component (if applicable):

si13

How reproducible:

unknown

Steps to Reproduce:
1. Create several storage domains (extend at least two of them with additional LUNs).
2. Remove the domains from the UI.
  
Actual results:

Removing one of the domains fails during the format step with a 'Failed reload' error.

Expected results:

The remove should succeed for all domains.

Additional info: full vdsm and engine logs


UI error: 

Error while executing action Remove Storage Domain: Error in storage domain action

backend error: 

2012-08-07 15:17:28,383 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (ajp-/127.0.0.1:8009-15) [3beddaa7] Failed in FormatStorageDomainVDS method
2012-08-07 15:17:28,383 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (ajp-/127.0.0.1:8009-15) [3beddaa7] Error code StorageDomainActionError and error message VDSGenericException: VDSErrorException: Failed to FormatStorageDomainVDS, error = Error in storage domain action: ('sdUUID=7a627b01-4deb-4e10-aa5a-d8bffadd8a70',)
2012-08-07 15:17:28,383 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (ajp-/127.0.0.1:8009-15) [3beddaa7] Command org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand return value 
 Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusOnlyReturnForXmlRpc
mStatus                       Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusForXmlRpc
mCode                         350
mMessage                      Error in storage domain action: ('sdUUID=7a627b01-4deb-4e10-aa5a-d8bffadd8a70',)


2012-08-07 15:17:28,383 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (ajp-/127.0.0.1:8009-15) [3beddaa7] Vds: blond-vdsh
2012-08-07 15:17:28,383 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (ajp-/127.0.0.1:8009-15) [3beddaa7] Command FormatStorageDomainVDS execution failed. Exception: VDSErrorException: VDSGenericException: VDSErrorException: Failed to FormatStorageDomainVDS, error = Error in storage domain action: ('sdUUID=7a627b01-4deb-4e10-aa5a-d8bffadd8a70',)
2012-08-07 15:17:28,383 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8009-15) [3beddaa7] FINISH, FormatStorageDomainVDSCommand, log id: a77b96f
2012-08-07 15:17:28,383 ERROR [org.ovirt.engine.core.bll.storage.RemoveStorageDomainCommand] (ajp-/127.0.0.1:8009-15) [3beddaa7] Command org.ovirt.engine.core.bll.storage.RemoveStorageDomainCommand throw Vdc Bll exception. With error message VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to FormatStorageDomainVDS, error = Error in storage domain action: ('sdUUID=7a627b01-4deb-4e10-aa5a-d8bffadd8a70',)



Thread-81104::ERROR::2012-08-07 15:16:04,169::task::853::TaskManager.Task::(_setError) Task=`d35dd9ad-5458-4f32-872a-a86c6449e133`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 861, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 2226, in formatStorageDomain
    self._recycle(sd)
  File "/usr/share/vdsm/storage/hsm.py", line 2187, in _recycle
    dom.format(dom.sdUUID)
  File "/usr/share/vdsm/storage/blockSD.py", line 782, in format
    lvm.removeVG(sdUUID)
  File "/usr/share/vdsm/storage/lvm.py", line 825, in removeVG
    pvs = [pv.name for pv in _lvminfo.getAllPvs() if pv.vg_name == vgName]
  File "/usr/share/vdsm/storage/lvm.py", line 68, in __getattr__
    raise AttributeError("Failed reload: %s" % self.name)
AttributeError: Failed reload: /dev/mapper/1Dafna-si13-021344240
Thread-81104::DEBUG::2012-08-07 15:16:04,185::task::872::TaskManager.Task::(_run) Task=`d35dd9ad-5458-4f32-872a-a86c6449e133`::Task._run: d35dd9ad-5458-4f32-872a-a86c6449e133 ('7a627b01-4deb-4e10-aa5a-d8bffadd8a70', False) {} failed - stopping task
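
For context on the AttributeError that ends the traceback above: it suggests the LVM cache holds a placeholder object for an entity whose reload failed, and any attribute access on that placeholder raises. A hedged sketch of the pattern, paraphrased from the traceback rather than copied from the actual lvm.py source:

class UnreadablePV(object):
    """Toy placeholder for a PV whose metadata could not be reloaded."""
    def __init__(self, name):
        self.name = name  # stored normally, so plain lookup finds it

    def __getattr__(self, attr):
        # Called only for attributes NOT found normally, e.g. vg_name
        # on an entry that was never successfully reloaded.
        raise AttributeError("Failed reload: %s" % self.name)

pv = UnreadablePV("/dev/mapper/1Dafna-si13-021344240")
print(pv.name)         # succeeds: "name" exists on the instance
try:
    print(pv.vg_name)  # fails exactly like pv.vg_name in the traceback
except AttributeError as err:
    print(err)         # Failed reload: /dev/mapper/1Dafna-si13-021344240

This explains why the failure surfaces at the point of use (the pv.vg_name comparison in removeVG) rather than at reload time.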

Comment 1 Dafna Ron 2012-08-07 13:27:49 UTC
Once the domain has failed on format, we can no longer remove it at all; the retry fails with the errors below (attaching full logs for that too):

Thread-83692::ERROR::2012-08-07 16:10:59,475::sdc::145::Storage.StorageDomainCache::(_findDomain) Error while looking for domain `7a627b01-4deb-4e10-aa5a-d8bffadd8a70`
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sdc.py", line 140, in _findDomain
    return mod.findDomain(sdUUID)
  File "/usr/share/vdsm/storage/blockSD.py", line 1097, in findDomain
    return BlockStorageDomain(BlockStorageDomain.findDomainPath(sdUUID))
  File "/usr/share/vdsm/storage/blockSD.py", line 290, in __init__
    metadata = selectMetadata(sdUUID)
  File "/usr/share/vdsm/storage/blockSD.py", line 259, in selectMetadata
    if len(mdProvider) > 0:
  File "/usr/share/vdsm/storage/persistentDict.py", line 61, in __len__
    return len(self.keys())
  File "/usr/share/vdsm/storage/persistentDict.py", line 105, in keys
    return list(self.__iter__())
  File "/usr/share/vdsm/storage/persistentDict.py", line 102, in __iter__
    return ifilter(lambda k: k in self._validatorDict, self._dict.__iter__())
  File "/usr/share/vdsm/storage/persistentDict.py", line 219, in __iter__
    with self._accessWrapper():
  File "/usr/lib64/python2.6/contextlib.py", line 16, in __enter__
    return self.gen.next()
  File "/usr/share/vdsm/storage/persistentDict.py", line 147, in _accessWrapper
    self.refresh()
  File "/usr/share/vdsm/storage/persistentDict.py", line 224, in refresh
    lines = self._metaRW.readlines()
  File "/usr/share/vdsm/storage/blockSD.py", line 224, in readlines
    lvm.activateLVs(self._vgName, self._lvName)
  File "/usr/share/vdsm/storage/lvm.py", line 1027, in activateLVs
    _setLVAvailability(vgName, toActivate, "y")
  File "/usr/share/vdsm/storage/lvm.py", line 719, in _setLVAvailability
    raise error(str(e))
CannotActivateLogicalVolumes: Cannot activate Logical Volumes: ('General Storage Exception: ("5 [] [\'  /dev/mapper/1Dafna-si13-011343925: read failed after 0 of 4096 at 53687025664: Input/output error\', \'  /dev/mapper/1Dafna-si13-011343925: read failed after 0 of 4096 at 53687083008: Input/output error\', \'  /dev/mapper/1Dafna-si13-011343925: read failed after 0 of 4096 at 0: Input/output error\', \'  WARNING: Error counts reached a limit of 3. Device /dev/mapper/1Dafna-si13-011343925 was disabled\', \'  /dev/mapper/1Dafna-si13-021343925: read failed after 0 of 4096 at 53687025664: Input/output error\', \'  /dev/mapper/1Dafna-si13-021343925: read failed after 0 of 4096 at 53687083008: Input/output error\', \'  /dev/mapper/1Dafna-si13-021343925: read failed after 0 of 4096 at 0: Input/output error\', \'  WARNING: Error counts reached a limit of 3. Device /dev/mapper/1Dafna-si13-021343925 was disabled\', \'  One or more specified logical volume(s) not found.\']\\n7a627b01-4deb-4e10-aa5a-d8bffadd8a70/[\'metadata\']",)',)
Thread-83692::ERROR::2012-08-07 16:10:59,477::task::853::TaskManager.Task::(_setError) Task=`b0dcdaab-f7ce-447e-bb56-80acc1b9125e`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 861, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 2226, in formatStorageDomain
    self._recycle(sd)
  File "/usr/share/vdsm/storage/hsm.py", line 2187, in _recycle
    dom.format(dom.sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 47, in __getattr__
    dom = self.getRealDomain()
  File "/usr/share/vdsm/storage/sdc.py", line 51, in getRealDomain
    return self._cache._realProduce(self._sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 123, in _realProduce
    dom = self._findDomain(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 147, in _findDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: ('7a627b01-4deb-4e10-aa5a-d8bffadd8a70',)
Thread-83695::INFO::2012-08-07 16:10:59,479::logUtils::39::dispatcher::(wrapper) Run and protect: getStoragePoolInfo, Return response: {'info': {'spm_id': 1, 'master_uuid': 'a93432d7-b235-4e4b-800d-43b3a42ba2ea', 'name': 'iSCSI', 'version': '3', 'domains': 'a93432d7-b235-4e4b-800d-43b3a42ba2ea:Active', 'pool_status': 'connected', 'isoprefix': '', 'type': 'ISCSI', 'master_ver': 2, 'lver': 0}, 'dominfo': {'a93432d7-b235-4e4b-800d-43b3a42ba2ea': {'status': 'Active', 'diskfree': '49123688448', 'alerts': [], 'disktotal': '53284438016'}}}
Thread-83692::DEBUG::2012-08-07 16:10:59,479::task::872::TaskManager.Task::(_run) Task=`b0dcdaab-f7ce-447e-bb56-80acc1b9125e`::Task._run: b0dcdaab-f7ce-447e-bb56-80acc1b9125e ('7a627b01-4deb-4e10-aa5a-d8bffadd8a70', False) {} failed - stopping task
Thread-83695::DEBUG::2012-08-07 16:10:59,480::task::1172::TaskManager.Task::(prepare) Task=`74c2df23-2a82-4420-acfc-d6ee4ca591c4`::finished: {'info': {'spm_id': 1, 'master_uuid': 'a93432d7-b235-4e4b-800d-43b3a42ba2ea', 'name': 'iSCSI', 'version': '3', 'domains': 'a93432d7-b235-4e4b-800d-43b3a42ba2ea:Active', 'pool_status': 'connected', 'isoprefix': '', 'type': 'ISCSI', 'master_ver': 2, 'lver': 0}, 'dominfo': {'a93432d7-b235-4e4b-800d-43b3a42ba2ea': {'status': 'Active', 'diskfree': '49123688448', 'alerts': [], 'disktotal': '53284438016'}}}
Thread-83692::DEBUG::2012-08-07 16:10:59,480::task::1199::TaskManager.Task::(stop) Task=`b0dcdaab-f7ce-447e-bb56-80acc1b9125e`::stopping in state preparing (force False)
Thread-83695::DEBUG::2012-08-07 16:10:59,481::task::588::TaskManager.Task::(_updateState) Task=`74c2df23-2a82-4420-acfc-d6ee4ca591c4`::moving from state preparing -> state finished
Thread-83692::DEBUG::2012-08-07 16:10:59,481::task::978::TaskManager.Task::(_decref) Task=`b0dcdaab-f7ce-447e-bb56-80acc1b9125e`::ref 1 aborting True
Thread-83695::DEBUG::2012-08-07 16:10:59,482::resourceManager::809::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {'Storage.5acd19f1-c608-41f6-a4e4-4dc90850356e': < ResourceRef 'Storage.5acd19f1-c608-41f6-a4e4-4dc90850356e', isValid: 'True' obj: 'None'>}
Thread-83692::INFO::2012-08-07 16:10:59,482::task::1157::TaskManager.Task::(prepare) Task=`b0dcdaab-f7ce-447e-bb56-80acc1b9125e`::aborting: Task is aborted: 'Storage domain does not exist' - code 358
Thread-83695::DEBUG::2012-08-07 16:10:59,482::resourceManager::844::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
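
The two tracebacks above chain together: the metadata LV of the half-removed domain lives on LUNs that are no longer readable, so activation fails, and the domain lookup translates that into StorageDomainDoesNotExist. A hedged sketch of that chain, with hypothetical stand-in names rather than the real sdc/lvm code:

class CannotActivateLogicalVolumes(Exception):
    pass

class StorageDomainDoesNotExist(Exception):
    pass

def activate_metadata_lv(devices_reachable):
    # Stand-in for LV activation: it needs the underlying LUNs, which
    # are already gone at this point, hence the I/O errors above.
    if not devices_reachable:
        raise CannotActivateLogicalVolumes("read failed: Input/output error")

def find_domain(sd_uuid, devices_reachable):
    # Stand-in for the domain lookup: any error while producing the
    # domain is reported as "no such domain", which is why the retried
    # remove cannot get hold of the half-removed domain at all.
    try:
        activate_metadata_lv(devices_reachable)
        return "domain-object"
    except CannotActivateLogicalVolumes:
        raise StorageDomainDoesNotExist(sd_uuid)

try:
    find_domain("7a627b01-4deb-4e10-aa5a-d8bffadd8a70", devices_reachable=False)
except StorageDomainDoesNotExist as err:
    print("second remove fails: %s" % err)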

Comment 2 Dafna Ron 2012-08-07 13:28:19 UTC
Created attachment 602753: second remove failed logs

Comment 3 Eduardo Warszawski 2012-08-09 18:08:20 UTC
http://gerrit.ovirt.org/#/c/7066/
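
For context: the Doc Text on this bug says the fix makes vdsm use only one LVM operation at a time. A minimal sketch of that serialization idea, assuming a process-wide lock; the names here are hypothetical, and this is not the actual change behind the review link above:

import subprocess
import threading

_lvm_op_lock = threading.Lock()  # one lvm(8) invocation at a time

def run_lvm(args):
    # With every pvs/vgs/vgremove funnelled through this lock, a cache
    # reload can no longer interleave with a concurrent VG removal, so
    # it sees the LUNs either fully present or fully gone.
    with _lvm_op_lock:
        return subprocess.call(["lvm"] + list(args))

Under such a scheme, run_lvm(["pvs"]) issued by one removal and run_lvm(["vgremove", "-f", "vgname"]) issued by another are mutually exclusive, which removes the interleaving reported in this bug.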

Comment 6 Dafna Ron 2012-09-27 12:29:04 UTC
Verified on si19.

Comment 9 errata-xmlrpc 2012-12-04 19:04:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-1508.html

