Bug 1426916 - not failing early with getting a pv named "[unknown]".
Summary: not failing early with getting a pv named "[unknown]".
Keywords:
Status: CLOSED DUPLICATE of bug 1253640
Alias: None
Product: vdsm
Classification: oVirt
Component: Core
Version: 4.19.4
Hardware: Unspecified
OS: Unspecified
unspecified
medium
Target Milestone: ovirt-4.2.0
: ---
Assignee: Dan Kenigsberg
QA Contact: Raz Tamir
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2017-02-26 10:52 UTC by mlehrer
Modified: 2017-03-22 15:34 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-03-19 16:52:05 UTC
oVirt Team: Storage
Embargoed:
mlehrer: needinfo-
rule-engine: ovirt-4.2+


Attachments (Terms of Use)
Logs_On_Google_Drive (120 bytes, text/plain)
2017-02-26 10:52 UTC, mlehrer
no flags Details

Description mlehrer 2017-02-26 10:52:54 UTC
Created attachment 1257813 [details]
Logs_On_Google_Drive

Description of problem:

We noticed inconsistent behavior with re-creating an SD of the same ISCSI luns used previously (after detachment).  VDSM fails to add/dettach ISCSI storage domain saying " getDmId     deviceMultipathName) OSError: [Errno 19] Could not find dm device named `[unknown]`"


###
! Note ! - Logs
https://drive.google.com/drive/folders/0B8V1DXeGhPPWRE5PNmtJYmhEQ3M?usp=sharing
###


Version-Release number of selected component (if applicable):

Engine: ovirt-engine-4.0.7.1-0.1.el7ev.noarch
VDSM Version: vdsm-4.19.4-1.el7ev.x86_64

How reproducible:
Consistent

Steps to Reproduce:
Create multi lun iscsi  storage domain successfully
1) Dettach above sd domain and remove domain successfully, now vdsm log showed this error [1]
2) Now with storage domain no longer showing/existing in Webadmin UI I then re-created the storage domain via API and the storage domain attachment phase fails (before it passed)
3) Webadmin UI domains then shows this SD as detached; when attempting manual attachment / activation it fails with [2]



Actual results:

SD not connected/dettached properly.


[1] Error after dettaching
2017-02-21 15:12:30,955 ERROR (jsonrpc/7) [storage.TaskManager.Task] (Task='54963613-a35f-4cb3-8816-ba88395f7439') Unexpected error (task:870)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 877, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python2.7/site-packages/vdsm/logUtils.py", line 50, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 847, in detachStorageDomain
    pool.detachSD(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1012, in detachSD
    dom = sdCache.produce(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 112, in produce
    domain.getRealDomain()
  File "/usr/share/vdsm/storage/sdc.py", line 53, in getRealDomain
    return self._cache._realProduce(self._sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 136, in _realProduce
    domain = self._findDomain(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 155, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/blockSD.py", line 1599, in findDomain
    return BlockStorageDomain(BlockStorageDomain.findDomainPath(sdUUID))
  File "/usr/share/vdsm/storage/blockSD.py", line 958, in __init__
    lvm.checkVGBlockSizes(sdUUID, (self.logBlkSize, self.phyBlkSize))
  File "/usr/share/vdsm/storage/lvm.py", line 1124, in checkVGBlockSizes
    _checkpvsblksize(pvs, vgBlkSize)
  File "/usr/share/vdsm/storage/lvm.py", line 1101, in _checkpvsblksize
    pvBlkSize = _getpvblksize(pv)
  File "/usr/share/vdsm/storage/lvm.py", line 1095, in _getpvblksize
    dev = devicemapper.getDmId(os.path.basename(pv))
  File "/usr/lib/python2.7/site-packages/vdsm/storage/devicemapper.py", line 42, in getDmId
    deviceMultipathName)
OSError: [Errno 19] Could not find dm device named `[unknown]`
2017-02-21 15:12:30,993 INFO  (jsonrpc/7) [storage.TaskManager.Task] (Task='54963613-a35f-4cb3-8816-ba88395f7439') aborting: Task is aborted: u'[Errno 19] Could not find dm device named `[unknown]`' - code 100 (task:1175)
2017-02-21 15:12:30,994 ERROR (jsonrpc/7) [storage.Dispatcher] [Errno 19] Could not find dm device named `[unknown]` (dispatcher:80)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/dispatcher.py", line 72, in wrapper
    result = ctask.prepare(func, *args, **kwargs)
  File "/usr/share/vdsm/storage/task.py", line 105, in wrapper
    return m(self, *a, **kw)
  File "/usr/share/vdsm/storage/task.py", line 1183, in prepare
    raise self.error
OSError: [Errno 19] Could not find dm device named `[unknown]`



[2] Failed Upon re-attachment attempt
2017-02-21 15:17:01,277 ERROR (jsonrpc/4) [storage.TaskManager.Task] (Task='7a74fefc-d2f6-44e4-8ab0-57ae2f02077f') Unexpected error (task:870)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 877, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python2.7/site-packages/vdsm/logUtils.py", line 50, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 1218, in activateStorageDomain
    pool.activateSD(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1089, in activateSD
    dom = sdCache.produce(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 112, in produce
    domain.getRealDomain()
  File "/usr/share/vdsm/storage/sdc.py", line 53, in getRealDomain
    return self._cache._realProduce(self._sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 136, in _realProduce
    domain = self._findDomain(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 155, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/blockSD.py", line 1599, in findDomain
    return BlockStorageDomain(BlockStorageDomain.findDomainPath(sdUUID))
  File "/usr/share/vdsm/storage/blockSD.py", line 958, in __init__
    lvm.checkVGBlockSizes(sdUUID, (self.logBlkSize, self.phyBlkSize))
  File "/usr/share/vdsm/storage/lvm.py", line 1124, in checkVGBlockSizes
    _checkpvsblksize(pvs, vgBlkSize)
  File "/usr/share/vdsm/storage/lvm.py", line 1101, in _checkpvsblksize
    pvBlkSize = _getpvblksize(pv)
  File "/usr/share/vdsm/storage/lvm.py", line 1095, in _getpvblksize
    dev = devicemapper.getDmId(os.path.basename(pv))
  File "/usr/lib/python2.7/site-packages/vdsm/storage/devicemapper.py", line 42, in getDmId
    deviceMultipathName)
OSError: [Errno 19] Could not find dm device named `[unknown]`
2017-02-21 15:17:01,279 INFO  (jsonrpc/4) [storage.TaskManager.Task] (Task='7a74fefc-d2f6-44e4-8ab0-57ae2f02077f') aborting: Task is aborted: u'[Errno 19] Could not find dm device named `[unknown]`' - code 100 (task:1175)
2017-02-21 15:17:01,281 ERROR (jsonrpc/4) [storage.Dispatcher] [Errno 19] Could not find dm device named `[unknown]` (dispatcher:80)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/dispatcher.py", line 72, in wrapper
    result = ctask.prepare(func, *args, **kwargs)
  File "/usr/share/vdsm/storage/task.py", line 105, in wrapper
    return m(self, *a, **kw)
  File "/usr/share/vdsm/storage/task.py", line 1183, in prepare
    raise self.error
OSError: [Errno 19] Could not find dm device named `[unknown]`


[3] Error from Engine side
2017-02-21 15:17:02,285 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.ActivateStorageDomainVDSCommand] (org.ovirt.thread.pool-6-thread-38) [60b905af] Failed in 'ActivateStorageDomainVDS' method
2017-02-21 15:17:02,290 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-38) [60b905af] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM command failed: Error in storage domain action: (u'sdUUID=54913ad9-6641-4cb6-93f4-041662855a51, spUUID=e805f94f-228a-48fd-803f-950882473d1c',)
2017-02-21 15:17:02,290 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (org.ovirt.thread.pool-6-thread-38) [60b905af] IrsBroker::Failed::ActivateStorageDomainVDS: IRSGenericException: IRSErrorException: Failed to ActivateStorageDomainVDS, error = Error in storage domain action: (u'sdUUID=54913ad9-6641-4cb6-93f4-041662855a51, spUUID=e805f94f-228a-48fd-803f-950882473d1c',), code = 350
2017-02-21 15:17:02,306 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.ActivateStorageDomainVDSCommand] (org.ovirt.thread.pool-6-thread-38) [60b905af] FINISH, ActivateStorageDomainVDSCommand, log id: 779f70dd
2017-02-21 15:17:02,306 ERROR [org.ovirt.engine.core.bll.storage.domain.ActivateStorageDomainCommand] (org.ovirt.thread.pool-6-thread-38) [60b905af] Command 'org.ovirt.engine.core.bll.storage.domain.ActivateStorageDomainCommand' failed: EngineException: org.ovirt.engine.core.vdsbroker.irsbroker.IRSErrorException: IRSGenericException: IRSErrorException: Failed to ActivateStorageDomainVDS, error = Error in storage domain action: (u'sdUUID=54913ad9-6641-4cb6-93f4-041662855a51, spUUID=e805f94f-228a-48fd-803f-950882473d1c',), code = 350 (Failed with error StorageDomainActionError and code 350)
2017-02-21 15:17:02,312 INFO  [org.ovirt.engine.core.bll.storage.domain.ActivateStorageDomainCommand] (org.ovirt.thread.pool-6-thread-38) [60b905af] Command [id=9f49b4cd-bbb9-4ae7-82bc-150803f1c6f0]: Compensating CHANGED_STATUS_ONLY of org.ovirt.engine.core.common.businessentities.StoragePoolIsoMap; snapshot: EntityStatusSnapshot:{id='StoragePoolIsoMapId:{storagePoolId='e805f94f-228a-48fd-803f-950882473d1c', storageId='54913ad9-6641-4cb6-93f4-041662855a51'}', status='Maintenance'}.
2017-02-21 15:17:02,318 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-38) [60b905af] Correlation ID: 12af981e, Job ID: 1b36e5cd-b0b4-47dc-ad2b-45118d1ce2ad, Call Stack: null, Custom Event ID: -1, Message: Failed to activate Storage Domain iscsi_sd_test (Data Center DC_REAL_1) by admin@internal-authz
2017-02-21 15:17:16,307 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData] (org.ovirt.thread.pool-6-thread-9) [] Adding domain '54913ad9-6641-4cb6-93f4-041662855a51' to the domains in maintenance cache



Expected results:
Successful domain attachment/dettachment.


Additional info:

##Initial Feedback from Dev##
This means that a PV is missing. In this case lvm return the special
name "[unknown]" instead of the pv name.

We have special code for detecting this so we can fail early in a clear way when a PV is missing, but it seems that we have some code path that do not detect this and instead of failing early with a clear message, try to get the
device dm id which will obviously fail because "[unknown]" is not a dm device name. The root cause of course is a missing PV, if you look at the output of
vgs and pvs commands you should see a "partial" vg and a pv with the name "[unknown]".

Inside the logs located # https://drive.google.com/drive/folders/0B8V1DXeGhPPWRE5PNmtJYmhEQ3M?usp=sharing

There is info.out folder showing PVs Vgs

Comment 1 Yaniv Kaul 2017-02-26 11:30:20 UTC
This is most likely a dup of a dup of a dup. We don't treat dirty LUNs nicely. I'd ask storage team what the original bug about this is.

Comment 2 Tal Nisan 2017-02-26 12:51:19 UTC
I think you are referring to bug 1253640

Comment 3 Allon Mureinik 2017-03-19 16:52:05 UTC

*** This bug has been marked as a duplicate of bug 1253640 ***


Note You need to log in before you can comment on or make changes to this bug.