Bug 1766595 - Webadmin and RESTAPI - creating/managing/importing ISCSI storage domain not possible - error block device action: (), code = 600
Summary: Webadmin and RESTAPI - creating/managing/importing ISCSI storage domain not p...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: General
Version: ---
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ovirt-4.4.0
Target Release: 4.40.7
Assignee: Vojtech Juranek
QA Contact: Ilan Zuckerman
URL:
Whiteboard:
Depends On: 1784010
Blocks:
 
Reported: 2019-10-29 13:29 UTC by Avihai
Modified: 2020-05-20 20:02 UTC (History)
6 users (show)

Fixed In Version: vdsm-4.40.7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-20 20:02:24 UTC
oVirt Team: Storage
Embargoed:
pm-rhel: ovirt-4.4+


Attachments
engine and vdsm logs (624.12 KB, application/gzip)
2019-10-29 13:29 UTC, Avihai
no flags Details
restapi_createsd_failed_logs (660.82 KB, application/gzip)
2019-11-03 12:03 UTC, Avihai
no flags Details
import iscsi SD fails logs (816.63 KB, application/gzip)
2019-11-03 12:47 UTC, Avihai
no flags Details
video_capture of the ISCSI SD import issue (4.20 MB, video/mp4)
2019-11-03 12:50 UTC, Avihai
no flags Details
attach SD failed vg logs (541.59 KB, application/gzip)
2019-11-04 18:11 UTC, Avihai
no flags Details
engine and vdsm logs for the remove SD then recreate it scenario (433.61 KB, application/gzip)
2019-11-05 09:33 UTC, Avihai
no flags Details
journalctl log of the host on which the issue occurred (534.62 KB, text/plain)
2019-11-06 14:25 UTC, Avihai
no flags Details
messages from the host (2.45 KB, text/plain)
2019-11-24 13:28 UTC, Ilan Zuckerman
no flags Details
messages from the host (2.45 KB, text/plain)
2019-11-24 13:31 UTC, Ilan Zuckerman
no flags Details
engine log with failed attempt to create sd (87.73 KB, text/plain)
2019-11-24 13:32 UTC, Ilan Zuckerman
no flags Details
engine and vdsm supervdsm logs of the 4.4.0-13 reproduction (1.52 MB, application/gzip)
2020-01-01 13:59 UTC, Avihai
no flags Details


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 104498 0 'None' MERGED multipath: ignore scsi_id failures 2020-09-30 07:30:41 UTC
oVirt gerrit 106076 0 master MERGED tests: add getPathsStatus() test when no device is found 2020-09-30 07:30:41 UTC
oVirt gerrit 106077 0 master MERGED storage: determine correctly case when dmsetup doesn't find any devices 2020-09-30 07:30:35 UTC

Description Avihai 2019-10-29 13:29:15 UTC
Created attachment 1630176 [details]
engine and vdsm logs

Description of problem:
Via webadmin, try to create an iSCSI storage domain
-> creating a new domain is not possible, as the window is stuck on "Loading... A large number of LUNs may slow down the operation"

In the engine log you see the following errors:
2019-10-29 15:18:55,077+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetDeviceListVDSCommand] (default task-11) [94706171-8714-4991-938f-00871a0d85cc] START, GetDeviceListVDSCommand(HostName = host_mixed_1, GetDeviceListVDSCommandParameters:{hostId='d199123d-9772-4276-9d28-c83d3ee2d999', storageType='ISCSI', checkStatus='false', lunIds='null'}), log id: 5d760c11
2019-10-29 15:18:56,227+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetDeviceListVDSCommand] (default task-11) [94706171-8714-4991-938f-00871a0d85cc] Failed in 'GetDeviceListVDS' method
2019-10-29 15:18:56,238+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-11) [94706171-8714-4991-938f-00871a0d85cc] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM host_mixed_1 command GetDeviceListVDS failed: Error block device action: ()
2019-10-29 15:18:56,238+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetDeviceListVDSCommand] (default task-11) [94706171-8714-4991-938f-00871a0d85cc] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.GetDeviceListVDSCommand' return value '
LUNListReturn:{status='Status [code=600, message=Error block device action: ()]'}
'
2019-10-29 15:18:56,238+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetDeviceListVDSCommand] (default task-11) [94706171-8714-4991-938f-00871a0d85cc] HostName = host_mixed_1
2019-10-29 15:18:56,238+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetDeviceListVDSCommand] (default task-11) [94706171-8714-4991-938f-00871a0d85cc] Command 'GetDeviceListVDSCommand(HostName = host_mixed_1, GetDeviceListVDSCommandParameters:{hostId='d199123d-9772-4276-9d28-c83d3ee2d999', storageType='ISCSI', checkStatus='false', lunIds='null'})' execution failed: VDSGenericException: VDSErrorException: Failed to GetDeviceListVDS, error = Error block device action: (), code = 600
2019-10-29 15:18:56,238+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetDeviceListVDSCommand] (default task-11) [94706171-8714-4991-938f-00871a0d85cc] FINISH, GetDeviceListVDSCommand, return: , log id: 5d760c11
2019-10-29 15:18:56,239+02 ERROR [org.ovirt.engine.core.bll.storage.disk.lun.GetDeviceListQuery] (default task-11) [94706171-8714-4991-938f-00871a0d85cc] Query 'GetDeviceListQuery' failed: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to GetDeviceListVDS, error = Error block device action: (), code = 600 (Failed with error BlockDeviceActionError and code 600)
2019-10-29 15:18:56,239+02 ERROR [org.ovirt.engine.core.bll.storage.disk.lun.GetDeviceListQuery] (default task-11) [94706171-8714-4991-938f-00871a0d85cc] Exception: org.ovirt.engine.core.common.errors.EngineException: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to GetDeviceListVDS, error = Error block device action: (), code = 600 (Failed with error BlockDeviceActionError and code 600)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.VdsHandler.handleVdsResult(VdsHandler.java:114)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.VDSBrokerFrontendImpl.runVdsCommand(VDSBrokerFrontendImpl.java:33)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.QueriesCommandBase.runVdsCommand(QueriesCommandBase.java:246)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.storage.disk.lun.GetDeviceListQuery.executeQueryCommand(GetDeviceListQuery.java:47)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.QueriesCommandBase.executeCommand(QueriesCommandBase.java:106)
        at org.ovirt.engine.core.dal//org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:31)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.executor.DefaultBackendQueryExecutor.execute(DefaultBackendQueryExecutor.java:14)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.Backend.runQueryImpl(Backend.java:521)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.Backend.runQuery(Backend.java:490)
        at jdk.internal.reflect.GeneratedMethodAccessor67.invoke(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)


VDSM log:
2019-10-29 09:18:56,213-0400 INFO  (jsonrpc/2) [vdsm.api] FINISH getDeviceList error=not enough values to unpack (expected 2, got 1) from=::ffff:10.35.161.118,32768, flow_id=94706171-8714-4991-938f-00871a0d85cc, task_id=b94be1b9-21d9-48cc-a59c-1bba171d9ed9 (api:52)
2019-10-29 09:18:56,214-0400 ERROR (jsonrpc/2) [storage.TaskManager.Task] (Task='b94be1b9-21d9-48cc-a59c-1bba171d9ed9') Unexpected error (task:889)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 896, in _run
    return fn(*args, **kargs)
  File "<decorator-gen-91>", line 2, in getDeviceList
  File "/usr/lib/python3.6/site-packages/vdsm/common/api.py", line 50, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 1988, in getDeviceList
    checkStatus=checkStatus)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 2006, in _getDeviceList
    for dev in multipath.pathListIter(guids):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/multipath.py", line 248, in pathListIter
    pathStatuses = devicemapper.getPathsStatus()
  File "/usr/lib/python3.6/site-packages/vdsm/storage/devicemapper.py", line 170, in getPathsStatus
    return supervdsm.getProxy().devicemapper_getPathsStatus()
  File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py", line 56, in __call__
    return callMethod()
  File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py", line 54, in <lambda>
    **kwargs)
  File "<string>", line 2, in devicemapper_getPathsStatus
  File "/usr/lib64/python3.6/multiprocessing/managers.py", line 772, in _callmethod
    raise convert_to_error(kind, result)
ValueError: not enough values to unpack (expected 2, got 1)
2019-10-29 09:18:56,214-0400 INFO  (jsonrpc/2) [storage.TaskManager.Task] (Task='b94be1b9-21d9-48cc-a59c-1bba171d9ed9') aborting: Task is aborted: 'not enough values to unpack (expected 2, got 1)' - code 100 (task:1200)
2019-10-29 09:18:56,215-0400 ERROR (jsonrpc/2) [storage.Dispatcher] FINISH getDeviceList error=not enough values to unpack (expected 2, got 1) (dispatcher:87)


Version-Release number of selected component (if applicable):
rhv-4.4.0-4 (http://bob.eng.lab.tlv.redhat.com/builds/4.4/rhv-4.4.0-4/)

ovirt-engine-4.4.0-0.4.master.el7.noarch
vdsm-4.40.0-127.gitc628cce.el8ev.x86_64
python3-libvirt-5.0.0-5.module+el8.0.1+3755+6782b0ed.x86_64
qemu-kvm-3.1.0-30.module+el8.0.1+3755+6782b0ed.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Via webadmin, create a new iSCSI storage domain

Actual results:
Creating a new domain is not possible, as the window is stuck on "Loading... A large number of LUNs may slow down the operation"

Expected results:


Additional info:

Comment 1 Vojtech Juranek 2019-10-30 09:17:32 UTC
This seems like a bug in the multiprocessing module. The root cause is the same as in BZ #1766582 - there are no attached iSCSI devices, so device mapper fails, while multiprocessing assumes it always gets a kind and a result from a call; this is obviously not the case here, so it fails with this exception. I'd propose closing this bug (unless we want to dig into the multiprocessing module and eventually fix it) and tracking the issue with attaching iSCSI devices under BZ #1766582.
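
For reference, a minimal Python sketch of the failure mode, based on the traceback in the description and the patch titles in the Links section. This is an illustration only, not the actual vdsm code; the "No devices found" message and the exact command line are assumptions:

import subprocess

def get_paths_status():
    # "dmsetup status" normally prints one "<name>: <status>" line per device.
    out = subprocess.check_output(["dmsetup", "status"], universal_newlines=True)
    statuses = {}
    for line in out.splitlines():
        # Assumed: with no mapped devices dmsetup prints a single
        # "No devices found" line, which has no ":" and therefore makes the
        # two-value unpack below raise
        # "ValueError: not enough values to unpack (expected 2, got 1)".
        if not line.strip() or line.strip() == "No devices found":
            continue
        name, status = line.split(":", 1)
        statuses[name.strip()] = status.strip()
    return statuses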

Comment 2 Tal Nisan 2019-11-03 10:29:29 UTC

*** This bug has been marked as a duplicate of bug 1766582 ***

Comment 3 Avihai 2019-11-03 11:54:33 UTC
OK, so the issue that the SDs did not go up and no attached SCSI devices were seen is due to the fact that initiatorname.iscsi was deleted during OS re-provisioning.
After I changed initiatorname.iscsi back to its old value, restarted the iscsid service and reconnected to the iSCSI target, the pre-upgrade iSCSI SDs went up.
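
For completeness, a hypothetical helper scripting the same recovery steps (the initiator name, target and portal are taken from the session listing below; the file path and commands are the standard open-iscsi ones, not something specific to this bug):

import subprocess

INITIATOR = "iqn.1994-05.com.redhat:storage-ge4-vdsm1"   # old initiator name
TARGET = "iqn.2000-05.com.3pardata:20210002ac021f6b"
PORTAL = "10.35.146.1:3260"

# 1. Restore the old initiator name.
with open("/etc/iscsi/initiatorname.iscsi", "w") as f:
    f.write("InitiatorName={}\n".format(INITIATOR))

# 2. Restart iscsid so it picks up the restored name.
subprocess.check_call(["systemctl", "restart", "iscsid"])

# 3. Re-login to the iSCSI target.
subprocess.check_call(["iscsiadm", "-m", "node", "-T", TARGET, "-p", PORTAL, "--login"])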

*** But we still have the issue of creating new SDs from the UI: the 'New Domain' window is still stuck at the LUN loading stage. ***
*** So we are still blocked from creating a new SD via the UI. ***

I am closing this bug as iSCSI SDs will go up if you configure initiatorname.iscsi correctly.
I am reopening bug 1766595 as we still cannot create new SDs via REST/UI.

Now iSCSI attachment is working:
[root@storage-ge4-vdsm1 ~]#  iscsiadm -m session  -P 3
iSCSI Transport Class version 2.0-870
version 6.2.0.877-0
Target: iqn.2000-05.com.3pardata:20210002ac021f6b (non-flash)
	Current Portal: 10.35.146.1:3260,21
	Persistent Portal: 3par-iscsi-1.scl.lab.tlv.redhat.com:3260,21
		**********
		Interface:
		**********
		Iface Name: default
		Iface Transport: tcp
		Iface Initiatorname: iqn.1994-05.com.redhat:storage-ge4-vdsm1
		Iface IPaddress: 10.35.82.63
		Iface HWaddress: default
		Iface Netdev: default
		SID: 6
		iSCSI Connection State: LOGGED IN
		iSCSI Session State: LOGGED_IN
		Internal iscsid Session State: NO CHANGE
		*********
		Timeouts:
		*********
		Recovery Timeout: 10
		Target Reset Timeout: 30
		LUN Reset Timeout: 30
		Abort Timeout: 15
		*****
		CHAP:
		*****
		username: <empty>
		password: ********
		username_in: <empty>
		password_in: ********
		************************
		Negotiated iSCSI params:
		************************
		HeaderDigest: None
		DataDigest: None
		MaxRecvDataSegmentLength: 262144
		MaxXmitDataSegmentLength: 65536
		FirstBurstLength: 16384
		MaxBurstLength: 262144
		ImmediateData: Yes
		InitialR2T: Yes
		MaxOutstandingR2T: 1
		************************
		Attached SCSI devices:
		************************
		Host Number: 3	State: running
		scsi3 Channel 00 Id 0 Lun: 0
			Attached scsi disk sdb		State: running
		scsi3 Channel 00 Id 0 Lun: 1
			Attached scsi disk sdc		State: running
		scsi3 Channel 00 Id 0 Lun: 2
			Attached scsi disk sdd		State: running
		scsi3 Channel 00 Id 0 Lun: 254
		scsi3 Channel 00 Id 0 Lun: 3
			Attached scsi disk sde		State: running
		scsi3 Channel 00 Id 0 Lun: 4
			Attached scsi disk sdf		State: running
		scsi3 Channel 00 Id 0 Lun: 5
			Attached scsi disk sdg		State: running
Target: iqn.2000-05.com.3pardata:21210002ac021f6b (non-flash)
	Current Portal: 10.35.146.2:3260,121
	Persistent Portal: 10.35.146.2:3260,121
		**********
		Interface:
		**********
		Iface Name: default
		Iface Transport: tcp
		Iface Initiatorname: iqn.1994-05.com.redhat:storage-ge4-vdsm1
		Iface IPaddress: 10.35.82.63
		Iface HWaddress: default
		Iface Netdev: default
		SID: 7
		iSCSI Connection State: LOGGED IN
		iSCSI Session State: LOGGED_IN
		Internal iscsid Session State: NO CHANGE
		*********
		Timeouts:
		*********
		Recovery Timeout: 10
		Target Reset Timeout: 30
		LUN Reset Timeout: 30
		Abort Timeout: 15
		*****
		CHAP:
		*****
		username: <empty>
		password: ********
		username_in: <empty>
		password_in: ********
		************************
		Negotiated iSCSI params:
		************************
		HeaderDigest: None
		DataDigest: None
		MaxRecvDataSegmentLength: 262144
		MaxXmitDataSegmentLength: 65536
		FirstBurstLength: 16384
		MaxBurstLength: 262144
		ImmediateData: Yes
		InitialR2T: Yes
		MaxOutstandingR2T: 1
		************************
		Attached SCSI devices:
		************************
		Host Number: 4	State: running
		scsi4 Channel 00 Id 0 Lun: 0
			Attached scsi disk sdh		State: running
		scsi4 Channel 00 Id 0 Lun: 1
			Attached scsi disk sdi		State: running
		scsi4 Channel 00 Id 0 Lun: 2
			Attached scsi disk sdj		State: running
		scsi4 Channel 00 Id 0 Lun: 254
		scsi4 Channel 00 Id 0 Lun: 3
			Attached scsi disk sdk		State: running
		scsi4 Channel 00 Id 0 Lun: 4
			Attached scsi disk sdl		State: running
		scsi4 Channel 00 Id 0 Lun: 5
			Attached scsi disk sdm		State: running

Comment 4 Avihai 2019-11-03 12:01:53 UTC
When trying to create an SD via REST, I also fail with the following issues.


Rest request details:

method : POST
URL: https://storage-ge-04.scl.lab.tlv.redhat.com/ovirt-engine/api/storagedomains

Header:
"key":"Content-Type","value":"application/xml"

Body:
<storage_domain>
    <name>sd_test1</name>
    <storage>
        <logical_units>
            <logical_unit id="360002ac0000000000000001900021f6b">
                <address>10.35.146.1</address>
                <port>3260</port>
                <target>iqn.2000-05.com.3pardata:20210002ac021f6b</target>
            </logical_unit>
        </logical_units>
        <override_luns>true</override_luns>
        <type>iscsi</type>
    </storage>
    <storage_format>v5</storage_format>
    <type>data</type>
    <host>
        <name>host_mixed_1</name>
    </host>
</storage_domain>

Rest response:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<fault>
    <detail>Error block device action</detail>
    <reason>Operation Failed</reason>
</fault>
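
For reference, the same request can be replayed with python-requests. This is a hypothetical reproduction (the credentials and the certificate handling are placeholders, not taken from this report); the body is the one shown above:

import requests

ENGINE = "https://storage-ge-04.scl.lab.tlv.redhat.com/ovirt-engine/api"

BODY = """<storage_domain>
    <name>sd_test1</name>
    <storage>
        <logical_units>
            <logical_unit id="360002ac0000000000000001900021f6b">
                <address>10.35.146.1</address>
                <port>3260</port>
                <target>iqn.2000-05.com.3pardata:20210002ac021f6b</target>
            </logical_unit>
        </logical_units>
        <override_luns>true</override_luns>
        <type>iscsi</type>
    </storage>
    <storage_format>v5</storage_format>
    <type>data</type>
    <host>
        <name>host_mixed_1</name>
    </host>
</storage_domain>"""

resp = requests.post(
    ENGINE + "/storagedomains",
    data=BODY,
    headers={"Content-Type": "application/xml", "Accept": "application/xml"},
    auth=("admin@internal", "password"),  # placeholder credentials
    verify=False,                         # lab setup; point this at the engine CA instead
)
print(resp.status_code)
print(resp.text)  # on the affected setup this is the "Error block device action" fault above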


Engine:
2019-11-03 13:48:39,365+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetDeviceListVDSCommand] (default task-56) [bf041a57-0b6b-4e43-b8c8-c53acfda41a4] Failed in 'GetDeviceListVDS' method
2019-11-03 13:48:39,374+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-56) [bf041a57-0b6b-4e43-b8c8-c53acfda41a4] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM host_mixed_1 command GetDeviceListVDS failed: Error block device action: ()
2019-11-03 13:48:39,374+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetDeviceListVDSCommand] (default task-56) [bf041a57-0b6b-4e43-b8c8-c53acfda41a4] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.GetDeviceListVDSCommand' return value '
LUNListReturn:{status='Status [code=600, message=Error block device action: ()]'}
'
2019-11-03 13:48:39,374+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetDeviceListVDSCommand] (default task-56) [bf041a57-0b6b-4e43-b8c8-c53acfda41a4] HostName = host_mixed_1
2019-11-03 13:48:39,375+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetDeviceListVDSCommand] (default task-56) [bf041a57-0b6b-4e43-b8c8-c53acfda41a4] Command 'GetDeviceListVDSCommand(HostName = host_mixed_1, GetDeviceListVDSCommandParameters:{hostId='d199123d-9772-4276-9d28-c83d3ee2d999', storageType='ISCSI', checkStatus='false', lunIds='null'})' execution failed: VDSGenericException: VDSErrorException: Failed to GetDeviceListVDS, error = Error block device action: (), code = 600
2019-11-03 13:48:39,375+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetDeviceListVDSCommand] (default task-56) [bf041a57-0b6b-4e43-b8c8-c53acfda41a4] FINISH, GetDeviceListVDSCommand, return: , log id: 166f9b26
2019-11-03 13:48:39,375+02 ERROR [org.ovirt.engine.core.bll.storage.disk.lun.GetDeviceListQuery] (default task-56) [bf041a57-0b6b-4e43-b8c8-c53acfda41a4] Query 'GetDeviceListQuery' failed: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to GetDeviceListVDS, error = Error block device action: (), code = 600 (Failed with error BlockDeviceActionError and code 600)
2019-11-03 13:48:39,375+02 ERROR [org.ovirt.engine.core.bll.storage.disk.lun.GetDeviceListQuery] (default task-56) [bf041a57-0b6b-4e43-b8c8-c53acfda41a4] Exception: org.ovirt.engine.core.common.errors.EngineException: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to GetDeviceListVDS, error = Error block device action: (), code = 600 (Failed with error BlockDeviceActionError and code 600)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.VdsHandler.handleVdsResult(VdsHandler.java:114)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.VDSBrokerFrontendImpl.runVdsCommand(VDSBrokerFrontendImpl.java:33)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.QueriesCommandBase.runVdsCommand(QueriesCommandBase.java:246)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.storage.disk.lun.GetDeviceListQuery.executeQueryCommand(GetDeviceListQuery.java:47)

..........................................................................................................................................................
        at org.jboss.weld.core.6.Final-redhat-00002//org.jboss.weld.interceptor.reader.SimpleInterceptorInvocation$SimpleMethodInvocation.invoke(SimpleInterceptorInvocation.java:73)
        at org.jboss.weld.core.6.Final-redhat-00002//org.jboss.weld.interceptor.proxy.InterceptorMethodHandler.executeAroundInvoke(InterceptorMethodHandler.java:84)
        at org.jboss.weld.core.6.Final-redhat-00002//org.jboss.weld.interceptor.proxy.InterceptorMethodHandler.executeInterception(InterceptorMethodHandler.java:72)
        at org.jboss.weld.core.6.Final-redhat-00002//org.jboss.weld.interceptor.proxy.InterceptorMethodHandler.invoke(InterceptorMethodHandler.java:56)
        at org.jboss.weld.core.6.Final-redhat-00002//org.jboss.weld.bean.proxy.CombinedInterceptorAndDecoratorStackMethodHandler.invoke(CombinedInterceptorAndDecoratorStackMethodHandler.java:79)
        at org.jboss.weld.core.6.Final-redhat-00002//org.jboss.weld.bean.proxy.CombinedInterceptorAndDecoratorStackMethodHandler.invoke(CombinedInterceptorAndDecoratorStackMethodHandler.java:68)
        at deployment.engine.ear//org.ovirt.engine.core.vdsbroker.ResourceManager$Proxy$_$$_WeldSubclass.runVdsCommand(Unknown Source)
        ... 202 more

2019-11-03 13:48:39,382+02 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-56) [] Operation Failed: Error block device action


VDSM:
2019-11-03 06:48:39,352-0500 INFO  (jsonrpc/6) [vdsm.api] FINISH getDeviceList error=Command ['/usr/lib/udev/scsi_id', '--page=0x80', '--whitelisted', '--export', '--replace-whitespace', '--device=/dev/dm-1'] failed with rc=1 out=b'' err=b'' from=::ffff:10.35.161.118,59166, flow_id=bf041a57-0b6b-4e43-b8c8-c53acfda41a4, task_id=2f041d31-1c09-4246-98bc-23615ef186fa (api:52)
2019-11-03 06:48:39,353-0500 ERROR (jsonrpc/6) [storage.TaskManager.Task] (Task='2f041d31-1c09-4246-98bc-23615ef186fa') Unexpected error (task:889)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 896, in _run
    return fn(*args, **kargs)
  File "<decorator-gen-91>", line 2, in getDeviceList
  File "/usr/lib/python3.6/site-packages/vdsm/common/api.py", line 50, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 1988, in getDeviceList
    checkStatus=checkStatus)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 2006, in _getDeviceList
    for dev in multipath.pathListIter(guids):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/multipath.py", line 263, in pathListIter
    "serial": get_scsi_serial(dmId),
  File "/usr/lib/python3.6/site-packages/vdsm/storage/multipath.py", line 202, in get_scsi_serial
    return supervdsm.getProxy().multipath_get_scsi_serial(physdev)
  File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py", line 56, in __call__
    return callMethod()
  File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py", line 54, in <lambda>
    **kwargs)
  File "<string>", line 2, in multipath_get_scsi_serial
  File "/usr/lib64/python3.6/multiprocessing/managers.py", line 772, in _callmethod
    raise convert_to_error(kind, result)
vdsm.common.cmdutils.Error: Command ['/usr/lib/udev/scsi_id', '--page=0x80', '--whitelisted', '--export', '--replace-whitespace', '--device=/dev/dm-1'] failed with rc=1 out=b'' err=b''
2019-11-03 06:48:39,354-0500 INFO  (jsonrpc/6) [storage.TaskManager.Task] (Task='2f041d31-1c09-4246-98bc-23615ef186fa') aborting: Task is aborted: "Command ['/usr/lib/udev/scsi_id', '--page=0x80', '--whitelisted', '--export', '--replace-whitespace', '--device=/dev/dm-1'] failed with rc=1 out=b'' err=b''" - code 100 (task:1200)

supervdsm.log:
MainProcess|jsonrpc/6::ERROR::2019-11-03 06:48:39,351::supervdsm_server::96::SuperVdsm.ServerCallback::(wrapper) Error in multipath_get_scsi_serial
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/supervdsm_server.py", line 94, in wrapper
    res = func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/supervdsm_api/multipath.py", line 34, in multipath_get_scsi_serial
    return multipath.get_scsi_serial(physdev)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/multipath.py", line 211, in get_scsi_serial
    out = commands.run(cmd)
  File "/usr/lib/python3.6/site-packages/vdsm/common/commands.py", line 101, in run
    raise cmdutils.Error(args, p.returncode, out, err)
vdsm.common.cmdutils.Error: Command ['/usr/lib/udev/scsi_id', '--page=0x80', '--whitelisted', '--export', '--replace-whitespace', '--device=/dev/dm-1'] failed with rc=1 out=b'' err=b''
MainProcess|monitor/788f4bf::DEBUG::2019-11-03 06:48:39,485::supervdsm_server::92::SuperVdsm.ServerCallback::(wrapper) call hbaRescan with (<vdsm.supervdsm_server._SuperVdsm object at 0x7ff073533550>,) {}



2019-11-03 06:48:39,355-0500 ERROR (jsonrpc/6) [storage.Dispatcher] FINISH getDeviceList error=Command ['/usr/lib/udev/scsi_id', '--page=0x80', '--whitelisted', '--export', '--replace-whitespace', '--device=/dev/dm-1'] failed with rc=1 out=b'' err=b'' (dispatcher:87)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/dispatcher.py", line 74, in wrapper
    result = ctask.prepare(func, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 110, in wrapper
    return m(self, *a, **kw)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 1208, in prepare
    raise self.error
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 896, in _run
    return fn(*args, **kargs)
  File "<decorator-gen-91>", line 2, in getDeviceList
  File "/usr/lib/python3.6/site-packages/vdsm/common/api.py", line 50, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 1988, in getDeviceList
    checkStatus=checkStatus)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 2006, in _getDeviceList
    for dev in multipath.pathListIter(guids):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/multipath.py", line 263, in pathListIter
    "serial": get_scsi_serial(dmId),
  File "/usr/lib/python3.6/site-packages/vdsm/storage/multipath.py", line 202, in get_scsi_serial
    return supervdsm.getProxy().multipath_get_scsi_serial(physdev)
  File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py", line 56, in __call__
    return callMethod()
  File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py", line 54, in <lambda>
    **kwargs)
  File "<string>", line 2, in multipath_get_scsi_serial
  File "/usr/lib64/python3.6/multiprocessing/managers.py", line 772, in _callmethod
    raise convert_to_error(kind, result)
vdsm.common.cmdutils.Error: Command ['/usr/lib/udev/scsi_id', '--page=0x80', '--whitelisted', '--export', '--replace-whitespace', '--device=/dev/dm-1'] failed with rc=1 out=b'' err=b''
2019-11-03 06:48:39,355-0500 INFO  (jsonrpc/6) [jsonrpc.JsonRpcServer] RPC call Host.getDeviceList failed (error 600) in 1.20 seconds (__init__:312)

Comment 5 Avihai 2019-11-03 12:03:18 UTC
Created attachment 1632114 [details]
restapi_createsd_failed_logs

Comment 6 Avihai 2019-11-03 12:05:55 UTC
Also, managing an existing iSCSI storage domain in webadmin is not possible, as we are stuck with
"Loading... A large number of LUNs may slow down the operation".

Comment 7 Avihai 2019-11-03 12:45:30 UTC
Also, importing an iSCSI storage domain via webadmin fails with the same issue.
Here, logging in to the iSCSI targets fails and we cannot see the storage LUNs, so storage migration cannot proceed.

The same ERROR is seen here in engine.log and VDSM:

Engine:
2019-11-03 14:29:07,321+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetDeviceListVDSCommand] (default task-56) [7f010914] START, GetDeviceListVDSCommand(HostName = host_mixed_1, GetDeviceListVDSCommandParameters:{hostId='d199123d-9772-4276-9d28-c83d3ee2d999', storageType='ISCSI', checkStatus='false', lunIds='null'}), log id: 3fbf9b9a
2019-11-03 14:29:09,403+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetDeviceListVDSCommand] (default task-56) [7f010914] Failed in 'GetDeviceListVDS' method
2019-11-03 14:29:09,412+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-56) [7f010914] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM host_mixed_1 command GetDeviceListVDS failed: Error block device action: ()
2019-11-03 14:29:09,412+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetDeviceListVDSCommand] (default task-56) [7f010914] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.GetDeviceListVDSCommand' return value '
LUNListReturn:{status='Status [code=600, message=Error block device action: ()]'}
'
2019-11-03 14:29:09,412+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetDeviceListVDSCommand] (default task-56) [7f010914] HostName = host_mixed_1
2019-11-03 14:29:09,412+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetDeviceListVDSCommand] (default task-56) [7f010914] Command 'GetDeviceListVDSCommand(HostName = host_mixed_1, GetDeviceListVDSCommandParameters:{hostId='d199123d-9772-4276-9d28-c83d3ee2d999', storageType='ISCSI', checkStatus='false', lunIds='null'})' execution failed: VDSGenericException: VDSErrorException: Failed to GetDeviceListVDS, error = Error block device action: (), code = 600
2019-11-03 14:29:09,412+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetDeviceListVDSCommand] (default task-56) [7f010914] FINISH, GetDeviceListVDSCommand, return: , log id: 3fbf9b9a
2019-11-03 14:29:09,413+02 ERROR [org.ovirt.engine.core.bll.storage.disk.lun.GetDeviceListQuery] (default task-56) [7f010914] Query 'GetDeviceListQuery' failed: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to GetDeviceListVDS, error = Error block device action: (), code = 600 (Failed with error BlockDeviceActionError and code 600)
2019-11-03 14:29:09,413+02 ERROR [org.ovirt.engine.core.bll.storage.disk.lun.GetDeviceListQuery] (default task-56) [7f010914] Exception: org.ovirt.engine.core.common.errors.EngineException: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to GetDeviceListVDS, error = Error block device action: (), code = 600 (Failed with error BlockDeviceActionError and code 600)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.VdsHandler.handleVdsResult(VdsHandler.java:114)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.VDSBrokerFrontendImpl.runVdsCommand(VDSBrokerFrontendImpl.java:33)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.QueriesCommandBase.runVdsCommand(QueriesCommandBase.java:246)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.storage.disk.lun.GetDeviceListQuery.executeQueryCommand(GetDeviceListQuery.java:47)


	at org.jboss.weld.core.6.Final-redhat-00002//org.jboss.weld.bean.proxy.CombinedInterceptorAndDecoratorStackMethodHandler.invoke(CombinedInterceptorAndDecoratorStackMethodHandler.java:68)
	at deployment.engine.ear//org.ovirt.engine.core.vdsbroker.ResourceManager$Proxy$_$$_WeldSubclass.runVdsCommand(Unknown Source)
	... 233 more

2019-11-03 14:29:09,417+02 ERROR [org.ovirt.engine.core.bll.storage.domain.GetUnregisteredBlockStorageDomainsQuery] (default task-56) [] Failed to retrieve storage domains by connections info: GetDeviceList execution failed. Exception message: BlockDeviceActionError

VDSM (time differs by 7 hours from the engine due to the 4.4 upgrade):
vdsm.common.cmdutils.Error: Command ['/usr/lib/udev/scsi_id', '--page=0x80', '--whitelisted', '--export', '--replace-whitespace', '--device=/dev/dm-1'] failed with rc=1 out=b'' err=b''
2019-11-03 07:29:09,392-0500 INFO  (jsonrpc/6) [storage.TaskManager.Task] (Task='6903fe5a-2203-429f-8db1-6c7632319e4a') aborting: Task is aborted: "Command ['/usr/lib/udev/scsi_id', '--page=0x80', '--whitelisted', '--export', '--replace-whitespace', '--device=/dev/dm-1'] failed with rc=1 out=b'' err=b''" - code 100 (task:1200)
2019-11-03 07:29:09,392-0500 ERROR (jsonrpc/6) [storage.Dispatcher] FINISH getDeviceList error=Command ['/usr/lib/udev/scsi_id', '--page=0x80', '--whitelisted', '--export', '--replace-whitespace', '--device=/dev/dm-1'] failed with rc=1 out=b'' err=b'' (dispatcher:87)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/dispatcher.py", line 74, in wrapper
    result = ctask.prepare(func, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 110, in wrapper
    return m(self, *a, **kw)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 1208, in prepare
    raise self.error
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 896, in _run
    return fn(*args, **kargs)
  File "<decorator-gen-91>", line 2, in getDeviceList
  File "/usr/lib/python3.6/site-packages/vdsm/common/api.py", line 50, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 1988, in getDeviceList
    checkStatus=checkStatus)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 2006, in _getDeviceList
    for dev in multipath.pathListIter(guids):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/multipath.py", line 263, in pathListIter
    "serial": get_scsi_serial(dmId),
  File "/usr/lib/python3.6/site-packages/vdsm/storage/multipath.py", line 202, in get_scsi_serial
    return supervdsm.getProxy().multipath_get_scsi_serial(physdev)
  File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py", line 56, in __call__
    return callMethod()
  File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py", line 54, in <lambda>
    **kwargs)
  File "<string>", line 2, in multipath_get_scsi_serial
  File "/usr/lib64/python3.6/multiprocessing/managers.py", line 772, in _callmethod
    raise convert_to_error(kind, result)
vdsm.common.cmdutils.Error: Command ['/usr/lib/udev/scsi_id', '--page=0x80', '--whitelisted', '--export', '--replace-whitespace', '--device=/dev/dm-1'] failed with rc=1 out=b'' err=b''
2019-11-03 07:29:09,393-0500 INFO  (jsonrpc/6) [jsonrpc.JsonRpcServer] RPC call Host.getDeviceList failed (error 600) in 2.07 seconds (__init__:312)

Comment 8 Avihai 2019-11-03 12:47:29 UTC
Created attachment 1632129 [details]
import iscsi SD fails logs

Comment 9 Avihai 2019-11-03 12:50:53 UTC
Created attachment 1632130 [details]
video_capture of the ISCSI SD import issue

Comment 10 Vojtech Juranek 2019-11-04 14:35:39 UTC
It was caused by the scsi_id command failing due to some multipath misconfiguration. Multipath on the SPM contained some weird multipath paths for which scsi_id failed, and therefore almost any operation on a block SD failed. It was misconfigured only on the SPM (storage-ge4-vdsm1); on the remaining hosts it was configured correctly. Unfortunately I wasn't able to figure out how this happened or how it got into this state, but it looks like some leftover from a previous setup or something similar. After rebooting the hosts, everything seems to be set up correctly and block SDs appear to work.
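
As an illustration of what the "multipath: ignore scsi_id failures" patch linked in the Links section is about, here is a sketch of tolerating such a failure instead of aborting the whole getDeviceList call. This is not the actual vdsm change; the ID_SERIAL parsing is an assumption about scsi_id --export output:

import subprocess

def get_scsi_serial(physdev):
    """Return the SCSI serial of /dev/<physdev>, or "" if scsi_id fails."""
    cmd = ["/usr/lib/udev/scsi_id",
           "--page=0x80",
           "--whitelisted",
           "--export",
           "--replace-whitespace",
           "--device=/dev/" + physdev]
    try:
        out = subprocess.check_output(cmd, universal_newlines=True)
    except subprocess.CalledProcessError:
        # Stale or misconfigured maps (the "##,##" entries in the multipath
        # output below) make scsi_id exit with rc=1 and empty output; treat
        # the serial as unknown rather than failing device enumeration.
        return ""
    for line in out.splitlines():
        if line.startswith("ID_SERIAL="):
            return line.split("=", 1)[1]
    return ""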

Here's the lsblk and multipath output before and after the reboot (a failing path was e.g. 360002ac0000000000000001d00021f6b (/dev/dm-1)):

before reboot:

[root@storage-ge4-vdsm1 ~]# lsblk
NAME                                                                                  MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
sda                                                                                     8:0    0   20G  0 disk  
├─sda1                                                                                  8:1    0  700M  0 part  /boot
├─sda2                                                                                  8:2    0    2G  0 part  [SWAP]
└─sda3                                                                                  8:3    0 17,3G  0 part  /
sdb                                                                                     8:16   0  150G  0 disk  
└─360002ac0000000000000001600021f6b                                                   253:5    0  150G  0 mpath 
  ├─45f2708d--4b6d--418d--aaf1--10bea1b94552-metadata                                 253:35   0  128M  0 lvm   
  ├─45f2708d--4b6d--418d--aaf1--10bea1b94552-outbox                                   253:42   0  128M  0 lvm   
  ├─45f2708d--4b6d--418d--aaf1--10bea1b94552-xleases                                  253:44   0    1G  0 lvm   
  ├─45f2708d--4b6d--418d--aaf1--10bea1b94552-leases                                   253:48   0    2G  0 lvm   
  ├─45f2708d--4b6d--418d--aaf1--10bea1b94552-ids                                      253:52   0  128M  0 lvm   
  ├─45f2708d--4b6d--418d--aaf1--10bea1b94552-inbox                                    253:56   0  128M  0 lvm   
  ├─45f2708d--4b6d--418d--aaf1--10bea1b94552-master                                   253:60   0    1G  0 lvm   
  ├─45f2708d--4b6d--418d--aaf1--10bea1b94552-063935b8--72e6--4edf--b63c--fe2a1f5a3dc7 253:73   0  4,4G  0 lvm   
  ├─45f2708d--4b6d--418d--aaf1--10bea1b94552-f651423f--2bff--4f44--b8b2--bc1110dd80ca 253:75   0    1G  0 lvm   
  └─45f2708d--4b6d--418d--aaf1--10bea1b94552-8bbbeae5--1279--402e--b805--bb39084acfe3 253:76   0    1G  0 lvm   
sdc                                                                                     8:32   0  150G  0 disk  
└─360002ac0000000000000001700021f6b                                                   253:7    0  150G  0 mpath 
  ├─082015b9--b8dd--4477--9d53--666f800b01fc-metadata                                 253:34   0  128M  0 lvm   
  ├─082015b9--b8dd--4477--9d53--666f800b01fc-outbox                                   253:39   0  128M  0 lvm   
  ├─082015b9--b8dd--4477--9d53--666f800b01fc-xleases                                  253:46   0    1G  0 lvm   
  ├─082015b9--b8dd--4477--9d53--666f800b01fc-leases                                   253:50   0    2G  0 lvm   
  ├─082015b9--b8dd--4477--9d53--666f800b01fc-ids                                      253:54   0  128M  0 lvm   
  ├─082015b9--b8dd--4477--9d53--666f800b01fc-inbox                                    253:58   0  128M  0 lvm   
  ├─082015b9--b8dd--4477--9d53--666f800b01fc-master                                   253:62   0    1G  0 lvm   
  ├─082015b9--b8dd--4477--9d53--666f800b01fc-c594409a--7500--4052--9659--9d5f90c54fd1 253:66   0  128M  0 lvm   
  ├─082015b9--b8dd--4477--9d53--666f800b01fc-cbbd9595--24ed--472a--b2cc--9d34f64a2d72 253:68   0  128M  0 lvm   
  └─082015b9--b8dd--4477--9d53--666f800b01fc-063935b8--72e6--4edf--b63c--fe2a1f5a3dc7 253:74   0  4,4G  0 lvm   
sdd                                                                                     8:48   0  150G  0 disk  
└─360002ac0000000000000001800021f6b                                                   253:28   0  150G  0 mpath 
  ├─788f4bfb--380f--4b98--93c1--dfa967ea4f3a-metadata                                 253:32   0  128M  0 lvm   
  ├─788f4bfb--380f--4b98--93c1--dfa967ea4f3a-outbox                                   253:43   0  128M  0 lvm   
  ├─788f4bfb--380f--4b98--93c1--dfa967ea4f3a-xleases                                  253:45   0    1G  0 lvm   
  ├─788f4bfb--380f--4b98--93c1--dfa967ea4f3a-leases                                   253:49   0    2G  0 lvm   
  ├─788f4bfb--380f--4b98--93c1--dfa967ea4f3a-ids                                      253:53   0  128M  0 lvm   
  ├─788f4bfb--380f--4b98--93c1--dfa967ea4f3a-inbox                                    253:57   0  128M  0 lvm   
  ├─788f4bfb--380f--4b98--93c1--dfa967ea4f3a-master                                   253:61   0    1G  0 lvm   
  ├─788f4bfb--380f--4b98--93c1--dfa967ea4f3a-bb588b8a--26ad--4acf--a166--4a1aa9716557 253:65   0  128M  0 lvm   
  ├─788f4bfb--380f--4b98--93c1--dfa967ea4f3a-54a5dfac--86fc--40ee--bffb--84af6728f99f 253:69   0  128M  0 lvm   
  └─788f4bfb--380f--4b98--93c1--dfa967ea4f3a-063935b8--72e6--4edf--b63c--fe2a1f5a3dc7 253:72   0  4,4G  0 lvm   
sde                                                                                     8:64   0   50G  0 disk  
└─360002ac0000000000000001900021f6b                                                   253:3    0   50G  0 mpath 
sdf                                                                                     8:80   0   50G  0 disk  
└─360002ac0000000000000001a00021f6b                                                   253:6    0   50G  0 mpath 
  ├─a52c5f80--536c--4d3f--a581--682038fd80ed-metadata                                 253:38   0  128M  0 lvm   
  ├─a52c5f80--536c--4d3f--a581--682038fd80ed-outbox                                   253:40   0  128M  0 lvm   
  ├─a52c5f80--536c--4d3f--a581--682038fd80ed-xleases                                  253:47   0    1G  0 lvm   
  ├─a52c5f80--536c--4d3f--a581--682038fd80ed-leases                                   253:51   0    2G  0 lvm   
  ├─a52c5f80--536c--4d3f--a581--682038fd80ed-ids                                      253:55   0  128M  0 lvm   
  ├─a52c5f80--536c--4d3f--a581--682038fd80ed-inbox                                    253:59   0  128M  0 lvm   
  ├─a52c5f80--536c--4d3f--a581--682038fd80ed-master                                   253:63   0    1G  0 lvm   
  ├─a52c5f80--536c--4d3f--a581--682038fd80ed-4dfea443--be0e--4760--8671--d3d27a319a16 253:67   0  128M  0 lvm   
  └─a52c5f80--536c--4d3f--a581--682038fd80ed-a93ae40a--78fe--418f--bab1--601079bf1b5c 253:71   0  128M  0 lvm   
sdg                                                                                     8:96   0   50G  0 disk  
└─360002ac0000000000000001b00021f6b                                                   253:29   0   50G  0 mpath 
  └─360002ac0000000000000001b00021f6b1                                                253:31   0   10G  0 part  
sdh                                                                                     8:112  0  150G  0 disk  
└─360002ac0000000000000001600021f6b                                                   253:5    0  150G  0 mpath 
  ├─45f2708d--4b6d--418d--aaf1--10bea1b94552-metadata                                 253:35   0  128M  0 lvm   
  ├─45f2708d--4b6d--418d--aaf1--10bea1b94552-outbox                                   253:42   0  128M  0 lvm   
  ├─45f2708d--4b6d--418d--aaf1--10bea1b94552-xleases                                  253:44   0    1G  0 lvm   
  ├─45f2708d--4b6d--418d--aaf1--10bea1b94552-leases                                   253:48   0    2G  0 lvm   
  ├─45f2708d--4b6d--418d--aaf1--10bea1b94552-ids                                      253:52   0  128M  0 lvm   
  ├─45f2708d--4b6d--418d--aaf1--10bea1b94552-inbox                                    253:56   0  128M  0 lvm   
  ├─45f2708d--4b6d--418d--aaf1--10bea1b94552-master                                   253:60   0    1G  0 lvm   
  ├─45f2708d--4b6d--418d--aaf1--10bea1b94552-063935b8--72e6--4edf--b63c--fe2a1f5a3dc7 253:73   0  4,4G  0 lvm   
  ├─45f2708d--4b6d--418d--aaf1--10bea1b94552-f651423f--2bff--4f44--b8b2--bc1110dd80ca 253:75   0    1G  0 lvm   
  └─45f2708d--4b6d--418d--aaf1--10bea1b94552-8bbbeae5--1279--402e--b805--bb39084acfe3 253:76   0    1G  0 lvm   
sdi                                                                                     8:128  0  150G  0 disk  
└─360002ac0000000000000001700021f6b                                                   253:7    0  150G  0 mpath 
  ├─082015b9--b8dd--4477--9d53--666f800b01fc-metadata                                 253:34   0  128M  0 lvm   
  ├─082015b9--b8dd--4477--9d53--666f800b01fc-outbox                                   253:39   0  128M  0 lvm   
  ├─082015b9--b8dd--4477--9d53--666f800b01fc-xleases                                  253:46   0    1G  0 lvm   
  ├─082015b9--b8dd--4477--9d53--666f800b01fc-leases                                   253:50   0    2G  0 lvm   
  ├─082015b9--b8dd--4477--9d53--666f800b01fc-ids                                      253:54   0  128M  0 lvm   
  ├─082015b9--b8dd--4477--9d53--666f800b01fc-inbox                                    253:58   0  128M  0 lvm   
  ├─082015b9--b8dd--4477--9d53--666f800b01fc-master                                   253:62   0    1G  0 lvm   
  ├─082015b9--b8dd--4477--9d53--666f800b01fc-c594409a--7500--4052--9659--9d5f90c54fd1 253:66   0  128M  0 lvm   
  ├─082015b9--b8dd--4477--9d53--666f800b01fc-cbbd9595--24ed--472a--b2cc--9d34f64a2d72 253:68   0  128M  0 lvm   
  └─082015b9--b8dd--4477--9d53--666f800b01fc-063935b8--72e6--4edf--b63c--fe2a1f5a3dc7 253:74   0  4,4G  0 lvm   
sdj                                                                                     8:144  0  150G  0 disk  
└─360002ac0000000000000001800021f6b                                                   253:28   0  150G  0 mpath 
  ├─788f4bfb--380f--4b98--93c1--dfa967ea4f3a-metadata                                 253:32   0  128M  0 lvm   
  ├─788f4bfb--380f--4b98--93c1--dfa967ea4f3a-outbox                                   253:43   0  128M  0 lvm   
  ├─788f4bfb--380f--4b98--93c1--dfa967ea4f3a-xleases                                  253:45   0    1G  0 lvm   
  ├─788f4bfb--380f--4b98--93c1--dfa967ea4f3a-leases                                   253:49   0    2G  0 lvm   
  ├─788f4bfb--380f--4b98--93c1--dfa967ea4f3a-ids                                      253:53   0  128M  0 lvm   
  ├─788f4bfb--380f--4b98--93c1--dfa967ea4f3a-inbox                                    253:57   0  128M  0 lvm   
  ├─788f4bfb--380f--4b98--93c1--dfa967ea4f3a-master                                   253:61   0    1G  0 lvm   
  ├─788f4bfb--380f--4b98--93c1--dfa967ea4f3a-bb588b8a--26ad--4acf--a166--4a1aa9716557 253:65   0  128M  0 lvm   
  ├─788f4bfb--380f--4b98--93c1--dfa967ea4f3a-54a5dfac--86fc--40ee--bffb--84af6728f99f 253:69   0  128M  0 lvm   
  └─788f4bfb--380f--4b98--93c1--dfa967ea4f3a-063935b8--72e6--4edf--b63c--fe2a1f5a3dc7 253:72   0  4,4G  0 lvm   
sdk                                                                                     8:160  0   50G  0 disk  
└─360002ac0000000000000001900021f6b                                                   253:3    0   50G  0 mpath 
sdl                                                                                     8:176  0   50G  0 disk  
└─360002ac0000000000000001a00021f6b                                                   253:6    0   50G  0 mpath 
  ├─a52c5f80--536c--4d3f--a581--682038fd80ed-metadata                                 253:38   0  128M  0 lvm   
  ├─a52c5f80--536c--4d3f--a581--682038fd80ed-outbox                                   253:40   0  128M  0 lvm   
  ├─a52c5f80--536c--4d3f--a581--682038fd80ed-xleases                                  253:47   0    1G  0 lvm   
  ├─a52c5f80--536c--4d3f--a581--682038fd80ed-leases                                   253:51   0    2G  0 lvm   
  ├─a52c5f80--536c--4d3f--a581--682038fd80ed-ids                                      253:55   0  128M  0 lvm   
  ├─a52c5f80--536c--4d3f--a581--682038fd80ed-inbox                                    253:59   0  128M  0 lvm   
  ├─a52c5f80--536c--4d3f--a581--682038fd80ed-master                                   253:63   0    1G  0 lvm   
  ├─a52c5f80--536c--4d3f--a581--682038fd80ed-4dfea443--be0e--4760--8671--d3d27a319a16 253:67   0  128M  0 lvm   
  └─a52c5f80--536c--4d3f--a581--682038fd80ed-a93ae40a--78fe--418f--bab1--601079bf1b5c 253:71   0  128M  0 lvm   
sdm                                                                                     8:192  0   50G  0 disk  
└─360002ac0000000000000001b00021f6b                                                   253:29   0   50G  0 mpath 
  └─360002ac0000000000000001b00021f6b1                                                253:31   0   10G  0 part  
360002ac0000000000000001c00021f6b                                                     253:0    0  150G  0 mpath 
├─9b7c48f6--8810--408c--b7dc--e4f5ca02b52b-metadata                                   253:14   0  128M  0 lvm   
├─9b7c48f6--8810--408c--b7dc--e4f5ca02b52b-outbox                                     253:15   0  128M  0 lvm   
├─9b7c48f6--8810--408c--b7dc--e4f5ca02b52b-xleases                                    253:17   0    1G  0 lvm   
├─9b7c48f6--8810--408c--b7dc--e4f5ca02b52b-leases                                     253:27   0    2G  0 lvm   
├─9b7c48f6--8810--408c--b7dc--e4f5ca02b52b-ids                                        253:30   0  128M  0 lvm   
├─9b7c48f6--8810--408c--b7dc--e4f5ca02b52b-inbox                                      253:33   0  128M  0 lvm   
└─9b7c48f6--8810--408c--b7dc--e4f5ca02b52b-master                                     253:36   0    1G  0 lvm   
360002ac0000000000000001d00021f6b                                                     253:1    0  150G  0 mpath 
├─1c42ee09--9fa4--4e7f--9e43--ee6b28e6a356-metadata                                   253:9    0  128M  0 lvm   
├─1c42ee09--9fa4--4e7f--9e43--ee6b28e6a356-outbox                                     253:11   0  128M  0 lvm   
├─1c42ee09--9fa4--4e7f--9e43--ee6b28e6a356-xleases                                    253:12   0    1G  0 lvm   
├─1c42ee09--9fa4--4e7f--9e43--ee6b28e6a356-leases                                     253:13   0    2G  0 lvm   
├─1c42ee09--9fa4--4e7f--9e43--ee6b28e6a356-ids                                        253:16   0  128M  0 lvm   
├─1c42ee09--9fa4--4e7f--9e43--ee6b28e6a356-inbox                                      253:24   0  128M  0 lvm   
└─1c42ee09--9fa4--4e7f--9e43--ee6b28e6a356-master                                     253:26   0    1G  0 lvm   
360002ac0000000000000001e00021f6b                                                     253:2    0  150G  0 mpath 
├─e638dd3b--204e--4120--9a65--4592255765f4-metadata                                   253:18   0  128M  0 lvm   
├─e638dd3b--204e--4120--9a65--4592255765f4-outbox                                     253:19   0  128M  0 lvm   
├─e638dd3b--204e--4120--9a65--4592255765f4-xleases                                    253:20   0    1G  0 lvm   
├─e638dd3b--204e--4120--9a65--4592255765f4-leases                                     253:21   0    2G  0 lvm   
├─e638dd3b--204e--4120--9a65--4592255765f4-ids                                        253:22   0  128M  0 lvm   
├─e638dd3b--204e--4120--9a65--4592255765f4-inbox                                      253:23   0  128M  0 lvm   
└─e638dd3b--204e--4120--9a65--4592255765f4-master                                     253:25   0    1G  0 lvm   
360002ac0000000000000002000021f6b                                                     253:4    0  150G  0 mpath 
├─pooltest-pool1_tmeta                                                                253:8    0    4M  0 lvm   
│ └─pooltest-pool1-tpool                                                              253:37   0  500M  0 lvm   
│   └─pooltest-pool1                                                                  253:41   0  500M  0 lvm   
└─pooltest-pool1_tdata                                                                253:10   0  500M  0 lvm   
  └─pooltest-pool1-tpool                                                              253:37   0  500M  0 lvm   
    └─pooltest-pool1                                                                  253:41   0  500M  0 lvm  

[root@storage-ge4-vdsm1 ~]# multipath -ll
Nov 04 08:50:16 | /etc/multipath.conf line 176, invalid keyword: all_devs
Nov 04 08:50:16 | device config in /etc/multipath.conf missing vendor or product parameter
360002ac0000000000000001e00021f6b dm-2 ##,##
size=150G features='0' hwhandler='1 alua' wp=rw
360002ac0000000000000001800021f6b dm-28 3PARdata,VV
size=150G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  |- 3:0:0:2 sdd 8:48  active ready running
  `- 4:0:0:2 sdj 8:144 active ready running
360002ac0000000000000001c00021f6b dm-0 ##,##
size=150G features='0' hwhandler='1 alua' wp=rw
360002ac0000000000000001600021f6b dm-5 3PARdata,VV
size=150G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  |- 3:0:0:0 sdb 8:16  active ready running
  `- 4:0:0:0 sdh 8:112 active ready running
360002ac0000000000000001a00021f6b dm-6 3PARdata,VV
size=50G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  |- 3:0:0:4 sdf 8:80  active ready running
  `- 4:0:0:4 sdl 8:176 active ready running
360002ac0000000000000001900021f6b dm-3 3PARdata,VV
size=50G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  |- 3:0:0:3 sde 8:64  active ready running
  `- 4:0:0:3 sdk 8:160 active ready running
360002ac0000000000000002000021f6b dm-4 ##,##
size=150G features='0' hwhandler='1 alua' wp=rw
360002ac0000000000000001d00021f6b dm-1 ##,##
size=150G features='0' hwhandler='1 alua' wp=rw
360002ac0000000000000001700021f6b dm-7 3PARdata,VV
size=150G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  |- 3:0:0:1 sdc 8:32  active ready running
  `- 4:0:0:1 sdi 8:128 active ready running
360002ac0000000000000001b00021f6b dm-29 3PARdata,VV
size=50G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  |- 3:0:0:5 sdg 8:96  active ready running
  `- 4:0:0:5 sdm 8:192 active ready running


After reboot:

[root@storage-ge4-vdsm1 ~]# lsblk
NAME                                                                                  MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
sda                                                                                     8:0    0   20G  0 disk  
├─sda1                                                                                  8:1    0  700M  0 part  /boot
├─sda2                                                                                  8:2    0    2G  0 part  [SWAP]
└─sda3                                                                                  8:3    0 17,3G  0 part  /
sdb                                                                                     8:16   0  150G  0 disk  
└─360002ac0000000000000001600021f6b                                                   253:0    0  150G  0 mpath 
  ├─45f2708d--4b6d--418d--aaf1--10bea1b94552-metadata                                 253:1    0  128M  0 lvm   
  ├─45f2708d--4b6d--418d--aaf1--10bea1b94552-outbox                                   253:2    0  128M  0 lvm   
  ├─45f2708d--4b6d--418d--aaf1--10bea1b94552-xleases                                  253:3    0    1G  0 lvm   
  ├─45f2708d--4b6d--418d--aaf1--10bea1b94552-leases                                   253:4    0    2G  0 lvm   
  ├─45f2708d--4b6d--418d--aaf1--10bea1b94552-ids                                      253:5    0  128M  0 lvm   
  ├─45f2708d--4b6d--418d--aaf1--10bea1b94552-inbox                                    253:6    0  128M  0 lvm   
  ├─45f2708d--4b6d--418d--aaf1--10bea1b94552-master                                   253:7    0    1G  0 lvm   
  ├─45f2708d--4b6d--418d--aaf1--10bea1b94552-4084c3a1--cc5e--48cb--87ea--85ad780579d7 253:8    0  128M  0 lvm   
  ├─45f2708d--4b6d--418d--aaf1--10bea1b94552-fc32bca8--738e--446b--b78a--b800b2b0c18f 253:9    0  128M  0 lvm   
  ├─45f2708d--4b6d--418d--aaf1--10bea1b94552-063935b8--72e6--4edf--b63c--fe2a1f5a3dc7 253:10   0  4,4G  0 lvm   
  ├─45f2708d--4b6d--418d--aaf1--10bea1b94552-f651423f--2bff--4f44--b8b2--bc1110dd80ca 253:11   0    1G  0 lvm   
  └─45f2708d--4b6d--418d--aaf1--10bea1b94552-8bbbeae5--1279--402e--b805--bb39084acfe3 253:12   0    1G  0 lvm   
sdc                                                                                     8:32   0  150G  0 disk  
└─360002ac0000000000000001700021f6b                                                   253:13   0  150G  0 mpath 
  ├─082015b9--b8dd--4477--9d53--666f800b01fc-metadata                                 253:21   0  128M  0 lvm   
  ├─082015b9--b8dd--4477--9d53--666f800b01fc-outbox                                   253:23   0  128M  0 lvm   
  ├─082015b9--b8dd--4477--9d53--666f800b01fc-xleases                                  253:27   0    1G  0 lvm   
  ├─082015b9--b8dd--4477--9d53--666f800b01fc-leases                                   253:30   0    2G  0 lvm   
  ├─082015b9--b8dd--4477--9d53--666f800b01fc-ids                                      253:32   0  128M  0 lvm   
  ├─082015b9--b8dd--4477--9d53--666f800b01fc-inbox                                    253:36   0  128M  0 lvm   
  ├─082015b9--b8dd--4477--9d53--666f800b01fc-master                                   253:38   0    1G  0 lvm   
  ├─082015b9--b8dd--4477--9d53--666f800b01fc-c594409a--7500--4052--9659--9d5f90c54fd1 253:41   0  128M  0 lvm   
  ├─082015b9--b8dd--4477--9d53--666f800b01fc-cbbd9595--24ed--472a--b2cc--9d34f64a2d72 253:43   0  128M  0 lvm   
  └─082015b9--b8dd--4477--9d53--666f800b01fc-063935b8--72e6--4edf--b63c--fe2a1f5a3dc7 253:45   0  4,4G  0 lvm   
sdd                                                                                     8:48   0  150G  0 disk  
└─360002ac0000000000000001800021f6b                                                   253:14   0  150G  0 mpath 
  ├─788f4bfb--380f--4b98--93c1--dfa967ea4f3a-metadata                                 253:20   0  128M  0 lvm   
  ├─788f4bfb--380f--4b98--93c1--dfa967ea4f3a-outbox                                   253:24   0  128M  0 lvm   
  ├─788f4bfb--380f--4b98--93c1--dfa967ea4f3a-xleases                                  253:26   0    1G  0 lvm   
  ├─788f4bfb--380f--4b98--93c1--dfa967ea4f3a-leases                                   253:29   0    2G  0 lvm   
  ├─788f4bfb--380f--4b98--93c1--dfa967ea4f3a-ids                                      253:33   0  128M  0 lvm   
  ├─788f4bfb--380f--4b98--93c1--dfa967ea4f3a-inbox                                    253:39   0  128M  0 lvm   
  ├─788f4bfb--380f--4b98--93c1--dfa967ea4f3a-master                                   253:42   0    1G  0 lvm   
  ├─788f4bfb--380f--4b98--93c1--dfa967ea4f3a-bb588b8a--26ad--4acf--a166--4a1aa9716557 253:44   0  128M  0 lvm   
  ├─788f4bfb--380f--4b98--93c1--dfa967ea4f3a-54a5dfac--86fc--40ee--bffb--84af6728f99f 253:46   0  128M  0 lvm   
  └─788f4bfb--380f--4b98--93c1--dfa967ea4f3a-063935b8--72e6--4edf--b63c--fe2a1f5a3dc7 253:47   0  4,4G  0 lvm   
sde                                                                                     8:64   0   50G  0 disk  
└─360002ac0000000000000001900021f6b                                                   253:16   0   50G  0 mpath 
sdf                                                                                     8:80   0   50G  0 disk  
└─360002ac0000000000000001a00021f6b                                                   253:15   0   50G  0 mpath 
  ├─a52c5f80--536c--4d3f--a581--682038fd80ed-metadata                                 253:19   0  128M  0 lvm   
  ├─a52c5f80--536c--4d3f--a581--682038fd80ed-outbox                                   253:22   0  128M  0 lvm   
  ├─a52c5f80--536c--4d3f--a581--682038fd80ed-xleases                                  253:25   0    1G  0 lvm   
  ├─a52c5f80--536c--4d3f--a581--682038fd80ed-leases                                   253:28   0    2G  0 lvm   
  ├─a52c5f80--536c--4d3f--a581--682038fd80ed-ids                                      253:31   0  128M  0 lvm   
  ├─a52c5f80--536c--4d3f--a581--682038fd80ed-inbox                                    253:34   0  128M  0 lvm   
  ├─a52c5f80--536c--4d3f--a581--682038fd80ed-master                                   253:35   0    1G  0 lvm   
  ├─a52c5f80--536c--4d3f--a581--682038fd80ed-4dfea443--be0e--4760--8671--d3d27a319a16 253:37   0  128M  0 lvm   
  └─a52c5f80--536c--4d3f--a581--682038fd80ed-a93ae40a--78fe--418f--bab1--601079bf1b5c 253:40   0  128M  0 lvm   
sdg                                                                                     8:96   0   50G  0 disk  
└─360002ac0000000000000001b00021f6b                                                   253:17   0   50G  0 mpath 
  └─360002ac0000000000000001b00021f6b1                                                253:18   0   10G  0 part 
  
[root@storage-ge4-vdsm1 ~]# multipath -ll
Nov 04 09:00:50 | /etc/multipath.conf line 176, invalid keyword: all_devs
Nov 04 09:00:50 | device config in /etc/multipath.conf missing vendor or product parameter
360002ac0000000000000001800021f6b dm-14 3PARdata,VV
size=150G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  `- 3:0:0:2 sdd 8:48 active ready running
360002ac0000000000000001600021f6b dm-0 3PARdata,VV
size=150G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  `- 3:0:0:0 sdb 8:16 active ready running
360002ac0000000000000001a00021f6b dm-15 3PARdata,VV
size=50G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  `- 3:0:0:4 sdf 8:80 active ready running
360002ac0000000000000001900021f6b dm-16 3PARdata,VV
size=50G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  `- 3:0:0:3 sde 8:64 active ready running
360002ac0000000000000001700021f6b dm-13 3PARdata,VV
size=150G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  `- 3:0:0:1 sdc 8:32 active ready running
360002ac0000000000000001b00021f6b dm-17 3PARdata,VV
size=50G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  `- 3:0:0:5 sdg 8:96 active ready running

Comment 11 Avihai 2019-11-04 18:10:14 UTC
About the additional issues: yes, on the same ENV (storage-ge-04), after a reboot I see that the new storage domain is now added but not attached.
Can you please check this out? Logs are attached and the ENV is at your disposal. (I also added this to the same bug.)

Engine:
2019-11-04 19:59:31,560+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateVGVDSCommand] (default task-109) [21dd610b] FINISH, CreateVGVDSCommand, return: , log id: 55d26f1b
2019-11-04 19:59:31,560+02 ERROR [org.ovirt.engine.core.bll.storage.domain.AddSANStorageDomainCommand] (default task-109) [21dd610b] Command 'org.ovirt.engine.core.bll.storage.domain.AddSANStorageDomainCommand' failed: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to CreateVGVDS, error = Cannot create Volume Group: "vgname=8976188d-d9c6-4cfe-a87a-1fa4ffb8ec1e, devname=['360002ac0000000000000001a00021f6b']", code = 502 (Failed with error VolumeGroupCreateError and code 502)
2019-11-04 19:59:31,564+02 INFO  [org.ovirt.engine.core.bll.CommandCompensator] (default task-109) [21dd610b] Command [id=6abada9e-ecdf-45bf-b0a7-1170d1941263]: Compensating NEW_ENTITY_ID of org.ovirt.engine.core.common.businessentities.profiles.DiskProfile; snapshot: 970b877a-c274-4e97-8d81-b8bc646bfec0.
2019-11-04 19:59:31,567+02 INFO  [org.ovirt.engine.core.bll.CommandCompensator] (default task-109) [21dd610b] Command [id=6abada9e-ecdf-45bf-b0a7-1170d1941263]: Compensating NEW_ENTITY_ID of org.ovirt.engine.core.common.businessentities.StorageDomainDynamic; snapshot: 8976188d-d9c6-4cfe-a87a-1fa4ffb8ec1e.
2019-11-04 19:59:31,569+02 INFO  [org.ovirt.engine.core.bll.CommandCompensator] (default task-109) [21dd610b] Command [id=6abada9e-ecdf-45bf-b0a7-1170d1941263]: Compensating NEW_ENTITY_ID of org.ovirt.engine.core.common.businessentities.StorageDomainStatic; snapshot: 8976188d-d9c6-4cfe-a87a-1fa4ffb8ec1e.
2019-11-04 19:59:31,583+02 ERROR [org.ovirt.engine.core.bll.storage.domain.AddSANStorageDomainCommand] (default task-109) [21dd610b] Transaction rolled-back for command 'org.ovirt.engine.core.bll.storage.domain.AddSANStorageDomainCommand'.
2019-11-04 19:59:31,592+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-109) [21dd610b] EVENT_ID: USER_ADD_STORAGE_DOMAIN_FAILED(957), Failed to add Storage Domain sd1. (User: admin@internal-authz)

In the VDSM log I see:
2019-11-04 12:59:31,539-0500 INFO  (jsonrpc/7) [vdsm.api] FINISH createVG error=devicemapper_removeMapping() missing 1 required positional argument: 'deviceName' from=::ffff:10.35.161.118,42464, flow_id=21dd610
b, task_id=9fd6f044-d26d-4444-a426-f9b23d42620e (api:52)
2019-11-04 12:59:31,540-0500 ERROR (jsonrpc/7) [storage.TaskManager.Task] (Task='9fd6f044-d26d-4444-a426-f9b23d42620e') Unexpected error (task:889)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 896, in _run
    return fn(*args, **kargs)
  File "<decorator-gen-95>", line 2, in createVG
  File "/usr/lib/python3.6/site-packages/vdsm/common/api.py", line 50, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 2147, in createVG
    force=force)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/lvm.py", line 1143, in createVG
    _initpvs(pvs, metadataSize, force)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/lvm.py", line 881, in _initpvs
    _initpvs_removeHolders()
  File "/usr/lib/python3.6/site-packages/vdsm/storage/lvm.py", line 871, in _initpvs_removeHolders
    os.path.basename(device))
  File "/usr/lib/python3.6/site-packages/vdsm/storage/devicemapper.py", line 163, in removeMappingsHoldingDevice
    removeMapping(getDevName(holder))
  File "/usr/lib/python3.6/site-packages/vdsm/storage/devicemapper.py", line 135, in removeMapping
    return supervdsm.getProxy().devicemapper_removeMapping()
  File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py", line 56, in __call__
    return callMethod()
  File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py", line 54, in <lambda>
    **kwargs)
  File "<string>", line 2, in devicemapper_removeMapping
  File "/usr/lib64/python3.6/multiprocessing/managers.py", line 772, in _callmethod
    raise convert_to_error(kind, result)
TypeError: devicemapper_removeMapping() missing 1 required positional argument: 'deviceName'
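
For illustration, a minimal sketch (reconstructed only from the traceback above, not the actual vdsm patch) of the apparent failure mode: removeMapping() in vdsm/storage/devicemapper.py calls the supervdsm proxy without forwarding the device name, so the proxied devicemapper_removeMapping() raises the TypeError. Forwarding the argument looks like the obvious fix:

from vdsm.common import supervdsm

def removeMapping(deviceName):
    # Broken variant seen in the traceback: the argument is dropped, so the
    # proxied call fails with "missing 1 required positional argument".
    #   return supervdsm.getProxy().devicemapper_removeMapping()
    # Apparent fix: forward the device name through the supervdsm proxy.
    return supervdsm.getProxy().devicemapper_removeMapping(deviceName)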

Comment 12 Avihai 2019-11-04 18:11:40 UTC
Created attachment 1632709 [details]
attach SD failed vg logs

Comment 13 Avihai 2019-11-05 07:28:12 UTC
(In reply to Avihai from comment #11)
> About additional issues, well yes at the same ENV(storage-ge-04) after
> reboot I see new storage domain is now added but not attached.
> Can you please check this out? Logs attached and ENV is at your disposal.(I
> added this also to the same bug)
> 
> Engine:2019-11-04 19:59:31,560+02 INFO 
> [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateVGVDSCommand] (default
> task-109) [21dd610b] FINISH, CreateVGVDSCommand, return: , log id: 55d26f1b
> 2019-11-04 19:59:31,560+02 ERROR
> [org.ovirt.engine.core.bll.storage.domain.AddSANStorageDomainCommand]
> (default task-109) [21dd610b] Command
> 'org.ovirt.engine.core.bll.storage.domain.AddSANStorageDomainCommand'
> failed: EngineException:
> org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException:
> VDSGenericException: VDSErrorException: Failed to CreateVGVDS, error =
> Cannot create Volume Group: "vgname=8976188d-d9c6-4cfe-a87a-1fa4ffb8ec1e,
> devname=['360002ac0000000000000001a00021f6b']", code = 502 (Failed with
> error VolumeGroupCreateError and code 502)
> 2019-11-04 19:59:31,564+02 INFO 
> [org.ovirt.engine.core.bll.CommandCompensator] (default task-109) [21dd610b]
> Command [id=6abada9e-ecdf-45bf-b0a7-1170d1941263]: Compensating
> NEW_ENTITY_ID of
> org.ovirt.engine.core.common.businessentities.profiles.DiskProfile;
> snapshot: 970b877a-c274-4e97-8d81-b8bc646bfec0.
> 2019-11-04 19:59:31,567+02 INFO 
> [org.ovirt.engine.core.bll.CommandCompensator] (default task-109) [21dd610b]
> Command [id=6abada9e-ecdf-45bf-b0a7-1170d1941263]: Compensating
> NEW_ENTITY_ID of
> org.ovirt.engine.core.common.businessentities.StorageDomainDynamic;
> snapshot: 8976188d-d9c6-4cfe-a87a-1fa4ffb8ec1e.
> 2019-11-04 19:59:31,569+02 INFO 
> [org.ovirt.engine.core.bll.CommandCompensator] (default task-109) [21dd610b]
> Command [id=6abada9e-ecdf-45bf-b0a7-1170d1941263]: Compensating
> NEW_ENTITY_ID of
> org.ovirt.engine.core.common.businessentities.StorageDomainStatic; snapshot:
> 8976188d-d9c6-4cfe-a87a-1fa4ffb8ec1e.
> 2019-11-04 19:59:31,583+02 ERROR
> [org.ovirt.engine.core.bll.storage.domain.AddSANStorageDomainCommand]
> (default task-109) [21dd610b] Transaction rolled-back for command
> 'org.ovirt.engine.core.bll.storage.domain.AddSANStorageDomainCommand'.
> 2019-11-04 19:59:31,592+02 ERROR
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (default task-109) [21dd610b] EVENT_ID: USER_ADD_STORAGE_DOMAIN_FAILED(957),
> Failed to add Storage Domain sd1. (User: admin@internal-authz)
> 
> In VDSM log I see:2019-11-04 12:59:31,539-0500 INFO  (jsonrpc/7) [vdsm.api]
> FINISH createVG error=devicemapper_removeMapping() missing 1 required
> positional argument: 'deviceName' from=::ffff:10.35.161.118,42464,
> flow_id=21dd610
> b, task_id=9fd6f044-d26d-4444-a426-f9b23d42620e (api:52)
> 2019-11-04 12:59:31,540-0500 ERROR (jsonrpc/7) [storage.TaskManager.Task]
> (Task='9fd6f044-d26d-4444-a426-f9b23d42620e') Unexpected error (task:889)
> Traceback (most recent call last):
>   File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 896, in
> _run
>     return fn(*args, **kargs)
>   File "<decorator-gen-95>", line 2, in createVG
>   File "/usr/lib/python3.6/site-packages/vdsm/common/api.py", line 50, in
> method
>     ret = func(*args, **kwargs)
>   File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 2147, in
> createVG
>     force=force)
>   File "/usr/lib/python3.6/site-packages/vdsm/storage/lvm.py", line 1143, in
> createVG
>     _initpvs(pvs, metadataSize, force)
>   File "/usr/lib/python3.6/site-packages/vdsm/storage/lvm.py", line 881, in
> _initpvs
>     _initpvs_removeHolders()
>   File "/usr/lib/python3.6/site-packages/vdsm/storage/lvm.py", line 871, in
> _initpvs_removeHolders
>     os.path.basename(device))
>   File "/usr/lib/python3.6/site-packages/vdsm/storage/devicemapper.py", line
> 163, in removeMappingsHoldingDevice
>     removeMapping(getDevName(holder))
>   File "/usr/lib/python3.6/site-packages/vdsm/storage/devicemapper.py", line
> 135, in removeMapping
>     return supervdsm.getProxy().devicemapper_removeMapping()
>   File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py", line 56,
> in __call__
>     return callMethod()
>   File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py", line 54,
> in <lambda>
>     **kwargs)
>   File "<string>", line 2, in devicemapper_removeMapping
>   File "/usr/lib64/python3.6/multiprocessing/managers.py", line 772, in
> _callmethod
>     raise convert_to_error(kind, result)
> TypeError: devicemapper_removeMapping() missing 1 required positional
> argument: 'deviceName'

This is a different issue; I opened a new bug, 1768735, for it.

Comment 14 Avihai 2019-11-05 09:31:48 UTC
I was able to reproduce this bug on another ENV (storage-ge-16) after I removed an existing ISCSI storage domain without formatting it.
Multipath was OK before the test and got messed up during the test.

You can also look at our 4.4 ENV for more help/debug and try it for yourself:
Engine : http://storage-ge-16.scl.lab.tlv.redhat.com
Host:     storage-ge16-vdsm1.scl.lab.tlv.redhat.com

Test Flow:
1) Create/have an existing ISCSI storage domain and check multipath => looks good.
2) Remove the storage domain without formatting it => OK.
3) Try to recreate the storage domain via webadmin.
The issue (webadmin stuck on loading LUNs) and the same errors occur in VDSM and engine as seen before. Multipath, which was OK before the test, now looks bad:

Multipath before the test:
[root@storage-ge16-vdsm1 ~]# multipath -l
Nov 05 11:00:06 | /etc/multipath.conf line 176, invalid keyword: all_devs
Nov 05 11:00:06 | device config in /etc/multipath.conf missing vendor or product parameter
360002ac0000000000000004600021f6b dm-0 3PARdata,VV
size=150G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=0 status=active
  |- 3:0:0:0 sdb 8:16  active undef running
  `- 4:0:0:0 sdc 8:32  active undef running
360002ac0000000000000004a00021f6b dm-5 3PARdata,VV
size=50G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=0 status=active
  |- 4:0:0:4 sdk 8:160 active undef running
  `- 3:0:0:4 sdj 8:144 active undef running
360002ac0000000000000004900021f6b dm-1 3PARdata,VV
size=50G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=0 status=active
  |- 3:0:0:3 sdi 8:128 active undef running
  `- 4:0:0:3 sdh 8:112 active undef running
360002ac0000000000000004700021f6b dm-2 3PARdata,VV
size=150G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=0 status=active
  |- 3:0:0:1 sde 8:64  active undef running
  `- 4:0:0:1 sdd 8:48  active undef running
360002ac0000000000000004b00021f6b dm-4 3PARdata,VV
size=50G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=0 status=active
  |- 3:0:0:5 sdl 8:176 active undef running
  `- 4:0:0:5 sdm 8:192 active undef running
360002ac0000000000000004800021f6b dm-3 3PARdata,VV
size=150G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=0 status=active
  |- 3:0:0:2 sdg 8:96  active undef running
  `- 4:0:0:2 sdf 8:80  active undef running 

Multipath after the test:
[root@storage-ge16-vdsm1 ~]# multipath -l
Nov 05 11:16:09 | /etc/multipath.conf line 176, invalid keyword: all_devs
Nov 05 11:16:09 | device config in /etc/multipath.conf missing vendor or product parameter
360002ac0000000000000004900021f6b dm-1 ##,##
size=50G features='0' hwhandler='1 alua' wp=rw

[root@storage-ge16-vdsm1 ~]# lsblk 
NAME                                                MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
sda                                                   8:0    0   20G  0 disk  
├─sda1                                                8:1    0  700M  0 part  /boot
├─sda2                                                8:2    0    2G  0 part  [SWAP]
└─sda3                                                8:3    0 17.3G  0 part  /
sr0                                                  11:0    1 1024M  0 rom   
360002ac0000000000000004900021f6b                   253:1    0   50G  0 mpath 
├─b912ec28--e8dd--4fc7--b353--e9b07f8b1d5b-metadata 253:6    0  128M  0 lvm   
├─b912ec28--e8dd--4fc7--b353--e9b07f8b1d5b-ids      253:7    0  128M  0 lvm   
├─b912ec28--e8dd--4fc7--b353--e9b07f8b1d5b-inbox    253:8    0  128M  0 lvm   
├─b912ec28--e8dd--4fc7--b353--e9b07f8b1d5b-outbox   253:9    0  128M  0 lvm   
├─b912ec28--e8dd--4fc7--b353--e9b07f8b1d5b-leases   253:10   0    2G  0 lvm   
├─b912ec28--e8dd--4fc7--b353--e9b07f8b1d5b-xleases  253:11   0    1G  0 lvm   
└─b912ec28--e8dd--4fc7--b353--e9b07f8b1d5b-master   253:12   0    1G  0 lvm   



2019-11-05 11:15:23,613+0200 ERROR (jsonrpc/3) [storage.Dispatcher] FINISH getDeviceList error=Command ['/usr/lib/udev/scsi_id', '--page=0x80', '--whitelisted', '--export', '--replace-whitespace', '--device=/de
v/dm-1'] failed with rc=1 out=b'' err=b'' (dispatcher:87)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/dispatcher.py", line 74, in wrapper
    result = ctask.prepare(func, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 110, in wrapper
    return m(self, *a, **kw)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 1208, in prepare
    raise self.error
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 896, in _run
    return fn(*args, **kargs)
  File "<decorator-gen-91>", line 2, in getDeviceList
  File "/usr/lib/python3.6/site-packages/vdsm/common/api.py", line 50, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 1988, in getDeviceList
    checkStatus=checkStatus)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 2006, in _getDeviceList
    for dev in multipath.pathListIter(guids):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/multipath.py", line 263, in pathListIter
    "serial": get_scsi_serial(dmId),
  File "/usr/lib/python3.6/site-packages/vdsm/storage/multipath.py", line 202, in get_scsi_serial
    return supervdsm.getProxy().multipath_get_scsi_serial(physdev)
  File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py", line 56, in __call__
    return callMethod()
  File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py", line 54, in <lambda>
    **kwargs)
  File "<string>", line 2, in multipath_get_scsi_serial
  File "/usr/lib64/python3.6/multiprocessing/managers.py", line 772, in _callmethod
    raise convert_to_error(kind, result)
vdsm.common.cmdutils.Error: Command ['/usr/lib/udev/scsi_id', '--page=0x80', '--whitelisted', '--export', '--replace-whitespace', '--device=/dev/dm-1'] failed with rc=1 out=b'' err=b''
2019-11-05 11:15:23,613+0200 INFO  (jsonrpc/3) [jsonrpc.JsonRpcServer] RPC call Host.getDeviceList failed (error 600) in 0.78 seconds (__init__:312)
2019-11-05 11:15:24,996+0200 INFO  (jsonrpc/0) [api.host] START getAllVmStats() from=::ffff:10.35.163.251,53772 (api:48)

supervdsm log:
MainProcess|jsonrpc/3::ERROR::2019-11-05 04:15:23,609::supervdsm_server::96::SuperVdsm.ServerCallback::(wrapper) Error in multipath_get_scsi_serial
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/supervdsm_server.py", line 94, in wrapper
    res = func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/supervdsm_api/multipath.py", line 34, in multipath_get_scsi_serial
    return multipath.get_scsi_serial(physdev)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/multipath.py", line 211, in get_scsi_serial
    out = commands.run(cmd)
  File "/usr/lib/python3.6/site-packages/vdsm/common/commands.py", line 101, in run
    raise cmdutils.Error(args, p.returncode, out, err)
vdsm.common.cmdutils.Error: Command ['/usr/lib/udev/scsi_id', '--page=0x80', '--whitelisted', '--export', '--replace-whitespace', '--device=/dev/dm-1'] failed with rc=1 out=b'' err=b''
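
For context, a minimal sketch (assuming the right behaviour is to tolerate the failure, which is roughly where this bug ends up; this is not the literal vdsm patch) of how the getDeviceList flow could survive a failing scsi_id call on a pathless multipath map, returning an empty serial instead of aborting the whole device listing:

from vdsm.common import cmdutils, commands

def get_scsi_serial_tolerant(physdev):
    # Same scsi_id invocation as in the traceback above.
    cmd = ["/usr/lib/udev/scsi_id", "--page=0x80", "--whitelisted",
           "--export", "--replace-whitespace", "--device=" + physdev]
    try:
        out = commands.run(cmd)
    except cmdutils.Error:
        # scsi_id exits non-zero for a map with no usable paths; report an
        # empty serial rather than failing Host.getDeviceList with error 600.
        return ""
    # The real code parses the exported key/value output; returning the raw
    # output is enough for this sketch.
    return out.decode("utf-8", errors="replace")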

Comment 15 Avihai 2019-11-05 09:33:40 UTC
Created attachment 1632886 [details]
engine and vdsm logs remove sd that recreate it scenario

Comment 16 Avihai 2019-11-06 14:23:53 UTC
Also adding the journalctl log of the relevant host (storage-ge16-vdsm1.scl.lab.tlv.redhat.com), on which the issue occurred, to the attachments.

Comment 17 Avihai 2019-11-06 14:25:05 UTC
Created attachment 1633339 [details]
journaltctl log of the host which the issue occured on

Comment 18 Ben Marzinski 2019-11-06 20:39:15 UTC
Just to get this straight, the issue is that after removing the storage, there are still multipath devices around, which look like this

360002ac0000000000000004900021f6b dm-1 ##,##
size=50G features='0' hwhandler='1 alua' wp=rw

correct?

This is a multipath device that simply has no paths.  Looking at the device before the scsi devices were removed shows

360002ac0000000000000004900021f6b dm-1 3PARdata,VV
size=50G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=0 status=active
  |- 3:0:0:3 sdi 8:128 active undef running
  `- 4:0:0:3 sdh 8:112 active undef running

and sdi and sdh clearly no longer exist in the lsblk output.  When all paths are lost to a multipath device, multipathd will try to remove the device. However, if it is currently open, this will fail.  multipathd will only try to remove the multipath device once. You can add

deferred_remove yes

to /etc/multipath.conf to tell multipathd to schedule a deferred remove of the device when all paths are lost. Once a deferred remove is scheduled, the multipath device will not be able to be opened, and when the last opener closes the device, it will be removed. If a path returns before the last opener closes the device, the deferred remove will be cancelled, and the multipath device will not be deleted. Another parameter you can set to help avoid this issue is

flush_on_last_del yes

This disables queue_if_no_path when the last path to the device is removed.  This will cause all outstanding and future IO to error out, instead of queueing.  This will allow current openers to close the device, instead of hanging, waiting on the queued IO. It should be noted that both of these options can be problematic if all paths to your multipath device disappear unexpectedly, while the device is still in use.  Setting deferred_remove will cause new opens of the device to fail, which might be a problem in some scenarios. Setting flush_on_last_del will cause IO errors to occur, instead of having the IO be queued until a path is restored.

Ideally, if you know you are removing the paths, you should close the multipath devices first.

Also, setting a device configuration with "all_devs" does not work in RHEL8. Instead, there is a new top level section of /etc/multipath.conf called "overrides". You can set parameters in this to override all of the device configurations, just like the "all_devs" special device configuration did in RHEL7.
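
For reference, a minimal /etc/multipath.conf sketch along the lines described above (an "overrides" section instead of an "all_devs" device entry, plus the two options mentioned); whether these settings are appropriate depends on the environment and on how the management layer expects devices to behave when all paths disappear:

overrides {
    # RHEL 8 replacement for the RHEL 7 "all_devs" device entry: these values
    # are applied on top of all built-in and user device configurations.
    deferred_remove     yes
    flush_on_last_del   yes
}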

Comment 19 Vojtech Juranek 2019-11-07 22:57:51 UTC
Thanks Ben for your input. Based on it (the multipath device not being removed when it is open, and the operation not being retried), I was able to reproduce locally and find out why vdsm fails now (which doesn't mean I have a good fix for it yet, but at least I can move back to the old vdsm behaviour).

Comment 20 Avihai 2019-11-14 10:06:51 UTC
(In reply to Ben Marzinski from comment #18)
> Just to get this straight, the issue is that after removing the storage,
> there are still multipath devices around, which look like this
> 
> 360002ac0000000000000004900021f6b dm-1 ##,##
> size=50G features='0' hwhandler='1 alua' wp=rw
> 
> correct?
This would be the easy answer, but unfortunately NO.
The issue is that we did not change the paths/switches/storage servers in any way.
The only change that was done was to upgrade the RHV hosts from RHEL 7.7 to RHEL 8.1.

Before doing the create/remove/detach storage domain operation we see that ALL LUNs and their paths are active.
During the create/remove/detach storage domain operation, for some reason, some of the LUNs (usually /dev/dm-1, on several ENVs!) no longer have any multipath paths.

The question is: what is causing this, besides actual switch/storage issues?
We are seeing this issue in multiple ENVs with different switches and storage, so the chance that this is an environment issue is low.
It started to occur on multiple ENVs after the RHV hosts were upgraded from RHEL 7.7 to RHEL 8.1.


> This is a multipath device that simply has no paths.  Looking at the device
> before the scsi devices were removed shows
> 
> 360002ac0000000000000004900021f6b dm-1 3PARdata,VV
> size=50G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
> `-+- policy='service-time 0' prio=0 status=active
>   |- 3:0:0:3 sdi 8:128 active undef running
>   `- 4:0:0:3 sdh 8:112 active undef running
> 
> and sdi and sdh clearly no longer exist in the lsblk output.  When all paths
> are lost to a multipath device, multipathd will try to remove the device.
> However, if it is currently open, this will fail.  multipathd will only try
> to remove the multipath device once. You can add
> 
> deferred_remove yes
> 
> to /etc/multipath.conf to tell multipathd to schedule a deferred remove of
> the device when all paths are lost. Once a deferred remove is scheduled, the
> multipath device will not be able to be opened, and when the last opener
> closes the device, it will be removed. If a path returns before the last
> opener closes the device, the deferred remove will be cancelled, and the
> multipath device will not be deleted. Another parameter you can set to help
> avoid this issue is
> 
> flush_on_last_del yes
> 
> This disables queue_if_no_path when the last path to the device is removed. 
> This will cause all outstanding and future IO to error out, instead of
> queueing.  This will allow current openers to close the device, instead of
> hanging, waiting on the queued IO. It should be noted that both of these
> options can be problematic if all paths to your multipath device disappear
> unexpectedly, while the device is still in use.  Setting deferred_remove
> will cause new opens of the device to fail, which might be a problem in some
> scenarios. Setting flush_on_last_del will cause IO errors to occur, instead
> of having the IO be queued until a path is restored.
> 
> Ideally, if you know you are removing the paths, you should close the
> multipath devices first.
Again, we did not remove any paths; something in RHEL 8.1 causes paths to be removed.
I have no idea why; maybe someone can investigate deeper?

> Also, setting a device configuration with "all_devs" does not work in RHEL8.
> Instead, there is a new top level section of /etc/multipath.conf called
> "overrides". You can set parameters in this to override all of the device
> configurations, just like the "all_devs" special device configuration did in
> RHEL7.
Maybe a change in multipath.conf can help in this case?
If so, we should make it the default in the form of another patch.

Comment 21 Ben Marzinski 2019-11-14 17:28:48 UTC
(In reply to Avihai from comment #20)
> This would be the easy answer but unfortunately NO. 
> The issue is that we did not change the path's/switches/storage-servers in
> any way.
> The only change that was done was to upgrade RHV hosts from RHEL7.7->
> RHEL8.1.
> 
> Before doing the create/remove/detach storage domain operation we see ALL
> LUNs and their path are active.
> During create/remove/detach storage domain operation for some reason some of
> the LUNS(for some reason it's usually /dev/dm-1 on several ENVS(!)) do not
> have multipaths anymore.
> 
> The question is what is causing these besids actual switch/storage issues?
> We are seeing this issue in multiple ENV with different switches and
> storages so the chance this is an environment issue is low.
> This started to occur on multiple ENV's after RHV hosts changed from RHEL7.7
> -> RHEL8.1.

Is the multipath config the same between RHEL-7.7 and RHEL-8.1?

Would it be possible to run

# multipathd show config
# multipath -ll
# date

before doing the create/remove/detach storage domain operation, and

# date
# multipath -ll

afterwards, and send me the output of these commands, along with the log messages from the time between the two "date" commands?

Comment 22 Avihai 2019-11-24 12:52:48 UTC
(In reply to Ben Marzinski from comment #21)
> (In reply to Avihai from comment #20)
> > This would be the easy answer but unfortunately NO. 
> > The issue is that we did not change the path's/switches/storage-servers in
> > any way.
> > The only change that was done was to upgrade RHV hosts from RHEL7.7->
> > RHEL8.1.
> > 
> > Before doing the create/remove/detach storage domain operation we see ALL
> > LUNs and their path are active.
> > During create/remove/detach storage domain operation for some reason some of
> > the LUNS(for some reason it's usually /dev/dm-1 on several ENVS(!)) do not
> > have multipaths anymore.
> > 
> > The question is what is causing these besids actual switch/storage issues?
> > We are seeing this issue in multiple ENV with different switches and
> > storages so the chance this is an environment issue is low.
> > This started to occur on multiple ENV's after RHV hosts changed from RHEL7.7
> > -> RHEL8.1.
> 
> Is the multipath config the same between RHEL-7.7 and RHEL-8.1?
Yes, it looks the same.

> Would it be possible to run
> 
> # multipathd show config
> # multipath -ll
> # date
> 
> before doing the create/remove/detach storage domain operation, and
> 
> # date
> # multipath -ll
> 
> afterwards, and send me the output of these commands, along with the log
> messages from the time between the two "date" commands?

I think I already added this in https://bugzilla.redhat.com/show_bug.cgi?id=1766595#c14 .

Also, Ilan saw the same issue on his ENV and a reboot did not help to solve it.
Ilan, did you change something on the storage side?

Comment 23 Ilan Zuckerman 2019-11-24 13:27:55 UTC
The same issue reproduced after destroying the SD and trying to recreate it with a new LUN. Rebooting the host didn't help. The LUN ID which was created on the NetApp side is 38304437415d4b6a59684363.

The following info is after the reboot:
http://pastebin.test.redhat.com/816590

Now, performing the 'create new domain' operation.

The following info is after the failed attempt to create the domain:
http://pastebin.test.redhat.com/816591
Attaching the messages file from the host and the engine log.

Comment 24 Ilan Zuckerman 2019-11-24 13:28:36 UTC
Created attachment 1639214 [details]
messages from the host

Comment 25 Ilan Zuckerman 2019-11-24 13:31:56 UTC
Created attachment 1639215 [details]
messages from the host

Comment 26 Ilan Zuckerman 2019-11-24 13:32:55 UTC
Created attachment 1639216 [details]
engine log with failed attempt to creade sd

Comment 27 Ilan Zuckerman 2019-11-24 16:17:58 UTC
Here is the actual error in the vdsm log:

2019-11-24 15:55:22,806+0000 ERROR (jsonrpc/1) [storage.TaskManager.Task] (Task='48ef87cc-2866-44e9-8275-0aab8d8a105c') Unexpected error (task:889)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 896, in _run
    return fn(*args, **kargs)
  File "<decorator-gen-91>", line 2, in getDeviceList
  File "/usr/lib/python3.6/site-packages/vdsm/common/api.py", line 50, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 1988, in getDeviceList
    checkStatus=checkStatus)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 2006, in _getDeviceList
    for dev in multipath.pathListIter(guids):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/multipath.py", line 263, in pathListIter
    "serial": get_scsi_serial(dmId),
  File "/usr/lib/python3.6/site-packages/vdsm/storage/multipath.py", line 202, in get_scsi_serial
    return supervdsm.getProxy().multipath_get_scsi_serial(physdev)
  File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py", line 56, in __call__
    return callMethod()
  File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py", line 54, in <lambda>
    **kwargs)
  File "<string>", line 2, in multipath_get_scsi_serial
  File "/usr/lib64/python3.6/multiprocessing/managers.py", line 772, in _callmethod
    raise convert_to_error(kind, result)
vdsm.common.cmdutils.Error: Command ['/usr/lib/udev/scsi_id', '--page=0x80', '--whitelisted', '--export', '--replace-whitespace', '--device=/dev/dm-2'] failed with rc=1 out=b'' err=b''
2019-11-24 15:55:22,806+0000 INFO  (jsonrpc/1) [storage.TaskManager.Task] (Task='48ef87cc-2866-44e9-8275-0aab8d8a105c') aborting: Task is aborted: "value=Command ['/usr/lib/udev/scsi_id', '--page=0x80', '--whitelisted', '--export', '--replace-whitespace', '--device=/dev/dm-2'] failed with rc=1 out=b'' err=b'' abortedcode=100" (task:1200)
2019-11-24 15:55:22,806+0000 ERROR (jsonrpc/1) [storage.Dispatcher] FINISH getDeviceList error=Command ['/usr/lib/udev/scsi_id', '--page=0x80', '--whitelisted', '--export', '--replace-whitespace', '--device=/dev/dm-2'] failed with rc=1 out=b'' err=b'' (dispatcher:87)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/dispatcher.py", line 74, in wrapper
    result = ctask.prepare(func, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 110, in wrapper
    return m(self, *a, **kw)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 1208, in prepare
    raise self.error
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 896, in _run
    return fn(*args, **kargs)
  File "<decorator-gen-91>", line 2, in getDeviceList
  File "/usr/lib/python3.6/site-packages/vdsm/common/api.py", line 50, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 1988, in getDeviceList
    checkStatus=checkStatus)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 2006, in _getDeviceList
    for dev in multipath.pathListIter(guids):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/multipath.py", line 263, in pathListIter
    "serial": get_scsi_serial(dmId),
  File "/usr/lib/python3.6/site-packages/vdsm/storage/multipath.py", line 202, in get_scsi_serial
    return supervdsm.getProxy().multipath_get_scsi_serial(physdev)
  File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py", line 56, in __call__
    return callMethod()
  File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py", line 54, in <lambda>
    **kwargs)
  File "<string>", line 2, in multipath_get_scsi_serial
  File "/usr/lib64/python3.6/multiprocessing/managers.py", line 772, in _callmethod
    raise convert_to_error(kind, result)
vdsm.common.cmdutils.Error: Command ['/usr/lib/udev/scsi_id', '--page=0x80', '--whitelisted', '--export', '--replace-whitespace', '--device=/dev/dm-2'] failed with rc=1 out=b'' err=b''

Comment 28 Ben Marzinski 2019-11-26 00:13:27 UTC
Wait, is the /dev/dm-2 from

2019-11-24 15:55:22,806+0000 ERROR (jsonrpc/1) [storage.Dispatcher] FINISH getDeviceList error=Command ['/usr/lib/udev/scsi_id', '--page=0x80', '--whitelisted', '--export', '--replace-whitespace', '--device=/dev/dm-2'] failed with rc=1 out=b'' err=b'' (dispatcher:87)

in Comment #27 the same device as the dm-2 from

eui.325836304b5047900025385800000001 dm-2 NVME,SAMSUNG MZ1LW960HMJP-00003              
size=894G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  `- 0:2:1:1  nvme0n1 259:0 active ready running

in http://pastebin.test.redhat.com/816591 from Comment 23?  Because if it is, that is a multipath device that is made up of nvme devices. There is no way that scsi_id would ever work on that, since the multipath device isn't made up of scsi devices.

Comment 29 Avihai 2019-12-19 07:46:21 UTC
(In reply to Ilan Zuckerman from comment #26)
> Created attachment 1639216 [details]
> engine log with failed attempt to creade sd

(In reply to Ilan Zuckerman from comment #23)
> Same issue reproduced after destroying The SD, and trying to recreate it
> with new lun. Rebooting the host didnt help. the lun ID which was created on
> netapp side is 38304437415d4b6a59684363
> 
> The following info is after the reboot:
> http://pastebin.test.redhat.com/816590
> 
> Now, doing the create new domain.
> 
> The following info is after the failed attempt to create the domain:
> http://pastebin.test.redhat.com/816591
> Attaching messages of the host and engine log.

Ilan, we are not seeing this issue anymore on our ENVs (cleanly reprovisioned 4.4 via Jenkins jobs).
The original upgrade scenario (4.3 -> 4.4) is blocked due to bug 1784010 and cannot be tested until it is fixed.

As you also encountered this issue on your env, can you please check whether it still occurs in the latest RHV (4.4.0.9)?

Comment 30 Ilan Zuckerman 2019-12-29 13:36:07 UTC
(In reply to Avihai from comment #29)
> (In reply to Ilan Zuckerman from comment #26)
> > Created attachment 1639216 [details]
> > engine log with failed attempt to creade sd
> 
> (In reply to Ilan Zuckerman from comment #23)
> > Same issue reproduced after destroying The SD, and trying to recreate it
> > with new lun. Rebooting the host didnt help. the lun ID which was created on
> > netapp side is 38304437415d4b6a59684363
> > 
> > The following info is after the reboot:
> > http://pastebin.test.redhat.com/816590
> > 
> > Now, doing the create new domain.
> > 
> > The following info is after the failed attempt to create the domain:
> > http://pastebin.test.redhat.com/816591
> > Attaching messages of the host and engine log.
> 
> Ilan, we are not seeing this issue anymore at our ENV's(clean reprovisioned
> 4.4 via jenkins jobs)
> The original Upgrade scenario (4.3 ->4.4) is blocked due to bug 1784010 and
> can not be tested till its fixed.
>  
> As you also encountered this issue at your env, can you please see if this
> still occurs in latest rhv(4.4.0.9)

After upgrading to rhv-release-4.4.0-13-999.noarch I wasn't able to reproduce the issue.

Comment 31 Avihai 2020-01-01 13:49:42 UTC
Reopening, as the issue was seen again in rhv-4.4.0-13, but only in the host clean reprovision scenario, meaning:

1) Install RHEL8.1 OS from scratch on host
2) Install vdsm on host

At this stage:
- No prior ISCSI storage domain exists in the system/engine.
- No multipaths/ISCSI sessions exist on the host.

3) Via webadmin, create new ISCSI storage domain from this host.

The same issue is seen.

Observations:
1) Only a host that has no prior ISCSI connections before this initial ISCSI connection (fresh after host installation) hits the issue.
2) A host that has prior ISCSI connections will not encounter this issue.
3) A host on which discovery and ISCSI session login were done manually will not encounter this issue.


I noticed that if I do the following, the issue does not occur anymore (see the iscsiadm sketch below):
1) Connect to the problematic host via ssh
2) Manually run discovery from the host to the storage and log in to the target
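
For example, a sketch of the manual workaround using standard iscsiadm commands (the portal address and target IQN below are placeholders, not the values from this environment):

# discover targets exposed by the storage server
iscsiadm -m discovery -t sendtargets -p <portal-ip>:3260
# log in to the discovered target
iscsiadm -m node -T <target-iqn> -p <portal-ip>:3260 --login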

VDSM (its log timestamps are 7 hours behind the engine's timezone):
2020-01-01 08:09:23,495-0500 INFO  (jsonrpc/1) [vdsm.api] FINISH getDeviceList error=not enough values to unpack (expected 2, got 1) from=::ffff:10.35.161.127,35954, flow_id=3b4deedb-5950-403f-9269-a43e82317e90, task_id=80924ba3-34a8-46b4-ad9c-1098b469217d (api:52)
2020-01-01 08:09:23,496-0500 ERROR (jsonrpc/1) [storage.TaskManager.Task] (Task='80924ba3-34a8-46b4-ad9c-1098b469217d') Unexpected error (task:874)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 881, in _run
    return fn(*args, **kargs)
  File "<decorator-gen-89>", line 2, in getDeviceList
  File "/usr/lib/python3.6/site-packages/vdsm/common/api.py", line 50, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 1971, in getDeviceList
    checkStatus=checkStatus)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 1989, in _getDeviceList
    for dev in multipath.pathListIter(guids):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/multipath.py", line 259, in pathListIter
    pathStatuses = devicemapper.getPathsStatus()
  File "/usr/lib/python3.6/site-packages/vdsm/storage/devicemapper.py", line 170, in getPathsStatus
    return supervdsm.getProxy().devicemapper_getPathsStatus()
  File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py", line 56, in __call__
    return callMethod()
  File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py", line 54, in <lambda>
    **kwargs)
  File "<string>", line 2, in devicemapper_getPathsStatus
  File "/usr/lib64/python3.6/multiprocessing/managers.py", line 772, in _callmethod
    raise convert_to_error(kind, result)
ValueError: not enough values to unpack (expected 2, got 1)

2020-01-01 08:09:23,498-0500 ERROR (jsonrpc/1) [storage.Dispatcher] FINISH getDeviceList error=not enough values to unpack (expected 2, got 1) (dispatcher:87)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/dispatcher.py", line 74, in wrapper
    result = ctask.prepare(func, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 110, in wrapper
    return m(self, *a, **kw)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 1192, in prepare
    raise self.error
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 881, in _run
    return fn(*args, **kargs)
  File "<decorator-gen-89>", line 2, in getDeviceList
  File "/usr/lib/python3.6/site-packages/vdsm/common/api.py", line 50, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 1971, in getDeviceList
    checkStatus=checkStatus)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 1989, in _getDeviceList
    for dev in multipath.pathListIter(guids):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/multipath.py", line 259, in pathListIter
    pathStatuses = devicemapper.getPathsStatus()
  File "/usr/lib/python3.6/site-packages/vdsm/storage/devicemapper.py", line 170, in getPathsStatus
    return supervdsm.getProxy().devicemapper_getPathsStatus()
  File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py", line 56, in __call__
    return callMethod()
  File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py", line 54, in <lambda>
    **kwargs)
  File "<string>", line 2, in devicemapper_getPathsStatus
  File "/usr/lib64/python3.6/multiprocessing/managers.py", line 772, in _callmethod
    raise convert_to_error(kind, result)
ValueError: not enough values to unpack (expected 2, got 1)
2020-01-01 08:09:23,499-0500 INFO  (jsonrpc/1) [jsonrpc.JsonRpcServer] RPC call Host.getDeviceList failed (error 600) in 0.72 seconds (__init__:312)


Engine:
2020-01-01 15:09:23,522+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetDeviceListVDSCommand] (default task-48) [3b4deedb-5950-403f-9269-a43e82317e90] Command 'GetDeviceListVDSCommand(HostName = host_mixed_1, GetDeviceListVDSCommandParameters:{hostId='5e09c783-72a5-4fde-896e-8d52a7ac80b2', storageType='ISCSI', checkStatus='false', lunIds='null'})' execution failed: VDSGenericException: VDSErrorException: Failed to GetDeviceListVDS, error = Error block device action: (), code = 600
2020-01-01 15:09:23,522+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetDeviceListVDSCommand] (default task-48) [3b4deedb-5950-403f-9269-a43e82317e90] FINISH, GetDeviceListVDSCommand, return: , log id: 7d33e1ff
2020-01-01 15:09:23,522+02 ERROR [org.ovirt.engine.core.bll.storage.disk.lun.GetDeviceListQuery] (default task-48) [3b4deedb-5950-403f-9269-a43e82317e90] Query 'GetDeviceListQuery' failed: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to GetDeviceListVDS, error = Error block device action: (), code = 600 (Failed with error BlockDeviceActionError and code 600)
2020-01-01 15:09:23,522+02 ERROR [org.ovirt.engine.core.bll.storage.disk.lun.GetDeviceListQuery] (default task-48) [3b4deedb-5950-403f-9269-a43e82317e90] Exception: org.ovirt.engine.core.common.errors.EngineException: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to GetDeviceListVDS, error = Error block device action: (), code = 600 (Failed with error BlockDeviceActionError and code 600)
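
For context, a minimal sketch of the likely failure mode behind the ValueError above (hypothetical helper; the real supervdsm code may differ): the paths-status helper parses "dmsetup status" output as "<name>: <status>" pairs, but when device-mapper has no devices at all (a freshly provisioned host with no ISCSI sessions yet), dmsetup prints a single informational line, "No devices found", which splits into one value instead of two. A tolerant parser would treat that line as an empty result:

import subprocess

def get_paths_status_sketch():
    # Hypothetical stand-in for the helper behind devicemapper_getPathsStatus().
    out = subprocess.check_output(
        ["dmsetup", "status", "--target", "multipath"]).decode()
    statuses = {}
    for line in out.splitlines():
        if line.strip() == "No devices found":
            # With no device-mapper devices, dmsetup prints this single
            # informational line; treating it as data is what raised
            # "not enough values to unpack (expected 2, got 1)".
            break
        name, status = line.split(":", 1)
        statuses[name.strip()] = status.strip()
    return statuses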

Comment 32 Avihai 2020-01-01 13:59:41 UTC
Created attachment 1649061 [details]
engine and vdsm supervdsm logs of the 4.4.0-13 reproduction

Comment 33 Nir Soffer 2020-01-05 00:26:51 UTC
Avihay, I cannot reproduce the issue reported here with RHEL 8.1,
which is the version RHV 4.4 should be tested with.

Maybe you are testing with some newer RHEL beta/alpha?

Can you attach more data to the bug?

cat /etc/redhat-release
rpm -q device-mapper-multipath

Comment 34 Avihai 2020-01-05 09:50:20 UTC
(In reply to Nir Soffer from comment #33)
> Avihay, I cannot reproduce the issue reported here with rhel 8.1,
> which is the version rhv 4.4. should be tested with.
> 
> Maybe you test with newer some rhel beta/alpha?

I am working with official RHEL8.1:

>cat /etc/redhat-release 
Red Hat Enterprise Linux release 8.1 (Ootpa)

>[root@storage-ge6-vdsm3 ~]# uname -a
Linux storage-ge6-vdsm3.scl.lab.tlv.redhat.com 4.18.0-147.3.1.el8_1.x86_64 #1 SMP Wed Nov 27 01:11:44 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux


> Can you attach more data to the bug?
> 
> cat /etc/redhat-release
> rpm -q device-mapper-multipath

The issue was reproduced and fixed by Vojtech Juranek, so do not waste your time on this.
I added you to the mail thread with all the details.

Comment 35 Ilan Zuckerman 2020-03-26 08:42:04 UTC
Verified according those steps:

1) Install RHEL8.1 OS from scratch on host

2) Install vdsm on host

At this stage:

- No prior ISCSI storage domain exists in the system/engine.

- No multipath's/ISCSI sessions exist in the host.

3) Via webadmin, create new ISCSI storage domain from this host.


Actual result:

No errors in the engine/vdsm logs.
LUNs were discovered in the UI without significant delays.
The ISCSI domain was added successfully.

Comment 36 Sandro Bonazzola 2020-05-20 20:02:24 UTC
This bugzilla is included in oVirt 4.4.0 release, published on May 20th 2020.

Since the problem described in this bug report should be
resolved in oVirt 4.4.0 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

