
Bug 1812264

Summary: RHEL8 - unnecessary iSCSI login attempt to the same target fails with "failed to setup iscsi subsystem" and causes host activation to fail, leaving the host in Non Operational state
Product: Red Hat Enterprise Linux 8
Reporter: Shir Fishbain <sfishbai>
Component: iscsi-initiator-utils
Assignee: Chris Leech <cleech>
Status: CLOSED NOTABUG
QA Contact: Filip Suba <fsuba>
Severity: high
Docs Contact:
Priority: unspecified
Version: 8.4
CC: aefrat, bzlotnik, cleech, coli, fsuba, jinzhao, juzhang, loberman, lsvaty, michal.skrivanek, mtessun, nsoffer, rhandlin, virt-maint
Target Milestone: rc
Flags: pm-rhel: mirror+
Target Release: 8.0
Hardware: x86_64
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-06-23 13:08:07 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1788631
Attachments:
    Logs (flags: none)
    journal_log (flags: none)
    2 different envs with different storages logs (flags: none)
    New_logs_4.4.0-0.29 (flags: none)
    new_logs_Netapp (flags: none)
    journalctl_log_with_Netapp (flags: none)
    Screenshots (flags: none)
    New_Logs (flags: none)

Description Shir Fishbain 2020-03-10 21:13:53 UTC
Description of problem:
1) When the host tries to log in to an iSCSI target via the REST API, the login fails with the following errors:

Engine:
2020-01-07 17:54:03,399+02 ERROR [org.ovirt.engine.core.bll.storage.connection.ISCSIStorageHelper] (default task-144) [hosts_syncAction_70177856-58c7-4124] The connection with details '00000000-0000-0000-0000-0
00000000000' failed because of error code '465' and error message is: failed to setup iscsi subsystem
2020-01-07 17:54:03,411+02 ERROR [org.ovirt.engine.core.bll.storage.connection.ConnectStorageToVdsCommand] (default task-144) [hosts_syncAction_70177856-58c7-4124] Transaction rolled-back for command 'org.ovirt
.engine.core.bll.storage.connection.ConnectStorageToVdsCommand'.
2020-01-07 17:54:03,418+02 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-144) [] Operation Failed: [failed to setup iscsi subsystem]


VDSM log (host clock is 7 hours behind the engine):
2020-01-07 10:54:00,439-0500 INFO  (jsonrpc/3) [vdsm.api] START connectStorageServer(domType=3, spUUID='00000000-0000-0000-0000-000000000000', conList=[{'password': '********', 'port': '3260', 'iqn': 'iqn.2000-
05.com.3pardata:20210002ac021f6b', 'connection': '3par-iscsi-1.scl.lab.tlv.redhat.com', 'id': '00000000-0000-0000-0000-000000000000', 'user': '', 'tpgt': '1'}], options=None) from=::ffff:10.35.162.6,42990, flow
_id=hosts_syncAction_70177856-58c7-4124, task_id=aa06fce6-8e82-4ab8-9dd9-4059487fc416 (api:48)
2020-01-07 10:54:02,755-0500 ERROR (jsonrpc/3) [storage.HSM] Could not connect to storageServer (hsm:2418)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/iscsiadm.py", line 326, in node_login
    "-p", portal, "-l"])
  File "/usr/lib/python3.6/site-packages/vdsm/storage/iscsiadm.py", line 129, in run_cmd
    out = commands.run(cmd, sudo=True)
  File "/usr/lib/python3.6/site-packages/vdsm/common/commands.py", line 101, in run
    raise cmdutils.Error(args, p.returncode, out, err)
vdsm.common.cmdutils.Error: Command ['/sbin/iscsiadm', '-m', 'node', '-T', 'iqn.2000-05.com.3pardata:20210002ac021f6b', '-I', 'default', '-p', '3par-iscsi-1.scl.lab.tlv.redhat.com:3260,1', '-l'] failed with rc=15 out=b'Logging in to [iface: default, target: iqn.2000-05.com.3pardata:20210002ac021f6b, portal: 3par-iscsi-1.scl.lab.tlv.redhat.com,3260]\nLogging in to [iface: default, target: iqn.2000-05.com.3pardata:20210002ac021f6b, portal: 3par-iscsi-1.scl.lab.tlv.redhat.com,3260]\nLogin to [iface: default, target: iqn.2000-05.com.3pardata:20210002ac021f6b, portal: 3par-iscsi-1.scl.lab.tlv.redhat.com,3260] successful.\n' err=b'iscsiadm: Could not login to [iface: default, target: iqn.2000-05.com.3pardata:20210002ac021f6b, portal: 3par-iscsi-1.scl.lab.tlv.redhat.com,3260].\niscsiadm: initiator reported error (15 - session exists)\niscsiadm: Could not log into all portals\n'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 2415, in connectStorageServer
    conObj.connect()
  File "/usr/lib/python3.6/site-packages/vdsm/storage/storageServer.py", line 487, in connect
    iscsi.addIscsiNode(self._iface, self._target, self._cred)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/iscsi.py", line 219, in addIscsiNode
    iscsiadm.node_login(iface.name, target.address, target.iqn)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/iscsiadm.py", line 334, in node_login
    raise IscsiNodeError(e.rc, e.out, e.err)
vdsm.storage.iscsiadm.IscsiNodeError: (15, b'Logging in to [iface: default, target: iqn.2000-05.com.3pardata:20210002ac021f6b, portal: 3par-iscsi-1.scl.lab.tlv.redhat.com,3260]\nLogging in to [iface: default, target: iqn.2000-05.com.3pardata:20210002ac021f6b, portal: 3par-iscsi-1.scl.lab.tlv.redhat.com,3260]\nLogin to [iface: default, target: iqn.2000-05.com.3pardata:20210002ac021f6b, portal: 3par-iscsi-1.scl.lab.tlv.redhat.com,3260] successful.\n', b'iscsiadm: Could not login to [iface: default, target: iqn.2000-05.com.3pardata:20210002ac021f6b, portal: 3par-iscsi-1.scl.lab.tlv.redhat.com,3260].\niscsiadm: initiator reported error (15 - session exists)\niscsiadm: Could not log into all portals\n')
2020-01-07 10:54:02,756-0500 INFO  (jsonrpc/3) [storage.StorageDomainCache] Refreshing storage domain cache (resize=True) (sdc:80)

2020-01-07 10:54:03,337-0500 ERROR (jsonrpc/3) [storage.Multipath] Could not resize device 360002ac0000000000000003d00021f6b (multipath:111)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/multipath.py", line 109, in resize_devices
    _resize_if_needed(guid)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/multipath.py", line 117, in _resize_if_needed
    for slave in devicemapper.getSlaves(name)]
  File "/usr/lib/python3.6/site-packages/vdsm/storage/multipath.py", line 117, in <listcomp>
    for slave in devicemapper.getSlaves(name)]
  File "/usr/lib/python3.6/site-packages/vdsm/storage/multipath.py", line 187, in getDeviceSize
    bs, phyBs = getDeviceBlockSizes(devName)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/multipath.py", line 179, in getDeviceBlockSizes
    QUEUE, "logical_block_size"))
  File "/usr/lib/python3.6/site-packages/vdsm/storage/multipath.py", line 198, in read_int
    with open(path, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/sys/block/sdb/queue/logical_block_size'

2) 2.1) The host was in maintenance -> manually ran iSCSI logout from all targets + iSCSI login as described in the previous comment.
2.2) Activate the host -> the host fails to activate with the following errors in VDSM, and indeed all multipath connections are gone:

Engine log (+7 hours ahead of the host/VDSM clock):
2020-02-23 16:55:09,115+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-63) [b2ea4c7] EVENT_ID: STORAGE_DOMAI
N_ERROR(996), The error message for connection 3par-iscsi-1.scl.lab.tlv.redhat.com iqn.2000-05.com.3pardata:20210002ac021f6b (LUN 360002ac0000000000000001700021f6b) returned by VDSM was: Failed to setup iSCSI s
ubsystem
2020-02-23 16:55:09,116+02 ERROR [org.ovirt.engine.core.bll.storage.connection.ISCSIStorageHelper] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-63) [b2ea4c7] The connection with details 
'3par-iscsi-1.scl.lab.tlv.redhat.com iqn.2000-05.com.3pardata:20210002ac021f6b (LUN 360002ac0000000000000001700021f6b)' failed because of error code '465' and error message is: failed to setup iscsi subsystem
2020-02-23 16:55:09,125+02 INFO  [org.ovirt.engine.core.bll.storage.pool.ConnectHostToStoragePoolServersCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-63) [b2ea4c7] Host 'host_mix
ed_2' storage connection was failed 
2020-02-23 16:55:09,133+02 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-63) [b2ea4c7] EVENT_ID: CONNECT_STORA
GE_SERVERS_FAILED(994), Failed to connect Host host_mixed_2 to Storage Servers
2020-02-23 16:55:09,180+02 INFO  [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-63) [5fbbb126] Running command: SetNonOperationalVds
Command internal: true. Entities affected :  ID: 2eeac6c2-9120-4681-acd1-50b99a8d9a4d Type: VDS

VDSM log errors (-7 hours behind engine time):
2020-02-23 09:55:06,178-0500 DEBUG (jsonrpc/3) [storage.TaskManager.Task] (Task='752506cc-8faa-42e9-bd6a-c14fe922825d') moving from state preparing -> state preparing (task:618)
2020-02-23 09:55:06,178-0500 INFO  (jsonrpc/3) [vdsm.api] START connectStorageServer(domType=3, spUUID='5b2e106b-0408-4b79-b012-b9dcd3c71334', conList=[{'password': '********', 'port': '3260', 'iqn': 'iqn.2000-
05.com.3pardata:20210002ac021f6b', 'connection': '3par-iscsi-1.scl.lab.tlv.redhat.com', 'ipv6_enabled': 'false', 'id': '51d3ffa7-0c32-4c32-997d-7acffde25fe5', 'user': '', 'tpgt': '21'}], options=None) from=::ff
ff:10.35.161.118,53708, flow_id=b2ea4c7, task_id=752506cc-8faa-42e9-bd6a-c14fe922825d (api:48)
2020-02-23 09:55:06,398-0500 DEBUG (jsonrpc/3) [storage.ISCSI] iface.net_ifacename not provided, skipping rp filter setup (iscsi:579)
2020-02-23 09:55:08,540-0500 ERROR (jsonrpc/3) [storage.HSM] Could not connect to storageServer (hsm:2421)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/iscsiadm.py", line 326, in node_login
    "-p", portal, "-l"])
  File "/usr/lib/python3.6/site-packages/vdsm/storage/iscsiadm.py", line 129, in run_cmd
    out = commands.run(cmd, sudo=True)
  File "/usr/lib/python3.6/site-packages/vdsm/common/commands.py", line 101, in run
    raise cmdutils.Error(args, p.returncode, out, err)
vdsm.common.cmdutils.Error: Command ['/sbin/iscsiadm', '-m', 'node', '-T', 'iqn.2000-05.com.3pardata:20210002ac021f6b', '-I', 'default', '-p', '3par-iscsi-1.scl.lab.tlv.redhat.com:3260,21', '-l'] failed with rc
=15 out=b'Logging in to [iface: default, target: iqn.2000-05.com.3pardata:20210002ac021f6b, portal: 3par-iscsi-1.scl.lab.tlv.redhat.com,3260]\nLogging in to [iface: default, target: iqn.2000-05.com.3pardata:202
10002ac021f6b, portal: 3par-iscsi-1.scl.lab.tlv.redhat.com,3260]\nLogin to [iface: default, target: iqn.2000-05.com.3pardata:20210002ac021f6b, portal: 3par-iscsi-1.scl.lab.tlv.redhat.com,3260] successful.\n' er
r=b'iscsiadm: Could not login to [iface: default, target: iqn.2000-05.com.3pardata:20210002ac021f6b, portal: 3par-iscsi-1.scl.lab.tlv.redhat.com,3260].\niscsiadm: initiator reported error (15 - session exists)\
niscsiadm: Could not log into all portals\n'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/iscsi.py", line 219, in addIscsiNode
    iscsiadm.node_login(iface.name, target.address, target.iqn)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/iscsiadm.py", line 334, in node_login
    raise IscsiNodeError(e.rc, e.out, e.err)
vdsm.storage.iscsiadm.IscsiNodeError: (15, b'Logging in to [iface: default, target: iqn.2000-05.com.3pardata:20210002ac021f6b, portal: 3par-iscsi-1.scl.lab.tlv.redhat.com,3260]\nLogging in to [iface: default, t
arget: iqn.2000-05.com.3pardata:20210002ac021f6b, portal: 3par-iscsi-1.scl.lab.tlv.redhat.com,3260]\nLogin to [iface: default, target: iqn.2000-05.com.3pardata:20210002ac021f6b, portal: 3par-iscsi-1.scl.lab.tlv
.redhat.com,3260] successful.\n', b'iscsiadm: Could not login to [iface: default, target: iqn.2000-05.com.3pardata:20210002ac021f6b, portal: 3par-iscsi-1.scl.lab.tlv.redhat.com,3260].\niscsiadm: initiator repor
ted error (15 - session exists)\niscsiadm: Could not log into all portals\n')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/iscsiadm.py", line 301, in node_delete
    "-p", portal, "--op=delete"])
  File "/usr/lib/python3.6/site-packages/vdsm/storage/iscsiadm.py", line 129, in run_cmd
    out = commands.run(cmd, sudo=True)
  File "/usr/lib/python3.6/site-packages/vdsm/common/commands.py", line 101, in run
    raise cmdutils.Error(args, p.returncode, out, err)
vdsm.common.cmdutils.Error: Command ['/sbin/iscsiadm', '-m', 'node', '-T', 'iqn.2000-05.com.3pardata:20210002ac021f6b', '-I', 'default', '-p', '3par-iscsi-1.scl.lab.tlv.redhat.com:3260,21', '--op=delete'] faile
d with rc=6 out=b'' err=b'iscsiadm: Could not stat /var/lib/iscsi/nodes/iqn.2000-05.com.3pardata:20210002ac021f6b/3par-iscsi-1.scl.lab.tlv.redhat.com,3260,1/default to delete node: No such file or directory\nis
csiadm: Could not execute operation on all records: encountered iSCSI database failure\n'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 2418, in connectStorageServer
    conObj.connect()
  File "/usr/lib/python3.6/site-packages/vdsm/storage/storageServer.py", line 487, in connect
    iscsi.addIscsiNode(self._iface, self._target, self._cred)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/iscsi.py", line 224, in addIscsiNode
    removeIscsiNode(iface, target)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/iscsi.py", line 238, in removeIscsiNode
    iscsiadm.node_delete(iface.name, target.address, target.iqn)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/iscsiadm.py", line 306, in node_delete
    raise IscsiNodeError(e.rc, e.out, e.err)
vdsm.storage.iscsiadm.IscsiNodeError: (6, b'', b'iscsiadm: Could not stat /var/lib/iscsi/nodes/iqn.2000-05.com.3pardata:20210002ac021f6b/3par-iscsi-1.scl.lab.tlv.redhat.com,3260,1/default to delete node: No such file or directory\niscsiadm: Could not execute operation on all records: encountered iSCSI database failure\n')

2020-02-23 09:55:09,057-0500 ERROR (jsonrpc/3) [storage.Multipath] Could not resize device 360002ac0000000000000001600021f6b (multipath:111)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/multipath.py", line 109, in resize_devices
    _resize_if_needed(guid)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/multipath.py", line 117, in _resize_if_needed
    for slave in devicemapper.getSlaves(name)]
  File "/usr/lib/python3.6/site-packages/vdsm/storage/multipath.py", line 117, in <listcomp>
    for slave in devicemapper.getSlaves(name)]
  File "/usr/lib/python3.6/site-packages/vdsm/storage/multipath.py", line 187, in getDeviceSize
    bs, phyBs = getDeviceBlockSizes(devName)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/multipath.py", line 179, in getDeviceBlockSizes
    QUEUE, "logical_block_size"))
  File "/usr/lib/python3.6/site-packages/vdsm/storage/multipath.py", line 198, in read_int
    with open(path, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/sys/block/sdb/queue/logical_block_size'

No iSCSI sessions:
[root@storage-ge4-vdsm2 ~]# iscsiadm -m session
iscsiadm: No active sessions.

# multipath is dead on this host alone (all other hosts in the same cluster and storage are up and connected to the targets):
[root@storage-ge4-vdsm2 ~]# multipath -l
360002ac0000000000000001800021f6b dm-18 ##,##
size=150G features='0' hwhandler='1 alua' wp=rw
360002ac0000000000000001600021f6b dm-0 ##,##
size=150G features='0' hwhandler='1 alua' wp=rw
360002ac0000000000000001a00021f6b dm-21 ##,##
size=50G features='0' hwhandler='1 alua' wp=rw
360002ac0000000000000001900021f6b dm-19 ##,##
size=50G features='0' hwhandler='1 alua' wp=rw
360002ac0000000000000001700021f6b dm-17 ##,##
size=150G features='0' hwhandler='1 alua' wp=rw


messages log:
Feb 23 09:55:07 storage-ge4-vdsm2 systemd[1]: Started Session c20016 of user root.
Feb 23 09:55:07 storage-ge4-vdsm2 multipathd[596]: 360002ac0000000000000001600021f6b: load table [0 314572800 multipath 1 queue_if_no_path 1 alua 1 1 service-time 0 1 1 8:16 1]
Feb 23 09:55:07 storage-ge4-vdsm2 multipathd[596]: sdb [8:16]: path added to devmap 360002ac0000000000000001600021f6b
Feb 23 09:55:08 storage-ge4-vdsm2 systemd-udevd[11947]: inotify_add_watch(11, /dev/sdg1, 10) failed: No such file or directory
Feb 23 09:55:08 storage-ge4-vdsm2 systemd[1]: Started Session c20017 of user root.
Feb 23 09:55:08 storage-ge4-vdsm2 kernel: scsi 3:0:0:1: alua: Detached
Feb 23 09:55:08 storage-ge4-vdsm2 kernel: scsi 3:0:0:2: alua: Detached
Feb 23 09:55:08 storage-ge4-vdsm2 kernel: scsi 3:0:0:3: alua: Detached
Feb 23 09:55:08 storage-ge4-vdsm2 kernel: scsi 3:0:0:4: alua: Detached
Feb 23 09:55:08 storage-ge4-vdsm2 kernel: scsi 3:0:0:5: alua: Detached
Feb 23 09:55:08 storage-ge4-vdsm2 iscsid[1041]: iscsid: Connection837:0 to [target: iqn.2000-05.com.3pardata:20210002ac021f6b, portal: 3par-iscsi-1.scl.lab.tlv.redhat.com,3260] through [iface: default] is shutdown.
Feb 23 09:55:08 storage-ge4-vdsm2 systemd[1]: Started Session c20018 of user root.
Feb 23 09:55:08 storage-ge4-vdsm2 systemd[1]: Started Session c20019 of user root.
Feb 23 09:55:08 storage-ge4-vdsm2 systemd[1]: Started Session c20020 of user root.
Feb 23 09:55:08 storage-ge4-vdsm2 multipathd[596]: 8:16: cannot find block device
Feb 23 09:55:08 storage-ge4-vdsm2 multipathd[596]: checker failed path 8:16 in map 360002ac0000000000000001600021f6b
Feb 23 09:55:08 storage-ge4-vdsm2 kernel: device-mapper: multipath: Failing path 8:16.
Feb 23 09:55:08 storage-ge4-vdsm2 multipathd[596]: 360002ac0000000000000001600021f6b: Entering recovery mode: max_retries=4
Feb 23 09:55:08 storage-ge4-vdsm2 multipathd[596]: 360002ac0000000000000001600021f6b: remaining active paths: 0
Feb 23 09:55:08 storage-ge4-vdsm2 multipathd[596]: 8:16: cannot find block device
Feb 23 09:55:09 storage-ge4-vdsm2 multipathd[596]: 360002ac0000000000000001600021f6b Last path deleted, disabling queueing
Feb 23 09:55:09 storage-ge4-vdsm2 multipathd[596]: 360002ac0000000000000001600021f6b: map in use
Feb 23 09:55:09 storage-ge4-vdsm2 multipathd[596]: 360002ac0000000000000001600021f6b: can't flush
Feb 23 09:55:09 storage-ge4-vdsm2 multipathd[596]: 360002ac0000000000000001600021f6b: queueing disabled
Feb 23 09:55:09 storage-ge4-vdsm2 multipathd[596]: 360002ac0000000000000001600021f6b: load table [0 314572800 multipath 0 1 alua 0 0]
Feb 23 09:55:09 storage-ge4-vdsm2 multipathd[596]: sdb [8:16]: path removed from map 360002ac0000000000000001600021f6b
Feb 23 09:55:09 storage-ge4-vdsm2 kernel: scsi 3:0:0:0: alua: Detached
Feb 23 09:55:09 storage-ge4-vdsm2 multipathd[596]: sdc: path already removed
Feb 23 09:55:09 storage-ge4-vdsm2 multipathd[596]: sdd: path already removed
Feb 23 09:55:09 storage-ge4-vdsm2 multipathd[596]: sde: path already removed
Feb 23 09:55:09 storage-ge4-vdsm2 multipathd[596]: sdf: path already removed
Feb 23 09:55:09 storage-ge4-vdsm2 multipathd[596]: sdg: path already removed


Version-Release number of selected component (if applicable):
engine 4.4.0-0.13.master.el7
vdsm-4.40.0-164.git38a19bb.el8ev.x86_64
libvirt-client-5.6.0-6.module+el8.1.0+4244+9aa4e6bb.x86_64
qemu-kvm-4.1.0-14.module+el8.1.0+4548+ed1300f4.x86_64
host OS is Red Hat Enterprise Linux release 8.1 (Ootpa)
Kernel 4.18.0-147.3.1.el8_1.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create 4.2 DC (with compatibility level 4.2)
2. Create 4.2 cluster 
3. Add host to this cluster
4. Discovery of the storage domain via the REST API works.
Discovery REST request:
2020-01-07 17:53:59,702 - MainThread - hosts - DEBUG - Action request content is --  url:/ovirt-engine/api/hosts/a5e32c47-498b-4d24-92a1-9a8bf332a03e/iscsidiscover body:<action>
    <async>false</async>
    <grace_period>
        <expiry>10</expiry>
    </grace_period>
    <iscsi>
        <address>3par-iscsi-1.scl.lab.tlv.redhat.com</address>
    </iscsi>
</action>

Response:
2020-01-07 17:54:00,228 - MainThread - hosts - DEBUG - Response body for action request is: 
<action>
    <async>false</async>
    <discovered_targets>
        <iscsi_details>
            <address>10.35.146.1</address>
            <port>3260</port>
            <portal>10.35.146.1:3260,21</portal>
            <target>iqn.2000-05.com.3pardata:20210002ac021f6b</target>
        </iscsi_details>
        <iscsi_details>
            <address>10.35.146.2</address>
            <port>3260</port>
            <portal>10.35.146.2:3260,121</portal>
            <target>iqn.2000-05.com.3pardata:21210002ac021f6b</target>
        </iscsi_details>
    </discovered_targets>
    <grace_period>
        <expiry>10</expiry>
    </grace_period>
    <iscsi>
        <address>3par-iscsi-1.scl.lab.tlv.redhat.com</address>
    </iscsi>
    <iscsi_targets>
        <iscsi_target>iqn.2000-05.com.3pardata:20210002ac021f6b</iscsi_target>
        <iscsi_target>iqn.2000-05.com.3pardata:21210002ac021f6b</iscsi_target>
    </iscsi_targets>
    <status>complete</status>
</action>
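For log analysis, the discovered portals and targets can be extracted from a response like the one above with a few lines of standard-library XML parsing. This is a standalone sketch for working with captured responses, not part of the engine or VDSM code; the embedded sample reuses the values from the response shown above:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the discovery response above, used as sample input.
RESPONSE = """<action>
    <discovered_targets>
        <iscsi_details>
            <address>10.35.146.1</address>
            <port>3260</port>
            <portal>10.35.146.1:3260,21</portal>
            <target>iqn.2000-05.com.3pardata:20210002ac021f6b</target>
        </iscsi_details>
        <iscsi_details>
            <address>10.35.146.2</address>
            <port>3260</port>
            <portal>10.35.146.2:3260,121</portal>
            <target>iqn.2000-05.com.3pardata:21210002ac021f6b</target>
        </iscsi_details>
    </discovered_targets>
</action>"""

def discovered_targets(xml_text):
    """Return (portal, target IQN) pairs from an iscsidiscover response."""
    root = ET.fromstring(xml_text)
    return [(d.findtext("portal"), d.findtext("target"))
            for d in root.iter("iscsi_details")]
```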


5. Log in via the REST API to one of the discovered targets:

2020-01-07 17:54:00,324 - MainThread - hosts - DEBUG - Action request content is --  url:/ovirt-engine/api/hosts/a5e32c47-498b-4d24-92a1-9a8bf332a03e/iscsilogin 

body:<action>
    <async>false</async>
    <grace_period>
        <expiry>10</expiry>
    </grace_period>
    <iscsi>
        <address>3par-iscsi-1.scl.lab.tlv.redhat.com</address>
        <target>iqn.2000-05.com.3pardata:20210002ac021f6b</target>
    </iscsi>
</action>


Actual results:
ISCSI login fails with the following REST response:
2020-01-07 17:54:03,431 - MainThread - api_utils - ERROR - Failed to syncAction element NOT as expected:
        Status: 400
        Reason: Bad Request
        Detail: [failed to setup iscsi subsystem]

2020-01-07 17:54:03,435 - MainThread - hosts - ERROR - Response code is not valid, expected is: [200, 201], actual is: 400 

Analysis:

1) In both versions (4.3 and 4.4) the configuration and creation of the iSCSI login process are the same.
The suggestion from RHV was to delete the nodes from the host, add them again, and then try to log in to iSCSI.

The following shows the iSCSI login failing on an RHV 4.4/RHEL 8.2 host:
# ISCSI Logout:
[anaconda root@storage-ge11-vdsm1 ~]# iscsiadm -m session -u                    
iscsiadm: No matching sessions found

# Removing the nodes:
[anaconda root@storage-ge11-vdsm1 iqn.2000-05.com.3pardata:20210002ac021f6b]# rm -rf /var/lib/iscsi/nodes/*

# Adding the nodes
[anaconda root@storage-ge11-vdsm1 iqn.2000-05.com.3pardata:20210002ac021f6b]# /sbin/iscsiadm -m node -T iqn.2000-05.com.3pardata:20210002ac021f6b -I default -p 10.35.146.1:3260,1 --op=new
New iSCSI node [tcp:[hw=,ip=,net_if=,iscsi_if=default] 10.35.146.1,3260,1 iqn.2000-05.com.3pardata:20210002ac021f6b] added
[anaconda root@storage-ge11-vdsm1 iqn.2000-05.com.3pardata:20210002ac021f6b]# /sbin/iscsiadm -m node -T iqn.2000-05.com.3pardata:20210002ac021f6b -I default -p 3par-iscsi-1.scl.lab.tlv.redhat.com:3260,1 --op=new
New iSCSI node [tcp:[hw=,ip=,net_if=,iscsi_if=default] 3par-iscsi-1.scl.lab.tlv.redhat.com,3260,1
iqn.2000-05.com.3pardata:20210002ac021f6b] added

# ISCSI login to one target - the login fails: 
[anaconda root@storage-ge11-vdsm1 iqn.2000-05.com.3pardata:20210002ac021f6b]# /sbin/iscsiadm -m node -T iqn.2000-05.com.3pardata:20210002ac021f6b -I default -p 10.35.146.1:3260,1 -l
Logging in to [iface: default, target: iqn.2000-05.com.3pardata:20210002ac021f6b, portal: 3par-iscsi-1.scl.lab.tlv.redhat.com,3260]
Logging in to [iface: default, target: iqn.2000-05.com.3pardata:20210002ac021f6b, portal: 10.35.146.1,3260]
Login to [iface: default, target: iqn.2000-05.com.3pardata:20210002ac021f6b, portal: 3par-iscsi-1.scl.lab.tlv.redhat.com,3260] successful.
iscsiadm: Could not login to [iface: default, target: iqn.2000-05.com.3pardata:20210002ac021f6b, portal: 10.35.146.1,3260].
iscsiadm: initiator reported error (15 - session exists)
iscsiadm: Could not log into all portals
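For context, exit code 15 here is open-iscsi's "session exists" error: the session is already established, so the target is effectively in the desired state even though iscsiadm reports a failure. A caller could in principle tolerate that code; the helper below is a hypothetical policy sketch of that idea, not VDSM's actual behaviour:

```python
# iscsiadm exit code 15 means "session exists" (per the open-iscsi
# error output shown in the logs above).
ISCSI_ERR_SESS_EXISTS = 15

def login_ok(rc: int) -> bool:
    """Treat an iscsiadm node login as successful when it either logged
    in (rc=0) or found the session already up (rc=15). Hypothetical
    policy sketch, not VDSM's current code."""
    return rc in (0, ISCSI_ERR_SESS_EXISTS)
```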


Expected results:


Additional info:

The following describes the iscsi login on a RHV4.3/RHEL7.7 host:

# iSCSI Logout:
[root@storage-ge2-vdsm1 ~]# iscsiadm -m session -u
Logging out of session [sid: 58, target: iqn.2000-05.com.3pardata:20210002ac021f6b, portal: 3par-iscsi-1.scl.lab.tlv.redhat.com,3260]
Logout of [sid: 58, target: iqn.2000-05.com.3pardata:20210002ac021f6b, portal: 3par-iscsi-1.scl.lab.tlv.redhat.com,3260] successful.

#Removing the nodes
[root@storage-ge2-vdsm1 ~]# rm -rf /var/lib/iscsi/nodes/*

#Adding the nodes
[root@storage-ge2-vdsm1 ~]# /sbin/iscsiadm -m node -T iqn.2000-05.com.3pardata:20210002ac021f6b -I default -p 3par-iscsi-1.scl.lab.tlv.redhat.com:3260,1 --op=new
New iSCSI node [tcp:[hw=,ip=,net_if=,iscsi_if=default] 3par-iscsi-1.scl.lab.tlv.redhat.com,3260,1 iqn.2000-05.com.3pardata:20210002ac021f6b] added
[root@storage-ge2-vdsm1 ~]# /sbin/iscsiadm -m node -T iqn.2000-05.com.3pardata:20210002ac021f6b -I default -p 10.35.146.1:3260,1 --op=new
New iSCSI node [tcp:[hw=,ip=,net_if=,iscsi_if=default] 10.35.146.1,3260,1 iqn.2000-05.com.3pardata:20210002ac021f6b] added

# ISCSI login to one target - the login succeeds:
[root@storage-ge2-vdsm1 ~]# /sbin/iscsiadm -m node -T iqn.2000-05.com.3pardata:20210002ac021f6b -I default -p 10.35.146.1:3260,1 -l
Logging in to [iface: default, target: iqn.2000-05.com.3pardata:20210002ac021f6b, portal: 3par-iscsi-1.scl.lab.tlv.redhat.com,3260] (multiple)
Logging in to [iface: default, target: iqn.2000-05.com.3pardata:20210002ac021f6b, portal: 10.35.146.1,3260] (multiple)
Login to [iface: default, target: iqn.2000-05.com.3pardata:20210002ac021f6b, portal: 3par-iscsi-1.scl.lab.tlv.redhat.com,3260] successful.
Login to [iface: default, target: iqn.2000-05.com.3pardata:20210002ac021f6b, portal: 10.35.146.1,3260] successful.

We drilled down into the iSCSI node DB. After removing the sessions, we saw on RHV4.3/RHEL7.7 that there are multiple node records with different connection addresses (node.conn[0].address = 3par-iscsi-1.scl.lab.tlv.redhat.com, node.conn[0].address = 10.35.146.2, node.conn[0].address = 10.35.146.1).
That host handles the multiple node records and completes the login successfully. However, on RHV4.4/RHEL8.2 the host cannot deal with the multiple node records.
How is the iSCSI login supposed to work? Are multiple node records supposed to be supported on RHV4.4/RHEL8.2?
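One way to inspect the node DB for such duplicate records is to walk the open-iscsi database directory and collect the node.conn[0].address values per target. This is an inspection sketch only; it assumes the default open-iscsi on-disk layout /var/lib/iscsi/nodes/<iqn>/<portal>/default:

```python
import os
import re

def node_addresses(nodes_dir="/var/lib/iscsi/nodes"):
    """Map each target IQN to the node.conn[0].address values found in
    its node records. Several addresses for one IQN mean a single
    'iscsiadm -m node -T <iqn> -l' attempts several logins, and a
    portal that already has a session fails with error 15."""
    result = {}
    for iqn in sorted(os.listdir(nodes_dir)):
        addrs = []
        target_dir = os.path.join(nodes_dir, iqn)
        for portal in sorted(os.listdir(target_dir)):
            record = os.path.join(target_dir, portal, "default")
            if not os.path.isfile(record):
                continue
            with open(record) as f:
                for line in f:
                    m = re.match(r"node\.conn\[0\]\.address\s*=\s*(\S+)", line)
                    if m:
                        addrs.append(m.group(1))
        result[iqn] = addrs
    return result
```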

Comment 1 Shir Fishbain 2020-03-10 21:16:10 UTC
Created attachment 1669086 [details]
Logs

Comment 2 Shir Fishbain 2020-03-10 21:17:07 UTC
Created attachment 1669087 [details]
journal_log

Comment 3 Shir Fishbain 2020-03-10 21:18:04 UTC
Created attachment 1669088 [details]
2 different envs with different storages logs

Comment 4 Martin Tessun 2020-03-11 08:29:39 UTC
This sounds like iscsi-initiator-utils to me.

Some observations/question from my side:

1. Why can RHV not deal with multiple initiators?
   multipathd should take care of this
2. The error looks like the interface was shut down/could not reach the iscsi storage for some time
   Feb 23 09:55:08 storage-ge4-vdsm2 iscsid[1041]: iscsid: Connection837:0 to [target: iqn.2000-05.com.3pardata:20210002ac021f6b, portal: \
                                                           3par-iscsi-1.scl.lab.tlv.redhat.com,3260] through [iface: default] is shutdown.

(2) leads to no more paths available:
Feb 23 09:55:08 storage-ge4-vdsm2 multipathd[596]: 360002ac0000000000000001600021f6b: Entering recovery mode: max_retries=4
Feb 23 09:55:08 storage-ge4-vdsm2 multipathd[596]: 360002ac0000000000000001600021f6b: remaining active paths: 0
Feb 23 09:55:08 storage-ge4-vdsm2 multipathd[596]: 8:16: cannot find block device
Feb 23 09:55:09 storage-ge4-vdsm2 multipathd[596]: 360002ac0000000000000001600021f6b Last path deleted, disabling queueing
Feb 23 09:55:09 storage-ge4-vdsm2 multipathd[596]: 360002ac0000000000000001600021f6b: map in use
Feb 23 09:55:09 storage-ge4-vdsm2 multipathd[596]: 360002ac0000000000000001600021f6b: can't flush
Feb 23 09:55:09 storage-ge4-vdsm2 multipathd[596]: 360002ac0000000000000001600021f6b: queueing disabled
Feb 23 09:55:09 storage-ge4-vdsm2 multipathd[596]: 360002ac0000000000000001600021f6b: load table [0 314572800 multipath 0 1 alua 0 0]
Feb 23 09:55:09 storage-ge4-vdsm2 multipathd[596]: sdb [8:16]: path removed from map 360002ac0000000000000001600021f6b

So can you describe what the issue is with the multiple initiators in RHV, as RHV seems to be using multipathd for solving this.
Maybe this is just that vdsm does not log in to all targets?

Comment 5 Shir Fishbain 2020-03-11 16:01:33 UTC
(In reply to Martin Tessun from comment #4)
> This sounds like iscsi-initiator-utils to me.
> 
> Some observations/question from my side:
> 
> 1. Why can RHV not deal with multiple initiators?
>    multipathd should take care of this
> 2. The error looks like the interface was shut down/could not reach the
> iscsi storage for some time
>    Feb 23 09:55:08 storage-ge4-vdsm2 iscsid[1041]: iscsid: Connection837:0
> to [target: iqn.2000-05.com.3pardata:20210002ac021f6b, portal: \
>                                                           
> 3par-iscsi-1.scl.lab.tlv.redhat.com,3260] through [iface: default] is
> shutdown.

Benny or Nir, can you please shed some light on the multiple initiators issue?

> (2) leads to no more paths available:
> Feb 23 09:55:08 storage-ge4-vdsm2 multipathd[596]:
> 360002ac0000000000000001600021f6b: Entering recovery mode: max_retries=4
> Feb 23 09:55:08 storage-ge4-vdsm2 multipathd[596]:
> 360002ac0000000000000001600021f6b: remaining active paths: 0
> Feb 23 09:55:08 storage-ge4-vdsm2 multipathd[596]: 8:16: cannot find block
> device
> Feb 23 09:55:09 storage-ge4-vdsm2 multipathd[596]:
> 360002ac0000000000000001600021f6b Last path deleted, disabling queueing
> Feb 23 09:55:09 storage-ge4-vdsm2 multipathd[596]:
> 360002ac0000000000000001600021f6b: map in use
> Feb 23 09:55:09 storage-ge4-vdsm2 multipathd[596]:
> 360002ac0000000000000001600021f6b: can't flush
> Feb 23 09:55:09 storage-ge4-vdsm2 multipathd[596]:
> 360002ac0000000000000001600021f6b: queueing disabled
> Feb 23 09:55:09 storage-ge4-vdsm2 multipathd[596]:
> 360002ac0000000000000001600021f6b: load table [0 314572800 multipath 0 1
> alua 0 0]
> Feb 23 09:55:09 storage-ge4-vdsm2 multipathd[596]: sdb [8:16]: path removed
> from map 360002ac0000000000000001600021f6b
> 
> So can you describe what the issue is with the multiple initiators in RHV,
> as RHV seems to be using multipathd for solving this.
As I described in the bug description: 
Multiple initiators in RHV are supported only in 4.3, not in 4.4, even though both versions configure and run the iSCSI login process in the same way. 

We drilled down into the iSCSI DB after removing the sessions. On RHV4.3/RHEL7.7 we saw multiple node records with different connection addresses:
node.conn[0].address = 3par-iscsi-1.scl.lab.tlv.redhat.com 
node.conn[0].address = 10.35.146.2
node.conn[0].address = 10.35.146.1

The host can handle the multiple node records and log in successfully only on RHV4.3/RHEL7.7; on RHV4.4/RHEL8.2 it cannot.

The real question here is: was this behaviour introduced on purpose in 4.3 to solve a specific problem with the iSCSI connection, and is it therefore intentionally missing in 4.4?
Or was multiple-initiator support also supposed to work in 4.4 the way it does in 4.3?
Are multiple node records supposed to be supported on RHV4.4/RHEL8.2?
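For reference, the duplicate records can be inspected straight from the node DB. A minimal sketch, assuming the default open-iscsi record location under /var/lib/iscsi/nodes (the target IQN below is the one from this report):

```shell
# Each node record lives under /var/lib/iscsi/nodes/<target>/<portal>/;
# an FQDN and its resolved IP produce two separate records for the same portal.
ls /var/lib/iscsi/nodes/iqn.2000-05.com.3pardata:20210002ac021f6b/

# Dump the stored connection address of every record to spot duplicates.
grep -r 'node.conn\[0\].address' /var/lib/iscsi/nodes/
```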
> Maybe this is just that vdsm does not log in to all targets?
Benny or Nir, can you help with this question?

Comment 6 Benny Zlotnik 2020-03-12 08:17:08 UTC
I think the key issue here is a change in behaviour. For example, here, in el8:
[anaconda root@storage-ge11-vdsm1 iqn.2000-05.com.3pardata:20210002ac021f6b]# /sbin/iscsiadm -m node -T iqn.2000-05.com.3pardata:20210002ac021f6b -I default -p 10.35.146.1:3260,1 -l
Logging in to [iface: default, target: iqn.2000-05.com.3pardata:20210002ac021f6b, portal: 3par-iscsi-1.scl.lab.tlv.redhat.com,3260]
Logging in to [iface: default, target: iqn.2000-05.com.3pardata:20210002ac021f6b, portal: 10.35.146.1,3260]
Login to [iface: default, target: iqn.2000-05.com.3pardata:20210002ac021f6b, portal: 3par-iscsi-1.scl.lab.tlv.redhat.com,3260] successful.
iscsiadm: Could not login to [iface: default, target: iqn.2000-05.com.3pardata:20210002ac021f6b, portal: 10.35.146.1,3260].
iscsiadm: initiator reported error (15 - session exists)
iscsiadm: Could not log into all portals

iscsiadm identifies that the two portals point to the same target and doesn't create a second session.
In 7.7, by contrast, it identifies them but still lets you create multiple sessions:
[root@storage-ge2-vdsm1 ~]# /sbin/iscsiadm -m node -T iqn.2000-05.com.3pardata:20210002ac021f6b -I default -p 10.35.146.1:3260,1 -l
Logging in to [iface: default, target: iqn.2000-05.com.3pardata:20210002ac021f6b, portal: 3par-iscsi-1.scl.lab.tlv.redhat.com,3260] (multiple)
Logging in to [iface: default, target: iqn.2000-05.com.3pardata:20210002ac021f6b, portal: 10.35.146.1,3260] (multiple)
Login to [iface: default, target: iqn.2000-05.com.3pardata:20210002ac021f6b, portal: 3par-iscsi-1.scl.lab.tlv.redhat.com,3260] successful.
Login to [iface: default, target: iqn.2000-05.com.3pardata:20210002ac021f6b, portal: 10.35.146.1,3260] successful.

Now, should there really be two sessions created for this use case? I'm not sure, but this is likely related to the automation.
The problem is the difference in behaviour between the two cases. If the change is intentional and it should never have worked the way it does in 7.7, then we need to either provide a workaround in vdsm (ignore the failed login and use just one session), or QE will need to figure out why they are logging in to the FQDN and not the IP provided by the discovery.

Comment 7 Shir Fishbain 2020-03-12 09:57:16 UTC
(In reply to Benny Zlotnik from comment #6)
> I think the key issue here is a change in behaviour. For example, here, in
> el8:
> [anaconda root@storage-ge11-vdsm1
> iqn.2000-05.com.3pardata:20210002ac021f6b]# /sbin/iscsiadm -m node -T
> iqn.2000-05.com.3pardata:20210002ac021f6b -I default -p 10.35.146.1:3260,1 -l
> Logging in to [iface: default, target:
> iqn.2000-05.com.3pardata:20210002ac021f6b, portal:
> 3par-iscsi-1.scl.lab.tlv.redhat.com,3260]
> Logging in to [iface: default, target:
> iqn.2000-05.com.3pardata:20210002ac021f6b, portal: 10.35.146.1,3260]
> Login to [iface: default, target: iqn.2000-05.com.3pardata:20210002ac021f6b,
> portal: 3par-iscsi-1.scl.lab.tlv.redhat.com,3260] successful.
> iscsiadm: Could not login to [iface: default, target:
> iqn.2000-05.com.3pardata:20210002ac021f6b, portal: 10.35.146.1,3260].
> iscsiadm: initiator reported error (15 - session exists)
> iscsiadm: Could not log into all portals
> 
> iscsiadm identify that two targets are the same and doesn't create a second
> session.
> While in 7.7, it will identify and let you create multiple session:
> [root@storage-ge2-vdsm1 ~]# /sbin/iscsiadm -m node -T
> iqn.2000-05.com.3pardata:20210002ac021f6b -I default -p 10.35.146.1:3260,1 -l
> Logging in to [iface: default, target:
> iqn.2000-05.com.3pardata:20210002ac021f6b, portal:
> 3par-iscsi-1.scl.lab.tlv.redhat.com,3260] (multiple)
> Logging in to [iface: default, target:
> iqn.2000-05.com.3pardata:20210002ac021f6b, portal: 10.35.146.1,3260]
> (multiple)
> Login to [iface: default, target: iqn.2000-05.com.3pardata:20210002ac021f6b,
> portal: 3par-iscsi-1.scl.lab.tlv.redhat.com,3260] successful.
> Login to [iface: default, target: iqn.2000-05.com.3pardata:20210002ac021f6b,
> portal: 10.35.146.1,3260] successful.
> 
> Now, should there really be two sessions created for this use-case? I'm not
> sure, but this is likely something related to the automation.
> The problem is the different behaviour, in the two cases. If the change is
> intentional and it should not have worked as it does in 7.7, then we need to
> either provide a workaround in vdsm (ignore the failed login and use just
> one session) or QE will need to figure out why are they logging into the
> FQDN and not the IP provided by the discovery

Benny,
As you probably recall, we ran the MANUAL tests together on February 26th, and the outcome was that the iSCSI login still failed on RHV4.4/RHEL8.2.
The conclusion we both agreed on was that the issue is with the multiple initiators: 4.3 works successfully while 4.4 does not. We also agreed to open a RHEL bug about the issue. 


The following shows the failing iSCSI login on an RHV4.4/RHEL8.2 host, run manually:
# ISCSI Logout:
[anaconda root@storage-ge11-vdsm1 ~]# iscsiadm -m session -u                    
iscsiadm: No matching sessions found

# Removing the nodes:
[anaconda root@storage-ge11-vdsm1 iqn.2000-05.com.3pardata:20210002ac021f6b]# rm -rf /var/lib/iscsi/nodes/*

# Adding the nodes
[anaconda root@storage-ge11-vdsm1 iqn.2000-05.com.3pardata:20210002ac021f6b]# /sbin/iscsiadm -m node -T iqn.2000-05.com.3pardata:20210002ac021f6b -I default -p 10.35.146.1:3260,1 --op=new
New iSCSI node [tcp:[hw=,ip=,net_if=,iscsi_if=default] 10.35.146.1,3260,1 iqn.2000-05.com.3pardata:20210002ac021f6b] added
[anaconda root@storage-ge11-vdsm1 iqn.2000-05.com.3pardata:20210002ac021f6b]# /sbin/iscsiadm -m node -T iqn.2000-05.com.3pardata:20210002ac021f6b -I default -p 3par-iscsi-1.scl.lab.tlv.redhat.com:3260,1 --op=new
New iSCSI node [tcp:[hw=,ip=,net_if=,iscsi_if=default] 3par-iscsi-1.scl.lab.tlv.redhat.com,3260,1
iqn.2000-05.com.3pardata:20210002ac021f6b] added

# ISCSI login to one target - the login fails: 
[anaconda root@storage-ge11-vdsm1 iqn.2000-05.com.3pardata:20210002ac021f6b]# /sbin/iscsiadm -m node -T iqn.2000-05.com.3pardata:20210002ac021f6b -I default -p 10.35.146.1:3260,1 -l
Logging in to [iface: default, target: iqn.2000-05.com.3pardata:20210002ac021f6b, portal: 3par-iscsi-1.scl.lab.tlv.redhat.com,3260]
Logging in to [iface: default, target: iqn.2000-05.com.3pardata:20210002ac021f6b, portal: 10.35.146.1,3260]
Login to [iface: default, target: iqn.2000-05.com.3pardata:20210002ac021f6b, portal: 3par-iscsi-1.scl.lab.tlv.redhat.com,3260] successful.
iscsiadm: Could not login to [iface: default, target: iqn.2000-05.com.3pardata:20210002ac021f6b, portal: 10.35.146.1,3260].
iscsiadm: initiator reported error (15 - session exists)
iscsiadm: Could not log into all portals


The following shows the successful iSCSI login on an RHV4.3/RHEL7.7 host, run manually:
# ISCSI Logout:
[root@storage-ge2-vdsm1 ~]# iscsiadm -m session -u
Logging out of session [sid: 58, target: iqn.2000-05.com.3pardata:20210002ac021f6b, portal: 3par-iscsi-1.scl.lab.tlv.redhat.com,3260]
Logout of [sid: 58, target: iqn.2000-05.com.3pardata:20210002ac021f6b, portal: 3par-iscsi-1.scl.lab.tlv.redhat.com,3260] successful.

# Removing the nodes
[root@storage-ge2-vdsm1 ~]# rm -rf /var/lib/iscsi/nodes/*

# Adding the nodes
[root@storage-ge2-vdsm1 ~]# /sbin/iscsiadm -m node -T iqn.2000-05.com.3pardata:20210002ac021f6b -I default -p 3par-iscsi-1.scl.lab.tlv.redhat.com:3260,1 --op=new
New iSCSI node [tcp:[hw=,ip=,net_if=,iscsi_if=default] 3par-iscsi-1.scl.lab.tlv.redhat.com,3260,1 iqn.2000-05.com.3pardata:20210002ac021f6b] added
[root@storage-ge2-vdsm1 ~]# /sbin/iscsiadm -m node -T iqn.2000-05.com.3pardata:20210002ac021f6b -I default -p 10.35.146.1:3260,1 --op=new
New iSCSI node [tcp:[hw=,ip=,net_if=,iscsi_if=default] 10.35.146.1,3260,1 iqn.2000-05.com.3pardata:20210002ac021f6b] added

# ISCSI login to one target - the login succeeds:
[root@storage-ge2-vdsm1 ~]# /sbin/iscsiadm -m node -T iqn.2000-05.com.3pardata:20210002ac021f6b -I default -p 10.35.146.1:3260,1 -l
Logging in to [iface: default, target: iqn.2000-05.com.3pardata:20210002ac021f6b, portal: 3par-iscsi-1.scl.lab.tlv.redhat.com,3260] (multiple)
Logging in to [iface: default, target: iqn.2000-05.com.3pardata:20210002ac021f6b, portal: 10.35.146.1,3260] (multiple)
Login to [iface: default, target: iqn.2000-05.com.3pardata:20210002ac021f6b, portal: 3par-iscsi-1.scl.lab.tlv.redhat.com,3260] successful.
Login to [iface: default, target: iqn.2000-05.com.3pardata:20210002ac021f6b, portal: 10.35.146.1,3260] successful.

Comment 8 Benny Zlotnik 2020-03-12 10:52:02 UTC
>Benny,
>As you probably recall, we have run the MANUAL tests together on February 26th - And the outcome was that the iscsi login still failed on version RHV4.4/RHEL8.2.
>The conclusion that we both agreed on was that the issue is with the multiple initiators, while version 4.3 works successfully and version 4.4 does not. We also agreed on opening a bug to RHEL about the issue. 
This is not the point; there are 2 things at play here:
1. The automation requires 2 sessions for the same target (why is that?) and logs in to the FQDN, which is why we have the FQDN node record
2. iscsiadm behaves differently between el7 and el8 when creating multiple sessions for the same target

So we need to understand why the behaviour changed. The fact that this worked before might itself have been a bug that was since fixed, but I don't know enough to tell; for that we need input from the open-iscsi folks

Comment 10 Nir Soffer 2020-03-31 13:56:05 UTC
Before we waste platform folks' time, we should check:
- Can this be reproduced from the UI?
- Can we change the flow using the API to work in similar way to the UI?

Comment 11 Avihai 2020-04-01 07:14:50 UTC
(In reply to Nir Soffer from comment #10)
> Before we waste platform folks time we should check:
Nir, the fact that the issue reproduces over REST and manually (without RHV, via iscsiadm commands) should be enough to proceed here, as customers will use this on RHEL8/RHV4.4 and encounter the issue.

We also reproduced the more disturbing issue of losing connectivity (iSCSI sessions and multipath paths deleted) from the host that hit this iSCSI login error after an RHV host deactivate/activate, so the issue is severe when encountered.

I agree that to fully understand the severity of the issue we should also check the RHV webadmin/UI behaviour.
All the info is already here; we just need platform to confirm this issue exists and see whether they can fix it (and, if not, document why).

> - Can this be reproduced from the UI?
Shir will try and publish results.

> - Can we change the flow using the API to work in similar way to the UI?
Can you elaborate on this? What exactly is different from what we described using REST and manual iscsiadm?

Comment 13 Shir Fishbain 2020-04-06 05:22:09 UTC
Created attachment 1676482 [details]
New_logs_4.4.0-0.29

Comment 15 Shir Fishbain 2020-05-17 06:11:07 UTC
Created attachment 1689311 [details]
new_logs_Netapp

Comment 16 Shir Fishbain 2020-05-17 06:54:14 UTC
Created attachment 1689313 [details]
journalctl_log_with_Netapp

Comment 17 Chris Leech 2020-05-19 04:41:04 UTC
I've tried reproducing this following the steps outlined, and I'm not seeing the same behavior.
What I find most interesting is that you see both portals listed during a login even though you specified a single portal on the command line; when I do that, I only see a single login to the specified portal.

I'm now wondering if there's a target-side redirect happening between the portals, and maybe a regression in our handling of that (possibly only when there's already a node record in place for the redirected target portal)?
I'll look for code changes that might impact that.

In the meantime, the following would also help:

1) a copy of /etc/iscsid.conf and the /var/lib/iscsi/node/* record files

2) setting the debug log level of iscsid to at least 4 and then collecting the output from journald
    - I like to edit the service file with 'systemctl edit --full iscsid.service' and add options (--debug=4) to the ExecStart and then restart the service
    - A filtered log from 'journalctl -b -u iscsid' should be enough to look at the debug output

3) If you create only a single node record and then try a login, what does the iscsiadm output look like?  Could you try that with each of the two portals?
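Item (2) above might look roughly like the following sketch; the --debug=4 flag and the unit name are per the description above, but the exact ExecStart line varies by build:

```shell
# Open the unit file and append the debug option to ExecStart,
# e.g. ExecStart=/usr/sbin/iscsid -f --debug=4
systemctl edit --full iscsid.service
systemctl restart iscsid.service

# Reproduce the login attempt, then collect only iscsid messages
# from the current boot.
journalctl -b -u iscsid > iscsid-debug.log
```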

Comment 18 Chris Leech 2020-05-19 17:44:14 UTC
Wait, are you creating two node records for the same portal by using a DNS name in one and the IP address in the other?

$ dig 3par-iscsi-1.scl.lab.tlv.redhat.com
...
3par-iscsi-1.scl.lab.tlv.redhat.com. 234 IN A	10.35.146.1

That's fishy, but I can look at it.  There are no changes in the login code, so it's probably not a redirect issue like I was thinking.

Comment 19 Chris Leech 2020-05-20 16:06:21 UTC
The change here was actually a fix in the Open-iSCSI tools to properly maintain the configured number of active sessions for a node record.  Previously the tools allowed additional sessions to be created during the window of time while the first session was still being activated.

https://github.com/open-iscsi/open-iscsi/commit/a37c4295a2e3f8712921a52db0d35d3506efb19b

iscsiadm only looks at the saved target portal address in the node record, which can be an IP address or a DNS name, but when the login request is sent to iscsid it is always resolved to an IP address.  So while iscsiadm saw two node records, iscsid received two login requests for the same target portal with a configuration that allows only 1 session.

The correct way to create multiple sessions to the same target portal from the same initiator interface is to set node.session.nr_sessions to the desired value, not to trick iscsiadm into managing multiple records by using a mix of DNS and IP addresses or multiple DNS aliases.
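A hedged sketch of that configuration, using the target and portal from this report (the session count of 2 is illustrative):

```shell
# Ask for two sessions on the one node record instead of keeping two
# records (FQDN + IP) for the same portal.
iscsiadm -m node -T iqn.2000-05.com.3pardata:20210002ac021f6b \
    -p 10.35.146.1:3260,1 \
    --op=update -n node.session.nr_sessions -v 2

# A single login command should now bring up both sessions.
iscsiadm -m node -T iqn.2000-05.com.3pardata:20210002ac021f6b \
    -p 10.35.146.1:3260,1 -l
```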

Comment 20 Nir Soffer 2020-05-20 16:19:37 UTC
(In reply to Chris Leech from comment #19)
I think RHV is not adding multiple sessions for the same interface, and this
issue is a user error (configuring the system incorrectly), so RHV should not
be affected by the change to allow only one session per interface.

If I remember correctly, we can discover the storage server using a DNS name
or an IP address. But when vdsm returns the discovered nodes to engine, it
always returns IP addresses; engine keeps the IP address in the database and
connects the storage using the IP address later.

Shir, please check if this happens when adding a storage domain from the UI.

If this happens when using the UI, engine may need a fix.

If this happens only when using the SDK, then this is wrong usage of the
SDK, which should be blocked by engine.

Comment 22 Shir Fishbain 2020-05-26 15:57:28 UTC
Created attachment 1692338 [details]
Screenshots

Comment 25 Nir Soffer 2020-06-08 16:48:33 UTC
(In reply to Shir Fishbain from comment #21)
> (In reply to Nir Soffer from comment #20)
> > (In reply to Chris Leech from comment #19)
> > I think the RHV is not adding multiple sessions for same interface, and this
> > issue is a user issue configuring the system incorrectly, so RHV should not
> > be affected by the change to allow only one session per interface.
> > 
> > If I remember correctly, we can discover storage server using DNS or ip
> > address. But when vdsm return the discovered nodes to engine, it always
> > return ip address, and engine keep the ip address in the database, and
> > connect the storage using ip address later.
> 
> > Shir, please check if this happens when adding storage domain from the UI.
> yes, it always returns an IP address in the UI. Adding 2 screenshots.

The screenshots do not help. We need to know the exact flow and see detailed
logs showing the interaction between engine, vdsm, and iscsiadm.

Lets try this:

1. Put the host in maintenance and stop vdsm.

2. Clean the host of all iSCSI sessions and nodes.

3. Turn on DEBUG level logging in vdsm so we don't miss any details.

4. Start vdsm

5. Discover the iSCSI target using the DNS name
   (e.g 3par-iscsi-1.scl.lab.tlv.redhat.com)

6. Login to the relevant target

7. Add storage domain using a LUN from the new target.

Does it work?

Now repeat the test with the IP address instead of the DNS name.

Attach the engine and vdsm logs covering the time from when vdsm was
started until the end of the test.

Attach an engine database dump so we can inspect the information engine 
kept about the connections.

vdsm logs will show the target info sent by engine and the exact commands
used by vdsm.
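Steps 1-4 above might look roughly like this on the host itself; service names and the logger path are assumptions based on a typical vdsm install, and the host must already be in maintenance via engine:

```shell
systemctl stop vdsmd                 # step 1: stop vdsm (host in maintenance)
iscsiadm -m session -u || true       # step 2: log out of any active sessions
rm -rf /var/lib/iscsi/nodes/*        #         and remove stale node records
# step 3: raise vdsm logging to DEBUG (e.g. via /etc/vdsm/logger.conf),
#         then restart the service so the new level takes effect
systemctl start vdsmd                # step 4
```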

Comment 26 Michal Skrivanek 2020-06-18 14:31:20 UTC
if we now suspect wrong usage or possibly an engine bug, please move that discussion back to bug 1788631; this one tracks the RHEL issue, which per comment #19 is intentional -> should be closed NOTABUG

Comment 28 Shir Fishbain 2020-06-20 23:36:36 UTC
Created attachment 1698215 [details]
New_Logs

Comment 29 Avihai 2020-06-22 12:49:04 UTC
Hi Nir,

1) Please check out Shir's reply and the logs you requested.
The issue is still there, and this is a very common scenario that a user/customer will hit, so let's try to fix it before the 4.4 GA.

2) Do we need any fix from RHEL here? 

If not, please ACK and we will close this RHEL bug as suggested by Michal and move on to handle this in RHV bug 1788631.

If I understand correctly, the issue is caused by an intentional design change in RHEL8 (per comment #19) and does not affect RHEL users (they only see an error message during iSCSI login).

It looks like RHV should adjust to this change, as it cannot add a storage domain after this error is seen.

Comment 30 Nir Soffer 2020-06-23 13:08:07 UTC
Based on comment 19 this is expected behavior.

We will continue to work RHV side in bug 1788631.