Description of problem:

There are 3 targets for the iSCSI storage domain, in three different networks, reachable through the interfaces shown below.

====
iscsiadm -m session -P3 | egrep "Current Portal:|Iface Name:"
Current Portal: 192.168.1.2:3260,1
Iface Name: eth2
Current Portal: 192.168.3.4:3260,1
Iface Name: eth4
Current Portal: 192.168.2.4:3260,1
Iface Name: eth3
===

Three iSCSI multipath bonds are created under the Data Center, one per target. However, if any two of the targets go down and a host is activated from maintenance mode, the activation fails and the host goes into "Non Operational" status, because "ConnectStorageServerVDSCommand" times out on the engine side.

===
2020-01-01 00:52:26,067-05 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-68) [62de0282] Command 'ConnectStorageServerVDSCommand(HostName = 192.168.0.8, StorageServerConnectionManagementVDSParameters:{hostId='59fbfff0-826b-4cb9-a0ca-6a4bff87e41f', storagePoolId='7b3b5614-d62c-11e9-9f8c-566f0ebd0003', storageType='ISCSI', connectionList='[StorageServerConnections:{id='d480c8a1-f08b-49a5-ace2-19d89399f9c2', connection='192.168.1.2', iqn='iqn.2003-01.org.linux-iscsi.storage.x8664:sn.52fd3f4b2f0b', vfsType='null', mountOptions='null', nfsVersion='null', nfsRetrans='null', nfsTimeo='null', iface='eth2', netIfaceName='eth2'}, StorageServerConnections:{id='d0956332-bd3a-41d4-a381-5958d8557df2', connection='192.168.2.4', iqn='iqn.2003-01.org.linux-iscsi.storage.x8664:sn.52fd3f4b2f0b', vfsType='null', mountOptions='null', nfsVersion='null', nfsRetrans='null', nfsTimeo='null', iface='eth3', netIfaceName='eth3'}, StorageServerConnections:{id='374247ba-68c7-4beb-a64b-d89084ab7670', connection='192.168.3.4', iqn='iqn.2003-01.org.linux-iscsi.storage.x8664:sn.52fd3f4b2f0b', vfsType='null', mountOptions='null', nfsVersion='null', nfsRetrans='null', nfsTimeo='null', iface='eth4', netIfaceName='eth4'}]', sendNetworkEventOnFailure='true'})' execution failed: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues
2020-01-01 00:52:26,078-05 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-21641) [62de0282] EVENT_ID: VDS_HOST_NOT_RESPONDING(9,027), Host 192.168.0.8 is not responding. Host cannot be fenced automatically because power management for the host is disabled.
2020-01-01 00:52:26,318-05 INFO [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (EE-ManagedThreadFactory-engineScheduled-Thread-68) [11871461] Host '192.168.0.8'(59fbfff0-826b-4cb9-a0ca-6a4bff87e41f) is already in NonOperational status for reason 'STORAGE_DOMAIN_UNREACHABLE'. SetNonOperationalVds command is skipped.
===

The issue is that connectStorageServer takes more than 3 minutes on the vdsm side, since each iSCSI login to an unreachable target takes around 2 minutes to time out.

===
2020-01-01 11:19:25,855+0530 INFO (jsonrpc/5) [vdsm.api] START connectStorageServer(domType=3, spUUID=u'7b3b5614-d62c-11e9-9f8c-566f0ebd0003', conList=[{u'netIfaceName': u'eth2', u'id': u'd480c8a1-f08b-49a5-ace2-19d89399f9c2', u'connection': u'192.168.1.2', u'iqn': u'iqn.2003-01.org.linux-iscsi.storage.x8664:sn.52fd3f4b2f0b', u'user': u'', u'tpgt': u'1', u'ipv6_enabled': u'false', u'ifaceName': u'eth2', u'password': '********', u'port': u'3260'}, {u'netIfaceName': u'eth3', u'id': u'd0956332-bd3a-41d4-a381-5958d8557df2', u'connection': u'192.168.2.4', u'iqn': u'iqn.2003-01.org.linux-iscsi.storage.x8664:sn.52fd3f4b2f0b', u'user': u'', u'tpgt': u'1', u'ipv6_enabled': u'false', u'ifaceName': u'eth3', u'password': '********', u'port': u'3260'}, {u'netIfaceName': u'eth4', u'id': u'374247ba-68c7-4beb-a64b-d89084ab7670', u'connection': u'192.168.3.4', u'iqn': u'iqn.2003-01.org.linux-iscsi.storage.x8664:sn.52fd3f4b2f0b', u'user': u'', u'tpgt': u'1', u'ipv6_enabled': u'false', u'ifaceName': u'eth4', u'password': '********', u'port': u'3260'}], options=None) from=::ffff:192.168.0.7,51106, flow_id=62de0282, task_id=9d1e713f-cffd-4147-b048-5f9164f24990 (api:48)
2020-01-01 11:19:26,390+0530 DEBUG (jsonrpc/5) [storage.Misc.excCmd] /usr/bin/taskset --cpu-list 0-3 /usr/bin/sudo -n /sbin/iscsiadm -m node -T iqn.2003-01.org.linux-iscsi.storage.x8664:sn.52fd3f4b2f0b -I eth3 -p 192.168.2.4:3260,1 -l (cwd None) (commands:198)
<----- Took 2 minutes to timeout --->
2020-01-01 11:21:26,610+0530 DEBUG (jsonrpc/5) [storage.Misc.excCmd] FAILED: <err> = 'iscsiadm: Could not login to [iface: eth3, target: iqn.2003-01.org.linux-iscsi.storage.x8664:sn.52fd3f4b2f0b, portal: 192.168.2.4,3260].\niscsiadm: initiator reported error (8 - connection timed out)\niscsiadm: Could not log into all portals\n'; <rc> = 8 (commands:219)
2020-01-01 11:21:27,084+0530 DEBUG (jsonrpc/5) [storage.Misc.excCmd] /usr/bin/taskset --cpu-list 0-3 /usr/bin/sudo -n /sbin/iscsiadm -m node -T iqn.2003-01.org.linux-iscsi.storage.x8664:sn.52fd3f4b2f0b -I eth4 -p 192.168.3.4:3260,1 -l (cwd None) (commands:198)
<----- Took 2 minutes to timeout --->
2020-01-01 11:23:27,296+0530 DEBUG (jsonrpc/5) [storage.Misc.excCmd] FAILED: <err> = 'iscsiadm: Could not login to [iface: eth4, target: iqn.2003-01.org.linux-iscsi.storage.x8664:sn.52fd3f4b2f0b, portal: 192.168.3.4,3260].\niscsiadm: initiator reported error (8 - connection timed out)\niscsiadm: Could not log into all portals\n'; <rc> = 8 (commands:219)
2020-01-01 11:23:27,892+0530 INFO (jsonrpc/5) [jsonrpc.JsonRpcServer] RPC call StoragePool.connectStorageServer succeeded in 242.04 seconds (__init__:312)
====

So connectStorageServer took around 4 minutes to complete; the engine times out before that and marks the host "Non Operational".

Version-Release number of selected component (if applicable):
vdsm-4.30.24-2.el7ev.x86_64
rhvm-4.3.6.7-0.1.el7.noarch

How reproducible:
100%

Steps to Reproduce:
1. Create an iSCSI storage with 3 target IPs.
2. Add the storage domain in RHV by logging in to all 3 targets.
3. Add 3 iSCSI bonds, one per target.
4. Shut down the interfaces of two targets on the storage side.
5. Put one host into maintenance mode and try to activate it. It will go into Non Operational status.

Actual results:
Host fails to activate in RHV and goes into Non Operational status when some of the iSCSI targets are down.

Expected results:
Since the storage is still available through one path, the host should not be marked "Non Operational". Either there should be a different timeout for connectStorageServer, or it should detect that an iSCSI target is inaccessible before trying to log in to it, so that it does not wait 2 minutes for each unavailable target to time out.

Additional info:
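One way to implement the "detect the targets are accessible before trying to log in" idea from Expected results would be a cheap TCP probe of each portal before running iscsiadm. This is only an illustrative sketch (portal_reachable is a hypothetical helper, not part of vdsm):

```python
import socket

def portal_reachable(host, port=3260, timeout=5):
    """Return True if a TCP connection to the iSCSI portal succeeds.

    An unreachable portal then costs ~`timeout` seconds instead of the
    ~120s iscsiadm login timeout (8 retries * 15s login_timeout).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A caller could then skip (or deprioritize) the iscsiadm login for portals where portal_reachable() returns False, while still reporting them as failed connections.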
*** Bug 1777888 has been marked as a duplicate of this bug. ***
Snippet from Vdsm hsm.connectStorageServer():

    res = []
    connections = []
    for conDef in conList:
        conInfo = _connectionDict2ConnectionInfo(domType, conDef)
        conObj = storageServer.ConnectionFactory.createConnection(conInfo)
        try:
            self._connectStorageOverIser(conDef, conObj, domType)
            conObj.connect()
        except Exception as err:
            self.log.error(
                "Could not connect to storageServer", exc_info=True)
            status, _ = self._translateConnectionError(err)
        else:
            status = 0
            connections.append(conObj)

        res.append({'id': conDef["id"], 'status': status})

Engine uses the 'vdsTimeout' config (default is 180 seconds) to determine that a vds command has timed out. The iscsiadm default command execution timeout is 120 seconds (set by node.session.timeo.replacement_timeout in the iscsi conf).

If we would parallelize the connections loop, we could* finish all connect() attempts within 2 minutes, and the Engine is not expected to deem this a timed-out vds command.

* Disclaimers:

The IscsiConnection.connect() runs the following commands, which have to be executed in sequence:

1. iscsiadm new node
2. iscsiadm login node
3. iscsiadm update node
4. udev settle (for 5 seconds by default)

If any command in (1-3) hits a timeout we break from connect() with an IscsiNodeError exception; however, a bad scenario could be that we spend 1:59 minutes on each step successfully, totalling 6 minutes for the entire connect() flow.

So I think that we should set the default iscsiadm timeout to 40 seconds tops, assuming we go for parallelizing the connect() flows per each target connection.
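The parallelization suggested above could look roughly like this. A sketch only, using concurrent.futures instead of vdsm's own thread helpers; connect_all and the connection objects are stand-ins for the real hsm/IscsiConnection code:

```python
from concurrent.futures import ThreadPoolExecutor

def connect_all(con_objs, con_ids):
    """Run all connect() attempts concurrently so the total wall time is
    bounded by the slowest single attempt (~120s) rather than their sum."""
    def attempt(pair):
        con_id, con_obj = pair
        try:
            con_obj.connect()  # may block up to the iscsiadm login timeout
            status = 0
        except Exception:
            status = 1  # real vdsm maps errors via _translateConnectionError()
        return {'id': con_id, 'status': status}

    # One worker per connection; map() preserves input order in its results.
    with ThreadPoolExecutor(max_workers=max(len(con_objs), 1)) as pool:
        return list(pool.map(attempt, zip(con_ids, con_objs)))
```

With 3 unreachable targets this bounds the whole call near one login timeout instead of three, which is exactly the property the comment is after (modulo the disclaimers about the sequential steps inside each connect()).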
(In reply to Amit Bawer from comment #3)
> Snippet from Vdsm hsm.connectStorageServer():
>
>     res = []
>     connections = []
>     for conDef in conList:
>         conInfo = _connectionDict2ConnectionInfo(domType, conDef)
>         conObj = storageServer.ConnectionFactory.createConnection(conInfo)
>         try:
>             self._connectStorageOverIser(conDef, conObj, domType)
>             conObj.connect()
>         except Exception as err:
>             self.log.error(
>                 "Could not connect to storageServer", exc_info=True)
>             status, _ = self._translateConnectionError(err)
>         else:
>             status = 0
>             connections.append(conObj)
>
>         res.append({'id': conDef["id"], 'status': status})
>
> Engine uses 'vdsTimeout' config (default is 180 seconds) to determine that
> vds command has timed out.
> iscsiadm default command execution timeout is 120 seconds (set by
> node.session.timeo.replacement_timeout in iscsi conf).

In my recreation, and also in the original vdsm.log, the hanging command is the iscsiadm login (-l), for 120 seconds. The iscsiadm login timeout is controlled by the following /etc/iscsi/iscsid.conf parameters:

node.session.initial_login_retry_max = 8
node.conn[0].timeo.login_timeout = 15

which set the iscsiadm -l command timeout to 8*15 = 120 seconds by default.

Setting "node.session.initial_login_retry_max = 1" and restarting iscsid on the host allowed connecting to the SD when 3 of the 4 target interfaces were down while re-activating the host from maintenance, so the current workaround for this issue is a host-side iscsid configuration change.

> If we would parallelize the connections loop, we could* finish all connect()
> attempts within 2 minutes and Engine is not expected to deem this as a
> timed-out vds command.
>
> * Disclaimers:
>
> The IscsiConnection.connect() runs the following commands which have to be
> executed in sequence:
>
> 1. iscsiadm new node
> 2. iscsiadm login node
> 3. iscsiadm update node
> 4. udev settle (for 5 seconds by default)
>
> If any command in (1-3) hits timeout we break from connect() with
> IscsiNodeError exception, however bad scenario could be that we spend
> 1:59 minutes on each step successfully, totalling 6 minutes for the entire
> connect() flow.
>
> So I think that we should set the default iscsiadm timeout to 40 seconds
> tops, assuming we go for parallelizing the connect() flows per each target
> connection.
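The workaround described above boils down to the following /etc/iscsi/iscsid.conf settings (followed by an iscsid restart on the host). The values are taken from this comment; note that shortening the retries affects every iSCSI login on the host, so this is a trade-off rather than a definitive fix:

```
# /etc/iscsi/iscsid.conf
# Default login budget is 8 retries * 15s = 120s per unreachable portal.
# With a single retry, a failed login gives up after ~15s.
node.session.initial_login_retry_max = 1
node.conn[0].timeo.login_timeout = 15
```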
Does this track making the calls in parallel, changing the iscsid.conf timeout config to 40s, or both?
It tracks the issue and the possible resolutions:
a. Parallelize the iSCSI connect calls for multiple target interfaces.
b. Reduce the wait time for a login attempt on an offline target interface.
c. A combination of the above.
It seems that concurrent iscsiadm logins cannot be achieved, as iscsiadm itself is racy and we have a transaction lock for vdsm iscsiadm command sessions, added back in https://gerrit.ovirt.org/#/c/2750/ for bz802347. So using a thread per connection attempt will be effectively sequential, since each thread would have to wait for another failing interface login to spend its 120 seconds of retries before acquiring the transaction lock. Hence IMO it should be a config change for iscsiadm to shorten the retry period in case of login failures.
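The "effectively sequential" claim can be demonstrated with a toy model: with a global lock around each login, N concurrent attempts take roughly N times one attempt. Here time.sleep stands in for a hanging iscsiadm login and the lock stands in for vdsm's iscsiadm transaction lock:

```python
import threading
import time

_iscsiadm_lock = threading.Lock()  # stand-in for vdsm's transaction lock

def login(delay):
    # Each "login" must hold the global lock, so concurrent callers
    # serialize and the total wall time approaches sum(delays).
    with _iscsiadm_lock:
        time.sleep(delay)

start = time.monotonic()
threads = [threading.Thread(target=login, args=(0.1,)) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.monotonic() - start
# elapsed is ~0.3s (3 * 0.1s), not ~0.1s: the lock defeats the parallelism
```

Scaled up to the real numbers (120s per failed login), three threads behind the lock still spend ~360s, which is why a thread-per-connection change alone does not help.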
(In reply to Amit Bawer from comment #8)
> Seems that concurrent iscsiadm logins cannot be achieved as iscsiadm itself
> is racy and we have transaction lock for vdsm iscsiadm command sessions added
> back on https://gerrit.ovirt.org/#/c/2750/ for bz802347.

Bug 802347 does not contain any useful information about the problem, and there is no analysis of the issue. There is also no input from iscsiadm developers about the "race" in iscsiadm.

The patch also does not contain any justification for the global lock it added.

I think we need to check whether the lock is really required and remove it if possible.

We also need to discuss with the iscsiadm authors why we have so many login attempts and whether using a lower number of attempts is a good idea.
(In reply to Nir Soffer from comment #9)
> (In reply to Amit Bawer from comment #8)
> > Seems that concurrent iscsiadm logins cannot be achieved as iscsiadm itself
> > is racy and we have transaction lock for vdsm iscsiadm command sessions added
> > back on https://gerrit.ovirt.org/#/c/2750/ for bz802347.
>
> Bug 802347 does not contain any useful information about the problem, and
> there is no analysis of the issue. There is also no input from iscsiadm
> developers about the "race" in iscsiadm.
>
> The patch also does not contain any justification about the global lock
> added.
>
> I think we need to check if the lock is really required and remove it if
> possible.

Would try a lockless version.

> We also need to discuss with iscsiadm authors why we have so many login
> attempts and if using lower number of attempts is a good idea.

Raised questions at the open-iscsi mailing list:
https://groups.google.com/g/open-iscsi/c/OHOdIm1W274/m/KW-2saC7AgAJ
According to the reply from the open-iscsi maintainer about iscsiadm being racy [1], and the iSCSI login errors I got when trying a parallel lockless version, it seems more promising at this point to use only a configuration change, as applied in the attached patch.

[1] https://groups.google.com/forum/embed/#!msg/open-iscsi/OHOdIm1W274/9l5NcPQHBAAJ
Pending input from open-iscsi [2] about testing different login schemes:

We have a limitation on our RHEV system where all logins to the listed iSCSI targets should finish within 180 seconds in total. In our current implementation we serialize the iscsiadm node logins one after the other, each for a specific target and portal. In such a scheme, each login would wait 120 seconds in case a portal is down (default 15 seconds login timeout * 8 login retries), so if we have 2 or more connections down, we spend at least 240 seconds, which exceeds our 180 seconds time limit and the entire operation is considered to have failed (RHEV-wise).

Testing [1] different login schemes is summarized in the following table (logins to 2 targets with 2 portals each). It seems that logging in to all nodes after creating them, as suggested in the previous answer here, compares in total time spent with doing specific node logins concurrently (i.e. running iscsiadm -m node -T target -p portal -I interface -l in parallel per each target-portal), both when all portals are online and when one portal is down:

Login scheme               On-line Portals   Active Sessions   Total Login Time (sec)
-------------------------------------------------------------------------------------
All at once                2/2               4                 2.1
All at once                1/2               2                 120.2
Serial target-portal       2/2               4                 8.5
Serial target-portal       1/2               2                 243.5
Concurrent target-portal   2/2               4                 2.1
Concurrent target-portal   1/2               2                 120.1

Using concurrent target-portal logins seems preferable from our perspective, as it allows connecting only to the specified target and portals, without the risk of intermixing with other potential iSCSI targets.

The node creation part is kept serial in all tests here, as we have seen it may result in iscsi DB issues if run in parallel. But using only the node logins in parallel doesn't seem to have issues for at least 1000 tries in our tests.

The question to be asked here: is this advisable by open-iscsi? I know I have been answered already that iscsiadm is racy, but does that apply to node logins as well?

The other option is to use one login-all call without parallelism, but that would have other implications on our system to consider.

[1] https://gerrit.ovirt.org/#/c/110432
[2] https://groups.google.com/g/open-iscsi/c/OHOdIm1W274/m/XpFw5ikzBwAJ
Chris, could you provide your advice about concurrent iscsiadm logins discussed in comment #12?
Another point I'd like to ask about is the iSER fallback that we have. Currently, during the connection flow, we check whether 'iser' is set in iscsi_default_ifaces in our configuration. If yes, we first check whether it works on the server side by attempting:

iscsiadm -m node -T target -I iser -p portal -l
iscsiadm -m node -T target -I iser -p portal -u

If the login/logout worked, the 'iser' interface setup is kept instead of 'default'; otherwise it falls back to 'default'. This is used later for the actual node login. The thing is that this check can also waste valuable time when the portal is down. Is there a way to fall back within the iscsiadm command itself, or to prefer a specific interface type when trying all/parallel logins for the same target+portal but with different interface types?
(In reply to Amit Bawer from comment #12)
> Pending input from open-iscsi [2] about testing different login schemes:
>
> We have a limitation on our RHEV system where all logins to listed iscsi
> targets should finish within 180 seconds in total.
> In our current implementation we serialize the iscsiadm node logins one
> after the other, each is for specific target and portal. In such scheme,
> each login would wait 120 seconds in case a portal is down (default 15
> seconds login timeout * 8 login retries), so if we have 2 or more
> connections down, we spend at least 240 seconds which exceeds our 180
> seconds time limit and the entire operation is considered to be failed
> (RHEV-wise).
>
> Testing [1] different login schemes is summarized in the following table
> (logins to 2 targets with 2 portals each).
> It seems that either login-all nodes after creating them, as suggested in
> previous answer here, compares in total time spent with doing specific node
> logins concurrently (i.e. running iscsiadm -m node -T target -p portal -I
> interface -l in parallel per each target-portal), for both cases of all
> portals being online and when one portal is down:
>
> Login scheme               On-line Portals   Active Sessions   Total Login Time (sec)
> -------------------------------------------------------------------------------------
> All at once                2/2               4                 2.1
> All at once                1/2               2                 120.2
> Serial target-portal       2/2               4                 8.5
> Serial target-portal       1/2               2                 243.5
> Concurrent target-portal   2/2               4                 2.1
> Concurrent target-portal   1/2               2                 120.1
>
> Using concurrent target-portal logins seems to be preferable in our
> perspective as it allows to connect only to the specified target and
> portals without the risk of intermixing with other potential iscsi targets.
>
> The node creation part is kept serial in all tests here and we have seen it
> may result in the iscsi DB issues if run in parallel.
> But using only node logins in parallel doesn't seems to have issues for at
> least 1000 tries of our tests.
>
> The question to be asked here is it advisable by open-iscsi?
> I know I have been answered already that iscsiadm is racy, but does it
> applies to node logins as well?
>
> The other option is to use one login-all call without parallelism, but that
> would have other implications on our system to consider.
>
> [1] https://gerrit.ovirt.org/#/c/110432
> [2] https://groups.google.com/g/open-iscsi/c/OHOdIm1W274/m/XpFw5ikzBwAJ

(apologies for the stale needinfos, but I see this is still active again, so here goes)

Concurrent iscsiadm commands should be fine: iscsid processes them in the order received and handles async tasks for long-running commands. The iscsiadm-to-iscsid control messaging is over connected UNIX socketpairs, so responses are targeted to the correct iscsiadm process. With a single iscsid directing the kernel code over netlink, there should be no issues.

The big iscsistart issues were a combination of netlink responses from the kernel being broadcast, confusing multiple iscsistart "back half" (essentially iscsid) instances, and UNIX abstract namespace collisions causing multiple iscsistart "front halves" (essentially iscsiadm) to end up talking to a single back half, which could be terminated before handling all the requests. These should be addressed now, but using iscsid and iscsiadm is preferred.
Verification flow:
==================
* create 3 NICs on the iSCSI target server (if it's a VM, by adding NICs into HW)
* create 3 portals on the iSCSI target server (e.g. targetcli /iscsi/iqn.2016-01.com.ovirt:444/tpg1/portals create $PORTAL_IP)
* in oVirt discover all three portals and log into all of them
* create an iSCSI SD from LUN(s) provided by these portals
* put one of the hosts into maintenance
* make sure the host is already disconnected from all iSCSI targets (e.g. by running iscsiadm -m session -P3 | egrep "Current Portal:|Iface Name:" - it should not show anything)
* on the iSCSI target put 2 NICs down (e.g. nmcli connection down $CONN_NAME)
* put the host back online

Without fix:
============
The engine command times out:

2021-12-20 06:42:41,010-05 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-86) [4bf1869b] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM stream8-host-test1 command ConnectStorageServerVDS failed: Message timeout which can be caused by communication issues
...
2021-12-20 06:42:41,035-05 ERROR [org.ovirt.engine.core.bll.storage.pool.ConnectHostToStoragePoolServersCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-86) [4bf1869b] Command 'org.ovirt.engine.core.bll.storage.pool.ConnectHostToStoragePoolServersCommand' failed: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues (Failed with error VDS_NETWORK_ERROR and code 5022)

and the host is set as non-operational:

2021-12-20 06:42:41,114-05 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-86) [274c39b] EVENT_ID: VDS_SET_NONOPERATIONAL_DOMAIN(522), Host stream8-host-test1 cannot access the Storage Domain(s) <UNKNOWN> attached to the Data Center Default. Setting Host state to Non-Operational.

With fix (vdsm log with debug enabled):
=======================================
1. API call
-------------
2021-12-20 07:39:13,888-0500 INFO (jsonrpc/7) [vdsm.api] START connectStorageServer(domType=3, spUUID='19016d46-17b4-11ec-9517-5254003aa1f8', conList=[{'password': '********', 'port': '3260', 'iqn': 'iqn.2016-01.com.ovirt:444', 'connection': '192.168.122.254', 'ipv6_enabled': 'false', 'id': '994a711a-60f3-411a-aca2-0b60f01e8b8c', 'user': '', 'tpgt': '1'}, {'password': '********', 'port': '3260', 'iqn': 'iqn.2016-01.com.ovirt:444', 'connection': '192.168.122.115', 'ipv6_enabled': 'false', 'id': '60b3ae07-db32-470d-9ed8-d1f99efcc99c', 'user': '', 'tpgt': '1'}, {'password': '********', 'port': '3260', 'iqn': 'iqn.2016-01.com.ovirt:444', 'connection': '192.168.122.210', 'ipv6_enabled': 'false', 'id': '14cd4d9d-d7ee-4a3a-9928-d9c37ec34a6f', 'user': '', 'tpgt': '1'}, {'password': '********', 'port': '3260', 'iqn': 'iqn.2016-01.com.ovirt:444', 'connection': '192.168.122.235', 'ipv6_enabled': 'false', 'id': 'f10ac130-6988-4e34-ae46-fef4fd9e7de2', 'user': '', 'tpgt': '1'}, {'password': '********', 'port': '3260', 'iqn': 'iqn.2016-01.com.ovirt:444', 'connection': '192.168.122.72', 'ipv6_enabled': 'false', 'id': '971386cf-1b46-4dea-92f4-9439237c9767', 'user': '', 'tpgt': '1'}]) from=::ffff:192.168.122.192,55666, flow_id=27e1c881, task_id=6b67e007-7e70-4dc8-995f-7d4146d814ee (api:48)

2.
Serial part - setting up nodes
--------------------------------------------
2021-12-20 07:39:13,888-0500 DEBUG (jsonrpc/7) [common.commands] /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /sbin/iscsiadm -m node -T iqn.2016-01.com.ovirt:444 -I default -p 192.168.122.254:3260,1 --op=new (cwd None) (commands:154)
2021-12-20 07:39:13,929-0500 DEBUG (jsonrpc/7) [common.commands] SUCCESS: <err> = b''; <rc> = 0 (commands:99)
2021-12-20 07:39:13,929-0500 DEBUG (jsonrpc/7) [common.commands] /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /sbin/iscsiadm -m node -T iqn.2016-01.com.ovirt:444 -I default -p 192.168.122.254:3260,1 -n node.startup -v manual --op=update (cwd None) (commands:154)
2021-12-20 07:39:13,968-0500 DEBUG (jsonrpc/7) [common.commands] SUCCESS: <err> = b''; <rc> = 0 (commands:99)
2021-12-20 07:39:13,968-0500 DEBUG (jsonrpc/7) [common.commands] /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /sbin/iscsiadm -m iface -I default (cwd None) (commands:154)
2021-12-20 07:39:14,008-0500 DEBUG (jsonrpc/7) [common.commands] SUCCESS: <err> = b''; <rc> = 0 (commands:99)
2021-12-20 07:39:14,009-0500 DEBUG (jsonrpc/7) [storage.iscsi] iface.net_ifacename not provided, skipping rp filter setup (iscsi:586)
2021-12-20 07:39:14,009-0500 DEBUG (jsonrpc/7) [common.commands] /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /sbin/iscsiadm -m node -T iqn.2016-01.com.ovirt:444 -I default -p 192.168.122.115:3260,1 --op=new (cwd None) (commands:154)
2021-12-20 07:39:14,043-0500 DEBUG (jsonrpc/7) [common.commands] SUCCESS: <err> = b''; <rc> = 0 (commands:99)
2021-12-20 07:39:14,044-0500 DEBUG (jsonrpc/7) [common.commands] /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /sbin/iscsiadm -m node -T iqn.2016-01.com.ovirt:444 -I default -p 192.168.122.115:3260,1 -n node.startup -v manual --op=update (cwd None) (commands:154)
2021-12-20 07:39:14,073-0500 DEBUG (jsonrpc/7) [common.commands] SUCCESS: <err> = b''; <rc> = 0 (commands:99)
2021-12-20 07:39:14,074-0500 DEBUG (jsonrpc/7) [common.commands] /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /sbin/iscsiadm -m iface -I default (cwd None) (commands:154)
2021-12-20 07:39:14,101-0500 DEBUG (jsonrpc/7) [common.commands] SUCCESS: <err> = b''; <rc> = 0 (commands:99)
2021-12-20 07:39:14,102-0500 DEBUG (jsonrpc/7) [storage.iscsi] iface.net_ifacename not provided, skipping rp filter setup (iscsi:586)
2021-12-20 07:39:14,102-0500 DEBUG (jsonrpc/7) [common.commands] /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /sbin/iscsiadm -m node -T iqn.2016-01.com.ovirt:444 -I default -p 192.168.122.210:3260,1 --op=new (cwd None) (commands:154)
2021-12-20 07:39:14,129-0500 DEBUG (jsonrpc/7) [common.commands] SUCCESS: <err> = b''; <rc> = 0 (commands:99)
2021-12-20 07:39:14,129-0500 DEBUG (jsonrpc/7) [common.commands] /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /sbin/iscsiadm -m node -T iqn.2016-01.com.ovirt:444 -I default -p 192.168.122.210:3260,1 -n node.startup -v manual --op=update (cwd None) (commands:154)
2021-12-20 07:39:14,156-0500 DEBUG (jsonrpc/7) [common.commands] SUCCESS: <err> = b''; <rc> = 0 (commands:99)
2021-12-20 07:39:14,156-0500 DEBUG (jsonrpc/7) [common.commands] /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /sbin/iscsiadm -m iface -I default (cwd None) (commands:154)
2021-12-20 07:39:14,183-0500 DEBUG (jsonrpc/7) [common.commands] SUCCESS: <err> = b''; <rc> = 0 (commands:99)
2021-12-20 07:39:14,183-0500 DEBUG (jsonrpc/7) [storage.iscsi] iface.net_ifacename not provided, skipping rp filter setup (iscsi:586)
2021-12-20 07:39:14,184-0500 DEBUG (jsonrpc/7) [common.commands] /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /sbin/iscsiadm -m node -T iqn.2016-01.com.ovirt:444 -I default -p 192.168.122.235:3260,1 --op=new (cwd None) (commands:154)
2021-12-20 07:39:14,211-0500 DEBUG (jsonrpc/7) [common.commands] SUCCESS: <err> = b''; <rc> = 0 (commands:99)
2021-12-20 07:39:14,211-0500 DEBUG (jsonrpc/7) [common.commands] /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /sbin/iscsiadm -m node -T iqn.2016-01.com.ovirt:444 -I default -p 192.168.122.235:3260,1 -n node.startup -v manual --op=update (cwd None) (commands:154)
2021-12-20 07:39:14,238-0500 DEBUG (jsonrpc/7) [common.commands] SUCCESS: <err> = b''; <rc> = 0 (commands:99)
2021-12-20 07:39:14,238-0500 DEBUG (jsonrpc/7) [common.commands] /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /sbin/iscsiadm -m iface -I default (cwd None) (commands:154)
2021-12-20 07:39:14,266-0500 DEBUG (jsonrpc/7) [common.commands] SUCCESS: <err> = b''; <rc> = 0 (commands:99)
2021-12-20 07:39:14,266-0500 DEBUG (jsonrpc/7) [storage.iscsi] iface.net_ifacename not provided, skipping rp filter setup (iscsi:586)
2021-12-20 07:39:14,266-0500 DEBUG (jsonrpc/7) [common.commands] /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /sbin/iscsiadm -m node -T iqn.2016-01.com.ovirt:444 -I default -p 192.168.122.72:3260,1 --op=new (cwd None) (commands:154)
2021-12-20 07:39:14,294-0500 DEBUG (jsonrpc/7) [common.commands] SUCCESS: <err> = b''; <rc> = 0 (commands:99)
2021-12-20 07:39:14,294-0500 DEBUG (jsonrpc/7) [common.commands] /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /sbin/iscsiadm -m node -T iqn.2016-01.com.ovirt:444 -I default -p 192.168.122.72:3260,1 -n node.startup -v manual --op=update (cwd None) (commands:154)
2021-12-20 07:39:14,321-0500 DEBUG (jsonrpc/7) [common.commands] SUCCESS: <err> = b''; <rc> = 0 (commands:99)
2021-12-20 07:39:14,322-0500 DEBUG (jsonrpc/7) [common.commands] /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /sbin/iscsiadm -m iface -I default (cwd None) (commands:154)
2021-12-20 07:39:14,350-0500 DEBUG (jsonrpc/7) [common.commands] SUCCESS: <err> = b''; <rc> = 0 (commands:99)

3.
Concurrent part - login start
------------------------------------------
2021-12-20 07:39:14,351-0500 DEBUG (iscsi-login/0) [root] START thread <Thread(iscsi-login/0, started daemon 139636961634048)> (func=<function tmap.<locals>.worker at 0x7effd843ca60>, args=(), kwargs={}) (concurrent:258)
2021-12-20 07:39:14,351-0500 DEBUG (iscsi-login/0) [common.commands] /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /sbin/iscsiadm -m node -T iqn.2016-01.com.ovirt:444 -I default -p 192.168.122.254:3260,1 -l (cwd None) (commands:154)
2021-12-20 07:39:14,354-0500 DEBUG (iscsi-login/1) [root] START thread <Thread(iscsi-login/1, started daemon 139636928063232)> (func=<function tmap.<locals>.worker at 0x7effd843ca60>, args=(), kwargs={}) (concurrent:258)
2021-12-20 07:39:14,356-0500 DEBUG (iscsi-login/1) [common.commands] /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /sbin/iscsiadm -m node -T iqn.2016-01.com.ovirt:444 -I default -p 192.168.122.115:3260,1 -l (cwd None) (commands:154)
2021-12-20 07:39:14,362-0500 DEBUG (iscsi-login/2) [root] START thread <Thread(iscsi-login/2, started daemon 139636902885120)> (func=<function tmap.<locals>.worker at 0x7effd843ca60>, args=(), kwargs={}) (concurrent:258)
2021-12-20 07:39:14,364-0500 DEBUG (iscsi-login/2) [common.commands] /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /sbin/iscsiadm -m node -T iqn.2016-01.com.ovirt:444 -I default -p 192.168.122.210:3260,1 -l (cwd None) (commands:154)
2021-12-20 07:39:14,370-0500 DEBUG (iscsi-login/3) [root] START thread <Thread(iscsi-login/3, started daemon 139637372745472)> (func=<function tmap.<locals>.worker at 0x7effd843ca60>, args=(), kwargs={}) (concurrent:258)
2021-12-20 07:39:14,372-0500 DEBUG (iscsi-login/3) [common.commands] /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /sbin/iscsiadm -m node -T iqn.2016-01.com.ovirt:444 -I default -p 192.168.122.235:3260,1 -l (cwd None) (commands:154)
2021-12-20 07:39:14,378-0500 DEBUG (iscsi-login/4) [root] START thread <Thread(iscsi-login/4, started daemon 139636944848640)> (func=<function tmap.<locals>.worker at 0x7effd843ca60>, args=(), kwargs={}) (concurrent:258)
2021-12-20 07:39:14,379-0500 DEBUG (iscsi-login/4) [common.commands] /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /sbin/iscsiadm -m node -T iqn.2016-01.com.ovirt:444 -I default -p 192.168.122.72:3260,1 -l (cwd None) (commands:154)

4. Concurrent part - login to available targets succeeded
------------------------------------------------------------------------------
2021-12-20 07:39:14,516-0500 DEBUG (iscsi-login/1) [common.commands] SUCCESS: <err> = b''; <rc> = 0 (commands:99)
2021-12-20 07:39:14,516-0500 DEBUG (iscsi-login/1) [root] FINISH thread <Thread(iscsi-login/1, stopped daemon 139636928063232)> (concurrent:261)
2021-12-20 07:39:14,627-0500 DEBUG (iscsi-login/0) [common.commands] SUCCESS: <err> = b''; <rc> = 0 (commands:99)
2021-12-20 07:39:14,627-0500 DEBUG (iscsi-login/0) [root] FINISH thread <Thread(iscsi-login/0, stopped daemon 139636961634048)> (concurrent:261)
2021-12-20 07:39:14,845-0500 DEBUG (iscsi-login/2) [common.commands] SUCCESS: <err> = b''; <rc> = 0 (commands:99)
2021-12-20 07:39:14,845-0500 DEBUG (iscsi-login/2) [root] FINISH thread <Thread(iscsi-login/2, stopped daemon 139636902885120)> (concurrent:261)

5. Concurrent part - login to unavailable targets times out
--------------------------------------------------------------------------------
2021-12-20 07:41:14,830-0500 DEBUG (iscsi-login/3) [common.commands] FAILED: <err> = b'iscsiadm: Could not login to [iface: default, target: iqn.2016-01.com.ovirt:444, portal: 192.168.122.235,3260].\niscsiadm: initiator reported error (8 - connection timed out)\niscsiadm: Could not log into all portals\n'; <rc> = 8 (commands:99)
2021-12-20 07:41:14,830-0500 DEBUG (iscsi-login/3) [common.commands] /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /sbin/iscsiadm -m iface (cwd None) (commands:154)
2021-12-20 07:41:14,840-0500 DEBUG (iscsi-login/4) [common.commands] FAILED: <err> = b'iscsiadm: Could not login to [iface: default, target: iqn.2016-01.com.ovirt:444, portal: 192.168.122.72,3260].\niscsiadm: initiator reported error (8 - connection timed out)\niscsiadm: Could not log into all portals\n'; <rc> = 8 (commands:99)
2021-12-20 07:41:14,843-0500 DEBUG (iscsi-login/4) [common.commands] /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /sbin/iscsiadm -m iface (cwd None) (commands:154)
2021-12-20 07:41:14,893-0500 DEBUG (iscsi-login/4) [common.commands] SUCCESS: <err> = b''; <rc> = 0 (commands:99)
2021-12-20 07:41:14,893-0500 DEBUG (iscsi-login/4) [common.commands] /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /sbin/iscsiadm -m node -T iqn.2016-01.com.ovirt:444 -I default -p 192.168.122.72:3260,1 -u (cwd None) (commands:154)
2021-12-20 07:41:14,903-0500 DEBUG (iscsi-login/3) [common.commands] SUCCESS: <err> = b''; <rc> = 0 (commands:99)
2021-12-20 07:41:14,930-0500 DEBUG (iscsi-login/4) [common.commands] FAILED: <err> = b'iscsiadm: No matching sessions found\n'; <rc> = 21 (commands:99)
2021-12-20 07:41:14,930-0500 DEBUG (iscsi-login/4) [common.commands] /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /sbin/iscsiadm -m iface (cwd None) (commands:154)
2021-12-20 07:41:14,958-0500 DEBUG (iscsi-login/4) [common.commands] SUCCESS: <err> = b''; <rc> = 0 (commands:99)
2021-12-20 07:41:14,959-0500 DEBUG (iscsi-login/4) [common.commands] /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /sbin/iscsiadm -m node -T iqn.2016-01.com.ovirt:444 -I default -p 192.168.122.72:3260,1 --op=delete (cwd None) (commands:154)
2021-12-20 07:41:14,988-0500 DEBUG (iscsi-login/4) [common.commands] SUCCESS: <err> = b''; <rc> = 0 (commands:99)
2021-12-20 07:41:14,988-0500 DEBUG (iscsi-login/4) [storage.iscsi] iface.net_ifacename not provided, skipping rp filter setup (iscsi:586)
2021-12-20 07:41:14,988-0500 DEBUG (iscsi-login/4) [root] FINISH thread <Thread(iscsi-login/4, stopped daemon 139636944848640)> (concurrent:261)
2021-12-20 07:41:14,989-0500 DEBUG (iscsi-login/3) [common.commands] /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /sbin/iscsiadm -m node -T iqn.2016-01.com.ovirt:444 -I default -p 192.168.122.235:3260,1 -u (cwd None) (commands:154)
2021-12-20 07:41:14,992-0500 ERROR (jsonrpc/7) [storage.storageServer] Could not login to storageServer (storageServer:637) NoneType: None
2021-12-20 07:41:15,018-0500 DEBUG (iscsi-login/3) [common.commands] FAILED: <err> = b'iscsiadm: No matching sessions found\n'; <rc> = 21 (commands:99)
2021-12-20 07:41:15,019-0500 DEBUG (iscsi-login/3) [common.commands] /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /sbin/iscsiadm -m iface (cwd None) (commands:154)
2021-12-20 07:41:15,046-0500 DEBUG (iscsi-login/3) [common.commands] SUCCESS: <err> = b''; <rc> = 0 (commands:99)
2021-12-20 07:41:15,046-0500 DEBUG (iscsi-login/3) [common.commands] /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /sbin/iscsiadm -m node -T iqn.2016-01.com.ovirt:444 -I default -p 192.168.122.235:3260,1 --op=delete (cwd None) (commands:154)
2021-12-20 07:41:15,075-0500 DEBUG (iscsi-login/3) [common.commands] SUCCESS: <err> = b''; <rc> = 0 (commands:99)
2021-12-20 07:41:15,075-0500 DEBUG (iscsi-login/3) [storage.iscsi] iface.net_ifacename not provided, skipping rp filter setup (iscsi:586)
2021-12-20
07:41:15,076-0500 DEBUG (iscsi-login/3) [root] FINISH thread <Thread(iscsi-login/3, stopped daemon 139637372745472)> (concurrent:261) 2021-12-20 07:41:15,076-0500 ERROR (jsonrpc/7) [storage.storageServer] Could not login to storageServer (storageServer:637) 6. Waiting for udev events ------------------------------------ 2021-12-20 07:41:15,076-0500 DEBUG (jsonrpc/7) [common.commands] /usr/bin/taskset --cpu-list 0-1 /sbin/udevadm settle --timeout=5 (cwd None) (commands:154) 2021-12-20 07:41:15,090-0500 DEBUG (jsonrpc/7) [common.commands] SUCCESS: <err> = b''; <rc> = 0 (commands:99) 7. Refreshing domain after login -------------------------------------------- 2021-12-20 07:41:15,091-0500 INFO (jsonrpc/7) [storage.storagedomaincache] Refreshing storage domain cache (resize=True) (sdc:80) 2021-12-20 07:41:15,091-0500 DEBUG (jsonrpc/7) [storage.samplingmethod] Trying to enter sampling method (vdsm.storage.iscsi.<function rescan at 0x7f0028047268>) (misc:343) 2021-12-20 07:41:15,091-0500 DEBUG (jsonrpc/7) [storage.samplingmethod] Got in to sampling method (misc:346) 2021-12-20 07:41:15,091-0500 INFO (jsonrpc/7) [storage.iscsi] Scanning iSCSI devices (iscsi:449) 2021-12-20 07:41:15,091-0500 DEBUG (jsonrpc/7) [common.commands] /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /sbin/iscsiadm -m session -R (cwd None) (commands:154) 2021-12-20 07:41:15,133-0500 INFO (jsonrpc/7) [storage.iscsi] Scanning iSCSI devices: 0.04 seconds (utils:390) 2021-12-20 07:41:15,133-0500 DEBUG (jsonrpc/7) [storage.samplingmethod] Returning last result (misc:353) 2021-12-20 07:41:15,133-0500 DEBUG (jsonrpc/7) [storage.samplingmethod] Trying to enter sampling method (vdsm.storage.hba.<function rescan at 0x7f00287b2d08>) (misc:343) 2021-12-20 07:41:15,133-0500 DEBUG (jsonrpc/7) [storage.samplingmethod] Got in to sampling method (misc:346) 2021-12-20 07:41:15,133-0500 INFO (jsonrpc/7) [storage.hba] Scanning FC devices (hba:59) 2021-12-20 07:41:15,190-0500 INFO (jsonrpc/7) [storage.hba] Scanning FC 
devices: 0.06 seconds (utils:390) 2021-12-20 07:41:15,190-0500 DEBUG (jsonrpc/7) [storage.samplingmethod] Returning last result (misc:353) 2021-12-20 07:41:15,190-0500 INFO (jsonrpc/7) [storage.multipath] Waiting until multipathd is ready (multipath:112) 2021-12-20 07:41:15,190-0500 DEBUG (jsonrpc/7) [common.commands] /usr/bin/taskset --cpu-list 0-1 /sbin/udevadm settle --timeout=10 (cwd None) (commands:154) 2021-12-20 07:41:15,205-0500 DEBUG (jsonrpc/7) [common.commands] SUCCESS: <err> = b''; <rc> = 0 (commands:99) 2021-12-20 07:41:17,266-0500 INFO (jsonrpc/7) [storage.multipath] Waited 2.08 seconds for multipathd (tries=2, ready=2) (multipath:139) 2021-12-20 07:41:17,266-0500 INFO (jsonrpc/7) [storage.multipath] Resizing multipath devices (multipath:220) 2021-12-20 07:41:17,283-0500 INFO (jsonrpc/7) [storage.multipath] Resizing multipath devices: 0.02 seconds (utils:390) 2021-12-20 07:41:17,283-0500 INFO (jsonrpc/7) [storage.storagedomaincache] Refreshing storage domain cache: 2.19 seconds (utils:390) 2021-12-20 07:41:17,283-0500 DEBUG (jsonrpc/7) [storage.samplingmethod] Returning last result (misc:353) 2021-12-20 07:41:17,288-0500 DEBUG (jsonrpc/7) [common.commands] /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /sbin/lvm vgs --config 'devices { preferred_names=["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter=["a|^/dev/mapper/36001405088203d71b564fe0aaac74f6c$|^/dev/mapper/3600140514f301cb91f04badae7506e46$|^/dev/mapper/360014055ba493238c484d9284005d1bf$|^/dev/mapper/360014059fff83490d9b4ddfbc7e43c20$|^/dev/mapper/36001405a77a1ee25cbf4439b7ddd2062$|^/dev/mapper/36001405cd5268d7e3b64b40a16200729$|^/dev/mapper/36001405e7bafcb012c74e59a51e98ccb$|", "r|.*|"] hints="none" obtain_device_list_from_udev=0 } global { prioritise_write_locks=1 wait_for_locks=1 use_lvmpolld=1 } backup { retain_min=50 retain_days=0 }' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o 
uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name (cwd None) (commands:154) 2021-12-20 07:41:17,358-0500 DEBUG (jsonrpc/7) [common.commands] SUCCESS: <err> = b''; <rc> = 0 (commands:186) 2021-12-20 07:41:17,358-0500 DEBUG (jsonrpc/7) [storage.hsm] Found SD uuids: ('66bbf96e-a291-4975-8773-4e6db522dd28', '7f61fd81-9d64-44ce-b9db-825befa0d0d5', 'ed3fb387-44be-4464-9625-f36b3a2cff19', 'f345fb3b-c0e9-461a-9f90-dcde6a1d9eb7') (hsm:2147) 2021-12-20 07:41:17,358-0500 INFO (jsonrpc/7) [storage.storagedomaincache] Removing domain 66bbf96e-a291-4975-8773-4e6db522dd28 from storage domain cache (sdc:211) 2021-12-20 07:41:17,358-0500 INFO (jsonrpc/7) [storage.storagedomaincache] Removing domain 7f61fd81-9d64-44ce-b9db-825befa0d0d5 from storage domain cache (sdc:211) 2021-12-20 07:41:17,358-0500 INFO (jsonrpc/7) [storage.storagedomaincache] Removing domain ed3fb387-44be-4464-9625-f36b3a2cff19 from storage domain cache (sdc:211) 2021-12-20 07:41:17,358-0500 INFO (jsonrpc/7) [storage.storagedomaincache] Removing domain f345fb3b-c0e9-461a-9f90-dcde6a1d9eb7 from storage domain cache (sdc:211) 2021-12-20 07:41:17,358-0500 DEBUG (jsonrpc/7) [storage.hsm] Found SD uuids: ('66bbf96e-a291-4975-8773-4e6db522dd28', '7f61fd81-9d64-44ce-b9db-825befa0d0d5', 'ed3fb387-44be-4464-9625-f36b3a2cff19', 'f345fb3b-c0e9-461a-9f90-dcde6a1d9eb7') (hsm:2147) 2021-12-20 07:41:17,359-0500 INFO (jsonrpc/7) [storage.storagedomaincache] Removing domain 66bbf96e-a291-4975-8773-4e6db522dd28 from storage domain cache (sdc:211) 2021-12-20 07:41:17,359-0500 INFO (jsonrpc/7) [storage.storagedomaincache] Removing domain 7f61fd81-9d64-44ce-b9db-825befa0d0d5 from storage domain cache (sdc:211) 2021-12-20 07:41:17,359-0500 INFO (jsonrpc/7) [storage.storagedomaincache] Removing domain ed3fb387-44be-4464-9625-f36b3a2cff19 from storage domain cache (sdc:211) 2021-12-20 07:41:17,359-0500 INFO (jsonrpc/7) [storage.storagedomaincache] Removing domain 
f345fb3b-c0e9-461a-9f90-dcde6a1d9eb7 from storage domain cache (sdc:211) 2021-12-20 07:41:17,359-0500 DEBUG (jsonrpc/7) [storage.hsm] Found SD uuids: ('66bbf96e-a291-4975-8773-4e6db522dd28', '7f61fd81-9d64-44ce-b9db-825befa0d0d5', 'ed3fb387-44be-4464-9625-f36b3a2cff19', 'f345fb3b-c0e9-461a-9f90-dcde6a1d9eb7') (hsm:2147) 2021-12-20 07:41:17,359-0500 INFO (jsonrpc/7) [storage.storagedomaincache] Removing domain 66bbf96e-a291-4975-8773-4e6db522dd28 from storage domain cache (sdc:211) 2021-12-20 07:41:17,359-0500 INFO (jsonrpc/7) [storage.storagedomaincache] Removing domain 7f61fd81-9d64-44ce-b9db-825befa0d0d5 from storage domain cache (sdc:211) 2021-12-20 07:41:17,359-0500 INFO (jsonrpc/7) [storage.storagedomaincache] Removing domain ed3fb387-44be-4464-9625-f36b3a2cff19 from storage domain cache (sdc:211) 2021-12-20 07:41:17,359-0500 INFO (jsonrpc/7) [storage.storagedomaincache] Removing domain f345fb3b-c0e9-461a-9f90-dcde6a1d9eb7 from storage domain cache (sdc:211) 2021-12-20 07:41:17,359-0500 DEBUG (jsonrpc/7) [storage.hsm] knownSDs: {eb405d82-0cd7-45c0-bba0-3795a34a6e5b: vdsm.storage.nfsSD.findDomain, 66bbf96e-a291-4975-8773-4e6db522dd28: vdsm.storage.blockSD.findDomain, 7f61fd81-9d64-44ce-b9db-825befa0d0d5: vdsm.storage.blockSD.findDomain, ed3fb387-44be-4464-9625-f36b3a2cff19: vdsm.storage.blockSD.findDomain, f345fb3b-c0e9-461a-9f90-dcde6a1d9eb7: vdsm.storage.blockSD.findDomain} (hsm:2200) 2021-12-20 07:41:17,359-0500 INFO (jsonrpc/7) [storage.storagedomaincache] Invalidating storage domain cache (sdc:74) 8. 
API return ----------------- 2021-12-20 07:41:17,359-0500 INFO (jsonrpc/7) [vdsm.api] FINISH connectStorageServer return={'statuslist': [{'id': '60b3ae07-db32-470d-9ed8-d1f99efcc99c', 'status': 0}, {'id': '994a711a-60f3-411a-aca2-0b60f01e8b8c', 'status': 0}, {'id': '14cd4d9d-d7ee-4a3a-9928-d9c37ec34a6f', 'status': 0}, {'id': '971386cf-1b46-4dea-92f4-9439237c9767', 'status': 465}, {'id': 'f10ac130-6988-4e34-ae46-fef4fd9e7de2', 'status': 465}]} from=::ffff:192.168.122.192,55666, flow_id=27e1c881, task_id=6b67e007-7e70-4dc8-995f-7d4146d814ee (api:54) Connections to unavailable portals time out, but host itself is NOT set as non-operational, works normally and SD is available on the host. This applies for 2 unavailable portals. By default vdsm use maximum 10 login thread, so if there are more than 10 unavailable portals, original issue still pop up and admin has to increase number of parallel iscsi logins in vdsm.
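The pattern shown in the logs above — logins issued from a bounded pool of worker threads, so a ~2-minute iscsiadm timeout against an unreachable portal no longer serializes the whole connectStorageServer call — can be sketched with Python's standard thread pool. This is a simplified illustration, not vdsm's actual code: `login_func` is a stand-in for the `iscsiadm ... -l` invocation, and 465 is the per-connection error status seen in the API return above.

```python
from concurrent.futures import ThreadPoolExecutor

# Matches vdsm's default of 10 parallel login threads.
MAX_LOGIN_THREADS = 10

def login_all(portals, login_func, max_workers=MAX_LOGIN_THREADS):
    """Log in to all portals concurrently; return {portal: status}.

    Total wall time is bounded by the slowest batch of logins rather
    than the sum of all timeouts, so a couple of unreachable portals
    (~2 minutes each) no longer push the call past the engine's
    3-minute message timeout.
    """
    statuses = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every login up front; the pool caps concurrency.
        futures = {portal: pool.submit(login_func, portal) for portal in portals}
        for portal, future in futures.items():
            try:
                future.result()
                statuses[portal] = 0    # login succeeded
            except Exception:
                statuses[portal] = 465  # iSCSI login error, as in the status list
    return statuses
```

With the five portals from the log, the three reachable logins finish in milliseconds while the two unreachable ones time out in parallel, matching the timeline above (starts at 07:39:14, failures at 07:41:14).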
Note: to change the number of parallel login threads, create a custom vdsm config file in /etc/vdsm/vdsm.conf.d/ (e.g. /etc/vdsm/vdsm.conf.d/99-admin.conf) containing:

[iscsi]
parallel_logins = 16

With this setting, a maximum of 16 login threads will be used. This config has to be pushed to all hosts.
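The drop-in file is plain INI and can be checked with Python's standard configparser. A minimal sketch of reading such an override (the file name and the default of 10 come from this report; the parsing code itself is illustrative, not vdsm's):

```python
import configparser

# Contents of the drop-in, e.g. /etc/vdsm/vdsm.conf.d/99-admin.conf
override = """
[iscsi]
parallel_logins = 16
"""

parser = configparser.ConfigParser()
parser.read_string(override)

# Fall back to the default of 10 threads when the key is absent.
workers = parser.getint("iscsi", "parallel_logins", fallback=10)
print(workers)  # 16
```

On a real host one would pass the file path to `parser.read()` instead of `read_string()`; the `fallback` keeps behavior unchanged on hosts where no drop-in was pushed.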
Thank you, Nir, for your description and response about the attached WARN. Moving to Verified.

Verified on:
rhv-4.5.0-596.7434eeb1dde2.226-DS-nightly
vdsm-4.50.0.8-1.el8ev.x86_64
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Low: RHV RHEL Host (ovirt-host) [ovirt-4.5.0] security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:4764