Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1777888

Summary: Cannot activate host when some iSCSI paths are failed
Product: Red Hat Enterprise Virtualization Manager
Reporter: Juan Orti <jortialc>
Component: vdsm
Assignee: Nobody <nobody>
Status: CLOSED DUPLICATE
QA Contact: Avihai <aefrat>
Severity: medium
Priority: unspecified
Version: 4.3.6
CC: lsurette, nashok, srevivo, ycui
Target Milestone: ---
Target Release: ---
Flags: lsvaty: testing_plan_complete-
Hardware: x86_64
OS: Linux
Doc Type: If docs needed, set a value
Story Points: ---
Last Closed: 2020-01-01 07:24:53 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: Storage
Cloudforms Team: ---

Description Juan Orti 2019-11-28 15:36:33 UTC
Description of problem:
When some iSCSI paths are down due to maintenance on the storage side, a host that was in maintenance mode cannot be activated, because it cannot log into all portals.

The remaining hosts and the storage are working fine.

Version-Release number of selected component (if applicable):
rhvh-4.3.6.2-0.20190924.0
vdsm-4.30.33-1.el7ev.x86_64
ovirt-engine-4.3.5.4-0.1.el7.noarch

How reproducible:
Always on the customer's setup; working to reproduce it locally.

Steps to Reproduce:
1. Make some iSCSI paths fail
2. Put the host into maintenance mode
3. Activate the host

Actual results:
The host cannot be activated because it cannot log into all portals.

Expected results:
The host should activate even when some iSCSI paths have failed.
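In other words, a login failure on one portal should not abort host activation while at least one path to the storage still works. A minimal sketch of that expected behavior (illustration only, not vdsm's actual code; `login` is a hypothetical stand-in for the iscsiadm node login call):

```python
def connect_portals(portals, login):
    """Attempt to log in to every portal; treat the connection as up if at
    least one login succeeds, instead of failing on the first bad portal.

    `login` stands in for the iscsiadm node login and raises on failure.
    """
    errors = {}
    connected = []
    for portal in portals:
        try:
            login(portal)
            connected.append(portal)
        except Exception as exc:
            errors[portal] = exc
    if not connected:
        # Only a total failure on all portals should prevent activation.
        raise RuntimeError("could not log in to any portal: %r" % errors)
    return connected, errors
```

With this shape, partial failures are reported (so the failed paths still show up in monitoring) but the host comes up on the surviving paths.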

Additional info:

vdsm.log:

2019-11-27 12:18:37,723-0300 INFO  (jsonrpc/2) [vdsm.api] START connectStorageServer(domType=3, spUUID=u'13aadec4-00a3-11ea-8697-00163e2d88c6', conList=[{u'netIfaceName': u'eno3', u'id': u'1a2607a3-a258-4e5f-a4c6-e55799fcad65', u'connection': u'192.168.1.1', u'iqn': u'iqn.2015-11.com.hpe:storage.model.1234567890', u'user': u'', u'tpgt': u'1', u'ipv6_enabled': u'false', u'ifaceName': u'eno3', u'password': '********', u'port': u'3260'}, {u'netIfaceName': u'eno7', u'id': u'dd1db0bc-f5ab-4564-9940-2e56ccd975c6', u'connection': u'192.168.2.1', u'iqn': u'iqn.2015-11.com.hpe:storage.model.1234567890', u'user': u'', u'tpgt': u'3', u'ipv6_enabled': u'false', u'ifaceName': u'eno7', u'password': '********', u'port': u'3260'}, {u'netIfaceName': u'eno3', u'id': u'983e6eea-a22a-48c5-8f33-286b990e16a2', u'connection': u'192.168.1.2', u'iqn': u'iqn.2015-11.com.hpe:storage.model.1234567890', u'user': u'', u'tpgt': u'2', u'ipv6_enabled': u'false', u'ifaceName': u'eno3', u'password': '********', u'port': u'3260'}, {u'netIfaceName': u'eno7', u'id': u'8bf8d908-1871-4329-96a1-6767d094f362', u'connection': u'192.168.2.2', u'iqn': u'iqn.2015-11.com.hpe:storage.model.1234567890', u'user': u'', u'tpgt': u'4', u'ipv6_enabled': u'false', u'ifaceName': u'eno7', u'password': '********', u'port': u'3260'}], options=None) from=::ffff:192.168.18.10,49718, flow_id=23d7c0cb, task_id=0d68a09e-fa0b-41b9-8a49-8a78134a228e (api:48)
2019-11-27 12:19:08,238-0300 WARN  (vdsm.Scheduler) [Executor] Worker blocked: <Worker name=jsonrpc/3 running <Task <JsonRpcTask {'params': {u'connectionParams': [{u'netIfaceName': u'eno7', u'id': u'c25bed96-b6bf-4f16-92ad-09149e7eeb55', u'connection': u'192.168.1.1', u'iqn': u'iqn.2015-11.com.hpe:storage.model.1234567890', u'user': u'', u'tpgt': u'1', u'ifaceName': u'eno7', u'password': '********', u'port': u'3260'}, {u'netIfaceName': u'eno7', u'id': u'622e2218-8fc3-4581-86a5-29c8a6164262', u'connection': u'192.168.2.1', u'iqn': u'iqn.2015-11.com.hpe:storage.model.1234567890', u'user': u'', u'tpgt': u'1', u'ifaceName': u'eno7', u'password': '********', u'port': u'3260'}, {u'netIfaceName': u'eno7', u'id': u'776aea8f-de90-4644-b0fe-6fa08b34bdc6', u'connection': u'192.168.1.2', u'iqn': u'iqn.2015-11.com.hpe:storage.model.1234567890', u'user': u'', u'tpgt': u'1', u'ifaceName': u'eno7', u'password': '********', u'port': u'3260'}, {u'netIfaceName': u'eno7', u'id': u'0b19edc6-8bd6-490d-ac98-e50bee91d039', u'connection': u'192.168.2.2', u'iqn': u'iqn.2015-11.com.hpe:storage.model.1234567890', u'user': u'', u'tpgt': u'1', u'ifaceName': u'eno7', u'password': '********', u'port': u'3260'}, {u'netIfaceName': u'eno3', u'id': u'1382f858-7aef-44db-ab8f-c95d90239b7a', u'connection': u'192.168.1.1', u'iqn': u'iqn.2015-11.com.hpe:storage.model.1234567890', u'user': u'', u'tpgt': u'1', u'ifaceName': u'eno3', u'password': '********', u'port': u'3260'}, {u'netIfaceName': u'eno3', u'id': u'bdd83218-0769-4852-9ceb-c577cbf0befc', u'connection': u'192.168.2.1', u'iqn': u'iqn.2015-11.com.hpe:storage.model.1234567890', u'user': u'', u'tpgt': u'1', u'ifaceName': u'eno3', u'password': '********', u'port': u'3260'}, {u'netIfaceName': u'eno3', u'id': u'99d1ed7d-dd12-48c9-8ee7-b80edaa673a8', u'connection': u'192.168.1.2', u'iqn': u'iqn.2015-11.com.hpe:storage.model.1234567890', u'user': u'', u'tpgt': u'1', u'ifaceName': u'eno3', u'password': '********', u'port': u'3260'}, {u'netIfaceName': 
u'eno3', u'id': u'8ca5c168-9c86-42af-ae28-9156ef4acf3b', u'connection': u'192.168.2.2', u'iqn': u'iqn.2015-11.com.hpe:storage.model.1234567890', u'user': u'', u'tpgt': u'1', u'ifaceName': u'eno3', u'password': '********', u'port': u'3260'}], u'storagepoolID': u'00000000-0000-0000-0000-000000000000', u'domainType': 3}, 'jsonrpc': '2.0', 'method': u'StoragePool.connectStorageServer', 'id': u'bb8b0514-fe33-428b-9b96-8d5cb0b78845'} at 0x7f3b88190850> timeout=60, duration=120.00 at 0x7f3b88190890> task#=0 at 0x7f3bc40d5350>, traceback:
File: "/usr/lib64/python2.7/threading.py", line 785, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 812, in __bootstrap_inner
  self.run()
File: "/usr/lib64/python2.7/threading.py", line 765, in run
  self.__target(*self.__args, **self.__kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/common/concurrent.py", line 195, in run
  ret = func(*args, **kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 301, in _run
  self._execute_task()
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315, in _execute_task
  task()
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 391, in __call__
  self._callable()
File: "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 262, in __call__
  self._handler(self._ctx, self._req)
File: "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 305, in _serveRequest
  response = self._handle_request(req, ctx)
File: "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 345, in _handle_request
  res = method(**params)
File: "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 194, in _dynamicMethod
  result = fn(*methodArgs)
File: "/usr/lib/python2.7/site-packages/vdsm/API.py", line 1102, in connectStorageServer
  connectionParams)
File: "/usr/lib/python2.7/site-packages/vdsm/storage/dispatcher.py", line 74, in wrapper
  result = ctask.prepare(func, *args, **kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 108, in wrapper
  return m(self, *a, **kw)
File: "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 1179, in prepare
  result = self._run(func, *args, **kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
  return fn(*args, **kargs)
File: "<string>", line 2, in connectStorageServer
File: "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 50, in method
  ret = func(*args, **kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 2417, in connectStorageServer
  conObj.connect()
File: "/usr/lib/python2.7/site-packages/vdsm/storage/storageServer.py", line 488, in connect
  iscsi.addIscsiNode(self._iface, self._target, self._cred)
File: "/usr/lib/python2.7/site-packages/vdsm/storage/iscsi.py", line 217, in addIscsiNode
  iscsiadm.node_login(iface.name, target.address, target.iqn)
File: "/usr/lib/python2.7/site-packages/vdsm/storage/iscsiadm.py", line 327, in node_login
  portal, "-l"])
File: "/usr/lib/python2.7/site-packages/vdsm/storage/iscsiadm.py", line 122, in _runCmd
  return misc.execCmd(cmd, printable=printCmd, sudo=True, sync=sync)
File: "/usr/lib/python2.7/site-packages/vdsm/common/commands.py", line 213, in execCmd
  (out, err) = p.communicate(data)
File: "/usr/lib64/python2.7/site-packages/subprocess32.py", line 924, in communicate
  stdout, stderr = self._communicate(input, endtime, timeout)
File: "/usr/lib64/python2.7/site-packages/subprocess32.py", line 1706, in _communicate
  orig_timeout)
File: "/usr/lib64/python2.7/site-packages/subprocess32.py", line 1779, in _communicate_with_poll
  ready = poller.poll(self._remaining_time(endtime)) (executor:363)
2019-11-27 12:19:08,939-0300 INFO  (jsonrpc/3) [storage.ISCSI] Setting strict mode rp_filter for device 'eno7'. (iscsi:584)
2019-11-27 12:19:08,940-0300 ERROR (jsonrpc/3) [storage.HSM] Could not connect to storageServer (hsm:2420)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 2417, in connectStorageServer
    conObj.connect()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/storageServer.py", line 488, in connect
    iscsi.addIscsiNode(self._iface, self._target, self._cred)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/iscsi.py", line 217, in addIscsiNode
    iscsiadm.node_login(iface.name, target.address, target.iqn)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/iscsiadm.py", line 337, in node_login
    raise IscsiNodeError(rc, out, err)
IscsiNodeError: (8, ['Logging in to [iface: eno7, target: iqn.2015-11.com.hpe:storage.model.1234567890, portal: 192.168.1.1,3260] (multiple)'], ['iscsiadm: Could not login to [iface: eno7, target: iqn.2015-11.com.hpe:storage.model.1234567890, portal: 192.168.1.1,3260].', 'iscsiadm: initiator reported error (8 - connection timed out)', 'iscsiadm: Could not log into all portals'])
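The executor warns about a blocked worker after 60 seconds, but the synchronous iscsiadm login keeps the worker stuck until the transport-level timeout (iscsiadm exit code 8, "connection timed out") finally fires. As a generic illustration of bounding such a blocking external command (not vdsm's actual code, which on this version runs subprocess32 under Python 2 without a per-command timeout):

```python
import subprocess

def run_bounded(cmd, timeout):
    """Run an external command, but give up after `timeout` seconds:
    subprocess.run kills the child and raises subprocess.TimeoutExpired
    instead of blocking the caller indefinitely."""
    return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
```

A caller could then catch `subprocess.TimeoutExpired` for an unreachable portal and move on to the next one, rather than holding the jsonrpc worker for the full login attempt.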

Comment 3 Juan Orti 2019-11-29 11:42:58 UTC
I've tried to reproduce this locally and it worked fine: I see some warnings about blocked workers, but the host activates.
I used version rhvh-4.3.6.5-0.20191108 and have asked the customer to upgrade and test the latest version.

- Configure a target with 2 IPs

- Create an iSCSI storage domain, logging in to both IPs.

[root@rhvh43-02 ~]# multipath -l
360014058b047a528a114464a688bb42a dm-27 LIO-ORG ,rhvm43-01       
size=100G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=0 status=active
| `- 7:0:0:0 sdf 8:80 active undef running
`-+- policy='service-time 0' prio=0 status=enabled
  `- 8:0:0:0 sdg 8:96 active undef running


- Remove one of the target's IPs. The storage still works properly, but one of the paths has failed:

[root@rhvh43-02 ~]# multipath -ll
360014058b047a528a114464a688bb42a dm-27 LIO-ORG ,rhvm43-01       
size=100G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=1 status=active
| `- 7:0:0:0 sdf 8:80 active ready running
`-+- policy='service-time 0' prio=0 status=enabled
  `- 8:0:0:0 sdg 8:96 failed faulty running
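The failed path is visible in the `multipath -ll` output as a line marked "failed faulty". A small hypothetical helper (for illustration only, not part of vdsm) that picks those path lines out of the output:

```python
def failed_paths(multipath_ll):
    """Return the path lines from `multipath -ll` output that are
    reported as 'failed faulty' by multipathd."""
    return [line.strip(" |`-") for line in multipath_ll.splitlines()
            if "failed faulty" in line]
```

Run against the output above, this would return the single `sdg` path line, matching the state after one target IP was removed.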


- Put one host into maintenance mode:

[root@rhvh43-02 ~]# multipath -ll
360014058b047a528a114464a688bb42a dm-27 
size=100G features='0' hwhandler='0' wp=rw


- Activate the host again:

# multipath -ll
360014058b047a528a114464a688bb42a dm-27 LIO-ORG ,rhvm43-01       
size=100G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  `- 9:0:0:0 sdf 8:80 active ready running


2019-11-29 12:22:10,644+0100 WARN  (vdsm.Scheduler) [Executor] Worker blocked: <Worker name=periodic/2 running <Task <Operation action=<vdsm.virt.sampling.HostMonitor object at 0x7fcd3df0ddd0> at 0x7fcd3df0de10> timeout=15, duration=15.00 at 0x7fcd8032a050> task#=189 at 0x7fcd3ceb71d0>, traceback:
File: "/usr/lib64/python2.7/threading.py", line 785, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 812, in __bootstrap_inner
  self.run()
File: "/usr/lib64/python2.7/threading.py", line 765, in run
  self.__target(*self.__args, **self.__kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/common/concurrent.py", line 195, in run
  ret = func(*args, **kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 301, in _run
  self._execute_task()
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315, in _execute_task
  task()
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 391, in __call__
  self._callable()
File: "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 186, in __call__
  self._func()
File: "/usr/lib/python2.7/site-packages/vdsm/virt/sampling.py", line 481, in __call__
  stats = hostapi.get_stats(self._cif, self._samples.stats())
File: "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 79, in get_stats
  ret['haStats'] = _getHaInfo()
File: "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 177, in _getHaInfo
  stats = instance.get_all_stats()
File: "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 94, in get_all_stats
  stats = broker.get_stats_from_storage()
File: "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 143, in get_stats_from_storage
  result = self._proxy.get_stats()
File: "/usr/lib64/python2.7/xmlrpclib.py", line 1233, in __call__
  return self.__send(self.__name, args)
File: "/usr/lib64/python2.7/xmlrpclib.py", line 1591, in __request
  verbose=self.__verbose
File: "/usr/lib64/python2.7/xmlrpclib.py", line 1273, in request
  return self.single_request(host, handler, request_body, verbose)
File: "/usr/lib64/python2.7/xmlrpclib.py", line 1303, in single_request
  response = h.getresponse(buffering=True)
File: "/usr/lib64/python2.7/httplib.py", line 1128, in getresponse
  response.begin()
File: "/usr/lib64/python2.7/httplib.py", line 453, in begin
  version, status, reason = self._read_status()
File: "/usr/lib64/python2.7/httplib.py", line 409, in _read_status
  line = self.fp.readline(_MAXLINE + 1)
File: "/usr/lib64/python2.7/socket.py", line 476, in readline
  data = self._sock.recv(self._rbufsize) (executor:363)
2019-11-29 12:22:37,042+0100 WARN  (vdsm.Scheduler) [Executor] Worker blocked: <Worker name=jsonrpc/3 running <Task <JsonRpcTask {'params': {u'connectionParams': [{u'port': u'3260', u'connection': u'192.168.102.1', u'iqn': u'iqn.2019-11.lab.storage:storage.target01', u'user': u'', u'tpgt': u'1', u'ipv6_enabled': u'false', u'password': '********', u'id': u'61953b78-a9d9-435d-8e5e-278e843e5593'}, {u'port': u'3260', u'connection': u'192.168.102.2', u'iqn': u'iqn.2019-11.lab.storage:storage.target01', u'user': u'', u'tpgt': u'1', u'ipv6_enabled': u'false', u'password': '********', u'id': u'ae160229-cd4c-4c1e-a597-a7461feb7a19'}], u'storagepoolID': u'f88bcf6c-f8ab-11e9-8632-00163e6ddf36', u'domainType': 3}, 'jsonrpc': '2.0', 'method': u'StoragePool.connectStorageServer', 'id': u'121a93ef-3616-4ede-b0b8-30154a406b49'} at 0x7fcd3c5da950> timeout=60, duration=60.01 at 0x7fcd3c5da390> task#=170 at 0x7fcd3ffd7a50>, traceback:
File: "/usr/lib64/python2.7/threading.py", line 785, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 812, in __bootstrap_inner
  self.run()
File: "/usr/lib64/python2.7/threading.py", line 765, in run
  self.__target(*self.__args, **self.__kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/common/concurrent.py", line 195, in run
  ret = func(*args, **kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 301, in _run
  self._execute_task()
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315, in _execute_task
  task()
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 391, in __call__
  self._callable()
File: "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 262, in __call__
  self._handler(self._ctx, self._req)
File: "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 305, in _serveRequest
  response = self._handle_request(req, ctx)
File: "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 345, in _handle_request
  res = method(**params)
File: "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 194, in _dynamicMethod
  result = fn(*methodArgs)
File: "/usr/lib/python2.7/site-packages/vdsm/API.py", line 1102, in connectStorageServer
  connectionParams)
File: "/usr/lib/python2.7/site-packages/vdsm/storage/dispatcher.py", line 74, in wrapper
  result = ctask.prepare(func, *args, **kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 108, in wrapper
  return m(self, *a, **kw)
File: "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 1179, in prepare
  result = self._run(func, *args, **kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
  return fn(*args, **kargs)
File: "<string>", line 2, in connectStorageServer
File: "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 50, in method
  ret = func(*args, **kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 2417, in connectStorageServer
  conObj.connect()
File: "/usr/lib/python2.7/site-packages/vdsm/storage/storageServer.py", line 488, in connect
  iscsi.addIscsiNode(self._iface, self._target, self._cred)
File: "/usr/lib/python2.7/site-packages/vdsm/storage/iscsi.py", line 217, in addIscsiNode
  iscsiadm.node_login(iface.name, target.address, target.iqn)
File: "/usr/lib/python2.7/site-packages/vdsm/storage/iscsiadm.py", line 327, in node_login
  portal, "-l"])
File: "/usr/lib/python2.7/site-packages/vdsm/storage/iscsiadm.py", line 122, in _runCmd
  return misc.execCmd(cmd, printable=printCmd, sudo=True, sync=sync)
File: "/usr/lib/python2.7/site-packages/vdsm/common/commands.py", line 213, in execCmd
  (out, err) = p.communicate(data)
File: "/usr/lib64/python2.7/site-packages/subprocess32.py", line 924, in communicate
  stdout, stderr = self._communicate(input, endtime, timeout)
File: "/usr/lib64/python2.7/site-packages/subprocess32.py", line 1706, in _communicate
  orig_timeout)
File: "/usr/lib64/python2.7/site-packages/subprocess32.py", line 1779, in _communicate_with_poll
  ready = poller.poll(self._remaining_time(endtime)) (executor:363)

Comment 4 nijin ashok 2020-01-01 07:24:53 UTC

*** This bug has been marked as a duplicate of bug 1787192 ***