Created attachment 944160 [details]
vdsm and hosted-engine-setup logs

Description of problem:
While deploying hosted-engine with iSCSI storage for the engine VM's disk, the setup only allows logging in to a single iSCSI target.

Version-Release number of selected component (if applicable):
RHEL7
RHEV 3.5 vt4
ovirt-hosted-engine-setup-1.2.0-1.el7.noarch

How reproducible:
Always

Steps to Reproduce:
1. hosted-engine --deploy
2. pick iscsi

Actual results:
There is no option to pick more than one iSCSI target:

Please specify the target name (iqn.2008-05.com.xtremio:001e675b8ee0, iqn.2008-05.com.xtremio:001e675b8ee1, iqn.2008-05.com.xtremio:001e675ba170, iqn.2008-05.com.xtremio:001e675ba171) [iqn.2008-05.com.xtremio:001e675b8ee0]:

I ended up having the host connected to only one target:

[root@green-vdsb ~]# iscsiadm -m session
tcp: [1] 10.35.146.129:3260,1 iqn.2008-05.com.xtremio:001e675b8ee0 (non-flash)

Expected results:
There should be an option to connect to several iSCSI targets.

Additional info:
vdsm and hosted-engine-setup logs
It already works if you manually configure multipathing before launching hosted-engine --deploy.

Example with 2 NICs on the host, 2 NICs on the iSCSI server, and a single portal on each interface.

On the host:
# iscsiadm -m iface -I eth0 --op=new
# iscsiadm -m iface -I eth1 --op=new
# iscsiadm -m discovery -t st -p 192.168.1.125:3260
# iscsiadm -m discovery -t st -p 192.168.2.125:3260
# iscsiadm --mode node --portal 192.168.1.125:3260,1 --login
# iscsiadm --mode node --portal 192.168.2.125:3260,1 --login

Then hosted-engine --deploy reports:

Please specify the storage you would like to use (glusterfs, iscsi, fc, nfs3, nfs4)[nfs3]: iscsi
Please specify the iSCSI portal IP address: 192.168.1.125
Please specify the iSCSI portal port [3260]:
Please specify the iSCSI portal user:
Please specify the target name (iqn.2015-03.com.redhat:simone1, iqn.2015-03.com.redhat:simone1) [iqn.2015-03.com.redhat:simone1]:
The following luns have been found on the requested target:
[1] 33000000017538b4b 56GiB FreeBSD iSCSI Disk
    status: used, paths: 4 active
[2] 33000000031f26ca3 24GiB FreeBSD iSCSI Disk
    status: used, paths: 4 active
[3] 33000000022a29f57 16GiB FreeBSD iSCSI Disk
    status: free, paths: 4 active
[4] 330000000d0c91c54 1GiB FreeBSD iSCSI Disk
    status: free, paths: 4 active
[5] 330000000c399efa0 1GiB FreeBSD iSCSI Disk
    status: free, paths: 4 active
[6] 330000000e5380848 1GiB FreeBSD iSCSI Disk
    status: free, paths: 4 active
Please select the destination LUN (1, 2, 3, 4, 5, 6) [1]:

Is it enough?
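A quick sanity check of the manual configuration above before launching hosted-engine --deploy (just a sketch; the exact output depends on the environment):

# iscsiadm -m session
(should list one session per portal logged in above)
# multipath -ll
(the LUN should appear as a single multipath device with one path per session)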
*** This bug has been marked as a duplicate of bug 1193961 ***
Yaniv, this is not the same issue as bug 1193961. The issue reported here is that the host cannot open more than one session to the storage server, no matter how many iSCSI targets the storage server exposes. Bug 1193961 is about the fact that an iSCSI multipath bond cannot be configured while using hosted engine (see http://www.ovirt.org/Feature/iSCSI-Multipath).
Need an answer to comment #1 before acknowledging this.
(In reply to Sandro Bonazzola from comment #4)
> Need an answer to comment #1 before acknowledging this.

Can you clarify how I can help with this bug?
(In reply to Nir Soffer from comment #5)
> (In reply to Sandro Bonazzola from comment #4)
> > Need an answer to comment #1 before acknowledging this.
>
> Can you clarify how I can help with this bug?

I think you and Simone already discussed this bug. Simone, can you update here on what's to be done?
*** Bug 1267807 has been marked as a duplicate of this bug. ***
Tal, I think we'll need this to be fixed for DR since we would like to be able to hold multiple iSCSI connections in order to allow storage replication.
Works for me. This is what I got during deployment: two portals were configured within the same portal group (TPGT 1), running over two NICs and exposing two LUNs, so I chose one of the LUNs during deployment.

Please specify the storage you would like to use (glusterfs, iscsi, fc, nfs3, nfs4)[nfs3]: iscsi
Please specify the iSCSI portal IP address: 10.35.72.52
Please specify the iSCSI portal port [3260]:
Please specify the iSCSI portal user:
The following targets have been found:
    [1] iqn.2005-10.org.freenas.ctl:shetarget
        TPGT: 1, portals: 10.35.72.52:3260 10.35.72.53:3260
    [2] iqn.2005-10.org.freenas.ctl:she1target
        TPGT: 1, portals: 10.35.72.52:3260 10.35.72.53:3260
Please select a target (1, 2) [1]: 1
[ INFO ] Connecting to the storage server
The following luns have been found on the requested target:
    [1] 36589cfc00000003afad4bbbc3e3a4465 100GiB FreeNAS iSCSI Disk
        status: free, paths: 2 active
Please select the destination LUN (1) [1]:
[ INFO ] Connecting to the storage server
Storage Domain type : iscsi
LUN ID : 36589cfc00000003afad4bbbc3e3a4465
Image size GB : 50
iSCSI Portal IP Address : 10.35.72.52,10.35.72.53
iSCSI Target Name : iqn.2005-10.org.freenas.ctl:shetarget
iSCSI Portal port : 3260,3260
Host ID : 1
iSCSI Target Portal Group Tag : 1
iSCSI Portal user :

This is what the host sees during deployment, after choosing one of the targets:

# iscsiadm -m session
tcp: [1] 10.35.72.52:3260,1 iqn.2005-10.org.freenas.ctl:shetarget (non-flash)
tcp: [2] 10.35.72.53:3260,1 iqn.2005-10.org.freenas.ctl:shetarget (non-flash)

This is what the host sees after deployment is completed and the host was rebooted:

# iscsiadm -m session
tcp: [1] 10.35.72.52:3260,1 iqn.2005-10.org.freenas.ctl:shetarget (non-flash)
tcp: [2] 10.35.72.53:3260,1 iqn.2005-10.org.freenas.ctl:shetarget (non-flash)
Please test:
1. Failover; ensure that the 2nd host also sees multiple paths.
2. Block one connection; see that the HE VM is still fine and keeps working without failing over (one possible way to block a single path is sketched below).
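One possible way to drop a single iSCSI path on the host running the HE VM (a sketch only; 10.35.163.33 stands in here for one of the portal IPs in the test environment):

# iptables -A OUTPUT -p tcp --destination-port 3260 -d 10.35.163.33 -j DROP

After a short while, multipath -ll should mark that path as failed/faulty while the remaining paths stay active, and the HE VM should keep running.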
1. Migration works fine; both hosts see all three paths.

2. Killed one of the three paths on the host running the SHE VM, using
"iptables -A OUTPUT -p tcp --destination-port 3260 -d 10.35.163.33 -j DROP",
and the engine became unreachable; hosted-engine --vm-status on the host printed the following:

Traceback (most recent call last):
  File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 213, in <module>
    if not status_checker.print_status():
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 110, in print_status
    all_host_stats = self._get_all_host_stats()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 75, in _get_all_host_stats
    all_host_stats = ha_cli.get_all_host_stats()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 154, in get_all_host_stats
    return self.get_all_stats(self.StatModes.HOST)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 99, in get_all_stats
    stats = broker.get_stats_from_storage(service)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 147, in get_stats_from_storage
    for host_id, data in six.iteritems(result):
  File "/usr/lib/python2.7/site-packages/six.py", line 599, in iteritems
    return d.iteritems(**kw)
AttributeError: 'NoneType' object has no attribute 'iteritems'

Both hosts still showed:

# iscsiadm -m session
tcp: [1] 10.35.163.33:3260,1 iqn.2005-10.org.freenas.ctl:she_deploy1target (non-flash)
tcp: [2] 10.35.163.58:3260,1 iqn.2005-10.org.freenas.ctl:she_deploy1target (non-flash)
tcp: [3] 10.35.160.161:3260,1 iqn.2005-10.org.freenas.ctl:she_deploy1target (non-flash)

Deployment additional info:
Deployed SHE over iSCSI on a pair of hosts and added two NFS data storage domains.

Deployment details from the first host:

Please specify the storage you would like to use (glusterfs, iscsi, fc, nfs3, nfs4)[nfs3]: iscsi
Please specify the iSCSI portal IP address: 10.35.163.58
Please specify the iSCSI portal port [3260]:
Please specify the iSCSI portal user:
The following targets have been found:
    [1] iqn.2005-10.org.freenas.ctl:she_data1target
        TPGT: 1, portals: 10.35.163.33:3260 10.35.163.58:3260 10.35.160.161:3260
    [2] iqn.2005-10.org.freenas.ctl:she_deploy1target
        TPGT: 1, portals: 10.35.163.33:3260 10.35.163.58:3260 10.35.160.161:3260
Please select a target (1, 2) [1]: 2
[ INFO ] Connecting to the storage server
The following luns have been found on the requested target:
    [1] 36589cfc000000f65483248c3f59e11af 70GiB FreeNAS iSCSI Disk
        status: free, paths: 3 active
Please select the destination LUN (1) [1]:

Both hosts showed the following for iscsiadm -m session:

tcp: [1] 10.35.163.33:3260,1 iqn.2005-10.org.freenas.ctl:she_deploy1target (non-flash)
tcp: [2] 10.35.163.58:3260,1 iqn.2005-10.org.freenas.ctl:she_deploy1target (non-flash)
tcp: [3] 10.35.160.161:3260,1 iqn.2005-10.org.freenas.ctl:she_deploy1target (non-flash)

On the iSCSI side I published two LUNs from one portal group with 3 interfaces.
What does multipath show?
After some time the SHE VM went into paused state, and after roughly 10 minutes it turned back to "up". The engine was not migrated to the first host. Please see the details below:

--== Host 2 status ==--

conf_on_shared_storage : True
Status up-to-date : True
Hostname : alma04
Host ID : 2
Engine status : {"reason": "bad vm status", "health": "bad", "vm": "up", "detail": "paused"}
Score : 3400
stopped : False
Local maintenance : False
crc32 : 40c605b1
local_conf_timestamp : 23473
Host timestamp : 23471
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=23471 (Wed Oct 25 20:36:23 2017)
    host-id=2
    score=3400
    vm_conf_refresh_time=23473 (Wed Oct 25 20:36:25 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineStarting
    stopped=False

--== Host 2 status ==--

conf_on_shared_storage : True
Status up-to-date : True
Hostname : alma04
Host ID : 2
Engine status : {"health": "good", "vm": "up", "detail": "up"}
Score : 3400
stopped : False
Local maintenance : False
crc32 : 29e1ffa3
local_conf_timestamp : 24539
Host timestamp : 24536
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=24536 (Wed Oct 25 20:54:08 2017)
    host-id=2
    score=3400
    vm_conf_refresh_time=24539 (Wed Oct 25 20:54:11 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineUp
    stopped=False
(In reply to Yaniv Kaul from comment #20)
> What does multipath show?

It stays the same the whole time and does not seem to be updated at all:

iscsiadm -m session
tcp: [1] 10.35.163.33:3260,1 iqn.2005-10.org.freenas.ctl:she_deploy1target (non-flash)
tcp: [2] 10.35.163.58:3260,1 iqn.2005-10.org.freenas.ctl:she_deploy1target (non-flash)
tcp: [3] 10.35.160.161:3260,1 iqn.2005-10.org.freenas.ctl:she_deploy1target (non-flash)

I've opened another RFE, https://bugzilla.redhat.com/show_bug.cgi?id=1506330, to cover this specific issue there.
(In reply to Yaniv Kaul from comment #20)
> What does multipath show?

So did it use a multipath disk or not?
(In reply to Yaniv Kaul from comment #23)
> (In reply to Yaniv Kaul from comment #20)
> > What does multipath show?
>
> So did it use a multipath disk or not?

Checked again for more than an hour, and here is what I see:

1. It seems like it is not working at all. I don't see that it shifts to another available path, and the "hosted-engine --vm-status" command sometimes gets stuck for several minutes (5+). I have three paths (10.35.163.33, 10.35.160.161 and 10.35.163.58); this is what I'm doing on both hosts:

iptables -A OUTPUT -p tcp --destination-port 3260 -d 10.35.163.33 -j DROP
iptables -A OUTPUT -p tcp --destination-port 3260 -d 10.35.160.161 -j DROP
iptables -A OUTPUT -p tcp --destination-port 3260 -d 10.35.163.58 -j ACCEPT

2. If one of the hosts is not blocking any of the iSCSI targets, then sometimes the SHE VM gets started on it, instead of the initial host failing over to an alternative iSCSI path.

3. iscsiadm -m session does not indicate anything about blocked paths and is not updated at all.
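For the record, the blocking rules above can be reverted on each host either by deleting them individually or by flushing the OUTPUT chain (a sketch only; flushing removes all OUTPUT rules, which is acceptable on these lab hosts but may not be elsewhere):

# iptables -D OUTPUT -p tcp --destination-port 3260 -d 10.35.163.33 -j DROP
or simply:
# iptables -F OUTPUT

Once the rules are gone, multipathd should bring the failed paths back to "active ready running" on its own.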
I'm looking for the output of 'multipath -ll'.
(In reply to Yaniv Kaul from comment #26)
> I'm looking for the output of 'multipath -ll'.

alma03 ~]# multipath -ll
36589cfc000000f65483248c3f59e11af dm-0 FreeNAS ,iSCSI Disk
size=70G features='0' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=0 status=enabled
| `- 6:0:0:0 sdb 8:16 failed faulty running
|-+- policy='service-time 0' prio=0 status=enabled
| `- 7:0:0:0 sdc 8:32 failed faulty running
`-+- policy='service-time 0' prio=0 status=enabled
  `- 8:0:0:0 sdd 8:48 failed faulty running

alma04 ~]# multipath -ll
36589cfc000000f65483248c3f59e11af dm-0 FreeNAS ,iSCSI Disk
size=70G features='0' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=0 status=enabled
| `- 6:0:0:0 sdb 8:16 failed faulty running
|-+- policy='service-time 0' prio=0 status=enabled
| `- 7:0:0:0 sdc 8:32 failed faulty running
`-+- policy='service-time 0' prio=0 status=enabled
  `- 8:0:0:0 sdd 8:48 failed faulty running
(In reply to Nikolai Sednev from comment #27)
> (In reply to Yaniv Kaul from comment #26)
> > I'm looking for the output of 'multipath -ll'.
>
> alma03 ~]# multipath -ll
> 36589cfc000000f65483248c3f59e11af dm-0 FreeNAS ,iSCSI Disk
> size=70G features='0' hwhandler='0' wp=rw
> |-+- policy='service-time 0' prio=0 status=enabled
> | `- 6:0:0:0 sdb 8:16 failed faulty running
> |-+- policy='service-time 0' prio=0 status=enabled
> | `- 7:0:0:0 sdc 8:32 failed faulty running
> `-+- policy='service-time 0' prio=0 status=enabled
>   `- 8:0:0:0 sdd 8:48 failed faulty running
>
> alma04 ~]# multipath -ll
> 36589cfc000000f65483248c3f59e11af dm-0 FreeNAS ,iSCSI Disk
> size=70G features='0' hwhandler='0' wp=rw
> |-+- policy='service-time 0' prio=0 status=enabled
> | `- 6:0:0:0 sdb 8:16 failed faulty running
> |-+- policy='service-time 0' prio=0 status=enabled
> | `- 7:0:0:0 sdc 8:32 failed faulty running
> `-+- policy='service-time 0' prio=0 status=enabled
>   `- 8:0:0:0 sdd 8:48 failed faulty running

OK, so you have lost all paths. That explains the pause and everything. Any idea why all paths are down on both hosts?
(In reply to Yaniv Kaul from comment #28)
> (In reply to Nikolai Sednev from comment #27)
> > (In reply to Yaniv Kaul from comment #26)
> > > I'm looking for the output of 'multipath -ll'.
> >
> > alma03 ~]# multipath -ll
> > 36589cfc000000f65483248c3f59e11af dm-0 FreeNAS ,iSCSI Disk
> > size=70G features='0' hwhandler='0' wp=rw
> > |-+- policy='service-time 0' prio=0 status=enabled
> > | `- 6:0:0:0 sdb 8:16 failed faulty running
> > |-+- policy='service-time 0' prio=0 status=enabled
> > | `- 7:0:0:0 sdc 8:32 failed faulty running
> > `-+- policy='service-time 0' prio=0 status=enabled
> >   `- 8:0:0:0 sdd 8:48 failed faulty running
> >
> > alma04 ~]# multipath -ll
> > 36589cfc000000f65483248c3f59e11af dm-0 FreeNAS ,iSCSI Disk
> > size=70G features='0' hwhandler='0' wp=rw
> > |-+- policy='service-time 0' prio=0 status=enabled
> > | `- 6:0:0:0 sdb 8:16 failed faulty running
> > |-+- policy='service-time 0' prio=0 status=enabled
> > | `- 7:0:0:0 sdc 8:32 failed faulty running
> > `-+- policy='service-time 0' prio=0 status=enabled
> >   `- 8:0:0:0 sdd 8:48 failed faulty running
>
> OK, so you have lost all paths. That explains the pause and everything. Any
> idea why all paths are down on both hosts?

To summarize, everything is working just as expected.

1. I probably messed up the iptables configuration at the beginning (my bad), so I flushed all iptables rules on both hosts and retested:

alma04 ~]# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
ACCEPT     all  --  anywhere             anywhere

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
ACCEPT     all  --  anywhere             anywhere

[root@alma04 ~]# multipath -ll
36589cfc000000f65483248c3f59e11af dm-0 FreeNAS ,iSCSI Disk
size=70G features='0' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=1 status=enabled
| `- 6:0:0:0 sdb 8:16 active ready running
|-+- policy='service-time 0' prio=1 status=active
| `- 7:0:0:0 sdc 8:32 active ready running
`-+- policy='service-time 0' prio=1 status=enabled
  `- 8:0:0:0 sdd 8:48 active ready running

The hosted engine started recovering:

--== Host 2 status ==--

conf_on_shared_storage : True
Status up-to-date : True
Hostname : alma04
Host ID : 2
Engine status : {"reason": "bad vm status", "health": "bad", "vm": "up", "detail": "paused"}
Score : 3400
stopped : False
Local maintenance : False
crc32 : 0d276911
local_conf_timestamp : 72007
Host timestamp : 72004
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=72004 (Thu Oct 26 10:05:16 2017)
    host-id=2
    score=3400
    vm_conf_refresh_time=72007 (Thu Oct 26 10:05:19 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineStarting
    stopped=False

The engine recovered from the paused state after a really long time, more than 15 minutes.
2. I tried again, blocking two of the three paths at a time, and observed the results; failover between paths worked nicely and the engine remained up and running:

2.1:
alma04 ~]# multipath -ll
36589cfc000000f65483248c3f59e11af dm-0 FreeNAS ,iSCSI Disk
size=70G features='0' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=1 status=active
| `- 6:0:0:0 sdb 8:16 active ready running
|-+- policy='service-time 0' prio=0 status=enabled
| `- 7:0:0:0 sdc 8:32 failed faulty running
`-+- policy='service-time 0' prio=0 status=enabled
  `- 8:0:0:0 sdd 8:48 failed faulty running

2.2:
alma04 ~]# multipath -ll
36589cfc000000f65483248c3f59e11af dm-0 FreeNAS ,iSCSI Disk
size=70G features='0' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=0 status=enabled
| `- 6:0:0:0 sdb 8:16 failed faulty running
|-+- policy='service-time 0' prio=1 status=active
| `- 7:0:0:0 sdc 8:32 active ready running
`-+- policy='service-time 0' prio=0 status=enabled
  `- 8:0:0:0 sdd 8:48 failed faulty running

2.3:
alma04 ~]# multipath -ll
36589cfc000000f65483248c3f59e11af dm-0 FreeNAS ,iSCSI Disk
size=70G features='0' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=0 status=enabled
| `- 6:0:0:0 sdb 8:16 failed faulty running
|-+- policy='service-time 0' prio=0 status=enabled
| `- 7:0:0:0 sdc 8:32 failed faulty running
`-+- policy='service-time 0' prio=1 status=active
  `- 8:0:0:0 sdd 8:48 active ready running

3. I flushed iptables on both hosts and repeated the test from section 2, to verify that the SHE VM running on alma04 is not migrated to alma03 when one or two of the three available iSCSI paths fail on alma04. All three paths on alma03 were up during the test:

alma03 ~]# multipath -ll
36589cfc000000f65483248c3f59e11af dm-0 FreeNAS ,iSCSI Disk
size=70G features='0' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=1 status=enabled
| `- 6:0:0:0 sdb 8:16 active ready running
|-+- policy='service-time 0' prio=1 status=active
| `- 7:0:0:0 sdc 8:32 active ready running
`-+- policy='service-time 0' prio=1 status=enabled
  `- 8:0:0:0 sdd 8:48 active ready running

3.1:
alma04 ~]# multipath -ll
36589cfc000000f65483248c3f59e11af dm-0 FreeNAS ,iSCSI Disk
size=70G features='0' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=1 status=active
| `- 6:0:0:0 sdb 8:16 active ready running
|-+- policy='service-time 0' prio=0 status=enabled
| `- 7:0:0:0 sdc 8:32 failed faulty running
`-+- policy='service-time 0' prio=0 status=enabled
  `- 8:0:0:0 sdd 8:48 failed faulty running

3.2:
alma04 ~]# multipath -ll
36589cfc000000f65483248c3f59e11af dm-0 FreeNAS ,iSCSI Disk
size=70G features='0' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=0 status=enabled
| `- 6:0:0:0 sdb 8:16 failed faulty running
|-+- policy='service-time 0' prio=1 status=active
| `- 7:0:0:0 sdc 8:32 active ready running
`-+- policy='service-time 0' prio=0 status=enabled
  `- 8:0:0:0 sdd 8:48 failed faulty running

3.3:
alma04 ~]# multipath -ll
36589cfc000000f65483248c3f59e11af dm-0 FreeNAS ,iSCSI Disk
size=70G features='0' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=0 status=enabled
| `- 6:0:0:0 sdb 8:16 failed faulty running
|-+- policy='service-time 0' prio=0 status=enabled
| `- 7:0:0:0 sdc 8:32 failed faulty running
`-+- policy='service-time 0' prio=1 status=active
  `- 8:0:0:0 sdd 8:48 active ready running

The SHE VM was not migrated during the test.
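For completeness, a quick way to double-check the result above from either host (just a sketch; the WWID is the one used in this test setup, and the grep pattern is only illustrative):

# hosted-engine --vm-status | grep -E 'Hostname|Engine status'
# multipath -ll 36589cfc000000f65483248c3f59e11af

The first command shows which host reports the engine VM as up and healthy; the second shows which paths are currently active for the hosted-engine LUN.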
This bug is included in the oVirt 4.2.0 release, published on Dec 20th 2017.

Since the problem described in this bug report should be resolved in that release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.