Bug 1909888 - [RFE] Support multiple IQN in hosted-engine.conf for Active-Active DR setup
Summary: [RFE] Support multiple IQN in hosted-engine.conf for Active-Active DR setup
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-hosted-engine-ha
Version: 4.4.3
Hardware: x86_64
OS: Linux
Priority: low
Severity: medium
Target Milestone: ovirt-4.4.6-1
Target Release: 4.4.6
Assignee: Yedidyah Bar David
QA Contact: Nikolai Sednev
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-12-21 22:45 UTC by Germano Veit Michel
Modified: 2021-06-03 10:25 UTC (History)

Fixed In Version: ovirt-hosted-engine-ha-2.4.7
Doc Type: Enhancement
Doc Text:
With this release, ovirt-hosted-engine-ha supports multiple, comma-separated values for all iSCSI configuration items.
Clone Of:
Environment:
Last Closed: 2021-06-03 10:24:29 UTC
oVirt Team: Integration
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2021:2239 0 None None None 2021-06-03 10:25:14 UTC
oVirt gerrit 114500 0 master MERGED iscsi: multipath: Support also multiple IQNs 2021-05-12 07:04:18 UTC

Description Germano Veit Michel 2020-12-21 22:45:23 UTC
Description of problem:

Scenario:
- Pure Storage FlashArray X20s with ActiveCluster
- 2 sites, both active, using iSCSI connections
- Fully replicated storage to both sites, active-active
- All hosts connect to both SANs on both sites
- SAN IPs are different on both sites
- SAN IQNs are different on both sites (cannot be changed to the same IQN)

The storage connections on the engine side can be configured with different IQNs and different IPs for each site. The connectStorageServer request sent by the engine is therefore fine: the engine tells each host to connect to both SANs using the correct IPs and IQNs.

However, in hosted-engine.conf only the IPs and ports accept comma-separated values for each storage connection; the IQNs do not. This is a problem because hosted-engine.conf should also contain all the storage connections, to guarantee correct initialization after a power outage or a host reboot when HostedEngine is not running. As it stands, half of the connections in hosted-engine.conf would fail because they use the wrong IQN.

Basically, what is needed is a comma-separated list of IQNs, like what already exists for IP and port here [1].

[1] https://github.com/oVirt/ovirt-hosted-engine-ha/blob/master/ovirt_hosted_engine_ha/lib/storage_server.py#L248 

The customer confirmed with Pure Storage that the SANs on each site cannot use the same IQN.

Note that this differs from BZ1566342: here all hosts connect to both sites, with full cross-connectivity.

Comment 1 Martin Tessun 2021-01-28 08:31:25 UTC
So having a proper iSCSI setup would already be HA, as:

1. The Ethernet is bonded (2+ cards share the same IP)
2. In case of Active-Passive, the storage failover should also fail over the storage IP

That said, I understand that we want to have multiple IQNs in case of a "cheap" storage, but to me this is a low-priority feature that might not make it into the product.

Comment 2 Andrew Simmonds 2021-01-28 08:55:20 UTC
Hi Martin,

I don't think you completely understand the setup. This is a textbook implementation of Pure Storage's FlashArray ActiveCluster product that spans two data centers. It is definitely not a "cheap" setup as you mention.

Here is a solution overview PDF of ActiveCluster:
https://support.purestorage.com/@api/deki/files/8434/PureStorage_ActiveCluster_Paper_-_1220.pdf?revision=10

We have also followed the documentation here for Active/Active RHV.
https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.4/html/disaster_recovery_guide/active_active

It gives us an RPO of zero across 2 independent locations, and it tolerates a full failure of a storage array at either site.

There are 8 independent iSCSI paths to the 2 storage arrays (4 paths per array / 2 paths per controller).

Having the independent IQNs implemented in RHV allows reliable bootstrapping of hosts regardless of their location (in case of a disaster event).

Comment 3 Martin Tessun 2021-02-04 08:17:04 UTC
Hi Andrew,

Thanks for the clarification. So this is an active-active DR solution without IP failover, as it is a DR scenario.
We will look into whether and how to add this.

Thanks!

Comment 6 Martin Tessun 2021-03-18 08:19:08 UTC
Closing for now, as this is very unlikely to be fixed in RHV 4.4: it only affects the Hosted Engine, and we have DR scenarios for solving this.
As another possible workaround, the "reachable" IQN can be added or changed in the SHE config.
The overall design and implementation effort for solving this is not justified by the use case.

Comment 9 Yedidyah Bar David 2021-05-12 07:07:10 UTC
Copying the commit message from the linked patch:

iscsi: multipath: Support also multiple IQNs

HA (but not setup) code already supported configuring more than one IP
address and port for accessing iSCSI storage, by setting the respective
conf items to a comma-separated list of multiple values.

Extend this to also support multiple iSCSI IQNs, as requested in the
linked bug. For completeness, also support multiple tpgt, user and
password values. For backwards compatibility, the new behavior takes
effect only if the IQN contains a comma. If it does not, the old
behavior is retained - including supporting multiple IP addresses and
ports. If IQN does include a comma, all other items must also include
multiple values, comma separated, and the number of values must be the
same for all items. E.g.:

storage=10.0.0.1,10.0.0.2
iqn=iqn1,iqn2
portal=tpgt1,tpgt2
user=user1,user2
password=pass1,pass2
port=3260,3260

will make HA use two sets of values - first set comprised of the first
value for each item, and the second set with the second values. Please
note that you must include e.g. 'port=3260,3260' and not just
'port=3260'. Also note that the password must not include commas, or
the code will get the passwords wrong.
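
The pairing rule described above can be illustrated with a minimal Python sketch. This is not the code from the linked patch: the function name build_connection_sets and the list-of-dicts return value are hypothetical, and the handling of the old behavior (pairing each address with the port in the same position) is an assumption made for illustration only. Only the item names and the sample values come from the example above.

```python
from typing import Dict, List

# Item names as they appear in the example above.
ITEMS = ("storage", "iqn", "portal", "user", "password", "port")


def build_connection_sets(conf: Dict[str, str]) -> List[Dict[str, str]]:
    """Return one dict of values per iSCSI connection (hypothetical helper)."""
    if "," not in conf["iqn"]:
        # Old behavior: only storage and port may hold several values; the
        # single iqn, portal, user and password are reused for every path.
        # ASSUMPTION: addresses and ports are paired by position.
        addresses = conf["storage"].split(",")
        ports = conf["port"].split(",")
        if len(addresses) != len(ports):
            raise ValueError("storage and port need the same number of values")
        return [
            dict(conf, storage=address, port=port)
            for address, port in zip(addresses, ports)
        ]

    # New behavior: every item must carry the same number of values.
    # A password containing a comma would be split here as well, which is
    # why passwords must not include commas.
    values = {item: conf[item].split(",") for item in ITEMS}
    counts = {item: len(parts) for item, parts in values.items()}
    if len(set(counts.values())) != 1:
        raise ValueError(f"all items need the same number of values: {counts}")
    return [
        {item: values[item][index] for item in ITEMS}
        for index in range(counts["iqn"])
    ]


conf = {
    "storage": "10.0.0.1,10.0.0.2",
    "iqn": "iqn1,iqn2",
    "portal": "tpgt1,tpgt2",
    "user": "user1,user2",
    "password": "pass1,pass2",
    "port": "3260,3260",
}
sets = build_connection_sets(conf)
```

With the sample values, the sketch yields two sets: the first combines 10.0.0.1, iqn1, tpgt1, user1, pass1 and 3260, the second combines the second value of each item. Replacing 'port=3260,3260' with 'port=3260' makes the sketch raise ValueError, mirroring the requirement that all items list the same number of values.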

Comment 24 Nikolai Sednev 2021-05-26 14:16:05 UTC
Manual HE deployment over iSCSI is working fine, multi-path is intact. 
Tested on host with following components:
ovirt-hosted-engine-ha-2.4.7-1.el8ev.noarch
ovirt-hosted-engine-setup-2.5.0-2.el8ev.noarch
rhvm-appliance-4.4-20210402.1.el8ev.x86_64
Linux 4.18.0-305.3.1.el8_4.x86_64 #1 SMP Mon May 17 10:08:25 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux release 8.4 (Ootpa)

Engine base rhvm-appliance-4.4-20210402.1.el8ev.x86_64 was upgraded during "hosted-engine --deploy --ansible-extra-vars=he_pause_host=true" to these components:
ovirt-engine-setup-4.4.6.8-0.1.el8ev.noarch
Linux 4.18.0-305.3.1.el8_4.x86_64 #1 SMP Mon May 17 10:08:25 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux release 8.4 (Ootpa)

Based on my test results and the results from https://bugzilla.redhat.com/show_bug.cgi?id=1909888#c17, I can't do more than this from the QA side at the moment, so I am moving this to verified for now. Please reopen if it still doesn't work for you for some reason.

Comment 38 errata-xmlrpc 2021-06-03 10:24:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Virtualization Host security update [ovirt-4.4.6]), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2239

