Bug 1566342 - Active-Active DR not working with iSCSI storage domain
Summary: Active-Active DR not working with iSCSI storage domain
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.1.10
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ovirt-4.2.4
: ---
Assignee: Maor
QA Contact: Elad
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-04-12 06:24 UTC by nijin ashok
Modified: 2021-06-10 15:58 UTC
CC List: 12 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-06-17 13:38:43 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:




Links
Red Hat Bugzilla 1451018 (priority: unspecified, status: CLOSED): [DR] [RFE] Add backup storage connection for storage domain (last updated 2021-02-22 00:41:40 UTC)

Internal Links: 1451018

Description nijin ashok 2018-04-12 06:24:57 UTC
Description of problem:

Active-Active DR using a stretched cluster does not work for iSCSI-based storage domains. Active-Active relies on a writable storage server at each site, with the two replicated in real time, so there is a separate iSCSI target at each site.

Suppose there are two sites, site A and site B, with SAN A and SAN B respectively. VMs running on hosts in site A write to SAN A, VMs running on hosts in site B write to SAN B, and the two SANs are synchronized in real time by the storage array's own replication technology.

Currently, storage connections can only be defined at the storage domain level, not at the host level. So if we define a connection for a storage domain, every host will try to connect to it.

SAN A and SAN B can have different target IP addresses. Hosts in site A have no connectivity to SAN B, and hosts in site B have no connectivity to SAN A. In this case, if we add both connections to the storage domain, all hosts (in both site A and site B) will try to discover and log in to both iSCSI targets. ConnectStorageServerVDSCommand then times out, because from either site one of the two targets is unreachable.
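The failure mode above can be modelled in a few lines. This is a minimal sketch, not oVirt code: the host names, the site mapping and the SAN addresses (documentation-range IPs) are all hypothetical. It shows that when connections are attached to the storage domain rather than to hosts, every host in both sites ends up with exactly one unreachable target.

```python
# Hypothetical two-site layout; host names and SAN addresses are illustrative
# (documentation-range IPs), not taken from the customer environment.
SITE_OF_HOST = {"host-a1": "A", "host-a2": "A", "host-b1": "B", "host-b2": "B"}
SAN_IP = {"A": "192.0.2.10", "B": "198.51.100.10"}


def reachable(host: str, target_ip: str) -> bool:
    """Non-uniform access: a host can reach only the SAN in its own site."""
    return SAN_IP[SITE_OF_HOST[host]] == target_ip


def unreachable_logins(domain_connections: list[str]) -> dict[str, list[str]]:
    """Connections belong to the storage domain, so every host tries all of them.

    Returns, for each host, the target IPs whose login would time out.
    """
    return {
        host: [ip for ip in domain_connections if not reachable(host, ip)]
        for host in SITE_OF_HOST
    }


if __name__ == "__main__":
    # Both SANs are added as connections of the same storage domain.
    for host, failing in unreachable_logins(list(SAN_IP.values())).items():
        print(f"{host}: login times out for {failing}")
```

Because the connection list is shared, no choice of connections for the domain avoids the timeout on at least one site.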

Currently, we have "storage_connection_extensions", where host-based authentication can be defined. However, this is limited to CHAP authentication and cannot define host-based iSCSI target IPs. We may need to extend this functionality to include the target IP as well.
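One way to picture the proposed extension is a per-host override of the target portals, analogous to the per-host CHAP credentials that storage_connection_extensions already holds. The sketch below is purely hypothetical: the classes, the `host_overrides` field and `connections_for` are illustrative names, not oVirt entities or API.

```python
from dataclasses import dataclass, field


@dataclass
class StorageConnection:
    # Illustrative fields only; not the actual oVirt entity definition.
    iqn: str
    portal: str  # "ip:port"


@dataclass
class StorageDomain:
    connections: list[StorageConnection]
    # Hypothetical extension: per-host override of the target portals.
    host_overrides: dict[str, list[StorageConnection]] = field(default_factory=dict)

    def connections_for(self, host: str) -> list[StorageConnection]:
        """Return the connections a given host should log in to.

        Hosts without an override fall back to the domain-wide list,
        which is the current behavior.
        """
        return self.host_overrides.get(host, self.connections)
```

With overrides for the hosts in each site, a site A host would only ever attempt SAN A's portal, so no login has to wait for an unreachable target.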

A login to a single unreachable iSCSI target takes 2 minutes to time out:

===
2018-04-12 11:34:47,708+0530 DEBUG (jsonrpc/3) [storage.Misc.excCmd] /usr/bin/taskset --cpu-list 0-3 /usr/bin/sudo -n /sbin/iscsiadm -m node -T iqn.2003-01.org.linux-iscsi.storage.x8664:sn.52fd3f4b2f0b -I default -p 10.65.177.71:3260,1 -l (cwd None) (commands:69
2018-04-12 11:36:47,857+0530 DEBUG (jsonrpc/3) [storage.Misc.excCmd] FAILED: <err> = 'iscsiadm: Could not login to [iface: default, target: iqn.2003-01.org.linux-iscsi.storage.x8664:sn.52fd3f4b2f0b, portal: 10.65.177.71,3260].\niscsiadm: initiator reported error (8 - connection timed out)\niscsiadm: Could not log into all portals\n'; <rc> = 8 (commands:93)
===
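The two-minute figure can be checked against the timestamps of the two vdsm log lines above. The sketch below only does that arithmetic. That the value matches open-iscsi's default iscsid.conf settings (`node.conn[0].timeo.login_timeout = 15` and `node.session.initial_login_retry_max = 8`, i.e. 8 × 15 s) is an assumption about this host's configuration, not something the log confirms.

```python
from datetime import datetime

# Timestamps copied from the two vdsm log lines above.
LOGIN_STARTED = "2018-04-12 11:34:47,708+0530"
LOGIN_FAILED = "2018-04-12 11:36:47,857+0530"


def parse_vdsm_ts(ts: str) -> datetime:
    # vdsm writes milliseconds after a comma, followed by a +HHMM UTC offset;
    # %f accepts the 3-digit fraction and right-pads it to microseconds.
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S,%f%z")


elapsed = parse_vdsm_ts(LOGIN_FAILED) - parse_vdsm_ts(LOGIN_STARTED)
print(f"iscsiadm login gave up after {elapsed.total_seconds():.3f} s")
```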

Version-Release number of selected component (if applicable):

rhevm-4.1.10

How reproducible:

100%

Steps to Reproduce:

See the description above.

Actual results:

Active-Active DR does not work with iSCSI-based storage domains.

Expected results:


Additional info:

Comment 1 Elad 2018-04-12 12:25:53 UTC
I think BZ 1451018 refers to the same issue.

Comment 2 Yaniv Lavi 2018-04-15 08:12:00 UTC
Did the customer set up a virtual "floating" IP to eliminate the need for two different IPs? It should resolve to the correct target at each site.

This logic is best kept outside RHV to ensure correct failover.
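The floating-IP suggestion can be sketched as a single portal name on the storage domain that each site's network maps to its local SAN. Everything here is hypothetical: the name `san.dr.example`, the addresses and the per-site lookup table stand in for whatever DNS or virtual-IP mechanism the customer would actually use. It also assumes both SANs present the same target IQN, since the login specifies both `-T` (IQN) and `-p` (portal), as in the log above.

```python
# Hypothetical: the storage domain defines ONE portal, and each site's DNS
# (or a site-local virtual IP) maps that name to the SAN in the same site.
SITE_DNS = {
    "A": {"san.dr.example": "192.0.2.10"},
    "B": {"san.dr.example": "198.51.100.10"},
}
DOMAIN_PORTAL = "san.dr.example:3260"


def effective_target(site: str, portal: str) -> str:
    """Return the address a host in `site` actually logs in to for `portal`."""
    name, port = portal.rsplit(":", 1)
    return f"{SITE_DNS[site][name]}:{port}"


if __name__ == "__main__":
    for site in SITE_DNS:
        print(site, "->", effective_target(site, DOMAIN_PORTAL))
```

Every host then has a single connection that is always reachable from its own site, which is why this logic can live outside RHV.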

Comment 9 Sandro Bonazzola 2018-06-15 12:17:06 UTC
This bug is not marked as a blocker, and we have entered the blocker-only phase for 4.2.4.
Please consider re-targeting to 4.2.5.

Comment 10 Maor 2018-06-17 13:38:43 UTC
Closing the bug since the use case described here is not supported by oVirt (see comment 7: each site is mapped only to its own storage (non-uniform access), with no cross-connect between the sites).

Nijin, there is still one use case we need to verify: whether we end up with no SPM when there is only a partial path connection to iSCSI storage.
If that does reproduce, please open a separate bug.

Comment 11 Franta Kust 2019-05-16 13:03:34 UTC
BZ<2>Jira Resync

