Description of problem:
Active-Active DR using a stretched cluster does not work for iSCSI-based storage domains.

Active-Active relies on a writable storage server in each site, replicated in real time, so there are two iSCSI targets, one per site. With two sites, site A and site B, backed by SAN A and SAN B respectively, VMs running on hosts in site A write to SAN A, VMs running on hosts in site B write to SAN B, and the two SANs are kept synchronized in real time by the storage replication technology.

Currently, storage connections can only be defined at the storage domain level, not at the host level. So if we define a connection for a storage domain, every host tries to connect to every defined connection. SAN A and SAN B can have different target IP addresses. Hosts in site A have no connectivity to SAN B, and hosts in site B have no connectivity to SAN A. In this case, if we add both connections to the storage domain, all hosts (in both site A and site B) try to discover and log in to both iSCSI targets. As a result, ConnectStorageServerVDSCommand times out, because from each site one of the two targets is unreachable.

We already have "storage_connection_extensions", where host-based authentication can be defined. However, it is limited to CHAP authentication and cannot define host-based iSCSI target IPs. We may need to extend this functionality to include the target IP as well.

A single iSCSI target login takes 2 minutes to time out.
===
2018-04-12 11:34:47,708+0530 DEBUG (jsonrpc/3) [storage.Misc.excCmd] /usr/bin/taskset --cpu-list 0-3 /usr/bin/sudo -n /sbin/iscsiadm -m node -T iqn.2003-01.org.linux-iscsi.storage.x8664:sn.52fd3f4b2f0b -I default -p 10.65.177.71:3260,1 -l (cwd None) (commands:69)
2018-04-12 11:36:47,857+0530 DEBUG (jsonrpc/3) [storage.Misc.excCmd] FAILED: <err> = 'iscsiadm: Could not login to [iface: default, target: iqn.2003-01.org.linux-iscsi.storage.x8664:sn.52fd3f4b2f0b, portal: 10.65.177.71,3260].\niscsiadm: initiator reported error (8 - connection timed out)\niscsiadm: Could not log into all portals\n'; <rc> = 8 (commands:93)
===

Version-Release number of selected component (if applicable):
rhevm-4.1.10

How reproducible:
100%

Steps to Reproduce:
Please refer to the description above.

Actual results:
Active-Active DR with iSCSI-based storage domains does not work.

Expected results:

Additional info:
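To make the proposed extension concrete, here is a minimal Python sketch of per-host target filtering. Everything in it is hypothetical: `IscsiConnection`, `connections_for_host`, `worst_case_login_delay` and the `host_extensions` mapping are illustrative names, not the engine's or VDSM's API, and `host_extensions` only stands in for a storage_connection_extensions entry extended with a target. The 120 s figure is taken from the log above (11:34:47 to 11:36:47), and the delay calculation assumes logins are attempted one after another. The IQNs and portals use documentation placeholders.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

# Taken from the log above: iscsiadm gave up on one portal after ~120 s.
LOGIN_TIMEOUT_S = 120


@dataclass(frozen=True)
class IscsiConnection:
    """A storage-domain-level iSCSI connection (target IQN + portal). Hypothetical model."""
    conn_id: str
    iqn: str
    portal: str  # "ip:port"


def connections_for_host(
    host: str,
    domain_conns: List[IscsiConnection],
    host_extensions: Dict[str, Set[str]],
) -> List[IscsiConnection]:
    """Return the connections a host should log in to.

    HYPOTHETICAL: host_extensions maps a host name to the conn_ids it may use.
    A host with no entry keeps today's behaviour and gets every connection.
    """
    allowed = host_extensions.get(host)
    if allowed is None:
        return list(domain_conns)
    return [c for c in domain_conns if c.conn_id in allowed]


def worst_case_login_delay(
    conns: List[IscsiConnection], reachable_portals: Set[str]
) -> int:
    """Seconds spent waiting on unreachable portals, assuming serial logins."""
    unreachable = [c for c in conns if c.portal not in reachable_portals]
    return len(unreachable) * LOGIN_TIMEOUT_S


if __name__ == "__main__":
    # Placeholder targets for SAN A (site A) and SAN B (site B).
    san_a = IscsiConnection("conn-a", "iqn.example:san-a", "192.0.2.10:3260")
    san_b = IscsiConnection("conn-b", "iqn.example:san-b", "198.51.100.10:3260")
    conns = [san_a, san_b]
    ext = {"host-a1": {"conn-a"}, "host-b1": {"conn-b"}}
    site_a_reachable = {"192.0.2.10:3260"}

    # Today: host-a1 tries both targets, and SAN B is unreachable from site A.
    print(worst_case_login_delay(conns, site_a_reachable))  # 120
    # With a per-host mapping, host-a1 only logs in to SAN A.
    a1 = connections_for_host("host-a1", conns, ext)
    print([c.conn_id for c in a1])  # ['conn-a']
    print(worst_case_login_delay(a1, site_a_reachable))  # 0
```

Hosts without an entry fall back to the current domain-wide behaviour, so such an extension would only change anything for clusters that opt in.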
I think BZ 1451018 refers to the same issue.
Did the customer set up a virtual "floating" IP to eliminate the need for two different IPs? It should resolve to the correct target in each site. This logic is best kept outside RHV to ensure correct failover.
This bug is not marked as a blocker, and 4.2.4 has entered the blocker-only phase. Please consider re-targeting to 4.2.5.
Closing this bug, since the use case described here is not supported by oVirt (comment 7: each site is mapped only to its respective storage (non-uniform access), with no cross-connect between the sites).

Nijin, there is still a use case we need to verify: whether we end up with no SPM when there is only a partial path connection to iSCSI storage. If that reproduces, please open a separate bug.