Bug 1446449

Summary: tcmu-runner: reports remote ports incorrectly, which leads to Windows/ESX failing to fail over
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Mike Christie <mchristi>
Component: iSCSI
Assignee: Mike Christie <mchristi>
Status: CLOSED ERRATA
QA Contact: Tejas <tchandra>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 2.3
CC: bniver, ceph-eng-bugs, ceph-qe-bugs, jdillama, tchandra, uboppana
Target Milestone: rc
Target Release: 3.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-12-05 23:33:43 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Mike Christie 2017-04-28 04:05:26 UTC
Description of problem:

tcmu-runner currently reports bogus remote port information when configured to return local and remote ports. At the iSCSI level we do support returning the remote iSCSI targets and ports, but at the SCSI level we do not support returning the remote ALUA port states, so the state info returned for remote ports is bogus.

ESX and Windows require this information for failover/failback. Linux does not have this problem because it sends an RTPG (REPORT TARGET PORT GROUPS) command on each path, whereas ESX and Windows send it on only one path.
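
To illustrate why one RTPG response has to carry correct state for every port group, here is a minimal Python sketch that builds and parses a response in the basic (non-extended) descriptor layout as I understand it from SPC: a 4-byte big-endian return data length, then one 8-byte descriptor per target port group followed by 4 bytes per target port. This is not tcmu-runner code; the function names, group IDs, and port IDs are hypothetical and only there for illustration.

```python
import struct

# ALUA asymmetric access states (low 4 bits of descriptor byte 0).
ALUA_STATES = {
    0x0: "active/optimized",
    0x1: "active/non-optimized",
    0x2: "standby",
    0x3: "unavailable",
    0xE: "offline",
    0xF: "transitioning",
}


def build_rtpg_response(groups):
    """Build a response from (group_id, state, preferred, [port_ids]) tuples."""
    body = b""
    for group_id, state, preferred, ports in groups:
        byte0 = (0x80 if preferred else 0x00) | (state & 0x0F)
        supported = 0  # supported-states flags are left at 0 in this sketch
        # byte0, supported flags, group id, reserved, status, vendor, port count
        body += struct.pack(">BBHBBBB", byte0, supported, group_id,
                            0, 0, 0, len(ports))
        for port in ports:
            # 2 obsolete/reserved bytes, then the relative target port id
            body += struct.pack(">HH", 0, port)
    return struct.pack(">I", len(body)) + body


def parse_rtpg_response(data):
    """Return a list of dicts, one per target port group descriptor."""
    (length,) = struct.unpack_from(">I", data, 0)
    offset, end = 4, 4 + length
    groups = []
    while offset < end:
        byte0, _sup, group_id, _rsvd, _status, _vendor, count = \
            struct.unpack_from(">BBHBBBB", data, offset)
        offset += 8
        ports = []
        for _ in range(count):
            _obsolete, port = struct.unpack_from(">HH", data, offset)
            ports.append(port)
            offset += 4
        state = byte0 & 0x0F
        groups.append({
            "group_id": group_id,
            "preferred": bool(byte0 & 0x80),
            "state": ALUA_STATES.get(state, "reserved (0x%x)" % state),
            "ports": ports,
        })
    return groups


# Hypothetical two-gateway setup: group 1 is local, group 2 is remote.
response = build_rtpg_response([
    (1, 0x0, True, [1]),
    (2, 0x1, False, [2]),
])
for group in parse_rtpg_response(response):
    print(group)
```

An initiator that issues RTPG on every path can rely on each response for that path's own group. An initiator that issues it on a single path depends on the remote group's state field in that one response being accurate, which is the field this bug is about.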


Comment 3 Jason Dillaman 2017-09-14 17:37:54 UTC
@Mike: is this an issue in tcmu-runner, the LIO module, or both?

Comment 4 Mike Christie 2017-09-14 20:30:35 UTC
(In reply to Jason Dillaman from comment #3)
> @Mike: is this an issue in tcmu-runner, the LIO module, or both?

Both, but we have limited our configuration to support only one enabled TPG with one iSCSI portal per target (plus N disabled TPGs, one for each remote node), so we only needed tcmu-runner changes in 3.0.

Comment 5 Jason Dillaman 2017-09-14 20:59:59 UTC
Is this something I could help out with? It seems like the solution for port states and persistent group reservations is a similar problem. Of course, if the kernel will eventually just use dlm for PGRs, I'd imagine it would make sense for it to share ALUA states as well.

Comment 6 Mike Christie 2017-09-14 21:01:28 UTC
Sorry. It's already done. Just forgot to change the state.

Comment 7 Mike Christie 2017-09-14 21:03:17 UTC
I set it to MODIFIED, but the tcmu-runner build that QE is testing with already has the changes, so technically it is already ON_QA. I was not sure whether Tejas has a special process for it.

Comment 8 Jason Dillaman 2017-09-14 21:04:24 UTC
Great -- I'll add it to the errata.

Comment 10 Tejas 2017-09-15 03:03:11 UTC
It's good to learn that we already have the failover/failback code in tcmu-runner.

Comment 15 errata-xmlrpc 2017-12-05 23:33:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3387