Bug 1735269 - cannot mount netapp-backed iscsi volumes after loss of path
Summary: cannot mount netapp-backed iscsi volumes after loss of path
Keywords:
Status: CLOSED DUPLICATE of bug 1697996
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-cinder
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Severity: medium
Priority: medium
Target Milestone: ---
Assignee: Cinder Bugs List
QA Contact: Tzach Shefi
Docs Contact: Chuck Copello
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-07-31 19:22 UTC by Andrew Mercer
Modified: 2019-10-03 11:45 UTC
CC: 14 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-03 11:45:14 UTC
Target Upstream Version:
Embargoed:



Description Andrew Mercer 2019-07-31 19:22:33 UTC
Description of problem:

After a power outage and subsequent loss of path, volumes on NetApp-backed storage devices cannot be mounted, while VNX-backed volumes can.


Version-Release number of selected component (if applicable):


How reproducible:

Every time, following the steps below.


Steps to Reproduce:
1. Set the iSCSI VLAN ID on the VNX to an inaccessible VLAN.
2. Disable one iSCSI interface on the NetApp.
3. Stop the VNX- and NetApp-backed instances.
4. Start the VNX- and NetApp-backed instances.
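
The stop/start half of the reproduction (steps 3 and 4) can be sketched with the OpenStack CLI. The instance names here are hypothetical, and steps 1 and 2 are performed on the arrays themselves:

```shell
# Hypothetical instance names; substitute the instances backed by each array.
openstack server stop vnx-backed-vm netapp-backed-vm

# Wait until both report SHUTOFF, then start them to force fresh volume attaches.
openstack server show vnx-backed-vm -f value -c status
openstack server start vnx-backed-vm netapp-backed-vm
```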

Actual results:

VNX-backed instances start; NetApp-backed instances do not.


Expected results:

Both VNX- and NetApp-backed volumes mount.


Additional info:

See these in containers/nova log:
- Successfully reverted task state from powering-on on failure for instance
- Exception during message handling: TargetPortalNotFound: Unable to find target portal
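
A sketch of how to count the two failure signatures quoted above. On a real OSP13 compute node the log lives at /var/log/containers/nova/nova-compute.log; the sample lines below stand in for it so the command is self-contained:

```shell
# Write a few illustrative log lines to a temporary file.
cat > /tmp/nova-compute.sample.log <<'EOF'
INFO nova.compute.manager Successfully reverted task state from powering-on on failure for instance
ERROR oslo_messaging.rpc.server Exception during message handling: TargetPortalNotFound: Unable to find target portal
INFO nova.compute.manager unrelated line
EOF

# Count lines matching either failure signature (prints 2 for the sample above).
grep -cE 'TargetPortalNotFound|Successfully reverted task state' /tmp/nova-compute.sample.log
```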


additional info from case or customer available upon request

Comment 2 Alan Bishop 2019-08-02 15:19:54 UTC
I see an sosreport for the compute node, but we also need a fresh sosreport for the controller that's running the cinder-volume service. Also, if debug logs were not enabled, then please enable debug and capture a fresh trace of the problem.

Comment 5 Andrew Mercer 2019-08-14 18:41:34 UTC
Hello,

After collaborating with sbr-storage, they noted [1] that this may be an issue with the Cinder NetApp driver. Our most recent test consisted of changing the node.session.timeo.replacement_timeout parameter from 120 to 5, per the NetApp documentation; storage verified the change in the logs, but it does not appear to have resolved the issue.

 [1] "It looks an issue w/ the driver that if it can't find the portal it fails the instance"
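
For reference, the timeout change described above can be made with open-iscsi's iscsiadm; this is a sketch of the commands rather than the exact procedure used:

```shell
# Lower the replacement timeout on all recorded iSCSI nodes from the
# default (120s) to 5s so multipath fails a dead path quickly.
iscsiadm -m node -o update -n node.session.timeo.replacement_timeout -v 5

# Confirm the recorded value; existing sessions pick it up only after re-login.
iscsiadm -m node -o show | grep node.session.timeo.replacement_timeout
```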

Comment 10 Alan Bishop 2019-08-20 13:23:18 UTC
Marking as triaged, although root problem is still under investigation.

Comment 11 Pablo Caruana 2019-10-02 15:24:56 UTC
From the logs provided and the description, investigation through the customer portal suggests this is related to another Bugzilla [1] closed last September with an errata [2]. Details are in the changelog [3].

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1697996  "NetApp: Return all iSCSI targets-portals"
[2] https://access.redhat.com/errata/RHBA-2019:1732
[3] https://access.redhat.com/downloads/content/openstack-cinder/12.0.7-2.el7ost/noarch/fd431d51/package-changelog


Extracting from the "Changes to the openstack-cinder component" section:
<...>
Previously, you could not attach a volume when the provided discovery IP was inaccessible from the host if NetApp iSCSI drivers used discovery mode for multipathing.
<...>

As the tests were performed in a lab environment, it would be a good idea to repeat the test after updating the container images to openstack-cinder-12.0.6-5.el7ost or later.
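
A quick way to check whether a deployment already carries the fixed package (a sketch; the container name "cinder_volume" is typical for an OSP13 containerized deployment but may differ):

```shell
# Query the openstack-cinder package version inside the cinder-volume container.
sudo docker exec cinder_volume rpm -q openstack-cinder
# Looking for openstack-cinder-12.0.6-5.el7ost or later, per the comment above.
```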

Comment 12 Pablo Caruana 2019-10-03 11:45:14 UTC

*** This bug has been marked as a duplicate of bug 1697996 ***

