Bug 1697994
| Field | Value |
|---|---|
| Summary | [RHOSP14] dataontap volume driver only provides single target_portal address when multiple are configured |
| Product | Red Hat OpenStack |
| Component | openstack-cinder |
| Version | 14.0 (Rocky) |
| Target Release | 14.0 (Rocky) |
| Target Milestone | z3 |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | high |
| Keywords | Triaged, ZStream |
| Reporter | Pablo Caruana <pcaruana> |
| Assignee | Pablo Caruana <pcaruana> |
| QA Contact | Tzach Shefi <tshefi> |
| Docs Contact | Tana <tberry> |
| CC | aavraham, abishop, apevec, dasmith, dhill, eglynn, geguileo, gkumar, igallagh, jhakimra, jschluet, kchamart, knylande, lhh, lyarwood, msufiyan, pcaruana, pgrist, rheslop, sbauza, sgordon, shdunne, tenobreg, vromanso |
| Fixed In Version | openstack-cinder-13.0.3-3.el7ost |
| Doc Type | Bug Fix |
| Doc Text | Previously, NetApp drivers would fail to attach a volume if the IP provided for discovery was not accessible from the host. The NetApp iSCSI drivers have been updated to return `target_iqns`, `target_portals`, and `target_luns` parameters when these options are available. |
| Clone Of | 1653051 |
| Clones | 1697996 (view as bug list) |
| Bug Blocks | 1653051, 1697996 |
| Last Closed | 2019-07-02 19:43:59 UTC |
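For context on the doc text above: the plural keys let the connector try more than one portal instead of being tied to a single discovery address. Below is a minimal sketch of the shape of such multipath-capable connection data; the key names are the ones called out in the doc text, while the portal, IQN, and LUN values are invented placeholders, not taken from this bug.

```python
# Illustrative only: the shape of multipath-capable iSCSI connection data.
# Key names (target_portal/target_iqn/target_lun and their plural forms)
# come from the doc text above; all values are made-up placeholders.
connection_info = {
    "driver_volume_type": "iscsi",
    "data": {
        # Single-path fields, as returned before the fix:
        "target_portal": "X.Y.16.11:3260",
        "target_iqn": "iqn.1992-08.com.netapp:sn.example:vs.1",
        "target_lun": 0,
        # Multipath fields, returned when multiple portals are configured:
        "target_portals": ["X.Y.16.11:3260", "X.Y.16.12:3260"],
        "target_iqns": [
            "iqn.1992-08.com.netapp:sn.example:vs.1",
            "iqn.1992-08.com.netapp:sn.example:vs.1",
        ],
        "target_luns": [0, 0],
    },
}
```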
Description (Pablo Caruana, 2019-04-09 12:11:20 UTC)

Tzach Shefi:

Pablo, while testing this on openstack-cinder-13.0.5-2.el7ost.noarch, I hit a problem; see the bottom of this comment.
On a NetApp iSCSI backend I created a volume and attached it to an instance.
On the compute node we see the basic two-path flow:
3600a098038304479363f4c4870453456 dm-0 NETAPP ,LUN C-Mode
size=2.0G features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| `- 6:0:0:0 sda 8:0 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
`- 7:0:0:0 sdb 8:16 active ready running
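As a side note for reproducing this check, the two usable paths above can also be counted without eyeballing the output. A minimal sketch of such a QA helper (assumed, not part of any product tooling; it simply counts 'active ready running' lines, which is enough here because only one multipath device exists on this compute node):

```python
#!/usr/bin/env python3
"""Count the paths that multipath currently reports as usable."""
import subprocess

def count_ready_paths():
    # Parse `multipath -ll` and count path lines in the 'active ready running' state.
    out = subprocess.run(
        ["multipath", "-ll"], capture_output=True, text=True, check=True
    ).stdout
    return sum(1 for line in out.splitlines() if "active ready running" in line)

if __name__ == "__main__":
    # Expect 2 in the healthy state shown above, 1 after a portal is blocked.
    print("usable paths:", count_ready_paths())
```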
[root@compute-0 ~]# iscsiadm -m session -P 3 | grep X.Y.    (X.Y. represents redacted internal IPs)
Current Portal: X.Y.16.12:3260,1042
Persistent Portal: X.Y.16.12:3260,1042
Current Portal: X.Y.16.11:3260,1041
Persistent Portal: X.Y.16.11:3260,1041
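The same goes for the portal list: a small, illustrative helper (assumed, not part of the product) that pulls only the 'Current Portal:' entries out of `iscsiadm -m session -P 3`, which makes the before/after comparison in the steps below easier:

```python
#!/usr/bin/env python3
"""List the portals of the currently logged-in iSCSI sessions."""
import re
import subprocess

def current_portals():
    proc = subprocess.run(
        ["iscsiadm", "-m", "session", "-P", "3"],
        capture_output=True, text=True,
    )
    # iscsiadm exits non-zero and prints "No active sessions." when nothing is logged in.
    if proc.returncode != 0:
        return []
    return re.findall(r"Current Portal:\s*(\S+)", proc.stdout)

if __name__ == "__main__":
    # e.g. ['X.Y.16.12:3260,1042', 'X.Y.16.11:3260,1041'] in the healthy state above.
    print(current_portals())
```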
Now let's detach the volume and fail one of the paths via a firewall block rule.
(overcloud) [stack@undercloud-0 ~]$ nova volume-detach 02244b34-bc4e-4fcb-8325-058d5b84bcb2 7f21a2b4-2313-496f-9e3b-8c29963eeba0
We see no active connections at this point:
[root@compute-0 ~]# iscsiadm -m session -P 3 | grep 16
iscsiadm: No active sessions.
Now let's add an iptables drop rule on the compute node:
sudo iptables -s X.Y.16.11 -p tcp --sport 3260 -I INPUT -m statistic --mode random --probability 1 -j DROP
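Before retrying the attach, it is worth confirming the drop rule really took effect. A minimal sketch of a reachability probe (illustrative only; it assumes the two redacted X.Y portal addresses from above and a plain TCP connect to the standard iSCSI port 3260):

```python
#!/usr/bin/env python3
"""Check which iSCSI portals are reachable from the compute node."""
import socket

# Redacted internal portal addresses from the comment above (placeholders).
PORTALS = ["X.Y.16.11", "X.Y.16.12"]

def reachable(ip, port=3260, timeout=3):
    # A blocked portal should fail or time out on a plain TCP connect.
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False

for ip in PORTALS:
    print(ip, "reachable" if reachable(ip) else "blocked/unreachable")
```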
Now retry attaching the volume:
(overcloud) [stack@undercloud-0 ~]$ nova volume-attach 02244b34-bc4e-4fcb-8325-058d5b84bcb2 7f21a2b4-2313-496f-9e3b-8c29963eeba0 auto
+----------+--------------------------------------+
| Property | Value |
+----------+--------------------------------------+
| device | /dev/vdb |
| id | 7f21a2b4-2313-496f-9e3b-8c29963eeba0 |
| serverId | 02244b34-bc4e-4fcb-8325-058d5b84bcb2 |
| volumeId | 7f21a2b4-2313-496f-9e3b-8c29963eeba0 |
+----------+--------------------------------------+
And we see only one connection, via IP x.y.16.12:
[root@compute-0 ~]# iscsiadm -m session -P 3 | grep 16
Current Portal: 10.46.16.12:3260,1042
Persistent Portal: 10.46.16.12:3260,1042
[root@compute-0 ~]# multipath -ll -v2
3600a098038304479363f4c4870453456 dm-0 NETAPP ,LUN C-Mode
size=2.0G features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
`- 8:0:0:0 sda 8:0 active ready running
I detached the volume again, now with x.y.16.11 blocked, but this time I fail to attach the volume.

First of all, are my verification steps suitable for verifying this BZ?
If not, what else should I verify, and how?
What do we do about the fact that I fail to attach the volume once I block x.y.16.11?
I'm not sure about the terminology, but wouldn't this match the error this BZ fixes?
How do I know whether x.y.16.11 is the discovery-provided IP?
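As an aside on the last question: with the cinder volume service running in debug mode (as suggested in the reply below), the connection properties returned for the attach, including the portal handed out for discovery, should be visible in the cinder-volume log. A minimal sketch of pulling them out; the log path is an assumption based on the containerized OSP 14 layout, not taken from this bug:

```python
#!/usr/bin/env python3
"""Grep a cinder-volume debug log for the iSCSI portal fields."""

# Assumed path (containerized OSP 14 controller); adjust for your deployment.
LOG = "/var/log/containers/cinder/cinder-volume.log"

with open(LOG, errors="replace") as fh:
    for line in fh:
        # Plain substring match; also catches the plural 'target_portals' key.
        if "target_portal" in line:
            print(line.rstrip())
```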
(In reply to Tzach Shefi from comment #10)

The test looks valid, assuming no other setup leftovers; the most important element is confirming that the address/port pairs are passed correctly. In any case, the best way to understand what is going on is to reproduce it with both the nova and cinder components in debug mode, so the volume request can be tracked on both the compute and controller sides, ideally with any associated trace call.

Verified on: openstack-cinder-13.0.5-2.el7ost.noarch

Created an iSCSI volume and attached it to an instance; we see both paths.

[root@compute-0 ~]# iscsiadm -m session -P 3 | grep X.Y.    (X.Y. represents redacted internal IPs)
Current Portal: X.Y.16.12:3260,1042
Persistent Portal: X.Y.16.12:3260,1042
Current Portal: X.Y.16.11:3260,1041
Persistent Portal: X.Y.16.11:3260,1041

Per the second issue I hit at the end of comment 11: it is a known nova multipath bug, and for this BZ's verification we can ignore it.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:1678