NetApp iSCSI drivers use discovery mode for multipathing, which means the attach will fail if the IP provided for discovery is not accessible from the host. Something similar happens with singlepath, since we only try to connect to a single target/portal. This bugzilla tracks backporting to the RHOSP 14 platform the patch that changes the NetApp drivers so they return the target_iqns, target_portals, and target_luns parameters whenever there is more than one option.
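For reference, a minimal sketch of the difference in the connection info the driver hands back, assuming the standard Cinder iSCSI connection properties (the IQN, portal and LUN values below are placeholders for illustration, not taken from this environment):

# Illustrative only: shape of the connection info a Cinder iSCSI driver
# returns from initialize_connection(). Values are placeholders.

# Singlepath-style: only one portal is advertised, so the attach fails
# if that single address is unreachable from the host.
single_path = {
    'driver_volume_type': 'iscsi',
    'data': {
        'target_iqn': 'iqn.1992-08.com.netapp:sn.x:vs.2',
        'target_portal': 'X.Y.16.11:3260',
        'target_lun': 0,
    },
}

# With the fix (multipath-style): every iqn/portal/lun option is returned,
# so the consumer can log in to whichever portals are actually reachable.
multi_path = {
    'driver_volume_type': 'iscsi',
    'data': {
        'target_iqns': ['iqn.1992-08.com.netapp:sn.x:vs.2'] * 2,
        'target_portals': ['X.Y.16.11:3260', 'X.Y.16.12:3260'],
        'target_luns': [0, 0],
    },
}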
Pablo, while testing this on openstack-cinder-13.0.5-2.el7ost.noarch I hit a problem, see at the bottom.

On a NetApp iSCSI backend, I created and attached a volume to an instance. On the compute node we see the basic flow:

3600a098038304479363f4c4870453456 dm-0 NETAPP ,LUN C-Mode
size=2.0G features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| `- 6:0:0:0 sda 8:0  active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  `- 7:0:0:0 sdb 8:16 active ready running

[root@compute-0 ~]# iscsiadm -m session -P 3 | grep X.Y.    (X.Y. represents internal IPs)
	Current Portal: X.Y.16.12:3260,1042
	Persistent Portal: X.Y.16.12:3260,1042
	Current Portal: X.Y.16.11:3260,1041
	Persistent Portal: X.Y.16.11:3260,1041

Now let's detach the volume and fail one of the paths via a firewall block rule.

(overcloud) [stack@undercloud-0 ~]$ nova volume-detach 02244b34-bc4e-4fcb-8325-058d5b84bcb2 7f21a2b4-2313-496f-9e3b-8c29963eeba0

We see no active connection at this stage:

[root@compute-0 ~]# iscsiadm -m session -P 3 | grep 16
iscsiadm: No active sessions.

Now let's add an iptables drop rule on the compute node:

sudo iptables -s X.Y.16.11 -p tcp --sport 3260 -I INPUT -m statistic --mode random --probability 1 -j DROP

Now retry attaching the volume:

(overcloud) [stack@undercloud-0 ~]$ nova volume-attach 02244b34-bc4e-4fcb-8325-058d5b84bcb2 7f21a2b4-2313-496f-9e3b-8c29963eeba0 auto
+----------+--------------------------------------+
| Property | Value                                |
+----------+--------------------------------------+
| device   | /dev/vdb                             |
| id       | 7f21a2b4-2313-496f-9e3b-8c29963eeba0 |
| serverId | 02244b34-bc4e-4fcb-8325-058d5b84bcb2 |
| volumeId | 7f21a2b4-2313-496f-9e3b-8c29963eeba0 |
+----------+--------------------------------------+

And we see only one connection, via IP X.Y.16.12:

[root@compute-0 ~]# iscsiadm -m session -P 3 | grep 16
	Current Portal: 10.46.16.12:3260,1042
	Persistent Portal: 10.46.16.12:3260,1042

[root@compute-0 ~]# multipath -ll -v2
3600a098038304479363f4c4870453456 dm-0 NETAPP ,LUN C-Mode
size=2.0G features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  `- 8:0:0:0 sda 8:0 active ready running

I detached the volume again, still with X.Y.16.11 blocked, but now I fail to attach the volume.

First of all, would my verification steps be suited to verify this bz? If not, what/how else should I verify?

What do we do with the fact that I fail to attach the volume once I block X.Y.16.11? I'm not sure about the terms, but wouldn't this fit the error this bz fixes? How do I know if X.Y.16.11 is the discovery-provided IP?
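On the last question, one way to see which of the advertised portals is actually reachable from the compute node (and therefore whether the iptables rule is blocking the one you think it is) is a quick TCP check against each portal. This is an ad-hoc helper, not part of any OpenStack tooling; the portal list is a placeholder and should be replaced with the target_portals from the volume's connection info:

import socket

# Ad-hoc helper: report which iSCSI portals answer on the compute node.
# The portal list is a placeholder; fill in the portals returned in the
# volume's connection info.
PORTALS = ['X.Y.16.11:3260', 'X.Y.16.12:3260']

def portal_reachable(portal, timeout=5):
    host, port = portal.rsplit(':', 1)
    try:
        sock = socket.create_connection((host, int(port)), timeout=timeout)
        sock.close()
        return True
    except (socket.error, ValueError):
        return False

for portal in PORTALS:
    state = 'reachable' if portal_reachable(portal) else 'BLOCKED/unreachable'
    print('%s: %s' % (portal, state))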
(In reply to Tzach Shefi from comment #10)
> First of all would my verification steps be suited to verify the bz?
> If not what/how else should I verify?
>
> What do we do with the fact that I fail to attach volume once I block
> X.Y.16.11?
> I'm not sure about the terms but wouldn't this fit the error the bz fixes?
> How do I know if X.Y.16.11 is the discovery provided ip?

The test looks valid, assuming there are no leftovers from earlier setup; the most important element is confirming that the address/port pairs are passed down correctly. In any case, the best way to understand what is going on is to reproduce it with both the nova and cinder components in debug, so the volume request can be tracked on both the compute and controller side, ideally with any associated traceback.
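Following up on the debug suggestion: once debug logging is enabled, the connection info is logged during the attach, so the iqn/portal/lun lists that were actually passed down can be pulled out of the logs. A rough sketch; the log path and message format here are assumptions and need adjusting to the actual deployment (containerized deployments keep the logs elsewhere):

import re

# Assumed log location; adjust for the actual deployment.
LOG_FILE = '/var/log/nova/nova-compute.log'

# Extract any target_portals / target_iqns / target_luns lists that appear
# in debug lines containing the volume connection info.
pattern = re.compile(r"'(target_(?:portals|iqns|luns))':\s*(\[[^\]]*\])")

with open(LOG_FILE) as log:
    for line in log:
        for key, value in pattern.findall(line):
            print('%s = %s' % (key, value))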
Verified on: openstack-cinder-13.0.5-2.el7ost.noarch

Created an iSCSI volume and attached it to an instance; we see both paths:

[root@compute-0 ~]# iscsiadm -m session -P 3 | grep X.Y.
	Current Portal: X.Y.16.12:3260,1042
	Persistent Portal: X.Y.16.12:3260,1042
	Current Portal: X.Y.16.11:3260,1041
	Persistent Portal: X.Y.16.11:3260,1041

As for the second issue I hit at the end of comment 11, it's a known nova multipath bug; for this bz's verification we can ignore it.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:1678