NetApp iSCSI drivers use discovery mode for multipathing, which means the attach will fail if the IP provided for discovery is not accessible from the host. Something similar happens with singlepath, since we only try to connect to a single target/portal. This bugzilla tracks backporting to the RHOSP 14 platform the patch that changes the NetApp drivers so they return the target_iqns, target_portals, and target_luns parameters whenever there is more than one option.
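For reference, a minimal sketch of the difference in the connection info the driver hands back, assuming the standard Cinder iSCSI connection properties (the IQN, portal and LUN values below are placeholders for illustration, not taken from this environment):

# Illustrative only: shape of the connection info a Cinder iSCSI driver
# returns from initialize_connection(). Values are placeholders.

# Singlepath-style: only one portal is advertised, so the attach fails
# if that single address is unreachable from the host.
single_path = {
    'driver_volume_type': 'iscsi',
    'data': {
        'target_iqn': 'iqn.1992-08.com.netapp:sn.x:vs.2',
        'target_portal': 'X.Y.16.11:3260',
        'target_lun': 0,
    },
}

# With the fix (multipath-style): every iqn/portal/lun option is returned,
# so the consumer can log in to whichever portals are actually reachable.
multi_path = {
    'driver_volume_type': 'iscsi',
    'data': {
        'target_iqns': ['iqn.1992-08.com.netapp:sn.x:vs.2'] * 2,
        'target_portals': ['X.Y.16.11:3260', 'X.Y.16.12:3260'],
        'target_luns': [0, 0],
    },
}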
Pablo, while testing this on openstack-cinder-13.0.5-2.el7ost.noarch I hit a problem, see at the bottom.

On a NetApp iSCSI backend, I created and attached a volume to an instance. On the compute node we see the basic flow:

3600a098038304479363f4c4870453456 dm-0 NETAPP ,LUN C-Mode
size=2.0G features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| `- 6:0:0:0 sda 8:0  active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  `- 7:0:0:0 sdb 8:16 active ready running

[root@compute-0 ~]# iscsiadm -m session -P 3 | grep X.Y.    (X.Y. represents internal IPs)
	Current Portal: X.Y.16.12:3260,1042
	Persistent Portal: X.Y.16.12:3260,1042
	Current Portal: X.Y.16.11:3260,1041
	Persistent Portal: X.Y.16.11:3260,1041

Now let's detach the volume and fail one of the paths via a firewall block rule.

(overcloud) [stack@undercloud-0 ~]$ nova volume-detach 02244b34-bc4e-4fcb-8325-058d5b84bcb2 7f21a2b4-2313-496f-9e3b-8c29963eeba0

We see no active connection at this stage:

[root@compute-0 ~]# iscsiadm -m session -P 3 | grep 16
iscsiadm: No active sessions.

Now let's add an iptables drop rule on the compute node:

sudo iptables -s X.Y.16.11 -p tcp --sport 3260 -I INPUT -m statistic --mode random --probability 1 -j DROP

Now retry attaching the volume:

(overcloud) [stack@undercloud-0 ~]$ nova volume-attach 02244b34-bc4e-4fcb-8325-058d5b84bcb2 7f21a2b4-2313-496f-9e3b-8c29963eeba0 auto
+----------+--------------------------------------+
| Property | Value                                |
+----------+--------------------------------------+
| device   | /dev/vdb                             |
| id       | 7f21a2b4-2313-496f-9e3b-8c29963eeba0 |
| serverId | 02244b34-bc4e-4fcb-8325-058d5b84bcb2 |
| volumeId | 7f21a2b4-2313-496f-9e3b-8c29963eeba0 |
+----------+--------------------------------------+

And we see only one connection, via IP X.Y.16.12:

[root@compute-0 ~]# iscsiadm -m session -P 3 | grep 16
	Current Portal: 10.46.16.12:3260,1042
	Persistent Portal: 10.46.16.12:3260,1042

[root@compute-0 ~]# multipath -ll -v2
3600a098038304479363f4c4870453456 dm-0 NETAPP ,LUN C-Mode
size=2.0G features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  `- 8:0:0:0 sda 8:0 active ready running

I detached the volume again, still with X.Y.16.11 blocked, but now I fail to attach the volume.

First of all, would my verification steps be suited to verify this bz? If not, what/how else should I verify?

What do we do with the fact that I fail to attach the volume once I block X.Y.16.11? I'm not sure about the terms, but wouldn't this fit the error this bz fixes? How do I know if X.Y.16.11 is the discovery-provided IP?
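On the last question, one way to see which of the advertised portals is actually reachable from the compute node (and therefore whether the iptables rule is blocking the one you think it is) is a quick TCP check against each portal. This is an ad-hoc helper, not part of any OpenStack tooling; the portal list is a placeholder and should be replaced with the target_portals from the volume's connection info:

import socket

# Ad-hoc helper: report which iSCSI portals answer on the compute node.
# The portal list is a placeholder; fill in the portals returned in the
# volume's connection info.
PORTALS = ['X.Y.16.11:3260', 'X.Y.16.12:3260']

def portal_reachable(portal, timeout=5):
    host, port = portal.rsplit(':', 1)
    try:
        sock = socket.create_connection((host, int(port)), timeout=timeout)
        sock.close()
        return True
    except (socket.error, ValueError):
        return False

for portal in PORTALS:
    state = 'reachable' if portal_reachable(portal) else 'BLOCKED/unreachable'
    print('%s: %s' % (portal, state))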
(In reply to Tzach Shefi from comment #10)
> First of all would my verification steps be suited to verify the bz?
> If not what/how else should I verify?
>
> What do we do with the fact that I fail to attach volume once I block
> X.Y.16.11?
> I'm not sure about the terms but wouldn't this fit the error the bz fixes?
> How do I know if X.Y.16.11 is the discovery provided ip?

The test looks valid, assuming there are no leftovers from earlier setup; the most important element is confirming that the address/port pairs are passed down correctly. In any case, the best way to understand what is going on is to reproduce it with both the nova and cinder components in debug, so the volume request can be tracked on both the compute and controller side, ideally with any associated traceback.
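Following up on the debug suggestion: once debug logging is enabled, the connection info is logged during the attach, so the iqn/portal/lun lists that were actually passed down can be pulled out of the logs. A rough sketch; the log path and message format here are assumptions and need adjusting to the actual deployment (containerized deployments keep the logs elsewhere):

import re

# Assumed log location; adjust for the actual deployment.
LOG_FILE = '/var/log/nova/nova-compute.log'

# Extract any target_portals / target_iqns / target_luns lists that appear
# in debug lines containing the volume connection info.
pattern = re.compile(r"'(target_(?:portals|iqns|luns))':\s*(\[[^\]]*\])")

with open(LOG_FILE) as log:
    for line in log:
        for key, value in pattern.findall(line):
            print('%s = %s' % (key, value))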
Verified on: openstack-cinder-13.0.5-2.el7ost.noarch

Created an iSCSI volume and attached it to an instance; we see both paths:

[root@compute-0 ~]# iscsiadm -m session -P 3 | grep X.Y.
	Current Portal: X.Y.16.12:3260,1042
	Persistent Portal: X.Y.16.12:3260,1042
	Current Portal: X.Y.16.11:3260,1041
	Persistent Portal: X.Y.16.11:3260,1041

As for the second issue I hit at the end of comment 11, it's a known nova multipath bug; for this bz's verification we can ignore it.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:1678