Bug 1268051 - volume attach failed with iser
Summary: volume attach failed with iser
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 7.0 (Kilo)
Hardware: x86_64
OS: Linux
high
high
Target Milestone: z3
: 7.0 (Kilo)
Assignee: Lee Yarwood
QA Contact: nlevinki
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2015-10-01 17:01 UTC by Lenny Verkhovsky
Modified: 2023-02-22 23:02 UTC (History)
16 users (show)

Fixed In Version: openstack-nova-2015.1.2-4.el7ost
Doc Type: Bug Fix
Doc Text:
The fix for bug 1228295 changed the return value when no host devices were found when connecting to an iSCSI or iSER volume. As a consequence, when no host devices were found, an exception was thrown as os.path.exists(None) was called. This in turn caused the connect volume attempt to fail. This update reverts the return value when no host devices are present, and adds an additional check to ensure os.path.exists(None) is never called. As a result, an exception is no longer thrown when no host devices are present and the connect logic correctly retries finding any host devices present.
Clone Of:
Environment:
openstack-kilo RHEL7.1
Last Closed: 2015-12-21 17:06:20 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
patch (694 bytes, patch)
2015-10-06 15:34 UTC, Lenny Verkhovsky
no flags Details | Diff
logs and confs (648.24 KB, application/x-bzip)
2015-10-11 07:02 UTC, Lenny Verkhovsky
no flags Details
logs (48.22 KB, application/x-gzip)
2015-10-18 12:50 UTC, Lenny Verkhovsky
no flags Details
logs (40.61 KB, application/x-gzip)
2015-10-21 11:00 UTC, Lenny Verkhovsky
no flags Details
logs (138.38 KB, application/x-gzip)
2015-10-22 06:50 UTC, Lenny Verkhovsky
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker OSP-16693 0 None None None 2022-07-09 08:01:49 UTC
Red Hat Product Errata RHSA-2015:2673 0 normal SHIPPED_LIVE Moderate: openstack-nova security and bug fix advisory 2015-12-21 21:51:35 UTC

Description Lenny Verkhovsky 2015-10-01 17:01:31 UTC
Description of problem:


Regarding code in /usr/lib/python2.7/site-packages/nova/virt/libvirt/volume.py

        # Check host_device only when transport is used, since otherwise it is
        # directly derived from properties. Only needed for unit tests
        # Loop until at least 1 path becomes available
        while ((self._get_transport() != "default" and not host_devices)
               or all(map(lambda x: not os.path.exists(x), host_devices))):


Since in Nova kilo host_device initialized to None, and stays None in case /dev/disk/by-path wasn’t updated on the first try (https://github.com/openstack/nova/blob/stable/kilo/nova/virt/libvirt/volume.py#L842-L846), but in packstack code host_device modified to a list, then it won’t pass the os.path.exists() for the None initialized host_device param like in (https://github.com/openstack/nova/blob/stable/kilo/nova/virt/libvirt/volume.py#L446-L447) since os.path.exists crash for None values.

One solution is to remove all the None values from the host_devices

Version-Release number of selected component (if applicable):
openstack-kilo

Actual results:
volume attach fails  

Expected results:
volume attach passes and set to be 'in-use'

Additional info:

Comment 2 Lenny Verkhovsky 2015-10-06 15:34:39 UTC
Created attachment 1080290 [details]
patch

Comment 3 Eoghan Glynn 2015-10-09 11:13:33 UTC
Please provide logs and proper NVR versioning of the nova RPMs in use, i.e.:

  $ sudo rpm -qa | grep nova

Comment 4 Lenny Verkhovsky 2015-10-11 07:02:50 UTC
Created attachment 1081770 [details]
logs and confs

logs and confs

Comment 5 Lenny Verkhovsky 2015-10-11 07:03:13 UTC
# sudo rpm -qa | grep nova
python-novaclient-2.23.0-1.el7ost.noarch
openstack-nova-console-2015.1.0-18.el7ost.noarch
openstack-nova-common-2015.1.0-18.el7ost.noarch
openstack-nova-scheduler-2015.1.0-18.el7ost.noarch
openstack-nova-conductor-2015.1.0-18.el7ost.noarch
openstack-nova-api-2015.1.0-18.el7ost.noarch
openstack-nova-cert-2015.1.0-18.el7ost.noarch
openstack-nova-compute-2015.1.0-18.el7ost.noarch
python-nova-2015.1.0-18.el7ost.noarch
openstack-nova-novncproxy-2015.1.0-18.el7ost.noarch

Comment 6 Lee Yarwood 2015-10-16 15:14:39 UTC
(In reply to Lenny Verkhovsky from comment #0)
> Description of problem:
> 
> 
> Regarding code in
> /usr/lib/python2.7/site-packages/nova/virt/libvirt/volume.py
> 
>         # Check host_device only when transport is used, since otherwise it
> is
>         # directly derived from properties. Only needed for unit tests
>         # Loop until at least 1 path becomes available
>         while ((self._get_transport() != "default" and not host_devices)
>                or all(map(lambda x: not os.path.exists(x), host_devices))):
> 
> 
> Since in Nova kilo host_device initialized to None, and stays None in case
> /dev/disk/by-path wasn’t updated on the first try
> (https://github.com/openstack/nova/blob/stable/kilo/nova/virt/libvirt/volume.
> py#L842-L846), but in packstack code host_device modified to a list, then it
> won’t pass the os.path.exists() for the None initialized host_device param
> like in
> (https://github.com/openstack/nova/blob/stable/kilo/nova/virt/libvirt/volume.
> py#L446-L447) since os.path.exists crash for None values.
> 
> One solution is to remove all the None values from the host_devices

I might be missing something here but all I see in the logs is an incorrect rootwrap filter due to pci identifiers for the HBA being included in the paths :

21982 2015-10-08 10:58:39.597 3415 TRACE oslo_messaging.rpc.dispatcher ProcessExecutionError: Unexpected error while running command.
21983 2015-10-08 10:58:39.597 3415 TRACE oslo_messaging.rpc.dispatcher Command: sudo nova-rootwrap /etc/nova/rootwrap.conf ln --symbolic --force /dev/mapper/pci-0000:02:00.0-ip-1.1.1.1:3260-iscsi-iqn.2010-10.org.openstack:volume-b21ab8b2-745c-4992-a29f-88993a9b591b-lun-0  /dev/disk/by-path/pci-0000:02:00.0-ip-1.1.1.1:3260-iscsi-iqn.2010-10.org.openstack:volume-b21ab8b2-745c-4992-a29f-88993a9b591b-lun-0
21984 2015-10-08 10:58:39.597 3415 TRACE oslo_messaging.rpc.dispatcher Exit code: 99
21985 2015-10-08 10:58:39.597 3415 TRACE oslo_messaging.rpc.dispatcher Stdout: u''
21986 2015-10-08 10:58:39.597 3415 TRACE oslo_messaging.rpc.dispatcher Stderr: u'/usr/bin/nova-rootwrap: Unauthorized command: ln --symbolic --force /dev/mapper/pci-0000:02:00.0-ip-1.1.1.1:3260-iscsi-iqn.2010-10.org.openstack:volume-b21ab8b2-745c-4992-a29f-88993a9b591b-lun-0 /dev/disk/by-path/pci-0000:02:00.0-ip-1.1.1.1:3260-iscsi-iqn.2010-10.org.openstack:volume-b21ab8b2-745c-4992-a29f-88993a9b591b-lun-0 (no filter matched)\n'

With the following filter set by default in OSP 7 for ln :

/usr/share/nova/rootwrap/compute.filters:212:ln: RegExpFilter, ln, root, ln, --symbolic, --force, /dev/mapper/ip-.*-iscsi-iqn.*, /dev/disk/by-path/ip-.*-iscsi-iqn.*

Changed upstream here :

rootwrap: update ln --symbolic filter for FS and FC type volume drivers
https://review.openstack.org/#/c/197713/

Lenny, does the following filter regex work for you instead of the default in /usr/share/nova/rootwrap/compute.filters or do you still see issues ?

ln: RegExpFilter, ln, root, ln, --symbolic, --force, /dev/mapper/.*, .*

Comment 7 Lenny Verkhovsky 2015-10-18 12:50:27 UTC
Created attachment 1084157 [details]
logs


Reproduce:  testr run tempest.scenario.test_minimum_basic.TestMinimumBasicScenario


Traceback 
2015-10-18 15:42:21.204 29999 TRACE oslo_messaging.rpc.dispatcher     or all(map(lambda x: not os.path.exists(x), host_devices))):
2015-10-18 15:42:21.204 29999 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/volume.py", line 476, in <lambda>
2015-10-18 15:42:21.204 29999 TRACE oslo_messaging.rpc.dispatcher     or all(map(lambda x: not os.path.exists(x), host_devices))):
2015-10-18 15:42:21.204 29999 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib64/python2.7/genericpath.py", line 18, in exists
2015-10-18 15:42:21.204 29999 TRACE oslo_messaging.rpc.dispatcher     os.stat(path)
20

Comment 8 Lenny Verkhovsky 2015-10-18 13:08:02 UTC
(In reply to Lee Yarwood from comment #6)
> (In reply to Lenny Verkhovsky from comment #0)
> > Description of problem:
> > 
> > 
> > Regarding code in
> > /usr/lib/python2.7/site-packages/nova/virt/libvirt/volume.py
> > 
> >         # Check host_device only when transport is used, since otherwise it
> > is
> >         # directly derived from properties. Only needed for unit tests
> >         # Loop until at least 1 path becomes available
> >         while ((self._get_transport() != "default" and not host_devices)
> >                or all(map(lambda x: not os.path.exists(x), host_devices))):
> > 
> > 
> > Since in Nova kilo host_device initialized to None, and stays None in case
> > /dev/disk/by-path wasn’t updated on the first try
> > (https://github.com/openstack/nova/blob/stable/kilo/nova/virt/libvirt/volume.
> > py#L842-L846), but in packstack code host_device modified to a list, then it
> > won’t pass the os.path.exists() for the None initialized host_device param
> > like in
> > (https://github.com/openstack/nova/blob/stable/kilo/nova/virt/libvirt/volume.
> > py#L446-L447) since os.path.exists crash for None values.
> > 
> > One solution is to remove all the None values from the host_devices
> 
> I might be missing something here but all I see in the logs is an incorrect
> rootwrap filter due to pci identifiers for the HBA being included in the
> paths :
> 
> 21982 2015-10-08 10:58:39.597 3415 TRACE oslo_messaging.rpc.dispatcher
> ProcessExecutionError: Unexpected error while running command.
> 21983 2015-10-08 10:58:39.597 3415 TRACE oslo_messaging.rpc.dispatcher
> Command: sudo nova-rootwrap /etc/nova/rootwrap.conf ln --symbolic --force
> /dev/mapper/pci-0000:02:00.0-ip-1.1.1.1:3260-iscsi-iqn.2010-10.org.openstack:
> volume-b21ab8b2-745c-4992-a29f-88993a9b591b-lun-0 
> /dev/disk/by-path/pci-0000:02:00.0-ip-1.1.1.1:3260-iscsi-iqn.2010-10.org.
> openstack:volume-b21ab8b2-745c-4992-a29f-88993a9b591b-lun-0
> 21984 2015-10-08 10:58:39.597 3415 TRACE oslo_messaging.rpc.dispatcher Exit
> code: 99
> 21985 2015-10-08 10:58:39.597 3415 TRACE oslo_messaging.rpc.dispatcher
> Stdout: u''
> 21986 2015-10-08 10:58:39.597 3415 TRACE oslo_messaging.rpc.dispatcher
> Stderr: u'/usr/bin/nova-rootwrap: Unauthorized command: ln --symbolic
> --force
> /dev/mapper/pci-0000:02:00.0-ip-1.1.1.1:3260-iscsi-iqn.2010-10.org.openstack:
> volume-b21ab8b2-745c-4992-a29f-88993a9b591b-lun-0
> /dev/disk/by-path/pci-0000:02:00.0-ip-1.1.1.1:3260-iscsi-iqn.2010-10.org.
> openstack:volume-b21ab8b2-745c-4992-a29f-88993a9b591b-lun-0 (no filter
> matched)\n'
> 
> With the following filter set by default in OSP 7 for ln :
> 
> /usr/share/nova/rootwrap/compute.filters:212:ln: RegExpFilter, ln, root, ln,
> --symbolic, --force, /dev/mapper/ip-.*-iscsi-iqn.*,
> /dev/disk/by-path/ip-.*-iscsi-iqn.*
> 
> Changed upstream here :
> 
> rootwrap: update ln --symbolic filter for FS and FC type volume drivers
> https://review.openstack.org/#/c/197713/
> 
> Lenny, does the following filter regex work for you instead of the default
> in /usr/share/nova/rootwrap/compute.filters or do you still see issues ?
> 
> ln: RegExpFilter, ln, root, ln, --symbolic, --force, /dev/mapper/.*, .*

This is indeed another issue ( encrypt/decrypt ). We are checking this and will open another bug if needed. Those logs were also created with pacthed volume.py.

I've attached new logs with reproduction of the issue in the next comment.

Comment 9 Lenny Verkhovsky 2015-10-19 07:00:47 UTC
(In reply to Lenny Verkhovsky from comment #8)
> (In reply to Lee Yarwood from comment #6)
> > (In reply to Lenny Verkhovsky from comment #0)
> > > Description of problem:
> > > 
> > > 
> > > Regarding code in
> > > /usr/lib/python2.7/site-packages/nova/virt/libvirt/volume.py
> > > 
> > >         # Check host_device only when transport is used, since otherwise it
> > > is
> > >         # directly derived from properties. Only needed for unit tests
> > >         # Loop until at least 1 path becomes available
> > >         while ((self._get_transport() != "default" and not host_devices)
> > >                or all(map(lambda x: not os.path.exists(x), host_devices))):
> > > 
> > > 
> > > Since in Nova kilo host_device initialized to None, and stays None in case
> > > /dev/disk/by-path wasn’t updated on the first try
> > > (https://github.com/openstack/nova/blob/stable/kilo/nova/virt/libvirt/volume.
> > > py#L842-L846), but in packstack code host_device modified to a list, then it
> > > won’t pass the os.path.exists() for the None initialized host_device param
> > > like in
> > > (https://github.com/openstack/nova/blob/stable/kilo/nova/virt/libvirt/volume.
> > > py#L446-L447) since os.path.exists crash for None values.
> > > 
> > > One solution is to remove all the None values from the host_devices
> > 
> > I might be missing something here but all I see in the logs is an incorrect
> > rootwrap filter due to pci identifiers for the HBA being included in the
> > paths :
> > 
> > 21982 2015-10-08 10:58:39.597 3415 TRACE oslo_messaging.rpc.dispatcher
> > ProcessExecutionError: Unexpected error while running command.
> > 21983 2015-10-08 10:58:39.597 3415 TRACE oslo_messaging.rpc.dispatcher
> > Command: sudo nova-rootwrap /etc/nova/rootwrap.conf ln --symbolic --force
> > /dev/mapper/pci-0000:02:00.0-ip-1.1.1.1:3260-iscsi-iqn.2010-10.org.openstack:
> > volume-b21ab8b2-745c-4992-a29f-88993a9b591b-lun-0 
> > /dev/disk/by-path/pci-0000:02:00.0-ip-1.1.1.1:3260-iscsi-iqn.2010-10.org.
> > openstack:volume-b21ab8b2-745c-4992-a29f-88993a9b591b-lun-0
> > 21984 2015-10-08 10:58:39.597 3415 TRACE oslo_messaging.rpc.dispatcher Exit
> > code: 99
> > 21985 2015-10-08 10:58:39.597 3415 TRACE oslo_messaging.rpc.dispatcher
> > Stdout: u''
> > 21986 2015-10-08 10:58:39.597 3415 TRACE oslo_messaging.rpc.dispatcher
> > Stderr: u'/usr/bin/nova-rootwrap: Unauthorized command: ln --symbolic
> > --force
> > /dev/mapper/pci-0000:02:00.0-ip-1.1.1.1:3260-iscsi-iqn.2010-10.org.openstack:
> > volume-b21ab8b2-745c-4992-a29f-88993a9b591b-lun-0
> > /dev/disk/by-path/pci-0000:02:00.0-ip-1.1.1.1:3260-iscsi-iqn.2010-10.org.
> > openstack:volume-b21ab8b2-745c-4992-a29f-88993a9b591b-lun-0 (no filter
> > matched)\n'
> > 
> > With the following filter set by default in OSP 7 for ln :
> > 
> > /usr/share/nova/rootwrap/compute.filters:212:ln: RegExpFilter, ln, root, ln,
> > --symbolic, --force, /dev/mapper/ip-.*-iscsi-iqn.*,
> > /dev/disk/by-path/ip-.*-iscsi-iqn.*
> > 
> > Changed upstream here :
> > 
> > rootwrap: update ln --symbolic filter for FS and FC type volume drivers
> > https://review.openstack.org/#/c/197713/
> > 
> > Lenny, does the following filter regex work for you instead of the default
> > in /usr/share/nova/rootwrap/compute.filters or do you still see issues ?
> > 
> > ln: RegExpFilter, ln, root, ln, --symbolic, --force, /dev/mapper/.*, .*
> 
> This is indeed another issue ( encrypt/decrypt ). We are checking this and
> will open another bug if needed. Those logs were also created with pacthed
> volume.py.
> 
> I've attached new logs with reproduction of the issue in the next comment.

https://review.openstack.org/#/c/197713 fixed the encryption issue
tempest.scenario.test_encrypted_cinder_volumes.TestEncryptedCinderVolumes 
is passing now.

Comment 10 Lee Yarwood 2015-10-20 13:40:14 UTC
(In reply to Lenny Verkhovsky from comment #9)
> https://review.openstack.org/#/c/197713 fixed the encryption issue
> tempest.scenario.test_encrypted_cinder_volumes.TestEncryptedCinderVolumes 
> is passing now.

Thanks, apologies for the delay, I've opened the following RHBZ to track this into RHEL OSP 7.0.z :

Nova volume encryptors attach volume fails for NFS and FC (rootwrap) 
https://bugzilla.redhat.com/show_bug.cgi?id=1273466

(In reply to Lenny Verkhovsky from comment #8)
> This is indeed another issue ( encrypt/decrypt ). We are checking this and
> will open another bug if needed. Those logs were also created with pacthed
> volume.py.
> 
> I've attached new logs with reproduction of the issue in the next comment.

Thanks that's better, as you've pointed out we have additionally backported the now abandoned Nova iscsi-multipath-enhancement [1] blueprint into RHEL OSP 7 [2].

As a result _get_host_devices can return a list containing a _single_ None value. 

IMHO there's no need to loop over all host_devices and we can just address this within the lambda function while testing if the path exists :

# git diff
diff --git a/nova/virt/libvirt/volume.py b/nova/virt/libvirt/volume.py
index 3f3cc2b..7f668de 100644
--- a/nova/virt/libvirt/volume.py
+++ b/nova/virt/libvirt/volume.py
@@ -470,7 +470,8 @@ class LibvirtISCSIVolumeDriver(LibvirtBaseVolumeDriver):
         # directly derived from properties. Only needed for unit tests
         # Loop until at least 1 path becomes available
         while ((self._get_transport() != "default" and not host_devices)
-               or all(map(lambda x: not os.path.exists(x), host_devices))):
+               or all(map(lambda x: x is None or not os.path.exists(x),
+                           host_devices))):
             if tries >= self.num_scan_tries:
                 raise exception.NovaException(_("iSCSI device not found at %s")
                                               % (host_devices))

Can you test this and confirm that it works as expected for you? Happy to reconsider your original patch if you feel I've missed something here.

Thanks in advance,

Lee

[1] https://review.openstack.org/#/c/134683/
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1228295

Comment 11 Lenny Verkhovsky 2015-10-21 06:29:00 UTC
Lee, the proposed patch is not enough.
the while loop keeps scanning until it fails at 
if tries >= self.num_scan_tries:

2015-10-21 09:06:30.120 12553 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/volume.py", line 480, in connect_volume
2015-10-21 09:06:30.120 12553 TRACE oslo_messaging.rpc.dispatcher     % (host_devices))
2015-10-21 09:06:30.120 12553 TRACE oslo_messaging.rpc.dispatcher NovaException: iSCSI device not found at [None]


Please, consider the following patch that fixes the root cause:

--- a/nova/virt/libvirt/volume.py
+++ b/nova/virt/libvirt/volume.py
+++ /tmp/volume.py      2015-10-21 09:21:03.957542634 +0300
@@ -900,11 +900,11 @@ class LibvirtISCSIVolumeDriver(LibvirtBa
         if self._get_transport() == "default":
             return [("/dev/disk/by-path/%s" % device)]
         else:
-            host_device = None
+            host_device = []
             look_for_device = glob.glob('/dev/disk/by-path/*%s' % device)
             if look_for_device:
-                host_device = look_for_device[0]
-            return [host_device]
+                host_device = [look_for_device[0]]
+            return host_device

     def _reconnect(self, iscsi_properties):
         # Note: iscsiadm does not support changing iface.iscsi_ifacename

Comment 12 Lee Yarwood 2015-10-21 10:17:54 UTC
(In reply to Lenny Verkhovsky from comment #11)
> Lee, the proposed patch is not enough.
> the while loop keeps scanning until it fails at 
> if tries >= self.num_scan_tries:
> 
> 2015-10-21 09:06:30.120 12553 TRACE oslo_messaging.rpc.dispatcher   File
> "/usr/lib/python2.7/site-packages/nova/virt/libvirt/volume.py", line 480, in
> connect_volume
> 2015-10-21 09:06:30.120 12553 TRACE oslo_messaging.rpc.dispatcher     %
> (host_devices))
> 2015-10-21 09:06:30.120 12553 TRACE oslo_messaging.rpc.dispatcher
> NovaException: iSCSI device not found at [None]

Thanks Lenny, can you provide the full log including the retries prior to this? Also the current iscsi_iface setting with `openstack-config --get /etc/nova/nova.conf libvirt iscsi_iface`.

> Please, consider the following patch that fixes the root cause:
> 
> --- a/nova/virt/libvirt/volume.py
> +++ b/nova/virt/libvirt/volume.py
> +++ /tmp/volume.py      2015-10-21 09:21:03.957542634 +0300
> @@ -900,11 +900,11 @@ class LibvirtISCSIVolumeDriver(LibvirtBa
>          if self._get_transport() == "default":
>              return [("/dev/disk/by-path/%s" % device)]
>          else:
> -            host_device = None
> +            host_device = []
>              look_for_device = glob.glob('/dev/disk/by-path/*%s' % device)
>              if look_for_device:
> -                host_device = look_for_device[0]
> -            return [host_device]
> +                host_device = [look_for_device[0]]
> +            return host_device
> 
>      def _reconnect(self, iscsi_properties):
>          # Note: iscsiadm does not support changing iface.iscsi_ifacename

While this corrects the issue of os.path.exists() being called with a None value I think you're missing the impact of _get_host_devices not returning None during connect_volumes when no device paths are present.

The following will ensure that the retry loop is used correctly, without calling os.path.exists with None while still allowing host_devices to be refreshed at the end of each iteration :

# git diff
diff --git a/nova/virt/libvirt/volume.py b/nova/virt/libvirt/volume.py
index 3f3cc2b..caefef4 100644
--- a/nova/virt/libvirt/volume.py
+++ b/nova/virt/libvirt/volume.py
@@ -469,7 +469,7 @@ class LibvirtISCSIVolumeDriver(LibvirtBaseVolumeDriver):
         # Check host_device only when transport is used, since otherwise it is
         # directly derived from properties. Only needed for unit tests
         # Loop until at least 1 path becomes available
-        while ((self._get_transport() != "default" and not host_devices)
+        while ((self._get_transport() != "default" and host_devices == [None])
                or all(map(lambda x: not os.path.exists(x), host_devices))):
             if tries >= self.num_scan_tries:
                 raise exception.NovaException(_("iSCSI device not found at %s")
@@ -488,12 +488,13 @@ class LibvirtISCSIVolumeDriver(LibvirtBaseVolumeDriver):
             # For offloaded open-iscsi transports, host_device cannot be
             # guessed unlike iscsi_tcp where it can be obtained from
             # properties, so try and get it again.
-            if not host_devices and self._get_transport() != "default":
+            if host_devices == [None] and self._get_transport() != "default":
                 host_devices = self._get_host_devices(iscsi_properties)
 
             tries = tries + 1
 
-            if all(map(lambda x: not os.path.exists(x), host_devices)):
+            if (host_devices == [None] or
+                    all(map(lambda x: not os.path.exists(x), host_devices))):
                 time.sleep(tries ** 2)
             else:
                 break

Can you revert your changes and test the above in your environment?

If this fails please provide a full DEBUG enabled log from nova-compute.

Thanks in advance,

Lee

Comment 13 Lenny Verkhovsky 2015-10-21 10:39:44 UTC
Hi Lee, 
your latest patch was applied and verified as working.
( Do you still need logs for patch with retries error ? )

Comment 14 Lee Yarwood 2015-10-21 10:46:43 UTC
(In reply to Lenny Verkhovsky from comment #13)
> Hi Lee, 
> your latest patch was applied and verified as working.
> ( Do you still need logs for patch with retries error ? )

Please, just to review and ensure I've not missed anything here as I don't have an ISER environment to hand.

Comment 15 Lenny Verkhovsky 2015-10-21 11:00:42 UTC
Created attachment 1085089 [details]
logs

Small update in patch ( see below )
log and dmesg log attached


--- volume.py   
+++ volume.py    
@@ -470,7 +470,8 @@ class LibvirtISCSIVolumeDriver(LibvirtBa
         # directly derived from properties. Only needed for unit tests
         # Loop until at least 1 path becomes available
         while ((self._get_transport() != "default" and not host_devices)
-               or all(map(lambda x: not os.path.exists(x), host_devices))):
+               or all(map(lambda x: x is None or not os.path.exists(x),
+                           host_devices))):
             if tries >= self.num_scan_tries:
                 raise exception.NovaException(_("iSCSI device not found at %s")
                                               % (host_devices))
@@ -493,7 +494,8 @@ class LibvirtISCSIVolumeDriver(LibvirtBa

             tries = tries + 1

-            if all(map(lambda x: not os.path.exists(x), host_devices)):
+            if all(map(lambda x: x is None or not os.path.exists(x),
+                        host_devices)):
                 time.sleep(tries ** 2)
             else:
                 break

Comment 16 Lee Yarwood 2015-10-21 11:15:24 UTC
(In reply to Lenny Verkhovsky from comment #15)
> Created attachment 1085089 [details]
> logs
> 
> Small update in patch ( see below )
> log and dmesg log attached

I don't see a successful attach in the logs, are these the correct logs?

333 2015-10-21 13:53:19.358 30448 TRACE oslo_messaging.rpc.dispatcher NovaException: iSCSI device not found at [None]

Can you also provide output for the following commands :

# openstack-config --get /etc/nova/nova.conf libvirt iscsi_iface
# ll /dev/disk/by-path/*ip-*-iscsi-*-lun-*

Comment 17 Lenny Verkhovsky 2015-10-21 11:30:05 UTC
I believe that logs are correct. I've re-run it with the same result.



# openstack-config --get /etc/nova/nova.conf libvirt iscsi_iface
Parameter not found: iscsi_iface
[root@r-ufm173 openstack-tempest-kilo]# ll /dev/disk/by-path/*ip-*-iscsi-*-lun-*
lrwxrwxrwx. 1 root root 9 Oct 18 16:30 /dev/disk/by-path/pci-0000:02:00.0-ip-1.1.1.1:3260-iscsi-iqn.2010-10.org.openstack:volume-23b19a42-d9ee-4d32-bea8-6aea8b75faf6-lun-0 -> ../../sdb
lrwxrwxrwx. 1 root root 9 Oct 20 21:48 /dev/disk/by-path/pci-0000:02:00.0-ip-1.1.1.1:3260-iscsi-iqn.2010-10.org.openstack:volume-2774e7f0-b163-4a11-b703-c85ac6a3dbe0-lun-0 -> ../../sdl
lrwxrwxrwx. 1 root root 9 Oct 21 09:05 /dev/disk/by-path/pci-0000:02:00.0-ip-1.1.1.1:3260-iscsi-iqn.2010-10.org.openstack:volume-2dd18dfd-f805-45fd-ab33-6cc9b4b9a728-lun-0 -> ../../sdn
lrwxrwxrwx. 1 root root 9 Oct 20 17:24 /dev/disk/by-path/pci-0000:02:00.0-ip-1.1.1.1:3260-iscsi-iqn.2010-10.org.openstack:volume-4349d98a-7119-4014-9cb3-6cc9fee06517-lun-0 -> ../../sdf
lrwxrwxrwx. 1 root root 9 Oct 20 17:55 /dev/disk/by-path/pci-0000:02:00.0-ip-1.1.1.1:3260-iscsi-iqn.2010-10.org.openstack:volume-4bb04f8e-6cbc-4695-86b3-93e32e7fac9f-lun-0 -> ../../sdi
lrwxrwxrwx. 1 root root 9 Oct 20 17:18 /dev/disk/by-path/pci-0000:02:00.0-ip-1.1.1.1:3260-iscsi-iqn.2010-10.org.openstack:volume-53b7d22a-913c-45a1-a19a-c77a39580d50-lun-0 -> ../../sde
lrwxrwxrwx. 1 root root 9 Oct 20 17:48 /dev/disk/by-path/pci-0000:02:00.0-ip-1.1.1.1:3260-iscsi-iqn.2010-10.org.openstack:volume-8aedc697-126a-4adb-a674-ed9719acdf1f-lun-0 -> ../../sdh
lrwxrwxrwx. 1 root root 9 Oct 20 21:23 /dev/disk/by-path/pci-0000:02:00.0-ip-1.1.1.1:3260-iscsi-iqn.2010-10.org.openstack:volume-9096d9d3-1fd3-4f1b-8c93-1aa2c8268f07-lun-0 -> ../../sdj
lrwxrwxrwx. 1 root root 9 Oct 20 17:36 /dev/disk/by-path/pci-0000:02:00.0-ip-1.1.1.1:3260-iscsi-iqn.2010-10.org.openstack:volume-9dc5477b-4ae2-4e88-b8c4-f69da6b647bd-lun-0 -> ../../sdg
lrwxrwxrwx. 1 root root 9 Oct 21 13:52 /dev/disk/by-path/pci-0000:02:00.0-ip-1.1.1.1:3260-iscsi-iqn.2010-10.org.openstack:volume-b766c2e3-3a43-4ca9-a674-5bfa87eda1e0-lun-0 -> ../../sdo
lrwxrwxrwx. 1 root root 9 Oct 21 08:53 /dev/disk/by-path/pci-0000:02:00.0-ip-1.1.1.1:3260-iscsi-iqn.2010-10.org.openstack:volume-c4329d31-23d2-45f4-aee3-3fa220cc9f85-lun-0 -> ../../sdm
lrwxrwxrwx. 1 root root 9 Oct 20 21:30 /dev/disk/by-path/pci-0000:02:00.0-ip-1.1.1.1:3260-iscsi-iqn.2010-10.org.openstack:volume-d94eb716-84f1-4142-8bf7-97ec2fde95f4-lun-0 -> ../../sdk

Comment 18 Lee Yarwood 2015-10-21 13:36:58 UTC
(In reply to Lenny Verkhovsky from comment #17)
> I believe that logs are correct. I've re-run it with the same result.

The following snippet from the logs clearly shows a failure to attach the b766c2e3 volume to the 1d1485ff instance. Was this volume and instance part of your test run that succeeded?

223 2015-10-21 13:52:12.670 30448 INFO nova.compute.manager [req-0e6ae146-2388-4521-9d86-a425dfbf3b70 3ca462dfcecd4c77b0745169f1088755 54f794084de847658a57b2033b4320d5 - - -] [instance: 1d1485ff-030f-45e6-a0eb-330ddea4f41d] Attaching volume b766c2e3-3a43-4ca9-a674-5bfa87eda1e0 to /dev/vdb
224 2015-10-21 13:52:22.781 30448 WARNING nova.virt.libvirt.volume [req-0e6ae146-2388-4521-9d86-a425dfbf3b70 3ca462dfcecd4c77b0745169f1088755 54f794084de847658a57b2033b4320d5 - - -] ISCSI volume not yet found at: vdb. Will rescan & retry.  Try number: 0
225 2015-10-21 13:52:23.904 30448 WARNING nova.virt.libvirt.volume [req-0e6ae146-2388-4521-9d86-a425dfbf3b70 3ca462dfcecd4c77b0745169f1088755 54f794084de847658a57b2033b4320d5 - - -] ISCSI volume not yet found at: vdb. Will rescan & retry.  Try number: 1
226 2015-10-21 13:52:28.029 30448 WARNING nova.virt.libvirt.volume [req-0e6ae146-2388-4521-9d86-a425dfbf3b70 3ca462dfcecd4c77b0745169f1088755 54f794084de847658a57b2033b4320d5 - - -] ISCSI volume not yet found at: vdb. Will rescan & retry.  Try number: 2
227 2015-10-21 13:52:37.153 30448 WARNING nova.virt.libvirt.volume [req-0e6ae146-2388-4521-9d86-a425dfbf3b70 3ca462dfcecd4c77b0745169f1088755 54f794084de847658a57b2033b4320d5 - - -] ISCSI volume not yet found at: vdb. Will rescan & retry.  Try number: 3
[..]
233 2015-10-21 13:52:53.286 30448 WARNING nova.virt.libvirt.volume [req-0e6ae146-2388-4521-9d86-a425dfbf3b70 3ca462dfcecd4c77b0745169f1088755 54f794084de847658a57b2033b4320d5 - - -] ISCSI volume not yet found at: vdb. Will rescan & retry.  Try number: 4
234 2015-10-21 13:53:18.410 30448 ERROR nova.virt.block_device [req-0e6ae146-2388-4521-9d86-a425dfbf3b70 3ca462dfcecd4c77b0745169f1088755 54f794084de847658a57b2033b4320d5 - - -] [instance: 1d1485ff-030f-45e6-a0eb-330ddea4f41d] Driver failed to attach volume b766c2e3-3a43-4ca9-a674-5bfa87eda1e0 at /dev/vdb
235 2015-10-21 13:53:18.410 30448 TRACE nova.virt.block_device [instance: 1d1485ff-030f-45e6-a0eb-330ddea4f41d] Traceback (most recent call last):
236 2015-10-21 13:53:18.410 30448 TRACE nova.virt.block_device [instance: 1d1485ff-030f-45e6-a0eb-330ddea4f41d]   File "/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 253, in attach
237 2015-10-21 13:53:18.410 30448 TRACE nova.virt.block_device [instance: 1d1485ff-030f-45e6-a0eb-330ddea4f41d]     device_type=self['device_type'], encryption=encryption)
238 2015-10-21 13:53:18.410 30448 TRACE nova.virt.block_device [instance: 1d1485ff-030f-45e6-a0eb-330ddea4f41d]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 1070, in attach_volume
239 2015-10-21 13:53:18.410 30448 TRACE nova.virt.block_device [instance: 1d1485ff-030f-45e6-a0eb-330ddea4f41d]     self._connect_volume(connection_info, disk_info)
240 2015-10-21 13:53:18.410 30448 TRACE nova.virt.block_device [instance: 1d1485ff-030f-45e6-a0eb-330ddea4f41d]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 1021, in _connect_volume
241 2015-10-21 13:53:18.410 30448 TRACE nova.virt.block_device [instance: 1d1485ff-030f-45e6-a0eb-330ddea4f41d]     driver.connect_volume(connection_info, disk_info)
242 2015-10-21 13:53:18.410 30448 TRACE nova.virt.block_device [instance: 1d1485ff-030f-45e6-a0eb-330ddea4f41d]   File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 445, in inner
243 2015-10-21 13:53:18.410 30448 TRACE nova.virt.block_device [instance: 1d1485ff-030f-45e6-a0eb-330ddea4f41d]     return f(*args, **kwargs)
244 2015-10-21 13:53:18.410 30448 TRACE nova.virt.block_device [instance: 1d1485ff-030f-45e6-a0eb-330ddea4f41d]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/volume.py", line 477, in connect_volume
245 2015-10-21 13:53:18.410 30448 TRACE nova.virt.block_device [instance: 1d1485ff-030f-45e6-a0eb-330ddea4f41d]     % (host_devices))
246 2015-10-21 13:53:18.410 30448 TRACE nova.virt.block_device [instance: 1d1485ff-030f-45e6-a0eb-330ddea4f41d] NovaException: iSCSI device not found at [None]
247 2015-10-21 13:53:18.410 30448 TRACE nova.virt.block_device [instance: 1d1485ff-030f-45e6-a0eb-330ddea4f41d]
248 2015-10-21 13:53:19.127 30448 ERROR nova.compute.manager [req-0e6ae146-2388-4521-9d86-a425dfbf3b70 3ca462dfcecd4c77b0745169f1088755 54f794084de847658a57b2033b4320d5 - - -] [instance: 1d1485ff-030f-45e6-a0eb-330ddea4f41d] Failed to attach b766c2e3-3a43-4ca9-a674-5bfa87eda1e0 at /dev/vdb
249 2015-10-21 13:53:19.127 30448 TRACE nova.compute.manager [instance: 1d1485ff-030f-45e6-a0eb-330ddea4f41d] Traceback (most recent call last):
250 2015-10-21 13:53:19.127 30448 TRACE nova.compute.manager [instance: 1d1485ff-030f-45e6-a0eb-330ddea4f41d]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 4798, in _attach_volume
251 2015-10-21 13:53:19.127 30448 TRACE nova.compute.manager [instance: 1d1485ff-030f-45e6-a0eb-330ddea4f41d]     do_check_attach=False, do_driver_attach=True)
252 2015-10-21 13:53:19.127 30448 TRACE nova.compute.manager [instance: 1d1485ff-030f-45e6-a0eb-330ddea4f41d]   File "/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 49, in wrapped
253 2015-10-21 13:53:19.127 30448 TRACE nova.compute.manager [instance: 1d1485ff-030f-45e6-a0eb-330ddea4f41d]     ret_val = method(obj, context, *args, **kwargs)
254 2015-10-21 13:53:19.127 30448 TRACE nova.compute.manager [instance: 1d1485ff-030f-45e6-a0eb-330ddea4f41d]   File "/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 262, in attach
255 2015-10-21 13:53:19.127 30448 TRACE nova.compute.manager [instance: 1d1485ff-030f-45e6-a0eb-330ddea4f41d]     connector)
256 2015-10-21 13:53:19.127 30448 TRACE nova.compute.manager [instance: 1d1485ff-030f-45e6-a0eb-330ddea4f41d]   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 85, in __exit__
257 2015-10-21 13:53:19.127 30448 TRACE nova.compute.manager [instance: 1d1485ff-030f-45e6-a0eb-330ddea4f41d]     six.reraise(self.type_, self.value, self.tb)
258 2015-10-21 13:53:19.127 30448 TRACE nova.compute.manager [instance: 1d1485ff-030f-45e6-a0eb-330ddea4f41d]   File "/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 253, in attach
259 2015-10-21 13:53:19.127 30448 TRACE nova.compute.manager [instance: 1d1485ff-030f-45e6-a0eb-330ddea4f41d]     device_type=self['device_type'], encryption=encryption)
260 2015-10-21 13:53:19.127 30448 TRACE nova.compute.manager [instance: 1d1485ff-030f-45e6-a0eb-330ddea4f41d]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 1070, in attach_volume
261 2015-10-21 13:53:19.127 30448 TRACE nova.compute.manager [instance: 1d1485ff-030f-45e6-a0eb-330ddea4f41d]     self._connect_volume(connection_info, disk_info)
262 2015-10-21 13:53:19.127 30448 TRACE nova.compute.manager [instance: 1d1485ff-030f-45e6-a0eb-330ddea4f41d]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 1021, in _connect_volume
263 2015-10-21 13:53:19.127 30448 TRACE nova.compute.manager [instance: 1d1485ff-030f-45e6-a0eb-330ddea4f41d]     driver.connect_volume(connection_info, disk_info)
264 2015-10-21 13:53:19.127 30448 TRACE nova.compute.manager [instance: 1d1485ff-030f-45e6-a0eb-330ddea4f41d]   File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 445, in inner
265 2015-10-21 13:53:19.127 30448 TRACE nova.compute.manager [instance: 1d1485ff-030f-45e6-a0eb-330ddea4f41d]     return f(*args, **kwargs)
266 2015-10-21 13:53:19.127 30448 TRACE nova.compute.manager [instance: 1d1485ff-030f-45e6-a0eb-330ddea4f41d]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/volume.py", line 477, in connect_volume
267 2015-10-21 13:53:19.127 30448 TRACE nova.compute.manager [instance: 1d1485ff-030f-45e6-a0eb-330ddea4f41d]     % (host_devices))
268 2015-10-21 13:53:19.127 30448 TRACE nova.compute.manager [instance: 1d1485ff-030f-45e6-a0eb-330ddea4f41d] NovaException: iSCSI device not found at [None]
269 2015-10-21 13:53:19.127 30448 TRACE nova.compute.manager [instance: 1d1485ff-030f-45e6-a0eb-330ddea4f41d]

We can see this volume listed under /dev/disk/by-path/ below :

lrwxrwxrwx. 1 root root 9 Oct 21 13:52 /dev/disk/by-path/pci-0000:02:00.0-ip-1.1.1.1:3260-iscsi-iqn.2010-10.org.openstack:volume-b766c2e3-3a43-4ca9-a674-5bfa87eda1e0-lun-0 -> ../../sdo

So either the path used during _get_host_devices to find this device is wrong _or_ we are not giving enough time for this device to come online in your environment.

Can you add iscsi_properties and host_devices to the output of the LOG.warn() line within the retry loop? This should provide some more context here once reproduced :

# git diff
diff --git a/nova/virt/libvirt/volume.py b/nova/virt/libvirt/volume.py
index 3f3cc2b..97fe542 100644
--- a/nova/virt/libvirt/volume.py
+++ b/nova/virt/libvirt/volume.py
@@ -469,15 +469,18 @@ class LibvirtISCSIVolumeDriver(LibvirtBaseVolumeDriver):
         # Check host_device only when transport is used, since otherwise it is
         # directly derived from properties. Only needed for unit tests
         # Loop until at least 1 path becomes available
-        while ((self._get_transport() != "default" and not host_devices)
+        while ((self._get_transport() != "default" and host_devices == [None])
                or all(map(lambda x: not os.path.exists(x), host_devices))):
             if tries >= self.num_scan_tries:
                 raise exception.NovaException(_("iSCSI device not found at %s")
                                               % (host_devices))
 
-            LOG.warn(_LW("ISCSI volume not yet found at: %(disk_dev)s. "
+            LOG.warn(_LW("ISCSI volume not yet found for: %(disk_dev)s with"
+                         "iscsi_properties: %(iscsi_props), "
+                         "host_devices returned: %(host_devices)."
                          "Will rescan & retry.  Try number: %(tries)s"),
-                     {'disk_dev': disk_dev, 'tries': tries})
+                     {'disk_dev': disk_dev, 'iscsi_props':iscsi_properties,
+                      'host_devices': host_devices, 'tries': tries})
 
             # The rescan isn't documented as being necessary(?), but it helps
             if self.use_multipath:
@@ -488,12 +491,13 @@ class LibvirtISCSIVolumeDriver(LibvirtBaseVolumeDriver):
             # For offloaded open-iscsi transports, host_device cannot be
             # guessed unlike iscsi_tcp where it can be obtained from
             # properties, so try and get it again.
-            if not host_devices and self._get_transport() != "default":
+            if host_devices == [None] and self._get_transport() != "default":
                 host_devices = self._get_host_devices(iscsi_properties)
 
             tries = tries + 1
 
-            if all(map(lambda x: not os.path.exists(x), host_devices)):
+            if (host_devices == [None] or
+                    all(map(lambda x: not os.path.exists(x), host_devices))):
                 time.sleep(tries ** 2)
             else:
                 break

Comment 19 Lenny Verkhovsky 2015-10-22 06:50:54 UTC
Created attachment 1085414 [details]
logs

I've attached logs.

1. the following dev was attached and left after test 
/dev/disk/by-path/pci-0000:02:00.0-ip-1.1.1.1:3260-iscsi-iqn.2010-10.org.openstack:volume-03a54fcb-d13b-44fc-a78e-b0c5353b2f99-lun-0 -> ../../sdq

2. I've used my patch from c#15 with log message from c#18

Comment 20 Lee Yarwood 2015-10-22 08:03:54 UTC
(In reply to Lenny Verkhovsky from comment #19)
> Created attachment 1085414 [details]
> logs
> 
> I've attached logs.

You have missed nova-compute.log for this run, can you please provide it?

> 1. the following dev was attached and left after test 
> /dev/disk/by-path/pci-0000:02:00.0-ip-1.1.1.1:3260-iscsi-iqn.2010-10.org.
> openstack:volume-03a54fcb-d13b-44fc-a78e-b0c5353b2f99-lun-0 -> ../../sdq

In the tempest logs we can see volume 03a54fcb fail to reach an in-use (attached) status :

tmp/20151022_tempest.scenario.test_minimum_basic.TestMinimumBasicScenario.log

3453 Traceback (most recent call last):
3454   File "/usr/share/openstack-tempest-kilo/tempest/test.py", line 126, in wrapper
3455     return f(self, *func_args, **func_kwargs)
3456   File "/usr/share/openstack-tempest-kilo/tempest/scenario/test_minimum_basic.py", line 134, in test_minimum_basic_scenario
3457     self.nova_volume_attach()
3458   File "/usr/share/openstack-tempest-kilo/tempest/scenario/test_minimum_basic.py", line 84, in nova_volume_attach
3459     self.volumes_client.wait_for_volume_status(volume['id'], 'in-use')
3460   File "/usr/share/openstack-tempest-kilo/tempest/services/volume/json/volumes_client.py", line 182, in wait_for_volume_status
3461     raise exceptions.TimeoutException(message)
3462 TimeoutException: Request timed out
3463 Details: Volume 03a54fcb-d13b-44fc-a78e-b0c5353b2f99 failed to reach in-use status (current: available) within the required time (196 s).
3464 Ran 1 tests in 237.729s (+169.925s)
 
> 2. I've used my patch from c#15 with log message from c#18

The patch in c#15 is not required here. Can you please attach nova-compute logs from this run for review with the additional logging provided by the patch in c#18?

Comment 21 Lee Yarwood 2015-10-22 10:38:55 UTC
Thanks again for your time on the phone Lenny.Removing the NEEDINFO against you.

Just to confirm that the patch in c#18 does correct this for you. The logs provided in c#19 are using your patch from c#15. Apologies again for the confusion.

Unfortunately this fix is going to be a downstream-only given the use of the now abandoned Nova iscsi-multipath-enhancement blueprint in RHEL OSP 7. I'll propose this internally now and aim to get you a test build later today.

Comment 22 Lee Yarwood 2015-10-30 16:49:19 UTC
*** Bug 1272884 has been marked as a duplicate of this bug. ***

Comment 27 nlevinki 2015-12-01 13:22:12 UTC
The fix is in the code.
the ticket has been verified by third party. (see comment 25)

Comment 29 errata-xmlrpc 2015-12-21 17:06:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2015:2673


Note You need to log in before you can comment on or make changes to this bug.