Bug 1667922 - Failing to create a volume from an image with the 3PAR FC driver.
Summary: Failing to create a volume from an image with the 3PAR FC driver.
Keywords:
Status: CLOSED DUPLICATE of bug 1768790
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-cinder
Version: 14.0 (Rocky)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Pablo Caruana
QA Contact: Tzach Shefi
Docs Contact: Tana
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-01-21 13:09 UTC by Tzach Shefi
Modified: 2019-11-05 10:09 UTC
CC List: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Cloned To: 1768790 (view as bug list)
Environment:
Last Closed: 2019-11-05 10:09:12 UTC
Target Upstream Version:
Embargoed:


Attachments
Cinder.conf plus cinder logs (150.24 KB, application/gzip)
2019-01-21 13:09 UTC, Tzach Shefi


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1809249 0 None None None 2019-07-08 09:41:50 UTC
Launchpad 1812665 0 None None None 2019-01-21 13:12:19 UTC
OpenStack gerrit 678037 0 'None' MERGED 3PAR: Add config for NSP single path attach 2020-12-18 05:32:31 UTC

Description Tzach Shefi 2019-01-21 13:09:29 UTC
Created attachment 1522157 [details]
Cinder.conf plus cinder logs

Description of problem: A simple deployment, 1 controller + 2 computes. Creating an empty Cinder volume works; however, creating a volume from an image fails with the error described below.
Unsure whether this is a config issue or a possible driver bug.

Version-Release number of selected component (if applicable):
RHEL 7.6 

puppet-cinder-13.3.1-0.20181013114721.25b1ba3.el7ost.noarch
openstack-cinder-13.0.1-0.20181013185427.31ff628.el7ost.noarch
python2-cinderclient-4.0.1-0.20180809133302.460229c.el7ost.noarch
python-cinder-13.0.1-0.20181013185427.31ff628.el7ost.noarch
python2-os-brick-2.5.3-0.20180816081254.641337b.el7ost.noarch

python-nova-18.0.3-0.20181011032838.d1243fe.el7ost.noarch
openstack-nova-api-18.0.3-0.20181011032838.d1243fe.el7ost.noarch
puppet-nova-13.3.1-0.20181013120143.8ab435c.el7ost.noarch
python2-novaclient-11.0.0-0.20180809174649.f1005ce.el7ost.noarch
openstack-nova-common-18.0.3-0.20181011032838.d1243fe.el7ost.noarch
python-novajoin-1.0.21-1.el7ost.noarch

3par - HPE_3PAR 8200
HPE 3PAR OS version -  3.3.1.410 (MU2)+P32,P34,P37,P40,P41,P45

Cisco FC MDS switch 9148 - version 5.0(1a)


How reproducible:
Hit the same issue on two deployments (reused the same HW).
Then again, it might be an issue with my cloned config.

Steps to Reproduce:
1. Configure OpenStack 14 with 3PAR FC storage as the Cinder back end.

2. Creating an empty volume works fine:
#cinder create 1 --volume-type 3parfc --name 3parEmptyVol7
Volume is created/available, cinder list ->
| 569d57ae-4a10-4fb6-9a9e-85f722ea9caf | available | 3parEmptyVol7 | 1    | 3parfc  

Basic Cinder/3par access works fine 

3. Creating a volume from an image (cirros) fails

#cinder create 1 --volume-type 3parfc --name 3parVolFromImage1 --image cirros
+--------------------------------+--------------------------------------+
| Property                       | Value                                |
+--------------------------------+--------------------------------------+
| attachments                    | []                                   |
| availability_zone              | nova                                 |
| bootable                       | false                                |
| consistencygroup_id            | None                                 |
| created_at                     | 2019-01-21T12:31:48.000000           |
| description                    | None                                 |
| encrypted                      | False                                |
| id                             | 0fafa271-9b7b-4dcd-a98c-9143ef916afe |
..
| status                         | creating     

But after a while we see the create failed;
#cinder list returns ->
| 0fafa271-9b7b-4dcd-a98c-9143ef916afe | error     | 3parVolFromImage1 | 1    | 3parfc      | false    |  

In the c-vol log I noticed an os-brick error ->
2019-01-21 12:32:13.400 70 ERROR os_brick.initiator.connectors.fibre_channel [-] Fibre Channel volume device not found.
2019-01-21 12:32:13.401 70 ERROR oslo.service.loopingcall [-] Fixed interval looping call 'os_brick.initiator.connectors.fibre_channel._wait_for_device_discovery' failed: NoFibreChannelVolumeDeviceFound: Unable to find a Fibre Channel volume device.
2019-01-21 12:32:13.401 70 ERROR oslo.service.loopingcall Traceback (most recent call last):
2019-01-21 12:32:13.401 70 ERROR oslo.service.loopingcall   File "/usr/lib/python2.7/site-packages/oslo_service/loopingcall.py", line 171, in _run_loop
2019-01-21 12:32:13.401 70 ERROR oslo.service.loopingcall     result = func(*self.args, **self.kw)
2019-01-21 12:32:13.401 70 ERROR oslo.service.loopingcall   File "/usr/lib/python2.7/site-packages/os_brick/initiator/connectors/fibre_channel.py", line 219, in _wait_for_device_discovery
2019-01-21 12:32:13.401 70 ERROR oslo.service.loopingcall     raise exception.NoFibreChannelVolumeDeviceFound()
2019-01-21 12:32:13.401 70 ERROR oslo.service.loopingcall NoFibreChannelVolumeDeviceFound: Unable to find a Fibre Channel volume device.
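
For what it's worth, os-brick's FC connector basically rescans the FC HBAs and then polls for the mapped LUN to appear. A rough manual equivalent, run on the controller (host6/host7 assumed from the systool output below):

#echo 1 > /sys/class/fc_host/host6/issue_lip       (force a LIP on each HBA)
#echo 1 > /sys/class/fc_host/host7/issue_lip
#echo "- - -" > /sys/class/scsi_host/host6/scan    (rescan all channels/targets/LUNs)
#echo "- - -" > /sys/class/scsi_host/host7/scan
#lsblk

If the 3PAR presents the LUN, a new disk should show up after the rescan.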

On the controller's bare-metal host, installed:
#yum install sysfsutils
#systool -c fc_host -v      (same output below when I run systool inside the c-vol docker container)

[root@controller-0 cinder]# systool -c fc_host -v
Class = "fc_host"

  Class Device = "host6"
  Class Device path = "/sys/devices/pci0000:00/0000:00:02.0/0000:04:00.0/host6/fc_host/host6"
    dev_loss_tmo        = "16"
    fabric_name         = "0x2002000573a558d1"
    issue_lip           = <store method only>
    max_npiv_vports     = "254"
    node_name           = "0x50014380186af83d"
    npiv_vports_inuse   = "0"
    port_id             = "0x6b1000"
    port_name           = "0x50014380186af83c"
    port_state          = "Online"
    port_type           = "NPort (fabric via point-to-point)"
    speed               = "8 Gbit"
    supported_classes   = "Class 3"
    supported_speeds    = "1 Gbit, 2 Gbit, 4 Gbit, 8 Gbit"
    symbolic_name       = "HPAJ764A FW:v8.07.00 DVR:v10.00.00.06.07.6-k"
    system_hostname     = ""
    tgtid_bind_type     = "wwpn (World Wide Port Name)"
    uevent              = 
    vport_create        = <store method only>
    vport_delete        = <store method only>

    Device = "host6"
    Device path = "/sys/devices/pci0000:00/0000:00:02.0/0000:04:00.0/host6"
      fw_dump             = 
      issue_logo          = <store method only>
      nvram               = "ISP "
      optrom_ctl          = <store method only>
      optrom              = 
      reset               = <store method only>
      sfp                 = ""
      uevent              = "DEVTYPE=scsi_host"
      vpd                 = "�$"


  Class Device = "host7"
  Class Device path = "/sys/devices/pci0000:00/0000:00:02.0/0000:04:00.1/host7/fc_host/host7"
    dev_loss_tmo        = "16"
    fabric_name         = "0x2002000573a558d1"
    issue_lip           = <store method only>
    max_npiv_vports     = "254"
    node_name           = "0x50014380186af83f"
    npiv_vports_inuse   = "0"
    port_id             = "0x6b0a00"
    port_name           = "0x50014380186af83e"
    port_state          = "Online"
    port_type           = "NPort (fabric via point-to-point)"
    speed               = "8 Gbit"
    supported_classes   = "Class 3"
    supported_speeds    = "1 Gbit, 2 Gbit, 4 Gbit, 8 Gbit"
    symbolic_name       = "HPAJ764A FW:v8.07.00 DVR:v10.00.00.06.07.6-k"
    system_hostname     = ""
    tgtid_bind_type     = "wwpn (World Wide Port Name)"
    uevent              = 
    vport_create        = <store method only>
    vport_delete        = <store method only>

    Device = "host7"
    Device path = "/sys/devices/pci0000:00/0000:00:02.0/0000:04:00.1/host7"
      fw_dump             = 
      issue_logo          = <store method only>
      nvram               = "ISP "
      optrom_ctl          = <store method only>
      optrom              = 
      reset               = <store method only>
      sfp                 = ""
      uevent              = "DEVTYPE=scsi_host"
      vpd                 = "�$"


4. Attaching an empty volume to an instance works.
Attaching a volume had failed on my previous system, unsure why,
but it's working now, so that's a good sign/progress.

Nova instance booted/running -> 
| d38e10e4-a937-4c9d-bbac-8bb708f6ac96 | inst1 | ACTIVE | -          | Running

Attach the empty volume created in step 2 to the instance:

#nova volume-attach d38e10e4-a937-4c9d-bbac-8bb708f6ac96 569d57ae-4a10-4fb6-9a9e-85f722ea9caf auto
+----------+--------------------------------------+
| Property | Value                                |
+----------+--------------------------------------+
| device   | /dev/vdb                             |
| id       | 569d57ae-4a10-4fb6-9a9e-85f722ea9caf |
| serverId | d38e10e4-a937-4c9d-bbac-8bb708f6ac96 |
| volumeId | 569d57ae-4a10-4fb6-9a9e-85f722ea9caf |
+----------+--------------------------------------+

Volume is attached, Cinder list -> 
569d57ae-4a10-4fb6-9a9e-85f722ea9caf | in-use | 3parEmptyVol7     | 1    | 3parfc      | false    | d38e10e4-a937-4c9d-bbac-8bb708f6ac96 |




Actual results:
Failing to create a 3PAR FC volume from an image.


Additional info:

All controller/compute nodes, as well as the 3PAR's 4 FC links, reside in the same FC zone.
Prior to installing OpenStack, I'd successfully attached an FC volume to one of the hosts, so I gather FC zoning is fine.
All hosts belong to the same rhos-fc host set on the 3PAR.

The FC switch is a Cisco MDS running NX-OS version 5.0(1a).
Since all ports belong to the same FC zone, I'm not sure whether or not I need to configure Cinder's zone manager.
I noticed the Cinder FC zone manager requires Cisco MDS NX-OS Release 6.2(9) or later, which is later than my current switch version.

Just in case, here is the zone info:
zone name hp_3par_cougar07_08_09_16 vsan 2
    member pwwn 21:00:00:1b:32:82:22:9e
    member pwwn 21:01:00:1b:32:a2:22:9e
    member pwwn 51:40:2e:c0:01:7c:3a:d8
    member pwwn 51:40:2e:c0:01:7c:38:6c
    member pwwn 21:01:00:e0:8b:a7:fd:10
    member pwwn 50:01:43:80:18:6a:f8:3e
    member pwwn 51:40:2e:c0:01:7c:38:6e
    member pwwn 21:00:00:24:ff:55:c3:c0
    member pwwn 21:00:00:24:ff:55:c3:c4
    member pwwn 21:00:00:24:ff:55:c3:c5
    member pwwn 20:01:00:02:ac:02:1f:6b
    member pwwn 20:02:00:02:ac:02:1f:6b
    member pwwn 21:01:00:02:ac:02:1f:6b
    member pwwn 21:02:00:02:ac:02:1f:6b

The last 4 (*:6b) are the 3PAR's 4 FC ports.
All the other WWNs are dual-port FC HBAs attached to my controllers/computes.
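
A quick way to cross-check the zoning is to compare each host's HBA WWPNs directly against the member pwwn list above (assuming the fc_host sysfs layout shown earlier):

#cat /sys/class/fc_host/host*/port_name      (local HBA WWPNs)

These should match zone members; e.g. 0x50014380186af83e from the systool output corresponds to member pwwn 50:01:43:80:18:6a:f8:3e.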

Comment 1 Tzach Shefi 2019-01-22 10:35:36 UTC
On cinder.conf, I just noticed these (vi line numbers included):

    865 # Protocol for transferring data between host and storage back-end. (string
    866 # value)
    867 # Possible values:
    868 # iscsi - <No description provided>
    869 # fc - <No description provided>
    870 #storage_protocol = iscsi          -> maybe I need to change this to FC? 

If this is the case, 3PAR's guide doesn't even mention it.
https://h20195.www2.hpe.com/v2/GetPDF.aspx/4AA5-1930ENW.pd

BTW, I changed it to fc and restarted the docker container, and still the same problem.
So I'm not sure what this option does, or maybe I need to add it under the back-end section?
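
For reference, a minimal 3PAR FC back-end section per the upstream driver docs looks roughly like this (the API URL, credentials, and CPG below are placeholders, not my real values):

[3parfc]
volume_driver = cinder.volume.drivers.hpe.hpe_3par_fc.HPE3PARFCDriver
volume_backend_name = 3parfc
hpe3par_api_url = https://3par.example.com:8080/api/v1    (placeholder address)
hpe3par_username = 3paradm
hpe3par_password = secret
hpe3par_cpg = OpenStackCPG
san_ip = 3par.example.com
san_login = 3paradm
san_password = secret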


   1031 # FC Zoning mode configured, only 'fabric' is supported now. (string value)
   1032 #zoning_mode = <None>

Later on, the same guide (page 14) mentions setting zone_mode=fabric.
The guide needs an update, as zone_mode is now called zoning_mode.

And while I think I understand the effect this setting has,
I'm unsure whether or not it must be configured.
I don't mind if my 3PAR uses all the FC ports rather than just one, at least not at the moment.
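
For completeness, my understanding from the docs is that enabling the zone manager would look roughly like this (the fabric name, switch address, and credentials below are placeholders):

[DEFAULT]
zoning_mode = fabric

[fc-zone-manager]
zone_driver = cinder.zonemanager.drivers.cisco.cisco_fc_zone_driver.CiscoFCZoneDriver
fc_fabric_names = VSAN2

[VSAN2]
cisco_fc_fabric_address = mds-switch.example.com    (placeholder switch address)
cisco_fc_fabric_user = admin
cisco_fc_fabric_password = secret
cisco_zoning_vsan = 2
cisco_zoning_policy = initiator-target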

Comment 2 Tzach Shefi 2019-01-22 18:20:39 UTC
One more tip: John suggested watching on the controller during volume create from image:
#watch -d -n2 lsblk
We never noticed the volume being mapped.
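
A variant of the same check that looks specifically for FC paths rather than all block devices:

#watch -d -n2 'ls -l /dev/disk/by-path/ | grep -i fc'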

Comment 3 Tzach Shefi 2019-01-23 14:08:44 UTC
Some more bits of info:
python-3parclient 4.2.8  
Updated Cisco FC switch's firmware to version 6.2(25) 
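
As a sanity check that the WSAPI endpoint the driver (via python-3parclient) talks to is reachable, a session key can be requested directly; a sketch, with placeholder address and credentials:

#curl -k -H "Content-Type: application/json" \
      -d '{"user":"3paradm","password":"secret"}' \
      https://3par.example.com:8080/api/v1/credentials

If the WSAPI side is healthy this returns a JSON session key; and indeed plain volume creates work here, so that path was never really in doubt.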

I tried playing with cinder.conf's
# Protocol for transferring data between host and storage back-end. (string value)
storage_protocol = fc   (defaults to iscsi)

This didn't help much; neither did setting SELinux to permissive (setenforce 0).

Then I also tested with
# FC Zoning mode configured, only 'fabric' is supported now. (string value)
zoning_mode = fabric    (commented out by default)

This still failed to create a volume from an image, yet the error changed a bit:

2019-01-23 09:19:28.401 45 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/cinder/volume/flows/manager/create_volume.py", line 550, in _copy_image_to_volume
2019-01-23 09:19:28.401 45 ERROR oslo_messaging.rpc.server     raise exception.ImageCopyFailure(reason=ex)
2019-01-23 09:19:28.401 45 ERROR oslo_messaging.rpc.server ImageCopyFailure: Failed to copy image to volume: Bad or unexpected response from the storage volume backend API: Unable to fetch connection information from backend: 'NoneType
2019-01-23 09:19:28.401 45 ERROR oslo_messaging.rpc.server
2019-01-23 09:19:30.306 41 INFO cinder.api.openstack.wsgi [req-9528e51e-b84f-47e6-b0e2-1a11c4f9e455 e8bb4c6e7fec4e33ae98517ce77b88cd a06f7770bf82412a8283a6395bcfba15 - default default] OPTIONS http://controller-0.internalapi.localdomai
2019-01-23 09:19:30.308 41 DEBUG cinder.api.openstack.wsgi [req-9528e5


On another attempt I disconnected one of the dual FC links on the controller's HBA; that didn't help or change anything.

Comment 4 Tzach Shefi 2019-01-31 08:59:49 UTC
Same problem on OSP10: create from image fails, and this time the error is somewhat different.


The error 
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 133, in _process_incoming
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server     res = self.dispatcher.dispatch(message)
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 150, in dispatch
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server     return self._do_dispatch(endpoint, method, ctxt, args)
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 121, in _do_dispatch
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server     result = func(ctxt, **new_args)
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/cinder/volume/manager.py", line 4500, in create_volume
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server     allow_reschedule=allow_reschedule, volume=volume)
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/cinder/volume/manager.py", line 645, in create_volume
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server     _run_flow()
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/cinder/volume/manager.py", line 637, in _run_flow
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server     flow_engine.run()
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/taskflow/engines/action_engine/engine.py", line 247, in run
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server     for _state in self.run_iter(timeout=timeout):
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/taskflow/engines/action_engine/engine.py", line 340, in run_iter
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server     failure.Failure.reraise_if_any(er_failures)
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/taskflow/types/failure.py", line 336, in reraise_if_any
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server     failures[0].reraise()
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/taskflow/types/failure.py", line 343, in reraise
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server     six.reraise(*self._exc_info)
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/taskflow/engines/action_engine/executor.py", line 53, in _execute_task
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server     result = task.execute(**arguments)
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/cinder/volume/flows/manager/create_volume.py", line 938, in execute
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server     **volume_spec)
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/cinder/volume/flows/manager/create_volume.py", line 896, in _create_from_image
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server     image_service)
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/cinder/volume/flows/manager/create_volume.py", line 808, in _create_from_image_cache_or_download
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server     image_service
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/cinder/volume/flows/manager/create_volume.py", line 684, in _create_from_image_download
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server     image_service)
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/cinder/volume/flows/manager/create_volume.py", line 565, in _copy_image_to_volume
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server     raise exception.ImageCopyFailure(reason=ex)
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server ImageCopyFailure: Failed to copy image to volume: 
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server Chained Exception #1
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server  Traceback (most recent call last):
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server    File "/usr/lib/python2.7/site-packages/cinder/volume/driver.py", line 458, in _detach_volume
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server      raise exception.VolumeBackendAPIException(data=err_msg)
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server  VolumeBackendAPIException: Bad or unexpected response from the storage volume backend API: Unable to terminate volume connection: Not found (HTTP 40

Comment 5 Tzach Shefi 2019-02-13 10:16:15 UTC
We found the issue: it turns out os-brick and HP's AJ764A HBA don't play nicely together.

I had this same HP HBA in my controller and in one of my compute nodes.
On both of them I hit the same error:
NoFibreChannelVolumeDeviceFound: Unable to find a Fibre Channel volume device.

Create volume from image failed, attach volume failed, backup of a volume failed.

However, on a second compute, attach volume worked; it just so happens that compute used another type of HBA.
Working on a hunch, I swapped my controller's HBA to a Qlogic one, and create volume from image now works.

Comment 6 Pablo Caruana 2019-06-26 07:46:49 UTC
(In reply to Tzach Shefi from comment #5)
Based on that comment, is there any additional assistance you need, or can we proceed to archive this one?

Comment 8 Pablo Caruana 2019-07-08 09:41:51 UTC
There is some work on the HPE 3PAR driver side, as it picks the wrong port when not in multipath mode. Added the external tracker LP#1809249.

Comment 9 Tzach Shefi 2019-11-05 06:49:20 UTC
Pablo, 
Agreed, I think we can archive this bz,
as I've since managed to create numerous 3PAR FC volumes from images over OSP13/14.
It was probably that HBA issue.

Comment 10 Pablo Caruana 2019-11-05 09:29:57 UTC
(In reply to Tzach Shefi from comment #9)
Thanks for your feedback. I'm also including some backports to reduce the chances of hitting "Fibre Channel volume device not found" with the 3PAR driver.

Comment 11 Pablo Caruana 2019-11-05 10:09:12 UTC
Moved that work under https://bugzilla.redhat.com/show_bug.cgi?id=1768790. Archiving this one as agreed at https://bugzilla.redhat.com/show_bug.cgi?id=1667922#c9.

*** This bug has been marked as a duplicate of bug 1768790 ***

