Created attachment 1522157 [details]
Cinder.conf plus cinder logs

Description of problem:
A simple deployment, 1 controller + 2 computes. Creating an empty Cinder volume works, however creating a volume from an image fails with the error described below. Unsure if this is a config issue or a possible driver bug.

Version-Release number of selected component (if applicable):
RHEL 7.6
puppet-cinder-13.3.1-0.20181013114721.25b1ba3.el7ost.noarch
openstack-cinder-13.0.1-0.20181013185427.31ff628.el7ost.noarch
python2-cinderclient-4.0.1-0.20180809133302.460229c.el7ost.noarch
python-cinder-13.0.1-0.20181013185427.31ff628.el7ost.noarch
python2-os-brick-2.5.3-0.20180816081254.641337b.el7ost.noarch
python-nova-18.0.3-0.20181011032838.d1243fe.el7ost.noarch
openstack-nova-api-18.0.3-0.20181011032838.d1243fe.el7ost.noarch
puppet-nova-13.3.1-0.20181013120143.8ab435c.el7ost.noarch
python2-novaclient-11.0.0-0.20180809174649.f1005ce.el7ost.noarch
openstack-nova-common-18.0.3-0.20181011032838.d1243fe.el7ost.noarch
python-novajoin-1.0.21-1.el7ost.noarch
3PAR - HPE_3PAR 8200, HPE 3PAR OS version 3.3.1.410 (MU2)+P32,P34,P37,P40,P41,P45
Cisco FC MDS 9148 switch - version 5.0(1a)

How reproducible:
Hit the same issue on two deployments (reused the same HW). Then again, it might be an issue with my cloned config.

Steps to Reproduce:
1. Configure OpenStack 14 with 3PAR FC storage as the Cinder back end.

2. Creating an empty volume works fine:
#cinder create 1 --volume-type 3parfc --name 3parEmptyVol7
Volume is created/available, cinder list ->
| 569d57ae-4a10-4fb6-9a9e-85f722ea9caf | available | 3parEmptyVol7 | 1 | 3parfc
Basic Cinder/3PAR access works fine.

3. Creating a volume from an image (cirros) fails:
#cinder create 1 --volume-type 3parfc --name 3parVolFromImage1 --image cirros
+--------------------------------+--------------------------------------+
| Property                       | Value                                |
+--------------------------------+--------------------------------------+
| attachments                    | []                                   |
| availability_zone              | nova                                 |
| bootable                       | false                                |
| consistencygroup_id            | None                                 |
| created_at                     | 2019-01-21T12:31:48.000000           |
| description                    | None                                 |
| encrypted                      | False                                |
| id                             | 0fafa271-9b7b-4dcd-a98c-9143ef916afe |
..
| status                         | creating                             |

But after a while we see it failed to create, #cinder list returns ->
| 0fafa271-9b7b-4dcd-a98c-9143ef916afe | error | 3parVolFromImage1 | 1 | 3parfc | false |

In the c-vol log I noticed an os-brick error ->
2019-01-21 12:32:13.400 70 ERROR os_brick.initiator.connectors.fibre_channel [-] Fibre Channel volume device not found.
2019-01-21 12:32:13.401 70 ERROR oslo.service.loopingcall [-] Fixed interval looping call 'os_brick.initiator.connectors.fibre_channel._wait_for_device_discovery' failed: NoFibreChannelVolumeDeviceFound: Unable to find a Fibre Channel volume device.
2019-01-21 12:32:13.401 70 ERROR oslo.service.loopingcall Traceback (most recent call last):
2019-01-21 12:32:13.401 70 ERROR oslo.service.loopingcall   File "/usr/lib/python2.7/site-packages/oslo_service/loopingcall.py", line 171, in _run_loop
2019-01-21 12:32:13.401 70 ERROR oslo.service.loopingcall     result = func(*self.args, **self.kw)
2019-01-21 12:32:13.401 70 ERROR oslo.service.loopingcall   File "/usr/lib/python2.7/site-packages/os_brick/initiator/connectors/fibre_channel.py", line 219, in _wait_for_device_discovery
2019-01-21 12:32:13.401 70 ERROR oslo.service.loopingcall     raise exception.NoFibreChannelVolumeDeviceFound()
2019-01-21 12:32:13.401 70 ERROR oslo.service.loopingcall NoFibreChannelVolumeDeviceFound: Unable to find a Fibre Channel volume device.
On the controller's bare-metal host I installed sysfsutils and ran systool:
#yum install sysfsutils
#systool -c fc_host -v
I get the same output (below) when running systool inside the c-vol docker container.

[root@controller-0 cinder]# systool -c fc_host -v
Class = "fc_host"

  Class Device = "host6"
  Class Device path = "/sys/devices/pci0000:00/0000:00:02.0/0000:04:00.0/host6/fc_host/host6"
    dev_loss_tmo        = "16"
    fabric_name         = "0x2002000573a558d1"
    issue_lip           = <store method only>
    max_npiv_vports     = "254"
    node_name           = "0x50014380186af83d"
    npiv_vports_inuse   = "0"
    port_id             = "0x6b1000"
    port_name           = "0x50014380186af83c"
    port_state          = "Online"
    port_type           = "NPort (fabric via point-to-point)"
    speed               = "8 Gbit"
    supported_classes   = "Class 3"
    supported_speeds    = "1 Gbit, 2 Gbit, 4 Gbit, 8 Gbit"
    symbolic_name       = "HPAJ764A FW:v8.07.00 DVR:v10.00.00.06.07.6-k"
    system_hostname     = ""
    tgtid_bind_type     = "wwpn (World Wide Port Name)"
    uevent              =
    vport_create        = <store method only>
    vport_delete        = <store method only>

  Device = "host6"
  Device path = "/sys/devices/pci0000:00/0000:00:02.0/0000:04:00.0/host6"
    fw_dump             =
    issue_logo          = <store method only>
    nvram               = "ISP "
    optrom_ctl          = <store method only>
    optrom              =
    reset               = <store method only>
    sfp                 = ""
    uevent              = "DEVTYPE=scsi_host"
    vpd                 = "�$"

  Class Device = "host7"
  Class Device path = "/sys/devices/pci0000:00/0000:00:02.0/0000:04:00.1/host7/fc_host/host7"
    dev_loss_tmo        = "16"
    fabric_name         = "0x2002000573a558d1"
    issue_lip           = <store method only>
    max_npiv_vports     = "254"
    node_name           = "0x50014380186af83f"
    npiv_vports_inuse   = "0"
    port_id             = "0x6b0a00"
    port_name           = "0x50014380186af83e"
    port_state          = "Online"
    port_type           = "NPort (fabric via point-to-point)"
    speed               = "8 Gbit"
    supported_classes   = "Class 3"
    supported_speeds    = "1 Gbit, 2 Gbit, 4 Gbit, 8 Gbit"
    symbolic_name       = "HPAJ764A FW:v8.07.00 DVR:v10.00.00.06.07.6-k"
    system_hostname     = ""
    tgtid_bind_type     = "wwpn (World Wide Port Name)"
    uevent              =
    vport_create        = <store method only>
    vport_delete        = <store method only>

  Device = "host7"
  Device path = "/sys/devices/pci0000:00/0000:00:02.0/0000:04:00.1/host7"
    fw_dump             =
    issue_logo          = <store method only>
    nvram               = "ISP "
    optrom_ctl          = <store method only>
    optrom              =
    reset               = <store method only>
    sfp                 = ""
    uevent              = "DEVTYPE=scsi_host"
    vpd                 = "�$"

4. Attaching an empty volume to an instance works. Attaching a volume failed on my previous system, unsure why, but it's working now, so a good sign/progress.

Nova instance booted/running ->
| d38e10e4-a937-4c9d-bbac-8bb708f6ac96 | inst1 | ACTIVE | - | Running

Attach the empty volume created in step 2 to the instance:
#nova volume-attach d38e10e4-a937-4c9d-bbac-8bb708f6ac96 569d57ae-4a10-4fb6-9a9e-85f722ea9caf auto
+----------+--------------------------------------+
| Property | Value                                |
+----------+--------------------------------------+
| device   | /dev/vdb                             |
| id       | 569d57ae-4a10-4fb6-9a9e-85f722ea9caf |
| serverId | d38e10e4-a937-4c9d-bbac-8bb708f6ac96 |
| volumeId | 569d57ae-4a10-4fb6-9a9e-85f722ea9caf |
+----------+--------------------------------------+
Volume is attached, cinder list ->
| 569d57ae-4a10-4fb6-9a9e-85f722ea9caf | in-use | 3parEmptyVol7 | 1 | 3parfc | false | d38e10e4-a937-4c9d-bbac-8bb708f6ac96 |

Actual results:
Failing to create a 3PAR FC volume from an image.

Additional info:
All controller/compute nodes, as well as the 3PAR's 4 FC links, reside in the same FC zone. Prior to installing OpenStack I had successfully attached an FC volume to one of the hosts, so I gather FC zoning is fine. All hosts belong to the same rhos-fc host set on the 3PAR.
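To double-check whether the 3PAR LUN ever reaches the controller, the following can be run there while the create-from-image is in progress. This is only a rough manual equivalent of the rescan os-brick performs; host6/host7 are the FC hosts from the systool output above (adjust for your system), and lsscsi has to be installed separately:

#echo "- - -" > /sys/class/scsi_host/host6/scan
#echo "- - -" > /sys/class/scsi_host/host7/scan
#lsscsi
#ls -l /dev/disk/by-path/ | grep -i fc

If the export and zoning work, lsscsi should normally show a new LUN with a 3PARdata vendor string and a matching fc-* entry should appear under /dev/disk/by-path. If nothing new ever appears even after the manual rescan, the problem is below os-brick (export, zoning or HBA) rather than in Cinder itself.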
The FC switch is a Cisco NX-OS MDS, version 5.0(1a).

Since all ports already belong to the same FC zone, I'm not sure whether I also need to configure Cinder's FC zone manager. I noticed that the Cinder FC zone manager requires Cisco MDS NX-OS Release 6.2(9) or later, which is newer than my current switch version.

Just in case, here is the zone info:

zone name hp_3par_cougar07_08_09_16 vsan 2
  member pwwn 21:00:00:1b:32:82:22:9e
  member pwwn 21:01:00:1b:32:a2:22:9e
  member pwwn 51:40:2e:c0:01:7c:3a:d8
  member pwwn 51:40:2e:c0:01:7c:38:6c
  member pwwn 21:01:00:e0:8b:a7:fd:10
  member pwwn 50:01:43:80:18:6a:f8:3e
  member pwwn 51:40:2e:c0:01:7c:38:6e
  member pwwn 21:00:00:24:ff:55:c3:c0
  member pwwn 21:00:00:24:ff:55:c3:c4
  member pwwn 21:00:00:24:ff:55:c3:c5
  member pwwn 20:01:00:02:ac:02:1f:6b
  member pwwn 20:02:00:02:ac:02:1f:6b
  member pwwn 21:01:00:02:ac:02:1f:6b
  member pwwn 21:02:00:02:ac:02:1f:6b

The last four (*:6b) are the 3PAR's four FC ports; all the other WWNs are the dual-port FC HBAs attached to my controllers/computes.
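My understanding is that the zone manager is only needed when you want Cinder to create and remove zones automatically; with everything already pre-zoned into one zone as above, it shouldn't strictly be required. For reference, if it were enabled, a minimal cinder.conf sketch for a Cisco fabric would look roughly like this (the back-end section name, fabric name, switch address and credentials are placeholders, not values from this system; only the vsan number is taken from the zone info above):

# placeholder name for whatever the 3PAR back-end section is actually called
[tripleo_3parfc]
zoning_mode = fabric

[fc-zone-manager]
zone_driver = cinder.zonemanager.drivers.cisco.cisco_fc_zone_driver.CiscoFCZoneDriver
fc_san_lookup_service = cinder.zonemanager.drivers.cisco.cisco_fc_san_lookup_service.CiscoFCSanLookupService
fc_fabric_names = CISCO_FABRIC_A
zoning_policy = initiator-target

[CISCO_FABRIC_A]
cisco_fc_fabric_address = <switch mgmt IP>
cisco_fc_fabric_user = <switch user>
cisco_fc_fabric_password = <switch password>
cisco_zoning_vsan = 2
cisco_zone_activate = True
cisco_zoning_policy = initiator-target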
In cinder.conf I just noticed these (vi line numbers included):

 865 # Protocol for transferring data between host and storage back-end. (string
 866 # value)
 867 # Possible values:
 868 # iscsi - <No description provided>
 869 # fc - <No description provided>
 870 #storage_protocol = iscsi

-> maybe I need to change this to fc? If that's the case, the 3PAR guide doesn't even mention this option:
https://h20195.www2.hpe.com/v2/GetPDF.aspx/4AA5-1930ENW.pd
BTW, I changed it to fc and restarted the docker container, and still hit the same problem. So I'm not sure what this option does, or maybe it needs to go under the back-end section instead (a rough sketch of that section follows below)?

1031 # FC Zoning mode configured, only 'fabric' is supported now. (string value)
1032 #zoning_mode = <None>

Later on, page 14 of the same guide mentions setting zone_mode=fabric. The guide needs an update, as zone_mode is now called zoning_mode. I think I understand the effect this setting has, but I'm unsure whether it must be configured. If it only means the 3PAR uses all of its FC ports rather than just one, I don't mind either way at the moment.
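For completeness, here is my understanding of roughly what the 3PAR FC back-end section should look like. The section name, URL and credentials are placeholders from the generic driver documentation, not values from the attached cinder.conf, and as far as I can tell the FC driver reports its protocol itself, so storage_protocol shouldn't be needed here at all:

# placeholder section name
[3parfc]
volume_driver = cinder.volume.drivers.hpe.hpe_3par_fc.HPE3PARFCDriver
volume_backend_name = 3parfc
hpe3par_api_url = https://<3par mgmt IP>:8080/api/v1
hpe3par_username = <3par user>
hpe3par_password = <3par password>
hpe3par_cpg = <CPG name>
san_ip = <3par mgmt IP>
san_login = <3par user>
san_password = <3par password>
# only relevant if the FC zone manager is actually used:
#zoning_mode = fabric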
One more tip: John suggested watching on the controller during the volume create from image:
#watch -d -n2 lsblk
We never saw the volume being mapped.
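A couple of other views that can be watched at the same time, just as a suggestion (lsscsi needs to be installed separately):
#watch -d -n2 lsscsi
#watch -d -n2 'ls -l /dev/disk/by-path/ | grep -i fc'
If neither of these ever shows a new device while the image copy is running, the LUN is not reaching the host at all.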
Some more bits of info:
python-3parclient 4.2.8
Updated the Cisco FC switch's firmware to version 6.2(25).

I tried playing with cinder.conf's
# Protocol for transferring data between host and storage back-end. (string value)
storage_protocol = fc   (defaults to iscsi)
This didn't help, nor did setting SELinux to permissive (setenforce 0).

Then I also tested with
# FC Zoning mode configured, only 'fabric' is supported now. (string value)
zoning_mode = fabric   (commented out by default)

This still failed to create a volume from an image, yet the error changed a bit:

2019-01-23 09:19:28.401 45 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/cinder/volume/flows/manager/create_volume.py", line 550, in _copy_image_to_volume
2019-01-23 09:19:28.401 45 ERROR oslo_messaging.rpc.server     raise exception.ImageCopyFailure(reason=ex)
2019-01-23 09:19:28.401 45 ERROR oslo_messaging.rpc.server ImageCopyFailure: Failed to copy image to volume: Bad or unexpected response from the storage volume backend API: Unable to fetch connection information from backend: 'NoneType
2019-01-23 09:19:28.401 45 ERROR oslo_messaging.rpc.server
2019-01-23 09:19:30.306 41 INFO cinder.api.openstack.wsgi [req-9528e51e-b84f-47e6-b0e2-1a11c4f9e455 e8bb4c6e7fec4e33ae98517ce77b88cd a06f7770bf82412a8283a6395bcfba15 - default default] OPTIONS http://controller-0.internalapi.localdomai
2019-01-23 09:19:30.308 41 DEBUG cinder.api.openstack.wsgi [req-9528e5

On another attempt I disconnected one of the dual FC links on the controller's HBA; that didn't help/change anything.
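In case it helps anyone reproducing this, my understanding of the containerized setup is that the config the c-vol container actually sees can be checked roughly like this (the path is the usual OSP14 puppet-generated config location and the container name is a placeholder for whatever docker ps shows for cinder-volume on your controller):
#grep -E '^(storage_protocol|zoning_mode)' /var/lib/config-data/puppet-generated/cinder/etc/cinder/cinder.conf
#docker ps --format '{{.Names}}' | grep -i cinder
#docker exec <cinder_volume_container> grep -E '^(storage_protocol|zoning_mode)' /etc/cinder/cinder.conf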
Same problem on OSP10: create from image fails, this time with a somewhat different error.

The error:
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 133, in _process_incoming
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server     res = self.dispatcher.dispatch(message)
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 150, in dispatch
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server     return self._do_dispatch(endpoint, method, ctxt, args)
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 121, in _do_dispatch
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server     result = func(ctxt, **new_args)
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/cinder/volume/manager.py", line 4500, in create_volume
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server     allow_reschedule=allow_reschedule, volume=volume)
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/cinder/volume/manager.py", line 645, in create_volume
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server     _run_flow()
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/cinder/volume/manager.py", line 637, in _run_flow
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server     flow_engine.run()
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/taskflow/engines/action_engine/engine.py", line 247, in run
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server     for _state in self.run_iter(timeout=timeout):
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/taskflow/engines/action_engine/engine.py", line 340, in run_iter
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server     failure.Failure.reraise_if_any(er_failures)
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/taskflow/types/failure.py", line 336, in reraise_if_any
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server     failures[0].reraise()
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/taskflow/types/failure.py", line 343, in reraise
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server     six.reraise(*self._exc_info)
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/taskflow/engines/action_engine/executor.py", line 53, in _execute_task
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server     result = task.execute(**arguments)
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/cinder/volume/flows/manager/create_volume.py", line 938, in execute
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server     **volume_spec)
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/cinder/volume/flows/manager/create_volume.py", line 896, in _create_from_image
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server     image_service)
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/cinder/volume/flows/manager/create_volume.py", line 808, in _create_from_image_cache_or_download
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server     image_service
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/cinder/volume/flows/manager/create_volume.py", line 684, in _create_from_image_download
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server     image_service)
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/cinder/volume/flows/manager/create_volume.py", line 565, in _copy_image_to_volume
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server     raise exception.ImageCopyFailure(reason=ex)
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server ImageCopyFailure: Failed to copy image to volume:
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server Chained Exception #1
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/cinder/volume/driver.py", line 458, in _detach_volume
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server     raise exception.VolumeBackendAPIException(data=err_msg)
2019-01-31 08:34:04.127 173653 ERROR oslo_messaging.rpc.server VolumeBackendAPIException: Bad or unexpected response from the storage volume backend API: Unable to terminate volume connection: Not found (HTTP 40
We found the issue: it turns out os-brick and HP's AJ764A HBA don't play nicely together. I had this same HP HBA in my controller and in one of my compute nodes, and on both of them I hit the same error:
NoFibreChannelVolumeDeviceFound: Unable to find a Fibre Channel volume device.
Creating a volume from an image failed, attaching a volume failed, backing up a volume failed. However, on a second compute node attaching a volume worked; it just so happens that compute used another type of HBA. Working on a hunch, I swapped my controller's HBA for a QLogic one, and creating a volume from an image now works.
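For anyone hitting the same symptom, a quick way to check which HBA model and driver each node is using (just a sketch; the exact fields shown vary by driver):
#lspci -nn | grep -i 'fibre channel'
#systool -c fc_host -v | grep -E 'Class Device path|symbolic_name|port_state'
#cat /sys/class/scsi_host/host*/proc_name      (kernel driver in use, e.g. qla2xxx or lpfc)
In our case the failing hosts showed the HPAJ764A symbolic_name seen in the systool output earlier in this bug.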
(In reply to Tzach Shefi from comment #5)

Based on that comment, is there any additional assistance you need, or can we proceed with archiving this one?
There is some work on the HPE 3PAR driver side, as it picks the wrong port when not in multipath mode. Added the external tracker LP#1809249.
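Until that lands, one thing that may reduce the exposure (my reading of the LP bug, not verified on this setup) is to let Cinder use multipath for the image copy, i.e. set the following in the 3PAR back-end section and make sure multipathd is running where cinder-volume runs:
use_multipath_for_image_xfer = True
# optionally, fail instead of silently falling back to a single path:
enforce_multipath_for_image_xfer = True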
Pablo, agreed, I think we can archive this BZ, as I've since managed to create numerous 3PAR FC volumes from images over FC several times on OSP13/14. It was probably that HBA issue.
(In reply to Tzach Shefi from comment #9)

Thanks for your feedback. I'm also including some backports to reduce the chances of getting "Fibre Channel volume device not found" with the 3PAR driver.
Moved that work under https://bugzilla.redhat.com/show_bug.cgi?id=1768790. Archiving this one as agreed at https://bugzilla.redhat.com/show_bug.cgi?id=1667922#c9.

*** This bug has been marked as a duplicate of bug 1768790 ***