Bug 1710375 - [OSP13] live-migration fails with NoFibreChannelVolumeDeviceFound due to FC case sensitive scanning
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-os-brick
Version: 13.0 (Queens)
Hardware: x86_64
OS: Linux
Severity: urgent
Priority: urgent
Target Milestone: z7
Target Release: 13.0 (Queens)
Assignee: Alan Bishop
QA Contact: Tzach Shefi
URL:
Whiteboard:
Duplicates: 1710846
Depends On: 1710373
Blocks: 1709988
 
Reported: 2019-05-15 12:52 UTC by Alan Bishop
Modified: 2019-09-09 13:27 UTC
CC List: 34 users

Fixed In Version: python-os-brick-2.3.5-2.el7ost
Doc Type: Bug Fix
Doc Text:
With this update, operations using FC connections, such as volume live migration, no longer fail when the storage backend reports uppercase world wide names (WWNs) while os-brick devices use lowercase WWNs.
Clone Of: 1710373
Environment:
Last Closed: 2019-07-10 13:05:27 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1775677 0 None None None 2019-05-15 12:52:20 UTC
OpenStack gerrit 659281 0 None MERGED Fix FC case sensitive scanning 2020-10-30 09:46:36 UTC
Red Hat Product Errata RHBA-2019:1738 0 None None None 2019-07-10 13:05:48 UTC

Description Alan Bishop 2019-05-15 12:52:21 UTC
+++ This bug was initially created as a clone of Bug #1710373 +++

+++ This bug was initially created as a clone of Bug #1709988 +++

+++ This bug was initially created as a clone of Bug #1679184 +++

+++ This bug was initially created as a clone of Bug #1667997 +++

Description of problem:
Live migration fails with the error below.

How reproducible:

Always 


Steps to Reproduce:
1. Live migrate an instance with an HP 3PAR backend.

2019-01-18 07:51:13.619 285661 ERROR oslo.service.loopingcall [-] Fixed interval looping call 'os_brick.initiator.connectors.fibre_channel._wait_for_device_discovery' failed
2019-01-18 07:51:13.619 285661 ERROR oslo.service.loopingcall Traceback (most recent call last):
2019-01-18 07:51:13.619 285661 ERROR oslo.service.loopingcall   File "/usr/lib/python2.7/site-packages/oslo_service/loopingcall.py", line 136, in _run_loop
2019-01-18 07:51:13.619 285661 ERROR oslo.service.loopingcall     result = func(*self.args, **self.kw)
2019-01-18 07:51:13.619 285661 ERROR oslo.service.loopingcall   File "/usr/lib/python2.7/site-packages/os_brick/initiator/connectors/fibre_channel.py", line 157, in _wait_for_device_discovery
2019-01-18 07:51:13.619 285661 ERROR oslo.service.loopingcall     raise exception.NoFibreChannelVolumeDeviceFound()
2019-01-18 07:51:13.619 285661 ERROR oslo.service.loopingcall NoFibreChannelVolumeDeviceFound: Unable to find a Fibre Channel volume device.
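The traceback above comes from a fixed-interval polling loop: os-brick rescans the FC bus and looks for the new volume device at regular intervals, and gives up with NoFibreChannelVolumeDeviceFound if no device appears within the attempt budget. A minimal stdlib sketch of that pattern (the real code uses oslo.service's FixedIntervalLoopingCall; the function and parameter names here are illustrative, not os-brick's):

```python
import time


class NoFibreChannelVolumeDeviceFound(Exception):
    """Raised when polling exhausts its attempts without finding a device."""


def wait_for_device(find_device, interval=2, max_attempts=3):
    """Poll find_device() at a fixed interval until it returns a device.

    find_device is any callable returning a device path or None. If every
    attempt returns None, raise, mirroring the error in the traceback above.
    """
    for _attempt in range(max_attempts):
        device = find_device()
        if device:
            return device
        time.sleep(interval)
    raise NoFibreChannelVolumeDeviceFound(
        'Unable to find a Fibre Channel volume device.')
```

Because the WWN comparison never matched (see below), find_device effectively returned None on every attempt, so the loop always timed out.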


The problem is a case sensitivity mismatch between the WWNs in the FC initiator target map provided by the backend and the port names of the HBAs on the system.

Comment 9 Shatadru Bandyopadhyay 2019-05-24 11:46:58 UTC
*** Bug 1710846 has been marked as a duplicate of this bug. ***

Comment 17 Tzach Shefi 2019-06-16 15:18:58 UTC
Verified on:
python2-os-brick-2.3.7-1.el7ost.noarch


On a system with 3par FC Cinder backend

(overcloud) [stack@puma51 ~]$ cinder service-list
+------------------+-------------------------+------+---------+-------+----------------------------+-----------------+
| Binary           | Host                    | Zone | Status  | State | Updated_at                 | Disabled Reason |
+------------------+-------------------------+------+---------+-------+----------------------------+-----------------+
| cinder-scheduler | controller-0            | nova | enabled | up    | 2019-06-16T14:51:39.000000 | -               |
| cinder-volume    | controller-0@3parfc     | nova | enabled | up    | 2019-06-16T14:51:29.000000 | -               |   -> 3parfc 


Booted an instance, on compute-1
(overcloud) [stack@puma51 ~]$ nova show inst1
+--------------------------------------+----------------------------------------------------------------------------------+
| Property                             | Value                                                                            |
+--------------------------------------+----------------------------------------------------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                                                                           |
| OS-EXT-AZ:availability_zone          | nova                                                                             |
| OS-EXT-SRV-ATTR:host                 | compute-1.localdomain                                                            |
| OS-EXT-SRV-ATTR:hostname             | inst1                                                                            |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | compute-1.localdomain 
 
 
Created an FC volume from an image; the image serves to simulate data inside the volume.
Attached the volume to said instance:

(overcloud) [stack@puma51 ~]$ cinder list
+--------------------------------------+--------+--------------+------+-------------+----------+--------------------------------------+
| ID                                   | Status | Name         | Size | Volume Type | Bootable | Attached to                          |
+--------------------------------------+--------+--------------+------+-------------+----------+--------------------------------------+
| 81b40936-666d-4a8d-8d53-4027595dc263 | in-use | Pansible_vol | 1    | -           | true     | 9df2a03d-c927-4a36-97a5-f2e6c24c9eb8 |
+--------------------------------------+--------+--------------+------+-------------+----------+--------------------------------------+

Log in to the instance and check that we have access to the volume:
login as 'cirros' user. default password: 'gocubsgo'. use 'sudo' for root.                                                            
inst1 login: cirros                                                                                                                   
Password:                                                                                                                             
$ sudo -i                                                                                                                             
# lsblk                                                                                                                               
NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT                                                                                           
vda     253:0    0    1G  0 disk                                                                                                      
|-vda1  253:1    0 1015M  0 part /                                                                                                    
`-vda15 253:15   0    8M  0 part                                                                                                      
vdb     253:16   0    1G  0 disk    -> attached FC volume                                                                                                     
|-vdb1  253:17   0   35M  0 part                                                                                                      
`-vdb15 253:31   0    8M  0 part  



Multipath enabled on compute-1
[root@compute-1 ~]# multipath -ll
360002ac000000000000006ab00021f6b dm-0 3PARdata,VV              
size=1.0G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  |- 6:0:0:0 sdb 8:16 active ready running
  `- 6:0:1:0 sdc 8:32 active ready running


On compute-0 there is no multipath device and no instance.

Let's migrate the instance:
$ nova migrate 9df2a03d-c927-4a36-97a5-f2e6c24c9eb8 --poll

Server migrating... 100% complete
Finished


We see the instance moved to compute-0, and the instance has access to the FC volume:

[root@compute-0 ~]# virsh console instance-00000001 
Connected to domain instance-00000001                
Escape character is ^]                               
c                                                    
Password:                                            
Login incorrect                                      
inst1 login: cirros                                  
Password: 
$ sudo -i
# lsblk
NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
vda     253:0    0    1G  0 disk 
|-vda1  253:1    0 1015M  0 part /
`-vda15 253:15   0    8M  0 part 
vdb     253:16   0    1G  0 disk 
|-vdb1  253:17   0   35M  0 part 
`-vdb15 253:31   0    8M  0 part 


The multipath device moved to compute-0 as well:

                                                     
[root@compute-0 ~]# multipath -ll                    
360002ac000000000000006ab00021f6b dm-0 3PARdata,VV                                                        
size=1.0G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw                                          
`-+- policy='service-time 0' prio=50 status=active   
  |- 7:0:0:0 sdb 8:16 active ready running           
  `- 7:0:1:0 sdc 8:32 active ready running  



Successfully migrated an instance with an attached FC 3PAR volume from compute-1 to compute-0.
Good to verify as is.

Adding another test: this time I'll migrate an instance booted from an image copied to an FC volume used as the boot volume.
#nova boot --flavor tiny --block-device source=image,id=0a0fe76a-1832-4428-a520-8b783709627b,dest=volume,size=2,shutdown=preserve,bootindex=0 isnt2 --nic net-id=ef5319b3-ae1e-4edd-8534-58652482a12f

+--------------------------------------+-------------------------------------------------+
| Property                             | Value                                           |
+--------------------------------------+-------------------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                                          |
| OS-EXT-AZ:availability_zone          |                                                 |
| OS-EXT-SRV-ATTR:host                 | -                                               |
| OS-EXT-SRV-ATTR:hostname             | isnt2                                           |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | -                                               |
| OS-EXT-SRV-ATTR:instance_name        |                                                 |
| OS-EXT-SRV-ATTR:kernel_id            |                                                 |
| OS-EXT-SRV-ATTR:launch_index         | 0                                               |
| OS-EXT-SRV-ATTR:ramdisk_id           |                                                 |
| OS-EXT-SRV-ATTR:reservation_id       | r-moua9zxo                                      |
| OS-EXT-SRV-ATTR:root_device_name     | -                                               |
| OS-EXT-SRV-ATTR:user_data            | -                                               |
| OS-EXT-STS:power_state               | 0                                               |
| OS-EXT-STS:task_state                | scheduling                                      |
| OS-EXT-STS:vm_state                  | building                                        |
| OS-SRV-USG:launched_at               | -                                               |
| OS-SRV-USG:terminated_at             | -                                               |
| accessIPv4                           |                                                 |
| accessIPv6                           |                                                 |
| adminPass                            | bv9dxTjuLRVw                                    |
| config_drive                         |                                                 |
| created                              | 2019-06-16T15:10:56Z                            |
| description                          | -                                               |
| flavor:disk                          | 1                                               |
| flavor:ephemeral                     | 0                                               |
| flavor:extra_specs                   | {}                                              |
| flavor:original_name                 | tiny                                            |
| flavor:ram                           | 512                                             |
| flavor:swap                          | 0                                               |
| flavor:vcpus                         | 1                                               |
| hostId                               |                                                 |
| host_status                          |                                                 |
| id                                   | fa4f6230-4be5-4969-8a7b-b7d5712792fa            |
| image                                | Attempt to boot from volume - no image supplied |
| key_name                             | -                                               |
| locked                               | False                                           |
| metadata                             | {}                                              |
| name                                 | isnt2                                           |
| os-extended-volumes:volumes_attached | []                                              |
| progress                             | 0                                               |
| security_groups                      | default                                         |
| status                               | BUILD                                           |
| tags                                 | []                                              |
| tenant_id                            | a7720cf2350f4f01bfa872759b107e34                |
| updated                              | 2019-06-16T15:10:56Z                            |
| user_id                              | 0e17a5d0a6dd4b098de29e16a2f16c1d                |
+--------------------------------------+-------------------------------------------------+



Instance booted up fine on compute-1
(overcloud) [stack@puma51 ~]$ nova show isnt2
+--------------------------------------+----------------------------------------------------------------------------------+
| Property                             | Value                                                                            |
+--------------------------------------+----------------------------------------------------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                                                                           |
| OS-EXT-AZ:availability_zone          | nova                                                                             |
| OS-EXT-SRV-ATTR:host                 | compute-1.localdomain                                                            |
| OS-EXT-SRV-ATTR:hostname             | isnt2                                                                            |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | compute-1.localdomain                                                            |
| OS-EXT-SRV-ATTR:instance_name        | instance-00000002     

We see the multipath device on compute-1:
[root@compute-1 ~]# multipath -ll
360002ac000000000000006ad00021f6b dm-0 3PARdata,VV              
size=2.0G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active          
  |- 6:0:0:0 sdb 8:16 active ready running                  
  `- 6:0:1:0 sdc 8:32 active ready running    

Now let's migrate the instance to compute-0:

$ nova migrate isnt2 --poll
Server migrating... 100% complete
Finished

We also see the multipath device moved to compute-0:
[root@compute-0 ~]# multipath -ll
360002ac000000000000006ad00021f6b dm-0 3PARdata,VV                                                            
size=2.0G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw                                              
`-+- policy='service-time 0' prio=50 status=active     
  |- 7:0:0:0 sdb 8:16 active ready running             
  `- 7:0:1:0 sdc 8:32 active ready running 


Looks good to verify.

Comment 19 errata-xmlrpc 2019-07-10 13:05:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:1738

Comment 20 Alan Bishop 2019-07-12 17:03:30 UTC
*** Bug 1710846 has been marked as a duplicate of this bug. ***

