Bug 1729898 - Failing to detach a multi-attached netapp iscsi volume [NEEDINFO]
Summary: Failing to detach a multi-attached netapp iscsi volume
Keywords:
Status: ASSIGNED
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-cinder
Version: 15.0 (Stein)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Alan Bishop
QA Contact: Tzach Shefi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-07-15 09:20 UTC by Tzach Shefi
Modified: 2020-02-12 04:54 UTC
CC List: 14 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:
abishop: needinfo? (akarlsso)


Attachments
Nova compute-1 and c-vol logs (446.22 KB, application/gzip)
2019-07-15 09:20 UTC, Tzach Shefi


Links
Launchpad 1839384 (last updated 2019-08-07 20:39:32 UTC)

Description Tzach Shefi 2019-07-15 09:20:48 UTC
Created attachment 1590660 [details]
Nova compute-1 and c-vol logs

Description of problem: While testing Cinder multi-attach volumes, I attached a volume to three instances. Detaching worked fine from two of the instances, but the third failed to detach, and I noticed some multipath errors in the compute node's nova.log; see the end of this BZ and the attached log.


Version-Release number of selected component (if applicable):
puppet-nova-14.4.1-0.20190605170411.17663a5.el8ost.noarch
openstack-nova-common-19.0.2-0.20190616040418.acd2daa.el8ost.noarch
openstack-nova-compute-19.0.2-0.20190616040418.acd2daa.el8ost.noarch
python3-nova-19.0.2-0.20190616040418.acd2daa.el8ost.noarch
python3-novaclient-13.0.1-0.20190617080642.ef842ca.el8ost.noarch
openstack-nova-migration-19.0.2-0.20190616040418.acd2daa.el8ost.noarch

python3-os-brick-2.8.1-0.20190617142514.05b6c9f.el8ost.noarch

puppet-cinder-14.4.1-0.20190420083336.1cf0604.el8ost.noarch
openstack-cinder-14.0.1-0.20190607000407.23d1a72.el8ost.noarch
python3-cinderclient-4.2.0-0.20190520060354.953243d.el8ost.noarch
python3-cinder-14.0.1-0.20190607000407.23d1a72.el8ost.noarch


How reproducible:
Unsure; this is the first time I've hit it.
If I try to detach the volume again, it fails the same way every time.

Steps to Reproduce (an approximate CLI sketch of the whole sequence follows the steps):
1. I created a multi-attach NetApp iSCSI-backed volume:

(overcloud) [stack@undercloud-0 ~]$ cinder show 5e1eed64-b107-4737-b5a4-ed37265fbb7e
+--------------------------------+------------------------------------------+
| Property                       | Value                                    |
+--------------------------------+------------------------------------------+
| attached_servers               | ['a4946e59-08e2-41a9-b00a-d052a6a33d5e'] |
| attachment_ids                 | ['f59237c1-b3d2-4c20-b826-8ddf695fae12'] |
| availability_zone              | nova                                     |
| bootable                       | false                                    |
| consistencygroup_id            | None                                     |
| created_at                     | 2019-07-14T08:30:30.000000               |
| description                    | None                                     |
| encrypted                      | False                                    |
| id                             | 5e1eed64-b107-4737-b5a4-ed37265fbb7e     |
| metadata                       | readonly : True                          |
| migration_status               | None                                     |
| multiattach                    | True                                     |
| name                           | MA-netapp-vol-RW                         |
| os-vol-host-attr:host          | hostgroup@netapp#rhos_infra_tlv2         |
| os-vol-mig-status-attr:migstat | None                                     |
| os-vol-mig-status-attr:name_id | None                                     |
| os-vol-tenant-attr:tenant_id   | 5850830f52774e0dab7b7b8e508b6a56         |
| readonly                       | True                                     |
| replication_status             | None                                     |
| size                           | 1                                        |
| snapshot_id                    | None                                     |
| source_volid                   | None                                     |
| status                         | in-use                                   |
| updated_at                     | 2019-07-15T08:15:08.000000               |
| user_id                        | 8fe58395864a43158a847d1a9ffd4e9d         |
| volume_type                    | netapp-ma                                |
+--------------------------------+------------------------------------------+

2. Attached it to three instances:

| 5e1eed64-b107-4737-b5a4-ed37265fbb7e | in-use    | MA-netapp-vol-RW | 1    | netapp-ma   | false    | 6e80a37d-a052-4a8b-b041-d0e7c46ab6eb,a4946e59-08e2-41a9-b00a-d052a6a33d5e,cae1a6bd-1199-4010-8e04-e46aa50c27a3 |



3. When I tried to detach it, it detached fine from inst2 and inst3; however, detaching from inst1 fails:
nova volume-detach a4946e59-08e2-41a9-b00a-d052a6a33d5e 5e1eed64-b107-4737-b5a4-ed37265fbb7e

The volume reaches the "detaching" state, but when the detach fails after a timeout it returns to the "in-use" state.


| 5e1eed64-b107-4737-b5a4-ed37265fbb7e | detaching | MA-netapp-vol-RW | 1    | netapp-ma   | false    | a4946e59-08e2-41a9-b00a-d052a6a33d5e |



| 5e1eed64-b107-4737-b5a4-ed37265fbb7e | in-use    | MA-netapp-vol-RW | 1    | netapp-ma   | false    | a4946e59-08e2-41a9-b00a-d052a6a33d5e |
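
For completeness, here is a minimal sketch of the CLI sequence behind the steps above. It assumes the netapp-ma volume type is created with the standard Cinder multiattach extra spec, and uses inst1/inst2/inst3 as stand-ins for the three instance UUIDs listed in step 2 (the volume UUID is the one from the report); the type-create and attach commands themselves are not in the original report.

# Volume type that allows multi-attach (standard Cinder extra spec)
cinder type-create netapp-ma
cinder type-key netapp-ma set multiattach="<is> True"

# Create the volume and attach it to all three instances
cinder create 1 --volume-type netapp-ma --name MA-netapp-vol-RW
nova volume-attach inst1 5e1eed64-b107-4737-b5a4-ed37265fbb7e
nova volume-attach inst2 5e1eed64-b107-4737-b5a4-ed37265fbb7e
nova volume-attach inst3 5e1eed64-b107-4737-b5a4-ed37265fbb7e

# Detach from inst2 and inst3 (succeeds), then from inst1 (fails as in step 3)
nova volume-detach inst2 5e1eed64-b107-4737-b5a4-ed37265fbb7e
nova volume-detach inst3 5e1eed64-b107-4737-b5a4-ed37265fbb7e
nova volume-detach inst1 5e1eed64-b107-4737-b5a4-ed37265fbb7e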



Actual results:
Volume fails to detach

Expected results:
Volume should detach 

Additional info:  

I'm not sure which attempt this traceback is from, but look for errors like the following in the log:

2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [req-e97ee24f-edb0-406f-a903-5c800a0c3842 8fe58395864a43158a847d1a9ffd4e9d 5850830f52774e0dab7b7b8e508b6a56 - default default] [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e] Failed to detach volume 5e1eed64-b107-4737-b5a4-ed37265fbb7e from /dev/vdb: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
Command: multipath -f 3600a09803830447a4f244c465759574f
Exit code: 1
Stdout: ''
Stderr: 'Jul 15 08:14:28 | /etc/multipath.conf does not exist, blacklisting all devices.\nJul 15 08:14:28 | You can run "/sbin/mpathconf --enable" to create\nJul 15 08:14:28 | /etc/multipath.conf. See man mpathconf(8) for more details\nJul 15 08:14:28 | 3600a09803830447a4f244c465759574f: map in use\n'
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e] Traceback (most recent call last):
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]   File "/usr/lib/python3.6/site-packages/nova/virt/block_device.py", line 326, in driver_detach
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]     encryption=encryption)
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 1701, in detach_volume
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]     encryption=encryption)
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 1329, in _disconnect_volume
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]     vol_driver.disconnect_volume(connection_info, instance)
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/volume/iscsi.py", line 74, in disconnect_volume
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]     self.connector.disconnect_volume(connection_info['data'], None)
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]   File "/usr/lib/python3.6/site-packages/os_brick/utils.py", line 150, in trace_logging_wrapper
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]     result = f(*args, **kwargs)
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]   File "/usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py", line 328, in inner
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]     return f(*args, **kwargs)
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]   File "/usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py", line 868, in disconnect_volume
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]     device_info=device_info)
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]   File "/usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py", line 912, in _cleanup_connection
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]     path_used, was_multipath)
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]   File "/usr/lib/python3.6/site-packages/os_brick/initiator/linuxscsi.py", line 271, in remove_connection
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]     self.flush_multipath_device(multipath_name)
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]   File "/usr/lib/python3.6/site-packages/os_brick/initiator/linuxscsi.py", line 329, in flush_multipath_device
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]     root_helper=self._root_helper)
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]   File "/usr/lib/python3.6/site-packages/os_brick/executor.py", line 52, in _execute
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]     result = self.__execute(*args, **kwargs)
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]   File "/usr/lib/python3.6/site-packages/os_brick/privileged/rootwrap.py", line 169, in execute
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]     return execute_root(*cmd, **kwargs)
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]   File "/usr/lib/python3.6/site-packages/oslo_privsep/priv_context.py", line 241, in _wrap
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]     return self.channel.remote_call(name, args, kwargs)
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]   File "/usr/lib/python3.6/site-packages/oslo_privsep/daemon.py", line 203, in remote_call
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]     raise exc_type(*result[2])
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e] oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e] Command: multipath -f 3600a09803830447a4f244c465759574f
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e] Exit code: 1
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e] Stdout: ''
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e] Stderr: 'Jul 15 08:14:28 | /etc/multipath.conf does not exist, blacklisting all devices.\nJul 15 08:14:28 | You can run "/sbin/mpathconf --enable" to create\nJul 15 08:14:28 | /etc/multipath.conf. See man mpathconf(8) for more details\nJul 15 08:14:28 | 3600a09803830447a4f244c465759574f: map in use\n'
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e] 
2019-07-15 08:15:08.812 6 DEBUG oslo_concurrency.lockutils [req-e97ee24f-edb0-406f-a903-5c800a0c3842 8fe58395864a43158a847d1a9ffd4e9d 5850830f52774e0dab7b7b8e508b6a56 - default default] Lock "a7365e21-7434-4966-b33c-23c4c2c80e4a" released by "nova.virt.block_device.DriverVolumeBlockDevice.detach.<locals>._do_locked_detach" :: held 61.283s inner /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:339
2019-07-15 08:15:08.812 6 DEBUG oslo_concurrency.lockutils [req-e97ee24f-edb0-406f-a903-5c800a0c3842 8fe58395864a43158a847d1a9ffd4e9d 5850830f52774e0dab7b7b8e508b6a56 - default default] Lock "a4946e59-08e2-41a9-b00a-d052a6a33d5e" released by "nova.compute.manager.ComputeManager.detach_volume.<locals>.do_detach_volume" :: held 61.771s inner /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:339
2019-07-15 08:15:08.815 6 DEBUG oslo_concurrency.lockutils [-] Lock "a4946e59-08e2-41a9-b00a-d052a6a33d5e" acquired by "nova.compute.manager.ComputeManager._sync_power_states.<locals>._sync.<locals>.query_driver_power_state_and_sync" :: waited 49.948s inner /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:327
2019-07-15 08:15:08.872 6 DEBUG oslo_concurrency.lockutils [-] Lock "a4946e59-08e2-41a9-b00a-d052a6a33d5e" released by "nova.compute.manager.ComputeManager._sync_power_states.<locals>._sync.<locals>.query_driver_power_state_and_sync" :: held 0.057s inner /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:339
2019-07-15 08:15:08.874 6 ERROR oslo_messaging.rpc.server [req-e97ee24f-edb0-406f-a903-5c800a0c3842 8fe58395864a43158a847d1a9ffd4e9d 5850830f52774e0dab7b7b8e508b6a56 - default default] Exception during message handling: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
Command: multipath -f 3600a09803830447a4f244c465759574f
Exit code: 1
Stdout: ''
Stderr: 'Jul 15 08:14:28 | /etc/multipath.conf does not exist, blacklisting all devices.\nJul 15 08:14:28 | You can run "/sbin/mpathconf --enable" to create\nJul 15 08:14:28 | /etc/multipath.conf. See ma:

Comment 2 Tzach Shefi 2019-07-15 09:30:24 UTC
Regarding the "/etc/multipath.conf does not exist" part: we do have a multipath.conf file on both computes, and it looks like this:

[root@compute-1 ~]# cat /etc/multipath.conf
# device-mapper-multipath configuration file

# For a complete list of the default configuration values, run either:
# # multipath -t
# or
# # multipathd show config

# For a list of configuration options with descriptions, see the
# multipath.conf man page.

defaults {
        user_friendly_names no
        find_multipaths yes
}

blacklist_exceptions {
        property "(SCSI_IDENT_|ID_WWN)"
}

blacklist {
}

Comment 3 Alan Bishop 2019-07-19 13:54:28 UTC
Tzach, I don't know how you are manually configuring multipath, but remember that the containerized services won't see the multipath.conf file on the host unless you add an appropriate volume mount to the container. That is what TripleO will handle once bug #1584225 is resolved.

I recommend separating the QE test goals. If you're testing multi-attach, then perhaps you could do it without relying on multipath?
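
As a purely illustrative sketch (this is not the actual TripleO change tracked in bug #1584225, and the container name nova_compute is an assumption), the missing-volume-mount situation can be checked and, conceptually, addressed like this:

# Check whether the containerized service actually sees the host's
# /etc/multipath.conf (container name assumed; adjust for the deployment)
podman exec nova_compute ls -l /etc/multipath.conf

# Conceptually, the fix is to bind-mount the host configuration into the
# container when it is started, along the lines of:
#   --volume /etc/multipath.conf:/etc/multipath.conf:ro
#   --volume /etc/multipath:/etc/multipath:z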

Comment 6 Alan Bishop 2019-07-30 16:00:18 UTC
I'm taking a look at this now, and will first determine whether it's a multi-attach or multipath issue.

Comment 7 Alan Bishop 2019-08-06 00:11:54 UTC
After struggling a bit with Tzach's original setup, I switched to my own OSP-15 deployment and was able to reproduce the problem. The error occurs when os-brick tries to flush the multipath device when disconnecting the volume from the last instance (when there should be no more connections). The call to "multipath -f <multipath name>" fails with "map in use."
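
For reference, the failing flush can be reproduced and examined by hand on the compute node. The WWID is the one from the log above; these are standard multipath/device-mapper diagnostics rather than commands taken from the original report.

# The command os-brick runs during disconnect, which fails with "map in use"
multipath -f 3600a09803830447a4f244c465759574f

# Inspect the map and its path devices
multipath -ll 3600a09803830447a4f244c465759574f

# See what still holds the device-mapper map open
dmsetup info 3600a09803830447a4f244c465759574f
lsblk /dev/mapper/3600a09803830447a4f244c465759574f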

Comment 8 Alan Bishop 2019-08-07 20:39:32 UTC
With help from Eric and Gorka, we tracked this down to a NetApp driver bug. When the first instance is detached, the driver terminates the connection, and this breaks the second instance's connection.
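
To illustrate the class of bug (a simplified, hypothetical sketch; this is not the actual NetApp driver code or the upstream fix), a multi-attach-aware driver's terminate_connection path has to check for other live attachments on the same host before it removes the LUN mapping:

# Hypothetical helper showing the decision a multi-attach-aware driver must
# make before unmapping a LUN; not the real NetApp driver API.
def should_unmap_lun(attachments, connector_host):
    """Return True only if no other live attachment uses the same host.

    'attachments' is a list of dicts such as
    {'attach_status': 'attached', 'attached_host': 'compute-1'}.
    """
    still_attached_here = [
        a for a in attachments
        if a['attach_status'] == 'attached'
        and a['attached_host'] == connector_host
    ]
    # The attachment being terminated is normally still in the list, so more
    # than one entry means another instance on this host shares the mapping.
    return len(still_attached_here) <= 1


# Example: two instances on compute-1 share the volume; detaching one of them
# must NOT unmap the LUN, or the other instance loses its connection.
attachments = [
    {'attach_status': 'attached', 'attached_host': 'compute-1'},
    {'attach_status': 'attached', 'attached_host': 'compute-1'},
]
print(should_unmap_lun(attachments, 'compute-1'))  # False -> keep the mapping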

