Created attachment 1590660 [details]
Nova compute-1 and c-vol logs

Description of problem:
While testing a Cinder multi-attach volume, I attached a volume to three instances. Detaching worked fine from two of the instances, but the third instance failed to detach, and I noticed some multipath errors in the compute's nova.log (see the end of this bz and the attached log).

Version-Release number of selected component (if applicable):
puppet-nova-14.4.1-0.20190605170411.17663a5.el8ost.noarch
openstack-nova-common-19.0.2-0.20190616040418.acd2daa.el8ost.noarch
openstack-nova-compute-19.0.2-0.20190616040418.acd2daa.el8ost.noarch
python3-nova-19.0.2-0.20190616040418.acd2daa.el8ost.noarch
python3-novaclient-13.0.1-0.20190617080642.ef842ca.el8ost.noarch
openstack-nova-migration-19.0.2-0.20190616040418.acd2daa.el8ost.noarch
python3-os-brick-2.8.1-0.20190617142514.05b6c9f.el8ost.noarch
puppet-cinder-14.4.1-0.20190420083336.1cf0604.el8ost.noarch
openstack-cinder-14.0.1-0.20190607000407.23d1a72.el8ost.noarch
python3-cinderclient-4.2.0-0.20190520060354.953243d.el8ost.noarch
python3-cinder-14.0.1-0.20190607000407.23d1a72.el8ost.noarch

How reproducible:
Unsure, this is the first time I hit it; if I try to detach the volume again it fails over and over in the same way.

Steps to Reproduce:
1. I created a multi-attach NetApp iSCSI-backed volume:

(overcloud) [stack@undercloud-0 ~]$ cinder show 5e1eed64-b107-4737-b5a4-ed37265fbb7e
+--------------------------------+------------------------------------------+
| Property                       | Value                                    |
+--------------------------------+------------------------------------------+
| attached_servers               | ['a4946e59-08e2-41a9-b00a-d052a6a33d5e'] |
| attachment_ids                 | ['f59237c1-b3d2-4c20-b826-8ddf695fae12'] |
| availability_zone              | nova                                     |
| bootable                       | false                                    |
| consistencygroup_id            | None                                     |
| created_at                     | 2019-07-14T08:30:30.000000               |
| description                    | None                                     |
| encrypted                      | False                                    |
| id                             | 5e1eed64-b107-4737-b5a4-ed37265fbb7e     |
| metadata                       | readonly : True                          |
| migration_status               | None                                     |
| multiattach                    | True                                     |
| name                           | MA-netapp-vol-RW                         |
| os-vol-host-attr:host          | hostgroup@netapp#rhos_infra_tlv2         |
| os-vol-mig-status-attr:migstat | None                                     |
| os-vol-mig-status-attr:name_id | None                                     |
| os-vol-tenant-attr:tenant_id   | 5850830f52774e0dab7b7b8e508b6a56         |
| readonly                       | True                                     |
| replication_status             | None                                     |
| size                           | 1                                        |
| snapshot_id                    | None                                     |
| source_volid                   | None                                     |
| status                         | in-use                                   |
| updated_at                     | 2019-07-15T08:15:08.000000               |
| user_id                        | 8fe58395864a43158a847d1a9ffd4e9d         |
| volume_type                    | netapp-ma                                |
+--------------------------------+------------------------------------------+

2. Attached it to three instances:

| 5e1eed64-b107-4737-b5a4-ed37265fbb7e | in-use | MA-netapp-vol-RW | 1 | netapp-ma | false | 6e80a37d-a052-4a8b-b041-d0e7c46ab6eb,a4946e59-08e2-41a9-b00a-d052a6a33d5e,cae1a6bd-1199-4010-8e04-e46aa50c27a3 |

3. When I tried to detach it, it detached fine from inst2 and inst3; however, from inst1 it fails to detach:

nova volume-detach a4946e59-08e2-41a9-b00a-d052a6a33d5e 5e1eed64-b107-4737-b5a4-ed37265fbb7e

The volume reaches the detaching state, but after a timeout, as it fails to detach, it returns to the in-use state:

| 5e1eed64-b107-4737-b5a4-ed37265fbb7e | detaching | MA-netapp-vol-RW | 1 | netapp-ma | false | a4946e59-08e2-41a9-b00a-d052a6a33d5e |
| 5e1eed64-b107-4737-b5a4-ed37265fbb7e | in-use    | MA-netapp-vol-RW | 1 | netapp-ma | false | a4946e59-08e2-41a9-b00a-d052a6a33d5e |
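For reference, a multi-attach setup like the one in steps 1-2 is typically created along these lines (illustrative commands only; the type and volume names are taken from the report, and the volume-attach call is repeated once per instance):

(overcloud) [stack@undercloud-0 ~]$ cinder type-create netapp-ma
(overcloud) [stack@undercloud-0 ~]$ cinder type-key netapp-ma set multiattach="<is> True"
(overcloud) [stack@undercloud-0 ~]$ cinder create --volume-type netapp-ma --name MA-netapp-vol-RW 1
(overcloud) [stack@undercloud-0 ~]$ nova volume-attach <instance-uuid> 5e1eed64-b107-4737-b5a4-ed37265fbb7e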
Actual results:
Volume fails to detach

Expected results:
Volume should detach

Additional info:
I'm not sure which attempt this was, but look for errors like the following in the log:

2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [req-e97ee24f-edb0-406f-a903-5c800a0c3842 8fe58395864a43158a847d1a9ffd4e9d 5850830f52774e0dab7b7b8e508b6a56 - default default] [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e] Failed to detach volume 5e1eed64-b107-4737-b5a4-ed37265fbb7e from /dev/vdb: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
Command: multipath -f 3600a09803830447a4f244c465759574f
Exit code: 1
Stdout: ''
Stderr: 'Jul 15 08:14:28 | /etc/multipath.conf does not exist, blacklisting all devices.\nJul 15 08:14:28 | You can run "/sbin/mpathconf --enable" to create\nJul 15 08:14:28 | /etc/multipath.conf. See man mpathconf(8) for more details\nJul 15 08:14:28 | 3600a09803830447a4f244c465759574f: map in use\n'
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e] Traceback (most recent call last):
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]   File "/usr/lib/python3.6/site-packages/nova/virt/block_device.py", line 326, in driver_detach
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]     encryption=encryption)
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 1701, in detach_volume
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]     encryption=encryption)
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 1329, in _disconnect_volume
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]     vol_driver.disconnect_volume(connection_info, instance)
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/volume/iscsi.py", line 74, in disconnect_volume
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]     self.connector.disconnect_volume(connection_info['data'], None)
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]   File "/usr/lib/python3.6/site-packages/os_brick/utils.py", line 150, in trace_logging_wrapper
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]     result = f(*args, **kwargs)
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]   File "/usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py", line 328, in inner
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]     return f(*args, **kwargs)
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]   File "/usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py", line 868, in disconnect_volume
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]     device_info=device_info)
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]   File "/usr/lib/python3.6/site-packages/os_brick/initiator/connectors/iscsi.py", line 912, in _cleanup_connection
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]     path_used, was_multipath)
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]   File "/usr/lib/python3.6/site-packages/os_brick/initiator/linuxscsi.py", line 271, in remove_connection
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]     self.flush_multipath_device(multipath_name)
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]   File "/usr/lib/python3.6/site-packages/os_brick/initiator/linuxscsi.py", line 329, in flush_multipath_device
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]     root_helper=self._root_helper)
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]   File "/usr/lib/python3.6/site-packages/os_brick/executor.py", line 52, in _execute
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]     result = self.__execute(*args, **kwargs)
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]   File "/usr/lib/python3.6/site-packages/os_brick/privileged/rootwrap.py", line 169, in execute
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]     return execute_root(*cmd, **kwargs)
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]   File "/usr/lib/python3.6/site-packages/oslo_privsep/priv_context.py", line 241, in _wrap
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]     return self.channel.remote_call(name, args, kwargs)
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]   File "/usr/lib/python3.6/site-packages/oslo_privsep/daemon.py", line 203, in remote_call
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]     raise exc_type(*result[2])
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e] oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e] Command: multipath -f 3600a09803830447a4f244c465759574f
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e] Exit code: 1
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e] Stdout: ''
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e] Stderr: 'Jul 15 08:14:28 | /etc/multipath.conf does not exist, blacklisting all devices.\nJul 15 08:14:28 | You can run "/sbin/mpathconf --enable" to create\nJul 15 08:14:28 | /etc/multipath.conf. See man mpathconf(8) for more details\nJul 15 08:14:28 | 3600a09803830447a4f244c465759574f: map in use\n'
2019-07-15 08:15:08.156 6 ERROR nova.virt.block_device [instance: a4946e59-08e2-41a9-b00a-d052a6a33d5e]
2019-07-15 08:15:08.812 6 DEBUG oslo_concurrency.lockutils [req-e97ee24f-edb0-406f-a903-5c800a0c3842 8fe58395864a43158a847d1a9ffd4e9d 5850830f52774e0dab7b7b8e508b6a56 - default default] Lock "a7365e21-7434-4966-b33c-23c4c2c80e4a" released by "nova.virt.block_device.DriverVolumeBlockDevice.detach.<locals>._do_locked_detach" :: held 61.283s inner /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:339
2019-07-15 08:15:08.812 6 DEBUG oslo_concurrency.lockutils [req-e97ee24f-edb0-406f-a903-5c800a0c3842 8fe58395864a43158a847d1a9ffd4e9d 5850830f52774e0dab7b7b8e508b6a56 - default default] Lock "a4946e59-08e2-41a9-b00a-d052a6a33d5e" released by "nova.compute.manager.ComputeManager.detach_volume.<locals>.do_detach_volume" :: held 61.771s inner /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:339
2019-07-15 08:15:08.815 6 DEBUG oslo_concurrency.lockutils [-] Lock "a4946e59-08e2-41a9-b00a-d052a6a33d5e" acquired by "nova.compute.manager.ComputeManager._sync_power_states.<locals>._sync.<locals>.query_driver_power_state_and_sync" :: waited 49.948s inner /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:327
2019-07-15 08:15:08.872 6 DEBUG oslo_concurrency.lockutils [-] Lock "a4946e59-08e2-41a9-b00a-d052a6a33d5e" released by "nova.compute.manager.ComputeManager._sync_power_states.<locals>._sync.<locals>.query_driver_power_state_and_sync" :: held 0.057s inner /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:339
2019-07-15 08:15:08.874 6 ERROR oslo_messaging.rpc.server [req-e97ee24f-edb0-406f-a903-5c800a0c3842 8fe58395864a43158a847d1a9ffd4e9d 5850830f52774e0dab7b7b8e508b6a56 - default default] Exception during message handling: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
Command: multipath -f 3600a09803830447a4f244c465759574f
Exit code: 1
Stdout: ''
Stderr: 'Jul 15 08:14:28 | /etc/multipath.conf does not exist, blacklisting all devices.\nJul 15 08:14:28 | You can run "/sbin/mpathconf --enable" to create\nJul 15 08:14:28 | /etc/multipath.conf. See ma:
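(For reference, a rough manual check of the stuck map on the compute node, using the WWID from the error above; illustrative only, output omitted and will vary:)

[root@compute-1 ~]# multipath -ll 3600a09803830447a4f244c465759574f
[root@compute-1 ~]# multipath -f 3600a09803830447a4f244c465759574f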
About this bit, "/etc/multipath.conf does not exist": we do have a multipath.conf file on both computes, and it looks like this:

[root@compute-1 ~]# cat /etc/multipath.conf
# device-mapper-multipath configuration file

# For a complete list of the default configuration values, run either:
# # multipath -t
# or
# # multipathd show config

# For a list of configuration options with descriptions, see the
# multipath.conf man page.

defaults {
        user_friendly_names no
        find_multipaths yes
}

blacklist_exceptions {
        property "(SCSI_IDENT_|ID_WWN)"
}

blacklist {
}
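(If the file really were missing on a host, it could be (re)generated with mpathconf as the error message suggests; illustrative commands, run on the compute host:)

[root@compute-1 ~]# mpathconf --enable --with_multipathd y
[root@compute-1 ~]# multipathd show config | head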
Tzach, I don't know how you are manually configuring multipath, but remember that the containerized services won't see the multipath.conf file on the host unless you add an appropriate volume mount to the container. This is the sort of thing TripleO will do once bug #1584225 is resolved. I recommend separating the QE test goals: if you're testing multi-attach, perhaps you could do it without relying on multipath?
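For example (illustrative only; this assumes podman-managed containers and the usual nova_compute container name for this release), you can check what the containerized service actually sees:

[root@compute-1 ~]# podman exec nova_compute cat /etc/multipath.conf

If that fails while the file exists on the host, the container is missing the bind mount (something like TripleO's NovaComputeOptVolumes parameter could add it, assuming that parameter is available in this release).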
I'm taking a look at this now, and will first determine whether it's a multi-attach or multipath issue.
After struggling a bit with Tzach's original setup, I switched to my own OSP-15 deployment and was able to reproduce the problem. The error occurs when os-brick tries to flush the multipath device when disconnecting the volume from the last instance (when there should be no more connections). The call to "multipath -f <multipath name>" fails with "map in use."
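A quick way to confirm that (a sketch; the map name equals the WWID here because user_friendly_names is off) is to check the device-mapper open count for the map that refuses to flush:

[root@compute-1 ~]# dmsetup info -c -o name,open 3600a09803830447a4f244c465759574f

A non-zero open count is consistent with the "map in use" failure: something still holds the device open even though os-brick expects no remaining connections.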
With help from Eric and Gorka, we tracked this down to a NetApp driver bug: when the volume is detached from the first instance, the driver terminates the connection, and this breaks the second instance's connection.
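One way to observe the effect (illustrative; run on the compute host that still has the volume attached, before and after detaching from the first instance) is to compare the iSCSI sessions and the multipath paths:

[root@compute-1 ~]# iscsiadm -m session
[root@compute-1 ~]# multipath -ll 3600a09803830447a4f244c465759574f

If the backend has already unmapped the LUN, the remaining paths would typically show up as failed/faulty even though the other instance still holds the device open.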
A patch has merged on upstream master; upstream backports are in progress.
OSP15 is EOL, but the fix is present in newer releases (OSP16) and is being backported to OSP13.