Bug 1293212 - Faulty multipath devices left on compute nodes after deleting instances which have cinder attached
Status: CLOSED DEFERRED
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 6.0 (Juno)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 7.0 (Kilo)
Assigned To: Lee Yarwood
QA Contact: nlevinki
Keywords: ZStream
Depends On:
Blocks:
Reported: 2015-12-21 00:57 EST by Chen
Modified: 2016-07-24 23:21 EDT
CC List: 13 users

Doc Type: Bug Fix
Last Closed: 2016-04-26 07:58:41 EDT
Type: Bug


Attachments: None
Description Chen 2015-12-21 00:57:03 EST
Description of problem:

Faulty multipath devices are left on compute nodes after deleting instances that have Cinder volumes attached.

Version-Release number of selected component (if applicable):

OSP 6.0
EMC iSCSI storage

How reproducible:

100%

Steps to Reproduce:
1. Boot an instance from volume
2. Attach a cinder volume
3. Delete the instance

Actual results:

Faulty devices are left on the compute node.

Expected results:

No faulty devices should be left on the compute node.
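One way to check a compute node for these leftovers after step 3 is to scan `multipath -ll` output for paths whose checker state is `faulty`. The sketch below assumes that path lines end in the usual three state words (for example `active ready running` or `failed faulty running`). `faulty_paths` is a hypothetical helper, and `SAMPLE` is hand-written placeholder text, not taken from the logs attached to this bug.

```python
import re

# Assumption: each path line ends with the dm path state, the path checker
# state and the device online state, e.g. "failed faulty running".
PATH_STATE = re.compile(r"\s(active|failed)\s+(\S+)\s+(\S+)\s*$")


def faulty_paths(multipath_ll_output: str) -> list:
    """Return the stripped path lines whose checker state is 'faulty'."""
    hits = []
    for line in multipath_ll_output.splitlines():
        match = PATH_STATE.search(line)
        if match and match.group(2) == "faulty":
            hits.append(line.strip())
    return hits


# Hand-written placeholder output (WWID, vendor and product are dummies).
SAMPLE = """\
mpatha (WWID) dm-2 VENDOR,PRODUCT
size=1.0G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=0 status=enabled
| `- 7:0:0:1 sdc 8:32 failed faulty running
`-+- policy='round-robin 0' prio=0 status=enabled
  `- 8:0:0:1 sdd 8:48 failed faulty running
"""

print(len(faulty_paths(SAMPLE)))  # 2
```

On a healthy node the expected result above corresponds to `faulty_paths(...)` returning an empty list.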

Additional info:

Logs have been uploaded to collab-shell.
For details, please check comment #1.
Comment 4 Lee Yarwood 2015-12-21 06:40:27 EST
So the issue here is that when the IQN provides no additional LUNs, Nova simply disconnects from the target portal. This removes the path devices but leaves the multipath device in place. When the IQN does provide additional LUNs, Nova deletes the paths _and_ the multipath device.

nova/virt/libvirt/volume.py:

    class LibvirtISCSIVolumeDriver(LibvirtBaseVolumeDriver):
        """Driver to attach Network volumes to libvirt."""

        @utils.synchronized('connect_volume')
        def disconnect_volume(self, connection_info, disk_dev):
            """Detach the volume from instance_name."""
            [..]
            if self.use_multipath and multipath_device:
                return self._disconnect_volume_multipath_iscsi(iscsi_properties,
                                                               multipath_device)

        def _disconnect_volume_multipath_iscsi(self, iscsi_properties,
                                               multipath_device):
            [..]
            # Get a target for all other multipath devices
            other_iqns = [self._get_multipath_iqn(device)
                          for device in devices]
            # Get all the targets for the current multipath device
            current_iqns = [iqn for ip, iqn in ips_iqns]

            in_use = False
            for current in current_iqns:
                if current in other_iqns:
                    in_use = True
                    break

            # If no other multipath device attached has the same iqn
            # as the current device
            if not in_use:
                # disconnect if no other multipath devices with same iqn
                self._disconnect_mpath(iscsi_properties, ips_iqns)
                return
            elif multipath_device not in devices:
                # delete the devices associated w/ the unused multipath
                self._delete_mpath(iscsi_properties, multipath_device, ips_iqns)

            # else do not disconnect iscsi portals,
            # as they are used for other luns,
            # just remove multipath mapping device descriptor
            self._remove_multipath_device_descriptor(multipath_device)
            return
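The branch structure of the excerpt can be reduced to a small model that records which cleanup helpers run. This is a simplified sketch, not Nova code: `juno_disconnect` is a hypothetical function, the helper calls are replaced by strings, and `devices`, `current_iqns` and `other_iqns` are passed in directly instead of being computed by the elided code.

```python
def juno_disconnect(multipath_device, devices, current_iqns, other_iqns):
    """Return the ordered cleanup steps the quoted Juno logic would take.

    multipath_device: the map backing the volume being detached.
    devices: multipath devices the elided code found still attached.
    current_iqns: target IQNs of the map being detached.
    other_iqns: target IQNs of the other attached multipath devices.
    """
    actions = []
    # Same test as the explicit loop in the excerpt.
    in_use = any(iqn in other_iqns for iqn in current_iqns)
    if not in_use:
        # Logging out of the portal is the only step on this branch:
        # the sdX paths go away, but nothing removes the dm map.
        actions.append("disconnect_mpath")
        return actions
    elif multipath_device not in devices:
        actions.append("delete_mpath")
    actions.append("remove_multipath_device_descriptor")
    return actions


print(juno_disconnect("/dev/mapper/mpatha", devices=[],
                      current_iqns=["iqn.a"], other_iqns=[]))
# ['disconnect_mpath']
```

Only the `not in_use` branch lacks a step that touches the multipath map itself, which is the case described in the comment above.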


In Liberty, os-brick removes the paths and the multipath device first, and only then decides whether to disconnect from the portal. This behaviour was present in Cinder before the code was forked into os-brick, so I can try to port it across into Nova ahead of Nova's move to os-brick.
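The "remove first, then decide" ordering can be sketched as follows. All names are hypothetical and are not os-brick's API. Flushing the map before deleting its path devices is assumed here as the usual order.

```python
def reordered_disconnect(multipath_device, target_iqns, iqns_still_in_use):
    """Return cleanup steps in the order described for os-brick.

    target_iqns: IQNs of the targets serving this volume.
    iqns_still_in_use: IQNs that other attached volumes still need.
    """
    # Step 1: always remove this volume's multipath map and its paths,
    # regardless of whether the portal is shared with other LUNs.
    actions = [
        ("flush_multipath_map", multipath_device),
        ("remove_path_devices", multipath_device),
    ]
    # Step 2: only now log out of targets that nothing else needs.
    for iqn in target_iqns:
        if iqn not in iqns_still_in_use:
            actions.append(("logout", iqn))
    return actions


print(reordered_disconnect("mpatha", ["iqn.a", "iqn.b"], {"iqn.b"}))
# [('flush_multipath_map', 'mpatha'), ('remove_path_devices', 'mpatha'), ('logout', 'iqn.a')]
```

Unlike the Juno branch, the map is removed even when the portal is about to be disconnected.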
Comment 9 Jay Xu 2016-07-24 23:21:40 EDT
This will make volume attach/detach perform much better, but it cannot resolve the faulty device issue. As you mentioned, os-brick removes the paths and the multipath device first and then disconnects from the portal. A faulty device will still be left behind if another volume attach/detach happens between the deletion of the paths/multipath device and the disconnection from the portal, because a volume attach/detach triggers a rescan, which generates the paths and multipath device again.
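The interleaving described above can be reproduced with a toy host model. This is a hypothetical simulation, not Nova or os-brick code. It assumes the array still exports the LUN to the host while it is being disconnected, and that a rescan rediscovers every exported LUN over a logged-in session, after which multipathd rebuilds its map. It also assumes that logging out deletes only that session's path devices.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    sessions: set = field(default_factory=set)    # logged-in target IQNs
    exported: dict = field(default_factory=dict)  # LUN -> IQN still mapped by the array
    paths: dict = field(default_factory=dict)     # LUN -> IQN its sdX paths came from
    maps: set = field(default_factory=set)        # LUNs that have a dm multipath map

    def rescan(self):
        # Rediscover every exported LUN reachable over a live session;
        # multipathd then (re)creates a map for it.
        for lun, iqn in self.exported.items():
            if iqn in self.sessions:
                self.paths[lun] = iqn
                self.maps.add(lun)

    def remove_lun(self, lun):
        # os-brick-style step 1: delete the map and its paths.
        self.maps.discard(lun)
        self.paths.pop(lun, None)

    def logout(self, iqn):
        # Step 2: logging out removes that session's paths, not the maps.
        self.sessions.discard(iqn)
        self.paths = {lun: i for lun, i in self.paths.items() if i != iqn}

    def faulty_maps(self):
        # A map with no remaining paths is what shows up as faulty.
        return {lun for lun in self.maps if lun not in self.paths}


def detach_lun_a(interleaved_rescan: bool) -> set:
    host = Host(sessions={"iqn.target-a", "iqn.target-b"},
                exported={"lun-a": "iqn.target-a", "lun-b": "iqn.target-b"})
    host.rescan()                # both volumes attached and visible
    host.remove_lun("lun-a")     # step 1 of the detach
    if interleaved_rescan:
        host.rescan()            # another attach/detach rescans in between
    host.logout("iqn.target-a")  # step 2 of the detach
    return host.faulty_maps()


print(detach_lun_a(False))  # set()
print(detach_lun_a(True))   # {'lun-a'}
```

Reordering the cleanup alone leaves a window between steps 1 and 2 in which a concurrent rescan can recreate the map.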
