Description of problem:
During a volume detach operation, Nova compute first attempts to remove the volume from the instance's libvirt definition and then removes the storage LUN from the underlying compute host. If Nova finds that the volume is not in the instance's libvirt definition, it ignores that error and returns after logging the warning "Ignoring DiskNotFound exception while detaching". However, in some failure scenarios the libvirt definition for the volume has already been removed while the associated storage LUN on the compute host has not yet been fully cleaned up. The current logic can therefore leave Cinder believing that the volume has been completely removed from the compute host when only its libvirt definition was removed. Cinder may then perform subsequent operations, such as unpresenting the LUN from the storage array to the compute host, while the compute host still has active paths to the device.

The check for whether a volume has been detached from an instance should validate that the underlying storage device on the host has been cleaned up, in addition to checking the libvirt definition.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Attach a volume that has LVM headers on it, presented from a storage array via iSCSI multipath, to an instance on a compute host.
2. A previous bug, in which LVM on the compute host scans and imports volumes targeted to instances, should be encountered (assuming the LVM filter has not already been updated), so LVM on the compute host also opens the device. Perform a volume disconnect operation for the Cinder volume. It should fail with errors during multipath flushing in os-brick, because LVM on the compute host still holds the device open. However, the volume will already have been removed from the instance's libvirt definition.
3. Perform a second volume disconnect. The "Ignoring DiskNotFound" warning should appear in the Nova compute logs. Cinder then unpresents the volume from the compute host, leading to multipath path errors.

Actual results:
The volume is detached, and the paths for the associated LUN go into an error state because the LUN has been unpresented from the compute host.

Expected results:
Nova compute should detect that the first volume detach failed and return an error on the second attempt. The LUN should not be unpresented from the compute host.

Additional info:
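The failure sequence in steps 2 and 3 can be modelled with a small simulation of the detach logic as described in the report: the guest detach runs first, and on DiskNotFound the code logs a warning and returns without touching the host. This is a sketch under stated assumptions. `FakeGuest`, `FakeHost`, `DeviceBusy` and `detach_volume_current` are hypothetical stand-ins for the libvirt guest, the os-brick host-side disconnect and Nova's detach path. They are not real Nova or os-brick code.

```python
import logging

logging.basicConfig(level=logging.WARNING)
LOG = logging.getLogger("detach-sim")


class DiskNotFound(Exception):
    """The disk is no longer in the guest's libvirt definition."""


class DeviceBusy(Exception):
    """Hypothetical: the host-side multipath flush failed (e.g. LVM holds the device)."""


class FakeGuest:
    """Hypothetical stand-in for the instance's libvirt definition."""

    def __init__(self, disks):
        self.disks = set(disks)

    def detach(self, disk):
        if disk not in self.disks:
            raise DiskNotFound(disk)
        self.disks.remove(disk)


class FakeHost:
    """Hypothetical stand-in for the compute host's block devices."""

    def __init__(self, devices, held_open=()):
        self.devices = set(devices)
        self.held_open = set(held_open)  # devices another process (LVM) keeps open

    def disconnect(self, disk):
        if disk in self.held_open:
            raise DeviceBusy(disk)  # flush fails; the device stays on the host
        self.devices.discard(disk)


def detach_volume_current(guest, host, disk):
    """Detach as described in the report: DiskNotFound skips host cleanup."""
    try:
        guest.detach(disk)
    except DiskNotFound:
        LOG.warning("Ignoring DiskNotFound exception while detaching")
        return  # caller sees success; the host device is never checked
    host.disconnect(disk)


if __name__ == "__main__":
    guest = FakeGuest({"vdb"})
    host = FakeHost({"vdb"}, held_open={"vdb"})
    try:
        detach_volume_current(guest, host, "vdb")  # step 2: fails on the host...
    except DeviceBusy:
        pass  # ...but vdb is already gone from the guest definition
    detach_volume_current(guest, host, "vdb")  # step 3: reported as success
    print("vdb" in host.devices)  # prints True: the host still has the device
```

The second call returns normally even though the host device is still present. A caller such as Cinder would take that as permission to unpresent the LUN.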
The issue seems to be valid. According to the code, if libvirt reports during a detach operation that the device does not exist for the guest, we make no attempt to disconnect the logical volume from the host [0].

[0] http://git.openstack.org/cgit/openstack/nova/tree/nova/virt/libvirt/driver.py?h=14.0.8#n1287
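One way to address this comment is to treat DiskNotFound as "already removed from the guest" while still running the host-side disconnect. Any disconnect failure then reaches the caller, and the volume is reported as detached only after the host device is confirmed gone. The following is a minimal sketch of that idea, not the patch shipped in the advisory. `FakeGuest`, `FakeHost`, `DeviceBusy`, `VolumeStillPresent` and `detach_volume_fixed` are hypothetical stand-ins, not Nova or os-brick APIs.

```python
import logging

LOG = logging.getLogger("detach-fixed")


class DiskNotFound(Exception):
    """The disk is no longer in the guest's libvirt definition."""


class DeviceBusy(Exception):
    """Hypothetical: the host-side multipath flush failed."""


class VolumeStillPresent(Exception):
    """Hypothetical: the host device survived an apparently successful disconnect."""


class FakeGuest:
    """Hypothetical stand-in for the instance's libvirt definition."""

    def __init__(self, disks):
        self.disks = set(disks)

    def detach(self, disk):
        if disk not in self.disks:
            raise DiskNotFound(disk)
        self.disks.remove(disk)


class FakeHost:
    """Hypothetical stand-in for the compute host's block devices."""

    def __init__(self, devices, held_open=()):
        self.devices = set(devices)
        self.held_open = set(held_open)  # devices another process (LVM) keeps open

    def disconnect(self, disk):
        if disk in self.held_open:
            raise DeviceBusy(disk)
        self.devices.discard(disk)


def detach_volume_fixed(guest, host, disk):
    """Detach from the guest if present, then always clean up the host side."""
    try:
        guest.detach(disk)
    except DiskNotFound:
        # A previous attempt may have removed the disk from libvirt and then
        # failed on the host, so continue with the host-side disconnect.
        LOG.info("Disk %s not found in guest; disconnecting from host anyway", disk)
    host.disconnect(disk)  # DeviceBusy propagates, so the caller sees the failure
    # Report success only once the host no longer has the device.
    if disk in host.devices:
        raise VolumeStillPresent(disk)
```

With this ordering, the second detach in the reproduction fails with `DeviceBusy` instead of returning success. That matches the expected result of not letting Cinder unpresent a LUN that still has active paths.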
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:1595