Bug 1505595 - Nova assumes that a volume is fully detached from the compute if the volume is not defined in the instance's libvirt definition
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z8
Target Release: 10.0 (Newton)
Assignee: Sahid Ferdjaoui
QA Contact: Joe H. Rahme
URL:
Whiteboard:
Depends On:
Blocks: 1547576 1547578 1547580
 
Reported: 2017-10-24 00:22 UTC by Mark Jones
Modified: 2022-08-16 11:48 UTC (History)

Fixed In Version: openstack-nova-14.1.0-11.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1547576 (view as bug list)
Environment:
Last Closed: 2018-05-17 15:33:13 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
OpenStack gerrit | 515008 | 0 | None | MERGED | libvirt: disconnect volume from host during detach | 2020-11-20 07:07:01 UTC
Red Hat Issue Tracker | OSP-4734 | 0 | None | None | None | 2022-08-16 11:48:34 UTC
Red Hat Knowledge Base (Solution) | 3213311 | 0 | None | None | None | 2019-01-01 08:59:51 UTC
Red Hat Product Errata | RHBA-2018:1595 | 0 | None | None | None | 2018-05-17 15:34:56 UTC

Description Mark Jones 2017-10-24 00:22:45 UTC
Description of problem:

During a volume detach operation, Nova compute removes the volume from the instance's libvirt definition before removing the storage LUN from the underlying compute host. If Nova finds that the volume is not present in the instance's libvirt definition, it ignores that error condition and returns, after logging the warning "Ignoring DiskNotFound exception while detaching".

However, under certain failure scenarios the volume's libvirt definition may already have been removed from the instance while the associated storage LUN on the compute host has not yet been fully cleaned up.

As it stands, this logic can leave Cinder with the impression that a volume has been completely removed from the compute host when only the volume's libvirt definition was removed.

This can lead Cinder to perform subsequent operations (such as unpresenting the LUN from the compute host on the storage array) while the compute host still has active paths to the device.

The logic for detecting whether a volume has been detached from an instance should validate that the underlying storage device has been cleaned up, in addition to checking libvirt.
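The combined check proposed above could look roughly like the following. This is only a minimal sketch, not Nova's code: the function name `volume_fully_detached`, its parameters, and the idea of passing in a list of host device paths are all assumptions made for illustration.

```python
import os
from typing import Callable, Iterable


def volume_fully_detached(
    guest_disk_targets: Iterable[str],
    target: str,
    host_device_paths: Iterable[str],
    path_exists: Callable[[str], bool] = os.path.exists,
) -> bool:
    """Return True only if neither the guest nor the host still holds the volume.

    All names here are hypothetical; `path_exists` is injectable so the
    check can be exercised without real block devices.
    """
    # Libvirt-side check: is the disk still in the guest definition?
    in_guest = target in set(guest_disk_targets)
    # Host-side check: does any device path for the volume still exist?
    on_host = any(path_exists(path) for path in host_device_paths)
    return not in_guest and not on_host
```

With this shape, a volume that is gone from the guest definition but still has a device path on the host is reported as not detached, which is the case the current logic misses.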

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Attach a volume that has LVM headers on it from a storage array via iSCSI multipath to an instance on a compute host.

2. Assuming the LVM filter on the compute host has not already been updated, this triggers a previous bug in which LVM on the compute host scans and imports volumes intended for instances, so LVM on the compute host also opens the device. Perform a volume disconnect operation for the Cinder volume. It should fail with errors during multipath flushing in os-brick, because the device is still held open by LVM on the compute host. However, the volume will have been removed from the instance's libvirt definition.

3. Perform a second volume disconnect. The "Ignoring DiskNotFound" warning should appear in the Nova compute logs. Cinder then unpresents the volume from the compute host, leading to multipath path errors.

Actual results:

The volume is detached and the associated LUN's paths go into an error state because the LUN has been unpresented from the compute host.

Expected results:

Nova compute should detect that the first volume detach failed and error out on the second attempt. The LUN should not be unpresented from the compute host.
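One way to picture the expected behaviour is a detach flow that tolerates a disk already missing from the guest but still attempts the host-side disconnect every time, so that a failed cleanup keeps surfacing as an error. The sketch below is a toy model under that assumption; `FakeGuest`, `FakeHost`, `HostDisconnectError` and `detach_volume` are hypothetical stand-ins, not Nova or os-brick APIs.

```python
class DiskNotFound(Exception):
    """Raised when the disk is not in the (fake) guest definition."""


class HostDisconnectError(Exception):
    """Raised when the (fake) host cannot release the device."""


class FakeGuest:
    def __init__(self, disks):
        self.disks = set(disks)

    def detach_disk(self, target):
        if target not in self.disks:
            raise DiskNotFound(target)
        self.disks.remove(target)


class FakeHost:
    def __init__(self, connected, busy=()):
        self.connected = set(connected)
        self.busy = set(busy)  # volumes still held open, e.g. by LVM

    def disconnect_volume(self, volume_id):
        if volume_id in self.busy:
            raise HostDisconnectError(f"{volume_id} is still in use on the host")
        self.connected.discard(volume_id)


def detach_volume(guest, host, volume_id, target):
    try:
        guest.detach_disk(target)
    except DiskNotFound:
        # Already gone from the guest; the host may still hold the device.
        pass
    # Always attempt host cleanup; a failure propagates to the caller.
    host.disconnect_volume(volume_id)
```

In this model, both the first and the second detach attempt raise `HostDisconnectError` while the device is busy, so the caller never sees a false success.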

Additional info:

Comment 1 Sahid Ferdjaoui 2017-10-24 06:27:10 UTC
The issue seems to be valid. According to the code, if libvirt reports during a detach operation that the device does not exist for the guest, we make no attempt to disconnect the logical volume from the host [0].
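The control flow described in this comment can be modelled in a few lines. This is a simplified illustration of the early return, not the code at [0]: the function name, the use of plain sets for guest and host state, and the `log` list are assumptions made so the example is self-contained.

```python
class DiskNotFound(Exception):
    pass


def detach_volume_early_return(guest_disks, host_volumes, target, volume_id, log):
    """Toy model: skip host cleanup when the guest disk is already gone."""
    try:
        if target not in guest_disks:
            raise DiskNotFound(target)
        guest_disks.remove(target)
    except DiskNotFound:
        log.append("Ignoring DiskNotFound exception while detaching")
        return  # host-side disconnect is never attempted on this path
    host_volumes.discard(volume_id)


# Second detach attempt: the disk is already gone from the guest,
# but the host still has the volume connected.
guest_disks, host_volumes, log = set(), {"vol-1"}, []
detach_volume_early_return(guest_disks, host_volumes, "vdb", "vol-1", log)
```

After the call, `host_volumes` still contains `vol-1` even though the function returned without raising, which is the state that lets Cinder go on to unpresent the LUN.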

[0] http://git.openstack.org/cgit/openstack/nova/tree/nova/virt/libvirt/driver.py?h=14.0.8#n1287

Comment 22 errata-xmlrpc 2018-05-17 15:33:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1595

