Bug 1942974 - iSCSI: Not flushing and leaving leftover devices when multipath configuration has changed
Summary: iSCSI: Not flushing and leaving leftover devices when multipath configuration...
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-os-brick
Version: 16.2 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Gorka Eguileor
QA Contact: Tzach Shefi
URL:
Whiteboard:
Depends On:
Blocks: 1943146 1943181
 
Reported: 2021-03-25 12:30 UTC by Gorka Eguileor
Modified: 2022-09-05 13:19 UTC (History)
3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1943146 (view as bug list)
Environment:
Last Closed: 2021-04-15 11:17:01 UTC
Target Upstream Version:
Embargoed:

Links
System ID Private Priority Status Summary Last Updated
Launchpad 1921381 0 None None None 2021-03-25 12:30:52 UTC
OpenStack gerrit 782992 0 None MERGED iSCSI: Fix flushing after multipath cfg change 2021-03-30 09:49:32 UTC
Red Hat Issue Tracker OSP-770 0 None None None 2022-09-05 13:19:39 UTC

Description Gorka Eguileor 2021-03-25 12:30:52 UTC
Description of problem:

OS-Brick's disconnect_volume code assumes that the use_multipath parameter used to instantiate the connector has the same value as the one used for the connector in the original connect_volume call.

Unfortunately, this is not necessarily true: Nova can attach a volume, its multipath configuration can then be enabled or disabled, and a detach can be issued afterwards.

This leads to serious issues such as:

- Not flushing the single path on disconnect_volume (possible data loss) and leaving it as a leftover device on the host when Nova calls terminate-connection on Cinder.

- Not flushing the multipath device (possible data loss) and leaving it as a leftover device, similar to the previous case.
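The mismatch can be illustrated with a minimal toy model (this is not actual os-brick code; class and method names are hypothetical, chosen only to mirror the shape of the bug):

```python
class ToyConnector:
    """Hypothetical simplified connector illustrating the bug."""

    def __init__(self, use_multipath):
        # The flag is baked in at instantiation time...
        self.use_multipath = use_multipath

    def disconnect_volume(self):
        # ...and the buggy behavior is to trust that flag on disconnect,
        # ignoring how the volume was actually attached.
        if self.use_multipath:
            return "flush multipath device"
        return "flush single path"


# The volume was attached while multipath was enabled, but Nova was
# reconfigured (volume_use_multipath = False) before the detach, so the
# detach-time connector is built with the wrong flag:
connector = ToyConnector(use_multipath=False)
print(connector.disconnect_volume())  # flush single path
```

The single-path flush runs, so the multipath device is never flushed and remains on the host, which is the data-loss scenario described above.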


How reproducible:

There are many combinations, depending on whether we are enabling or disabling multipathing and on what kind of information the Cinder driver returns for single-pathed connections. In general, it is easy to reproduce on systems that don't return target_portals and target_iqns on single-pathed connections.
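As a hedged sketch of why the plural keys matter, the dict below shows connection properties a Cinder driver might return for a single-pathed attachment (the key names follow common os-brick conventions, but the values and the decision logic are illustrative, not taken from a real driver):

```python
# Illustrative single-path connection properties; values are made up.
single_path_props = {
    "target_portal": "192.0.2.10:3260",
    "target_iqn": "iqn.2010-10.org.openstack:volume-1234",
    "target_lun": 0,
    # No "target_portals" / "target_iqns" keys: there is no extra path
    # information to enumerate when multipath has just been enabled.
}

# A naive multipath decision based only on the plural keys has nothing
# to work with for this kind of connection.
has_multipath_info = ("target_portals" in single_path_props
                      and "target_iqns" in single_path_props)
print(has_multipath_info)  # False
```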

Steps to Reproduce:

- Enabling issue:

1. Configure Nova not to use multipathing (volume_use_multipath = False under [libvirt])
2. Attach a volume to an instance
3. Enable multipathing in Nova (volume_use_multipath = True under [libvirt]), set DEBUG log levels, and restart the service
4. Write a lot of data to the volume inside the instance
5. Detach the volume from the instance

We'll see in the logs that there is no flushing or removal of the device, the device will still be present on the system (even though Cinder has already unmapped it), and if we check the volume contents it is very likely that not all data has been saved.
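To check for the leftover device after step 5, something like the following can be used (a hedged helper assuming a Linux host with lsblk installed; parse_lsblk is a hypothetical name, not part of os-brick):

```python
import subprocess


def parse_lsblk(output):
    """Turn `lsblk -nro NAME,TYPE` output into (name, type) tuples."""
    return [tuple(line.split(None, 1))
            for line in output.splitlines() if line.strip()]


if __name__ == "__main__":
    # -n: no header, -r: raw format, -o: only the NAME and TYPE columns.
    out = subprocess.run(["lsblk", "-nro", "NAME,TYPE"],
                         capture_output=True, text=True, check=True)
    for name, devtype in parse_lsblk(out.stdout):
        print(name, devtype)
```

Running this before the attach and again after the detach makes the leftover sd* (or mpath) device stand out by comparing the two listings.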

- Disabling issue:

1. Configure Nova to use multipathing (volume_use_multipath = True under [libvirt])
2. Attach a volume to an instance
3. Disable multipathing in Nova (volume_use_multipath = False under [libvirt]), set DEBUG log levels, and restart the service
4. Write a lot of data to the volume inside the instance
5. Detach the volume from the instance

We'll see in the logs that there is no flushing or removal of the multipathed device, while all the individual path devices are removed. Depending on the speed of the system and storage, the multipath device may disappear from the system (if all the data had already been written), but most likely it will remain without any devices under it, and data on the volume will have been lost.

Comment 1 Gorka Eguileor 2021-04-15 11:17:01 UTC
Fix will be picked up in the next code import.

