Bug 2089382 - [RHOSP 16.1] Live migration is failing when [workarounds]/rbd_volume_local_attach=True is configured
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 16.1 (Train)
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: z9
Target Release: 16.1 (Train on RHEL 8.2)
Assignee: Artom Lifshitz
QA Contact: James Parker
URL:
Whiteboard:
Depends On: 2095780 2096418
Blocks:
Reported: 2022-05-23 14:24 UTC by cmayapka
Modified: 2023-01-20 10:50 UTC (History)

Fixed In Version: openstack-nova-20.4.1-1.20220622143400.1ee93b9.el8ost
Doc Type: Bug Fix
Doc Text:
Before this update, block device mapping updates by the libvirt driver on the destination host were not persisted during live migration. With specific storage back ends or configurations, for example, when using the `[workarounds]/rbd_volume_local_attach=True` config option, certain operations on volume attachments, for example detaching, did not work after a live migration. With this update, the Compute service (nova) correctly persists any block device mapping updates done by the libvirt driver on the destination host. Operations on affected volumes, such as detaching, succeed after live migration.
Clone Of:
: 2095780 2096418 (view as bug list)
Environment:
Last Closed: 2022-12-07 20:27:07 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1939545 0 None None None 2022-05-27 20:51:27 UTC
OpenStack gerrit 843680 0 None MERGED compute: Ensure updates to bdms during pre_live_migration are saved 2022-10-18 20:10:30 UTC
Red Hat Issue Tracker OSP-15348 0 None None None 2022-05-23 14:35:37 UTC
Red Hat Product Errata RHBA-2022:8795 0 None None None 2022-12-07 20:27:52 UTC

Description cmayapka 2022-05-23 14:24:26 UTC
Description of problem:


Version-Release number of selected component (if applicable):
 RHOSP 16.1

How reproducible:
 Every time

Steps to Reproduce:
1. Deploy OSP 16.1.x with a Ceph backend
2. Enable barbican with this custom policy: https://access.redhat.com/solutions/6479601
3. Add the workaround below:
 ComputeExtraConfig:
    nova::config::nova_config:
      workarounds/disable_native_luksv1:
        value: true
      workarounds/rbd_volume_local_attach:
        value: true
    nova::compute::keymgr_backend: 'barbican'

4. Create a VM with user-one
5. Create an encrypted volume with user-one
6. Attach the volume to the VM with user-one
7. At this point user-two is able to detach without issue
8. Re-attach with user-one if detached in step 7.
9. Live migrate vm to the other compute node with project-admin
10. Attempt to detach with user-two
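
Before running steps 4 to 10, it can help to confirm that the two workaround options from step 3 actually reached nova.conf on the compute nodes. The following is a minimal Python sketch, not part of the original report. It assumes the hieradata keys end up as options in the [workarounds] section of nova.conf. The helper name `workarounds_enabled` and the sample text are hypothetical.

```python
import configparser

# Option names taken from the hieradata in step 3.
REQUIRED = ("disable_native_luksv1", "rbd_volume_local_attach")


def workarounds_enabled(conf_text):
    """Return {option: bool} for the [workarounds] options from step 3."""
    # strict=False tolerates duplicate options; interpolation=None avoids
    # errors on values that contain '%'.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    return {
        opt: parser.getboolean("workarounds", opt, fallback=False)
        for opt in REQUIRED
    }


# Hypothetical nova.conf excerpt, for illustration only.
SAMPLE = """
[DEFAULT]
debug = false

[workarounds]
disable_native_luksv1 = true
rbd_volume_local_attach = true
"""

if __name__ == "__main__":
    print(workarounds_enabled(SAMPLE))
```

In practice the text would be read from the nova.conf used by the nova-compute container on each compute node. That file's location depends on the deployment and is not given in this report.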

Actual results:

Live migration failed with this error message:
/var/log/containers/nova/nova-compute.log:2022-04-28 13:18:13.360 7 ERROR oslo_messaging.rpc.server [req-c36f7540-31d3-484f-8f76-d99ef9decf31 9173bf85e45f4bfb9281f1a2a8e8f31d dfad97e476f74774af6ebc92bb7730a1 - default default] Exception during message handling: os_brick.exception.VolumeEncryptionNotSupported: Volume encryption is not supported for rbd volume 5371c26c-c3fe-433f-a77e-63a1d3e1f427.

2022-05-02 21:33:10.156 7 ERROR nova.virt.block_device [req-8bc4320d-87a1-488c-81a7-93cccfd4fc5a 9173bf85e45f4bfb9281f1a2a8e8f31d dfad97e476f74774af6ebc92bb7730a1 - default default] [instance: 4f6939c9-7ced-4aea-9db8-5ef9b245e17a] Failed to detach volume 99e7a4ee-3d90-4968-987a-7ac644360e1d from /dev/vdb: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
Command: rbd unmap /dev/rbd1 --id openstack --mon_host 172.17.3.20:6789 --mon_host 172.17.3.26:6789 --mon_host 172.17.3.84:6789
Exit code: 16
Stdout: ''
Stderr: 'rbd: sysfs write failed\nrbd: unmap failed: (16) Device or resource busy\n'
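
The number in `rbd: unmap failed: (16)` matches the exit code and corresponds to the Linux errno EBUSY, which usually indicates that the mapped device is still in use on the host. As a small Python sketch, assuming the stderr format shown above (the helper name `unmap_errno` is hypothetical), the errno can be extracted and named like this:

```python
import errno
import re


def unmap_errno(stderr):
    """Return (number, symbolic name) of the errno in an rbd unmap failure, or None."""
    match = re.search(r"unmap failed: \((\d+)\)", stderr)
    if match is None:
        return None
    number = int(match.group(1))
    # errno.errorcode maps numeric codes to names for the running platform.
    return number, errno.errorcode.get(number, "UNKNOWN")


# Stderr copied from the log excerpt above.
STDERR = "rbd: sysfs write failed\nrbd: unmap failed: (16) Device or resource busy\n"

if __name__ == "__main__":
    print(unmap_errno(STDERR))
```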



Expected results:
Live migration succeeds. 


Additional info:

Comment 3 Artom Lifshitz 2022-05-24 16:54:33 UTC
Reproduced it in upstream CI [1] without Barbican. Figuring out what's going on should be quicker from here, since it's much easier to add logging to a Devstack-based deployment/job than to a multinode containerized env.

[1] https://review.opendev.org/c/openstack/nova/+/843146

Comment 5 Artom Lifshitz 2022-05-27 02:16:46 UTC
I have a PoC of a fix at [1]. There's still stuff left to be figured out, like making sure we're not accidentally breaking other things, so I want to be clear and set expectations correctly that we're still far from an actual releasable fix, but I did want to at least report progress.

[1] https://review.opendev.org/c/openstack/nova/+/843554

Comment 6 Artom Lifshitz 2022-05-27 20:53:41 UTC
After trying to reproduce on master without the workaround using the cryptsetup encryptor and iSCSI volumes that should have triggered this same issue, I tracked down that there is already a fix on master in the form of [1]. I've started the backport to stable/wallaby initially, with the intention of taking it all the way back to stable/train and OSP 16.x eventually.

[1] https://review.opendev.org/c/openstack/nova/+/804230

Comment 10 Eoghan Glynn 2022-07-04 10:48:13 UTC
@alifshit when will this fix be available on the 16.2.z line?

Comment 11 Jesse Pretorius 2022-07-04 13:22:34 UTC
(In reply to Eoghan Glynn from comment #10)
> @alifshit when will this fix be available on the 16.2.z line?

Targeted at 16.2.4: https://bugzilla.redhat.com/show_bug.cgi?id=2096418

Comment 19 Artom Lifshitz 2022-12-06 13:19:11 UTC
Small correction:

"With this update, you can correctly persist any block device mapping updates done by the libvirt driver on the destination host."

should read:

"With this update, any block device mapping updates done by the libvirt driver on the destination host are correctly persisted by Nova."

The user is not involved in the process at all; this is all internal logic.
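
To illustrate the internal logic described in this comment, here is a toy Python model. It is not Nova's actual code: every class, function, and key name is hypothetical, and the device paths are made up. It shows how an update made to a block device mapping on the destination host during pre-live-migration is lost unless it is explicitly saved, which leaves a later detach working from stale data.

```python
import copy


class FakeBDM:
    """Toy stand-in for a block device mapping backed by a database row."""

    def __init__(self, db, volume_id):
        self._db = db
        self.volume_id = volume_id
        # Work on a copy so in-memory changes do not touch the "database".
        self.connection_info = copy.deepcopy(db[volume_id])

    def save(self):
        """Write the in-memory connection_info back to the "database"."""
        self._db[self.volume_id] = copy.deepcopy(self.connection_info)


def pre_live_migration(db, volume_id, new_path, persist):
    """Simulate the destination-side driver updating connection_info."""
    bdm = FakeBDM(db, volume_id)
    bdm.connection_info["device_path"] = new_path
    if persist:
        bdm.save()  # without this call the update exists only in memory


def device_path_seen_by_detach(db, volume_id):
    """A later detach loads the mapping fresh from the "database"."""
    return FakeBDM(db, volume_id).connection_info["device_path"]


if __name__ == "__main__":
    for persist in (False, True):
        db = {"vol-1": {"device_path": "/dev/rbd0"}}
        pre_live_migration(db, "vol-1", "/dev/rbd1", persist)
        print(persist, device_path_seen_by_detach(db, "vol-1"))
```

With `persist=False` the detach path still sees the pre-migration value, and with `persist=True` it sees the value set on the destination.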

Comment 24 errata-xmlrpc 2022-12-07 20:27:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.9 bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:8795

