
Bug 2089382

Summary:	[RHOSP 16.1] Live migration is failing when [workarounds]/rbd_volume_local_attach=True is configured
Product:	Red Hat OpenStack	Reporter:	cmayapka
Component:	openstack-nova	Assignee:	Artom Lifshitz <alifshit>
Status:	CLOSED ERRATA	QA Contact:	James Parker <jparker>
Severity:	urgent	Docs Contact:
Priority:	urgent
Version:	16.1 (Train)	CC:	alifshit, astupnik, dasmith, eglynn, eolivare, fgadkano, gkadam, jelynch, jhakimra, jparker, jpretori, jveiraca, kchamart, ltamagno, pjagtap, sbauza, sgordon, smooney, vromanso
Target Milestone:	z9	Keywords:	Patch, Triaged
Target Release:	16.1 (Train on RHEL 8.2)
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:	openstack-nova-20.4.1-1.20220622143400.1ee93b9.el8ost	Doc Type:	Bug Fix
Doc Text:	Before this update, block device mapping updates by the libvirt driver on the destination host were not persisted during live migration. With specific storage back ends or configurations, for example when the `[workarounds]/rbd_volume_local_attach=True` config option is set, certain operations on volume attachments, such as detaching, did not work after a live migration. With this update, the Compute service (nova) correctly persists any block device mapping updates done by the libvirt driver on the destination host. Operations on affected volumes, such as detaching, succeed after live migration.	Story Points:	---
Clone Of:
Clones:	2095780 2096418		Environment:
Last Closed:	2022-12-07 20:27:07 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	2095780, 2096418
Bug Blocks:

Description cmayapka 2022-05-23 14:24:26 UTC

Description of problem:


Version-Release number of selected component (if applicable):
 RHOSP 16.1

How reproducible:
 Every time

Steps to Reproduce:
1. Deploy OSP 16.1.x with a Ceph backend
2. Enable barbican with this custom policy: https://access.redhat.com/solutions/6479601
3. Add the workaround below:
 ComputeExtraConfig:
    nova::config::nova_config:
      workarounds/disable_native_luksv1:
        value: true
      workarounds/rbd_volume_local_attach:
        value: true
    nova::compute::keymgr_backend: 'barbican'

4. Create a VM with user-one
5. Create an encrypted volume with user-one
6. Attach the volume to the VM with user-one
7. At this point user-two is able to detach without issue
8. Re-attach with user-one if detached in step 7
9. Live migrate the VM to the other compute node with project-admin
10. Attempt to detach with user-two
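
Before running the remaining steps it can help to confirm that the workaround from step 3 actually reached the compute node. The following is a minimal sketch, assuming the hieradata keys `workarounds/disable_native_luksv1` and `workarounds/rbd_volume_local_attach` are rendered as options in the `[workarounds]` section of nova.conf. The sample fragment and the helper name `enabled_workarounds` are hypothetical and only serve the illustration.

```python
import configparser

# Hypothetical nova.conf fragment. The option names come from step 3;
# the rendered layout is an assumption.
SAMPLE_NOVA_CONF = """
[workarounds]
disable_native_luksv1 = true
rbd_volume_local_attach = true
"""


def enabled_workarounds(conf_text):
    """Return the sorted names of [workarounds] options set to a true value."""
    # interpolation=None avoids choking on '%' in unrelated options,
    # strict=False tolerates duplicated sections or keys.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    if not parser.has_section("workarounds"):
        return []
    return sorted(
        name
        for name in parser.options("workarounds")
        if parser.getboolean("workarounds", name)
    )


print(enabled_workarounds(SAMPLE_NOVA_CONF))
# ['disable_native_luksv1', 'rbd_volume_local_attach']
```

On a real node the text would be read from the nova.conf used by the nova_compute container; that path is deployment-specific and is not given in this report.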

Actual results:

Live migration failed with this error message:
/var/log/containers/nova/nova-compute.log:2022-04-28 13:18:13.360 7 ERROR oslo_messaging.rpc.server [req-c36f7540-31d3-484f-8f76-d99ef9decf31 9173bf85e45f4bfb9281f1a2a8e8f31d dfad97e476f74774af6ebc92bb7730a1 - default default] Exception during message handling: os_brick.exception.VolumeEncryptionNotSupported: Volume encryption is not supported for rbd volume 5371c26c-c3fe-433f-a77e-63a1d3e1f427.

2022-05-02 21:33:10.156 7 ERROR nova.virt.block_device [req-8bc4320d-87a1-488c-81a7-93cccfd4fc5a 9173bf85e45f4bfb9281f1a2a8e8f31d dfad97e476f74774af6ebc92bb7730a1 - default default] [instance: 4f6939c9-7ced-4aea-9db8-5ef9b245e17a] Failed to detach volume 99e7a4ee-3d90-4968-987a-7ac644360e1d from /dev/vdb: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
Command: rbd unmap /dev/rbd1 --id openstack --mon_host 172.17.3.20:6789 --mon_host 172.17.3.26:6789 --mon_host 172.17.3.84:6789
Exit code: 16
Stdout: ''
Stderr: 'rbd: sysfs write failed\nrbd: unmap failed: (16) Device or resource busy\n'
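
For triage, the number in the `rbd unmap` failure can be decoded with the standard library. This is only a sketch for reading the log excerpt above: the helper name `unmap_errno` is hypothetical, and the report does not state what was still holding the mapped device.

```python
import errno
import os
import re

# Stderr copied from the ProcessExecutionError above.
STDERR = "rbd: sysfs write failed\nrbd: unmap failed: (16) Device or resource busy\n"


def unmap_errno(stderr):
    """Return the number from an 'unmap failed: (N) ...' line, or None."""
    match = re.search(r"unmap failed: \((\d+)\)", stderr)
    return int(match.group(1)) if match else None


code = unmap_errno(STDERR)
# Map the number to its symbolic errno name and the platform's message.
print(code, errno.errorcode.get(code), os.strerror(code))
```

On Linux, 16 is `EBUSY`, which matches the "Device or resource busy" text in the log.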



Expected results:
Live migration succeeds. 


Additional info:

Comment 3 Artom Lifshitz 2022-05-24 16:54:33 UTC

Reproduced it in upstream CI [1] without Barbican. Figuring out what's going on should be quicker moving forward: it's much easier to add logging to a Devstack-based deployment/job than to a multinode containerized env.

[1] https://review.opendev.org/c/openstack/nova/+/843146

Comment 5 Artom Lifshitz 2022-05-27 02:16:46 UTC

I have a PoC of a fix at [1]. There's still stuff left to be figured out, like making sure we're not accidentally breaking other things, so I want to be clear and set expectations correctly that we're still far from an actual releasable fix, but I did want to at least report progress.

[1] https://review.opendev.org/c/openstack/nova/+/843554

Comment 6 Artom Lifshitz 2022-05-27 20:53:41 UTC

After trying to reproduce on master without the workaround using the cryptsetup encryptor and iSCSI volumes that should have triggered this same issue, I tracked down that there is already a fix on master in the form of [1]. I've started the backport to stable/wallaby initially, with the intention of taking it all the way back to stable/train and OSP 16.x eventually.

[1] https://review.opendev.org/c/openstack/nova/+/804230

Comment 10 Eoghan Glynn 2022-07-04 10:48:13 UTC

@alifshit when will this fix be available on the 16.2.z line?

Comment 11 Jesse Pretorius 2022-07-04 13:22:34 UTC

(In reply to Eoghan Glynn from comment #10)
> @alifshit when will this fix be available on the 16.2.z line?

Targeted at 16.2.4: https://bugzilla.redhat.com/show_bug.cgi?id=2096418

Comment 19 Artom Lifshitz 2022-12-06 13:19:11 UTC

Small correction:

"With this update, you can correctly persist any block device mapping updates done by the libvirt driver on the destination host."

should read:

"With this update, any block device mapping updates done by the libvirt driver on the destination host are correctly persisted by Nova."

The user is not involved in the process at all; this is all internal logic.
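
As a toy illustration of the behaviour described in the Doc Text (this is not Nova's code; `FakeBDMStore`, `connect_on_destination`, and the device path are all hypothetical), the difference between updating a block device mapping in memory and persisting it can be sketched as:

```python
import copy


class FakeBDMStore:
    """Stand-in for the database: hands out copies, keeps only what is saved."""

    def __init__(self, connection_info):
        self._row = copy.deepcopy(connection_info)

    def load(self):
        return copy.deepcopy(self._row)

    def save(self, connection_info):
        self._row = copy.deepcopy(connection_info)


def connect_on_destination(store, persist):
    """Model the destination-side driver updating the mapping."""
    info = store.load()
    # The driver records host-specific details on its in-memory copy
    # (a made-up device path here).
    info["device_path"] = "/dev/rbd0"
    if persist:
        # The step the fix is described as adding: write the update back.
        store.save(info)
    return info


before_fix = FakeBDMStore({"driver_volume_type": "rbd"})
connect_on_destination(before_fix, persist=False)
print("device_path" in before_fix.load())  # False: the update was lost

after_fix = FakeBDMStore({"driver_volume_type": "rbd"})
connect_on_destination(after_fix, persist=True)
print(after_fix.load()["device_path"])  # /dev/rbd0
```

In the first case a later operation such as detach would load a mapping that lacks the destination-side details, which is the kind of failure the Doc Text describes.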

Comment 24 errata-xmlrpc 2022-12-07 20:27:07 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.9 bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:8795