Description of problem:

Nova now supports locally attaching ceph volumes through os-brick by combining two config options: '[workarounds] disable_native_luksv1' (which disables native attachment using QEMU so that os-brick is used instead) and '[workarounds] rbd_volume_local_attach' (which enables local attachment). This is broken on an OSP 16.1 deployment. The apparent cause is that the symlink os-brick expects at '/dev/rbd/{pool}/{device}' (which points to '/dev/rbdN') isn't being created. This symlink should be created by udev rules that ceph provides [1]. Since udev isn't run within the 'nova_compute' container, these rules must be present on the host in order to function. In OSP 13 this was the case; in OSP 16.1 it is not.

On an OSP 13 node:

  [heat-admin@compute-0 ~]$ ls /usr/lib/udev/rules.d/50-rbd.rules
  /usr/lib/udev/rules.d/50-rbd.rules
  $ rpm -qf /usr/lib/udev/rules.d/50-rbd.rules
  ceph-common-12.2.12-115.el7cp.x86_64
  [heat-admin@compute-0 ~]$ sudo yum list 'ceph*' -q
  Installed Packages
  ceph-common.x86_64    2:12.2.12-115.el7cp    @rhelosp-ceph-3-mon
  ...

On an OSP 16.1 (beta) node:

  [heat-admin@compute-0 ~]$ ls /usr/lib/udev/rules.d/50-rbd.rules
  ls: cannot access '/usr/lib/udev/rules.d/50-rbd.rules': No such file or directory
  [stack@undercloud-0 ~]$ sudo dnf list 'ceph*' --installed
  Installed Packages
  ceph-ansible.noarch

The absence of this file means the symlink is not created, and nova/os-brick raises an exception when trying to decrypt the non-existent path.

  2020-06-18 15:38:19.175 8 DEBUG os_brick.encryptors.luks [req-foo bar baz - default default] opening encrypted volume /dev/rbd/volumes/volume-38e1e0ee-ac91-4d1d-be84-6dc8fb29292d _open_volume /usr/lib/python3.6/site-packages/os_brick/encryptors/luks.py:109
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [req-foo bar baz - default default] [instance: foo] Failure attaching encryptor; rolling back volume connection: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
  Command: cryptsetup luksOpen --key-file=- /dev/rbd/volumes/volume-38e1e0ee-ac91-4d1d-be84-6dc8fb29292d crypt-volume-38e1e0ee-ac91-4d1d-be84-6dc8fb29292d
  Exit code: 4
  Stdout: ''
  Stderr: "Device /dev/rbd/volumes/volume-38e1e0ee-ac91-4d1d-be84-6dc8fb29292d doesn't exist or access denied.\n"
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo] Traceback (most recent call last):
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo]   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 1588, in _connect_volume
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo]     self._attach_encryptor(context, connection_info, encryption)
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo]   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 1733, in _attach_encryptor
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo]     encryptor.attach_volume(context, **encryption)
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo]   File "/usr/lib/python3.6/site-packages/os_brick/encryptors/luks.py", line 167, in attach_volume
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo]     self._open_volume(passphrase, **kwargs)
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo]   File "/usr/lib/python3.6/site-packages/os_brick/encryptors/luks.py", line 113, in _open_volume
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo]     root_helper=self._root_helper)
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo]   File "/usr/lib/python3.6/site-packages/os_brick/executor.py", line 52, in _execute
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo]     result = self.__execute(*args, **kwargs)
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo]   File "/usr/lib/python3.6/site-packages/os_brick/privileged/rootwrap.py", line 169, in execute
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo]     return execute_root(*cmd, **kwargs)
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo]   File "/usr/lib/python3.6/site-packages/oslo_privsep/priv_context.py", line 245, in _wrap
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo]     return self.channel.remote_call(name, args, kwargs)
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo]   File "/usr/lib/python3.6/site-packages/oslo_privsep/daemon.py", line 204, in remote_call
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo]     raise exc_type(*result[2])
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo] oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo] Command: cryptsetup luksOpen --key-file=- /dev/rbd/volumes/volume-38e1e0ee-ac91-4d1d-be84-6dc8fb29292d crypt-volume-38e1e0ee-ac91-4d1d-be84-6dc8fb29292d
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo] Exit code: 4
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo] Stdout: ''
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo] Stderr: "Device /dev/rbd/volumes/volume-38e1e0ee-ac91-4d1d-be84-6dc8fb29292d doesn't exist or access denied.\n"
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo]

I see three possible solutions at the moment:

1. These udev rules should be present on the host. This could be as simple as installing the 'ceph-common' package, though that's pretty leaky.
2. The udev daemon should be present in the 'nova_compute' container.
3. os-brick should be enhanced to not require these symlinks.

Version-Release number of selected component (if applicable):

OSP 16.1 beta

How reproducible:

Always.

Steps to Reproduce:

1. Deploy OSP 16 with a ceph backend.
2. Attempt to map a device locally. You can either follow the steps described at [2] to have nova do this by calling os-brick, or use the 'rbd map' command like so:

     rbd device map $DEVICE --pool volumes --id openstack --mon_host $CEPH_HOST:6789

3. Check to see if '/dev/rbd/volumes/$DEVICE' exists.

Actual results:

The '/dev/rbd/volumes/$DEVICE' symlink does not exist.

Expected results:

The '/dev/rbd/volumes/$DEVICE' symlink should exist.

Additional info:

[1] https://github.com/ceph/ceph/blob/v14.0.0/udev/50-rbd.rules
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1824120#c2
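The check in step 3 can also be scripted. The following is a minimal sketch, not part of nova or os-brick: the function name 'resolve_rbd_symlink' and the 'dev_root' parameter are hypothetical and exist only so the check can be pointed at a test directory instead of the real '/dev'.

```python
import os
from typing import Optional


def resolve_rbd_symlink(pool: str, image: str,
                        dev_root: str = "/dev") -> Optional[str]:
    """Return the device that <dev_root>/rbd/<pool>/<image> points to.

    Returns None when the symlink is missing, which is the failure
    described in this report. 'dev_root' is a hypothetical parameter
    used only to make the function testable.
    """
    link = os.path.join(dev_root, "rbd", pool, image)
    if not os.path.islink(link):
        return None
    # Follow the link to the underlying block device path.
    return os.path.realpath(link)


if __name__ == "__main__":
    # Example image name taken from the log above.
    target = resolve_rbd_symlink(
        "volumes", "volume-38e1e0ee-ac91-4d1d-be84-6dc8fb29292d")
    print(target if target else "symlink missing")
```

On a host where the udev rules from [1] are installed and the image is mapped, this would be expected to print a '/dev/rbdN' path; on the affected OSP 16.1 node it would print "symlink missing".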
I have proposed a potential fix upstream, based on option 3 above: https://review.opendev.org/#/c/736758/2
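To illustrate what option 3 could look like in principle, here is a minimal sketch that finds the mapped device without relying on the udev-created symlink. It is not taken from the review linked above and may differ from it entirely. It assumes the kernel rbd driver exposes each mapped image under '/sys/bus/rbd/devices/<id>/' with 'pool' and 'name' attributes, and that sysfs is visible from wherever the code runs. The function name 'find_rbd_device' and the 'sysfs_root' parameter are hypothetical, and snapshots are ignored.

```python
import os
from typing import Optional


def find_rbd_device(pool: str, image: str,
                    sysfs_root: str = "/sys/bus/rbd/devices") -> Optional[str]:
    """Return '/dev/rbd<id>' for a mapped image, or None if not mapped.

    Assumption: each mapped image has a directory <sysfs_root>/<id>
    containing 'pool' and 'name' files. 'sysfs_root' is a hypothetical
    parameter that allows testing against a fake tree.
    """
    try:
        entries = os.listdir(sysfs_root)
    except FileNotFoundError:
        # rbd kernel module not loaded, or sysfs not visible here.
        return None

    for dev_id in sorted(entries):
        dev_dir = os.path.join(sysfs_root, dev_id)
        try:
            with open(os.path.join(dev_dir, "pool")) as f:
                dev_pool = f.read().strip()
            with open(os.path.join(dev_dir, "name")) as f:
                dev_name = f.read().strip()
        except OSError:
            # Entry vanished or is not a device directory; skip it.
            continue
        if dev_pool == pool and dev_name == image:
            return "/dev/rbd" + dev_id
    return None
```

A caller could try the '/dev/rbd/{pool}/{device}' symlink first and fall back to a lookup like this only when the symlink is absent, which keeps behaviour unchanged on hosts where the udev rules are installed.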
Re-assigned to the os-brick component while we determine whether this os-brick-based solution is viable.
This issue has conditional approval for the 16.1 Z1 release. It must be in the first compose and tested before the release of 16.1.1. If not, we will move to TM=Z2.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 16.1 director bug fix advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:3542