
Bug 1848610

Summary: Removal of ceph-common from undercloud compute breaks local attach
Product: Red Hat OpenStack
Reporter: Stephen Finucane <stephenfin>
Component: python-os-brick
Assignee: Stephen Finucane <stephenfin>
Status: CLOSED ERRATA
QA Contact: Tzach Shefi <tshefi>
Severity: medium
Priority: medium
Docs Contact:
Version: 16.1 (Train)
CC: apevec, aschultz, jschluet, lhh, mburns, pbabbar, spower
Target Milestone: z1
Keywords: Triaged
Target Release: 16.1 (Train on RHEL 8.2)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: python-os-brick-2.10.3-0.20200605063443.55fc998.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-08-27 15:19:10 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1824115

Description Stephen Finucane 2020-06-18 15:40:43 UTC
Description of problem:

Nova now supports locally attaching ceph volumes via os-brick, using a combination of the '[workarounds] disable_native_luksv1' config option (to disable native attachment via QEMU, so that os-brick is used instead) and the '[workarounds] rbd_volume_local_attach' config option (to enable local attachment). This is broken on an OSP 16.1 deployment. The cause appears to be that the symlink os-brick expects at '/dev/rbd/{pool}/{device}' (which points to '/dev/rbdN') is not being created. This symlink should be created by udev rules that ceph provides [1]. Since udev is not run within the 'nova_compute' container, these rules must be present on the host in order to function. In OSP 13 this was the case; in OSP 16.1 it is not.
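
For reference, the two config options above are set in nova.conf; the values shown here follow from the option descriptions (both must be enabled for the local-attach path to be taken):

  [workarounds]
  disable_native_luksv1 = True
  rbd_volume_local_attach = True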

On an OSP 13 node:

  [heat-admin@compute-0 ~]$ ls /usr/lib/udev/rules.d/50-rbd.rules
  /usr/lib/udev/rules.d/50-rbd.rules
  [heat-admin@compute-0 ~]$ rpm -qf /usr/lib/udev/rules.d/50-rbd.rules
  ceph-common-12.2.12-115.el7cp.x86_64
  [heat-admin@compute-0 ~]$ sudo yum list 'ceph*' -q
  Installed Packages
  ceph-common.x86_64  2:12.2.12-115.el7cp  @rhelosp-ceph-3-mon
  ...

On an OSP 16.1 (beta) node:

  [heat-admin@compute-0 ~]$ ls /usr/lib/udev/rules.d/50-rbd.rules
  ls: cannot access '/usr/lib/udev/rules.d/50-rbd.rules': No such file or directory
  [stack@undercloud-0 ~]$ sudo dnf list 'ceph*' --installed
  Installed Packages
  ceph-ansible.noarch

The absence of this file means the symlink is not created, and nova/os-brick raises an exception when trying to decrypt the non-existent path.
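
For reference, the rule in [1] looks approximately like the following (excerpt; see the linked file for the exact, version-specific content):

  # /usr/lib/udev/rules.d/50-rbd.rules (approximate excerpt)
  KERNEL=="rbd[0-9]*", ENV{DEVTYPE}=="disk", PROGRAM="/usr/bin/ceph-rbdnamer %k", SYMLINK+="rbd/%c{1}/%c{2}"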

  2020-06-18 15:38:19.175 8 DEBUG os_brick.encryptors.luks [req-foo bar baz - default default] opening encrypted volume /dev/rbd/volumes/volume-38e1e0ee-ac91-4d1d-be84-6dc8fb29292d _open_volume /usr/lib/python3.6/site-packages/os_brick/encryptors/luks.py:109
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [req-foo bar baz - default default] [instance: foo] Failure attaching encryptor; rolling back volume connection: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
  Command: cryptsetup luksOpen --key-file=- /dev/rbd/volumes/volume-38e1e0ee-ac91-4d1d-be84-6dc8fb29292d crypt-volume-38e1e0ee-ac91-4d1d-be84-6dc8fb29292d
  Exit code: 4
  Stdout: ''
  Stderr: "Device /dev/rbd/volumes/volume-38e1e0ee-ac91-4d1d-be84-6dc8fb29292d doesn't exist or access denied.\n"
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo] Traceback (most recent call last):
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo]   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 1588, in _connect_volume
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo]     self._attach_encryptor(context, connection_info, encryption)
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo]   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 1733, in _attach_encryptor
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo]     encryptor.attach_volume(context, **encryption)
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo]   File "/usr/lib/python3.6/site-packages/os_brick/encryptors/luks.py", line 167, in attach_volume
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo]     self._open_volume(passphrase, **kwargs)
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo]   File "/usr/lib/python3.6/site-packages/os_brick/encryptors/luks.py", line 113, in _open_volume
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo]     root_helper=self._root_helper)
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo]   File "/usr/lib/python3.6/site-packages/os_brick/executor.py", line 52, in _execute
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo]     result = self.__execute(*args, **kwargs)
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo]   File "/usr/lib/python3.6/site-packages/os_brick/privileged/rootwrap.py", line 169, in execute
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo]     return execute_root(*cmd, **kwargs)
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo]   File "/usr/lib/python3.6/site-packages/oslo_privsep/priv_context.py", line 245, in _wrap
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo]     return self.channel.remote_call(name, args, kwargs)
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo]   File "/usr/lib/python3.6/site-packages/oslo_privsep/daemon.py", line 204, in remote_call
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo]     raise exc_type(*result[2])
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo] oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo] Command: cryptsetup luksOpen --key-file=- /dev/rbd/volumes/volume-38e1e0ee-ac91-4d1d-be84-6dc8fb29292d crypt-volume-38e1e0ee-ac91-4d1d-be84-6dc8fb29292d
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo] Exit code: 4
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo] Stdout: ''
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo] Stderr: "Device /dev/rbd/volumes/volume-38e1e0ee-ac91-4d1d-be84-6dc8fb29292d doesn't exist or access denied.\n"
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo]

I see three possible solutions at the moment:

1. These udev rules should be present on the host. This could be as simple as installing the 'ceph-common' package, though that's pretty leaky.
2. The udev daemon should be present in the 'nova_compute' container.
3. os-brick should be enhanced to not require these symlinks (a rough sketch of this approach follows).
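
As a rough illustration of option 3 (this is not the actual upstream patch; the function and variable names are hypothetical), os-brick could resolve the device node by scanning the kernel's rbd sysfs tree rather than relying on the udev-created symlink:

  import os

  SYSFS_RBD_DEVICES = '/sys/bus/rbd/devices'  # kernel rbd device registry

  def find_rbd_device(pool, volume):
      """Return '/dev/rbdN' for a mapped volume, or None if not mapped.

      Scans /sys/bus/rbd/devices/<id>/{pool,name} instead of relying on
      the udev-created /dev/rbd/<pool>/<volume> symlink.
      """
      for dev_id in os.listdir(SYSFS_RBD_DEVICES):
          dev_dir = os.path.join(SYSFS_RBD_DEVICES, dev_id)
          try:
              with open(os.path.join(dev_dir, 'pool')) as f:
                  dev_pool = f.read().strip()
              with open(os.path.join(dev_dir, 'name')) as f:
                  dev_name = f.read().strip()
          except IOError:
              continue  # device disappeared mid-scan
          if (dev_pool, dev_name) == (pool, volume):
              return '/dev/rbd%s' % dev_id
      return None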

Version-Release number of selected component (if applicable):

OSP 16.1 beta

How reproducible:

Always.

Steps to Reproduce:

1. Deploy OSP 16 with a ceph backend
2. Attempt to map a device locally. You can either follow the steps described at [2] to have nova do this by calling os-brick, or use the 'rbd map' command like so:

  rbd device map $DEVICE --pool volumes --id openstack --mon_host $CEPH_HOST:6789

3. Check to see if '/dev/rbd/volumes/$DEVICE' exists.
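
One way to perform this check using the standard rbd CLI ('rbd device list' shows currently mapped devices; the 'ls' then looks for the udev-created symlink):

  $ sudo rbd device list
  $ ls -l /dev/rbd/volumes/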

Actual results:

The '/dev/rbd/volumes/$DEVICE' symlink does not exist.

Expected results:

The '/dev/rbd/volumes/$DEVICE' symlink should exist.

Additional info:

[1] https://github.com/ceph/ceph/blob/v14.0.0/udev/50-rbd.rules
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1824120#c2

Comment 1 Stephen Finucane 2020-06-18 17:47:50 UTC
I have proposed a potential fix upstream based on option 3 above. It can be found at https://review.opendev.org/#/c/736758/2

Comment 2 Stephen Finucane 2020-06-18 17:48:43 UTC
Re-assigned to the os-brick component while we determine whether this os-brick-based solution is viable.

Comment 5 spower 2020-07-14 18:48:06 UTC
This issue has conditional approval for the 16.1 Z1 release; it must be in the first compose and tested before the release of 16.1.1. If not, we will move it to TM=Z2.

Comment 10 errata-xmlrpc 2020-08-27 15:19:10 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1 director bug fix advisory), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3542