Bug 1848610 - Removal of ceph-common from undercloud compute breaks local attach
Summary: Removal of ceph-common from undercloud compute breaks local attach
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-os-brick
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: z1
Target Release: 16.1 (Train on RHEL 8.2)
Assignee: Stephen Finucane
QA Contact: Tzach Shefi
URL:
Whiteboard:
Depends On:
Blocks: 1824115
 
Reported: 2020-06-18 15:40 UTC by Stephen Finucane
Modified: 2020-08-27 15:19 UTC
CC List: 7 users

Fixed In Version: python-os-brick-2.10.3-0.20200605063443.55fc998.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-08-27 15:19:10 UTC
Target Upstream Version:
Embargoed:


Links:
  Launchpad 1884114 - 2020-06-18 17:54:47 UTC
  OpenStack gerrit 736758 - MERGED - rbd: Warn if ceph udev rules are not configured - 2020-08-19 18:37:25 UTC
  Red Hat Product Errata RHBA-2020:3542 - 2020-08-27 15:19:32 UTC

Description Stephen Finucane 2020-06-18 15:40:43 UTC
Description of problem:

Nova now supports locally attaching ceph volumes via os-brick, using a combination of the '[workarounds] disable_native_luksv1' config option (to disable native attachment via QEMU, so that os-brick is used instead) and the '[workarounds] rbd_volume_local_attach' config option (to enable local attachment). This is broken on an OSP 16.1 deployment. The cause appears to be that the symlink os-brick expects at '/dev/rbd/{pool}/{device}' (which points to '/dev/rbdN') isn't being created. That symlink should be created by udev rules that ceph provides [1]. Since udev isn't run within the 'nova_compute' container, these rules must be present on the host for them to function. In OSP 13 this was the case; in OSP 16.1 it is not.
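
Enabling this path means setting both options on the compute host, along these lines (a minimal sketch; exactly where nova.conf lives depends on the deployment):

  [workarounds]
  disable_native_luksv1 = true
  rbd_volume_local_attach = true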

On an OSP 13 node:

  [heat-admin@compute-0 ~]$ ls /usr/lib/udev/rules.d/50-rbd.rules
  /usr/lib/udev/rules.d/50-rbd.rules
  $ rpm -qf /usr/lib/udev/rules.d/50-rbd.rules
  ceph-common-12.2.12-115.el7cp.x86_64
  [heat-admin@compute-0 ~]$ sudo yum list 'ceph*' -q
  Installed Packages
  ceph-common.x86_64  2:12.2.12-115.el7cp  @rhelosp-ceph-3-mon
  ...

On an OSP 16.1 (beta) node:

  [heat-admin@compute-0 ~]$ ls /usr/lib/udev/rules.d/50-rbd.rules
  ls: cannot access '/usr/lib/udev/rules.d/50-rbd.rules': No such file or directory
  [stack@undercloud-0 ~]$ sudo dnf list 'ceph*' --installed
  Installed Packages
  ceph-ansible.noarch

The absence of this file means the symlink is not created, and nova/os-brick raises an exception when trying to decrypt the non-existent path.

  2020-06-18 15:38:19.175 8 DEBUG os_brick.encryptors.luks [req-foo bar baz - default default] opening encrypted volume /dev/rbd/volumes/volume-38e1e0ee-ac91-4d1d-be84-6dc8fb29292d _open_volume /usr/lib/python3.6/site-packages/os_brick/encryptors/luks.py:109
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [req-foo bar baz - default default] [instance: foo] Failure attaching encryptor; rolling back volume connection: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
  Command: cryptsetup luksOpen --key-file=- /dev/rbd/volumes/volume-38e1e0ee-ac91-4d1d-be84-6dc8fb29292d crypt-volume-38e1e0ee-ac91-4d1d-be84-6dc8fb29292d
  Exit code: 4
  Stdout: ''
  Stderr: "Device /dev/rbd/volumes/volume-38e1e0ee-ac91-4d1d-be84-6dc8fb29292d doesn't exist or access denied.\n"
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo] Traceback (most recent call last):
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo]   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 1588, in _connect_volume
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo]     self._attach_encryptor(context, connection_info, encryption)
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo]   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 1733, in _attach_encryptor
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo]     encryptor.attach_volume(context, **encryption)
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo]   File "/usr/lib/python3.6/site-packages/os_brick/encryptors/luks.py", line 167, in attach_volume
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo]     self._open_volume(passphrase, **kwargs)
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo]   File "/usr/lib/python3.6/site-packages/os_brick/encryptors/luks.py", line 113, in _open_volume
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo]     root_helper=self._root_helper)
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo]   File "/usr/lib/python3.6/site-packages/os_brick/executor.py", line 52, in _execute
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo]     result = self.__execute(*args, **kwargs)
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo]   File "/usr/lib/python3.6/site-packages/os_brick/privileged/rootwrap.py", line 169, in execute
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo]     return execute_root(*cmd, **kwargs)
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo]   File "/usr/lib/python3.6/site-packages/oslo_privsep/priv_context.py", line 245, in _wrap
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo]     return self.channel.remote_call(name, args, kwargs)
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo]   File "/usr/lib/python3.6/site-packages/oslo_privsep/daemon.py", line 204, in remote_call
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo]     raise exc_type(*result[2])
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo] oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo] Command: cryptsetup luksOpen --key-file=- /dev/rbd/volumes/volume-38e1e0ee-ac91-4d1d-be84-6dc8fb29292d crypt-volume-38e1e0ee-ac91-4d1d-be84-6dc8fb29292d
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo] Exit code: 4
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo] Stdout: ''
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo] Stderr: "Device /dev/rbd/volumes/volume-38e1e0ee-ac91-4d1d-be84-6dc8fb29292d doesn't exist or access denied.\n"
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo]
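
For reference, the rules ceph ships in '50-rbd.rules' (see [1]) are essentially two lines: they run 'ceph-rbdnamer' against each rbdN device and build the symlink from the pool and image names it prints (paraphrased here; see [1] for the authoritative text):

  # For each rbd block device, ask ceph-rbdnamer for the pool and image
  # names, then publish the /dev/rbd/<pool>/<image> symlink.
  KERNEL=="rbd[0-9]*", ENV{DEVTYPE}=="disk", PROGRAM="/usr/bin/ceph-rbdnamer %k", SYMLINK+="rbd/%c{1}/%c{2}"
  KERNEL=="rbd[0-9]*", ENV{DEVTYPE}=="partition", PROGRAM="/usr/bin/ceph-rbdnamer %k", SYMLINK+="rbd/%c{1}/%c{2}-part%n"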

I see three possible solutions at the moment:

1. These udev rules should be present on the host. This could be as simple as installing the 'ceph-common' package, though that's pretty leaky.
2. The udev daemon should be present in the 'nova_compute' container.
3. os-brick should be enhanced to not require these symlinks (a rough sketch of one possible approach follows below).
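
To illustrate option 3 (a rough sketch only, not the actual patch): the kernel rbd driver already exposes the pool and image name of each mapped device under sysfs, so the native device node can be resolved without the symlink. Here $POOL and $VOLUME are hypothetical placeholders:

  # Resolve /dev/rbdN for a mapped image without relying on the
  # udev-created /dev/rbd/<pool>/<volume> symlink.
  for dev in /sys/bus/rbd/devices/*; do
      if [ "$(cat "$dev/pool")" = "$POOL" ] && \
         [ "$(cat "$dev/name")" = "$VOLUME" ]; then
          echo "/dev/rbd$(basename "$dev")"
      fi
  done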

Version-Release number of selected component (if applicable):

OSP 16.1 beta

How reproducible:

Always.

Steps to Reproduce:

1. Deploy OSP 16 with a ceph backend
2. Attempt to map a device locally. You can either follow the steps described at [2] to have nova do this by calling os-brick, or use the 'rbd map' command like so:

  rbd device map $DEVICE --pool volumes --id openstack --mon_host $CEPH_HOST:6789

3. Check whether '/dev/rbd/volumes/$DEVICE' exists (an example follows below).
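
For example, after step 2 you can verify both the kernel mapping and the symlink like so ('rbd device list' is the modern spelling of 'rbd showmapped'):

  # Show current kernel mappings: id, pool, image and the /dev/rbdN node.
  rbd device list
  # On a correctly configured host, the udev rules also create the symlink:
  ls -l /dev/rbd/volumes/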

Actual results:

The '/dev/rbd/volumes/$DEVICE' symlink does not exist.

Expected results:

The '/dev/rbd/volumes/$DEVICE' symlink should exist.

Additional info:

[1] https://github.com/ceph/ceph/blob/v14.0.0/udev/50-rbd.rules
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1824120#c2

Comment 1 Stephen Finucane 2020-06-18 17:47:50 UTC
I have proposed a potential fix upstream based on option 3 above; it can be found at https://review.opendev.org/#/c/736758/2
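
For context, the change that eventually merged ('rbd: Warn if ceph udev rules are not configured', per the gerrit link above) appears to amount to detecting that the host-side udev rules are not in place and warning up front, instead of failing later with the cryptic cryptsetup error. The real change is in os-brick's Python RBD connector; the gist, expressed as an illustrative shell sketch against the paths from the description, is:

  # Illustrative only: flag missing ceph udev rules up front, since without
  # them the /dev/rbd/<pool>/<volume> symlinks will never be created.
  if [ ! -e /usr/lib/udev/rules.d/50-rbd.rules ]; then
      echo "WARNING: ceph udev rules not found;" \
           "local attach will not find /dev/rbd/<pool>/<volume>" >&2
  fi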

Comment 2 Stephen Finucane 2020-06-18 17:48:43 UTC
Re-assigned to the os-brick component while we determine whether this os-brick-based solution is viable.

Comment 5 spower 2020-07-14 18:48:06 UTC
This issue has conditional approval for the 16.1 Z1 release; it must be in the first compose and tested before the release of 16.1.1. If not, we will move it to TM=Z2.

Comment 10 errata-xmlrpc 2020-08-27 15:19:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1 director bug fix advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3542

