Bug 1962304

Summary: cinder volume at DCN unable to read central cephx keyring
Product: Red Hat OpenStack
Reporter: John Fulton <johfulto>
Component: openstack-tripleo-heat-templates
Assignee: Alan Bishop <abishop>
Status: CLOSED ERRATA
QA Contact: Tzach Shefi <tshefi>
Severity: medium
Priority: medium
Version: 16.2 (Train)
CC: abishop, gcharot, gfidente, ltoscano, mburns, senrique, tshefi
Target Milestone: z2
Keywords: Triaged
Target Release: 16.2 (Train on RHEL 8.4)
Hardware: Unspecified
OS: Unspecified
Fixed In Version: openstack-tripleo-heat-templates-11.6.1-2.20220116004909.64b2e88.el8ost
Doc Type: No Doc Update
Last Closed: 2022-03-23 22:28:29 UTC
Type: Bug
Bug Blocks: 1894668

Description John Fulton 2021-05-19 17:29:18 UTC
As discovered in BZ 1894668, a dcn site was deployed with the cinder_volume service and glance was configured to use both the central and dcn site ceph clusters.

Within the glance container it was possible to use the cephx keyring to make an RBD connection to central Ceph, but within the cinder_volume container it was not. However, the cinder_volume container was able to make an RBD connection to the dcn ceph cluster.

Comment 2 John Fulton 2021-05-19 17:31:29 UTC
Though we can set the permissions to 644 to work around it, I think the correct fix is to ensure the cinder user/group owns the central cephx keyring file the same way that user owns the dcn1 cephx keyring file. The glance user owns both in the glance container.

[root@dcn1-computehci1-2 ~]# podman exec -ti glance_api ls -l  /etc/ceph/*.openstack.keyring
-rw-------. 1 glance glance 227 May 14 09:40 /etc/ceph/central.client.openstack.keyring
-rw-------. 1 glance glance 201 May 14 09:34 /etc/ceph/dcn1.client.openstack.keyring
[root@dcn1-computehci1-2 ~]# 

[root@dcn1-computehci1-2 ~]# podman exec -ti cinder_volume ls -l  /etc/ceph/*.openstack.keyring
-rw-r--r--. 1    167    167 227 May 14 09:40 /etc/ceph/central.client.openstack.keyring
-rw-------. 1 cinder cinder 201 May 14 09:34 /etc/ceph/dcn1.client.openstack.keyring
[root@dcn1-computehci1-2 ~]#
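
The ownership mismatch in the two listings above can be spotted mechanically. The following is a minimal sketch, assuming the standard `ls -l` column layout shown above; `unowned_keyrings` is a hypothetical helper name, not part of podman, ceph, or TripleO.

```shell
# Hypothetical helper: read `ls -l` output on stdin and print the path of
# every file whose owner column differs from the expected user.
unowned_keyrings() {
    expected_owner="$1"
    # ls -l columns: mode, links, owner, group, size, month, day, time, path
    awk -v owner="$expected_owner" '$3 != owner { print $NF }'
}

# Possible use on the dcn node (no -t, so the output has no carriage returns):
#   podman exec cinder_volume sh -c 'ls -l /etc/ceph/*.openstack.keyring' \
#       | unowned_keyrings cinder
```

Fed the cinder_volume listing above, the helper would report only the central keyring, since that file is owned by uid 167 rather than cinder.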

Comment 4 Alan Bishop 2021-05-19 18:37:49 UTC
As John noted in comment #3, the issue is that permissions for cinder to access the ceph keyrings are only being applied to each site's primary cluster, and not to the clusters of the other sites. This means a dcn site can access its own keyring but not the central site's keyring (insufficient permission), which prevents cinder from migrating an edge volume to the central site. I'm very puzzled why I didn't encounter the issue when I tested offline volume migration.

Glance handles things by adding access to all ceph clusters associated with GlanceMultistoreConfig [1]. 
Starting in upstream Wallaby, nova does something similar [2] so that it can access glance images directly via the associated ceph cluster.
Cinder also does something similar as of Wallaby, where support was added for CinderRbdMultiConfig [3].

[1] https://github.com/openstack/tripleo-heat-templates/blob/stable/train/deployment/glance/glance-api-container-puppet.yaml#L579-L590
[2] https://github.com/openstack/tripleo-heat-templates/blob/master/deployment/nova/nova-compute-container-puppet.yaml#L1301-L1312
[3] https://github.com/openstack/tripleo-heat-templates/blob/master/deployment/cinder/cinder-common-container-puppet.yaml#L195-L206
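
To illustrate the difference between granting access for the primary cluster only and for every associated cluster, here is a minimal sketch. It assumes the `/etc/ceph/<cluster>.client.openstack.keyring` naming seen in the listings in comment 2; `keyring_paths` is a hypothetical helper, and the `chown` loop in the comment is an assumption for illustration, not the mechanism the templates actually use.

```shell
# Hypothetical helper: print the openstack keyring path for each ceph
# cluster name given as an argument.
keyring_paths() {
    for cluster in "$@"; do
        printf '/etc/ceph/%s.client.openstack.keyring\n' "$cluster"
    done
}

# Primary cluster only (the behaviour described in this bug):
#   keyring_paths dcn1
# Every cluster the site talks to (the glance-style behaviour):
#   keyring_paths central dcn1 | while read -r f; do chown cinder:cinder "$f"; done
```

With only `dcn1` passed, the central keyring never appears in the list, which mirrors why cinder at the dcn site could not read it.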

This is a cinder THT issue, so I'm updating the BZ and grabbing it for me to solve. I'm not 100% sure of the best approach, because Train does not support CinderRbdMultiConfig, and it doesn't feel right for cinder to use data in GlanceMultistoreConfig.

Comment 9 Alan Bishop 2021-06-18 14:00:15 UTC
The fix has merged on upstream stable/train.

Comment 19 Luigi Toscano 2022-03-01 10:37:06 UTC
After making sure the deployment follows the documentation (with `openstack overcloud export `), its output uses CephExternalMultiConfig as expected, and in the end the cinder_volume container on the dcn site can access the 'openstack' keyring of the central site.


openstack-tripleo-heat-templates-11.6.1-2.20220116004910.el8ost.noarch

Comment 25 errata-xmlrpc 2022-03-23 22:28:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenStack Platform 16.2 (openstack-tripleo-heat-templates) security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0995