As discovered in BZ 1894668, a dcn site was deployed with the cinder_volume service, and glance was configured to use both the central and dcn site ceph clusters. Within the glance container it was possible to use the cephx keyring to make an RBD connection to central Ceph, but within the cinder_volume container it was not. However, the cinder_volume container was able to make an RBD connection to the dcn ceph cluster.
Though we can set the permissions to 644 to work around it, I think the correct fix is to ensure the cinder user/group own the central cephx key file the same way that user owns the dcn1 cephx keyring file. The glance user owns both in the glance container.

[root@dcn1-computehci1-2 ~]# podman exec -ti glance_api ls -l /etc/ceph/*.openstack.keyring
-rw-------. 1 glance glance 227 May 14 09:40 /etc/ceph/central.client.openstack.keyring
-rw-------. 1 glance glance 201 May 14 09:34 /etc/ceph/dcn1.client.openstack.keyring
[root@dcn1-computehci1-2 ~]#
[root@dcn1-computehci1-2 ~]# podman exec -ti cinder_volume ls -l /etc/ceph/*.openstack.keyring
-rw-r--r--. 1 167 167 227 May 14 09:40 /etc/ceph/central.client.openstack.keyring
-rw-------. 1 cinder cinder 201 May 14 09:34 /etc/ceph/dcn1.client.openstack.keyring
[root@dcn1-computehci1-2 ~]#
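For anyone who wants to check other containers the same way, here is a minimal sketch that parses `ls -l` lines like the ones above and reports keyrings not owned by the expected service user. The function names (`parse_ls_line`, `mis_owned`) are made up for illustration and are not part of any TripleO tooling.

```python
def parse_ls_line(line):
    """Split one `ls -l` line into the fields we care about."""
    fields = line.split()
    return {
        # The trailing "." on the mode marks an SELinux context; drop it.
        "mode": fields[0].rstrip("."),
        "owner": fields[2],
        "group": fields[3],
        "path": fields[-1],
    }


def mis_owned(lines, expected_user):
    """Return the paths whose owner or group is not expected_user."""
    entries = [parse_ls_line(line) for line in lines if line.strip()]
    return [
        e["path"]
        for e in entries
        if e["owner"] != expected_user or e["group"] != expected_user
    ]


# Listing copied from the cinder_volume container output in this comment.
cinder_listing = [
    "-rw-r--r--. 1 167 167 227 May 14 09:40 /etc/ceph/central.client.openstack.keyring",
    "-rw-------. 1 cinder cinder 201 May 14 09:34 /etc/ceph/dcn1.client.openstack.keyring",
]

for path in mis_owned(cinder_listing, "cinder"):
    print("not owned by cinder:", path)
```

Run against the glance_api listing with "glance" as the expected user, the same check should report nothing, which matches the observation that glance owns both keyrings.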
As John noted in comment #3, the issue is that permissions for cinder to access the ceph keyrings are only being applied to each site's primary cluster, and not to the other sites' clusters. This means a dcn site can access its own keyring, but it's unable to access the central site's keyring (insufficient permission), and that prevents cinder from migrating an edge volume to the central site. I'm very puzzled why I didn't encounter the issue when I tested offline volume migration.

Glance handles things by adding access to all ceph clusters associated with GlanceMultistoreConfig [1]. Starting in upstream Wallaby, nova does something similar [2] so that it can access glance images directly via the associated ceph cluster. Cinder also does something similar as of Wallaby, when support was added for CinderRbdMultiConfig [3].

[1] https://github.com/openstack/tripleo-heat-templates/blob/stable/train/deployment/glance/glance-api-container-puppet.yaml#L579-L590
[2] https://github.com/openstack/tripleo-heat-templates/blob/master/deployment/nova/nova-compute-container-puppet.yaml#L1301-L1312
[3] https://github.com/openstack/tripleo-heat-templates/blob/master/deployment/cinder/cinder-common-container-puppet.yaml#L195-L206

This is a cinder THT issue, so I'm updating the BZ and taking it to solve myself. I'm not 100% sure of the best approach, because Train does not support CinderRbdMultiConfig, and it doesn't feel right for cinder to use data in GlanceMultistoreConfig.
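To make the idea concrete, here is a rough sketch of "apply the keyring permission to every cluster, not just the primary one". This is not the change that went into THT. The function name `keyring_permissions`, the dictionary layout (loosely modeled on a path/owner/perm permissions entry) and the 0600 mode are all assumptions for illustration; only the keyring path pattern comes from the listing in this BZ.

```python
def keyring_permissions(clusters, service_user, ceph_user="openstack"):
    """Build one ownership entry per ceph cluster keyring.

    clusters     -- cluster names, e.g. the local one plus any remote ones
    service_user -- user that must own the keyring inside the container
    ceph_user    -- cephx client name used in the keyring file name
    """
    return [
        {
            "path": f"/etc/ceph/{cluster}.client.{ceph_user}.keyring",
            "owner": f"{service_user}:{service_user}",
            "perm": "0600",  # assumed mode, matching the dcn1 keyring above
        }
        for cluster in clusters
    ]


# The dcn1 site needs its own cluster and the central one.
for entry in keyring_permissions(["dcn1", "central"], "cinder"):
    print(entry["owner"], entry["perm"], entry["path"])
```

The point of the sketch is only that the list of clusters has to include the remote ones. Where that list should come from on Train is the open question raised above.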
The fix has merged on upstream stable/train.
After making sure the deployment follows the documentation (with `openstack overcloud export`), its output uses CephExternalMultiConfig as expected, and in the end the cinder_volume container on the dcn site can access the 'openstack' keyring of the central site.

openstack-tripleo-heat-templates-11.6.1-2.20220116004910.el8ost.noarch
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat OpenStack Platform 16.2 (openstack-tripleo-heat-templates) security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0995