Description of problem:
The bootstrap play of the ceph cluster fails on the "Ensure cephadm uses image tags instead of digests" task. The actual error output looks like the following:

FATAL | Ensure cephadm uses image tags instead of digests | controller-0 | error={"changed": false, "cmd": ["podman", "run", "--rm", "--net=host", "--ipc=host", "--volume", "/var/lib/ceph/2696bddc-047e-4751-bdf2-259c510254f2/config/:/etc/ceph:z", "--volume", "/home/ceph-admin/assimilate_central.conf:/home/assimilate_central.conf:z", "--entrypoint", "ceph", "rhos-qe-mirror-tlv.usersys.redhat.com:5002/rh-osbs/rhceph:6-115", "--fsid", "2696bddc-047e-4751-bdf2-259c510254f2", "-c", "/etc/ceph/central.conf", "-k", "/etc/ceph/central.client.admin.keyring", "config", "set", "mgr", "mgr/cephadm/use_repo_digest", "false"], "delta": "0:00:00.584120", "end": "2023-05-26 14:41:46.698484", "msg": "non-zero return code", "rc": 1, "start": "2023-05-26 14:41:46.114364", "stderr": "Error initializing cluster client: ObjectNotFound('RADOS object not found (error calling conf_read_file)')", "stderr_lines": ["Error initializing cluster client: ObjectNotFound('RADOS object not found (error calling conf_read_file)')"], "stdout": "", "stdout_lines": []}

It comes from the following playbook:
https://opendev.org/openstack/tripleo-ansible/src/branch/stable/wallaby/tripleo_ansible/roles/tripleo_cephadm/tasks/bootstrap.yaml#L78

The failing task was recently added to the d/s build with this commit (currently latest compose - RHOS-17.1-RHEL-9-20230525.n.1):
https://review.opendev.org/c/openstack/tripleo-ansible/+/883413

That is the point at which it started to fail.
The task runs the following command right after the ceph cluster bootstrap:

sudo podman run --rm --net=host --ipc=host --volume /var/lib/ceph/2696bddc-047e-4751-bdf2-259c510254f2/config/:/etc/ceph:z --volume /home/ceph-admin/assimilate_central.conf:/home/assimilate_central.conf:z --entrypoint ceph rhos-qe-mirror-tlv.usersys.redhat.com:5002/rh-osbs/rhceph:6-115 --fsid c970470c-329b-5df0-a3d5-28f6e4b4ff98 -c /etc/ceph/central.conf -k /etc/ceph/central.client.admin.keyring config set mgr mgr/cephadm/use_repo_digest false

and it fails with:

Error initializing cluster client: ObjectNotFound('RADOS object not found (error calling conf_read_file)')

The content of the directories with the conf files:

[cloud-admin@controller-0 ~]$ ll /etc/ceph/
-rw-------. 1 root root 63 May 26 14:40 central.client.admin.keyring
-rw-r--r--. 1 root root 173 May 26 14:40 central.conf

BUT

[cloud-admin@controller-0 ~]$ sudo ls -la /var/lib/ceph/2696bddc-047e-4751-bdf2-259c510254f2/config/
-rw-r--r--. 1 root root 63 May 26 14:41 ceph.client.admin.keyring
-rw-r--r--. 1 root root 173 May 26 14:41 ceph.conf

The content of the conf files is identical in both directories, but the files in /var/lib/ceph are named after the default Ceph cluster name "ceph", while the correct names should include "central". This is a DCN deployment with multiple ceph clusters with different names. Both locations with the conf files appear to be created during the ceph cluster bootstrap: first (based on timestamps) the files in /etc/ceph, then those in /var/lib/ceph. It is unclear why the conf files are created with different names in the two locations. The command used in the failing task looks in /var/lib/ceph (since this commit: https://opendev.org/openstack/tripleo-ansible/commit/5e302b4ff7a4e211e885d9e5a298343ba15eab25). Once the content of /etc/ceph is copied to /var/lib/ceph/$FSID/config, the task passes successfully.
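The manual workaround (copying the cluster-named files into the directory that the failing task bind-mounts as /etc/ceph inside the container) can be sketched as a small script. This is an illustration only, not the official fix: it replays the situation under a scratch root (ROOT, the stand-in file contents, and the hard-coded fsid/cluster name are assumptions for demonstration; on a real controller the paths would be /etc/ceph and /var/lib/ceph/$FSID/config directly, run with sudo).

```shell
#!/bin/sh
# Illustration only: replay the workaround under a scratch root so it can
# run anywhere. On the real node, SRC=/etc/ceph and DST=/var/lib/ceph/$FSID/config.
ROOT=/tmp/ceph-workaround-demo
FSID=2696bddc-047e-4751-bdf2-259c510254f2   # fsid from this report
CLUSTER=central                             # non-default cluster name

SRC="$ROOT/etc/ceph"
DST="$ROOT/var/lib/ceph/$FSID/config"
rm -rf "$ROOT"
mkdir -p "$SRC" "$DST"

# Stand-ins for what bootstrap wrote: cluster-named files in /etc/ceph,
# default-named ("ceph.*") copies in /var/lib/ceph/$FSID/config.
printf '[global]\nfsid = %s\n' "$FSID" > "$SRC/$CLUSTER.conf"
printf '[client.admin]\nkey = placeholder\n' > "$SRC/$CLUSTER.client.admin.keyring"
cp "$SRC/$CLUSTER.conf" "$DST/ceph.conf"
cp "$SRC/$CLUSTER.client.admin.keyring" "$DST/ceph.client.admin.keyring"

# The workaround itself: mirror the cluster-named files into the directory
# that the task bind-mounts as /etc/ceph inside the container.
cp "$SRC/$CLUSTER.conf" "$DST/"
cp "$SRC/$CLUSTER.client.admin.keyring" "$DST/"

ls -l "$DST"
```

After this, the container's "-c /etc/ceph/central.conf" can resolve, because central.conf now exists in the mounted config directory rather than only ceph.conf.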
Version-Release number of selected component (if applicable):
tripleo-ansible-3.3.1-1.20230518201531.358f3c3.el9ost.noarch

How reproducible:
Always

Steps to Reproduce:
1. Deploy OpenStack with a ceph cluster that uses a non-default cluster name (i.e., anything other than "ceph")
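For context, a hedged sketch of how a non-default cluster name is typically specified in the Wallaby DCN workflow; the stack name, file names, and cluster name below are assumptions for illustration, not taken from this report, and the exact options may differ per tripleoclient version:

```shell
# Hypothetical DCN-style deployment step (not verified against this exact
# environment): passing --cluster makes the generated conf and keyring files
# use the cluster name, e.g. central.conf / central.client.admin.keyring.
openstack overcloud ceph deploy \
    deployed-metal-central.yaml \
    --stack central \
    --cluster central \
    --output deployed-ceph-central.yaml
```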
The bug can be VERIFIED
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Release of components for Red Hat OpenStack Platform 17.1 (Wallaby)), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2023:4577